CSE Technical Reports Sorted by Technical Report Number
|Existing approaches to providing high availability (HA) for virtualized environments require a backup VM for every running primary VM. These approaches are expensive in memory because the backup VM requires the same amount of memory as the primary, even though it is normally passive. In this paper, we propose a storage-based, memory-efficient HA solution for VMs, called HydraVM, that eliminates the passive memory reservations for backups. HydraVM maintains a complete, recent image of each protected VM in shared storage using an incremental checkpointing technique. Upon failure of a primary VM, a backup can be promptly restored on any server with available memory. Our evaluation shows that HydraVM protects VMs at low overhead and can restore a failed VM within 1.6 seconds, without excessive use of memory resources in a virtualized environment.
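The incremental checkpointing idea in the abstract can be sketched in a few lines: at each checkpoint, write to shared storage only the memory pages that changed since the last checkpoint. This is an illustrative page-hashing scheme under assumed names and constants, not HydraVM's actual mechanism (a hypervisor would track dirty pages in hardware rather than hash them).

```python
import hashlib

PAGE_SIZE = 4096  # illustrative page size

def incremental_checkpoint(memory: bytes, prev_hashes: dict) -> tuple[dict, dict]:
    """Return (dirty_pages, new_hashes): only pages whose content changed
    since the previous checkpoint are emitted for writing to shared storage.

    prev_hashes maps a page's byte offset to the content hash recorded at
    the previous checkpoint (empty on the first, full checkpoint)."""
    dirty = {}
    new_hashes = {}
    for off in range(0, len(memory), PAGE_SIZE):
        page = memory[off:off + PAGE_SIZE]
        h = hashlib.sha256(page).digest()
        new_hashes[off] = h
        if prev_hashes.get(off) != h:
            dirty[off] = page  # only this page goes to shared storage
    return dirty, new_hashes
```

Restoring a failed VM then amounts to loading the base image plus the accumulated dirty pages onto any host with free memory, which is what removes the need for a passive, fully provisioned backup VM.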
|Multithreaded programs can have subtle errors that result from undesired interleavings of concurrent threads. A common technique programmers use to prevent these errors is to ensure that certain blocks of code are atomic. A block of code is atomic if every execution is equivalent to a serial execution in which no other thread's instructions are interleaved with the code. Atomic blocks of code are amenable to sequential reasoning and are therefore significantly simpler to analyze, verify, and maintain.
This paper presents a system for automatically detecting atomicity violations in Java programs without requiring any specifications. Our system infers which blocks of code must be atomic and detects violations of atomicity of those blocks. The paper first describes a synchronization pattern in programs that is likely to indicate a violation of atomicity. The paper then presents a static analysis for detecting occurrences of this pattern.
We tested our system on over half a million lines of popular open source programs, and categorized the resulting atomicity warnings. Our experience demonstrates that our system is effective. It successfully detects all the previously known atomicity errors in those programs as well as several previously unknown atomicity errors. Our system also detects badly written code whose atomicity depends on assumptions that might not hold in future versions of the code.
|Users’ mental models of privacy and visibility in social networks often involve natural subgroups, or communities, within their local networks of friends. Such groupings are not always explicit, and existing policy-comprehension tools, such as Facebook’s Audience View, which allows the user to view her profile as it appears to each of her friends, are not naturally aligned with this mental model. In this paper, we introduce PViz, an interface and system that corresponds more directly to the way users model groups and the privacy policies applied to their networks. PViz allows the user to understand the visibility of her profile according to natural sub-groupings of friends, and at different levels of granularity. We conducted an extensive user study comparing PViz to current privacy comprehension tools (Facebook’s Audience View and Custom Settings page). Despite requiring users to adapt to new ways of exploring their social spaces, our study revealed that PViz was comparable to Audience View for simple tasks and provided a significant improvement for more complex, group-based tasks.
|Over a few short years, the Internet has grown to play an integral part in daily economic, social, and political life in most countries. From the Egyptian "Velvet Revolution" to the last US presidential campaign, Internet communication shapes public opinion and fosters social change. But despite its immense social importance, the Internet has proven remarkably susceptible to disruption and manipulation, including government-induced multi-week outages (e.g., Libya and Egypt) and multi-year campaigns by autocratic regimes to render web sites and large address blocks unreachable. While parents, enterprises, and governments have always placed restrictions on end-user communication to meet social or legal goals, we argue that recent years have seen the beginning of a new trend: the co-option of the Internet infrastructure itself to effect large-scale censorship. In this paper, we use Internet routing, allocation, and backbone traffic statistics to explore several recent and ongoing infrastructure-based efforts to disrupt Internet communication. We focus our discussion on the risks this trend of infrastructure corruption poses to the long-term evolution of the Internet.
|Low-latency, low-overhead, and reliable control channels are essential to the efficient operation of wireless networks. However, control channels that use current in-band and out-of-band designs do not fully meet these requirements. In this paper, we design and implement Aileron, a novel control channel based on automatic modulation recognition that carries control frames over an OFDM(A) PHY by varying the modulation rate of the OFDM subcarriers. Under Aileron, control information is embedded in the modulation type rather than in the actual symbol values. Aileron has three important advantages: (a) control frame exchange without frame synchronization, (b) signaling with low bandwidth overhead, and (c) resilience to channel errors.
We evaluated Aileron using both extensive simulations and real-world measurements, and found that control frames can be transmitted with more than 80% accuracy using only 10 OFDM blocks on a channel with an SNR of merely 10 dB.
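The core encoding idea — carrying control bits in the *choice* of modulation rather than in symbol values — can be sketched as follows. The mapping of two bits per subcarrier to four modulation types is an illustrative assumption, not Aileron's actual scheme, and the decoder assumes ideal modulation recognition.

```python
# Illustrative: 2 control bits per OFDM subcarrier, conveyed by which
# modulation the subcarrier uses, not by the symbols it carries.
MODS = ["BPSK", "QPSK", "16QAM", "64QAM"]  # index = 2-bit value

def encode_control(bits: str) -> list[str]:
    """Map a bit string (length a multiple of 2) to a per-subcarrier
    modulation sequence."""
    assert len(bits) % 2 == 0
    return [MODS[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2)]

def decode_control(mods: list[str]) -> str:
    """Recover control bits from the recognized modulation of each
    subcarrier (assuming perfect modulation recognition)."""
    return "".join(format(MODS.index(m), "02b") for m in mods)
```

Because the receiver only has to classify the modulation type, not demodulate exact symbol values, this style of signaling can tolerate channel errors and does not require frame synchronization, which is the intuition behind the three advantages listed above.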
|Troubleshooting the performance of complex production software is challenging. Most existing tools, such as profiling, tracing, and logging systems, reveal "what" events occurred during performance anomalies. However, the users of such tools must then infer "why" these events occurred during a particular execution; e.g., that their
execution was due to a specific input request or configuration setting. Because manual root cause determination is time-consuming and difficult, this paper introduces performance summarization, a
technique for automatically inferring the root cause of performance problems. Performance summarization first attributes performance
costs to fine-grained events such as individual instructions and system calls. It then uses dynamic information flow to determine the
probable root causes for the execution of each event. The cost of each event is assigned to root causes according to the relative probability that the causes led to the execution of that event. Finally, the total cost for each root cause is calculated by summing the per-cause costs of all events. This paper also describes a differential
form of performance summarization that compares two activities. We have implemented a tool called X-ray that performs performance summarization. Our experimental results show that X-ray accurately diagnoses 14 performance issues in the Apache HTTP
server, Postfix mail server, and PostgreSQL database, while adding only 1–7% overhead to production systems.
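The cost-attribution step described above can be sketched directly: each event's cost is split across candidate root causes in proportion to the probability that each cause led to the event, and the per-cause shares are summed. The data shapes and names here are illustrative, not X-ray's internal representation.

```python
def summarize(events, cause_probs):
    """Attribute event costs to root causes.

    events: list of (event_id, cost) pairs, e.g. per-instruction or
            per-syscall costs.
    cause_probs: event_id -> {cause: probability that this cause led
                 to the event's execution} (from information flow).
    Returns: cause -> total attributed cost."""
    totals = {}
    for event_id, cost in events:
        probs = cause_probs[event_id]
        norm = sum(probs.values())  # split relative to the listed causes
        for cause, p in probs.items():
            totals[cause] = totals.get(cause, 0.0) + cost * p / norm
    return totals
```

The differential form mentioned above would then compare two such summaries (e.g., a slow request versus a fast one) and rank causes by the difference in attributed cost.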
|We present new nearest neighbor methods for text classification and an evaluation of these methods against existing nearest neighbor methods as well as other well-known text classification algorithms. Inspired by the language modeling approach to information retrieval, we show improvements in k-nearest neighbor (kNN) classification by replacing the classical cosine similarity with a KL-divergence-based similarity measure. We also present an extension of kNN to the semi-supervised case, which turns out to be equivalent to semi-supervised learning with harmonic functions. In both supervised and semi-supervised experiments, our algorithms surpass traditional nearest neighbor methods and produce competitive results when compared to state-of-the-art methods such as Support Vector Machines (SVM) and transductive SVM on the Reuters-21578 dataset, the 20 Newsgroups dataset, and the Reuters Corpus Volume I (RCV1) dataset. To our knowledge, this paper presents one of the most comprehensive evaluations of different machine learning algorithms on the entire RCV1 dataset.
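A minimal sketch of a KL-divergence-based similarity in the language-modeling style the abstract alludes to: score a document by the negative KL divergence from the query's term distribution to a smoothed document language model, then use the scores to rank kNN neighbors. The Jelinek-Mercer mixing weight `lam` and the uniform background model are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import math
from collections import Counter

def kl_similarity(query_tokens, doc_tokens, vocab, lam=0.9):
    """Negative KL divergence -KL(q || d) between the query's term
    distribution q and a smoothed document model d; higher is more
    similar. Smoothing keeps unseen terms from zeroing the score."""
    q = Counter(query_tokens)
    d = Counter(doc_tokens)
    bg = 1.0 / len(vocab)                   # uniform background model
    qn, dn = sum(q.values()), sum(d.values())
    score = 0.0
    for term, cnt in q.items():
        p_q = cnt / qn
        p_d = lam * d[term] / dn + (1 - lam) * bg
        score -= p_q * math.log(p_q / p_d)  # accumulate -KL(q || d)
    return score
```

In a kNN classifier this score would simply replace cosine similarity when selecting the k highest-scoring training documents for a test document.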
|Software developers often find databases difficult to work with because they are not documented properly. When documentation does exist, it is often inaccurate, out-of-date,
and/or unclear. In this paper, we present ModelDoc, an extension of MediaWiki, as a new approach to documenting databases and, potentially, other aspects of an application. ModelDoc auto-generates documentation from a live data source, and then continually regenerates that documentation as the data source changes. The documentation is consistent and kept up-to-date, but also sits alongside standard wiki functionality, allowing collaboration throughout the evolution of the database.
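The generate-from-a-live-source idea can be sketched as follows: read the schema from the running database and emit wiki markup, so regenerating after a schema change keeps the docs current. SQLite and the MediaWiki-style markup here are illustrative stand-ins; ModelDoc targets MediaWiki and its own data sources, and the function name is hypothetical.

```python
import sqlite3

def generate_wiki_docs(db_path: str) -> str:
    """Regenerate wiki-style documentation from a live SQLite schema."""
    conn = sqlite3.connect(db_path)
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"== {table} ==")
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        for cid, name, ctype, notnull, dflt, pk in conn.execute(
                f"PRAGMA table_info({table})"):
            flags = " (primary key)" if pk else ""
            lines.append(f"* '''{name}''': {ctype or 'untyped'}{flags}")
    conn.close()
    return "\n".join(lines)
```

Hand-written commentary would live in separate wiki sections around this generated block, which is how the generated and collaborative content can coexist.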