CSE Technical Reports Sorted by Technical Report Number

TR Number Title Authors Date Pages

CSE-TR-550-09 TrapperKeeper: Using Virtualization to Add Type-Awareness to File Systems Daniel Peek and Jason Flinn May, 2009 14

CSE-TR-551-09 Analysis of the Green Dam Censorware System Scott Wolchok, Randy Yao, and J. Alex Halderman June, 2009 4

CSE-TR-552-09 Remote Fingerprinting and Exploitation of Mail Server Antivirus Engines Jon Oberheide and Farnam Jahanian June, 2009 8
Vulnerabilities within antivirus engines deployed at a mail server represent a serious risk to the security of organizations. If a sophisticated attacker is able to remotely probe a mail server and identify the particular antivirus engine used, he may craft a malformed message to exploit the engine with a low risk of detection. This paper explores how much information is exposed by these mail servers, how this information could be used effectively by an attacker, and what can be done to limit the exposure and reduce the impact of antivirus vulnerabilities. Towards this goal, we introduce and evaluate three techniques that can expose the vendor and version of antivirus software used by a mail server: message bouncing, detection probing, and iterative refinement. Through a series of experiments, we determine that over 28% of bounced messages expose information, 78% of response messages and 16% of delivered messages leak information through detection probing, and demonstrate the effectiveness of iterative refinement with a dataset of over 7200 malware samples and antivirus engines from 10 popular vendors. We also show that the most commonly deployed mail server antivirus engine is the most vulnerable engine and is one of the top offenders in exposing information. Finally, we conclude by suggesting methods of reducing the amount of exposure and discuss isolation techniques that may provide greater security.

CSE-TR-553-09 Splash: Integrated Ad-Hoc Querying of Data and Statistical Models Lujun Fang and Kristen LeFevre July, 2009 13
This paper presents a system called Splash, which integrates statistical modeling and SQL for the purpose of ad-hoc querying and analysis. Splash supports a novel, simple, and practical abstraction of statistical modeling as an aggregate function, which in turn provides for natural integration with standard SQL queries and a relational DBMS. In addition, we introduce and implement a novel representatives operator to help explain statistical models using a limited number of representative examples. We present a proof-of-concept implementation of the system, which includes several performance optimizations. An experimental study indicates that our system scales well to large input datasets. Further, to demonstrate the simplicity and usability of the new abstractions, we conducted a case study using Splash to perform a series of exploratory analyses using network log data. Our study indicates that the query-based interface is simpler than a common data mining software package, and for ad-hoc analysis, it often requires less programming effort to use.
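The idea of exposing a statistical model as an aggregate function can be illustrated with a small sketch. This is not Splash's syntax or implementation, which the abstract does not give. It uses Python's standard `sqlite3` module, and the aggregate name `linreg_slope`, the table `log`, and its columns are all hypothetical. The aggregate fits a least-squares line per group, so the model appears in an ordinary `GROUP BY` query.

```python
import sqlite3


class LinRegSlope:
    """Hypothetical aggregate: least-squares slope of y regressed on x."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):
        # Called once per row in the group; skip rows with missing values.
        if x is None or y is None:
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # Closed-form slope; NULL when the fit is undefined.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            return None
        return (self.n * self.sxy - self.sx * self.sy) / denom


def slopes_by_host(rows):
    """rows: iterable of (host, t, bytes) tuples (an assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("linreg_slope", 2, LinRegSlope)
    conn.execute("CREATE TABLE log (host TEXT, t REAL, bytes REAL)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    query = "SELECT host, linreg_slope(t, bytes) FROM log GROUP BY host"
    result = dict(conn.execute(query))
    conn.close()
    return result
```

Because the model is just another aggregate, it composes with `WHERE`, `GROUP BY`, and joins like `SUM` or `AVG` would, which is the kind of integration the abstract describes.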

CSE-TR-554-09 An Online Framework for Publishing Dynamic Privacy-Sensitive Location Traces Wen Jin, Kristen LeFevre and Jignesh M Patel July, 2009 13
This paper considers the problem of protecting individual anonymity when continuously publishing a stream of location trace information collected from a population of users. Fundamentally, the key challenge that arises in this setting is the presence of evolving data, and in particular, data that evolves in semi-predictable ways. The main contribution of this paper is the first comprehensive formal framework for reasoning about privacy in this setting. Through careful analysis of the expected threat, we articulate a new privacy principle called temporal unlinkability. Then, by incorporating a model of user motion, we are able to quantify the risk of privacy violations probabilistically. Within this framework, we develop a simple initial set of algorithms for continuous publishing, and we demonstrate the feasibility of the approach using both real and synthetic location data.

CSE-TR-555-09 Offline Symbolic Analysis for Multi-Processor Execution Replay Dongyoon Lee, Satish Narayanasamy, Mahmoud Said, and Zijiang (James) Yang July, 2009 20
The ability to replay a program's execution on a multi-processor system can significantly help parallel programming. To replay a shared-memory multi-threaded program, existing solutions record the program input (I/O, DMA, etc.) and the shared-memory dependencies between threads. Prior processor-based record-and-replay solutions are efficient, but they require non-trivial modifications to the coherency protocol and the memory sub-system for recording the shared-memory dependencies. In this paper, we propose a processor-based record-and-replay solution that does not require detecting and logging shared-memory dependencies to enable multi-processor replay. It is based on our insight that a load-based checkpointing scheme that records the program input has sufficient information for deterministically replaying each thread. We propose an offline symbolic analysis algorithm based on an SMT solver that determines the shared-memory dependencies using just the program input logs during replay. In addition to saving log space, the proposed approach significantly reduces the hardware support required for enabling replay.

CSE-TR-556-09 Anatomizing Application Performance Differences on Smartphones Junxian Huang, Qiang Xu, Z. Morley Mao, Ming Zhang, Paramvir Bahl September, 2009 13
The widespread deployment of 3G technologies and the rapid adoption of new smartphone devices like the iPhone and BlackBerry are making cellular data networks increasingly popular. In addition to email and Web browsing, a variety of different network applications are now available, making smartphones potentially reasonable substitutes for their desktop counterparts. Unfortunately, the performance of smartphone applications, from the perspective of the users and application developers, is still not well understood. We believe our study, the first of its kind, fills this void. We identify and study important factors that impact user-perceived performance. We formalize the method for comparing the performance of smartphone applications along several unique dimensions such as carrier network, device capabilities, application types, and network protocols. To ensure a fair comparison across platforms and networks, we develop a detailed measurement methodology. Our work is an important and necessary step towards understanding the performance of smartphone applications from the users' and application developers' perspectives. Our analysis culminates with a set of recommendations that can lead to better application design and infrastructure support for smartphone users.

CSE-TR-557-09 Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke November, 2009 10

CSE-TR-558-09 Charlatans' Web: Analysis and Application of Global IP-Usage Patterns of Fast-Flux Botnets Matthew Knysz, Xin Hu, and Kang G. Shin November, 2009 22
Botnet-based hosting or redirection/proxy services provide botmasters with an ideal platform for hosting malicious and illegal content while affording them a high level of misdirection and protection. Because of the unreliable connectivity of the constituent bots (typically compromised home computers), domains built atop botnets require frequent updates to their DNS records, replacing the IPs of offline bots with online ones to prevent a disruption in (malicious) service. Consequently, their DNS records contain a large number of constantly-changing (i.e., "fluxy") IPs, earning them the descriptive moniker of fast-flux domains, or, when both the content and name servers are fluxy, double fast-flux domains. In this paper, we study the global IP-usage patterns exhibited by different types of malicious and benign domains, including single and double fast-flux domains. We have deployed a lightweight DNS probing engine, called DIGGER, on 240 PlanetLab nodes spanning 4 continents. Collecting DNS data for over 3.5 months on a plethora of domains, our global vantage point enabled us to identify distinguishing behavioral features between them based on their DNS-query results. We have quantified these features and demonstrated their effectiveness for detection by building a proof-of-concept, multi-leveled SVM classifier capable of discriminating between five different types of domains with minimal false positives. We have also uncovered new, cautious IP-management strategies currently employed by criminals to evade detection. Our results provide insight into the current global state of fast-flux botnets, including the increased presence of double fast-flux domains and their range in implementation. In addition, we expose potential trends for botnet-based services, uncovering previously-unseen domains whose name servers alone demonstrate fast-flux behavior.
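As a rough illustration of what "fluxy" DNS records look like in data, the sketch below computes two simple measures over a series of DNS answers for one domain: the number of distinct IPs seen, and the mean turnover between consecutive probes. These two features and the function name `flux_features` are assumptions made for illustration. They are not the features or the classifier used in the report, which the abstract does not specify.

```python
def flux_features(snapshots):
    """Summarize IP churn for one domain.

    snapshots: list of sets of IP strings, one set per DNS probe,
    in time order (an assumed input format).
    """
    # All IPs ever returned for the domain.
    distinct = set().union(*snapshots)

    # Jaccard distance between consecutive answers:
    # 0.0 if the IP set is unchanged, 1.0 if fully replaced.
    changes = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        union = prev | cur
        if union:
            changes.append(1 - len(prev & cur) / len(union))
        else:
            changes.append(0.0)

    mean_turnover = sum(changes) / len(changes) if changes else 0.0
    return {"distinct_ips": len(distinct), "mean_turnover": mean_turnover}
```

A stable domain that always returns the same address would score one distinct IP and zero turnover, while a domain that rotates through many addresses would score high on both.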

CSE-TR-559-09 Detection of Botnets Using Combined Host- and Network-Level Information Yuanyuan Zeng, Xin Hu, and Kang G. Shin December, 2009 11
Bots are coordinated by a command and control (C&C) infrastructure to launch such attacks as Distributed-Denial-of-Service (DDoS), spamming, identity theft and phishing, all of which seriously threaten Internet services and users. Most contemporary botnet-detection approaches have been designed to function at the network level, requiring the analysis of packets' payloads. However, analyzing packets' payloads raises privacy concerns and incurs large computational overheads. Moreover, network traffic analysis alone can seldom provide a complete picture of botnets' behavior. By contrast, general in-host detection approaches are useful to identify each bot's host-wide behavior, but are susceptible to the host-resident malware if used alone. To address these limitations, we account for both the coordination within a botnet and the malicious behavior each bot exhibits at the host level, and propose a C&C protocol-independent detection framework that combines both host- and network-level information for making detection decisions. This framework clusters similarly-behaving hosts into groups based on network-flow analysis without accessing packets' payloads, and then correlates the clusters with each individual's in-host behavior for validation. The framework is shown to be effective and incurs low false-alarm rates in detecting various types of botnets.

CSE-TR-560-09 On Detection of Storm Botnets Yuanyuan Zeng, Kang G. Shin December, 2009 7
A botnet, which is a group of compromised and remotely controlled computers (also called bots), poses a serious threat to the Internet. The command and control (C&C) channel commonly used for a botnet relies on a central server, such as IRC or HTTP. Recently, the Storm botnet, a P2P-based botnet with a decentralized C&C channel, has appeared in the wild. In this paper, we propose a distributed approach to detecting Storm botnets at the network level. Our approach is composed of two stages. First, we identify P2P and SMTP packets from each host's traffic. Second, we use a machine learning technique to differentiate Storm from benign P2P traffic based on several distinguishing traffic attributes. Both stages require only packet header information, without analyzing payloads. Our evaluation has shown the detection strategy to be effective with low false alarm rates.
