#### CSE Technical Reports Sorted by Technical Report Number

TR Number |
Title |
Authors | Date | Pages |

CSE-TR-446-02 |
HYPE: A Hybrid Power Estimation Method for Programmable Systems |
Liu and Papaefthymiou | Jan, 02 | 11 |

In this paper, we present a novel power estimation scheme for programmable systems consisting of predesigned datapath and memory components. The proposed hybrid methodology yields highly accurate estimates within short runtimes by combining high-level behavioral simulation with analytical macromodeling of circuit characteristics. Given a system of predesigned components along with its initial state and a program to execute, behavioral level simulation is used to derive and analyze all control signals and all data signals at the datapath/memory interface. Within the datapath itself, an iterative procedure is used in conjunction with analytical output macromodeling to compute signal statistics, as opposed to actual signals, at the inputs of its constituent logic components. These statistics are then applied to analytical power macromodels to obtain the dissipation of the datapath and the global interconnects. Our approach is theoretically sound and yields fast and accurate estimates in practice. Using Banachs fixed point theorem, we prove that, under certain sufficient conditions on the output macromodels, the iterative procedure always converges to a unique point. We have implemented our hybrid technique into a power estimation tool called HyPE and used it to explore various architectural alternatives in the design of a 256-state Viterbi decoder and a Rijndael encryptor. For designs with close to 1 million transistors, our estimator terminates within 10 seconds. Compared with state-of-the-art industrial gate-level power estimators, the proposed methodology is about 1,000 times faster with 5.4% error on average. |

CSE-TR-447-02 |
Predicting Last-Touch References under Optimal Replacement |
Lin and Reinhardt | Jan, 02 | 17 |

Effective cache replacement is becoming an increasingly important issue in cache hierarchy design as large set-associative caches are widely used in high-performance systems. This paper proposes a novel approach to approximate the decisions made by an optimal replacement algorithm (OPT) using last-touch prediction. The central idea is to identify, via prediction, the final reference to a cache block before the block would be evicted under OPT—the “OPT last touch”. Given perfect prediction, replacing the referenced block immediately after each OPT last touch would give optimal replacement behavior. This paper evaluates the feasibility of this approach by studying, at a fundamental level, the predictability of OPT last-touch references, and the applicability of these predictions to improving replacement decisions. We show that trace-based predictors, previously proposed for LRU last-touch prediction, can predict OPT last touches as well. We enhance these predictors using future information, but find that their performance degrades significantly as cache size and associativity increase. We introduce a new class of predictors based on last-touch history, which significantly outperform trace-based predictors on large set-associative secondary caches. Across eight SPEC CPU2000 benchmarks on a 1MB 16-way associative secondary cache, an idealized history-based OPT last-touch predictor can potentially eliminate 39% of LRU misses—eliminating 63% of the gap between LRU and OPT. |

CSE-TR-448-02 |
The FMSAT Satisfiability Solver: Hypergraph Partitioning meets Boolean Satisfiability |
Ramani and Markov | Jan, 02 | 6 |

This report is intended to present the Fiduccia Mattheyses (FM) heuristic for hypergraph bipartitioning as a method for solving large Boolean satisfiability (SAT) instances. We hope to extend the success of the FM heuristic in the partitioning domain to satisfiability. This report outlines how a SAT problem can be viewed as a partitioning problem, which argues for the applicability of FM. Further, we present the software architecture and implementation of our SAT solver, inspired by a previous, highly successful implementation of FM for hypergraph bipartitioning. We briefly discuss experimental results that justify our claim that the FM heuristic shows promise in the satisfiability domain. |

CSE-TR-449-02 |
Visual Servoing via Navigation Functions |
Cowan Weingarten and Koditschek | Feb, 02 | 31 |

This technical report presents a framework for visual servoing that guarantees convergence to a visible goal from most initially visible configurations while maintaining full view of all the feature points along the way. The method applies to first and second order fully actuated plant models. The solution entails three components: a model for the "occlusion-free" workspace; a change of coordinates from image to model coordinates; and a navigation function for the model space. We present three example applications of the framework, along with experimental validation of its practical efficacy. |

CSE-TR-450-02 |
Optimization-Based Multicast Flow Control Using Virtual M-Ary Feedback |
Zhang and Shin | Feb, 02 | 15 |

CSE-TR-451-02 |
Majority-Based Decomposition of Carry Logic in Binary Adders |
Nazhandali and Sakallah | Mar, 02 | 14 |

We introduce a new carry lookahead adder architecture based on a decomposition of the carry logic into a network consisting exclusively of majority gates. We call adders with this architecture “M&M” adders to distinguish them from adders based on the conventional Generate/Propagate, or “G&P,” architecture. We show how M&M and G&P adders are related, and characterize optimal realizations of M&M adders. |

CSE-TR-452-02 |
Design and Analysis of a Flipping Controller for RHex |
Saranli and Koditschek | Mar, 02 | 18 |

We report on the design and analysis of a controller that can achieve dynamical self-righting of our hexapedal robot, RHex. We present an empirically tuned controller that works reasonably well on indoor surfaces, using a hybrid energy pumping strategy to overcome torque limitations of its actuators. Subsequent modeling and analysis yields a new controller with a much wider domain of success as well as a preliminary understanding of the hybrid control strategy. Simulation results demonstrate the superiority of the improved control strategy relative to the first generation empirically designed controller. |

CSE-TR-453-02 |
Registration of Range Images in the Presence of Occlusions and Missing Data |
Sharp Lee and Wehe | Mar, 02 | 16 |

We describe a method for the registration of range images in the presence of occlusions and missing data. Our approach uses the missing data as an additional clue for matching. Each point in both images, valid or missing, is classified and scored within a maximum likelihood framework. Occlusions, shadows, noise, partially overlapping views, and outliers are all considered in within this framework. We found that our registration method can be used to register real scenes with complex geometry, discontinuities, missing data, and overlap as small as ten percent. |

CSE-TR-454-02 |
Towards Capturing Representative AS-Level Internet Topologies |
Chang Govindan Jamin Shenker and Willinger | Mar, 02 | 14 |

Recent studies concerning the Internet connectivity at the AS level have attracted considerable attention. These studies have exclusively relied on the BGP data from Oregon route-views~cite{Oregon} to derive some unexpected and intriguing results. The Oregon route-views data sets reflect AS peering relationships, as reported by BGP, seen from a handful of vantage points in the global Internet. The possibility that these data sets from Oregon route-views may provide only a very sketchy picture of the complete inter-AS connections that exist in the actual Internet has received surprisingly little scrutiny. In this paper, we will use the term ``{em AS peering relationship}' to mean that there is ``at least one direct router-level connection' between two existing ASs, and that these two ASs agree to exchange traffic by enabling BGP between them. By augmenting the Oregon route-views data sets with BGP summary information from a large number of Internet {em Looking Glass/} sites and with routing policy information from Internet Routing Registry (IRR) databases, we find that (1) a significant number of existing AS connections remain hidden from most BGP routing tables, (2) the AS connections to tier-1 ASs are in general more easily observed than those to non tier-1 ASs, and (3) there are at least about 25--50% more AS connections in the Internet than commonly-used BGP-derived AS maps reveal (but only about 2% more ASs). These findings point out the need for an increased awareness of and a more critical attitude toward the applicability and completeness of given data sets at hand when establishing the generality of any particular observations about the Internet. |

CSE-TR-455-02 |
A Markov Chain Sequence Generator for Power Macromodeling |
Liu and Papaefthymiou | Apr, 02 | 14 |

In macromodeling-based power estimation, circuit macromodels are created from simulations of synthetic input vector sequences. Fast generation of these sequences with all possible statistics is crucial for ensuring the accuracy and speed of macromodeling. In this paper, we present a novel sequence generator based on a Markov chain model. Specifically, we formulate the problem of generating a sequence of vectors with given statistics as a transition matrix computation problem, in which the matrix elements are subject to constraints derived from the specified statistics. We also present a practical heuristic that computes such a matrix and generates a sequence of l n-bit vectors in O(nl+n*n) time. Our generator is guaranteed to yield vector sequences with a given average input probability p, average transition density d, and spatial correlation s, or reports that the specified sequence type does not exist. Derived from a strongly mixing Markov chain, it generates binary vector sequences with accurate statistics, high uniformity, and high randomness. Experimental results show that our sequence generator can cover more than 99% of the parameter space. Sequences of 2,000 48-bit vectors are generated in less than 0.05 seconds, with average deviations of the signal statistics p, d, and s equal to 1.0%, 0.6%, and 0.6%, respectively. Our generator enables the detailed study of power macromodeling. Using our tool and the ISCAS-85 benchmark circuits, we have assessed the power sensitivities of three input statistics. Our investigation has revealed that power is most sensitive to transition density, while only occasionally exhibiting high sensitivity to signal probability and spatial correlation. We have also investigated the power estimation error resulting from input signal imbalance. Our experiments show that signal imbalance can cause estimation error as high as 100% in extreme cases, although errors are usually within 25%. |

CSE-TR-456-02 |
Inet-3.0: Internet Topology Generator |
Winick and Jamin | May, 02 | 18 |

In this report we present version 3.0 of Inet, an Autonomous System (AS) level Internet topology generator. Our understanding of the Internet topology is quickly evolving, and thus, our understanding of how synthetic topologies should be generated is changing too. We document our analysis of Inet-2.2, which highlighted two shortcommings in its topologies. Inet-3.0 improves upon Inet-2.2s two main weaknesses by creating topologies with more accurate degree distributions and minimum vertex covers as compared to Internet topologies. We also examine numerous other metrics to show that Inet-3.0 better approximates the actual Internet AS topology than does Inet-2.2. Inet-3.0s topologies still do not well represent the Internet in terms of maximum clique size and clustering coefficient. These related problems stress a need for a better understanding of Internet connectivity and will be addressed in future work. |

CSE-TR-457-02 |
Satometer: How Much Have We Searched? |
Aloul Sierawski and Sakallah | Jun, 02 | 17 |

We introduce Satometer, a tool that can be used to estimate the percentage of the search space actually explored by a backtrack SAT solver. Satometer calculates a normalized minterm count for those portions of the search space identified by conflicts. The computation is carried out using a zero-suppressed binary decision diagram (ZBDD) data structure and can have adjustable accuracy. The data provided by Satometer can help diagnose the performance of SAT solvers and can shed light on the nature of a SAT instance. |

CSE-TR-458-02 |
The Phi-Calculus - A Hybrid Extension of the Pi-Calculus to Embedded Systems |
Rounds and Song | Jun, 02 | 27 |

In this paper we present a new language which allows concurrent programs to interact with arbitrary continuous environments. It is an extension of the powerful pi-calculus of Milner, which already provides for concurrency and mobility. Our contribution adds the notion of *active environments* which can specify flows over continuous time using ordinary differential equations. We provide examples of robotic assembly systems; we show how Petri nets can be expressed in the calculus, and we prove ``context theorems' showing that bisimilar phi-calculus processes flow the same way in the same active environment. |

CSE-TR-459-02 |
Dynamic Leakage-Energy Management of Secondary Caches |
Hallnor and Reinhardt | Jun, 02 | 16 |

Static leakage currents are the prime contributer to energy consumption for large on-chip secondary/tertiary caches. This energy cost can be reduced by dynamically disabling unneeded portions of the cache. To provide overall energy savings, however, the leakage-energy reduction must exceed the extra energy incurred due to additional off-chip accesses and increased runtime. We derive a practical formula for on-line estimation of net energy reduction based on the number of additional misses incurred and the ratio of off-chip access energy to per-cycle cache leakage energy. We estimate reasonable values for this ratio by measuring the actual off-chip access energy cost on a current system. We incorporate our estimation formula into a previously proposed dyamic resizing scheme, preventing it from increasing net energy consumption and improving its overall effectiveness. We also present a new resizing framework that tracks misses and controls power at the granularity of asssociative ways. Per-way miss counters enable simultaneous estimation of net energy consumption at all possible cache sizes, allowing direct identification of the minimum-energy configuration. This structure also greatly reduces hardware overhead compared to block-level resizing schemes. We call the combination of this framework with our estimation formula Energy Aware Bank-Level Resizing (EnABLR). Simulations show that, across the entire SPEC CPU2000 suite, EnABLR reduces secondary-cache leakage by up to 80%, without increasing overall memory-system energy consumption by more than 1% in any situation. Average energy reductions range from 14% to 29% depending on off-chip access energy costs. |

CSE-TR-460-02 |
Remora: A Dynamic Self-Tuning Processor |
Weaver Gebara Austin and Brown | Jul, 02 | 8 |

CSE-TR-461-02 |
Generic ILP Versus Specialized 0-1 ILP: An Update |
Aloul Sakallah and Markov | Aug, 02 | 10 |

CSE-TR-462-02 |
Improved Score Following for Acoustic Music |
Pardo and Birmingham | Aug, 02 | 4 |

CSE-TR-463-02 |
Solving Difficult Instances of Boolean Satisfiability in the Presence of Symmetry |
Aloul Ramani Markov and Sakallah | Aug, 02 | 10 |

Research in algorithms for Boolean satisfiability (SAT) and their implementations has recently outpaced benchmarking efforts. Most of the classic DIMACS benchmarks can now be solved in seconds on commodity PCs. More recent benchmarks take longer to solve due of their large size, but are still solved in minutes. Yet, small and difficult SAT instances must ex-ist if P != NP. To this end, our work articulates SAT instances that are unusually difficult for their size, including satisfiable instances derived from Very Large Scale Integration (VLSI) routing problems. With an efficient implementation to solve the graph automorphism problem , we show that in structured SAT instances difficulty may be associated with large numbers of symmetries. We point out that a previously published symmetry-detection mechanism based on a reduction to the graph automorphism problem often produces many spurious symmetries. Our work contributes two new reductions to graph automorphism, which detect all correct symmetries detected previously as well as phase-shift symmetries not detected earlier. The correctness of our reductions is rigorously proven, and they are evaluated empirically. We also formulate an improved construction of symmetry-breaking clauses in terms of permutation cycles and propose to use only generators of symmetries in this process. These ideas are implemented in a fully automated flow that first detects symmetries in a given SAT instance, pre-processes it by adding symmetry-breaking clauses and then calls a state-of-the-art backtrack SAT solver. Significant speed-ups are shown on many benchmarks versus di-rect application of the solver. In an attempt to further improve the practicality of our approach, we pro-pose a scheme for fast opportunistic symmetry detection and also show that considerations of symmetry may lead to more efficient reductions to SAT in the VLSI routing domain. |

CSE-TR-464-02 |
A Local Convergence Proof for the minvar Algorithm for Computing Continuous Piecewise Linear Approximations* |
Groff Khargonekar and Koditschek | Sep, 02 | 29 |

The class of continuous piecewise linear (PL) functions represents a useful family of approximants because invertibility can be readily imposed, and if a PL function is invertible, then it can be inverted in closed form. Many applications, arising for example in control systems and robotics, involve the simultaneous construction of a forward and inverse system model from data. Most approximation techniques require that separate forward and inverse models be trained, whereas an invertible continuous PL affords, simultaneously, the forward and inverse system model in a single representation. The minvar algorithm computes a continuous PL approximation to data. Local convergence of minvar is proven for the case when the data generating function is itself a PL function and available directly rather than through data. |

CSE-TR-465-02 |
Operating System Extensions to Support Host Based Virtual Machines |
King and Chen | Sep, 02 | 14 |

This paper presents two new operating system extensions that facilitate the fast execution of host based virtual machines: KTrace and MMA. KTrace provides a convenient and efficient mechanism for writing kernel modules that control the execution of user-level processes. MMA exposes more detail of the underlying memory management hardware, providing applications with access to high performance intra-address space protection and address space overloading functionality. These extensions are applied to UMLinux, an x86 virtual machine that implements Linux as a user-mode process. As a result, overhead for UMLinux is reduced from 33% for a CPU bound workload and 819% - 1240% for system call and I/O intensive workloads, to 6% and 24% - 49% respectively |

CSE-TR-466-02 |
Image-Based Illumination for Electronic Display of Artistic Paintings |
Sharp Lee Ju Yoo | Sep, 02 | 7 |

Visual impressions from two-dimensional artistic paintings greatly vary under different illumination conditions, but this effect has been largely overlooked in most poster productions and electronic display. The light-dependent impressions are more pronounced in oil paintings and they arise mainly from the non-diffuse specular reflectances. We present an efficient method of representing the variability of lighting conditions on artistic paintings utilizing both simple empirical reflectance models and an image-based lighting method. The Lambertian and Phong models account for a significant portion of image variations depending on illumination directions, and residual intensity and color variations that cannot be explained by the reflection models are processed in a manner that is similar to the image-based lighting methods. Our technique allows brush strokes and paint materials to be clearly visible with relatively low data dimensionality. |

CSE-TR-467-02 |
Towards a factored analysis of legged locomotion models |
Altendorfer Koditschek and Holmes | Sep, 02 | 9 |

In this paper, we report on a new stability analysis for hybrid legged locomotion systems based on factorization of return maps. We apply this analysis to a family of models of the Spring Loaded Inverted Pendulum (SLIP) with different leg recirculation strategies. We obtain a necessary condition for the asymptotic stability of those models, which is formulated as an exact algebraic expression despite the non-integrability of the SLIP dynamics. We outline the application of this analysis to other models of legged locomotion and its importance for the stability of legged robots and animals. |

CSE-TR-468-02 |
Effect of Node Size on the Performance of Cache-Conscious Indices |
Hankins and Patel | Oct, 02 | 21 |

In main-memory environments, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve the performance of main-memory indices by reducing the number of processor cache misses that are incurred during a search operation. Conventional wisdom suggests that the indexs node size should be equal to the cache line size in order to minimize the number of cache misses and improve performance. As we show in this paper, this design choice ignores additional effects, such as instruction count, which play a significant role in determining the overall performance of the index. Using analytical models and a detailed experimental evaluation, we investigate the effect of the indexs node size on two common cache-conscious indices: a cache-conscious B+-tree (CSB+-tree), and a cache-conscious extendible hash index. We show that using node sizes much larger than the cache line size can result in better search performance for the CSB+-tree. For the hash index, reducing the number of overflow chains is the key to improving search performance, even if it requires using a node size that is much larger than the cache line size. Extensive experimental evaluation demonstrates that these node size choices are valid for a variety of data distributions, for range searches on the CSB+-tree, and can also be used to speed up the execution of traditional hash-based query operators that use memory-resident indices internally. |

CSE-TR-469-02 |
Enhanced Wired Equivalent Privacy for IEEE 802.11 Wireless LANs |
Park Wang Cho and Shin | Nov, 02 | 21 |

The Wired Equivalent Privacy (WEP) is defined as part of the IEEE 802.11 standard to provide secure communication over a wireless channel. However, it suffers serious security flaws, such as the vulnerability of RC4 to keystream reuse and the misuse of CRC checksum in ensuring data integrity. In this paper, we design, implement, and evaluate a software (middleware) approach, which runs on top of WEP, to fill the security holes of WEP. The core of this middleware is a novel key-management protocol in which (1) to minimize the possibility of keystream reuse, the message-keys for data encryption are frequently refreshed; (2) to achieve secure exchange of message-keys, we append the Hash Message Authentication Code (HMAC) to each message-key, and then encrypt it with a base-key; (3) to provide reliable key-management service, we set up a hidden TCP connection and develop the corresponding daemons at the access point (AP) and a mobile node; and (4) to provide a mobile node with a new base-key and a message-key when the node moves from one microcell to another, we realize ``secure roaming' based on Inter-Access Point Protocol (IAPP). Furthermore, to ensure data integrity at the data-link layer, each data frame is augmented with HMAC. By setting the key-refreshment interval judiciously, we can reduce the rate of keystream reuse to an acceptably low level. More importantly, any compromised message-key becomes unusable after a single refreshment interval, and it does not reveal any information about the future message-keys. Our performance evaluation shows that the computation overhead of the proposed scheme is insignificant even when data is transferred at the maximum rate, and it is feasible for an AP to maintain hidden TCP connections for many mobile nodes. |

CSE-TR-470-02 |
SYN-dog: Sniffing SYN Flooding Attacks |
Wang Zhang and Shin | Nov, 02 | 25 |

This paper presents a simple and robust mechanism, called SYN-dog, to sniff SYN flooding attacks. The core of SYN-dog is based on the distinct protocol behavior of TCP connection establishment and teardown, and is an instance of the Sequential Change Point Detection. To make the detection mechanism insensitive to site and access pattern, a non-parametric Cumulative Sum (CUSUM) method is applied, thus making the detection mechanism robust, more generally applicable and its deployment much easier. The statelessness and low computation overhead of SYN-dog make itself immune to flooding attacks. The efficacy of SYN-dog is evaluated by extensive trace-driven simulations. The evaluation results show that SYN-dog has short detection latency and high detection accuracy. Moreover, due to its proximity to the flooding sources, SYN-dog not only sets alarm upon detection of ongoing SYN flooding attacks, but also reveals the location of the flooding sources without resorting to the IP traceback. |

CSE-TR-471-02 |
Johnny Can't Sing: a Comprehensive Trainable Error Model for Sung Music Queries |
Meek and Birmingham | Nov, 02 | 40 |

We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of ``query-by-humming' (QBH) applications which search audio and multimedia databases for strong matches to aural queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications. |

Technical Reports Page