# Performance and Complexity Analysis of VLSI Multi-Carrier Receivers for Low-Energy Wireless Communications\*

Sangjin Hong
Department of Electrical and Computer Engineering
State University of New York, Stony Brook
snjhong@ece.sunysb.edu

Riten Gupta, Wayne E. Stark, and Alfred O. Hero III
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor
{robby,stark,hero}@eecs.umich.edu

Abstract – A high data rate multi-carrier receiver employing orthogonal frequency division multiplexing (OFDM) for mobile communications requires joint low-power VLSI design optimization of the channel equalizer and demodulator. Although the multi-carrier receiver structure can be greatly simplified by use of the Fast Fourier Transform (FFT) for demodulation, the physical hardware complexity of the FFT is still significant. By introducing a powerful channel equalizer, reduction in FFT complexity can be achieved while maintaining sufficient reception quality. This paper analyzes the performance and hardware complexity of the receiver structure in a multipath communications environment.

# I. INTRODUCTION

High data rate communication in highly mobile environments is becoming increasingly desirable [1]. For single-carrier systems, this would entail using either a very complex constellation with many bits per symbol or a very high symbol rate [2]. Using a dense signaling constellation is undesirable for a wireless system since noise and fading make it difficult to reliably detect which constellation point was sent. Likewise, signaling at a high symbol rate is undesirable because the multipath nature of the wide area channel would require a complex high-speed equalizer or similar technique to deal with the time dispersion of the transmitted signal.

For these reasons OFDM is used to provide acceptable performance in a fading, multipath RF environment while promising high peak data rates [3]. OFDM techniques have been used in high-speed wireless LANs, digital audio broadcast systems and wireline high-speed data communications systems.

The use of the FFT in the transmitter and receiver greatly simplifies the overall system structure. Processing each individual tone's modulation in the frequency domain reduces the modulation/demodulation process to one of setting and comparing a single complex number for each tone. However, the circuit complexity of the FFT implementation is still great, even with current low-power digital signal processing technology. Thus a small FFT size is highly desirable to minimize overall receiver complexity. One way to reduce the complexity of the FFT is to introduce a powerful channel equalizer in the receiver [3]. Characterizing the circuit complexity tradeoff between the FFT and the channel equalizer in a multipath channel environment at high data rates is the main objective of this study.

The complexity of the multi-carrier receiver, which consists mainly of the channel equalizer and the FFT demodulator, depends strongly on the data rate and the delay spread of the channel. A large delay spread $T_{ds}$ requires a long OFDM symbol duration and consequently a large FFT size at the receiver. An increase in $T_{ds}$ thus increases the receiver complexity by requiring a large FFT structure. On the other hand, a powerful channel equalizer will reduce the effective delay spread. By increasing the complexity of the equalizer, a great reduction in FFT complexity can be achieved for a given performance level. It is therefore of great interest to observe the performance-complexity tradeoff between the channel equalizer and the FFT for various multipath channels. Theoretical analysis has shown that significant power savings can be obtained with little loss in equalizer performance by reducing the bit resolution of the coefficients and of the data stream [4, 5]. This paper incorporates such a power reduction strategy for OFDM reception along with equalizer tap length reduction and FFT length reduction.

The remainder of this paper has five sections. Section II describes the multi-carrier transceiver system model. The models of the main components are discussed in detail, as well as the channel models. Section III describes the performance of the multi-carrier receiver including the effects of the channel equalizer and the FFT demodulator. In Section IV, the complexity of the channel equalizer and the FFT demodulator is discussed. The components are actually designed and implemented to extract the complexity information. In Section V, we present a design example and tradeoff results. Finally our contributions are summarized in Section VI.

# II. SYSTEM MODEL

A simplified communication system model, shown in Figure 1, is adopted in this paper. The model consists of a multi-carrier modulator employing an inverse FFT (IFFT), transmitter power amplifier (PA), channel, channel equalizer, and multi-carrier demodulator employing a forward FFT. We assume perfect synchronization so that carrier frequency and sampling timing errors are zero. We also assume that both the digital-to-analog converter (DAC) and analog-to-digital converter (ADC) have ideal transfer characteristics and are not sources of signal degradation. Finally, we assume coherent reception.

Figure 1: Multi-carrier system model.

<sup>\*</sup>This research was supported by the Department of Defense Research & Engineering (DDR&E) Multidisciplinary University Research Initiative (MURI) on "Low Energy Electronics Design for Mobile Platforms" and managed by the Army Research Office (ARO) under grant DAAH04-96-1-0377.

## A. MC-Modulator Model

A set (packet) of  $N_{fft}$  real-valued information bits  $d_{k,n} \in \{1,-1\}$  for time index  $n=0,1,\cdots,N_{fft}-1$  at rate  $R_d=1/T_d$  and set index k is converted into a complex-valued sequence  $X_{k,m}$  for  $m=0,1,\cdots,N_{fft}/2-1$ . The packet rate is thus  $R_d/N_{fft}$ . Each complex-valued symbol  $X_{k,m}$  corresponds to a signal in the QPSK signal constellation (i.e.,  $X_{k,m} \in \{1+j,1-j,-1+j,-1-j\}$ ). This length- $N_{fft}/2$  complex sequence is repeated to form a complex conjugate symmetric sequence  $X_{k,m}$  for sub-carrier index  $m=0,1,\cdots,N_{fft}-1$  such that  $Re\{X_{k,m}\}=Re\{X_{k,N_{fft}-m}\}$  and  $Im\{X_{k,m}\}=-Im\{X_{k,N_{fft}-m}\}$ . The set  $\{X_{k,m}\}_{m=0}^{m=N_{fft}-1}$  is processed by an  $N_{fft}$ -point IFFT executing at rate  $R_d/N_{fft}$  and converted to a serial sequence resulting in the following MC-DS/SS real baseband signal (after digital-to-analog conversion)

$$s_{k}(t) = \frac{1}{\sqrt{N_{fft}}} \sum_{m=0}^{N_{fft}-1} X_{k,m} e^{j2\pi m f_{0}t} \tag{1}$$

$$= \frac{1}{\sqrt{N_{fft}}} \sum_{n=0}^{N_{fft}-1} d_{k,n} \cos(2\pi n f_{0}t), \quad 0 \le t \le T \tag{2}$$

where  $N_{fft}$  is the number of carriers (corresponding to the size of the IFFT),  $f_0$  is the frequency spacing between adjacent carriers, and  $T_{tf} = N_{fft}/R_d = 1/f_0$  is the transmitter IFFT transform duration in seconds (i.e.,  $T_{tf}$  is equal to the data interval times  $N_{fft}$ ). After appending a cyclic prefix, the transmitted packet interval is  $T_{packet} = T_{tf} + T_g$  where  $T_g$  is the duration of the cyclic prefix (guard interval) and is assumed to be equal to the channel delay spread  $T_{ds}$ .
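The conjugate-symmetric construction above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, a direct $O(N^2)$ inverse DFT stands in for the IFFT, and, as an assumption of this sketch, bin 0 keeps only the real part of its symbol and bin $N_{fft}/2$ is set to zero so that the output is exactly real.

```python
import cmath

def qpsk_map(bits):
    """Map consecutive pairs of +/-1 bits to QPSK symbols in {+/-1 +/- j}."""
    return [complex(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]

def conjugate_symmetric(half):
    """Extend a length-N/2 symbol sequence to a length-N conjugate-symmetric one.

    Assumption of this sketch: bin 0 keeps only its real part and bin N/2 is
    zero, which is what makes the inverse transform purely real.
    """
    n = 2 * len(half)
    full = [0j] * n
    full[0] = complex(half[0].real, 0.0)
    for m in range(1, n // 2):
        full[m] = half[m]
        full[n - m] = half[m].conjugate()
    return full

def idft(spectrum):
    """Direct inverse DFT with the 1/sqrt(N) scaling used in (1)."""
    n = len(spectrum)
    scale = 1.0 / n ** 0.5
    return [
        scale * sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * t / n)
                    for m in range(n))
        for t in range(n)
    ]

bits = [1, -1, -1, -1, 1, 1, -1, 1]      # N_fft = 8 hypothetical data bits
symbols = conjugate_symmetric(qpsk_map(bits))
s = idft(symbols)
max_imag = max(abs(v.imag) for v in s)   # numerically zero: the signal is real
```

In this example the imaginary parts of `s` vanish up to rounding error, which is the property the symmetric extension is meant to guarantee.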

## B. Multipath Channel Model

Many models are available for signal-degrading channels. The additive white Gaussian noise (AWGN) channel and various multipath fading channels are commonly used as models of realistic conditions. The AWGN channel model assumes that the transmitted signal is corrupted by addition of a white Gaussian noise process with two-sided power spectral density $N_0/2$. Examples of fading channel models are the American Legion Drive and Pine Street models. These channel models are based on measured data. The delay profiles for the channels are illustrated in Figure 2 and the frequency responses for the channels are illustrated in Figure 3. The frequency selectivity depends on the delay profile and the time selectivity depends on the Doppler power spectrum. In this paper, we focus on the American Legion Drive channel model. We also assume that the transmitter power amplifier is linear and causes no degradation.

Figure 2: Illustration of delay profiles. American Legion Drive is an example of a suburban area and Pine Street is an example of an urban area.

Figure 3: Illustration of frequency responses of the two measured channels.

The channel-corrupted transmitted signal can be written

$$x(t) = As(t) * h(t) + n(t) \tag{3}$$

where A is the gain of the power amplifier, h(t) is the impulse response of the channel and n(t) is additive white Gaussian noise.

## C. Channel Equalizer Model

The received signal $x(t)$ is digitized by the analog-to-digital converter (ADC) with oversampling factor $S_{eq}$, generating the digital sequence $x_k$. Thus the rate of the data samples at the input to the channel equalizer is $S_{eq}/T_d$.

Figure 4: Block diagram of the channel equalizer.

The channel equalizer is a finite-precision $N_{eq}$-tap FIR filter with possibly different word widths for data and coefficients. The equalizer operates in two modes: training mode and filtering mode. During training mode a known preamble is transmitted and the filter coefficients are updated according to the well-known LMS adaptive algorithm:

$$\underline{w}_{k+1} = \underline{w}_k + \mu \underline{x}_k (s_k - \underline{x}_k^T \underline{w}_k) \tag{4}$$

where  $s_k$  is the preamble sequence,  $\underline{x}_k$  is a vector of recent samples of the channel-corrupted data sequence,  $\underline{w}_k$  is the length- $N_{eq}$  coefficient vector, and  $\mu$  is an adaptive gain parameter, usually chosen to be a power of two. During filtering mode the coefficient vector is fixed and the equalizer acts as a simple FIR filter.
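The training-mode recursion (4) can be sketched in a few lines of floating-point Python. This is an illustration only: the paper's equalizer is finite-precision hardware, whereas here the two-tap channel, the preamble length, the tap count, and the gain $\mu = 2^{-5}$ are all hypothetical values chosen for the example.

```python
import random

def lms_train(x, s, n_taps, mu):
    """Run the LMS recursion (4) over a training sequence.

    x: channel-corrupted samples, s: known preamble, mu: adaptive gain.
    Returns the coefficient vector after the last update.
    """
    w = [0.0] * n_taps
    window = [0.0] * n_taps                      # most recent sample first
    for xk, sk in zip(x, s):
        window = [xk] + window[:-1]              # shift the tapped delay line
        error = sk - sum(wi * xi for wi, xi in zip(w, window))
        w = [wi + mu * xi * error for wi, xi in zip(w, window)]
    return w

# Hypothetical noiseless two-path channel: x_k = s_k + 0.5 * s_{k-1}
random.seed(0)
preamble = [random.choice((-1.0, 1.0)) for _ in range(5000)]
received = [preamble[k] + (0.5 * preamble[k - 1] if k else 0.0)
            for k in range(len(preamble))]

w = lms_train(received, preamble, n_taps=8, mu=2 ** -5)
```

For this channel the ideal inverse filter has taps $1, -0.5, 0.25, \ldots$, so the leading entries of `w` should settle close to those values. In filtering mode the vector `w` would simply be held fixed.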

## D. MC-Demodulator Model

The output sequence of the channel equalizer  $y_l$  is oversampled by a factor of  $S_{fft}$ . Thus a sequence of length  $N_{fft}S_{fft}$  is the input to the FFT demodulator. The size of the FFT demodulator is therefore  $S_{fft}$  times larger than the size of the IFFT modulator. The output of the FFT demodulator is given by

$$Y_{k,m} = \frac{1}{\sqrt{N_{fft}S_{fft}}} \sum_{l=0}^{N_{fft}S_{fft}-1} y_l e^{-j2\pi l m/(N_{fft}S_{fft})} \tag{5}$$

where k is the set index corresponding to a block of FFT data samples. The sequence  $Y_{k,m}$  is decimated by a factor of  $S_{fft}$  so that the resulting sequence  $\tilde{Y}_{k,m} = Y_{k,S_{fft}m}$  and  $X_{k,m}$  are compared for possible bit errors.

# III. PERFORMANCE ANALYSIS

A multi-carrier receiver incorporates two main components, a channel equalizer and an FFT demodulator. Commonly, multi-carrier transceivers packetize the transmitted data into packets of length  $N_{fft}$ .

A primary strength of multi-carrier modulation is its performance in multipath fading environments. Multipath leads to both intersymbol interference (ISI) and interpacket interference (IPI). It is well known that using an adequate number of carriers renders the subchannels memoryless [7] while inserting a cyclic prefix (guard interval) between packets prevents IPI [8].

## A. Effects of Channel Equalizer

To prevent IPI, the length of the guard interval inserted between packets must be equal to the delay spread of the multipath channel. As a result, the effective data rate  $R_{eff}$  is less than the desired rate of  $R_d=1/T_d$ .

To combat this reduction in rate, the channel equalizer is used to decrease the effective delay spread of the channel. The resulting guard interval will be shorter and – assuming the training time of the equalizer is short – the data rate will be closer to the desired rate  $R_d$ .



Figure 5: Transversal tapped delay line.

## B. Effects of FFT Demodulator

The IFFT modulator divides the total bandwidth into  $N_{fft}$  sub-carriers. Each sub-carrier is orthogonal to all the others and each sub-channel behaves like a flat-fading Rayleigh channel with bandwidth  $2/(N_{fft}T_d)$  assuming the coherence bandwidth of the channel  $1/T_{ds}$  exceeds the bandwidth of the sub-channels. With pilot-based correction, the Rayleigh fading can be overcome and the sub-carriers can be recovered by the FFT demodulator.

The number of carriers $N_{fft}$ must be chosen large enough to ensure that the flat-fading assumption is satisfied over each frequency band of bandwidth $2/(N_{fft}T_d)$. At the same time, a small FFT size is desired for a low-complexity receiver.

In addition to the number of carriers, the oversampling factor $S_{fft}$ is critical in determining the complexity of the receiver: doubling the oversampling factor also doubles the size of the FFT.

# IV. VLSI COMPLEXITY ANALYSIS

A low-complexity multi-carrier receiver comprising only a channel equalizer and FFT demodulator is designed and synthesized for complexity evaluation. These two functional blocks are implemented based on the 0.35- $\mu m$  CMOS standard cell technology using the Epoch CAD design environment. In the analysis, we assume the switching power is the main source of power consumption and is given by

$$P = \sum_{i} \alpha_{i} \cdot C_{i} \cdot V_{supply}^{2} \cdot f_{clock} \tag{6}$$

where  $C_i$  is the average effective capacitance switched per operation of processing element i,  $\alpha_i$  is the switching activity ratio of processing element i,  $V_{supply}$  is the operating supply voltage, and  $f_{clock}$  is the sample frequency [9].
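As a quick numerical illustration of (6), the sketch below sums the switching power over a list of processing elements. The activity ratios, capacitances, supply voltage, and clock rate are hypothetical values chosen for easy arithmetic, not measurements from the synthesized design.

```python
def switching_power(elements, v_supply, f_clock):
    """Evaluate (6): sum of alpha_i * C_i, times V_supply^2 * f_clock.

    elements: iterable of (alpha_i, C_i) pairs, C_i in farads.
    Returns the switching power in watts.
    """
    return sum(alpha * cap for alpha, cap in elements) * v_supply ** 2 * f_clock

# Hypothetical processing elements: (switching activity, capacitance in F)
elements = [(0.5, 2e-12), (0.25, 4e-12)]

p_full = switching_power(elements, v_supply=2.0, f_clock=10e6)
p_half = switching_power(elements, v_supply=1.0, f_clock=10e6)
ratio = p_full / p_half   # quadratic dependence on supply voltage
```

Halving the supply voltage in this example reduces the switching power by a factor of four, which is why the analysis pays attention to how $N_{eq}$, $N_{fft}$, and the oversampling factors constrain the achievable supply voltage.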

## A. Equalizer Complexity

The channel equalizer described in Section II-C is designed at the logic level. The equalizer is based on a tapped delay line integrated with the coefficient update function. The block diagram of the channel equalizer is shown in Figure 4.

Each filter tap requires a multiplier, and the complexity of the channel equalizer is a strong function of the filter length (number of taps) $N_{eq}$ since the multiplier complexity is dominant among the circuitry comprising the channel equalizer. The multiplier is designed to operate at a sufficiently high clock rate while keeping the supply voltage requirement at a minimum. The multipliers themselves are pipelined to reduce critical delay. Multipliers are also needed in the coefficient update unit. For low-speed operation, a single multiplier can handle all of the coefficient updates. However, multiple multipliers, equal in number to the filter length $N_{eq}$, are needed for high-speed operation. A transversal tapped delay line, shown in Figure 5, is used for the FIR filter implementation.

Figure 6: Power dissipation of the channel equalizer as a function of filter length $N_{eq}$. Word widths of the coefficients $W_{eq,c}$ and data $W_{eq,d}$ are specified. Reference sample clock rate $f_{clock} = 10$ MHz is chosen for the evaluation.

Figure 7: Block diagram of the pipeline FFT demodulator.

Figure 6 shows the power dissipated by the equalizer for various filter lengths  $N_{eq}$  and data and coefficient word widths,  $W_{eq,d}$  and  $W_{eq,c}$ , respectively. The power dissipation increases (faster than linearly due to interconnect capacitance overheads) as a function of the filter length  $N_{eq}$ . For a fixed length  $N_{eq}$ , the power dissipation is proportional to  $f_{clock} = S_{eq}/T_d$ . Thus  $N_{eq}$  and  $S_{eq}$  directly influence the operating frequency  $f_{clock}$  and indirectly influence the supply voltage (since the supply voltage is reduced by reducing the critical delay paths of the multipliers and adders).

## B. FFT Complexity

A low-complexity pipeline FFT architecture shown in Figure 7 is designed and its complexity is analyzed in [6]. The degree of parallelism  $d_p$  and the degree of time-multiplexing  $d_m$  are functions of the throughput and transform size requirements. The degree of parallelism  $d_p$  is defined as the number of data samples consumed and produced in one cycle of execution by the butterfly stage and the degree of time-multiplexing  $d_m$  is defined as the number of times each butterfly stage needs to execute to transform  $N_{fft}$  samples. Thus,  $N_{fft} = d_p \times d_m$  and the FFT architecture is optimized for a given transform size  $N_{fft}$  by appropriately choosing  $d_p$  and  $d_m$ .

Figure 8 shows the power dissipated by the FFT for various transform sizes $N_{fft}$ where the data and coefficient word widths, $W_{fft,d}$ and $W_{fft,c}$ respectively, are fixed at 12. The power dissipation increases exponentially as a function of the transform size $N_{fft}$. The power cost of the FFT therefore cannot be ignored when selecting the transform size $N_{fft}$. For a fixed transform size $N_{fft}$, the power dissipation is proportional to $f_{clock} = S_{fft}N_{fft}/(T_d d_p)$. As with the equalizer, $N_{fft}$ and $S_{fft}$ directly influence the operating frequency and indirectly influence the supply voltage.

Figure 8: Power dissipation of the FFT as a function of transform size $N_{fft}$. The FFT coefficient and data word widths, $W_{fft,c}$ and $W_{fft,d}$, are fixed at 12 bits. Reference clock rate $f_{clock}=20$ MHz is chosen for the evaluation.

# V. TRADEOFF RESULTS

In this section we consider joint optimization of the number of carriers and the equalizer length of a multi-carrier transceiver used with the noiseless American Legion Drive channel at desired rates 1 Mbps and 10 Mbps. The equalizer oversampling rate  $S_{eq}$  is set to 1 while the FFT oversampling rate is 2. Therefore, a  $2N_{fft}$ -point FFT is used in the demodulator.

## A. Design Methodology

Two criteria are considered in the optimization of the system parameters: effective rate and power consumption. Specifically, we must choose  $N_{fft}$  and  $N_{eq}$  to minimize power consumption while satisfying the rate constraint

$$R_{eff} \ge (1 - \rho)R_d \tag{7}$$

with  $0 < \rho \ll 1$ .

Define  $T_{ds}^*$  as the effective delay spread after equalization. Let  $T_{pre}$  be the length of the equalizer's training mode in seconds and let  $T_{ch}$  be the minimum time for the channel to change enough to require re-training of the equalizer (coherence time). The length of a packet (excluding training time) is thus  $T_{packet} = T_d N_{fft} + T_{ds}^*$ . The equalizer must be trained once every  $T_{ch}/T_{packet}$  packets. Thus the effective rate is

$$R_{eff} = \frac{N_{fft}R_d}{(N_{fft} + R_d T_{ds}^*)\left(1 + \frac{T_{pre}}{T_{ch}}\right)}. \tag{8}$$

For simplicity, we assume that the channel changes slowly so that  $T_{pre}/T_{ch} \approx 0$ . Using this in (8), it can easily be seen that the rate constraint (7) will be satisfied if

$$N_{fft} \ge \left[\frac{1-\rho}{\rho}\right] L(N_{eq}) \triangleq N_{fft,min} \tag{9}$$

where  $L=T_{ds}^{*}/T_{d}$  is the memory, in samples, of the equalized channel and is a function of  $N_{eq}$ .
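The relations (8) and (9) are simple enough to check numerically. The sketch below is illustrative: the function names are hypothetical, and the channel memory is approximated as $L = T_{ds}/T_d$ using the roughly 5.5 $\mu$s delay spread quoted for the American Legion Drive channel, which is an assumption standing in for the unequalized case rather than the paper's own computation of $L(N_{eq})$.

```python
import math

def effective_rate(n_fft, r_d, t_ds_eq, t_pre=0.0, t_ch=1.0):
    """Effective rate (8) in bit/s; t_ds_eq is the equalized delay spread in s."""
    return n_fft * r_d / ((n_fft + r_d * t_ds_eq) * (1.0 + t_pre / t_ch))

def min_fft_size(rho, memory):
    """Right-hand side of (9) for a channel memory given in samples."""
    return (1.0 - rho) / rho * memory

def next_power_of_two(x):
    """Smallest power of two >= x (a small tolerance absorbs float rounding)."""
    target = math.ceil(x - 1e-9)
    p = 1
    while p < target:
        p *= 2
    return p

T_DS = 5.5e-6   # approximate delay spread, used here as the unequalized memory
RHO = 0.1       # allowed fractional rate loss

n_1m = next_power_of_two(min_fft_size(RHO, T_DS * 1e6))     # R_d = 1 Mbps
n_10m = next_power_of_two(min_fft_size(RHO, T_DS * 10e6))   # R_d = 10 Mbps
r_1m = effective_rate(n_1m, 1e6, T_DS)
```

Under these assumptions the sketch gives FFT sizes of 64 and 512 at 1 Mbps and 10 Mbps, which coincide with the $N_{eq}=0$ rows of Tables 1 and 2, and the resulting effective rate at 1 Mbps stays above $(1-\rho)R_d$.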

Equation (9) describes the relation between the minimum number of carriers $N_{fft,min}$ that must be used and the equalizer length $N_{eq}$, provided the function $L(N_{eq})$ can be determined. This equation provides a set of pairs $(N_{fft}, N_{eq})$ that satisfy the rate constraint (7). The final step in the optimization is to minimize the function $P_{rec}(N_{fft}, N_{eq})$ subject to (7), where $P_{rec}$ is the power consumption of the receiver.

Figure 9: Minimum FFT size $N_{fft,min}$ as a function of equalizer size $N_{eq}$ for American Legion Drive at 1 Mbps and 10 Mbps.

## B. Design Example

Using  $\rho=0.1$ , the effective rate must be within ten percent of the desired rate and relation (9) becomes  $N_{fft,min}=9L(N_{eq})$ .

To determine the function $L(N_{eq})$, we assume that the equalizer converges to the Wiener filter $\underline{w}^o = R_{xx}^{-1} R_{xs}$ where $R_{xx} = E[\underline{x}_k \underline{x}_k^T]$ and $R_{xs} = E[\underline{x}_k s_k]$. This is usually the case with LMS equalization assuming the gain parameter $\mu$ is small. The equalized channel memory $L$ is computed by

$$L = E[\max_{k} \{k : |h_k^*|^2 < \epsilon |h_0^*|^2\}] \tag{10}$$

where  $h_k^* = h_k * \underline{w}^o$  and  $h_k$  is the discretized channel impulse response sampled at rate  $1/T_d$ . Thus  $h_k^*$  is the effective channel impulse response after equalization. We define  $h_k$  to be an i.i.d. Rayleigh random vector whose mean is equal to the delay profile of the American Legion Drive channel. Using  $\epsilon = 0.1$  and calculating the expectation in (10) by 100,000 Monte-Carlo simulations, the function  $N_{fft,min} = 9L(N_{eq})$  for the American Legion Drive channel at desired rates 1 Mbps and 10 Mbps is shown in Figure 9.

Figure 9 gives the set of pairs $(N_{fft}, N_{eq})$ that meet the rate constraint (7). We can further reduce this set by selecting only those values of $N_{fft}$ that are powers of two, since most FFT implementations have this requirement. The resulting valid $(N_{fft}, N_{eq})$ pairs along with their power consumptions are shown in Tables 1 and 2. The equalizer and FFT data and coefficient word widths are all set to 12 for the power calculations. Although this bit allocation is generally sub-optimal for the LMS equalizer [4, 5], the resulting optimal $(N_{fft}, N_{eq})$ pairs should not be affected too much by this choice. From Tables 1 and 2, we see that the optimal values for $N_{fft}$ and $N_{eq}$ are $N_{fft} = 16$, $N_{eq} = 10$ for $R_d = 1$ Mbps and $N_{fft} = 256$, $N_{eq} = 70$ for $R_d = 10$ Mbps. Thus, increasing the equalizer length up to a point reduces overall receiver power consumption while maintaining the rate constraint; beyond that point the added equalizer power outweighs the FFT savings, as the last row of each table shows. Since the American Legion Drive channel has a delay spread of approximately 5.5 $\mu$s, its coherence bandwidth is approximately 182 kHz. With $R_d=1$ Mbps and $N_{fft}=16$, the sub-channel bandwidth is 125 kHz. With $R_d=10$ Mbps and $N_{fft}=256$, the sub-channel bandwidth is approximately 78 kHz. Thus the flat-fading assumption is valid for both optimal designs.

| $N_{eq}$ | $N_{fft}$ | Total power (mW) |
|----------|-----------|------------------|
| 0        | 64        | 9.2              |
| 2        | 32        | 7.7              |
| 10       | 16        | 7.6              |
| 20       | 16        | 11.2             |

Table 1: Total power (mW) dissipated for 1 Mbps data rate.

| $N_{eq}$ | $N_{fft}$ | Total power (mW) |
|----------|-----------|------------------|
| 0        | 512       | 810              |
| 70       | 256       | 760              |
| 100      | 256       | 850              |

Table 2: Total power (mW) dissipated for 10 Mbps data rate.
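The selection of the optimal pairs and the flat-fading check can be reproduced with a short sketch. The rows are copied from Tables 1 and 2; the function names are hypothetical, and the bandwidth expressions follow the $2/(N_{fft}T_d)$ and $1/T_{ds}$ definitions of Section III-B.

```python
# Rows are (N_eq, N_fft, total power in mW), copied from Tables 1 and 2.
TABLE_1 = [(0, 64, 9.2), (2, 32, 7.7), (10, 16, 7.6), (20, 16, 11.2)]   # 1 Mbps
TABLE_2 = [(0, 512, 810.0), (70, 256, 760.0), (100, 256, 850.0)]        # 10 Mbps

def best_design(rows):
    """Return the (N_eq, N_fft, power) row with the lowest total power."""
    return min(rows, key=lambda row: row[2])

def subchannel_bandwidth(n_fft, r_d):
    """Sub-channel bandwidth 2 / (N_fft * T_d) in Hz, with T_d = 1 / R_d."""
    return 2.0 * r_d / n_fft

def coherence_bandwidth(t_ds):
    """Coherence bandwidth 1 / T_ds in Hz."""
    return 1.0 / t_ds

best_1m = best_design(TABLE_1)
best_10m = best_design(TABLE_2)
b_coh = coherence_bandwidth(5.5e-6)               # about 182 kHz
b_1m = subchannel_bandwidth(best_1m[1], 1e6)      # sub-channel width at 1 Mbps
b_10m = subchannel_bandwidth(best_10m[1], 10e6)   # sub-channel width at 10 Mbps
flat_fading_ok = b_1m < b_coh and b_10m < b_coh
```

Both selected designs have sub-channel bandwidths below the coherence bandwidth, in agreement with the conclusion stated above.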

# VI. CONCLUSIONS

A high data rate multi-carrier receiver employing orthogonal frequency division multiplexing in a highly-mobile communications environment requires some amount of channel equalization for sufficient reception quality. The receiver structure is greatly simplified by use of the FFT in such receivers. We have shown that the FFT size and equalizer length can be jointly optimized for minimum power consumption. In particular we have shown that a large equalizer size reduces the required FFT size, thereby reducing the overall power consumption.

# VII. REFERENCES

- [1] Andrew J. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison-Wesley, 1995.
- [2] John G. Proakis, Digital Communications, McGraw-Hill, 1995.
- [3] J. A. C. Bingham, "Multicarrier Modulation for Data Transmission: An Idea Whose Time Has Come," *IEEE Communications Magazine*, May 1990.
- [4] A. O. Hero and R. Gupta, "Power vs. Performance Tradeoffs for Reduced Resolution Adaptive Equalizers," Proceedings of the IEEE 1998 Conf. on Military Communications, Bedford, MA, Sept. 1998.
- [5] R. Gupta and A. O. Hero, "Theoretical Aspects of Power Reduction for Adaptive Filters," Proceedings of the IEEE 1999 Int. Conf. on Acoust., Speech, and Sig. Proc., Phoenix, AZ, March 1999.
- [6] Sangjin Hong, Suhwan Kim, Marios C. Papaefthymiou, and Wayne E. Stark, "Power vs. Complexity Analysis of Pipeline VLSI FFT Architecture for Low Energy Wireless Communication Applications," *Proceedings of MWSCAS*, 1999.
- [7] L. J. Cimini, "Analysis and Simulation of a Digital Mobile Channel Using Orthogonal Frequency Division Multiplexing," *IEEE Transactions on Communications*, vol. 33, no. 7, pp. 665-675, July 1985.
- [8] N. Al-Dhahir and J. M. Cioffi, "Optimum Finite-Length Equalization for Multicarrier Transceivers," *IEEE Transactions on Communications*, vol. 44, no. 1, pp. 56-63, Jan. 1996.
- [9] A. P. Chandrakasan and R. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publishers, 1996.