# Solving Hard Instances of Floorplacement Aaron Ng, Igor Markov University of Michigan Rajat Aggarwal Xilinx, Inc. Venky Ramachandran Calypto Design Systems, Inc. #### **Outline** - Motivation and previous work - Design trends and placement tools: RTL placement - □ Floorplacement techniques - Difficult floorplacement instances - Empirical analysis of existing techniques - Scaling floorplacement up with SCAMPI - Techniques to improve floorplacement - Empirical results - ☐ Advantages and drawbacks - Conclusions # Motivation & previous work #### Design trends & placement tools - Traditional placement is bit-level - □ Relatively late in the design flow - □ Relatively slow - Layout of final implementations - □ IP modules, memory, SoCs - → hard macro modules - System-level design & high-level synthesis - □ Fast performance estimations, prototyping - Build custom RTL library pre-characterized area, timing, power - → soft macro modules ## Support for larger scale & greater complexity - Moving away from bit-level design → more macros - Floorplanning - ☐ Std cell placement & floorplanning have similar objectives - non-overlapping module locations - optimization of interconnect, but - More expensive algorithms required for floorplanning - std cells fit in rows and are relatively similar in size - macro modules can span rows & vary greatly in size - → Floorplanning algorithms do not scale well ## Unification of floorplanning and placement - Floorplacement [Adya, ICCAD04] - □ Simultaneous placement - + floorplanning - Various combinatorial - + analytic techniques (PATOMA, Capo, APlace) - Shortcomings of unified frameworks - □ Placement + floorplanning integration is not seamless - Tradeoff between scalability & accuracy (e.g., sacrificed strength of floorplanning algorithms) - □ To illustrate these effects, we introduce a suite of hard floorplacement benchmarks # Difficult floorplacement instances #### Difficult instances - 81 to 8827 RTL modules - Hard & soft modules, some std cells - Area<sub>largest</sub> up to 50% of total cell area - Area<sub>largest</sub> / Area<sub>smallest</sub> 650 to 185330 - http://vlsicad.eecs.umich.edu/BK/ISPD06bench # Empirical analysis of existing techniques ### Partitioning & fast block-packing - PATOMA [J.Cong et. al, ASPDAC 2005] - Hierarchical min-cut partitioning - □ Bears the burden of minimizing interconnect - Fast block-packing on resulting partitions - □ Check area feasibility - Weak wirelength optimization - Contingency plan - Best legal packing is saved at every level - □ If partitioning cannot continue, best legal packing is used ## Partitioning & fast block-packing (cont'd) 1000 Fast block-packing solutions used too early Bad wirelength in some cases (9.7x worse in this case) Fast lookahead block packers check area feasibility of floorplanning instances - produce false negatives - bail out too early ### Partitioning & strong block-packing - Capo (with Parquet) [J. A. Roy et. al, TCAD 2006] - Top-down min-cut placement framework - Dynamically invoke floorplanner using heuristics (e.g., when a block is too large to fit in child partitions) - □ Can undo partitioning decisions and perform FP instead - Floorplanning by simulated annealing - □ Floorplan representations capture large solution space (e.g., SeqPair, B\*-tree) - Multi-objective optimization (area & wirelength) - □ Hard & soft blocks with any aspect ratios - □ Limited effective operating range (up to ~100 modules) # r,e ## Partitioning & strong block-packing (cont'd) # M ## Analytical placement, cell spreading - APlace [A. B. Kahng et. al, ICCAD 2005] - Non-linear optimization - DensityWeight\*DensityPenalty + WLweight\*TotalWL - DensityPenalty = $\sum_{g} (\sum_{c} Potential(c,g) ExpPotential(g))^2$ (Potential is a bell-shaped function of: module dims, a radius of influence & module's distance from grid cells) - Simultaneous handling of macros and std cells - Clustering for scalability and better solution quality - Legalization usually required after cell-spreading ## Analytical plcmnt, cell spreading (cont'd) # Scaling floorplacement up #### Scaling floorplacement up - Hierarchical framework: coarse view → fine view - Approximations more tolerable at the coarse level - Accurate/detailed algorithms required at the fine level - □ Our work bridges the gap between coarse & detail levels #### SCAMPI - □ Scalable Advanced Macro Placement Improvements - Selective macro placement and clustering - Obstacle handling - Look-ahead floorplanning - Whitespace allocation by block densities #### Selective macro placement & clustering - Place large modules early - □ A module is placed & fixed when it becomes *large* relative to its bin (partition) - ☐ Cluster smaller modules & std cells into soft blocks Specific locations are determined at the right level of spatial hierarchy #### **Obstacle handling** - Necessity - Macros placed early become obstacles - □ Obstacles can also appear in input - Our approach - Modify well-known B\*-tree evaluation procedure #### Other improvements - Ad hoc look-ahead floorplanning - □ Quick area feasibility check for a bin - □ Fast block-packing of *large* blocks - □ Aggressive clustering to reduce the problem size - Whitespace allocation by block densities - Sum of area underestimates area of packed blocks (assumes zero deadspace) - □ Estimate deadspace by using sum of module perimeters (e.g., surface area) no deadspace some deadspace **VS** VS Compare bins and adjust cutlines after partitioning no deadspace #### **Empirical results** Best legal solutions Illegal or no solution | cal | PATOMA 1.0 | | Capo 9.4 | | APlace 2.0 | | FengShui 5.1 | | SCAM | | PI = | | | | | | | |------------|----------------|-------------|--------------|----------------|---------------|----------------|----------------|--------------|----------------|-----------------------|--------------|-------------|----------------|---------------|----------------|------------------|----------------| | bench | HPWL<br>(e+04) | ovlp<br>(%) | time<br>(s) | HPWL<br>(e+04) | ovlp<br>(%) | time<br>(s) | HPWL<br>(e+04) | ovlp<br>(%) | time<br>(s) | HPWL<br>(e+04) | ovlp<br>(%) | time<br>(s) | HPWL<br>(e+04) | ovlp<br>(%) | time<br>(s) | PATOMA<br>(HPWL) | CAPO<br>(HPWL) | | 040<br>098 | 177.2<br>52.3 | 0.0 | 9.6<br>11.2 | 18.7 | 0.0<br>1.3 | 45.4<br>788.2 | 20.7<br>22.6 | 0.3 ⊗<br>0.3 | 239.0<br>271.6 | 20.6<br>24.0 | 0.0<br>0.0 ⊗ | 37.9<br>6.0 | 18.8<br>30.7 | 0.0 | 44.9<br>302.4 | 0.11x<br>0.59x | 1.00x | | 336<br>353 | 2.8<br>7.6 | 0.0 | 1.2<br>1.0 | 3.5<br>6.5 | 9.1<br>0.5 | 22.5<br>52.6 | 2.2<br>4.6 | 0.1 ⊗<br>0.3 | 83.5<br>211.8 | 7.6<br>31.5 | 0.0<br>1.6 ⊗ | 0.2<br>0.8 | 3.3<br>6.3 | 0.0 | 30.4<br>44.5 | 1.20x<br>0.83x | - | | 523<br>542 | 123.7<br>0.9 | 0.0 | 3.4<br>0.1 | 34.7<br>0.8 | 0.3<br>0.0 | 240.2<br>3.3 | 27.5<br>0.7 | 0.3<br>0.1 | 920.3<br>42.8 | 348.7 | 0.0<br>× | 2.8<br>× | 37.1<br>0.8 | 0.0 | 460.1<br>2.4 | 0.30x<br>0.89x | 1.00x | | 566<br>583 | 83.6<br>47.0 | 0.0 | 4.9<br>2.3 | 63.8 | 1.9<br>0.6 | 225.7<br>190.6 | 46,9<br>20,6 | 0.5<br>0.2 | 341.1<br>421.2 | 493.6<br>× | 3.8 ⊗<br>× | 3.2<br>× | 69.3<br>25.1 | 0.0 | 162.8<br>342.6 | 0.83x<br>0.53x | - | | 588<br>643 | 8.8<br>4.9 | 0.0 | 0.7<br>0.6 | 6.3<br>3.8 | 1.1<br>0.9 | 60.4<br>18.8 | 4.8<br>3.0 | 0.5<br>0.4 | 41.5<br>29.3 | 15.3 | ×<br>0.2⊗ | ×<br>0.5 | 6.9<br>3.7 | 0.0 | 102.7<br>40.0 | 0.78x<br>0.76x | - | | DCT × in | ndicates tir | × | ×<br>rash or | a run comr | ×<br>deted wi | >1800 | ucing a sol | 1.7 ⊗ | 719.4 | 184.7<br>an out-of-co | 0.0 | 8.0 | 37.2 A | 0.0<br>verage | 123.5 | -<br>0.68x | -<br>1.00x | Table 4: Runs on proprietary designs. Best legal solutions are emphasized in bold. PATOMA 1.0 Capo 9.4 FengShui 5.1 **SCAMPI** APlace 2.0 ibm $-HB^+$ HPW L HPWL HPWL ovlp HPWL HPWL ovlp ovlp time PATOMA CAPO (e+06)bench (e+06)(%) (s) (e+06)(%) (s) (e+06)(%)(s) (e+06)(%)(s) (%) (s) (HPWL) (HPWL) 2.7 3.0 0.0 5.6 651.5 68.0 0.2 ⊗ 16.6 0.0 62.0 01 1.4 0.87x $0.9 \otimes$ 0.0 1539.7 5.0 2.6 101.5 8.7 43.6 8.0 0.0 139.6 0.42x02 19.1 $\times$ 7.4 2.1 101.3 9.5 104.6 03 >1800 $\times$ >18008.2 2.8 113.9 10.8 $0.2 \otimes$ 41.4 12.3 0.0 144.1 04 × × 8.2 122.5 36.0 06 >18001.0 10.7 1.4 ⊗ 11.0 170.00.0 13.6 13.7 218.4 37.1 0.0 5.1 99.9 0.93x0.99x07 16.8 15.8 115.31 1.4 15.7 60.6 08 >180016.6 1.0 ⊗ 294.2 21.8 0.5 ⊗ 20.5 0.0188.4 0.2 188.9 15.1 0.9 222.4 $1.2 \otimes$ 42.9 22.2 0.0 182.0 09 20.6 $\times$ 10 2.7 263.7 36.9 0.3 529.5 × 0.0319.9 $\times$ $0.2 \otimes$ 25.3 0.0 49.2 28.1 0.0 140.5 24.5 1.1 270.3 30.463.8 27.8 0.0144.7 1.10x0.99x11 482.2 39.2 1.07x12 63.4 0.0 >1800 $0.0 \otimes$ 67.6 0.0406.1 0.0 34.7 39.6 0.0 221.5 0.5 240.4 209.6 1.07x13 31.7 42.2 0.01.13x 89.7 14 68.7 0.0 70.9 0.0 320.7 57.1 1.0 ⊗ 392.9 74.02.7 0.0268.3 0.97x0.97x87.5 15 >18001.5 422.2 90.6 $0.0 \otimes$ 100.388.2 0.0375.9 0.0 0.0 80.8 0.3 106.2 306.5 0.99x16 100.3 74.4 106.9 431.5 528.10.01.06x $\times$ $\times$ 799.3 17 141.4 0.0 95.9 0.1397.1 133.9 0.5 152.7 0.0 385.7 1.08x $\times$ 18 67.2 220.1344.0 77.8 0.0192.3 1.07x× Average 1.03x 0.93x Table 5: Runs on IBM-HB<sup>+</sup>. Best legal solutions are emphasized in bold. × indicates time-out, crash, or a run completed without producing a solution; ⊗ indicates an out-of-core solution #### **Empirical results (cont'd)** #### Success rates #### Wirelength comparison - □ Averaged over successful runs of Capo 9.4 & PATOMA - □ SCAMPI achieves 3.5% and 14.5% better HPWL, resp. #### **Advantages & drawbacks of SCAMPI** #### Advantages - □ Robust (68% and 36% better success rates than Capo9.4 and PATOMA) - ☐ Handles soft & hard macros, and std. cells - □ Handles obstacles & wide ranges of block dimensions - □ Good routability [J. A. Roy et. al, ISPD 2006] #### Potential drawbacks - □ Worse wirelength than some tools (e.g., APlace) - ☐ But APlace currently produces illegal floorplans - Stronger legalization can make APlace more competitive (see next slide) ### Ongoing work: floorplan assistant - Al-based floorplan legalizer - Preliminary results: - □ Removes overlaps quickly, e.g., from APlace placements - □ Preserves placement - □ Some increase in wirelength seems inevitable #### **Conclusions** - RTL placement includes - □ Numerous hard & soft blocks, and standard cells - ☐ Macros, IP blocks, memories of very different sizes. - ☐ Fixed obstacles - SCAMPI solves hard instances using - ☐ Selective floorplanning & macro clustering - Support for obstacles in the B\*-tree representation - Ad hoc look-ahead floorplanning - □ Whitespace allocation by block densities - Suite of hard floorplacement instances - □ <a href="http://vlsicad.eecs.umich.edu/BK/ISPD06bench">http://vlsicad.eecs.umich.edu/BK/ISPD06bench</a> - SCAMPI is available in source code # Questions? #### Reproducing difficult instances - In general, difficulties are from scale and/or large variations in module sizes - We take IBM-HB (which were from IBM/ISPD'98) - ☐ Std cells → macros - We introduce IBM-HB+ (derived from IBM-HB) - □ An example of how to re-create difficult instances - ☐ Largest macro inflated 100% - □ Smaller macros shrunk to preserve total cell area | Proprietary | Movab | le modules | NT_4 | Area <sub>largest</sub> | Area <sub>largest</sub> / | | |-------------|-------|------------|-------|-------------------------|---------------------------|--| | designs | Cells | Macros | Nets | (%) | $Area_{smallest}$ | | | cal040 | 1 | 4605 | 4607 | 0.1 | 650 | | | cal098 | 3200 | 1212 | 4673 | 0.1 | 529 | | | cal336 | 17 | 105 | 147 | 2.2 | 11556 | | | cal353 | 217 | 459 | 908 | 7.0 | 11556 | | | cal523 | 934 | 1936 | 4350 | 0.3 | 3080 | | | cal542 | 7 | 74 | 92 | 20.1 | 11556 | | | cal566 | 93 | 1553 | 5502 | 1.2 | 11556 | | | cal583 | 773 | 1530 | 3390 | 0.4 | 2916 | | | cal588 | 293 | 495 | 1111 | 0.6 | 900 | | | cal643 | 139 | 316 | 598 | 6.5 | 6162 | | | calDCT | 0 | 8827 | 11463 | 50.0 | 185330 | | Table 2: Characteristics of the proprietary designs. | D 11 | Movab | le modules | NI_4 | Area <sub>largest</sub> | Area <sub>largest</sub> / | | |------------------------|-------|------------|-------|-------------------------|---------------------------|--| | Benchmarks | Cells | Macros | Nets | (%) | $Area_{smallest}$ | | | ibm-HB <sup>+</sup> 01 | 0 | 911 | 5829 | 6.4 | 8416 | | | ibm-HB+02 | 0 | 1471 | 8508 | 11.3 | 3004.3 | | | ibm-HB <sup>+</sup> 03 | 0 | 1289 | 10279 | 10.8 | 33088 | | | ibm-HB <sup>+</sup> 04 | 0 | 1584 | 12456 | 9.2 | 13296.5 | | | ibm-HB+06 | 0 | 749 | 9963 | 13.6 | 18173.8 | | | ibm-HB <sup>+</sup> 07 | 0 | 1120 | 15047 | 4.8 | 399.5 | | | ibm-HB+08 | 0 | 1269 | 16075 | 12.1 | 50880 | | | ibm-HB <sup>+</sup> 09 | 0 | 1113 | 18913 | 5.4 | 29707 | | | ibm-HB+10 | 0 | 1595 | 27508 | 4.8 | 71299 | | | ibm-HB <sup>+</sup> 11 | 0 | 1497 | 27477 | 4.5 | 9902.3 | | | ibm-HB <sup>+</sup> 12 | 0 | 1233 | 26320 | 6.4 | 74256 | | | ibm-HB <sup>+</sup> 13 | 0 | 954 | 27011 | 4.2 | 33088 | | | ibm-HB <sup>+</sup> 14 | 0 | 1635 | 43062 | 2.0 | 17860 | | | ibm-HB+15 | 0 | 1412 | 52779 | 11.0 | 62781.3 | | | ibm-HB <sup>+</sup> 16 | 0 | 1091 | 47821 | 1.9 | 31093 | | | ibm-HB+17 | 0 | 1442 | 56517 | 0.9 | 12441 | | | ibm-HB <sup>+</sup> 18 | 0 | 943 | 42200 | 1.0 | 3384 | | Table 3: Characteristics of the IBM-HB<sup>+</sup> benchmarks. #### IBM-HB+ http://vlsicad.eecs.umich.edu/BK/ISPD06bench Figure 2: Six of the seventeen IBM-HB<sup>+</sup> benchmarks. #### **Floorist** - Constraint-driven floorplan repair\* - Build constraint graphs from placement ordering - □ Represent pair-wise relationships between modules - Perform conflict-directed iterative repair on graphs - Overlapping pairs are initially constrained - □ Induce constraints to resolve overlaps, or - Identify blocks on critical paths, modify their relationships with other modules - Translate constraint graphs back - APlace + Floorist = best-seen results for IBM-HB <sup>\*</sup> M. Moffitt, A. N. Ng, "Constraint-driven floorplan repair", DAC 2006 ## FengShui placements