Stefan Roese | a2c95a7 | 2006-07-28 18:34:58 +0200 | [diff] [blame] | 1 | AMCC suggested setting the PMU bit to 0 for best performance on the |
| 2 | PPC440 DDR controller. The common 440 DDR setup files (sdram.c & |
| 3 | spd_sdram.c) have been changed accordingly, so all 440 boards using |
| 4 | these setup routines automatically benefit from this performance |
| 5 | increase. |
| 6 | |
| 7 | The following benchmarks, run by AMCC, demonstrate the performance |
| 8 | change: |
| 9 | |
| 10 | |
| 11 | ---------------------------------------- |
| 12 | SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone) |
| 13 | ---------------------------------------- |
| 14 | Stream benchmark results |
| 15 | ------------------------------------------------------------- |
| 16 | This system uses 8 bytes per DOUBLE PRECISION word. |
| 17 | ------------------------------------------------------------- |
| 18 | Array size = 2000000, Offset = 0 |
| 19 | Total memory required = 45.8 MB. |
| 20 | Each test is run 10 times, but only |
| 21 | the *best* time for each is used. |
| 22 | ------------------------------------------------------------- |
| 23 | Your clock granularity/precision appears to be 1 microseconds. |
| 24 | Each test below will take on the order of 112345 microseconds. |
| 25 | (= 112345 clock ticks) |
| 26 | Increase the size of the arrays if this shows that you are not getting |
| 27 | at least 20 clock ticks per test. |
| 28 | ------------------------------------------------------------- |
| 29 | WARNING -- The above is only a rough guideline. |
| 30 | For best results, please be sure you know the precision of your system |
| 31 | timer. |
| 32 | ------------------------------------------------------------- |
| 33 | Function Rate (MB/s) RMS time Min time Max time |
| 34 | Copy: 256.7683 0.1248 0.1246 0.1250 |
| 35 | Scale: 246.0157 0.1302 0.1301 0.1302 |
| 36 | Add: 255.0316 0.1883 0.1882 0.1885 |
| 37 | Triad: 253.1245 0.1897 0.1896 0.1899 |
| 38 | |
| 39 | |
| 40 | TTCP Benchmark Results |
| 41 | ttcp-t: socket |
| 42 | ttcp-t: connect |
| 43 | ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp -> |
| 44 | localhost |
| 45 | ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++ |
| 46 | ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57 |
| 47 | ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw |
| 48 | |
| 49 | ---------------------------------------- |
| 50 | SDRAM0_CFG0[PMU] = 0 (Suggested modification) |
| 51 | Setting PMU = 0 provides a noticeable performance improvement: |
| 52 | * 2% to 5% improvement in memory performance (Stream benchmark). |
| 53 | * Improves the Mbit/sec for the TTCP benchmark by almost 77%. |
| 54 | ---------------------------------------- |
| 55 | Stream benchmark results |
| 56 | ------------------------------------------------------------- |
| 57 | This system uses 8 bytes per DOUBLE PRECISION word. |
| 58 | ------------------------------------------------------------- |
| 59 | Array size = 2000000, Offset = 0 |
| 60 | Total memory required = 45.8 MB. |
| 61 | Each test is run 10 times, but only |
| 62 | the *best* time for each is used. |
| 63 | ------------------------------------------------------------- |
| 64 | Your clock granularity/precision appears to be 1 microseconds. |
| 65 | Each test below will take on the order of 120066 microseconds. |
| 66 | (= 120066 clock ticks) |
| 67 | Increase the size of the arrays if this shows that you are not getting |
| 68 | at least 20 clock ticks per test. |
| 69 | ------------------------------------------------------------- |
| 70 | WARNING -- The above is only a rough guideline. |
| 71 | For best results, please be sure you know the precision of your system |
| 72 | timer. |
| 73 | ------------------------------------------------------------- |
| 74 | Function Rate (MB/s) RMS time Min time Max time |
| 75 | Copy: 262.5167 0.1221 0.1219 0.1223 |
| 76 | Scale: 258.4856 0.1238 0.1238 0.1240 |
| 77 | Add: 262.5404 0.1829 0.1828 0.1831 |
| 78 | Triad: 266.8594 0.1800 0.1799 0.1802 |
| 79 | |
| 80 | TTCP Benchmark Results |
| 81 | ttcp-t: socket |
| 82 | ttcp-t: connect |
| 83 | ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp -> |
| 84 | localhost |
| 85 | ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++ |
| 86 | ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89 |
| 87 | ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw |
| 88 | |
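The relative gains can be recomputed from the two result sets above. The
following is a minimal sketch, not part of the original AMCC report: the
rates are copied from the Stream tables and the ttcp-t summary lines, and
the helper name gain_percent is made up for illustration.

```python
# Rates copied from the benchmark output above.
# Stream values are in MB/s, TTCP values are in Mbit/sec.
stream_pmu1 = {"Copy": 256.7683, "Scale": 246.0157, "Add": 255.0316, "Triad": 253.1245}
stream_pmu0 = {"Copy": 262.5167, "Scale": 258.4856, "Add": 262.5404, "Triad": 266.8594}
ttcp_pmu1 = 454.29
ttcp_pmu0 = 804.06


def gain_percent(before, after):
    """Relative improvement of 'after' over 'before', in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Per-function Stream gain when going from PMU = 1 to PMU = 0.
stream_gains = {
    name: gain_percent(stream_pmu1[name], stream_pmu0[name]) for name in stream_pmu1
}
ttcp_gain = gain_percent(ttcp_pmu1, ttcp_pmu0)

for name, gain in stream_gains.items():
    print(f"{name:6s} {gain:5.1f}%")
print(f"{'TTCP':6s} {ttcp_gain:5.1f}%")
```

With these inputs the Stream gains fall between roughly 2.2% (Copy) and
5.4% (Triad), and the TTCP throughput gain comes out at about 77%.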
| 89 | |
| 90 | 2006-07-28, Stefan Roese <sr@denx.de> |