Simon Glass | b2e16a8 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 1 | # |
| 2 | # Copyright (c) 2013 The Chromium OS Authors. |
| 3 | # |
Wolfgang Denk | 1a45966 | 2013-07-08 09:37:19 +0200 | [diff] [blame] | 4 | # SPDX-License-Identifier: GPL-2.0+ |
Simon Glass | b2e16a8 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 5 | # |
| 6 | |
| 7 | Tracing in U-Boot |
| 8 | ================= |
| 9 | |
| 10 | U-Boot supports a simple tracing feature which allows a record of excecution |
| 11 | to be collected and sent to a host machine for analysis. At present the |
| 12 | main use for this is to profile boot time. |
| 13 | |
| 14 | |
| 15 | Overview |
| 16 | -------- |
| 17 | |
| 18 | The trace feature uses GCC's instrument-functions feature to trace all |
| 19 | function entry/exit points. These are then recorded in a memory buffer. |
| 20 | The memory buffer can be saved to the host over a network link using |
| 21 | tftpput or by writing to an attached memory device such as MMC. |
| 22 | |
| 23 | On the host, the file is first converted with a tool called 'proftool', |
| 24 | which extracts useful information from it. The resulting trace output |
| 25 | resembles that emitted by Linux's ftrace feature, so can be visually |
| 26 | displayed by pytimechart. |
| 27 | |
| 28 | |
| 29 | Quick-start using Sandbox |
| 30 | ------------------------- |
| 31 | |
| 32 | Sandbox is a build of U-Boot that can run under Linux so it is a convenient |
| 33 | way of trying out tracing before you use it on your actual board. To do |
| 34 | this, follow these steps: |
| 35 | |
| 36 | Add the following to include/configs/sandbox.h (if not already there) |
| 37 | |
| 38 | #define CONFIG_TRACE |
| 39 | #define CONFIG_CMD_TRACE |
| 40 | #define CONFIG_TRACE_BUFFER_SIZE (16 << 20) |
| 41 | #define CONFIG_TRACE_EARLY_SIZE (8 << 20) |
| 42 | #define CONFIG_TRACE_EARLY |
| 43 | #define CONFIG_TRACE_EARLY_ADDR 0x00100000 |
| 44 | |
| 45 | Build sandbox U-Boot with tracing enabled: |
| 46 | |
| 47 | $ make FTRACE=1 O=sandbox sandbox_config |
| 48 | $ make FTRACE=1 O=sandbox |
| 49 | |
| 50 | Run sandbox, wait for a bit of trace information to appear, and then capture |
| 51 | a trace: |
| 52 | |
| 53 | $ ./sandbox/u-boot |
| 54 | |
| 55 | |
| 56 | U-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) |
| 57 | |
| 58 | DRAM: 128 MiB |
| 59 | trace: enabled |
| 60 | Using default environment |
| 61 | |
| 62 | In: serial |
| 63 | Out: serial |
| 64 | Err: serial |
| 65 | =>trace stats |
Wolfgang Denk | 93e1459 | 2013-10-04 17:43:24 +0200 | [diff] [blame] | 66 | 671,406 function sites |
| 67 | 69,712 function calls |
| 68 | 0 untracked function calls |
| 69 | 73,373 traced function calls |
| 70 | 16 maximum observed call depth |
| 71 | 15 call depth limit |
| 72 | 66,491 calls not traced due to depth |
Simon Glass | b2e16a8 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 73 | =>trace stats |
Wolfgang Denk | 93e1459 | 2013-10-04 17:43:24 +0200 | [diff] [blame] | 74 | 671,406 function sites |
Simon Glass | b2e16a8 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 75 | 1,279,450 function calls |
Wolfgang Denk | 93e1459 | 2013-10-04 17:43:24 +0200 | [diff] [blame] | 76 | 0 untracked function calls |
| 77 | 950,490 traced function calls (333217 dropped due to overflow) |
| 78 | 16 maximum observed call depth |
| 79 | 15 call depth limit |
Simon Glass | b2e16a8 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 80 | 1,275,767 calls not traced due to depth |
| 81 | =>trace calls 0 e00000 |
| 82 | Call list dumped to 00000000, size 0xae0a40 |
| 83 | =>print |
| 84 | baudrate=115200 |
| 85 | profbase=0 |
| 86 | profoffset=ae0a40 |
| 87 | profsize=e00000 |
| 88 | stderr=serial |
| 89 | stdin=serial |
| 90 | stdout=serial |
| 91 | |
| 92 | Environment size: 117/8188 bytes |
| 93 | =>sb save host 0 trace 0 ${profoffset} |
| 94 | 11405888 bytes written in 10 ms (1.1 GiB/s) |
| 95 | =>reset |
| 96 | |
| 97 | |
| 98 | Then run proftool to convert the trace information to ftrace format. |
| 99 | |
| 100 | $ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt |
| 101 | |
| 102 | Finally run pytimechart to display it: |
| 103 | |
| 104 | $ pytimechart trace.txt |
| 105 | |
| 106 | Using this tool you can zoom and pan across the trace, with the function |
| 107 | calls on the left and little marks representing the start and end of each |
| 108 | function. |
| 109 | |
| 110 | |
| 111 | CONFIG Options |
| 112 | -------------- |
| 113 | |
| 114 | - CONFIG_TRACE |
| 115 | Enables the trace feature in U-Boot. |
| 116 | |
| 117 | - CONFIG_CMD_TRACE |
| 118 | Enables the trace command. |
| 119 | |
| 120 | - CONFIG_TRACE_BUFFER_SIZE |
| 121 | Size of trace buffer to allocate for U-Boot. This buffer is |
| 122 | used after relocation, as a place to put function tracing |
| 123 | information. The address of the buffer is determined by |
| 124 | the relocation code. |
| 125 | |
| 126 | - CONFIG_TRACE_EARLY |
| 127 | Define this to start tracing early, before relocation. |
| 128 | |
| 129 | - CONFIG_TRACE_EARLY_SIZE |
| 130 | Size of 'early' trace buffer. Before U-Boot has relocated |
| 131 | it doesn't have a proper trace buffer. On many boards |
| 132 | you can define an area of memory to use for the trace |
| 133 | buffer until the 'real' trace buffer is available after |
| 134 | relocation. The contents of this buffer are then copied to |
| 135 | the real buffer. |
| 136 | |
| 137 | - CONFIG_TRACE_EARLY_ADDR |
| 138 | Address of early trace buffer |
| 139 | |
| 140 | |
| 141 | Building U-Boot with Tracing Enabled |
| 142 | ------------------------------------ |
| 143 | |
| 144 | Pass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. |
| 145 | This is kept as a separate option so that it is easy to enable/disable |
| 146 | instrumenting from the command line instead of having to change board |
| 147 | config files. |
| 148 | |
| 149 | |
| 150 | Collecting Trace Data |
| 151 | --------------------- |
| 152 | |
| 153 | When you run U-Boot on your board it will collect trace data up to the |
| 154 | limit of the trace buffer size you have specified. Once that is exhausted |
| 155 | no more data will be collected. |
| 156 | |
| 157 | Collecting trace data has an affect on execution time/performance. You |
| 158 | will notice this particularly with trvial functions - the overhead of |
| 159 | recording their execution may even exceed their normal execution time. |
| 160 | In practice this doesn't matter much so long as you are aware of the |
| 161 | effect. Once you have done your optimisations, turn off tracing before |
| 162 | doing end-to-end timing. |
| 163 | |
| 164 | The best time to start tracing is right at the beginning of U-Boot. The |
| 165 | best time to stop tracing is right at the end. In practice it is hard |
| 166 | to achieve these ideals. |
| 167 | |
| 168 | This implementation enables tracing early in board_init_f(). This means |
| 169 | that it captures most of the board init process, missing only the |
| 170 | early architecture-specific init. However, it also misses the entire |
| 171 | SPL stage if there is one. |
| 172 | |
| 173 | U-Boot typically ends with a 'bootm' command which loads and runs an |
| 174 | OS. There is useful trace data in the execution of that bootm |
| 175 | command. Therefore this implementation provides a way to collect trace |
| 176 | data after bootm has finished processing, but just before it jumps to |
| 177 | the OS. In practical terms, U-Boot runs the 'fakegocmd' environment |
| 178 | variable at this point. This variable should have a short script which |
| 179 | collects the trace data and writes it somewhere. |
| 180 | |
| 181 | Trace data collection relies on a microsecond timer, accesed through |
| 182 | timer_get_us(). So the first think you should do is make sure that |
| 183 | this produces sensible results for your board. Suitable sources for |
| 184 | this timer include high resolution timers, PWMs or profile timers if |
| 185 | available. Most modern SOCs have a suitable timer for this. Make sure |
| 186 | that you mark this timer (and anything it calls) with |
| 187 | __attribute__((no_instrument_function)) so that the trace library can |
| 188 | use it without causing an infinite loop. |
| 189 | |
| 190 | |
| 191 | Commands |
| 192 | -------- |
| 193 | |
| 194 | The trace command has variable sub-commands: |
| 195 | |
| 196 | - stats |
| 197 | Display tracing statistics |
| 198 | |
| 199 | - pause |
| 200 | Pause tracing |
| 201 | |
| 202 | - resume |
| 203 | Resume tracing |
| 204 | |
| 205 | - funclist [<addr> <size>] |
| 206 | Dump a list of functions into the buffer |
| 207 | |
| 208 | - calls [<addr> <size>] |
| 209 | Dump function call trace into buffer |
| 210 | |
| 211 | If the address and size are not given, these are obtained from environment |
| 212 | variables (see below). In any case the environment variables are updated |
| 213 | after the command runs. |
| 214 | |
| 215 | |
| 216 | Environment Variables |
| 217 | --------------------- |
| 218 | |
| 219 | The following are used: |
| 220 | |
| 221 | - profbase |
| 222 | Base address of trace output buffer |
| 223 | |
| 224 | - profoffset |
| 225 | Offset of first unwritten byte in trace output buffer |
| 226 | |
| 227 | - profsize |
| 228 | Size of trace output buffer |
| 229 | |
| 230 | All of these are set by the 'trace calls' command. |
| 231 | |
| 232 | These variables keep track of the amount of data written to the trace |
| 233 | output buffer by the 'trace' command. The trace commands which write data |
| 234 | to the output buffer can use these to specify the buffer to write to, and |
| 235 | update profoffset each time. This allows successive commands to append data |
| 236 | to the same buffer, for example: |
| 237 | |
| 238 | trace funclist 10000 e00000 |
| 239 | trace calls |
| 240 | |
| 241 | (the latter command appends more data to the buffer). |
| 242 | |
| 243 | |
| 244 | - fakegocmd |
| 245 | Specifies commands to run just before booting the OS. This |
| 246 | is a useful time to write the trace data to the host for |
| 247 | processing. |
| 248 | |
| 249 | |
| 250 | Writing Out Trace Data |
| 251 | ---------------------- |
| 252 | |
| 253 | Once the trace data is in an output buffer in memory there are various ways |
| 254 | to transmit it to the host. Notably you can use tftput to send the data |
| 255 | over a network link: |
| 256 | |
| 257 | fakegocmd=trace pause; usb start; set autoload n; bootp; |
| 258 | trace calls 10000000 1000000; |
| 259 | tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls |
| 260 | |
| 261 | This starts up USB (to talk to an attached USB Ethernet dongle), writes |
| 262 | a trace log to address 10000000 and sends it to a host machine using |
| 263 | TFTP. After this, U-Boot will boot the OS normally, albeit a little |
| 264 | later. |
| 265 | |
| 266 | |
| 267 | Converting Trace Output Data |
| 268 | ---------------------------- |
| 269 | |
| 270 | The trace output data is kept in a binary format which is not documented |
| 271 | here. To convert it into something useful, you can use proftool. |
| 272 | |
| 273 | This tool must be given the U-Boot map file and the trace data received |
| 274 | from running that U-Boot. It produces a text output file. |
| 275 | |
| 276 | Options |
| 277 | -m <map_file> |
| 278 | Specify U-Boot map file |
| 279 | |
| 280 | -p <trace_file> |
| 281 | Specifiy profile/trace file |
| 282 | |
| 283 | Commands: |
| 284 | |
| 285 | - dump-ftrace |
| 286 | Write a text dump of the file in Linux ftrace format to stdout |
| 287 | |
| 288 | |
| 289 | Viewing the Trace Data |
| 290 | ---------------------- |
| 291 | |
| 292 | You can use pytimechart for this (sudo apt-get pytimechart might work on |
| 293 | your Debian-style machine, and use your favourite search engine to obtain |
| 294 | documentation). It expects the file to have a .txt extension. The program |
| 295 | has terse user interface but is very convenient for viewing U-Boot |
| 296 | profile information. |
| 297 | |
| 298 | |
| 299 | Workflow Suggestions |
| 300 | -------------------- |
| 301 | |
| 302 | The following suggestions may be helpful if you are trying to reduce boot |
| 303 | time: |
| 304 | |
| 305 | 1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should get |
| 306 | you are helpful overall snapshot of the boot time. |
| 307 | |
| 308 | 2. Build U-Boot with tracing and run it. Note the difference in boot time |
| 309 | (it is common for tracing to add 10% to the time) |
| 310 | |
| 311 | 3. Collect the trace information as descibed above. Use this to find where |
| 312 | all the time is being spent. |
| 313 | |
| 314 | 4. Take a look at that code and see if you can optimise it. Perhaps it is |
| 315 | possible to speed up the initialisation of a device, or remove an unused |
| 316 | feature. |
| 317 | |
| 318 | 5. Rebuild, run and collect again. Compare your results. |
| 319 | |
| 320 | 6. Keep going until you run out of steam, or your boot is fast enough. |
| 321 | |
| 322 | |
| 323 | Configuring Trace |
| 324 | ----------------- |
| 325 | |
| 326 | There are a few parameters in the code that you may want to consider. |
| 327 | There is a function call depth limit (set to 15 by default). When the |
| 328 | stack depth goes above this then no tracing information is recorded. |
| 329 | The maximum depth reached is recorded and displayed by the 'trace stats' |
| 330 | command. |
| 331 | |
| 332 | |
| 333 | Future Work |
| 334 | ----------- |
| 335 | |
| 336 | Tracing could be a little tidier in some areas, for example providing |
| 337 | run-time configuration options for trace. |
| 338 | |
| 339 | Some other features that might be useful: |
| 340 | |
| 341 | - Trace filter to select which functions are recorded |
| 342 | - Sample-based profiling using a timer interrupt |
| 343 | - Better control over trace depth |
| 344 | - Compression of trace information |
| 345 | |
| 346 | |
| 347 | Simon Glass <sjg@chromium.org> |
| 348 | April 2013 |