From Vakul Garg (Vakul@freescale.com) – Freescale Semiconductor - Design Lead
Introduction
Advances in the features of simulated hardware platforms have made them an attractive and viable alternative for embedded software development. A simulated hardware platform is a software program that models the behavior of complete system hardware, including processor(s), RAM, flash, and peripheral devices such as Ethernet controllers, UARTs and hard disks. Embedded software developers can therefore bring up, test and debug their entire end-system code on their own desktops or on commodity multiuser servers, without needing specialized hardware boards. Companies ranging from silicon manufacturers, board designers and OS/device driver/middleware providers to OEMs can accelerate their software development through the use of simulators. Apart from removing the dependency of software development on the availability of physical hardware, simulators provide a level of visibility into what is going on under the hood of the system hardware that real boards cannot match. This helps in quickly debugging many kinds of system-level issues that are otherwise extremely hard to track down on real hardware. Testing teams can use simulators to create test scenarios that may occur in field conditions but are almost impossible to create in the lab with real hardware. As a side benefit, the higher level of visibility helps developers understand the hardware better and write efficient, high-performance code.
Overall, simulated platforms can help accelerate embedded product development, cut development costs, project risk and time to market, and improve quality and engineering efficiency.
Virtual Boards
A Virtual Board is the software model of the physical target board being simulated. It can contain anything from a simple CPU and RAM to a complex single board with multiple multi-core processors, UART consoles, disk drives, USB controllers etc. The CPU architecture on the virtual board can differ from that of the simulation host machine; for example, a simulated model of a PowerPC CPU can run on an Intel-based host machine. Similarly, models of memory and I/O devices from different companies can all sit on the same virtual board running on the same host machine. Virtual boards can connect to other virtual boards using a variety of virtual interconnects such as PCI, RapidIO or Ethernet. Virtual boards can also connect to real hardware boards or network nodes over a TCP/IP network.
Types of simulators
Depending on the level of simulation, simulators can be broadly divided into the following types.
Functional simulators: These mimic only the functional behavior of the hardware platform that is visible to software. These models leave out much of the finer low-level detail of the hardware that is invisible to software. For example, a functional simulator of a processor mimics the processor's instruction model and exception model but leaves out details such as processor pipelines, bus interfaces and L1 caches.
Cycle accurate simulators: In addition to replicating the functional behavior of the hardware, cycle accurate simulators also mimic the finer low-level, under-the-hood hardware details that are invisible to the software programming model. For example, cycle accurate simulators provide system bus transaction details, accurate instruction timings, and cache hits and misses. Because they have to model many extra hardware subsystems, cycle accurate simulators are much slower than functional models and need a very powerful host machine to run.
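The difference between the two simulator types can be illustrated with a toy model. The sketch below is only an illustration: the three-instruction ISA, the register names and the per-instruction latencies are all invented for this example and do not correspond to any real processor or simulator. The functional model updates only the state that software can see, while the timed variant additionally keeps a cycle count.

```python
from dataclasses import dataclass, field

# Assumed latencies for a made-up ISA; a real cycle accurate model would
# derive timing from pipelines, buses and caches rather than a fixed table.
CYCLE_COST = {"li": 1, "add": 1, "load": 3}


@dataclass
class FunctionalCpu:
    """Models only software-visible state: registers, memory, PC."""
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0, "r2": 0})
    mem: dict = field(default_factory=dict)
    pc: int = 0

    def step(self, program):
        op, *args = program[self.pc]
        if op == "li":        # li rd, imm
            self.regs[args[0]] = args[1]
        elif op == "add":     # add rd, ra, rb
            self.regs[args[0]] = self.regs[args[1]] + self.regs[args[2]]
        elif op == "load":    # load rd, addr
            self.regs[args[0]] = self.mem.get(args[1], 0)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.pc += 1
        return op


@dataclass
class TimedCpu(FunctionalCpu):
    """Same functional behavior, plus a (very crude) timing model."""
    cycles: int = 0

    def step(self, program):
        op = super().step(program)
        self.cycles += CYCLE_COST[op]
        return op


PROGRAM = [("li", "r0", 2), ("load", "r1", 0x10), ("add", "r2", "r0", "r1")]

cpu = TimedCpu(mem={0x10: 40})
while cpu.pc < len(PROGRAM):
    cpu.step(PROGRAM)
```

Both classes leave the registers in the same final state; only the timed one can also report how long the program took, which is where the extra modeling cost comes from.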
Usage Model
Since the simulation mimics the actual target hardware, programs compiled for the hardware run unmodified on the virtual boards. In fact, the same binaries can be used on both the virtual and the hardware boards. This eliminates the need for any special modification to the build environment and for a special compiler. (Note that this differs from running application software on an OS simulator, where the compiler produces a binary for the host machine architecture.) When the virtual board is powered on or reset, the boot sequence is the same as on the actual hardware board. The virtual CPU on the board starts executing the boot loader from the simulated flash, which then either loads the OS kernel and mounts the file system from flash or downloads the OS kernel and file system images from a remote system over simulated Ethernet. Software developers working on a virtual board thus get an experience very similar to working on an actual hardware board (except that the virtual board is slower than the hardware board and the developers can exercise a much higher degree of control over the simulation).
Configuring Virtual Board
The simulator software often provides a command line interface (CLI) through which the virtual board and the simulation software itself can be configured. Configuring the virtual board means telling the simulation software about the processors on the board, the amount of RAM and flash memory, the I/O ports (e.g. Ethernet and UART ports), the various clock frequencies, the consoles attached to UARTs etc. Simulators also generally provide powerful scripting support for common repeated tasks such as copying boot loader and kernel images to flash, reconfiguring clocks (if needed), assigning IP addresses etc. Developers can try different hardware configuration options on the virtual board (e.g. they can experiment with different clock frequencies) to find the settings that give the best performance.
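As a rough illustration of what such a scripted configuration might look like, the sketch below describes a board as a plain Python data structure and derives variants with different core clock frequencies. It is not the configuration language of any particular simulator; the class name `BoardConfig`, its fields and the value `"generic-ppc"` are assumptions made up for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BoardConfig:
    """Hypothetical description of a virtual board."""
    cpu_model: str
    num_cores: int
    ram_mb: int
    flash_mb: int
    core_clock_mhz: int
    uarts: tuple = ("uart0",)
    ethernet_ports: tuple = ("eth0",)

    def validate(self):
        # Reject obviously meaningless settings before starting a simulation.
        if self.num_cores < 1:
            raise ValueError("a board needs at least one core")
        if min(self.ram_mb, self.flash_mb, self.core_clock_mhz) <= 0:
            raise ValueError("sizes and clock frequency must be positive")
        return self


base = BoardConfig(
    cpu_model="generic-ppc",   # placeholder model name
    num_cores=2,
    ram_mb=512,
    flash_mb=64,
    core_clock_mhz=800,
).validate()

# One variant per candidate clock frequency; everything else stays the same.
variants = [replace(base, core_clock_mhz=mhz).validate()
            for mhz in (600, 800, 1000)]
```

Each variant could then be handed to the simulator in turn, and the same workload run on every one of them to compare results.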
Controlling Simulation
The simulation software allows the virtual board to be paused and resumed at any arbitrary point in time. When paused, all the processors, clock generators, peripherals etc. on the virtual board enter a frozen state. In this state, the developer can inspect or modify the state of any of the simulated hardware (e.g. CPU registers, memory contents etc.). Freezing all the hardware components on an entire board is almost impossible on real hardware. When done, the developer can resume the paused virtual board and it continues running seamlessly. There are also options to pause specific parts of the virtual board while the rest keep running. With a multicore processor on the board, one can temporarily disable a subset of cores while running the others, or command the simulation to advance some cores by ‘N’ cycles and others by ‘M’ cycles. This kind of fine-grained control helps in reproducing race conditions that are hard to reproduce on real hardware.
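The value of advancing individual cores by a chosen number of steps can be seen in a small toy model. In the sketch below, two simulated cores each perform an unsynchronized load/increment/store on a shared counter. The `Core` and `Board` classes and the `advance` method are invented for this illustration and do not represent any real simulator interface.

```python
class Core:
    """Runs a fixed three-step program: load counter, add 1, store counter."""

    def __init__(self, shared):
        self.shared = shared
        self.reg = 0
        self.pc = 0

    def step(self):
        if self.pc == 0:
            self.reg = self.shared["counter"]      # load
        elif self.pc == 1:
            self.reg += 1                          # increment
        elif self.pc == 2:
            self.shared["counter"] = self.reg      # store
        else:
            return                                 # program finished
        self.pc += 1


class Board:
    def __init__(self, num_cores=2):
        self.shared = {"counter": 0}
        self.cores = [Core(self.shared) for _ in range(num_cores)]

    def advance(self, core_id, steps):
        """Advance one core while every other core stays frozen."""
        for _ in range(steps):
            self.cores[core_id].step()


def run(schedule):
    board = Board()
    for core_id, steps in schedule:
        board.advance(core_id, steps)
    return board.shared["counter"]


# Core 0 runs to completion, then core 1: both increments are kept.
sequential = run([(0, 3), (1, 3)])

# Core 0 is frozen right after its load while core 1 runs to completion;
# core 0 then stores a stale value and one increment is lost.
racy = run([(0, 1), (1, 3), (0, 2)])
```

On real hardware the second interleaving may show up only occasionally; with explicit per-core stepping it can be forced on every run.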
Simulation snapshots
While the virtual board is paused, the developer can also take a snapshot of the simulation state and save it in an image file. The snapshot image file can later be loaded into the simulator and used to run the simulation forward from the point where the snapshot was taken. This allows a snapshot image to be saved once a problem has been reproduced and sent to the relevant developer for debugging. Moreover, developers can save snapshot images of the system after booting it and performing the initial configuration steps (all of which can take a significant amount of time). The saved snapshot images can then be loaded directly into the simulation and used to execute further test cases, leading to a significant saving in test execution time.
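A minimal sketch of the snapshot idea follows, assuming the whole simulation state fits in a few Python objects. The `ToySim` class is invented for this example, and Python's `pickle` module stands in for whatever image format a real simulator would use (as usual with `pickle`, only load data from a trusted source).

```python
import pickle


class ToySim:
    """Toy simulation whose whole state is a cycle count, registers and RAM."""

    def __init__(self):
        self.cycle = 0
        self.regs = {"pc": 0, "r0": 0}
        self.ram = bytearray(16)

    def run(self, cycles):
        for _ in range(cycles):
            self.regs["r0"] += 1
            self.ram[self.cycle % len(self.ram)] = self.regs["r0"] & 0xFF
            self.regs["pc"] += 4
            self.cycle += 1

    def save_snapshot(self) -> bytes:
        # Serialize the complete state; this could equally be written to a file.
        return pickle.dumps(self.__dict__)

    @classmethod
    def load_snapshot(cls, blob: bytes) -> "ToySim":
        sim = cls.__new__(cls)            # skip __init__, state comes from blob
        sim.__dict__.update(pickle.loads(blob))
        return sim


# Pay the (simulated) boot cost once, then snapshot.
booted = ToySim()
booted.run(100)
image = booted.save_snapshot()

# Two independent test runs start from the same post-boot state.
test_a = ToySim.load_snapshot(image)
test_a.run(5)
test_b = ToySim.load_snapshot(image)
test_b.run(5)
```

Because each restored instance gets its own copy of the state, the two test runs cannot disturb each other or the original booted system.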
Going back in Time!!
Some simulators support undoing steps that have already been executed. In other words, developers can command the simulation to go back in time. To the programmer, it appears as if an older state has been restored. In the debugger, the program counter of the processor simply appears to move backwards, undoing the changes that have already been made. This feature is immensely powerful for debugging crashes. For example, after a software crash due to a NULL pointer access, one can go back step by step and pinpoint the place at which the pointer being accessed was set to NULL by some other culprit thread.
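The NULL pointer hunt described above can be sketched with a toy reversible simulation. This example keeps an undo log of every overwritten value, which is only one possible way to implement stepping backwards and is an assumption of this sketch, not a description of how any particular simulator does it. The class `ReversibleSim` and the "program" of named memory writes are invented for illustration.

```python
class ReversibleSim:
    """Each instruction is a (name, value) write; every write can be undone."""

    def __init__(self, program):
        self.program = program
        self.mem = {}
        self.pc = 0
        self._undo = []   # one (name, existed_before, old_value) per step

    def step(self):
        name, value = self.program[self.pc]
        self._undo.append((name, name in self.mem, self.mem.get(name)))
        self.mem[name] = value
        self.pc += 1

    def step_back(self):
        name, existed, old = self._undo.pop()
        if existed:
            self.mem[name] = old
        else:
            del self.mem[name]
        self.pc -= 1


PROGRAM = [
    ("ptr", 0x1000),   # 0: pointer initialized
    ("x", 1),          # 1: unrelated work
    ("ptr", None),     # 2: the culprit write
    ("x", 2),          # 3: more unrelated work, then the "crash"
]

sim = ReversibleSim(PROGRAM)
for _ in PROGRAM:
    sim.step()

# Walk backwards from the crash until the pointer is valid again.
while sim.mem.get("ptr") is None:
    sim.step_back()

culprit_pc = sim.pc   # the instruction about to execute is the one that nulled ptr
```

When the loop exits, the simulation sits just before the offending write, so the next forward step re-executes exactly the instruction that corrupted the pointer.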
Source Code Debugging
The simulated CPU on the virtual board provides a debug monitor quite similar to the one provided by a JTAG debugger. It allows halting the simulated processor, inspecting and changing register values, and source code and assembly level debugging with breakpoints, watchpoints etc. Depending on the integration level, the debug monitor can be used with popular commercial or open-source debuggers such as gdb, ddd etc. With a multicore processor simulation, it is possible to have several debugger windows open, each displaying and controlling the source code being executed on an individual core. The breakpoints provided by the simulator are non-intrusive: inserting one does not replace an instruction in RAM with a target-specific instruction that transfers processor control to a breakpoint handler. This means breakpoints can also be set on code that lies in read-only memory such as flash. Physical hardware with a JTAG debugger does allow hardware breakpoints to be used for debugging code in flash, but hardware breakpoints are very limited in number. Simulation breakpoints have no such limit.
Special breakpoints
This is a unique and very powerful feature of simulated platforms. It allows developers to halt the simulation on arbitrary conditions, e.g. when bit ‘x’ of register ‘ABC’ of the 2nd core of a multicore processor toggles. Other examples are when the console attached to UART0 displays the string “Hello”, or when the value at memory address ‘0xabcd’ changes from ‘p’ to ‘q’. The variety of special breakpoints exposed by each modeled sub-component of the virtual board is beyond the scope of this article and depends solely on the model implementation. Any number of special breakpoints can be set.
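One way to picture such condition-based breakpoints is as predicates that the simulator evaluates on every modeled event. The sketch below is a toy model under that assumption; the `Board` class, its methods and the event dictionaries are invented for this example. Where a real simulator would halt, this toy simply returns the labels of the breakpoints that fired.

```python
class Board:
    """Toy board with a memory and a UART0 console, both observable."""

    def __init__(self):
        self.mem = {}
        self.uart0_output = ""
        self._breakpoints = []   # list of (label, predicate) pairs

    def add_breakpoint(self, label, predicate):
        self._breakpoints.append((label, predicate))

    def _fired(self, event):
        return [label for label, pred in self._breakpoints if pred(event)]

    def write_mem(self, addr, value):
        event = {"kind": "mem", "addr": addr,
                 "old": self.mem.get(addr), "new": value}
        self.mem[addr] = value
        return self._fired(event)

    def uart0_putc(self, ch):
        self.uart0_output += ch
        return self._fired({"kind": "uart0", "output": self.uart0_output})


board = Board()

# Break when address 0xabcd changes from 1 to 2.
board.add_breakpoint(
    "mem-change",
    lambda e: e["kind"] == "mem" and e["addr"] == 0xABCD
    and e["old"] == 1 and e["new"] == 2,
)

# Break when the UART0 console has just displayed "Hello".
board.add_breakpoint(
    "uart0-hello",
    lambda e: e["kind"] == "uart0" and e["output"].endswith("Hello"),
)

hits = []
hits += board.write_mem(0xABCD, 1)   # previous value was unset: no hit
hits += board.write_mem(0xABCD, 2)   # 1 -> 2: hit
for ch in "Hello":
    hits += board.uart0_putc(ch)     # hit after the final 'o'
```

Since the predicates run inside the simulator rather than on the target, adding more of them does not change what the simulated software sees.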
Code Coverage
If the simulator software can record the program flow, it can calculate the code coverage achieved by executing a set of test cases. Unlike other popular code coverage tools (e.g. open source gcov), measuring coverage on a simulator does not require pre-instrumentation of the code and does not slow down execution of the code. Depending on the integration level, the coverage data recorded by the simulator can be analyzed using the front end of other popular tools such as gcov.
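The principle can be sketched with a toy interpreter that records every program counter value it executes. The program itself is never modified; the recording happens entirely in the "simulator". The tiny instruction set (`jz`, `set`, `halt`) and the helper names are invented for this illustration.

```python
PROGRAM = [
    ("jz", 3),     # 0: if acc == 0, jump to 3
    ("set", 10),   # 1
    ("halt",),     # 2
    ("set", 20),   # 3
    ("halt",),     # 4
]


def run(program, acc, executed):
    """Run until halt, adding every executed address to `executed`."""
    pc = 0
    while True:
        executed.add(pc)
        op, *args = program[pc]
        if op == "halt":
            return acc
        if op == "jz" and acc == 0:
            pc = args[0]
            continue
        if op == "set":
            acc = args[0]
        pc += 1


def coverage(program, executed):
    return len(executed) / len(program)


executed = set()

result_nonzero = run(PROGRAM, 1, executed)    # takes the fall-through path
after_first_test = coverage(PROGRAM, executed)

result_zero = run(PROGRAM, 0, executed)       # takes the branch
after_second_test = coverage(PROGRAM, executed)
```

A single test exercises only one side of the branch; the second test covers the remaining instructions.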
Performance Profiling
The simulator software can also be configured to collect performance statistics of the program. Statistics collection can be extremely fine grained, down to the clock cycle level. As with code coverage reports, performance data collection does not require the source code to be instrumented with timestamp collection callbacks at function entry and exit points. The simulator can record these times at branches and other control flow changes. The recorded run times can be as granular as the CPU cycles spent. With a cycle accurate simulation model, it is also possible to estimate the number of cache hits and misses.
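As a sketch of how cycle counts recorded by a simulator could be turned into a per-function profile without instrumenting the code, the example below attributes each executed instruction to the function whose address range contains it. The symbol table, the addresses, the function names and the trace of `(pc, cycles)` pairs are all made up for this illustration.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1100, "parse_packet"), (0x1200, "checksum")]
STARTS = [addr for addr, _ in SYMBOLS]


def function_at(pc):
    """Return the function whose start address is the closest one at or below pc."""
    index = bisect.bisect_right(STARTS, pc) - 1
    if index < 0:
        raise ValueError(f"no symbol covers address {pc:#x}")
    return SYMBOLS[index][1]


def profile(trace):
    """Sum the cycles of each executed instruction per function."""
    cycles = Counter()
    for pc, cost in trace:
        cycles[function_at(pc)] += cost
    return cycles


# Invented execution trace as a simulator might record it: (pc, cycles spent).
TRACE = [
    (0x1000, 1),    # main
    (0x1104, 3),    # parse_packet
    (0x1200, 1),    # checksum
    (0x1204, 12),   # checksum, e.g. a load that missed the cache
    (0x1108, 1),    # back in parse_packet
]

report = profile(TRACE)
hottest = report.most_common(1)[0][0]
```

The profiled program needs no timestamp callbacks; all the information comes from what the simulator already observes.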
OS awareness
The OS awareness feature of a virtual platform examines the state of the target hardware and provides a complete view of processes, threads and other important kernel data structures. It does this by analyzing register and memory contents, tracking changes in them, and capturing exceptions and CPU mode transitions. It does not require any modification or instrumentation of the software it is tracking. As an example, using Linux OS awareness, we can directly display the ‘task_struct’ and ‘thread_info’ kernel structures of the running thread context.
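The core idea, decoding kernel data structures straight from simulated memory, can be sketched as follows. The record layout used here (a 32-bit pid, a 16-byte name and a 32-bit pointer to the next task) is invented for the example; the real Linux ‘task_struct’ is far larger and its layout depends on kernel version and configuration.

```python
import struct

# Invented little-endian task record: pid, name, address of next task (0 = end).
TASK_FORMAT = "<I16sI"
TASK_SIZE = struct.calcsize(TASK_FORMAT)


def read_tasks(ram, head_addr):
    """Walk the task list in raw memory and return (pid, name) pairs."""
    tasks = []
    addr = head_addr
    while addr != 0:
        pid, name, next_addr = struct.unpack_from(TASK_FORMAT, ram, addr)
        tasks.append((pid, name.split(b"\0", 1)[0].decode("ascii")))
        addr = next_addr
    return tasks


# Build a fake RAM image the way the simulated kernel might have left it.
ram = bytearray(256)
struct.pack_into(TASK_FORMAT, ram, 0x40, 1, b"init", 0x80)
struct.pack_into(TASK_FORMAT, ram, 0x80, 42, b"sshd", 0)

tasks = read_tasks(ram, 0x40)
```

The decoder only reads memory; nothing is added to or changed in the software being observed.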
Integration with IDE
Some simulator implementations can be integrated into popular GUI-based development environments such as Eclipse. This allows convenient visualization of hardware blocks and a developer-friendly way of writing and debugging software and managing workspaces.


3 Comments
Hi,
I agree that for fast path code for dataplane processing, since the number of cycles per packet is very small (200 to 1000 cycles per packet), we have to use such cycle accurate models. It is the most efficient way to find bottlenecks. We need to replay and see the root causes of instruction and data cache misses, of read-after-write and write-after-read hazards, of instruction latency, etc., in order to write better code.
However, most of the time we do not need full board models for such optimizations. A simple model that includes only Ethernet ports, RAM and CPU is enough for dataplane software such as what we need in the 6WINDGate.
I am not aware of such tools for QorIQ’s cores; we only use them for some other CPUs. Could you tell me the name of the tool? How can it be used to inject packets (typically pcap files)?
Thank you,
Vincent
Hi
For the QorIQ series, we use the Wind River Simics board model. It can be connected to the host machine using tuntap interfaces, which can be used to inject pcap files directly. In a more sophisticated configuration where multiple network nodes also have to be connected to the simulated board (e.g. router testing), User Mode Linux instances can be connected to the virtual board through tuntap.
Regards
Vakul
Hi
We use Wind River Simics for the QorIQ series. It connects to tuntap interfaces on the host machine to receive packets, which can be injected using pcap files, User Mode Linux instances or other similar software.
Regards
Vakul