Benchmarking the ARM Cortex-A9 processor
In computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term 'benchmark' is also commonly used for the elaborately designed benchmarking programs themselves.
Benchmarking is usually associated with assessing performance characteristics of computer hardware, for example, the floating point operation performance of a CPU, but there are circumstances when the technique is also applicable to software. Software benchmarks are, for example, run against compilers or database management systems.
CPU core benchmarking
Although it doesn’t reflect how you would use a processor in a real application, sometimes it’s important to isolate the CPU’s core from the other elements of the processor and focus on one key element. For example, you might want to ignore memory and I/O effects and focus primarily on the pipeline operation. This is CoreMark’s domain. CoreMark exercises a processor’s basic pipeline structure as well as basic read/write operations, integer operations, and control operations.
CoreMark
CoreMark is a benchmark that aims to measure the performance of central processing units (CPUs) used in embedded systems. It was developed in 2009 by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the antiquated Dhrystone benchmark. The code is written in C and contains implementations of the following algorithms: list processing (find and sort), matrix manipulation (common matrix operations), state machine (determine if an input stream contains valid numbers), and CRC.
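To give a feel for the kind of kernel listed above, here is a minimal sketch of a bitwise CRC-16 routine. It is not taken from the CoreMark sources; the function name crc16_arc and the choice of the common CRC-16/ARC parameters (reflected polynomial 0xA001, initial value 0) are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bitwise CRC-16/ARC (hypothetical helper, not CoreMark code).
 * Processes one byte at a time, least significant bit first. */
uint16_t crc16_arc(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                /* Shift out a 1: apply the reflected polynomial. */
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            } else {
                crc = (uint16_t)(crc >> 1);
            }
        }
    }
    return crc;
}
```

A loop like this is dominated by shifts, XORs, and data-dependent branches, which is why this class of code is a reasonable probe of the integer pipeline rather than of memory or I/O.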

Downloading CoreMark
Here is the result after unpacking. We will create a new application in SDK called CoreMark and copy the marked C files to the src directory.

Compiling
Trying to compile the CoreMark program without modifications gives the following error:
undefined reference to `clock_gettime'
We are running this application "bare metal" (without an OS). This means we don't have access to a real-time clock (RTC) and we cannot use the library routines in time.h. It looks like we have to write our own "clock_gettime" routine.
Bare-metal application development
Xilinx software design tools facilitate the development of embedded software applications for many runtime environments. Xilinx embedded design tools create a set of hardware platform data files that include:
• An XML-based hardware description file describing processors, peripherals, memory maps, and additional system data
• A bitstream file containing optional Programmable Logic (PL) programming data
• A block RAM Memory Map (BMM) file
• PS configuration data used by the Zynq-7000 AP SoC First Stage Bootloader (FSBL).
The bare-metal Board Support Package (BSP) is a collection of libraries and drivers that form the lowest layer of your application. The runtime environment is a simple, semi-hosted and single-threaded environment that provides basic features, including boot code, cache functions, exception handling, basic file I/O, C library support for memory allocation and other calls, processor hardware access macros, timer functions, and other functions required to support bare-metal applications. Using the hardware platform data and bare-metal BSP, you can develop, debug, and deploy bare-metal applications using SDK.
Board support package
The BSP <standalone_bsp_0> we generated in our first software project stores all the information about our board setup and all the software we need to start writing a bare-metal program. The libsrc directory contains low-level drivers and example code to be used when writing software to access the hardware in the processing system. We will take a closer look at the scutimer_v1_02_a directory.

Writing our own clock_gettime
We will use one of the timers available in the ARM processor to count clock cycles and measure time intervals. Let's take a look at the timer setup. Here is a picture taken from chapter 8 in the Zynq-7000 Technical Reference Manual.

Timers
Each Cortex-A9 processor has its own private 32-bit timer and 32-bit watchdog timer. Both processors share a global 64-bit timer. These timers are always clocked at 1/2 of the CPU frequency, that is, about 333 MHz when the CPU runs at 667 MHz. On the system level, there is a 24-bit watchdog timer and two 16-bit triple timer/counters. The system watchdog timer is clocked at 1/4 or 1/6 of the CPU frequency, or can be clocked by an external signal from an MIO pin or from the PL. The two triple timer/counters are always clocked at 1/4 or 1/6 of the CPU frequency, and are used to count the widths of signal pulses from an MIO pin or from the PL. Read more about the timers in the Cortex-A9 MPCore Technical Reference Manual chapter 4.
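The clocking rule for the private timer can be turned into a small calculation. The sketch below assumes a 667 MHz CPU clock and the usual behaviour of the private timer's 8-bit prescaler (one tick every prescaler + 1 input clocks); the function names are hypothetical and not part of the Xilinx drivers.

```c
#include <stdint.h>

/* Private timer input clock: half of the CPU clock. */
uint32_t timer_clock_hz(uint32_t cpu_hz)
{
    return cpu_hz / 2u;
}

/* Tick rate after the prescaler: input clock / (prescaler + 1). */
uint32_t timer_tick_hz(uint32_t cpu_hz, uint8_t prescaler)
{
    return timer_clock_hz(cpu_hz) / ((uint32_t)prescaler + 1u);
}

/* Longest interval a 32-bit counter can cover before it runs out,
 * assuming it is loaded with 0xFFFFFFFF (2^32 ticks in total). */
double timer_max_interval_s(uint32_t cpu_hz, uint8_t prescaler)
{
    return 4294967296.0 / (double)timer_tick_hz(cpu_hz, prescaler);
}
```

With a 667 MHz CPU and no prescaling, the counter covers a little under 13 seconds. That is worth keeping in mind when choosing the prescaler and the number of benchmark iterations, since a longer run would need a larger prescaler value to fit in a single pass of the counter.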
Program example
Here is an example program that uses the ARM CPU private timer to measure the time it takes to run the CoreMark benchmark program. The function is called from core_portme.c to read the timer counter register before the benchmark starts and again when it has finished.
ee_u32 GetTimerValue(ee_u32 TimerIntrId, ee_u16 Mode)
{
    int Status;
    XScuTimer_Config *ConfigPtr;
    volatile ee_u32 CntValue = 0;
    XScuTimer *TimerInstancePtr = &Timer;

    if (Mode == 0) {
        // Initialize the Private Timer so that it is ready to use
        ConfigPtr = XScuTimer_LookupConfig(TimerIntrId);
        if (!ConfigPtr) {
            // No configuration found for this device ID
            return XST_FAILURE;
        }
        Status = XScuTimer_CfgInitialize(TimerInstancePtr, ConfigPtr,
                                         ConfigPtr->BaseAddr);
        if (Status != XST_SUCCESS) {
            return XST_FAILURE;
        }
        // Load the timer prescaler register.
        XScuTimer_SetPrescaler(TimerInstancePtr, TIMER_RES_DIVIDER);
        // Load the timer counter register.
        XScuTimer_LoadTimer(TimerInstancePtr, TIMER_LOAD_VALUE);
        // Start the timer counter and read start value
        XScuTimer_Start(TimerInstancePtr);
        CntValue = XScuTimer_GetCounterValue(TimerInstancePtr);
    }
    else {
        // Read stop value and stop the timer counter
        CntValue = XScuTimer_GetCounterValue(TimerInstancePtr);
        XScuTimer_Stop(TimerInstancePtr);
    }
    return CntValue;
}
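GetTimerValue returns raw counter values, so the port layer still has to turn a start/stop pair into elapsed time. The private timer counts down, which means the start value is normally the larger of the two. The following is a minimal sketch of that conversion. The helper names are hypothetical and are not the ones used in the modified core_portme.c, and the tick rate passed in depends on the values chosen for TIMER_RES_DIVIDER and the CPU clock, which are not shown here.

```c
#include <stdint.h>

/* Elapsed ticks for a down-counting 32-bit timer.
 * Unsigned subtraction is modulo 2^32, so the result is still correct
 * if the counter wrapped around once between the two readings
 * (assuming it reloads with 0xFFFFFFFF). */
uint32_t elapsed_ticks(uint32_t start_count, uint32_t stop_count)
{
    return start_count - stop_count;
}

/* Convert ticks to seconds for a given timer tick rate in Hz. */
double ticks_to_seconds(uint32_t ticks, uint32_t tick_hz)
{
    return (double)ticks / (double)tick_hz;
}
```

For example, a start reading of 0xFFFFFFFF and a stop reading of 0xFFFFFF00 correspond to 255 elapsed ticks.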
Compiling the modified code
Here is all the source code that will be compiled. Here are the modified files core_portme.h and core_portme.c ready to be downloaded.

Compilation setup
Right-click the CoreMark project and select C/C++ Build Settings. We will define the following symbols

and select the highest optimization level (-O3).

Compilation print out

Running CoreMark
Here is a print out from the CoreMark program.

CoreMark benchmark result
The run reports 1998 iterations/sec. With the CPU running at 667 MHz, this gives a CoreMark score of 1998/667 ≈ 3.0 CoreMark/MHz. All you compiler experts out there, please let me know about other ways to improve this result.
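The score calculation is simple enough to write down as code. This is a small sketch with hypothetical function names; the iteration count and run time in the usage note are made-up inputs, and only the figures 1998 iterations/sec and 667 MHz come from the run above.

```c
#include <stdint.h>

/* Iterations per second from a raw iteration count and run time. */
double iterations_per_sec(uint32_t iterations, double seconds)
{
    return (double)iterations / seconds;
}

/* Normalize the score by the CPU clock to get CoreMark/MHz. */
double coremark_per_mhz(double iter_per_sec, double cpu_mhz)
{
    return iter_per_sec / cpu_mhz;
}
```

Calling coremark_per_mhz(1998.0, 667.0) gives roughly 2.9955, which rounds to the 3.0 CoreMark/MHz quoted above. As an illustration of the first helper, 20000 iterations in 10 seconds would be 2000 iterations/sec.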
More benchmarking
Z-7020 based ZC702 evaluation platform