Computer Architecture 2 - lab 3
Influence of computer architecture on software performance

This laboratory exercise examines how the architectural properties of a computer affect the execution of programs written in higher-level languages. If possible, carry out the exercise on computers with different architectures, then compare and discuss the results. As a minimum, we recommend using the PC/Pentium and SUN/UltraSPARC (pinus) architectures.

1. Byte ordering (endianness)

Write a program that detects the byte ordering convention used by the computer on which it runs. Determine the convention used by each of the test architectures.

Hint: inspect the individual bytes of a suitably crafted machine word through a pointer of an appropriate type.

Remark: 64-bit SPARC computers can be configured to employ either of the two main byte ordering conventions. However, your task is to discover which of the two is preferred by the compiler.

2. Cache memories

The goal of this laboratory exercise is to analyze the influence of cache parameters on performance. The exercise consists of the following tasks:

  1. Determine the following cache (L1, L2, L3?) parameters of the considered computer: total capacity, line width in bytes, and associativity. On modern x86 processors this can be accomplished by directly executing the CPUID instruction [1, 2] or by invoking tools which employ that instruction [3, 4]. For other processors, this information can be found on the manufacturer's web pages.

    In the case of the pinus computer, the computer type can be determined with the commands uname, fpversion, or prtconf. Cache parameters can be found on the web page [5].

    On Linux computers, this information is available in the virtual directories /proc/ (file cpuinfo) and /sys/devices/system/cpu/ (e.g. file cpu0/cache/index2/size).

    cat /proc/cpuinfo
    cat /sys/devices/system/cpu/cpu0/cache/index2/size

    Another option is to use the programs hwinfo, lshw, lscpu, or dmidecode, which display more detailed information about the system.

  2. Write a computer program to show the memory access performance for data in:

    Let's introduce the following notation:

    Your program should measure the average bandwidth of byte accesses over many rounds of execution of the following three subroutines:

    Subroutines B and C use a memory buffer which is exactly twice as large as the capacity of the analyzed cache (L1 or L2). Such a buffer is guaranteed to be smaller than the capacity of the next level of the memory hierarchy (if we are testing L1, the buffer is smaller than L2).

    Each of these three subroutines has to be invoked many times in a loop. Subroutine A generates L1 cache misses relatively seldom (once in b1 accesses). Subroutine B generates an L1 cache miss on every memory access, but the majority of these accesses should hit in L2. Subroutine C generates an L2 cache miss on every access.

    For each subroutine, your program should determine the average time of a byte access, as well as the achieved bandwidth in MB/s. Based on the obtained data, estimate the ratios of the latencies of neighbouring levels of the memory hierarchy (t(L2)/t(L1), t(RAM)/t(L2)).

    Instructions:

  3. Many programs implement 2D matrices as linear buffers. If the buffer address is given in buf, i and j denote row and column indices, and rows and cols denote matrix dimensions, then the element at (i,j) can be accessed by buf[i*cols+j]. Often the same operation needs to be applied to all matrix elements, as in matrix multiplication. Your task is to find out experimentally whether it is better to loop first over rows and then over columns, or vice versa, for large matrices which cannot fit into the L2 cache. Comment on and discuss the obtained results.

    Instructions

3. Influence of the data type on program performance

The goal of this exercise is to analyze the influence of (i) built-in data types and (ii) elementary operations on program performance.

Discuss the results.

References

[1] Wikipedia: CPUID

[2] Intel Processor Identification and the CPUID Instruction

[3] x86info

[4] System Information Viewer

[5] Sun Fire V880 CINT2000 Result

[6] L1 memory cache on Intel x86 processors

Last change: 14th January 2013.