
You will improve a program that does matrix multiplication so that it works better with a cache. You will be working in the pa5/matMul/ and pa5/cacheBlocking/ directories for this part. The pa5/matMul/ directory contains a complete matrix multiplication program, pa5/matMul/matMul.c, its testing script pa5/matMul/autograder.py, test cases in pa5/matMul/tests/, and expected answers in pa5/matMul/answers/. The pa5/cacheBlocking/ directory is where you will write your optimized version of matrix multiplication, in pa5/cacheBlocking/cacheBlocking.c.
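
Cache blocking (tiling) reorders the loops so that small tiles of A, B, and C are reused while they are still in the cache, instead of streaming entire rows and columns through it. The block below is a minimal sketch, not the required solution. It assumes row-major arrays of double, A of size m x p and B of size p x q; the function name blocked_matmul and the tile-size parameter bs are hypothetical, and matMul.c may use a different element type or layout. Inside each tile the loop order is i-k-j, so the innermost loop walks rows of B and C contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: C (m x q) = A (m x p) * B (p x q), all row-major doubles.
   bs is the tile edge length; edge tiles are clipped when bs does not divide a dimension. */
void blocked_matmul(size_t m, size_t p, size_t q,
                    const double *a, const double *b, double *c, size_t bs)
{
    assert(bs > 0);

    /* Tiles accumulate partial sums into C, so start from zero. */
    for (size_t i = 0; i < m * q; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < m; ii += bs)
        for (size_t kk = 0; kk < p; kk += bs)
            for (size_t jj = 0; jj < q; jj += bs)
                /* Multiply one tile of A by one tile of B into one tile of C. */
                for (size_t i = ii; i < ii + bs && i < m; i++)
                    for (size_t k = kk; k < kk + bs && k < p; k++) {
                        double aik = a[i * p + k]; /* reused across the whole j loop */
                        for (size_t j = jj; j < jj + bs && j < q; j++)
                            c[i * q + j] += aik * b[k * q + j];
                    }
}
```

The 256-byte cache used later in this part holds only 32 eight-byte doubles, so tiles must be very small; trying a few values of bs and comparing the simulated miss counts is more reliable than picking one by reasoning alone.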

Correctness

First, your matrix multiplication program in cacheBlocking.c must compute matrix products correctly. You can use the testing harness in pa5/matMul/ to check this. The pa5/cacheBlocking/autograder.py script also tests for correct matrix multiplication.
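
Besides the provided harness, one quick way to catch indexing mistakes while restructuring loops is to compare your output against a plain triple-loop product on the same inputs. The sketch below assumes row-major double arrays; naive_matmul and matrices_match are hypothetical helper names, not part of the provided code. The tolerance parameter allows for a version that reorders the additions and therefore rounds slightly differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: C (m x q) = A (m x p) * B (p x q), row-major, i-j-k order. */
void naive_matmul(size_t m, size_t p, size_t q,
                  const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < q; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < p; k++)
                sum += a[i * p + k] * b[k * q + j];
            c[i * q + j] = sum;
        }
}

/* Returns 1 if all n elements of x and y differ by at most tol, else 0. */
int matrices_match(size_t n, const double *x, const double *y, double tol)
{
    assert(tol >= 0.0);
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - y[i];
        if (d > tol || d < -tol)
            return 0;
    }
    return 1;
}
```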

Generating memory traces

Second, you can use valgrind to generate memory access traces by running this command from the pa5/cacheBlocking/ directory:

valgrind --tool=lackey --trace-mem=yes ./cacheBlocking ../matMul/tests/matrix_a_2x2.txt ../matMul/tests/matrix_b_2x2.txt

In practice, you should simply use the pa5/cacheBlocking/autograder.py script, which calls valgrind as above to generate the memory traces.
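
Lackey's --trace-mem=yes output has one access per line: "I" lines are instruction fetches, and data accesses are indented by one space as " L addr,size" (load), " S addr,size" (store), or " M addr,size" (modify, a load followed by a store), with the address in hexadecimal. The sketch below parses a single line into a data access, skipping instruction fetches and valgrind's "==pid==" banner lines. The names parse_lackey_line and struct access are hypothetical; this is only for inspecting traces yourself, since the autograder already handles them.

```c
#include <assert.h>
#include <stdio.h>

/* One data access from a lackey trace line. */
struct access {
    char op;                 /* 'L' load, 'S' store, 'M' modify (load then store) */
    unsigned long long addr; /* address, printed in hex by lackey */
    unsigned size;           /* access size in bytes */
};

/* Returns 1 and fills *out for L/S/M lines; returns 0 for anything else. */
int parse_lackey_line(const char *line, struct access *out)
{
    char op;
    unsigned long long addr;
    unsigned size;

    assert(line != NULL && out != NULL);
    /* The leading space in the format skips the indentation of data lines. */
    if (sscanf(line, " %c %llx,%u", &op, &addr, &size) != 3)
        return 0; /* e.g. "==123== ..." banner lines */
    if (op != 'L' && op != 'S' && op != 'M')
        return 0; /* 'I' instruction fetches are not data accesses */
    out->op = op;
    out->addr = addr;
    out->size = size;
    return 1;
}
```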

The pa5/cacheBlocking/tests/ directory contains the memory access traces for the baseline pa5/matMul/matMul program that you are competing against.

Simulating memory accesses on a cache simulator

Third, you can use the reference simulator pa5/csim-ref to simulate the memory traces. For this part of the assignment, we assume a 256-byte, 4-way set-associative LRU cache with 16-byte blocks (this design should sound familiar). That is 256 / (4 * 16) = 4 sets of 4 lines each.

The pa5/cacheBlocking/answers/ directory contains the summary statistics for the baseline pa5/matMul/matMul program that you are competing against. Optimize your cacheBlocking.c program so that it performs better than the baseline on the cache described above. For full credit, your program must have both fewer misses and fewer evictions than the baseline.
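
The full-credit condition is a strict comparison on two counters at once: lowering misses while evictions stay equal to the baseline is not enough. The sketch below encodes that check. It assumes the summary is a line of the form "hits:H misses:M evictions:E", as printed by the CS:APP Cache Lab reference simulator; the actual format of the files in pa5/cacheBlocking/answers/ may differ, and beats_baseline and parse_summary are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Parses an assumed "hits:H misses:M evictions:E" line. Returns 1 on success. */
static int parse_summary(const char *s, long *hits, long *misses, long *evictions)
{
    assert(s != NULL);
    return sscanf(s, "hits:%ld misses:%ld evictions:%ld", hits, misses, evictions) == 3;
}

/* Returns 1 if mine has strictly fewer misses AND strictly fewer evictions than
   baseline, 0 otherwise, or -1 if either line cannot be parsed. */
int beats_baseline(const char *mine, const char *baseline)
{
    long mh, mm, me, bh, bm, be;
    if (!parse_summary(mine, &mh, &mm, &me) || !parse_summary(baseline, &bh, &bm, &be))
        return -1;
    return mm < bm && me < be;
}
```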