Rules and Criteria | DPC3

Competition Rules

The competition will proceed as follows. Contestants are responsible for implementing and evaluating their algorithms in the provided framework. The framework itself should not be modified, except for the contestant’s prefetcher files that implement the prefetching algorithms. When submissions are collected, the contestants’ prefetcher C++ files will be compiled and run with the original version of the framework. Contestants will be ranked on the measured performance of their prefetching algorithms relative to a baseline with all prefetching disabled. There are two configurations: 1 core running a single trace, and 4 cores running 4 traces (the trace mix used for evaluation will not be disclosed beforehand). Each contestant will receive two scores, one per configuration, each being the geometric mean of the prefetching algorithm’s speedups across a set of benchmarks. The overall score is the sum of these two scores. For example, a contestant with scores of 1.05 and 1.03 for the two configurations has an overall score of 2.08.
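The scoring arithmetic described above can be sketched as follows. This is only an illustration, not the organizers' evaluation script: the function names geometric_mean and overall_score are hypothetical, and the sketch assumes each speedup has already been computed as a positive ratio against the no-prefetching baseline (how that ratio is measured, especially in the 4-core configuration, is not specified here).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark speedups, accumulated in log space
// so that long benchmark lists do not overflow or underflow.
// Assumption: each speedup is a positive ratio relative to the
// baseline with all prefetching disabled.
double geometric_mean(const std::vector<double>& speedups) {
    assert(!speedups.empty());
    double log_sum = 0.0;
    for (double s : speedups) {
        assert(s > 0.0);
        log_sum += std::log(s);
    }
    return std::exp(log_sum / static_cast<double>(speedups.size()));
}

// Overall score: the sum of the single-core and 4-core geometric means.
double overall_score(const std::vector<double>& single_core_speedups,
                     const std::vector<double>& four_core_speedups) {
    return geometric_mean(single_core_speedups) +
           geometric_mean(four_core_speedups);
}
```

Because the geometric mean is used, one benchmark with a large slowdown pulls the score down more than it would under an arithmetic mean, so consistent gains across workloads tend to score better than a few large wins.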

Your 3 prefetchers (L1, L2, and L3) should not explicitly communicate with each other, and your prefetcher code should not access any information about the system other than the following:

NUM_CPUS -- Number of cores in the simulation

current_core_cycle[cpu] -- Current simulation cycle

MSHR.occupancy and MSHR.SIZE -- Per-cache level MSHR resource availability

PQ.occupancy and PQ.SIZE -- Per-cache level prefetch queue resource availability

get_set(cl_address) and get_way(cl_address, set) -- Set and way information for a cacheline. NOTE: Two warnings about these functions. First, these should not be used to gain oracular knowledge of the contents of the cache. So for example you shouldn’t use these functions to filter out prefetch candidates, and you shouldn’t scan the contents of the cache looking for prefetchable patterns. Instead, you should only use these functions on the addr argument of the prefetcher_operate() functions. Reviewers will be on the lookout for any abuses of these functions. Second, these functions expect a cache line address, and the addr argument in the prefetcher_operate() functions is a byte address, so you will need to call the set and way functions with addr>>LOG2_BLOCK_SIZE.
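The byte-address to cache-line-address conversion mentioned in the second warning can be sketched in isolation. The constant kLog2BlockSize below is a hypothetical stand-in for the framework's LOG2_BLOCK_SIZE and assumes 64-byte cache lines; the helper names to_cl_address and block_offset are also hypothetical and are not part of the framework.

```cpp
#include <cstdint>

// Assumption: 64-byte cache lines, so the log2 of the block size is 6.
// Inside the framework, use its own LOG2_BLOCK_SIZE instead of this constant.
constexpr unsigned kLog2BlockSize = 6;

// Convert a byte address (such as the addr argument of prefetcher_operate())
// into a cache line address by dropping the block-offset bits.
constexpr std::uint64_t to_cl_address(std::uint64_t byte_addr) {
    return byte_addr >> kLog2BlockSize;
}

// Offset of the byte within its cache line (the bits dropped above).
constexpr std::uint64_t block_offset(std::uint64_t byte_addr) {
    return byte_addr & ((std::uint64_t{1} << kLog2BlockSize) - 1);
}
```

Under these assumptions, byte addresses 0x1000 through 0x103F all map to cache line address 0x40. Within a prefetcher, the equivalent call would be get_set(addr >> LOG2_BLOCK_SIZE), applied only to the addr argument as the rules require.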

Your prefetchers should be driven only by the information passed to them as arguments in the prefetching functions, plus the above exceptions. Because the evaluation trace list is public, your prefetchers should not attempt to identify which trace is running and configure themselves specifically for that trace. Doing so may cause your submission to be disqualified. Instead, your prefetchers should be versatile enough to deal with a variety of different workloads.

Your 3 prefetchers have a shared, per-core storage budget of 64 KB. So the 1-core configuration has a total budget of 64 KB, and the 4-core configuration has a total budget of 4 x 64 KB = 256 KB. This budget can be distributed among the L1, L2, and L3 prefetchers however you choose. There is no logical complexity budget, but it would be nice if you could discuss how implementable your prefetcher is in your submitted paper.
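One way to keep track of the per-core budget is to account for every table in bits and check the total at compile time. The sketch below is illustrative only: the TableSpec type and the three table sizes are hypothetical placeholders, and it assumes 1 KB means 1024 bytes.

```cpp
#include <cstddef>

// Per-core budget: 64 KB expressed in bits (assuming 1 KB = 1024 bytes).
constexpr std::size_t kBudgetBitsPerCore = 64 * 1024 * 8;

// Hypothetical description of one prefetcher table.
struct TableSpec {
    std::size_t entries;
    std::size_t bits_per_entry;
};

constexpr std::size_t table_bits(TableSpec t) {
    return t.entries * t.bits_per_entry;
}

// Placeholder sizes for illustration; substitute your own structures.
constexpr TableSpec kL1Table{256, 64};    // 16384 bits
constexpr TableSpec kL2Table{1024, 96};   // 98304 bits
constexpr TableSpec kLLCTable{512, 48};   // 24576 bits

// Storage shared by the L1, L2, and L3 prefetchers of one core.
constexpr std::size_t total_bits() {
    return table_bits(kL1Table) + table_bits(kL2Table) + table_bits(kLLCTable);
}

constexpr bool within_budget(std::size_t bits) {
    return bits <= kBudgetBitsPerCore;
}

// Fails the build if the combined storage exceeds the per-core budget.
static_assert(within_budget(total_bits()),
              "prefetcher storage exceeds the 64 KB per-core budget");
```

Counting in bits rather than bytes avoids rounding each entry up to a whole byte, which matters when entries hold narrow fields such as small counters or partial tags. The same breakdown can be reused as the storage table in the submitted paper.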

We will evaluate your prefetchers using the perceptron branch predictor and the LRU LLC replacement policy included with ChampSim, building the simulator with these commands:

./build_champsim.sh perceptron [your_l1d_prefetcher_here] [your_l2c_prefetcher_here] [your_llc_prefetcher_here] lru 1

and

./build_champsim.sh perceptron [your_l1d_prefetcher_here] [your_l2c_prefetcher_here] [your_llc_prefetcher_here] lru 4

Acceptance Criteria

In the interest of assembling a quality program for workshop attendees and future readers, there will be an overall selection process, of which performance ranking is a key component, but not the sole component. To be considered, submissions must conform to the submission requirements described above.

Submissions will be selected to appear in the workshop on the basis of the performance ranking, novelty, and overall quality of the paper and commented code. Novelty is not a strict requirement. For example, a contestant may submit his/her previously published prefetchers or make incremental enhancements to previously proposed prefetchers. In such cases, performance is a heavily weighted criterion, as is overall quality of the paper (for example, analysis of the new results on the common framework, etc.). Conversely, a very novel submission that is not necessarily a top performer in the competition will be considered not just from a performance standpoint, but also on the basis of insights etc.