# ACA RGPV Advanced Computer Architecture: Unit 4, Part 1
## Cache Coherence and Synchronization

### The Cache Coherence Problem
An important problem that must be addressed in many parallel systems – any system that allows multiple processors to access (potentially) multiple copies of data – is cache coherence. The existence of multiple cached copies of data creates the possibility of inconsistency between a cached copy and the shared memory or between cached copies themselves.
DMA I/O – this inconsistency problem also occurs during I/O operations that bypass the cache (e.g., DMA transfers that read or write memory directly). This problem is present even in a uniprocessor and can be removed by OS-directed cache flushes.
• In practice, these issues are managed by a memory bus, which by its very nature ensures write serialization and also allows us to broadcast invalidation signals (we essentially just put the memory address to be invalidated on the bus). We can add an extra valid bit to each cache tag to mark a block invalid. Typically we would use a write-back cache, because it has much lower memory bandwidth requirements. Each processor must keep track of which cache blocks are dirty – that is, which blocks it has written to – again by adding a bit to the cache tag. If a processor sees a memory access for a word in a cache block it has marked dirty, it intervenes and provides the (updated) value. There are numerous other issues to address when considering cache coherence.
One approach to maintaining coherence is to recognize that not every location needs to be shared (and in fact most don’t), and simply reserve some space for non-cacheable data such as semaphores, called a coherency domain.
Using a fixed area of memory, however, is very restrictive. Restrictions can be reduced by allowing the MMU to tag segments or pages as non-cacheable. However, that requires the OS, compiler, and programmer to be involved in specifying data that is to be coherently shared. For example, it would be necessary to distinguish between the sharing of semaphores and simple data so that the data can be cached once a processor owns its semaphore, but the semaphore itself should never be cached.
In order to remove this data inconsistency, there are a number of approaches based on hardware and software techniques; a few are given below:
• Use no caches at all – this is not a feasible solution.
• Make shared data non-cacheable – the simplest software solution, but it gives low performance if a lot of data is shared.
• Software flush at strategic times – e.g., after critical sections. This is a relatively simple technique, but it gives low performance if synchronization is frequent.
• Hardware cache coherence – keep memory and the caches coherent (consistent) with each other, so that memory and the other processors see writes without any intervention by software.
• Absolute coherence – all copies of each block have the same data at all times. This is stronger than necessary: what is required is only the appearance of absolute coherence, so temporary incoherence (as in a write-back cache) is acceptable.
• In general, a cache coherence protocol consists of the set of possible states in the local caches, the state in shared memory, and the state transitions caused by the messages transported through the interconnection network to keep memory coherent. There are basically two kinds of protocols, depending on how writes are handled.
### Snooping Cache Protocol (for bus-based machines)
With a bus interconnection, cache coherence is usually maintained by adopting a “snoopy protocol”, where each cache controller “snoops” on the transactions of the other caches and guarantees the validity of the cached data. In a (single- or multi-stage) network, however, the unavailability of a system “bus” on which transactions are broadcast makes snoopy protocols unusable; directory-based schemes are used in that case. In a snooping protocol, processors perform some form of snooping – that is, keeping track of other processors’ memory writes – which works because all caches and memories see every transaction on the bus.
In both cases, the coherence protocol adds no overhead to ordinary cache hits on private (unshared) data.
When a multistage network is used to build a large multiprocessor system, the snoopy cache protocols must be modified: since broadcasting is very expensive in a multistage network, invalidations are instead sent only to those caches recorded as holding a copy of the block, which is the approach taken by directory-based schemes.