
Anthony Williams wrote:
"Peter Dimov" <pdimov@gmail.com> writes:
On x86 all loads already have acquire semantics by default, and all stores have release semantics.
On Itanium, sure.

<quote source=Intel Itanium Architecture Software Developer's Manual>
6.3.4 Memory Ordering Interactions

IA-32 instructions are mapped into the Itanium memory ordering model as follows:

- All IA-32 stores have release semantics
- All IA-32 loads have acquire semantics
- All IA-32 read-modify-write or lock instructions have release and acquire semantics (fully fenced).
</quote>
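To make "acquire" and "release" concrete at the program level, here is a minimal sketch that publishes a plain value through a flag. It uses C++11 `std::atomic`, which is an assumption on my part since nothing in this thread names an API, and `run_message_passing` is a hypothetical name chosen for illustration. It only states the ordering requested at the language level; whether a given x86 part needs more than ordinary loads and stores to honor it is exactly what is being argued here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): publishes a payload through a
// flag using release/acquire ordering and returns what the consumer saw.
int run_message_passing() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // store with release semantics
    });
    std::thread consumer([&] {
        // Load with acquire semantics: spin until the flag is published.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // ordered after the acquire load
    });

    producer.join();
    consumer.join();
    // The release/acquire pairing guarantees the consumer saw the payload.
    assert(observed == 42);
    return observed;
}
```

Calling `run_message_passing()` returns 42: the acquire load that reads `true` orders the later read of `payload` after the producer's write.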
Not according to the Intel specs. 25366818.pdf (IA-32 Software Developer's Manual, Volume 3A), section 7.7.2:
The thing is that native x86 doesn't have an officially defined memory model (the Itanium mapping may well be stronger than native x86). Note that what you quote below was written for testers with scopes on the "system bus".
"1. Reads can be carried out speculatively and in any order."
However...

http://www.well.com/~aleks/CompOsPlan9/0005.html

<quote author=an architect at Intel>
The PPro does speculative and out-of-order loads. However, it has a mechanism called the "memory order buffer" to ensure that the above memory ordering model is not violated. Load and store instructions do not get retired until the processor can prove there are no memory ordering violations in the actual order of execution that was used. Stores do not get sent to memory until they are ready to be retired. If the processor detects a memory ordering violation, it discards all unretired operations (including the offending memory operation) and restarts execution at the oldest unretired instruction.
</quote>

Consider also:

Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. Two techniques to enhance the performance of memory consistency models. In Proceedings of the 1991 International Conference on Parallel Processing (Vol. I Architecture), pages 1-355-364, August 1991.

<quote>
The speculative-load buffer provides the detection mechanism by signaling when the speculated result is incorrect. The buffer works as follows. Loads that are retired from the reservation station are put into the buffer in addition to being issued to the memory system. There are four fields per entry (as shown in Figure 4): load address, acq, done, and store tag. The load address field holds the physical address for the load. The acq field is set if the load is considered an acquire access. For SC, all loads are treated as acquires. The done field is set when the load is performed. If the consistency constraints require the load to be delayed for a previous store, the store tag uniquely identifies that store. A null store tag specifies that the load depends on no previous stores. When a store completes, its corresponding tag in the speculative-load buffer is nullified if present. Entries are retired in a FIFO manner.
Two conditions need to be satisfied before an entry at the head of the buffer is retired. First, the store tag field should equal null. Second, the done field should be set if the acq field is set. Therefore, for SC, an entry remains in the buffer until all previous load and store accesses complete and the load access it refers to completes. Appendix A describes how an atomic read-modify-write can be incorporated in the above implementation.

We now describe the detection mechanism. The following coherence transactions are monitored by the speculative-load buffer: invalidations (or ownership requests), updates, and replacements. The load addresses in the buffer are associatively checked for a match with the address of such transactions. Multiple matches are possible. We assume the match closest to the head of the buffer is reported. A match in the buffer for an address that is being invalidated or updated signals the possibility of an incorrect speculation. A match for an address that is being replaced signifies that future coherence transactions for that address will not be sent to the processor. In either case, the speculated value for the load is assumed to be incorrect.

Guaranteeing the constraints for release consistency can be done in a similar way to SC. The conventional way to provide RC is to delay a release access until its previous accesses complete and to delay accesses following an acquire until the acquire completes. Let us first consider delays for stores. The mechanism that provides precise interrupts by holding back store accesses in the store buffer is sufficient for guaranteeing that stores are delayed for the previous acquire. Although the mechanism described is stricter than what RC requires, the conservative implementation is required for providing precise interrupts. The same mechanism also guarantees that a release (which is simply a special store access) is delayed for previous load accesses.
To guarantee a release is also delayed for previous store accesses, the store buffer delays the issue of the release operation until all previously issued stores are complete. In contrast to SC, however, ordinary stores are issued in a pipelined manner.
</quote>

and, also somewhat related:

http://www.cs.wisc.edu/~cain/pubs/micro01_correct_vp.pdf

regards,
alexander.
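P.S. The speculative-load buffer from the Gharachorloo et al. excerpt can be modelled in a few lines, which may make the retirement and detection rules easier to follow. This is only a software sketch under my own assumptions, not the hardware design from the paper: the names `SpecLoadEntry`, `SpeculativeLoadBuffer` and `kNullTag` are hypothetical, store tags are plain integers, and what happens after a match is reported (squash and re-execute) is left out because the quoted text does not describe it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

const int kNullTag = -1;  // assumption: -1 stands in for the "null" store tag

// One entry with the four fields named in the quoted passage.
struct SpecLoadEntry {
    std::uint64_t load_address;  // physical address of the load
    bool acq;                    // load is treated as an acquire
    bool done;                   // load has been performed
    int store_tag;               // earlier store the load must wait for, or kNullTag
};

class SpeculativeLoadBuffer {
public:
    void push(const SpecLoadEntry& e) { entries_.push_back(e); }

    // A store completed: nullify its tag wherever it is present.
    void store_completed(int tag) {
        for (std::size_t i = 0; i < entries_.size(); ++i)
            if (entries_[i].store_tag == tag) entries_[i].store_tag = kNullTag;
    }

    // The load at position `index` (0 = head) has been performed.
    void load_performed(std::size_t index) {
        assert(index < entries_.size());
        entries_[index].done = true;
    }

    // Retire entries from the head in FIFO order. The head may leave only if
    // its store tag is null and, when acq is set, done is set as well.
    std::size_t retire() {
        std::size_t retired = 0;
        while (!entries_.empty()) {
            const SpecLoadEntry& head = entries_.front();
            bool may_retire =
                head.store_tag == kNullTag && (!head.acq || head.done);
            if (!may_retire) break;
            entries_.pop_front();
            ++retired;
        }
        return retired;
    }

    // Coherence transaction (invalidation, update or replacement) for
    // `address`: report the matching entry closest to the head, or -1.
    int snoop(std::uint64_t address) const {
        for (std::size_t i = 0; i < entries_.size(); ++i)
            if (entries_[i].load_address == address) return static_cast<int>(i);
        return -1;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::deque<SpecLoadEntry> entries_;
};
```

For SC every pushed load would have `acq` set; for RC only acquire loads would, so an ordinary load can leave the buffer as soon as its store tag is null. A non-negative result from `snoop` corresponds to "the speculated value for the load is assumed to be incorrect".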