I am going to try and answer this myself... TL;DR: this is a correct implementation, but potentially more expensive than the one with `volatile`?
Though this looks better, it can under-perform in some cases. I am going to test it against the famous IRIW example: independent reads of independent writes.

```
volatile x, y
-----------------------------------------------------
 x = 1   |   y = 1   |   int r1 = x   |   int r3 = y
         |           |   int r2 = y   |   int r4 = x
```
This reads as:

- there are two threads (`ThreadA` and `ThreadB`) that write to `x` and `y` (`x = 1` and `y = 1`)
- there are two more threads (`ThreadC` and `ThreadD`) that read `x` and `y`, but in reverse order.
Because `x` and `y` are `volatile`, a result as below is impossible:

```
r1 = 1 (x)    r3 = 1 (y)
r2 = 0 (y)    r4 = 0 (x)
```
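To make the shape concrete, here is a minimal sketch of IRIW in plain Java (the harness and class name are mine, not from the question; a single run like this proves nothing and only shows the shape — a tool such as jcstress is what you would actually use to hunt for the forbidden outcome):

```java
class IRIW {
    static volatile int x, y;

    public static void main(String[] args) throws InterruptedException {
        int[] r = new int[4];
        Thread a = new Thread(() -> { x = 1; });               // ThreadA: writes x
        Thread b = new Thread(() -> { y = 1; });               // ThreadB: writes y
        Thread c = new Thread(() -> { r[0] = x; r[1] = y; });  // ThreadC: reads x, then y
        Thread d = new Thread(() -> { r[2] = y; r[3] = x; });  // ThreadD: reads y, then x

        for (Thread t : new Thread[]{a, b, c, d}) t.start();
        for (Thread t : new Thread[]{a, b, c, d}) t.join();

        // With volatile, (r1, r2, r3, r4) = (1, 0, 1, 0) can never be observed.
        System.out.printf("r1=%d r2=%d r3=%d r4=%d%n", r[0], r[1], r[2], r[3]);
    }
}
```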
This is what the sequential consistency of `volatile` guarantees. If `ThreadC` observed the write to `x` (it saw that `x = 1`), then `ThreadD` MUST observe the same `x = 1`. This is because in a sequentially consistent execution writes happen as-if in a single global order, or as-if atomically, everywhere; so every single thread must see the same value. This execution is therefore impossible, according to the JLS too:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
Now if we move the same example to release/acquire (`x = 1` and `y = 1` are releases, while the reads are acquires):

```
non-volatile x, y
-----------------------------------------------------
 x = 1   |   y = 1   |   int r1 = x   |   int r3 = y
         |           |   int r2 = y   |   int r4 = x
```
A result like:

```
r1 = 1 (x)    r3 = 1 (y)
r2 = 0 (y)    r4 = 0 (x)
```
is possible and allowed. This breaks sequential consistency, and that is normal, since release/acquire is "weaker". On x86, release/acquire does not impose a `StoreLoad` barrier, so an acquire is allowed to go above (reorder with) a release (unlike `volatile`, which prohibits this). In simpler words, `volatile` accesses themselves are not allowed to be re-ordered, while a chain like:

```
release ... // (STORE)
acquire ... // this acquire (LOAD) can float ABOVE the release
```

is allowed to be "inverted" (reordered), since a `StoreLoad` barrier is not mandatory.
Though this barrier-based explanation is somewhat wrong and beside the point, because the JLS does not explain things in terms of barriers. Unfortunately, the release/acquire modes are not yet documented in the JLS either...
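Written with VarHandles, the release/acquire flavour of the same IRIW example would look roughly like this (my own sketch; the class and method names are made up):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class IriwRelAcq {
    static int x, y; // plain (non-volatile) fields, accessed only via the VarHandles below

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findStaticVarHandle(IriwRelAcq.class, "x", int.class);
            Y = l.findStaticVarHandle(IriwRelAcq.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void writerA() { X.setRelease(1); }   // ThreadA
    static void writerB() { Y.setRelease(1); }   // ThreadB

    // ThreadC: reads x, then y
    static int[] readerC() { return new int[] { (int) X.getAcquire(), (int) Y.getAcquire() }; }

    // ThreadD: reads y, then x
    static int[] readerD() { return new int[] { (int) Y.getAcquire(), (int) X.getAcquire() }; }
}
```

Here the outcome `r1 = 1, r2 = 0, r3 = 1, r4 = 0` is no longer forbidden.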
If I extrapolate this to the `SingletonFactory` example, it means that after a release:

```java
VAR_HANDLE.setRelease(FACTORY, localSingleton);
```

any other thread that does an acquire:

```java
Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);
```

is not guaranteed to read the value from that release (a non-null `Singleton`).
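I have not reproduced the full class here, but the shape I am reasoning about is a double-checked locking along these lines (apart from the two calls quoted above, the class layout and names are my reconstruction):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Singleton {
    // whatever expensive state the singleton holds
}

class SingletonFactory {
    // the instance whose "singleton" field is read/written through VAR_HANDLE
    private static final SingletonFactory FACTORY = new SingletonFactory();

    private Singleton singleton; // plain field, only touched via VAR_HANDLE

    private static final VarHandle VAR_HANDLE;
    static {
        try {
            VAR_HANDLE = MethodHandles.lookup()
                    .findVarHandle(SingletonFactory.class, "singleton", Singleton.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Singleton getInstance() {
        Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);
        if (localSingleton == null) {
            // until the release store becomes visible, every thread that read null
            // ends up competing for this lock
            synchronized (SingletonFactory.class) {
                localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);
                if (localSingleton == null) {
                    localSingleton = new Singleton();
                    VAR_HANDLE.setRelease(FACTORY, localSingleton);
                }
            }
        }
        return localSingleton;
    }
}
```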
Think about it: in case of `volatile`, if one thread has seen the volatile write, every other thread will, for sure, see it too. There is no such guarantee with release/acquire.
As such, with release/acquire every thread might need to enter the synchronized block. And this might happen for many threads, because it is really unknown when the store made by the release will become visible to the acquiring load.
And even though `synchronized` itself does provide happens-before ordering, this code, at least for some time (until the release is observed), is going to perform worse (I assume so): every thread is competing to enter the synchronized block.
So in the end this comes down to: what is more expensive, a `volatile` store or an eventually-seen release? I have no answer to this one.