PUMA/RiSE Workshop 2014
image/svg+xml
PUMA/RiSE Workshop 2014
2014-10-03
Andreas Haas
##
/
##
MASTER SLIDE
PUMA/RiSE Workshop, October 2014
Fast Linearizable Timestamped Stack
Mike Dodds
University of York
Andreas Haas
University of Salzburg
Christoph M. Kirsch
University of Salzburg
0
2000
4000
6000
8000
10000
12000
2
8
16
24
32
40
48
56
64
operations per ms (more is better)
number of threads
gnuplot_plot_1
Treiber Stack
gnuplot_plot_2
EB Stack
gnuplot_plot_3
TS Stack
a
b
top
c
push
a
b
top
c
a
b
top
b
pop
a
top
top
pop
top
EMPTY
not discussed
a
b
top
d
push
a
b
top
c
c
push
||
d
a
b
top
d
c
c
push
d
push
d
push
c
push
or
a
b
top
d
push
c
push
a
b
top
c
d
pr
concurrent operationsallow multiple potentiallinearization orders
linearizationorder
a
b
top
d
push
c
push
||
a
b
top
c
c
d
the actual linearization order can be chosen lazily
insight
a
top
d
push
||
a
top
c
c
d
b
b
push
c
push
pr
less contentionon each topelement
the element removed first decidesthe linearization order
return :
d
return :
c
return :
b
a
b
top
c
a
top
b
d
a
d
top
c
the element order dependson the push-pop order
pop
a
top
c
c
d
b
pop
a
top
b
d
pop
c
push
||
a
top
c
c
d
b
d
c
b
||
b
push
c
push
pr
d
push
pop → ?
pr
A concurrent stack implementation is linearizable with respect to stack semantics if it always holds that if two elements a and b are ordered on the stack such that a is older than b, and a gets popped, then b gets popped before a.
precedence +transitivepush-pop order
Proven in Isabelle/HOL (~5000 lines).
The push-pop order has to be retrieved from the algorithm.
Theorem
Order elements in the stack sufficiently.
Preserve precedence.
Incorporate the push-pop order.
Remove elements in the right order.
pop: search for the youngest elementand remove it
a
b
c
TS buffer
[1,2]
[3,4]
[6,8]
timestampsapproximateexecution times
a
b
top
c
c
d
d
[7,8]
hardware timestampingis the fastest, software timestamping is possible
1) add element to the TS buffer
2) generate a timestamp
3) assign the timestamp to the element
pseudo code: push
element isvisible before itis timestamped
a push is ordered before a pop if its element is visible when the pop starts its search
definition: push-pop order
only concurrently generated timestamps may overlap
d
||
b
push
c
push
pr
d
push
pop → ?
pr
a
d
top
c
a
top
b
d
pop
c
push
||
b
a
b
TS buffer
[1,2]
[3,5]
d
|[4,5]
a
c
TS buffer
[1,2]
[7,8]
d
[4,5]
pop
c
push
||
b
is already timestamped,not even added
c
1) add element to the buffer
2) generate a timestamp
3) assign the timestamp to the element
pseudo code: push
find the youngest elementand remove it
a
b
c
TS buffer
[1,3]
[2,4]
[5,6]
element may have been removed while searching
d
[7,8]
elements mayhave been inserted while searching
next youngest element can be removed safely
can be ignored,can be eliminated
time
c
push
a
push
b
push
trade-off insert performance with remove performance
a
b
top
c
a
c
top
b
contentionon the topelement
time
c
push
a
push
b
push
artificial extensionscan create more"concurrent" pushes
reduced contention
data-structure
producer 1
producer 2
producer n
consumer 1
consumer 2
consumer n
1:a
1:b
1:z
2:a
2:z
n:a
n:b
n:z
...
...
...
?:?
?:?
?:?
?:?
?:?
?:?
?:?
?:?
...
...
...
artificialdelay to controlcontention
1 millionelements
1 millionelements
...
...
0
2000
4000
6000
8000
10000
12000
2
8
16
24
32
40
48
56
64
operations per ms (more is better)
number of threads
gnuplot_plot_1
Treiber Stack
gnuplot_plot_2
EB Stack (delay=21μs)
gnuplot_plot_3
TS Stack (hardware TS, delay=4.5μs)
gnuplot_plot_4
TS Stack (software TS)
0
2000
4000
6000
8000
10000
12000
2
8
16
24
32
40
48
56
64
operations per ms (more is better)
number of threads
gnuplot_plot_1
Treiber Stack
gnuplot_plot_2
EB Stack (delay=18μs)
gnuplot_plot_3
TS Stack (hardware TS, delay=4.5μs)
gnuplot_plot_4
TS Stack (software TS)
0
2000
4000
6000
8000
10000
12000
0
3000
6000
9000
12000
15000
18000
21000
0
0.5
1
1.5
2
2.5
3
operations per ms (more is better)
number of retries (less is better)
delay in ns
gnuplot_plot_1
Performance TS Stack (hardware)
gnuplot_plot_2
Retries TS Stack (hardware)
Thank You
For more information about data-structure implementations see http://scal.cs.uni-salzburg.at/
a
push
pr
b
push
pr
c
push
pr
b
pop→
lin
a
pop→
c
pop→
pr
||
pr
lin
a
push
b
push
a
pop→
b
pop→
b
push
c
push
b
pop→
c
pop→
b
push
c
push
c
pop→
a
push
c
push
a
pop→
c
pop→
locally correct,globally incorrect
a
b
top
pop
c
push
||
a
b
top
c
no state change
elimination is always correct