image/svg+xml
##
/
##
MASTER SLIDE
Fast and lock-free concurrent priority queue for multi-thread systems
HÃ¥kan Sundell, Philippas Tsigas
Presented by Andreas HaasUniversity of SalzburgConcurrency and Memory Management Seminar2010-12-16
Introduction
Priority queue
Skip list
Non-blocking algorithm
Correctness
No mutual exclusion
Guaranteed progress of at least one operation
Atomic operations:
Test-And-Set (TAS)
Fetch-And-Add (FAA)
Compare-And-Swap (CAS)
Insert
DeleteMin
3
1
4
6
7
5
3
1
4
6
7
5
3
1
4
6
7
5
1
3
4
6
7
5
4
4
3
1
6
7
Applications
Scheduling
Common implementations
Sorted list
Discrete Simulation
Unsorted list
Dijkstra's algorithm
...
Heap
Binary search tree
H
1
2
3
4
5
T
Singly-linked list
With shortcuts
H
1
2
3
4
5
T
2
Find the node with key 2
H
1
2
4
5
6
T
3
3
Choose the level
Find the correct position
Set the next pointers
Add the node to the list
H
1
2
4
5
6
T
3
Find the node
Remove the node
Delete the node
Return the value
3
3
d
p
d
p
d
p
d
p
d
p
next[level-1]
next[1]
next[0]
prev
level
validLevel
key
value
...
No node should be reclaimed and then later re-allocated while some other process is traversingthat node
Reference Counting is used
ABA Problem
Next pointers have to be changed consistently,but not necessarily all at once (atomically)
A marked node is interpreted as deleted
A node is interpreted to be inserted when insertedat the lowest level
Use a deletion mark to mark a node that is about tobe deleted
Helping strategy
Every thread removes all marked nodes it encounters
While traversing the skip list, all nodesare deleted which are flagged with thedeletion mark.
Helping to achieve non-blocking
d
p
d
p
d
d
p
d
p
next[level-1]
next[1]
next[0]
prev
level
validLevel
key
value
...
(1)
(2)
(3)
(4)
(5)
(6)
Create a new node
Search insert position
Change existing node
Insert new node
Insert new node
Delete node
Delete node
Search insert position
function Insert(key:integer, value:pointer to Value):boolean
I2....Choose level randomly according to the skip list.......distributionI3....newNode:=CreateNode(level,key,value);I4....COPY_NODE(newNode);
I5....node1:=COPY_ NODE(head);I6....for i:=maxLevel-1 to 1 step -1 doI7........node2:=ScanKey(&node1, i, key);I8........RELEASE_NODE(node2);I9........if i< level then I10..........savedNodes[i]:= COPY_ NODE(node1);I11..while true doI12......node2:=ScanKey(&node1, 0, key);
I12......<value2,d> := node2.value;I13......if d=false and node2.key=key thenI14..........if CAS(&node2.value, <value2,false>, ...................................................<value, false>) ...............thenI15..............RELEASE_NODE(node1);I16..............RELEASE_NODE(node2);I17..............for i:=1 to level-1 doI18..................RELEASE_NODE(savedNodes[i]);I19..............RELEASE_NODE(newNode);I20..............RELEASE_NODE(newNode);I21..............return true2;I22..........elseI23..............RELEASE_NODE(node2);I24..............continue;
I25......newNode.next[0]:= <node2, false>;I26......RELEASE_NODE(node2);I27......if CAS(&node1.next[0],<node2,false>,................................................<newNode,false>) ...........thenI28..........RELEASE_NODE(node1);I29..........break;I30......Back-OffI31..for i:=1 to level-1 doI32......newNode.validLevel:=i;I33......node1:=savedNodes[i];I34......while true doI35..........node2:=ScanKey(&node1,i,key);I36..........newNode.next[i]:= 〈node2,false〉 ;I37..........RELEASE_NODE(node2);I38..........if newNode.value.d=true or................CAS(&node1.next[i],node2,newNode) thenI39................RELEASE_NODE(node1);I40................break;I41..........Back-OffI42..newNode.validLevel:=level;
I38..........if newNode.value.d=true or................CAS(&node1.next[i],node2,newNode) thenI39..............RELEASE_NODE(node1);I40..............break;I41..........Back-OffI42..newNode.validLevel:=level;I44..if newNode.value.d=true then I45......newNode:= HelpDelete(newNode,0);I46..RELEASE_ NODE(newNode);I47..return true;
(1)
(2)
(3)-(6)
d
p
d
p
d
p
d
p
d
p
next[level-1]
next[1]
next[0]
prev
level
validLevel
key
value
...
(1)
(3)
(2)
(4)
(5)
(6)
(7)
function DeleteMin():pointer to Node
Find the first node
Delete node
D3....prev:=COPY_ NODE(head);D4....while true doD5........node1:=ReadNext(&prev,0);D6........if node1=tail thenD7............RELEASE_ NODE(prev);D8............RELEASE_ NODE(node1);D9............return NULL;........retry:D10......〈value,d〉 :=node1.value;D11......if node1 != prev.next[0].p thenD12..........RELEASE_ NODE(node1);D13..........continue;D14......if d=false thenD15..........if CAS(&node1.value,<value,false>, <value,true>) thenD16..............node1.prev:=prev;D17..............break;D18..........else goto retry;D19......else d=true thenD20..........node1:=HelpDelete(node1,0);D21......RELEASE_NODE(prev);D22......prev:=node1;
Set the deletion mark
D15..........if CAS(&node1.value, <value,false>, <value,true>) thenD16..............node1.prev:=prev;D17..............break;D18..........else goto retry;
Remove node
D23..for i:=0 to node1.level-1 doD24......repeatD25..........〈node2,d〉 :=node1.next[i];D26......until d= true or CAS(&node1.next[i], 〈node2,false〉,......................................................................〈node2,true〉);
Return value
D27..prev:=COPY_ NODE(head);D28..for i:=node1.level-1 to 0 step -1 doD29......RemoveNode(node1,&prev,i);D30..RELEASE_NODE(prev);D31..RELEASE_NODE(node1);D32..RELEASE_ NODE(node1); /* Delete the node */
D33..return value;
(1)
(1)
(2)-(7)
Changes to the skip list are donein a single CAS instruction
Insert: The next pointer pointing to the new node
The data structure is always in a consistent state
Linearizable since changes occur atomically
DeleteMin: The deletion mark of thenode with the minimal key is set
Priority Queue based on a sorted linked list
Use shortcuts (skip list) as optimization
The shortcuts are not part of the consistentstate of the priority queue
Delayed deletion using deletion marks
Helping for non-blocking
Correctness is achieved by changing theconsistent state of all threads by singleCAS operations
Questions?
HÃ¥kan Sundell, Philippas Tsigas: Fast and lock-free concurrent priority queue for multi-thread systems, JPDC 65, 2005