1.ConcurrentLinkedQueue

1.1 Official documentation

An unbounded thread-safe queue based on linked nodes. This 
queue orders elements FIFO (first-in-first-out). The head of the 
queue is that element that has been on the queue the longest time. 
The tail of the queue is that element that has been on the queue 
the shortest time. New elements are inserted at the tail of the 
queue, and the queue retrieval operations obtain elements at the 
head of the queue. A ConcurrentLinkedQueue is an appropriate 
choice when many threads will share access to a common 
collection. Like most other concurrent collection implementations, 
this class does not permit the use of null elements.

This implementation employs an efficient non-blocking algorithm 
based on one described in Simple, Fast, and Practical Non-
Blocking and Blocking Concurrent Queue Algorithms by Maged M. 
Michael and Michael L. Scott.

Iterators are weakly consistent, returning elements reflecting the 
state of the queue at some point at or since the creation of the 
iterator. They do not throw ConcurrentModificationException, and 
may proceed concurrently with other operations. Elements 
contained in the queue since the creation of the iterator will be 
returned exactly once.

Beware that, unlike in most collections, the size method is NOT a 
constant-time operation. Because of the asynchronous nature of 
these queues, determining the current number of elements 
requires a traversal of the elements, and so may report inaccurate 
results if this collection is modified during traversal. Additionally, the 
bulk operations addAll, removeAll, retainAll, containsAll, equals, 
and toArray are not guaranteed to be performed atomically. For 
example, an iterator operating concurrently with an addAll 
operation might view only some of the added elements.

This class and its iterator implement all of the optional methods of 
the Queue and Iterator interfaces.

Memory consistency effects: As with other concurrent collections, 
actions in a thread prior to placing an object into a 
ConcurrentLinkedQueue happen-before actions subsequent to the 
access or removal of that element from the 
ConcurrentLinkedQueue in another thread.

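An unbounded, thread-safe queue based on linked nodes, ordered FIFO. The head of the queue is the element that has been in the queue the longest, and the tail is the element that has been in the queue the shortest time. New elements are inserted at the tail. Null elements are not permitted.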
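As a minimal usage sketch of the behaviour described above (the class name FifoDemo and its helper methods are made up for illustration), the following shows FIFO ordering and the rejection of null elements:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class FifoDemo {
    // Offers three elements and polls them back, returning the order observed.
    static String drain() {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("a");   // inserted at the tail
        queue.offer("b");
        queue.offer("c");
        StringBuilder order = new StringBuilder();
        String s;
        while ((s = queue.poll()) != null) {   // poll() returns null once the queue is empty
            order.append(s);
        }
        return order.toString();
    }

    // Returns true if offering null is rejected with a NullPointerException.
    static boolean rejectsNull() {
        try {
            new ConcurrentLinkedQueue<String>().offer(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(drain());        // abc
        System.out.println(rejectsNull());  // true
    }
}
```

Because poll() uses null to signal an empty queue, a null element would be indistinguishable from "nothing there", which is one practical reason for the restriction.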

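It uses an efficient non-blocking algorithm based on the one described in "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms" by Maged M. Michael and Michael L. Scott.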

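Iterators are weakly consistent. They never throw ConcurrentModificationException and may proceed concurrently with other operations. Elements that have been in the queue since the iterator was created are returned exactly once.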
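A small sketch of what weak consistency means in practice (WeakIteratorDemo is a hypothetical name): the queue is modified in the middle of an iteration, which would make a fail-fast iterator such as the one of java.util.LinkedList throw ConcurrentModificationException. Whether the element added during the iteration is also visited is not guaranteed, so the sketch makes no assumption about it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class WeakIteratorDemo {
    // Iterates over a queue while modifying it and returns the elements visited.
    static List<String> iterateWhileModifying() {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("a");
        queue.offer("b");
        queue.offer("c");

        List<String> seen = new ArrayList<>();
        Iterator<String> it = queue.iterator();
        while (it.hasNext()) {
            String s = it.next();
            seen.add(s);
            if (s.equals("a")) {
                queue.offer("d");   // modification during iteration: no exception is thrown
            }
        }
        return seen;   // contains a, b, c exactly once; "d" may or may not appear
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileModifying());
    }
}
```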

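Note that, unlike in most collections, size() is not a constant-time operation: it traverses the queue, and the result may be inaccurate if the queue is modified during the traversal. In addition, the bulk operations addAll, removeAll, retainAll, containsAll, equals, and toArray are not guaranteed to execute atomically.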
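One common way to avoid the O(N) size() is to keep a separate counter next to the queue. The wrapper below is a hypothetical sketch, not part of the JDK (CountedQueue and approximateSize are made-up names). The counter and the queue are updated in two separate steps, so under concurrency the count is only approximate, just like size() itself.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class CountedQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger();

    public void offer(E e) {
        queue.offer(e);            // throws NullPointerException for null, before the count changes
        count.incrementAndGet();
    }

    public E poll() {
        E e = queue.poll();
        if (e != null) {
            count.decrementAndGet();
        }
        return e;
    }

    // O(1), but may briefly disagree with the queue contents under concurrent use.
    public int approximateSize() {
        return count.get();
    }

    // isEmpty() on the underlying queue only looks for the first live node.
    public boolean isEmpty() {
        return queue.isEmpty();
    }
}
```

When the only question is "is there anything in the queue?", calling isEmpty() is preferable to comparing size() with zero.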

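Memory consistency effects: as with other concurrent collections, actions in a thread prior to placing an element into the queue happen-before actions subsequent to the access or removal of that element from the queue in another thread.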
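The happens-before guarantee can be illustrated with a sketch in which a producer fills in a plain (neither final nor volatile) field before offering the object. SafePublicationDemo, Message, and payload are names invented for this example.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class SafePublicationDemo {
    static final class Message {
        int payload;   // deliberately neither final nor volatile
    }

    // Hands one message from the calling thread to a consumer thread and
    // returns the payload value the consumer observed.
    static int transfer() {
        ConcurrentLinkedQueue<Message> queue = new ConcurrentLinkedQueue<>();
        int[] received = new int[1];

        Thread consumer = new Thread(() -> {
            Message m;
            while ((m = queue.poll()) == null) {
                Thread.yield();          // spin until the producer has enqueued
            }
            received[0] = m.payload;     // guaranteed to see the write made before offer()
        });
        consumer.start();

        Message m = new Message();
        m.payload = 42;                  // plain write before the offer
        queue.offer(m);

        try {
            consumer.join();             // join() makes received[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(transfer());  // 42
    }
}
```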

 This is a modification of the Michael & Scott algorithm,
 adapted for a garbage-collected environment, with support for
 interior node deletion (to support remove(Object)).  For
 explanation, read the paper.
 
 Note that like most non-blocking algorithms in this package,
 this implementation relies on the fact that in garbage
 collected systems, there is no possibility of ABA problems due
 to recycled nodes, so there is no need to use "counted
 pointers" or related techniques seen in versions used in
 non-GC'ed settings.

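This is a modification of the Michael & Scott algorithm, adapted for a garbage-collected environment, with support for interior node deletion (needed for remove(Object)).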

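Note that, like most non-blocking algorithms in this package, the implementation relies on the fact that in a garbage-collected system recycled nodes cannot cause ABA problems, so there is no need for "counted pointers" or the related techniques used in non-GC settings.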
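To make the ABA problem and the "counted pointer" remedy concrete, here is a sketch that simulates it with plain values in a single thread. It uses the JDK's AtomicStampedReference as a stand-in for a counted pointer; ConcurrentLinkedQueue itself does not use it, and the class name AbaDemo is made up. In a non-GC setting the A -> B -> A sequence would come from a freed node being reused at the same address.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    // A plain CAS cannot tell that the value went A -> B -> A in between.
    static boolean plainCasAfterAba() {
        String a = "A", b = "B";
        AtomicReference<String> ref = new AtomicReference<>(a);
        String observed = ref.get();                 // "thread 1" reads A
        ref.compareAndSet(a, b);                     // "thread 2": A -> B
        ref.compareAndSet(b, a);                     // "thread 2": B -> A
        return ref.compareAndSet(observed, "C");     // succeeds despite the intermediate changes
    }

    // A stamp (version counter) changes on every update, so the stale CAS fails.
    static boolean stampedCasAfterAba() {
        String a = "A", b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);
        int[] stamp = new int[1];
        String observed = ref.get(stamp);            // "thread 1" reads A with stamp 0
        ref.compareAndSet(a, b, 0, 1);               // "thread 2": A -> B, stamp 1
        ref.compareAndSet(b, a, 1, 2);               // "thread 2": B -> A, stamp 2
        return ref.compareAndSet(observed, "C", stamp[0], stamp[0] + 1);   // fails: stamp is now 2
    }

    public static void main(String[] args) {
        System.out.println(plainCasAfterAba());      // true
        System.out.println(stampedCasAfterAba());    // false
    }
}
```

With a garbage collector, a node that some thread still references cannot be reclaimed and handed out again, so a reference comparison on nodes is enough.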

 The fundamental invariants are:
 - There is exactly one (last) Node with a null next reference,
   which is CASed when enqueueing.  This last Node can be
   reached in O(1) time from tail, but tail is merely an
   optimization - it can always be reached in O(N) time from
   head as well.
 - The elements contained in the queue are the non-null items in
   Nodes that are reachable from head.  CASing the item
   reference of a Node to null atomically removes it from the
   queue.  Reachability of all elements from head must remain
   true even in the case of concurrent modifications that cause
   head to advance.  A dequeued Node may remain in use
   indefinitely due to creation of an Iterator or simply a
   poll() that has lost its time slice.

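Fundamental invariants:

1) There is exactly one last node, whose next reference is null; enqueueing CASes that reference. The last node can be reached from tail in O(1) time, but tail is only an optimization: the last node can always be reached from head in O(N) time as well.
2) The elements of the queue are the non-null items in the nodes reachable from head. CASing a node's item to null atomically removes it from the queue. All elements must remain reachable from head even under concurrent modifications that advance head.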

 The above might appear to imply that all Nodes are GC-reachable
 from a predecessor dequeued Node.  That would cause two problems:
 - allow a rogue Iterator to cause unbounded memory retention
 - cause cross-generational linking of old Nodes to new Nodes if
   a Node was tenured while live, which generational GCs have a
   hard time dealing with, causing repeated major collections.
 However, only non-deleted Nodes need to be reachable from
 dequeued Nodes, and reachability does not necessarily have to
 be of the kind understood by the GC.  We use the trick of
 linking a Node that has just been dequeued to itself.  Such a
 self-link implicitly means to advance to head.

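The invariants above might seem to imply that all nodes are GC-reachable from a previously dequeued node. That would cause two problems:

1) A rogue iterator could cause unbounded memory retention.
2) Old nodes would link to new nodes across generations, which generational GCs handle poorly and which leads to repeated major collections.

However, only non-deleted nodes need to be reachable from dequeued nodes, and the reachability does not have to be of the kind the GC understands. The trick is to link a node that has just been dequeued to itself; such a self-link implicitly means "advance to head".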
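The self-link convention can be sketched with a simplified, single-threaded model. The succ() helper below follows the same rule as the method of that name referred to in the JDK comments (a self-linked node means "continue from head"); the surrounding class SelfLinkDemo and its fields are assumptions made for the example.

```java
public class SelfLinkDemo {
    static final class Node {
        final String item;
        volatile Node next;
        Node(String item) { this.item = item; }
    }

    static volatile Node head;

    // Successor of p; a self-linked node has left the list, so continue from head.
    static Node succ(Node p) {
        Node next = p.next;
        return (p == next) ? head : next;
    }

    // Dequeues the first node while a "stale" reference to it is still held.
    static String demo() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        head = a;

        Node stale = a;   // e.g. an iterator that still holds the old head

        head = b;         // dequeue a: advance head ...
        a.next = a;       // ... then self-link the dequeued node

        // The dequeued node no longer references b directly, so it cannot keep
        // later nodes alive, yet a traversal still finds its way back.
        return succ(stale).item;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // b
    }
}
```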

 Both head and tail are permitted to lag.  In fact, failing to
 update them every time one could is a significant optimization
 (fewer CASes). As with LinkedTransferQueue (see the internal
 documentation for that class), we use a slack threshold of two;
 that is, we update head/tail when the current pointer appears
 to be two or more steps away from the first/last node.
 
 Since head and tail are updated concurrently and independently,
 it is possible for tail to lag behind head (why not)?

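Both head and tail are allowed to lag. Not updating them on every operation is a significant optimization (fewer CASes). As with LinkedTransferQueue, a slack threshold of two is used: head/tail is updated when the current pointer appears to be two or more steps away from the first/last node.

Because head and tail are updated concurrently and independently, tail can lag behind head.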

 CASing a Node's item reference to null atomically removes the
 element from the queue.  Iterators skip over Nodes with null
 items.  Prior implementations of this class had a race between
 poll() and remove(Object) where the same element would appear
 to be successfully removed by two concurrent operations.  The
 method remove(Object) also lazily unlinks deleted Nodes, but
 this is merely an optimization.

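CASing a node's item to null atomically removes the element from the queue, and iterators skip nodes with null items. remove(Object) also lazily unlinks deleted nodes, but this is only an optimization.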

 When constructing a Node (before enqueuing it) we avoid paying
 for a volatile write to item by using Unsafe.putObject instead
 of a normal write.  This allows the cost of enqueue to be
 "one-and-a-half" CASes.
 
 Both head and tail may or may not point to a Node with a
 non-null item.  If the queue is empty, all items must of course
 be null.  Upon creation, both head and tail refer to a dummy
 Node with null item.  Both head and tail are only updated using
 CAS, so they never regress, although again this is merely an
 optimization.

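When a node is constructed (before it is enqueued), item is written with Unsafe.putObject, which avoids paying for a volatile write. This brings the cost of an enqueue down to "one-and-a-half" CASes.

head and tail may or may not point to a node with a non-null item. If the queue is empty, all items are null. On creation, head and tail both refer to a dummy node whose item is null. head and tail are only updated with CAS, so they never regress.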

1.2 Node

    private static class Node<E> {
        volatile E item;
        volatile Node<E> next;
        // constructor and CAS helper methods omitted
    }

    /**
     * A node from which the first live (non-deleted) node (if any)
     * can be reached in O(1) time.
     * Invariants:
     * - all live nodes are reachable from head via succ()
     * - head != null
     * - (tmp = head).next != tmp || tmp != head
     * Non-invariants:
     * - head.item may or may not be null.
     * - it is permitted for tail to lag behind head, that is, for tail
     *   to not be reachable from head!
     */
    private transient volatile Node<E> head;

    /**
     * A node from which the last node on list (that is, the unique
     * node with node.next == null) can be reached in O(1) time.
     * Invariants:
     * - the last node is always reachable from tail via succ()
     * - tail != null
     * Non-invariants:
     * - tail.item may or may not be null.
     * - it is permitted for tail to lag behind head, that is, for tail
     *   to not be reachable from head!
     * - tail.next may or may not be self-pointing to tail.
     */
    private transient volatile Node<E> tail;

    /**
     * Creates a {@code ConcurrentLinkedQueue} that is initially empty.
     */
    public ConcurrentLinkedQueue() {
        head = tail = new Node<E>(null);
    }

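The comments state the invariants and non-invariants for head and tail.

Invariants for head:
1) All live nodes are reachable from head via succ().
2) head != null.
3) head.next never points to head itself.
Non-invariants for head:
1) head.item may be null.
2) tail may lag behind head, so tail is not necessarily reachable from head.
Invariants for tail:
1) The last node is reachable from tail via succ().
2) tail != null.
Non-invariants for tail:
1) tail.item may be null.
2) tail may lag behind head.
3) tail.next may point to tail itself.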

1.3 Enqueue: offer()

    public boolean offer(E e) {
        checkNotNull(e);
        final Node<E> newNode = new Node<E>(e);

        for (Node<E> t = tail, p = t;;) {
            Node<E> q = p.next;
            if (q == null) {
                // p is last node
                if (p.casNext(null, newNode)) {
                    // Successful CAS is the linearization point
                    // for e to become an element of this queue,
                    // and for newNode to become "live".
                    if (p != t) // hop two nodes at a time
                        casTail(t, newNode);  // Failure is OK.
                    return true;
                }
                // Lost CAS race to another thread; re-read next
            }
            else if (p == q)
                // We have fallen off list.  If tail is unchanged, it
                // will also be off-list, in which case we need to
                // jump to head, from which all live nodes are always
                // reachable.  Else the new tail is a better bet.
                p = (t != (t = tail)) ? t : head;
            else
                // Check for tail updates after two hops.
                p = (p != t && t != (t = tail)) ? t : q;
        }
    }

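First, the roles of the variables:

t is a snapshot of tail
p walks toward the last node; p is the last node when q = p.next is null

Notes on the code:

1) tail is not CASed on every offer. It is updated only once p has moved away from t, roughly on every second offer, so tail does not always point to the last node.

 if (p != t) // hop two nodes at a time
    casTail(t, newNode);

2) p == q means that p has fallen off the list: a poll() in another thread called updateHead, which linked the old head node to itself. A typical case: head and tail both point to the dummy node in front of first, first is also the last node, and it gets removed. Afterwards tail is unchanged and its node is self-linked, while head has moved forward to first.
p = (t != (t = tail)) ? t : head; Here (t != (t = tail)) checks whether another thread has updated tail. If so, the new tail is used, which is a small speed-up; otherwise the traversal restarts from head.
3) p = (p != t && t != (t = tail)) ? t : q; In this case p has to move forward toward the last node. If p != t and another thread has changed tail, p jumps directly to the new tail, which is another small speed-up; otherwise p advances to q.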

Example (figures not reproduced): adding a, then b, then c.
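The three insertions can be traced with a simplified, single-threaded model of offer(). This is a sketch under stated assumptions, not the JDK code: AtomicReference stands in for the Unsafe-based CAS helpers, items are plain Strings, and the names OfferTrace, tailItem, and trace are made up. After each offer it records the item of the node that tail points to.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OfferTrace {
    static final class Node {
        final String item;   // null for the initial dummy node
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(String item) { this.item = item; }
    }

    final AtomicReference<Node> head;
    final AtomicReference<Node> tail;

    OfferTrace() {
        Node dummy = new Node(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void offer(String e) {
        Node newNode = new Node(e);
        Node t = tail.get(), p = t;
        for (;;) {
            Node q = p.next.get();
            if (q == null) {                         // p is the last node
                if (p.next.compareAndSet(null, newNode)) {
                    if (p != t) {                    // tail is at least two nodes behind
                        tail.compareAndSet(t, newNode);
                    }
                    return;
                }
            } else if (p == q) {                     // fell off the list
                Node old = t;
                t = tail.get();
                p = (old != t) ? t : head.get();
            } else if (p != t) {                     // check for a newer tail after a hop
                Node old = t;
                t = tail.get();
                p = (old != t) ? t : q;
            } else {
                p = q;
            }
        }
    }

    // Item of the node that tail currently points to.
    String tailItem() {
        return tail.get().item;
    }

    static String trace() {
        OfferTrace queue = new OfferTrace();
        StringBuilder sb = new StringBuilder();
        for (String e : new String[] {"a", "b", "c"}) {
            queue.offer(e);
            sb.append(queue.tailItem()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(trace());   // null b b
    }
}
```

In this model tail still points to the dummy node after a is added, jumps to b when b is added, and stays on b when c is added, so tail moves once for every two insertions.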

1.4 Dequeue: poll()

    public E poll() {
        restartFromHead:
        for (;;) {
            for (Node<E> h = head, p = h, q;;) {
                E item = p.item;

                if (item != null && p.casItem(item, null)) {
                    // Successful CAS is the linearization point
                    // for item to be removed from this queue.
                    if (p != h) // hop two nodes at a time
                        updateHead(h, ((q = p.next) != null) ? q : p);
                    return item;
                }
                else if ((q = p.next) == null) {
                    updateHead(h, p);
                    return null;
                }
                else if (p == q)
                    continue restartFromHead;
                else
                    p = q;
            }
        }
    }

The variables:

h is a snapshot of head
p walks to the actual first node
q = p.next

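Notes on the code:

1) head is not CASed on every poll(). It is updated only once p has moved away from h, roughly on every second poll(), so head does not always point to the first node.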

if (p != h) // hop two nodes at a time
      updateHead(h, ((q = p.next) != null) ? q : p);

2) updateHead()
It points head at the new node p and then links h.next to h itself.

    final void updateHead(Node<E> h, Node<E> p) {
        if (h != p && casHead(h, p))
            h.lazySetNext(h);
    }

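3) ((q = p.next) != null) ? q : p guards against p already being the last node, in which case head is simply pointed at p. This is the case when the element just removed was the last one.
4) (q = p.next) == null means the queue is empty.
5) (p == q) means that p has been self-linked: between reading p and reading q = p.next, another thread removed that node and advanced head past it, so the traversal restarts from the current head.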

Example (figures not reproduced): dequeuing a, then b, then c.

Once tail lags behind head, an offer(d) at this point takes the p == q branch.
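The lagging-tail scenario can be reproduced with a simplified, single-threaded model of offer() and poll(). Again this is only a sketch: AtomicReference replaces the Unsafe-based CAS helpers, and LagTrace, selfLinkHits, and tailIsSelfLinked are hypothetical names added for observation.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LagTrace {
    static final class Node {
        final AtomicReference<String> item;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(String item) { this.item = new AtomicReference<>(item); }
    }

    final AtomicReference<Node> head, tail;
    int selfLinkHits;   // times offer() took the p == q branch (single-threaded bookkeeping)

    LagTrace() {
        Node dummy = new Node(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void offer(String e) {
        Node newNode = new Node(e);
        Node t = tail.get(), p = t;
        for (;;) {
            Node q = p.next.get();
            if (q == null) {
                if (p.next.compareAndSet(null, newNode)) {
                    if (p != t) tail.compareAndSet(t, newNode);
                    return;
                }
            } else if (p == q) {                 // p is self-linked: fell off the list
                selfLinkHits++;
                Node old = t;
                t = tail.get();
                p = (old != t) ? t : head.get();
            } else if (p != t) {
                Node old = t;
                t = tail.get();
                p = (old != t) ? t : q;
            } else {
                p = q;
            }
        }
    }

    String poll() {
        restartFromHead:
        for (;;) {
            Node h = head.get(), p = h;
            for (;;) {
                String item = p.item.get();
                if (item != null && p.item.compareAndSet(item, null)) {
                    if (p != h) {                // head is at least two nodes behind
                        Node q = p.next.get();
                        updateHead(h, (q != null) ? q : p);
                    }
                    return item;
                }
                Node q = p.next.get();
                if (q == null) {                 // queue is empty
                    updateHead(h, p);
                    return null;
                } else if (p == q) {             // p was dequeued by someone else
                    continue restartFromHead;
                } else {
                    p = q;
                }
            }
        }
    }

    void updateHead(Node h, Node p) {
        if (h != p && head.compareAndSet(h, p)) h.next.lazySet(h);   // self-link the old head
    }

    boolean tailIsSelfLinked() {
        Node t = tail.get();
        return t.next.get() == t;
    }
}
```

In this model, offering a, b, c and then polling three times leaves head on the node that held c while tail still refers to the node that held b, which is now self-linked. A following offer("d") therefore hits the p == q branch once, jumps to head, appends d, and moves tail forward again.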

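1.5 Summary

1) offer only modifies tail, and poll only modifies head.
2) The invariants on head and tail are what guarantee the correctness of the algorithm.
3) head and tail are updated in batches (one CAS for roughly every two operations).
4) offer uses two small optimizations while advancing p to the actual last node.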

2.ConcurrentLinkedDeque

2.1 Official documentation

This is an implementation of a concurrent lock-free deque
supporting interior removes but not interior insertions, as
required to support the entire Deque interface.

We extend the techniques developed for ConcurrentLinkedQueue and
LinkedTransferQueue (see the internal docs for those classes).
Understanding the ConcurrentLinkedQueue implementation is a
prerequisite for understanding the implementation of this class.

The data structure is a symmetrical doubly-linked "GC-robust"
linked list of nodes.  We minimize the number of volatile writes
using two techniques: advancing multiple hops with a single CAS
and mixing volatile and non-volatile writes of the same memory
locations.

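An implementation of a concurrent lock-free deque that supports interior removals but not interior insertions.

It extends the techniques developed for ConcurrentLinkedQueue and LinkedTransferQueue. Understanding the ConcurrentLinkedQueue implementation is a prerequisite for understanding this class.

The data structure is a symmetrical, doubly-linked, "GC-robust" list of nodes. Two techniques minimize the number of volatile writes: advancing multiple hops with a single CAS, and mixing volatile and non-volatile writes to the same memory locations.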
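From the caller's side, "interior removals but not interior insertions" looks roughly as follows. This is a usage sketch; the class name DequeDemo is made up for illustration.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class DequeDemo {
    static String demo() {
        ConcurrentLinkedDeque<String> deque = new ConcurrentLinkedDeque<>();
        deque.addLast("b");
        deque.addLast("c");
        deque.addLast("d");
        deque.addFirst("a");                 // insertion is possible at either end only
        deque.removeFirstOccurrence("c");    // removal from the interior is supported
        String first = deque.pollFirst();    // "a"
        String last = deque.pollLast();      // "d"
        return first + last + deque;         // the remaining content is [b]
    }

    public static void main(String[] args) {
        System.out.println(demo());          // ad[b]
    }
}
```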

A node contains the expected E ("item") and links to predecessor
("prev") and successor ("next") nodes:

class Node { volatile Node prev, next; volatile E item; }

A node p is considered "live" if it contains a non-null item
(p.item != null).  When an item is CASed to null, the item is
atomically logically deleted from the collection.

At any time, there is precisely one "first" node with a null
prev reference that terminates any chain of prev references
starting at a live node.  Similarly there is precisely one
"last" node terminating any chain of next references starting at
a live node.  The "first" and "last" nodes may or may not be live.
The "first" and "last" nodes are always mutually reachable.

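A node p is considered live if its item is non-null. When the item is CASed to null, the element is atomically, logically deleted from the collection.

At any time there is exactly one first node and exactly one last node, reachable from any live node by following prev or next references respectively. The first and last nodes may or may not be live, and they are always mutually reachable.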

A new element is added atomically by CASing the null prev or
next reference in the first or last node to a fresh node
containing the element.  The element's node atomically becomes
"live" at that point.

A node is considered "active" if it is a live node, or the
first or last node.  Active nodes cannot be unlinked.

A "self-link" is a next or prev reference that is the same node:
  p.prev == p  or  p.next == p
Self-links are used in the node unlinking process.  Active nodes
never have self-links.

A node p is active if and only if:

p.item != null ||
(p.prev == null && p.next != p) ||
(p.next == null && p.prev != p)

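A new element is added atomically by CASing the null prev reference of the first node, or the null next reference of the last node, to a fresh node containing the element. The element's node atomically becomes live at that point.

Live nodes and the first and last nodes are considered active. Active nodes cannot be unlinked.

Self-links are used in the node unlinking process. Active nodes never have self-links.

The condition for a node being active is: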
p.item != null ||
(p.prev == null && p.next != p) ||
(p.next == null && p.prev != p)
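The predicate can be tried out on a small hand-built list. The sketch below is a single-threaded model: the Node class mirrors the field layout quoted above with String items, while ActiveNodeDemo, isActive, and check are hypothetical names, not JDK methods.

```java
public class ActiveNodeDemo {
    static final class Node {
        volatile Node prev, next;
        volatile String item;
        Node(String item) { this.item = item; }
    }

    // Direct transcription of the "active" condition.
    static boolean isActive(Node p) {
        return p.item != null
            || (p.prev == null && p.next != p)
            || (p.next == null && p.prev != p);
    }

    // Builds first <-> mid <-> last and evaluates the predicate in a few states.
    static boolean[] check() {
        Node first = new Node(null);   // a first node need not be live
        Node mid = new Node("x");
        Node last = new Node("y");
        first.next = mid;
        mid.prev = first;
        mid.next = last;
        last.prev = mid;

        boolean firstActive = isActive(first);      // true: prev == null and not self-linked
        boolean midActiveWhileLive = isActive(mid); // true: item != null

        mid.item = null;                            // logical deletion of an interior node
        boolean midActiveAfterDelete = isActive(mid);   // false: eligible for unlinking

        Node selfLinked = new Node(null);
        selfLinked.next = selfLinked;               // self-link: the node has left the list
        boolean selfLinkedActive = isActive(selfLinked);    // false

        return new boolean[] {
            firstActive, midActiveWhileLive, midActiveAfterDelete, selfLinkedActive
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(check()));   // [true, true, false, false]
    }
}
```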

The deque object has two node references, "head" and "tail".
The head and tail are only approximations to the first and last
nodes of the deque.  The first node can always be found by
following prev pointers from head; likewise for tail.  However,
it is permissible for head and tail to be referring to deleted
nodes that have been unlinked and so may not be reachable from
any live node.

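head and tail are only approximations of the deque's first and last nodes. The first node can always be found by following prev pointers from head, and likewise the last node by following next pointers from tail. head and tail are allowed to refer to deleted nodes that have already been unlinked, so they may not be reachable from any live node.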

There are 3 stages of node deletion;
"logical deletion", "unlinking", and "gc-unlinking".

1. "logical deletion" by CASing item to null atomically removes
the element from the collection, and makes the containing node
eligible for unlinking.

2. "unlinking" makes a deleted node unreachable from active
nodes, and thus eventually reclaimable by GC.  Unlinked nodes
may remain reachable indefinitely from an iterator.

Physical node unlinking is merely an optimization (albeit a
critical one), and so can be performed at our convenience.  At
any time, the set of live nodes maintained by prev and next
links are identical, that is, the live nodes found via next
links from the first node is equal to the elements found via
prev links from the last node.  However, this is not true for
nodes that have already been logically deleted - such nodes may
be reachable in one direction only.

3. "gc-unlinking" takes unlinking further by making active
nodes unreachable from deleted nodes, making it easier for the
GC to reclaim future deleted nodes.  This step makes the data
structure "gc-robust", as first described in detail by Boehm
(http://portal.acm.org/citation.cfm?doid=503272.503282).

GC-unlinked nodes may remain reachable indefinitely from an
iterator, but unlike unlinked nodes, are never reachable from
head or tail.

Making the data structure GC-robust will eliminate the risk of
unbounded memory retention with conservative GCs and is likely
to improve performance with generational GCs.

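Node deletion has three stages: logical deletion, unlinking, and gc-unlinking.

1) Logical deletion
CASing the item to null atomically removes the element from the collection and makes the node eligible for unlinking.
2) Unlinking
This makes the deleted node unreachable from active nodes, so that the GC can eventually reclaim it. Unlinked nodes may remain reachable from an iterator indefinitely.
Physical unlinking is only an optimization (albeit a critical one), so it can be done whenever convenient. At any time, the sets of live nodes maintained by the prev links and by the next links are identical. This does not hold for logically deleted nodes, which may be reachable in one direction only.
3) gc-unlinking
This goes one step further and makes active nodes unreachable from deleted nodes. Making the data structure GC-robust eliminates the risk of unbounded memory retention with conservative GCs and is likely to improve performance with generational GCs.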

When a node is dequeued at either end, e.g. via poll(), we would
like to break any references from the node to active nodes.  We
develop further the use of self-links that was very effective in
other concurrent collection classes.  The idea is to replace
prev and next pointers with special values that are interpreted
to mean off-the-list-at-one-end.  These are approximations, but
good enough to preserve the properties we want in our
traversals, e.g. we guarantee that a traversal will never visit
the same element twice, but we don't guarantee whether a
traversal that runs out of elements will be able to see more
elements later after enqueues at that end.  Doing gc-unlinking
safely is particularly tricky, since any node can be in use
indefinitely (for example by an iterator).  We must ensure that
the nodes pointed at by head/tail never get gc-unlinked, since
head/tail are needed to get "back on track" by other nodes that
are gc-unlinked.  gc-unlinking accounts for much of the
implementation complexity.

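The use of self-links is developed further: prev and next pointers are replaced with special values that mean "off the list at one end".

The nodes that head/tail point to must never be gc-unlinked, because other gc-unlinked nodes need head/tail to get back on track.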

Since neither unlinking nor gc-unlinking are necessary for
correctness, there are many implementation choices regarding
frequency (eagerness) of these operations.  Since volatile
reads are likely to be much cheaper than CASes, saving CASes by
unlinking multiple adjacent nodes at a time may be a win.
gc-unlinking can be performed rarely and still be effective,
since it is most important that long chains of deleted nodes
are occasionally broken.

The actual representation we use is that p.next == p means to
goto the first node (which in turn is reached by following prev
pointers from head), and p.next == null && p.prev == p means
that the iteration is at an end and that p is a (static final)
dummy node, NEXT_TERMINATOR, and not the last active node.
Finishing the iteration when encountering such a TERMINATOR is
good enough for read-only traversals, so such traversals can use
p.next == null as the termination condition.  When we need to
find the last (active) node, for enqueueing a new node, we need
to check whether we have reached a TERMINATOR node; if so,
restart traversal from tail.

The implementation is completely directionally symmetrical,
except that most public methods that iterate through the list
follow next pointers ("forward" direction).

We believe (without full proof) that all single-element deque
operations (e.g., addFirst, peekLast, pollLast) are linearizable
(see Herlihy and Shavit's book).  However, some combinations of
operations are known not to be linearizable.  In particular,
when an addFirst(A) is racing with pollFirst() removing B, it is
possible for an observer iterating over the elements to observe
A B C and subsequently observe A C, even though no interior
removes are ever performed.  Nevertheless, iterators behave
reasonably, providing the "weakly consistent" guarantees.

Empirically, microbenchmarks suggest that this class adds about
40% overhead relative to ConcurrentLinkedQueue, which feels as
good as we can hope for.

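Since neither unlinking nor gc-unlinking is needed for correctness, there are many implementation choices regarding how often to perform them. Because volatile reads are likely to be much cheaper than CASes, unlinking several adjacent nodes at a time to save CASes may be a win.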
