Reposted from: Raushan Roy - TEMPORAL CONVOLUTIONAL NETWORKS
Until recently, the default choice for sequence modeling tasks was RNNs, thanks to their ability to capture temporal dependencies in sequential data. Variants of RNNs such as LSTM and GRU can capture long-term dependencies in sequences and have achieved state-of-the-art performance on seq2seq modeling tasks.
Recently, deep learning practitioners have been using a variation of the convolutional neural network architecture for sequence modeling tasks: Temporal Convolutional Networks (TCNs). This is a simple descriptive term that refers to a family of architectures.
Google DeepMind published the paper Pixel Recurrent Neural Network, in which the authors introduced the idea of using a CNN as a sequence model with a fixed dependency range by means of masked convolutions: each output is convolved only with elements from the current timestep or earlier in the previous layer. DeepMind later published another paper, Neural Machine Translation in Linear Time, inspired by this idea, which achieved state-of-the-art performance on an English-to-German translation task, surpassing recurrent networks with attentional pooling.
The distinguishing characteristics of TCNs are:
1. The architecture can take a sequence of any length and map it to an output sequence of the same length, just as with an RNN.
2. The convolutions are causal, meaning there is no information "leakage" from the future to the past.
To accomplish the first point, the TCN uses a 1D fully-convolutional network (FCN) architecture, in which each hidden layer has the same length as the input layer, and zero padding of length (kernel size − 1) is added to keep subsequent layers the same length as previous ones. To achieve the second point, the TCN uses causal convolutions: convolutions in which an output at time t is convolved only with elements from time t and earlier in the previous layer.
To put it simply: TCN = 1D FCN + causal convolutions.
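As a concrete illustration (not from the original article), here is a minimal PyTorch sketch of a causal 1D convolution: the input is padded on the left by (kernel size − 1), so the output has the same length as the input and the value at time t never depends on future timesteps. The function name causal_conv1d is just illustrative.

```python
import torch
import torch.nn.functional as F

def causal_conv1d(x, weight):
    """Causal 1D convolution: left-pad by (kernel_size - 1) so the output
    at time t only sees inputs at times <= t and the length is preserved."""
    k = weight.shape[-1]
    return F.conv1d(F.pad(x, (k - 1, 0)), weight)

x = torch.randn(1, 1, 10)                        # (batch, channels, time)
w = torch.randn(1, 1, 3)                         # kernel size k = 3
y1 = causal_conv1d(x, w)
x[..., 7:] += 1.0                                # perturb only "future" timesteps
y2 = causal_conv1d(x, w)
print(torch.allclose(y1[..., :7], y2[..., :7]))  # True: outputs before t = 7 are unchanged
```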
This basic architecture, however, can only look back at a history whose size is linear in the depth of the network and the filter size: with n layers of kernel size k, the receptive field is just 1 + n(k − 1). Hence, capturing long-term dependencies becomes challenging. A simple solution to this problem is to use dilated convolutions.
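To see the difference in numbers, a small sketch (my own illustration, not from the article): the receptive field of a stack of causal convolutions with stride 1 is 1 + (k − 1) · Σ d_i, so with constant dilation d = 1 it grows only linearly with depth, while doubling the dilation at every layer makes it grow exponentially.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked (stride-1) causal convolutions with the given dilations."""
    return 1 + (kernel_size - 1) * sum(dilations)

k, n = 3, 8
print(receptive_field(k, [1] * n))                     # plain stack: 1 + n*(k-1) = 17
print(receptive_field(k, [2 ** i for i in range(n)]))  # dilated stack: 1 + (k-1)*(2**n - 1) = 511
```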
For a 1-D sequence input x ∈ R^n and a filter f: {0, …, k−1} → R, the dilated convolution operation F on element s of the sequence is defined (following the paper) as:

F(s) = (x ∗_d f)(s) = Σ_{i=0}^{k−1} f(i) · x_{s − d·i},

where d is the dilation factor and k is the filter size.
In the above formula, the causality requirement is also satisfied: since i takes only non-negative values, x_{s − d·i} refers only to the current and earlier elements of the sequence. It is also evident that an ordinary convolution is simply a dilated convolution with d = 1. For more on dilated convolutions you can refer to this blog.
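The formula translates almost literally into code. The following NumPy sketch (illustrative, not from the article; indices before the start of the sequence are treated as zero, i.e. zero padding) implements F(s) directly:

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """F(s) = sum_{i=0}^{k-1} f(i) * x[s - d*i], with x[j] = 0 for j < 0 (zero padding)."""
    k = len(f)
    out = np.zeros_like(x, dtype=float)
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i
            if j >= 0:                 # only current and past elements: causal
                out[s] += f[i] * x[j]
    return out

x = np.arange(1, 9, dtype=float)       # [1, 2, ..., 8]
f = np.array([1.0, 1.0, 1.0])          # k = 3
print(dilated_causal_conv(x, f, d=1))  # ordinary (d = 1) causal convolution
print(dilated_causal_conv(x, f, d=2))  # d = 2: skips every other past element
```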
Figure: A dilated causal convolution with dilation factors d = 1, 2, 4 and filter size k = 3.
Backpropagation through a dilated convolution is itself a (dilated) transposed convolution: the gradient with respect to the input is obtained by a transposed convolution of the output gradient with the same filter.
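This can be checked numerically. The sketch below (an illustration, not from the article) compares the input gradient computed by autograd with an explicit dilated transposed convolution of the upstream gradient:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 20, requires_grad=True)
w = torch.randn(4, 1, 3)                       # (out_channels, in_channels, k)
y = F.conv1d(x, w, dilation=2)                 # dilated convolution, d = 2
g = torch.randn_like(y)                        # upstream gradient dL/dy
y.backward(g)

# The input gradient equals a dilated transposed convolution of g with the same filter.
grad_via_transpose = F.conv_transpose1d(g, w, dilation=2)
print(torch.allclose(x.grad, grad_via_transpose, atol=1e-6))  # True
```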
Since a TCN's receptive field depends on the network depth n as well as the filter size k and the dilation factor d, stabilizing deeper and larger TCNs becomes important. Therefore, a generic residual block is used in place of a simple convolutional layer.
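Below is a simplified sketch of such a residual block. The block in the TCN paper additionally uses weight normalization and dropout, which are omitted here; the name TemporalBlock and its parameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalBlock(nn.Module):
    """Simplified TCN residual block: two dilated causal convolutions plus a
    skip connection (a 1x1 convolution when the channel counts differ)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size, dilation=dilation)
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        out = F.relu(self.conv1(F.pad(x, (self.pad, 0))))
        out = F.relu(self.conv2(F.pad(out, (self.pad, 0))))
        return F.relu(out + self.downsample(x))   # residual connection

block = TemporalBlock(in_ch=16, out_ch=32, kernel_size=3, dilation=4)
print(block(torch.randn(1, 16, 64)).shape)        # torch.Size([1, 32, 64]) -- length preserved
```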
References:
- Shaojie Bai, J. Zico Kolter, Vladlen Koltun. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. 2018.
- Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Pixel Recurrent Neural Networks. 2016.
- Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aäron van den Oord, Alex Graves, Koray Kavukcuoglu. Neural Machine Translation in Linear Time. 2016.
Supplement:
https://blog.csdn.net/Kuo_Jun_Lin/article/details/80602776