Forget gate:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Input gate:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

Cell state update:

$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$$

Output gate:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \ast \tanh(C_t)$$

Here, $C_t$ is the current cell state, $h_t$ is the current hidden state, and $x_t$ is the current input.
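To make the computation concrete, here is a minimal PyTorch sketch of a single LSTM step implementing the equations above (the function name `lstm_step` and the weight shapes are illustrative assumptions, not a reference implementation):

    import torch

    def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
        # Each W has shape (hidden_dim, input_dim + hidden_dim) and multiplies
        # the concatenation [h_{t-1}, x_t]; each b has shape (hidden_dim,).
        z = torch.cat([h_prev, x_t], dim=1)     # [h_{t-1}, x_t]
        f_t = torch.sigmoid(z @ W_f.T + b_f)    # forget gate
        i_t = torch.sigmoid(z @ W_i.T + b_i)    # input gate
        c_tilde = torch.tanh(z @ W_C.T + b_C)   # candidate cell state
        c_t = f_t * c_prev + i_t * c_tilde      # cell state update
        o_t = torch.sigmoid(z @ W_o.T + b_o)    # output gate
        h_t = o_t * torch.tanh(c_t)             # hidden state
        return h_t, c_t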
2.2 How ConvLSTM Works
The cell structure of a ConvLSTM is very similar to that of an LSTM, but each gate is computed with convolution operations instead of matrix multiplications. Specifically, the gate equations of ConvLSTM can be written as:
$$i_t = \sigma(W_{xi} \ast X_t + W_{hi} \ast H_{t-1} + W_{ci} \circ C_{t-1} + b_i)$$

$$f_t = \sigma(W_{xf} \ast X_t + W_{hf} \ast H_{t-1} + W_{cf} \circ C_{t-1} + b_f)$$

$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} \ast X_t + W_{hc} \ast H_{t-1} + b_c)$$

$$o_t = \sigma(W_{xo} \ast X_t + W_{ho} \ast H_{t-1} + W_{co} \circ C_t + b_o)$$

$$H_t = o_t \circ \tanh(C_t)$$
Here, every $W$ is a convolution kernel (weight), $b$ is a bias term, $\sigma$ is the sigmoid function, and $\tanh$ is the hyperbolic tangent. In this section, $\ast$ denotes convolution and $\circ$ denotes the Hadamard (element-wise) product.
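Following the same pattern, here is a minimal PyTorch sketch of one ConvLSTM step (the class name `ConvLSTMCellSketch`, the fused four-gate convolution, and the element-wise peephole parameters `W_ci`, `W_cf`, `W_co` are illustrative assumptions built from the equations above, not a reference implementation):

    import torch
    import torch.nn as nn

    class ConvLSTMCellSketch(nn.Module):
        def __init__(self, in_ch, hid_ch, kernel_size, height, width):
            super().__init__()
            pad = kernel_size // 2  # "same" padding keeps the spatial size
            # One convolution over [X_t, H_{t-1}] produces all four gates at once.
            self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)
            # Peephole weights are applied element-wise (the Hadamard terms above).
            self.W_ci = nn.Parameter(torch.zeros(hid_ch, height, width))
            self.W_cf = nn.Parameter(torch.zeros(hid_ch, height, width))
            self.W_co = nn.Parameter(torch.zeros(hid_ch, height, width))

        def forward(self, x_t, h_prev, c_prev):
            gates = self.conv(torch.cat([x_t, h_prev], dim=1))
            gi, gf, gc, go = gates.chunk(4, dim=1)
            i_t = torch.sigmoid(gi + self.W_ci * c_prev)  # input gate
            f_t = torch.sigmoid(gf + self.W_cf * c_prev)  # forget gate
            c_t = f_t * c_prev + i_t * torch.tanh(gc)     # cell state update
            o_t = torch.sigmoid(go + self.W_co * c_t)     # output gate peeks at C_t
            h_t = o_t * torch.tanh(c_t)                   # hidden state
            return h_t, c_t

Stacking several such cells and iterating over the time dimension yields a full multi-layer ConvLSTM.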
A full multi-layer ConvLSTM also needs some argument handling. Two helper methods cover this: one validates `kernel_size`, and one broadcasts a single per-layer setting to every layer:

    @staticmethod
    def _check_kernel_size_consistency(kernel_size):
        # `kernel_size` must be a tuple, or a list of tuples (one per layer).
        if not (isinstance(kernel_size, tuple) or
                (isinstance(kernel_size, list) and all(isinstance(elem, tuple) for elem in kernel_size))):
            raise ValueError('`kernel_size` must be tuple or list of tuples')

    @staticmethod
    def _extend_for_multilayer(param, num_layers):
        # Broadcast a single value (e.g. one hidden_dim) to one entry per layer.
        if not isinstance(param, list):
            param = [param] * num_layers
        return param
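These two helpers let a multi-layer constructor accept either one setting for all layers or one setting per layer: `_check_kernel_size_consistency` rejects anything that is not a tuple or a list of tuples, and `_extend_for_multilayer` broadcasts a scalar setting (for example, a single `hidden_dim` or `kernel_size`) into a list with one entry per layer.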
References
[1] Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., & Woo, W.-C. (2015). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Advances in Neural Information Processing Systems, 28.
[2] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.
[3] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.