Calculus Notes 04: Common Matrix Derivative Operations
4.1 General matrix derivative examples
4.1.1 Derivative example 1: \(f(x)=A_{m\times n}\cdot x_{n \times 1}\) \(\Rightarrow f'_{x^T}(x)=A_{m\times n}\)
For example:
\[A=\begin{bmatrix}a_1&a_2&a_3\\b_1&b_2&b_3\end{bmatrix},x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\Rightarrow f(x)=\begin{bmatrix}a_1x_1+a_2x_2+a_3x_3\\b_1x_1+b_2x_2+b_3x_3\end{bmatrix}\]
Entry \((i,j)\) of the derivative is the coefficient of \(x_j\) in row \(i\) of \(f(x)\), and these coefficients are exactly the entries stored in \(A\), so:
\[\tag{1}f'_{x^T}(x)=\begin{bmatrix}a_1&a_2&a_3\\b_1&b_2&b_3\end{bmatrix}=A\]
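Rule (1) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `matvec` and `numerical_jacobian` and the sample values are illustrative assumptions, not part of the original notes. It approximates each entry \(\partial f_i/\partial x_j\) with a central difference and compares the result with \(A\).

```python
def matvec(A, x):
    """Return A x for a matrix A (list of rows) and a vector x (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: J[i][j] approximates d f_i / d x_j."""
    rows = len(f(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(rows):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Sample 2x3 matrix and an arbitrary evaluation point (assumed values).
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x0 = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda x: matvec(A, x), x0)

# Largest deviation between the numerical Jacobian and A.
max_err = max(abs(J[i][j] - A[i][j]) for i in range(2) for j in range(3))
print("matches A:", max_err < 1e-6)
```

Because \(f\) is linear, the Jacobian does not depend on the evaluation point, so any `x0` should give the same result up to rounding.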
4.1.2 Derivative example 2: \(f(x)= x_{1 \times m}\cdot A_{m\times m} \cdot x^T_{m \times 1} \Rightarrow f'_x(x)=x_{1 \times m}\cdot(A_{m\times m}+A_{m\times m}^T)\)
For example:
\[x=\begin{bmatrix}x_1&x_2\end{bmatrix},A=\begin{bmatrix}a&b\\c&d\end{bmatrix},x^T=\begin{bmatrix}x_1\\x_2\end{bmatrix}\]
\[\Rightarrow f(x)= \begin{bmatrix}ax_1+cx_2&bx_1+dx_2\end{bmatrix}\cdot\begin{bmatrix}x_1\\x_2\end{bmatrix}\]
\[\qquad\quad=\begin{bmatrix}a{x_1}^2+bx_1x_2+cx_1x_2+dx_2^2\end{bmatrix}\]
Differentiating with respect to \(x_1\) and \(x_2\) gives:
\[f'_x(x)= \begin{bmatrix}2ax_1+bx_2+cx_2&2dx_2+bx_1+cx_1\end{bmatrix}\]
\[\tag{2}=\begin{bmatrix}x_1&x_2\end{bmatrix}\cdot\begin{bmatrix}a&b\\c&d\end{bmatrix}+\begin{bmatrix}x_1&x_2\end{bmatrix}\cdot\begin{bmatrix}a&c\\b&d\end{bmatrix}=x\cdot(A+A^T)\]
Because \(x\) is a row vector here, the result is the row vector \(x\cdot(A+A^T)\). Written as a column it is \((A+A^T)\cdot x^T\); equivalently, for a column vector \(v\), \((v^T\cdot A\cdot v)'_v=(A+A^T)\cdot v\).
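As a sanity check on rule (2), the sketch below compares a central-difference gradient of the quadratic form with the entries \(\sum_j (a_{ij}+a_{ji})x_j\) of \((A+A^T)\) applied to \(x\). The function names `quad_form` and `numerical_gradient` and the sample numbers are assumptions chosen for illustration.

```python
def quad_form(A, x):
    """Return the scalar sum_i sum_j x_i * a_ij * x_j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the point x."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# A deliberately non-symmetric matrix, so that A + A^T differs from 2A.
A = [[1.0, 2.0], [3.0, 4.0]]
x0 = [0.5, -1.5]

numeric = numerical_gradient(lambda x: quad_form(A, x), x0)

# Entry i of (A + A^T) applied to x0.
analytic = [sum((A[i][j] + A[j][i]) * x0[j] for j in range(2)) for i in range(2)]

print("numeric :", numeric)
print("analytic:", analytic)
```

Choosing a non-symmetric \(A\) matters: with a symmetric matrix the check could not distinguish \((A+A^T)\) from \(2A\).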
4.1.3 Derivative example 3: \(f(x)=x^T_{1\times n}\cdot a_{n \times 1} \Rightarrow f_x'(x)=(a^T_{1\times n}\cdot x_{n \times 1})'_x=a\)
For example:
\[x^T=\begin{bmatrix}x_1&x_2\end{bmatrix},a=\begin{bmatrix}a_1\\a_2\end{bmatrix}\]
\[\Rightarrow f(x)=x^T\cdot a=\begin{bmatrix}x_1a_1+x_2a_2\end{bmatrix}=a^T\cdot x\]
where:
\[x=\begin{bmatrix}x_1\\x_2\end{bmatrix}\]
The coefficient of each \(x_i\) in \(f(x)\) is the corresponding entry \(a_i\), so:
\[\tag{3}f'_x(x)=(a^T\cdot x)_x'=\begin{bmatrix}a_1\\a_2\end{bmatrix}=a\]
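The same argument for general \(n\) can be written out component by component. This is a short worked step added for clarity; the index \(k\) is introduced here only for illustration:
\[\frac{\partial}{\partial x_k}\left(\sum_{i=1}^n x_i a_i\right)=a_k,\qquad k=1,\dots,n\]
Stacking these \(n\) partial derivatives in a column reproduces \(a\).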
4.1.4 Derivative example 4: \(f(x)=x_{m\times 1}^T\cdot A_{m \times n}\cdot y_{n \times 1} \Rightarrow f_x'(x)=Ay,f'_A(x)=xy^T\)
For example:
\[x^T=\begin{bmatrix}x_1&x_2&x_3\end{bmatrix},A=\begin{bmatrix}a_1&a_2\\a_3&a_4\\a_5&a_6\end{bmatrix},y=\begin{bmatrix}y_1\\y_2\end{bmatrix}\]
\[\Rightarrow f(x) =x^T\cdot A\cdot y=\begin{bmatrix}a_1x_1+a_3x_2+a_5x_3&a_2x_1+a_4x_2+a_6x_3\end{bmatrix}\cdot\begin{bmatrix}y_1\\y_2\end{bmatrix}\]
\[\qquad\quad=\begin{bmatrix}(a_1x_1+a_3x_2+a_5x_3)\cdot y_1+(a_2x_1+a_4x_2+a_6x_3)\cdot y_2\end{bmatrix}\]
Differentiating with respect to each \(x_i\), and then with respect to each entry of \(A\), gives:
\[f'_x(x)=\begin{bmatrix}a_1y_1+a_2y_2\\a_3y_1+a_4y_2\\a_5y_1+a_6y_2\end{bmatrix}=A \cdot y\]
\[\tag{4}f'_A(x)=\begin{bmatrix}x_1y_1&x_1y_2\\x_2y_1&x_2y_2\\x_3y_1&x_3y_2\end{bmatrix}=x\cdot y^T\]
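Both parts of rule (4) can be checked with finite differences. The sketch below is one way to do it, assuming a hypothetical helper `bilinear` and arbitrary sample values: it perturbs each \(x_i\) and compares with \(A\cdot y\), then perturbs each entry of \(A\) and compares with the outer product \(x\cdot y^T\).

```python
def bilinear(x, A, y):
    """Return the scalar x^T A y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


# Sample data with m = 3, n = 2 (assumed values).
x = [1.0, -2.0, 0.5]
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [2.0, -1.0]
h = 1e-6

# Derivative with respect to x, one component at a time.
grad_x = []
for i in range(len(x)):
    x_plus = list(x)
    x_minus = list(x)
    x_plus[i] += h
    x_minus[i] -= h
    grad_x.append((bilinear(x_plus, A, y) - bilinear(x_minus, A, y)) / (2 * h))

# Analytic result A y.
Ay = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]

# Derivative with respect to A, one entry at a time.
grad_A = [[0.0] * len(y) for _ in x]
for i in range(len(x)):
    for j in range(len(y)):
        A_plus = [row[:] for row in A]
        A_minus = [row[:] for row in A]
        A_plus[i][j] += h
        A_minus[i][j] -= h
        grad_A[i][j] = (bilinear(x, A_plus, y) - bilinear(x, A_minus, y)) / (2 * h)

# Analytic result: the outer product x y^T, with the same 3x2 shape as A.
xyT = [[xi * yj for yj in y] for xi in x]

print("grad_x vs A y  :", grad_x, Ay)
print("grad_A vs x y^T:", grad_A, xyT)
```

The derivative with respect to \(A\) has the same shape as \(A\) itself (here \(3\times 2\)), which is a quick way to spot a formula that collapses it into a vector.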
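4.2 Matrix norm derivative example
Let there be a matrix \(X_{N \times n}\) and vectors \(a_{n \times 1}\), \(y_{N \times 1}\).
Let \(f(a)=\|X\cdot a-y\|^2\); then \(f'_a(a)\) is derived as follows.
By the definition of the Euclidean norm, \(\|v\|^2=v^T\cdot v\), so:
\[f(a)=(X\cdot a-y)^T\cdot (X\cdot a-y)\]
\[\qquad \qquad=(a^T\cdot X^T -y^T)\cdot (X\cdot a-y)\]
\[\tag{5}\qquad \qquad=a^T\cdot X^T X \cdot a -a^T\cdot X^T\cdot y-y^T\cdot X \cdot a + y^Ty\]
In equation (5):
For the term \(a^T\cdot X^T X \cdot a\), rule (2) applied to the column vector \(a\) with the symmetric matrix \(X^TX\) gives:
\[(a^T\cdot X^T X \cdot a)'_a=(X^TX+(X^TX)^T)\cdot a=2X^TX\cdot a\]
For the term \(y^T\cdot X\cdot a\), rule (3) gives:
\[(y^T\cdot X\cdot a )_a'=[(X^T\cdot y )^T\cdot a] _a'=X^T\cdot y\]
The term \(a^T\cdot X^T\cdot y\) is a scalar and therefore equals its own transpose \(y^T\cdot X\cdot a\):
\[(a^T\cdot X^T\cdot y)'_a=(y^T\cdot X\cdot a)'_a=X^T\cdot y\]
The term \(y^Ty\) does not depend on \(a\), so its derivative is zero. Combining the terms with their signs from (5):
\[f'_a(a)=(\|X\cdot a-y\|^2)_a'=2(X^TX\cdot a-X^T\cdot y)=2X^T\cdot(X\cdot a-y)\]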
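The least-squares gradient \(2X^T(Xa-y)\) can be verified on a small instance. The sketch below assumes hypothetical helper names (`residual`, `loss`, `analytic_gradient`) and made-up data with \(N=3\), \(n=2\); it compares the closed form with a central-difference gradient of the squared norm.

```python
def residual(X, a, y):
    """Return the vector X a - y."""
    return [sum(X[i][j] * a[j] for j in range(len(a))) - y[i]
            for i in range(len(y))]


def loss(X, a, y):
    """Return the squared Euclidean norm ||X a - y||^2."""
    return sum(r * r for r in residual(X, a, y))


def analytic_gradient(X, a, y):
    """Return 2 X^T (X a - y), the closed-form gradient with respect to a."""
    r = residual(X, a, y)
    return [2.0 * sum(X[i][j] * r[i] for i in range(len(y)))
            for j in range(len(a))]


# Sample data: X is 3x2, a has 2 entries, y has 3 entries (assumed values).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a0 = [0.5, -1.0]
y = [1.0, 0.0, -1.0]
h = 1e-6

# Central-difference gradient of the loss with respect to a.
numeric = []
for k in range(len(a0)):
    a_plus = list(a0)
    a_minus = list(a0)
    a_plus[k] += h
    a_minus[k] -= h
    numeric.append((loss(X, a_plus, y) - loss(X, a_minus, y)) / (2 * h))

analytic = analytic_gradient(X, a0, y)
print("numeric :", numeric)
print("analytic:", analytic)
```

Note that the gradient has \(n\) entries, matching \(a\). A formula built from \(XX^T\) (an \(N\times N\) matrix) could not even be multiplied by \(a\) when \(N\neq n\), which is a useful dimension check.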
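4.3 Matrix trace derivative examples
4.3.1 Trace derivative example 1: \(\mathrm{tr}'_A(A)=I\)
Let \(A_{m\times m}\) be a square matrix and \(\mathrm{tr}(A)\) its trace; then:
\[\mathrm{tr}(A)=\sum_{i=1}^m a_{ii}\]
The derivative with respect to \(a_{ij}\) is 1 when \(i=j\) and 0 otherwise, so:
\[\tag{6}\Rightarrow \mathrm{tr}'_A(A)=I=\begin{bmatrix}1&&&\\&1&&\\&&\ddots&\\&&&1\end{bmatrix}\]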
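4.3.2 Trace derivative example 2: \(\mathrm{tr}'_A(A\cdot B)=B^T\)
Let \(A_{m\times m}\) and \(B_{m\times m}\) be square matrices and \(\mathrm{tr}(A\cdot B)\) the trace of \(A\cdot B\); then:
\[\mathrm{tr}(A\cdot B)=\sum_{i=1}^m\sum_{j=1}^m a_{ij}b_{ji}\]
The coefficient of \(a_{ij}\) in this sum is \(b_{ji}\), which is entry \((i,j)\) of \(B^T\), so:
\[\tag{7}\mathrm{tr}'_A(A\cdot B)=\left(\sum_{i=1}^m\sum_{j=1}^m a_{ij}b_{ji}\right)'_A=B^T\]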
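4.3.3 Trace derivative example 3: \(\mathrm{tr}'_A(A\cdot A^T)=2\cdot A\)
Let \(A_{m\times m}\) be a square matrix and \(\mathrm{tr}(A\cdot A^T)\) the trace of \(A\cdot A^T\). Entry \((j,i)\) of \(A^T\) is \(a_{ij}\), so:
\[\mathrm{tr}(A\cdot A^T)=\sum_{i=1}^m\sum_{j=1}^m a_{ij}a_{ij}=\sum_{i=1}^m\sum_{j=1}^m a^2_{ij}\]
Each \(a_{ij}\) appears in exactly one term, \(a_{ij}^2\), whose derivative is \(2a_{ij}\), so:
\[\tag{8}\mathrm{tr}'_A(A\cdot A^T)=\left(\sum_{i=1}^m\sum_{j=1}^m a^2_{ij}\right)'_A=2\cdot A\]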
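Rules (7) and (8) can be checked entry by entry with finite differences. The following is a small sketch with assumed helper names (`matmul`, `transpose`, `trace`, `numerical_matrix_gradient`) and arbitrary \(2\times 2\) sample matrices.

```python
def matmul(A, B):
    """Return the matrix product A B (matrices as lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Return the transpose of A."""
    return [list(col) for col in zip(*A)]


def trace(M):
    """Return the sum of the diagonal entries of a square matrix M."""
    return sum(M[i][i] for i in range(len(M)))


def numerical_matrix_gradient(f, A, h=1e-6):
    """Central differences: G[i][j] approximates d f(A) / d a_ij."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[i][j] += h
            A_minus[i][j] -= h
            G[i][j] = (f(A_plus) - f(A_minus)) / (2 * h)
    return G


# Sample matrices (assumed values).
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, -1.0], [2.0, 1.5]]

# Rule (7): the gradient of tr(A B) with respect to A should equal B^T.
grad_AB = numerical_matrix_gradient(lambda M: trace(matmul(M, B)), A)
# Rule (8): the gradient of tr(A A^T) with respect to A should equal 2 A.
grad_AAT = numerical_matrix_gradient(lambda M: trace(matmul(M, transpose(M))), A)

print("grad tr(AB)  :", grad_AB, "expected", transpose(B))
print("grad tr(AA^T):", grad_AAT, "expected", [[2 * v for v in row] for row in A])
```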
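4.4 Determinant derivative example: \(|A|'_A=|A|\cdot (A^{-1})^T\)
Let \(A_{m\times m}\) be an invertible square matrix, \(|A|\) its determinant, \(a_{ij}\) any entry of \(A\), and \(A_{ij}\) the cofactor of \(a_{ij}\).
Expanding the determinant along row \(i\) gives:
\[|A|=a_{i1}A_{i1}+a_{i2}A_{i2}+\dots+a_{im}A_{im}\]
None of the cofactors \(A_{i1},\dots,A_{im}\) contains an entry from row \(i\), so they are constants with respect to \(a_{ij}\):
\[\Rightarrow \frac{\partial |A|}{\partial a_{ij}}=\frac{\partial}{\partial a_{ij}}(a_{i1}A_{i1}+a_{i2}A_{i2}+\dots+a_{im}A_{im})=A_{ij}\]
Collecting these partial derivatives for all \(i\) and \(j\) gives the cofactor matrix, which is the transpose of the adjugate matrix \(A^*\):
\[\tag{9}|A|'_A=\begin{bmatrix}A_{11}&A_{12}&\cdots&A_{1m}\\A_{21}&A_{22}&\cdots&A_{2m}\\\vdots&\vdots&\ddots&\vdots\\A_{m1}&A_{m2}&\cdots&A_{mm}\end{bmatrix}=(A^*)^T\]
From the inverse-matrix identity \(A^{-1}=\frac{A^*}{|A|}\), that is \(A^*=|A|\cdot A^{-1}\), it follows that:
\[\tag{10}|A|'_A=|A|\cdot (A^{-1})^T\]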
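The determinant rule can be checked on a small matrix without computing an inverse explicitly. The sketch below assumes helper names `minor`, `det`, and `cofactor_matrix` and a sample \(3\times 3\) matrix. It confirms that the numerical derivative of \(|A|\) with respect to each \(a_{ij}\) equals the cofactor \(A_{ij}\), and that the cofactor matrix \(C\) satisfies \(A\cdot C^T=|A|\cdot I\), which is equivalent to \(C=|A|\cdot(A^{-1})^T\).

```python
def minor(A, i, j):
    """Return A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


def cofactor_matrix(A):
    """Return the matrix whose (i, j) entry is the cofactor of a_ij."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
            for i in range(n)]


# Sample invertible matrix (assumed values).
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
n = len(A)
h = 1e-6

# Numerical derivative of det(A) with respect to each entry a_ij.
grad = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        A_plus = [row[:] for row in A]
        A_minus = [row[:] for row in A]
        A_plus[i][j] += h
        A_minus[i][j] -= h
        grad[i][j] = (det(A_plus) - det(A_minus)) / (2 * h)

C = cofactor_matrix(A)
d = det(A)

# A C^T should equal det(A) times the identity matrix.
A_Ct = [[sum(A[i][k] * C[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)]

print("det(A) =", d)
print("numerical gradient:", grad)
print("cofactor matrix   :", C)
```

Cofactor expansion costs factorial time, so this sketch is only meant for tiny matrices used as a check, not as a practical way to compute determinants.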