On Differentiating Vectors and Matrices

Machine learning algorithms involve a great deal of differentiation with respect to vectors and matrices. This post introduces some common derivative formulas for them.

Common data types can be divided into scalars, vectors, and matrices. Among these, the common derivative operations are: a scalar with respect to a scalar/vector/matrix, a vector with respect to a scalar/vector, and a matrix with respect to a scalar. The remaining combinations (for example, a matrix with respect to a vector) yield higher-order tensors and are not discussed here.

There are two conventions for writing the result of a derivative: numerator-layout notation and denominator-layout notation. Both are correct, and there is no single accepted standard. The book Convex Optimization, which I have been reading recently, generally adopts the numerator layout.

In the numerator-layout convention, the dimensions of each derivative are shown in the table below:

[Table: vector and matrix derivatives]
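As a sketch of the shapes involved, the block below summarizes the standard numerator-layout dimensions. The dimension symbols are assumptions introduced here for illustration: y is a scalar, vector (m x 1), or matrix (m x n), and x is a scalar, vector (n x 1), or matrix (p x q).

```latex
% Result dimensions in numerator layout (symbols m, n, p, q are illustrative assumptions).
\[
\begin{array}{l|ccc}
 & \text{scalar } y
 & \text{vector } \mathbf{y} \in \mathbb{R}^{m}
 & \text{matrix } \mathbf{Y} \in \mathbb{R}^{m \times n} \\ \hline
\text{scalar } x
 & 1 \times 1 & m \times 1 & m \times n \\
\text{vector } \mathbf{x} \in \mathbb{R}^{n}
 & 1 \times n & m \times n & \text{(tensor)} \\
\text{matrix } \mathbf{X} \in \mathbb{R}^{p \times q}
 & q \times p & \text{(tensor)} & \text{(tensor)}
\end{array}
\]
```

The rule of thumb is that the numerator supplies the row dimension and the transposed denominator supplies the column dimension.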

The derivative of a vector or a matrix with respect to a scalar is obtained by differentiating each element with respect to that scalar, so the result has the same shape as the original vector or matrix. The derivative of a vector with respect to a vector yields a matrix, called the Jacobian matrix.

The derivative of a vector with respect to a scalar is:

[Figure: derivative of a vector with respect to a scalar]

The derivative of a scalar with respect to a vector is:

[Figure: derivative of a scalar with respect to a vector]

Derivatives between a matrix and a scalar follow the same pattern. Differentiating a scalar with respect to a matrix, like differentiating a scalar with respect to a vector, yields a result with the transposed shape.
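Written out in numerator layout, these definitions take the following form. This is a sketch using the standard definitions, with the assumed symbols y in R^m (a column vector), x in R^n, and X in R^{p x q}.

```latex
% Vector by scalar (m x 1 column) and scalar by vector (1 x n row).
\[
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x}
\end{bmatrix},
\qquad
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n}
\end{bmatrix}
\]
% Scalar by matrix: the result is q x p, the transposed shape of X.
\[
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{1q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}
\]
```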

Differentiating one vector with respect to another yields the Jacobian matrix:

[Figure: derivative of a vector with respect to a vector]
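In numerator layout, entry (i, j) of the Jacobian is the derivative of the i-th output with respect to the j-th input, so a map from R^n to R^m gives an m x n matrix. The minimal sketch below approximates it with central differences in plain Python. The helper `numerical_jacobian` and the example function `f` are hypothetical names introduced here for illustration, not something from the original post.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x by central differences.

    Returns m rows and n columns with J[i][j] ~ d f_i / d x_j,
    which is the numerator-layout shape (m x n).
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only the j-th input in both directions.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return J


def f(x):
    # Hypothetical example map from R^2 to R^3 (an assumption for this sketch).
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


J = numerical_jacobian(f, [2.0, 3.0])
# Analytic Jacobian is [[x2, x1], [1, 1], [2*x1, 0]],
# which at (2, 3) equals [[3, 2], [1, 1], [4, 0]].
for row in J:
    print([round(v, 4) for v in row])
```

Under the denominator layout the same derivatives would be arranged as the transpose, an n x m matrix.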

Some common matrix derivative formulas are as follows:

[Figures: derivative of a vector with respect to a vector; derivative of a vector with respect to a scalar; derivative of a scalar with respect to a vector (two figures)]
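For reference, a few widely used identities in numerator layout are sketched below. They are standard results and not necessarily the exact set the formula tables contained. The assumptions are that A and a are constant, x is a vector in R^n, and u is a vector-valued function of the scalar x.

```latex
% Assumptions: A and a do not depend on x; u = u(x) depends on the scalar x.
\begin{align*}
\frac{\partial \mathbf{x}}{\partial \mathbf{x}} &= \mathbf{I}, &
\frac{\partial\, \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} &= \mathbf{A}, \\
\frac{\partial\, \mathbf{A}\mathbf{u}}{\partial x} &= \mathbf{A}\,\frac{\partial \mathbf{u}}{\partial x}, \\
\frac{\partial\, \mathbf{a}^{\top}\mathbf{x}}{\partial \mathbf{x}} &= \mathbf{a}^{\top}, &
\frac{\partial\, \mathbf{x}^{\top}\mathbf{x}}{\partial \mathbf{x}} &= 2\mathbf{x}^{\top}, \\
\frac{\partial\, \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} &= \mathbf{x}^{\top}\left(\mathbf{A} + \mathbf{A}^{\top}\right).
\end{align*}
```

Under the denominator layout, the vector-by-vector and scalar-by-vector results above appear transposed; for example, the last identity becomes (A + A^T) x.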