On Differentiating Vectors and Matrices
Machine learning algorithms rely heavily on derivatives involving vectors and matrices. Here we introduce some common derivative formulas for them.
The quantities involved can be divided into scalars, vectors, and matrices. The common derivative operations among them are: a scalar with respect to a scalar, vector, or matrix; a vector with respect to a scalar or vector; and a matrix with respect to a scalar. Other combinations (for example, a vector with respect to a matrix) yield higher-order tensors, which we will not discuss here for now.
There are two conventions for laying out the result of a derivative: numerator-layout notation and denominator-layout notation. Both are correct, and there is currently no single agreed-upon standard; the results under the two conventions are transposes of each other. The book Convex Optimization, which I have been reading recently, generally adopts the numerator layout.
In the numerator-layout convention, the relevant derivative dimensions are shown in the table below:
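As a sketch of the standard numerator-layout shapes, the following assumes (these symbols are introduced here for illustration) a scalar x, a column vector x in R^n, a p-by-q matrix X, a scalar y, a column vector y in R^m, and an m-by-n matrix Y:

```latex
\[
\begin{array}{l|ccc}
 & \text{scalar } y & \text{vector } \mathbf{y}\ (m \times 1) & \text{matrix } \mathbf{Y}\ (m \times n) \\ \hline
\text{scalar } x & \dfrac{\partial y}{\partial x}:\ 1 \times 1 & \dfrac{\partial \mathbf{y}}{\partial x}:\ m \times 1 & \dfrac{\partial \mathbf{Y}}{\partial x}:\ m \times n \\[2ex]
\text{vector } \mathbf{x}\ (n \times 1) & \dfrac{\partial y}{\partial \mathbf{x}}:\ 1 \times n & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}:\ m \times n & \text{(tensor)} \\[2ex]
\text{matrix } \mathbf{X}\ (p \times q) & \dfrac{\partial y}{\partial \mathbf{X}}:\ q \times p & \text{(tensor)} & \text{(tensor)}
\end{array}
\]
```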

The derivative of a vector or a matrix with respect to a scalar is obtained by differentiating each element with respect to that scalar, so the result has the same shape as the original vector or matrix. The derivative of a vector with respect to a vector yields a matrix, called the Jacobian matrix.
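The shapes described above can be checked numerically. The following is a minimal sketch, not a definitive implementation: `numerical_jacobian` is a hypothetical helper written for this illustration, it uses central finite differences, and the test functions f(x) = Ax and g(x) = x^T x are assumptions chosen because their derivatives are known in closed form.

```python
import numpy as np


def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout.

    Entry (i, j) approximates d f_i / d x_j, so the result has one row
    per output component and one column per input component.
    (Hypothetical helper for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    jac = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        diff = np.asarray(f(x + step), dtype=float) - np.asarray(f(x - step), dtype=float)
        jac[:, j] = np.ravel(diff) / (2 * eps)
    return jac


A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x0 = np.array([0.5, -1.0, 2.0])

# Vector with respect to vector: f(x) = A x maps R^3 to R^2,
# so the Jacobian is 2 x 3 and should match A.
J_linear = numerical_jacobian(lambda x: A @ x, x0)

# Scalar with respect to vector: g(x) = x^T x gives a 1 x 3 row
# in numerator layout, and should match 2 x^T.
J_scalar = numerical_jacobian(lambda x: x @ x, x0)

print(J_linear.shape, J_scalar.shape)
```

Treating a scalar-valued function as a one-row Jacobian is a design choice made here so that the same helper covers both cases; it mirrors the numerator-layout rule that the output dimension indexes the rows.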
The derivative of a vector with respect to a scalar is:
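Written out in numerator layout, and assuming y is a column vector with components y_1, ..., y_m and x is a scalar (notation introduced here for illustration), the standard form is:

```latex
\[
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x} \\[1.5ex]
\dfrac{\partial y_2}{\partial x} \\
\vdots \\
\dfrac{\partial y_m}{\partial x}
\end{bmatrix}
\in \mathbb{R}^{m \times 1}
\]
```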
The derivative of a scalar with respect to a vector is:
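Written out in numerator layout, and assuming y is a scalar and x is a column vector with components x_1, ..., x_n (notation introduced here for illustration), the standard form is a row vector:

```latex
\[
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n}
\end{bmatrix}
\in \mathbb{R}^{1 \times n}
\]
```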
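Derivatives between a matrix and a scalar are similar to the above. When differentiating a scalar with respect to a matrix, the result takes the shape of the transposed matrix, just as the derivative of a scalar with respect to a column vector is a row vector.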
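As a sketch in numerator layout, assume Y is an m-by-n matrix with entries y_ij, X is a p-by-q matrix with entries x_ij, and x and y are scalars (all notation introduced here for illustration). The matrix-by-scalar derivative keeps the shape of Y, while the scalar-by-matrix derivative is q-by-p:

```latex
\[
\frac{\partial \mathbf{Y}}{\partial x} =
\begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_{m1}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x}
\end{bmatrix},
\qquad
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{1q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}
\]
```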
When differentiating between two vectors, a Jacobian matrix is obtained:
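Written out in numerator layout, and assuming y is a column vector in R^m and x is a column vector in R^n (notation introduced here for illustration), the Jacobian has one row per component of y and one column per component of x:

```latex
\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\[1.5ex]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
\in \mathbb{R}^{m \times n}
\]
```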
Some common matrix derivative formulas are as follows:
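A few standard identities, stated here in numerator layout as an illustrative sample rather than a complete list. The symbols are assumptions introduced for this sketch: x is a column vector in R^n, a is a constant vector, A is a constant matrix of compatible size (square in the quadratic form), and X is a p-by-q matrix with A of size q-by-p in the trace identity.

```latex
\begin{align*}
\frac{\partial (\mathbf{a}^{\top}\mathbf{x})}{\partial \mathbf{x}} &= \mathbf{a}^{\top} \\
\frac{\partial (\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} &= \mathbf{A} \\
\frac{\partial (\mathbf{x}^{\top}\mathbf{A})}{\partial \mathbf{x}} &= \mathbf{A}^{\top} \\
\frac{\partial (\mathbf{x}^{\top}\mathbf{x})}{\partial \mathbf{x}} &= 2\mathbf{x}^{\top} \\
\frac{\partial (\mathbf{x}^{\top}\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} &= \mathbf{x}^{\top}(\mathbf{A} + \mathbf{A}^{\top}) \\
\frac{\partial\, \operatorname{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} &= \mathbf{A}
\end{align*}
```

In denominator layout each right-hand side is transposed, for example the quadratic form gives (A + A^T) x.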
