Math Insight

The definition of differentiability in multivariable calculus is a bit technical. There are subtleties to watch out for, as one has to remember that the existence of the derivative is a more stringent condition than the existence of partial derivatives. But, in the end, if our function is nice enough so that it is differentiable, then the derivative itself isn't too complicated. It's a fairly straightforward generalization of the single variable derivative.

In single variable calculus, you learned that the derivative of a function $f: \R \to \R$ at a single point is just a real number, the rate of increase of the function (i.e., slope of the graph) at that point. We could think of that number as a $1 \times 1$ matrix, so if we like, we could denote the derivative of $f(x)$ at $x=a$ as \begin{align*} Df(a) = \left[\diff{f}{x}(a)\right]. \end{align*}

For a scalar-valued function of multiple variables, such as $f(x,y)$ or $f(x,y,z)$, we can think of the partial derivatives as the rates of increase of the function in the coordinate directions. If the function is differentiable, then the derivative is simply a row matrix containing all of these partial derivatives, which we call the matrix of partial derivatives (also called the Jacobian matrix). For $f: \R^n \to \R$, viewed as a function $f(\vc{x})$, where $\vc{x} = (x_1,x_2,\ldots,x_n)$, the $1 \times n$ matrix of partial derivatives at $\vc{x}=\vc{a}$ is \begin{align*} Df(\vc{a}) = \left[\pdiff{f}{x_1}(\vc{a}) \ \pdiff{f}{x_2}(\vc{a}) \ \ldots \ \pdiff{f}{x_n}(\vc{a})\right]. \end{align*}
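As a rough numerical illustration of this row matrix, the sketch below approximates each partial derivative with a central difference. The function `f(x, y) = x^2 y + y^3`, the helper name `partial_derivatives`, and the step size `h` are all assumptions chosen for this example; none of them come from the text above.

```python
def partial_derivatives(f, a, h=1e-6):
    """Approximate the 1 x n matrix of partial derivatives of f at the point a.

    Uses a central difference in each coordinate direction. This is only a
    numerical approximation, and it says nothing about differentiability.
    """
    row = []
    for i in range(len(a)):
        forward = list(a)
        backward = list(a)
        forward[i] += h   # step forward in the i-th coordinate direction
        backward[i] -= h  # step backward in the i-th coordinate direction
        row.append((f(forward) - f(backward)) / (2 * h))
    return row


def f(x):
    # Hypothetical example function f(x, y) = x^2 y + y^3
    return x[0] ** 2 * x[1] + x[1] ** 3


Df = partial_derivatives(f, [1.0, 2.0])
print(Df)  # close to [4.0, 13.0]
```

By hand, the partial derivatives are $2xy$ and $x^2 + 3y^2$, which at $(1,2)$ give the row matrix $[4 \ 13]$, so the printed values should agree with these up to small numerical error.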

The last generalization is to vector-valued functions, $\vc{f}: \R^n \to \R^m$. Here, $\vc{f}(\vc{x})$ is a function of the vector $\vc{x} = (x_1,x_2,\ldots,x_n)$ whose output is a vector of $m$ components. We could write $\vc{f}$ in terms of its components as \begin{gather*} \vc{f}(\vc{x}) = (f_1(\vc{x}),f_2(\vc{x}), \cdots, f_m(\vc{x})) = \left[\begin{array}{c} f_1(\vc{x})\\f_2(\vc{x})\\ \vdots\\ f_m(\vc{x}) \end{array}\right]. \end{gather*} (Recall that when we view vectors as matrices, we view them as column matrices, so the components are stacked up on top of each other.)

To form the matrix of partial derivatives, we think of $\vc{f}(\vc{x})$ as a column matrix, where each component is a scalar-valued function. The matrix of partial derivatives of each component $f_i(\vc{x})$ would be a $1 \times n$ row matrix, as above. We just stack these row matrices on top of each other to form a larger matrix. We get that the full $m \times n$ matrix of partial derivatives at $\vc{x}=\vc{a}$ is \begin{gather*} D\vc{f}(\vc{a})= \left[ \begin{array}{cccc} \displaystyle\pdiff{f_1}{x_1}(\vc{a})& \displaystyle\pdiff{f_1}{x_2}(\vc{a})& \ldots & \displaystyle\pdiff{f_1}{x_n}(\vc{a})\\ \displaystyle\pdiff{f_2}{x_1}(\vc{a})& \displaystyle\pdiff{f_2}{x_2}(\vc{a})& \ldots & \displaystyle\pdiff{f_2}{x_n}(\vc{a})\\ \vdots & \vdots & \ddots & \vdots\\ \displaystyle\pdiff{f_m}{x_1}(\vc{a})& \displaystyle\pdiff{f_m}{x_2}(\vc{a})& \ldots & \displaystyle\pdiff{f_m}{x_n}(\vc{a}) \end{array} \right]. \end{gather*}
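The stacking of rows can be sketched numerically as well. The example below assumes a hypothetical function $\vc{f}: \R^2 \to \R^3$ given by $\vc{f}(x,y) = (xy, \ x + y^2, \ x^2)$ and a helper named `jacobian`; both are made up for illustration, and the central-difference step `h` is an arbitrary choice.

```python
def jacobian(f, a, h=1e-6):
    """Approximate the m x n matrix of partial derivatives of f at the point a.

    Row i holds the partial derivatives of component f_i; column j holds the
    partial derivatives with respect to x_j.
    """
    m = len(f(a))  # number of output components
    n = len(a)     # number of input variables
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(a)
        backward = list(a)
        forward[j] += h
        backward[j] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def f(x):
    # Hypothetical example: f(x, y) = (x y, x + y^2, x^2)
    return [x[0] * x[1], x[0] + x[1] ** 2, x[0] ** 2]


J = jacobian(f, [1.0, 2.0])
for row in J:
    print(row)
```

Computing the partial derivatives by hand gives rows $[y \ x]$, $[1 \ 2y]$, and $[2x \ 0]$, so at $(1,2)$ the printed $3 \times 2$ matrix should be close to the rows $[2 \ 1]$, $[1 \ 4]$, $[2 \ 0]$.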

Though we should probably refer to the derivative of $\vc{f}$ as the linear transformation that is associated with the matrix $D\vc{f}(\vc{a})$, it's fine at this level to refer to the matrix of partial derivatives $D\vc{f}(\vc{a})$ as “the derivative” of $\vc{f}$ at the point $\vc{a}$ (assuming that $\vc{f}$ is differentiable at $\vc{a}$, of course).
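One way to see the matrix acting as a linear transformation is through the standard linear approximation $\vc{f}(\vc{a}+\vc{h}) \approx \vc{f}(\vc{a}) + D\vc{f}(\vc{a})\vc{h}$ for small $\vc{h}$, a fact not spelled out on this page. The sketch below checks it for a hypothetical function $\vc{f}(x,y) = (xy, \ x + y^2, \ x^2)$ with its matrix of partial derivatives written out by hand; the function, the point, the displacement, and the helper `mat_vec` are all assumptions made for this example.

```python
def f(x):
    # Hypothetical example: f(x, y) = (x y, x + y^2, x^2)
    return [x[0] * x[1], x[0] + x[1] ** 2, x[0] ** 2]


def Df(x):
    # Matrix of partial derivatives of the example f, computed by hand
    return [[x[1], x[0]],
            [1.0, 2.0 * x[1]],
            [2.0 * x[0], 0.0]]


def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the column vector v."""
    return [sum(M_ij * v_j for M_ij, v_j in zip(row, v)) for row in M]


a = [1.0, 2.0]
h = [0.01, -0.02]  # a small displacement from a

exact = f([a_j + h_j for a_j, h_j in zip(a, h)])
linear = [f_i + d_i for f_i, d_i in zip(f(a), mat_vec(Df(a), h))]
errors = [ex - lin for ex, lin in zip(exact, linear)]

print(exact)
print(linear)
print(errors)  # each entry is below 1e-3 in magnitude
```

The errors are of second order in the size of $\vc{h}$, which is why they are much smaller than the displacement itself.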

Examples of calculating the derivative may be helpful in making sure you understand the matrix of partial derivatives.