Math Insight

Derivation of the directional derivative and the gradient

In the introduction to the directional derivative and the gradient, we illustrated the concepts behind the directional derivative. The main points were that, given a multivariable scalar-valued function $f : \R^n \to \R$,

  1. the directional derivative $D_{\vc{u}}f$ is a generalization of the partial derivative to the slope of $f$ in the direction of an arbitrary unit vector $\vc{u}$,
  2. the gradient $\nabla f$ is a vector that points in the direction of the greatest upward slope and whose length is the directional derivative in that direction, and
  3. the directional derivative is the dot product between the gradient and the unit vector: $D_{\vc{u}}f = \nabla f \cdot \vc{u}$.

This introduction is missing one important piece of information: what exactly is the gradient? How can we calculate it from $f$? It's actually pretty simple to calculate an expression for the gradient, if you can remember what it means for a function to be differentiable.

What does it mean for a function $f(\vc{x})$ to be differentiable at the point $\vc{x}=\vc{a}$? The function must be essentially linear locally, i.e., there must be a linear approximation \begin{align*} L(\vc{x}) = f(\vc{a}) + Df(\vc{a})(\vc{x}-\vc{a}) \end{align*} that is very close to $f(\vc{x})$ for all $\vc{x}$ near $\vc{a}$. Here $Df(\vc{a})$ is the matrix of partial derivatives of $f$ at $\vc{a}$. Differentiability implies that, for all directions emanating from $\vc{a}$, $f(\vc{x})$ and $L(\vc{x})$ have the same slope at $\vc{a}$. We can therefore calculate the directional derivatives of $f$ at $\vc{a}$ using $L$ rather than $f$.
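To make "very close" concrete, here is a minimal Python sketch that compares a function with its linear approximation along one direction. The function `f`, the point `a`, and the unit vector `direction` are hypothetical choices made only for illustration; they do not come from the discussion above. If $L$ is a good linear approximation, the error $|f - L|$ should shrink faster than the distance $h$ from $\vc{a}$, so the ratio $|f - L|/h$ should tend to zero.

```python
import math

def f(x, y):
    # Hypothetical example function (an assumption for illustration).
    return x**2 * y + math.sin(y)

def Df(x, y):
    # The 1 x 2 matrix of partial derivatives of f, worked out by hand.
    return (2 * x * y, x**2 + math.cos(y))

def L(x, y, a):
    # Linear approximation L(x) = f(a) + Df(a)(x - a).
    fx, fy = Df(*a)
    return f(*a) + fx * (x - a[0]) + fy * (y - a[1])

a = (1.0, 2.0)           # base point (assumed)
direction = (0.6, 0.8)   # a unit vector, since 0.36 + 0.64 = 1

ratios = []
for h in (1e-1, 1e-2, 1e-3):
    x = a[0] + h * direction[0]
    y = a[1] + h * direction[1]
    error = abs(f(x, y) - L(x, y, a))
    ratios.append(error / h)
    print(f"h={h:g}  |f - L|={error:.3e}  |f - L|/h={error / h:.3e}")
```

The printed ratio should decrease roughly in proportion to $h$, which is the numerical signature of $f$ and $L$ having the same slope at $\vc{a}$ in that direction.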

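Using the definition of the directional derivative, we can calculate the directional derivative of $f$ at $\vc{a}$ in the direction of $\vc{u}$. Since $L(\vc{a}) = f(\vc{a})$ and $L(\vc{a}+h\vc{u}) = f(\vc{a}) + Df(\vc{a})(h\vc{u})$, we obtain \begin{align*} D_{\vc{u}}f(\vc{a}) &= D_{\vc{u}}L(\vc{a}) = \lim_{h \to 0} \frac{L(\vc{a}+h\vc{u}) - L(\vc{a})}{h}\\ &= \lim_{h \to 0} \frac{f(\vc{a}) + Df(\vc{a})(h\vc{u}) - f(\vc{a})}{h}\\ &= \lim_{h \to 0} \frac{hDf(\vc{a})\vc{u}}{h} = \lim_{h \to 0}~ Df(\vc{a})\vc{u} = Df(\vc{a})\vc{u}. \end{align*} Since $Df(\vc{a})$ is a $1 \times n$ row vector and $\vc{u}$ is an $n \times 1$ column vector, the matrix-vector product is a scalar. We can rewrite this product as a dot product between two vectors by reshaping the $1 \times n$ matrix of partial derivatives into a vector. We denote this vector by $\nabla f$ and call it the gradient. In other words, the gradient is simply the vector of partial derivatives, \begin{align*} \nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right), \end{align*} which answers the question of how to calculate it from $f$. We obtain that the directional derivative is \begin{align*} D_{\vc{u}}f(\vc{a}) = \nabla f(\vc{a}) \cdot \vc{u} \end{align*} as promised.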
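The result can be checked numerically. The following is a rough sketch, not a definitive implementation: the function `f`, the point `a`, and the unit vector `u` are hypothetical examples, and the helper `directional_derivative` approximates the limit with a symmetric difference quotient at a small fixed `h` (an assumption chosen for numerical accuracy; the definition above uses a one-sided quotient, which has the same limit for differentiable $f$).

```python
import math

def f(x, y):
    # Hypothetical example function (an assumption for illustration).
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Gradient of f: the vector of partial derivatives, worked out by hand.
    return (2 * x * y, x**2 + math.cos(y))

def directional_derivative(func, a, u, h=1e-6):
    # Symmetric difference quotient approximating the limit as h -> 0.
    forward = func(a[0] + h * u[0], a[1] + h * u[1])
    backward = func(a[0] - h * u[0], a[1] - h * u[1])
    return (forward - backward) / (2 * h)

a = (1.0, 2.0)   # base point (assumed)
u = (0.6, 0.8)   # a unit vector, since 0.36 + 0.64 = 1

grad = grad_f(*a)
numeric = directional_derivative(f, a, u)
exact = grad[0] * u[0] + grad[1] * u[1]   # gradient dotted with u
print("difference quotient:", numeric)
print("gradient dot u:     ", exact)

# In the direction of the gradient itself, the slope equals the gradient's length.
norm = math.hypot(*grad)
steepest = (grad[0] / norm, grad[1] / norm)
numeric_steepest = directional_derivative(f, a, steepest)
print("slope along gradient:", numeric_steepest, " |grad f|:", norm)
```

The two printed values in each pair should agree to many decimal places. The last line illustrates the second main point from the summary: the length of the gradient is the directional derivative in the direction of the gradient.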