In the introduction to the directional derivative and the gradient, we illustrated the concepts behind the directional derivative. The main points were that, given a multivariable scalar-valued function $f : \R^n \to \R$,
- the directional derivative $D_{\vc{u}}f$ is a generalization of the partial derivative to the slope of $f$ in the direction of an arbitrary unit vector $\vc{u}$,
- the gradient $\nabla f$ is a vector that points in the direction of the greatest upward slope and whose length is the directional derivative in that direction, and
- the directional derivative is the dot product between the gradient and the unit vector: $D_{\vc{u}}f = \nabla f \cdot \vc{u}$.
This introduction is missing one important piece of information: what exactly is the gradient? How can we calculate it from $f$? It's actually pretty simple to calculate an expression for the gradient, if you can remember what it means for a function to be differentiable.
What does it mean for a function $f(\vc{x})$ to be differentiable at the point $\vc{x}=\vc{a}$? The function must be locally essentially linear, i.e., there must be a linear approximation \begin{align*} L(\vc{x}) = f(\vc{a}) + Df(\vc{a})(\vc{x}-\vc{a}) \end{align*} that is very close to $f(\vc{x})$ for all $\vc{x}$ near $\vc{a}$, where $Df(\vc{a})$ is the matrix of partial derivatives of $f$ at $\vc{a}$. The definition of differentiability means that, for all directions emanating out of $\vc{a}$, $f(\vc{x})$ and $L(\vc{x})$ have the same slope at $\vc{a}$. We can therefore calculate the directional derivatives of $f$ at $\vc{a}$ using $L$ rather than $f$.
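To make "very close" concrete, here is a minimal numerical sketch in Python. The function $f(x,y) = x^2 y$, the point $\vc{a} = (1,2)$, the unit direction $(3/5, 4/5)$, and all helper names are illustrative assumptions, not part of the discussion above. The script compares $f$ with its linear approximation $L$ at points approaching $\vc{a}$ and prints the error divided by the distance from $\vc{a}$, which should shrink toward zero.

```python
# Hypothetical example function f(x, y) = x^2 * y (chosen only for illustration).
def f(x, y):
    return x * x * y

# Its matrix of partial derivatives Df(x, y) = [2xy, x^2], computed by hand.
def Df(x, y):
    return (2 * x * y, x * x)

# The point a at which we linearize (also an illustrative choice).
a = (1.0, 2.0)

def L(x, y):
    """Linear approximation of f around a: f(a) + Df(a)(x - a)."""
    dfx, dfy = Df(*a)
    return f(*a) + dfx * (x - a[0]) + dfy * (y - a[1])

def relative_error(h, direction=(0.6, 0.8)):
    """|f(x) - L(x)| / ||x - a|| at x = a + h * direction (direction has length 1)."""
    x = a[0] + h * direction[0]
    y = a[1] + h * direction[1]
    return abs(f(x, y) - L(x, y)) / abs(h)

# The ratio should decrease as x gets closer to a.
for h in (1e-1, 1e-2, 1e-3):
    print(h, relative_error(h))
```

For this particular $f$, the printed ratio decreases roughly in proportion to $h$, which is what one would expect if $L$ matches the slope of $f$ at $\vc{a}$ in that direction.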
Using the definition of directional derivative, we can calculate the directional derivative of $f$ at $\vc{a}$ in the direction of $\vc{u}$. Since $L(\vc{a}) = f(\vc{a})$ and $L(\vc{a}+h\vc{u}) = f(\vc{a}) + hDf(\vc{a})\vc{u}$, the $f(\vc{a})$ terms cancel in the numerator: \begin{align*} D_{\vc{u}}f(\vc{a}) &= D_{\vc{u}}L(\vc{a}) = \lim_{h \to 0} \frac{L(\vc{a}+h\vc{u}) - L(\vc{a})}{h}\\ &= \lim_{h \to 0} \frac{hDf(\vc{a})\vc{u}}{h} = \lim_{h \to 0}~ Df(\vc{a})\vc{u} = Df(\vc{a})\vc{u}. \end{align*} Since $Df(\vc{a})$ is a $1 \times n$ row vector and $\vc{u}$ is an $n \times 1$ column vector, the matrix-vector product is a scalar. We can rewrite this product as a dot product between two vectors by reshaping the $1 \times n$ matrix of partial derivatives into a vector. We denote this vector by $\nabla f$ and call it the gradient. We obtain that the directional derivative is \begin{align*} D_{\vc{u}}f(\vc{a}) = \nabla f(\vc{a}) \cdot \vc{u} \end{align*} as promised.
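As a sanity check on the formula, the following Python sketch compares the limit definition, approximated with a small but finite $h$, against the dot product $\nabla f(\vc{a}) \cdot \vc{u}$. The function $f(x,y) = x^2 y$, the point $\vc{a} = (1,2)$, the unit vector $\vc{u} = (3/5, 4/5)$, and the helper names are assumptions made for illustration. For this choice, $\nabla f(\vc{a}) = (4, 1)$, so the dot product is $16/5 = 3.2$.

```python
# Hypothetical example: f(x, y) = x^2 * y, a = (1, 2), u = (3/5, 4/5).
def f(x, y):
    return x * x * y

def grad_f(x, y):
    """Gradient of f, computed by hand: (2xy, x^2)."""
    return (2 * x * y, x * x)

def directional_derivative_limit(f, a, u, h=1e-6):
    """Approximate the limit definition (f(a + h u) - f(a)) / h with a small finite h."""
    return (f(a[0] + h * u[0], a[1] + h * u[1]) - f(a[0], a[1])) / h

def directional_derivative_gradient(a, u):
    """Directional derivative as the dot product of the gradient at a with u."""
    g = grad_f(*a)
    return g[0] * u[0] + g[1] * u[1]

a = (1.0, 2.0)
u = (0.6, 0.8)  # unit vector: 0.6^2 + 0.8^2 = 1

approx = directional_derivative_limit(f, a, u)
exact = directional_derivative_gradient(a, u)
print(approx, exact)
```

The two printed values should agree to several decimal places; the small remaining gap comes from using a finite $h$ rather than taking the limit.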