Special cases of the multivariable chain rule
The general statement of the multivariable chain rule is the following.
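Chain Rule: For differentiable functions $g:\mathbf{R}^m \to \mathbf{R}^k$ and $f:\mathbf{R}^k \to \mathbf{R}^n$, the derivative matrix of the composition $h = f \circ g$ (i.e., $h(\mathbf{x}) = f(g(\mathbf{x}))$) at the point $\mathbf{a}$ is the product of the derivative matrices for $f$ and $g$:
$$Dh(\mathbf{a}) = D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a}))\,Dg(\mathbf{a}). \tag{1}$$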
In this form, the multivariable chain rule looks similar to the one-variable chain rule:
$$\frac{d}{dx}(f \circ g)(x) = \frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x).$$
Using the above general form may be the easiest way to learn the chain rule. If you are comfortable forming derivative matrices, multiplying matrices, and using the one-variable chain rule, then using the chain rule (1) doesn't require memorizing a series of formulas and determining which formula applies to a given problem.
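The matrix form (1) can be checked numerically. The sketch below is only an illustration: the functions `g` and `f`, the point `a`, and the helpers `jacobian` and `matmul` are hypothetical choices made for this example, and the derivative matrices are approximated by central differences rather than computed exactly.

```python
import math

def jacobian(func, a, eps=1e-6):
    """Approximate the derivative matrix of func at a by central differences.

    func maps a list of floats to a list of floats.
    The result is a list of rows, one row per output component.
    """
    cols = []
    for j in range(len(a)):
        up = list(a)
        dn = list(a)
        up[j] += eps
        dn[j] -= eps
        f_up, f_dn = func(up), func(dn)
        # Column j holds the partial derivatives with respect to input j.
        cols.append([(u - d) / (2 * eps) for u, d in zip(f_up, f_dn)])
    # Transpose so that rows index outputs and columns index inputs.
    return [list(row) for row in zip(*cols)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical example functions: g: R^2 -> R^3 and f: R^3 -> R^2.
def g(x):
    s, t = x
    return [s * t, s + t, math.sin(s)]

def f(y):
    u, v, w = y
    return [u * v, v + w * w]

def h(x):
    return f(g(x))

a = [0.7, -1.2]
lhs = jacobian(h, a)                              # Dh(a), a 2x2 matrix
rhs = matmul(jacobian(f, g(a)), jacobian(g, a))   # Df(g(a)) Dg(a)
max_err = max(abs(l - r)
              for lrow, rrow in zip(lhs, rhs)
              for l, r in zip(lrow, rrow))
print(max_err)  # small, limited only by finite-difference error
```

Note that `jacobian(f, g(a))` evaluates the derivative of `f` at the point `g(a)`, not at `a`, which mirrors the factor $Df(g(\mathbf{a}))$ in (1).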
On the other hand, having a few special case formulas available can save some work. For a given type of problem, you can form the matrices and calculate their product once to obtain a formula valid for that particular type of problem. Then, for each example of that problem type, you can plug the particular functions into the special case formula and get to the final result more quickly.
In the following, we derive formulas for a few special cases. In each case, the outer function f is a scalar-valued function. We look at two groups of special cases: when g is a function of one variable and when g is a function of two variables.
g is a function of one variable
When the inner function $g$ is a function of one variable, $g:\mathbf{R} \to \mathbf{R}^n$, and $f$ is a scalar-valued function, $f:\mathbf{R}^n \to \mathbf{R}$, then the composition $h(t) = f(g(t))$ is just a scalar-valued function of a single variable, $h:\mathbf{R} \to \mathbf{R}$. Its derivative is just a single number $h'(t)$. We show how to express this single number as a dot product of two vectors.
Since $g$ is a function of a single variable, we can view $g$ as parametrizing a curve. We can write the derivative of the parametrized curve as a vector:
$$Dg(t) = \begin{bmatrix} \dfrac{dg_1}{dt}(t) \\ \dfrac{dg_2}{dt}(t) \\ \vdots \\ \dfrac{dg_n}{dt}(t) \end{bmatrix} = (g_1'(t), g_2'(t), \ldots, g_n'(t)) = g'(t).$$
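Since we are considering the case where $f$ is a scalar-valued function, its derivative matrix can be viewed as the gradient vector:
$$\nabla f(\mathbf{x}) = \left(\frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{x})\right).$$
The chain rule (1) then expresses $h'(t)$ as the dot product of these two vectors:
$$h'(t) = \nabla f(g(t)) \cdot g'(t). \tag{2}$$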
If $g(t)$ is a one-dimensional function $x = g(t)$ and $f(x)$ is a function of a single variable, then we are back to the single-variable chain rule and equation (2) becomes
$$h'(t) = f'(g(t))\,g'(t).$$
If we wrote $g(t) = x(t)$, we could also write this chain rule as
$$\frac{dh}{dt} = \frac{df}{dx}\frac{dx}{dt},$$
where we neglect to write the arguments of each function.
If, on the other hand, $g(t)$ is a two-dimensional function $(x,y) = g(t) = (g_1(t), g_2(t))$ and $f(\mathbf{x})$ is a function of two variables, $f(x,y)$, then we can multiply out the dot product of equation (2) to write it as
$$h'(t) = \frac{\partial f}{\partial x}(g(t))\,g_1'(t) + \frac{\partial f}{\partial y}(g(t))\,g_2'(t).$$
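If we write $g(t) = (g_1(t), g_2(t)) = (x(t), y(t))$ and its derivative as $g'(t) = \left(\frac{dx}{dt}, \frac{dy}{dt}\right)$, then we can write this formula in a way that some people find easier to memorize:
$$\frac{dh}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$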
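As a quick worked instance of this two-variable formula, take $f(x,y) = x^2 y$ and $g(t) = (\cos t, \sin t)$. These particular functions are chosen here only for illustration.

```latex
\begin{align*}
\frac{dh}{dt}
  &= \frac{\partial f}{\partial x}\frac{dx}{dt}
   + \frac{\partial f}{\partial y}\frac{dy}{dt} \\
  &= (2xy)(-\sin t) + (x^2)(\cos t) \\
  &= -2\cos t\,\sin^2 t + \cos^3 t,
\end{align*}
```

where the last line substitutes $x = \cos t$ and $y = \sin t$. Differentiating $h(t) = \cos^2 t\,\sin t$ directly with the product rule gives the same result.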
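We can write a similar expression for three-dimensional $g(t) = (g_1(t), g_2(t), g_3(t)) = (x(t), y(t), z(t))$ and $f(x,y,z)$:
$$h'(t) = \frac{\partial f}{\partial x}(g(t))\,g_1'(t) + \frac{\partial f}{\partial y}(g(t))\,g_2'(t) + \frac{\partial f}{\partial z}(g(t))\,g_3'(t).$$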
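The three-variable formula can be sketched in code as a dot product of the gradient with the curve's velocity. The helix `g` and the function `f` below are hypothetical examples, with their derivatives written out by hand; a central-difference estimate of $h'(t)$ serves as an independent check.

```python
import math

def g(t):
    # Hypothetical curve: a helix (x(t), y(t), z(t)).
    return (math.cos(t), math.sin(t), t)

def g_prime(t):
    # Derivative of the helix, component by component.
    return (-math.sin(t), math.cos(t), 1.0)

def f(x, y, z):
    # Hypothetical scalar-valued outer function.
    return x * y + z ** 2

def grad_f(x, y, z):
    # Gradient of f: (df/dx, df/dy, df/dz).
    return (y, x, 2 * z)

def h_prime(t):
    # Chain rule: h'(t) = grad f(g(t)) . g'(t)
    return sum(df * dg for df, dg in zip(grad_f(*g(t)), g_prime(t)))

def h_prime_numeric(t, eps=1e-6):
    # Central-difference estimate of the derivative of h(t) = f(g(t)).
    return (f(*g(t + eps)) - f(*g(t - eps))) / (2 * eps)

t0 = 0.9
exact = h_prime(t0)
approx = h_prime_numeric(t0)
print(exact, approx)  # the two values should agree to several digits
```

For this choice, $h(t) = \cos t \sin t + t^2$, so the chain rule result should also match $\cos 2t + 2t$.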
g is a function of two variables
If $g(s,t)$ is a vector-valued function of two variables, $g:\mathbf{R}^2 \to \mathbf{R}^n$, then we can no longer write its derivative as a vector. Instead, the derivative $Dg(s,t)$ will be a matrix of partial derivatives with two columns, i.e., an $n \times 2$ matrix. Since the derivative of $f:\mathbf{R}^n \to \mathbf{R}$ is a $1 \times n$ matrix, the derivative of the composition $h(s,t) = f(g(s,t))$ will be a $1 \times 2$ matrix:
$$Dh(s,t) = \begin{bmatrix} \dfrac{\partial h}{\partial s}(s,t) & \dfrac{\partial h}{\partial t}(s,t) \end{bmatrix}.$$
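If $g$ is a scalar-valued function, $x(s,t) = g(s,t)$, then its derivative is the $1 \times 2$ matrix
$$Dg(s,t) = \begin{bmatrix} \dfrac{\partial g}{\partial s}(s,t) & \dfrac{\partial g}{\partial t}(s,t) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \end{bmatrix}.$$
In this case $f$ is a function of a single variable, so $Df(x) = f'(x)$ and the chain rule (1) reduces to
$$\begin{bmatrix} \dfrac{\partial h}{\partial s}(s,t) & \dfrac{\partial h}{\partial t}(s,t) \end{bmatrix} = f'(g(s,t)) \begin{bmatrix} \dfrac{\partial g}{\partial s}(s,t) & \dfrac{\partial g}{\partial t}(s,t) \end{bmatrix}.$$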
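If $g$ is a two-dimensional function $(x(s,t), y(s,t)) = g(s,t) = (g_1(s,t), g_2(s,t))$ and $f$ is a function of two variables, $f(x,y)$, then $Dg(s,t)$ is a $2 \times 2$ matrix and $Df(x,y)$ is a $1 \times 2$ matrix. The chain rule (1) can be written as
$$\begin{bmatrix} \dfrac{\partial h}{\partial s}(s,t) & \dfrac{\partial h}{\partial t}(s,t) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f}{\partial x}(g(s,t)) & \dfrac{\partial f}{\partial y}(g(s,t)) \end{bmatrix} \begin{bmatrix} \dfrac{\partial g_1}{\partial s}(s,t) & \dfrac{\partial g_1}{\partial t}(s,t) \\ \dfrac{\partial g_2}{\partial s}(s,t) & \dfrac{\partial g_2}{\partial t}(s,t) \end{bmatrix}.$$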
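Multiplying out the matrix product gives one formula for each column of $Dh(s,t)$. As an illustration, take $f(x,y) = xy$ and $g(s,t) = (s+t,\, s-t)$. These particular functions are an assumption made only for this example.

```latex
\begin{align*}
\frac{\partial h}{\partial s}
  &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial s}
   + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}
   = y \cdot 1 + x \cdot 1
   = (s - t) + (s + t) = 2s, \\
\frac{\partial h}{\partial t}
  &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial t}
   + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}
   = y \cdot 1 + x \cdot (-1)
   = (s - t) - (s + t) = -2t.
\end{align*}
```

This agrees with differentiating $h(s,t) = (s+t)(s-t) = s^2 - t^2$ directly.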
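Lastly, in three dimensions, with functions $g:\mathbf{R}^2 \to \mathbf{R}^3$ and $f:\mathbf{R}^3 \to \mathbf{R}$, we can write the chain rule in matrix form as
$$\begin{bmatrix} \dfrac{\partial h}{\partial s}(s,t) & \dfrac{\partial h}{\partial t}(s,t) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f}{\partial x}(g(s,t)) & \dfrac{\partial f}{\partial y}(g(s,t)) & \dfrac{\partial f}{\partial z}(g(s,t)) \end{bmatrix} \begin{bmatrix} \dfrac{\partial g_1}{\partial s}(s,t) & \dfrac{\partial g_1}{\partial t}(s,t) \\ \dfrac{\partial g_2}{\partial s}(s,t) & \dfrac{\partial g_2}{\partial t}(s,t) \\ \dfrac{\partial g_3}{\partial s}(s,t) & \dfrac{\partial g_3}{\partial t}(s,t) \end{bmatrix}.$$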
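A minimal sketch of this $1 \times 3$ by $3 \times 2$ product follows. The surface parametrization `g`, the function `f`, and their hand-written derivative matrices `Dg` and `Df` are hypothetical choices for illustration.

```python
import math

def g(s, t):
    # Hypothetical parametrized surface (x, y, z) = g(s, t).
    return (s * math.cos(t), s * math.sin(t), s * s)

def Dg(s, t):
    # 3x2 derivative matrix: rows are g1, g2, g3; columns are d/ds, d/dt.
    return [
        [math.cos(t), -s * math.sin(t)],
        [math.sin(t), s * math.cos(t)],
        [2 * s, 0.0],
    ]

def f(x, y, z):
    # Hypothetical scalar-valued outer function.
    return x * x + y * y + z

def Df(x, y, z):
    # 1x3 derivative matrix of f, stored as a flat list.
    return [2 * x, 2 * y, 1.0]

def Dh(s, t):
    # Chain rule: Dh(s,t) = Df(g(s,t)) Dg(s,t), a 1x2 matrix.
    row = Df(*g(s, t))
    mat = Dg(s, t)
    return [sum(row[i] * mat[i][j] for i in range(3)) for j in range(2)]

dh = Dh(1.5, 0.4)
print(dh)  # [dh/ds, dh/dt] at (s, t) = (1.5, 0.4)
```

For this choice the composition simplifies to $h(s,t) = 2s^2$, so the product should come out to $4s$ in the first entry and $0$ in the second, up to rounding.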