Maxima and Minima of an n-th Dimensional Function - The Hessian Matrix
Updated May 25, 2024
“As you will find in multivariable calculus, there is often a
number of solutions for any given problem.”
- John Nash
Local Maximum, Local Minimum, and Extremum
Consider an $n$-dimensional function $z=f(\mathbf{x})$ defined
by $f: D\subseteq \R^n \to \R$, where $\mathbf{x}=(x_1, x_2,
\ldots, x_n)$ is an $n$-tuple of real numbers. Also consider a point
$\mathbf{a} = (a_1, a_2, \ldots, a_n)$ defined in the domain
$D\subseteq\R^n$.
Local Maximum. The function $z=f(\mathbf{x})$
attains its local maximum at $\mathbf{a}$ if there exists a
neighborhood $N_\varepsilon(\mathbf{a})\in D$ of $\mathbf{a}$
such that $f(\mathbf{x})\leq f(\mathbf{a})$, $\forall
\mathbf{x}\in N_\varepsilon(\mathbf{a}) \backslash
\{\mathbf{a}\}$.
Local Minimum.
The function $z=f(\mathbf{x})$ attains its local minimum at
$\mathbf{a}$ if there exists a neighborhood $N_\varepsilon(\mathbf{a})\in
D$ of $\mathbf{a}$ such that $f(\mathbf{x})\geq f(\mathbf{a})$,
$\forall \mathbf{x}\in N_\varepsilon(\mathbf{a}) \backslash
\{\mathbf{a}\}$.
Extremum.
The function $z=f(\mathbf{x})$ attains its maximum, or minimum
at point $\mathbf{a}$ is generally called an extremum at point
$\mathbf{a}$. Point $\mathbf{a}$ might sometimes be
called the extreme point of the function and $f(\mathbf{a})$
is called the extreme value of the function $z=f(\mathbf{x})$.
We call the plural of maximum maxima, the plural of minimum minima, and the
plural of extremum extrema.
If the function $z=f(\mathbf{x})$ has continuous first
partial derivatives in some neighborhood of point
$\mathbf{a}$ and obtain its extrema at
$\mathbf{a}$, then
Stationary Points, Critical Points, and Saddle Points
Consider an $n$-dimensional function $z=f(\mathbf{x})$, $f:
D\subseteq \R^n \to \R$, where $\mathbf{x}=(x_1, x_2, \ldots,
x_n)$ is an $n$-tuple of real numbers. Assume that the function is
defined at a point $\mathbf{a} = (a_1, a_2, \ldots, a_n)\in
D\subseteq\R^n$.
Stationary Point.
The points where all of the partial derivatives of $f$ equals 0 is called
the stationary points of $f$. That is,
Critical Point.
The point $\mathbf{a}=(a_1, a_2, \ldots ,a_n)\in D$ is called
a critical point if at least one of the partial derivatives of $f$ at that
point equals 0 or undefined.
Saddle Point.
The point $\mathbf{a}=(a_1,a_2,\ldots,a_n)\in D$ is called a saddle
point if it is a stationary point and for all $V_\varepsilon(\mathbf{a})$,
and there exists $\mathbf{y}=(y_1, y_2,\ldots,y_n)\in V_\varepsilon$
and $\mathbf{z}=(z_1,z_2,\ldots,z_n)\in V_\varepsilon$ such that
$$f(\mathbf{y})< f(\mathbf{a})< f(\mathbf{z}).$$
The Hessian Matrix and the Sylvester Criterion
The Hessian Matrix. Consider a function $z=f(\mathbf{x})$
and $\mathbf{x}$ defined as before, and let $f$ be a function whose
second partial derivatives are continuous on its domain $D$. Consider also
a point $\mathbf{a}=(a_1,a_2,...,a_n)\in D\subseteq \R^n$. By then,
the matrix
is called the Hessian matrix of the function $f$ at point
$\mathbf{a}$. Since $f$ has continuous second partial
derivatives, $H_f(M)$ must be a symmetric square matrix; that is,
$H_f(\mathbf{a})=H_f(\mathbf{a})^{\mathsf{T}}$.
Quadratic Form. A quadratic form, to put simply, is just
a second-degree multi-variable polynomial such as
One interesting about quadratic forms is that every symmetric matrix
defines a quadratic form; that is, given that $\mathbf{A} =
[a]_{ij}$, the polynomial
will results in a quadratic form without requiring us to combining
terms. Therefore, $H_f(\mathbf{a})$ can most definitely be
treated as a matrix associated with a quadratic form.
Definite Matrix. A symmetric real matrix $\mathbf{A}$
is called positive definite if the associated quadratic form $$\phi( \mathbf{x})=\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}$$
is positive for every column ($n\times 1$), non-zero vector $\mathbf{x}$
in $\R^n$. Similarly, if $\phi(\mathbf{x})$ only yields negative
values, then $\mathbf{A}$ is called negative definite. Finally,
if $\phi$ produces both negative and positive values, then $\mathbf{A}$
is said to be indefinite. There are also other cases such as positive semi-definite
or negative semi-definite matrices. In general,
$\mathbf{A}$ is positive-definite if and only if
$\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}>0,\;\forall
\mathbf{x}\in\R^n
\backslash\{\mathbf{0}\}$.
$\mathbf{A}$ is negative-definite if and only if
$\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}<
0,\;\forall \mathbf{x}\in\R^n
\backslash\{\mathbf{0}\}$.
$\mathbf{A}$ is semi-positive-definite if and only if
$\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}\geq
0,\;\forall \mathbf{x}\in\R^n$.
$\mathbf{A}$ is semi-negative-definite if and only if
$\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}\leq
0,\;\forall \mathbf{x}\in\R^n$.
Sub-matrices. Given an $n\times n$ matrix $\mathbf{A}$
with $(i,j)$-th entry, where $a_{ij}=a_{ji}$. Let $A^{(k)}$
denote the $k\times k$ submatrix taken from the top left corner of $\mathbf{A}$.
These matrices are called major sub-matrices of $A$. That is,
$\mathbf{A}$ is indefinite if there exists a $\Delta_k$
that breaks both patterns above such that $\Delta_k$ is in the
wrong sign.
The Sylvester criterion is inconclusive ($\mathbf{A}$
can be either positive or negative definite, or indefinite) if
there exists a $\Delta_k$ that breaks the first two patterns such
that $\Delta_k=0$.
The Maximum/Minimum Algorithm for an n-Variable Function
The General Case
The General Second Derivative Test. To identify all the
stationary (and critical points) of $f(\mathbf{x})$ and test if
they are the extrema of $f$, we follow the following procedure:
Find $D(f)$, the domain of $f(\mathbf{x})$. This is for
us to determine whether to keep a critical point or not.
Solve the equation $\nabla
f(\mathbf{x})=\mathbf{0}$; that is, solve the
system of $n$-equations
to find all stationary points of $f(\mathbf{x})$ (if they
exist). Label these points, for instance, $\mathbf{a}_0=(a_1^0,
\;a_2^0,...,\;a_n^0)$, $\mathbf{a}_1=(a_1^1,\;a_2^1,\ldots,\;a_n^1),\ldots$.
Next, find all points where the partial derivatives $f_{x_1}(\mathbf{x}),
f_{x_2}(\mathbf{x}),...,f_{x_n}(\mathbf{x})$
are undefined, we now have the set of all critical points of $f(\mathbf{x})$.
Remark. One reminder to note is that all critical points
that need to be considered must lie inside the domain $D$. If not, just
discard them.
If $\mathbf{a_0}$ is a stationary point, calculate its
Hessian matrix at $\mathbf{a_0}$; that is, find
$H_f(\mathbf{a_0})$. After that, apply Sylvester's
Criterion on $H_f(\mathbf{a_0})$ by calculating the
values of $\Delta_1,\Delta_2,...,\Delta_n$ at
$\mathbf{a_0}$.
If $\mathrm{H}_f(\mathbf{a_0})\succ 0$,
then $\mathbf{a_0}$ is a minimum of the function
$f(\mathbf{x})$.
If $\mathrm{H}_f(\mathbf{a_0})\prec 0$,
then $\mathbf{a_0}$ is a maximum of the function
$f(\mathbf{x})$.
If $\det (\Delta_k)\neq 0,\;1\leq k\leq n,$ but
$\mathrm{H}_f(\mathbf{a_0})$ itself is
neither positive nor negative definite, then
$\mathbf{a_0}$ is not an extremum. A similar
statement is that if there exist two points
$\mathbf{y}=(y_1,y_2,...,y_n)$ and
$\mathbf{z}=(z_1,z_2,...,z_n)$ such that
$\mathbf{y}H_f(\mathbf{a_0})\mathbf{y}^{\mathsf{T}}>0$
and
$\mathbf{z}H_f(\mathbf{a_0})\mathbf{z}^{\mathsf{T}}<
0$, then $\mathbf{a_0}$ is not an extremum of
$f(\mathbf{x})$.
If there exists a point
$\mathbf{y}=(y_1,y_2,...,y_n)$ such that
$\mathbf{y}^{\mathrm{T}}\mathrm{H}_f(\mathbf{a_0})\mathbf{y}=0$;
that is, if there is at least one major sub-matrix of
$\mathrm{H}_f(\mathbf{a_0})$ equals to 0,
then the test is inconclusive.
Repeat the same process again for the stationary point
$\mathbf{a_1}$.
The Special Case: $n=2$
For a 2-variable function $z=f(x,y)$, the maximum/minimum algorithm
can be simplified a bit. After finding all stationary and critical
points of $f(x,y)$, we can follow these steps to identify the extrema
of $f$:
For each stationary point $(x_0,y_0)$, calculate the second
partial derivatives of $f$ at that point:
$f_{xx}(x_0,y_0)$, $f_{yy}(x_0,y_0)$, and
$f_{xy}(x_0,y_0)$.
If $\Delta_2(x_0,y_0)>0$ and $f_{xx}(x_0,y_0)>0$,
then $(x_0,y_0)$ is a minimum of $f(x,y)$.
If $\Delta_2(x_0,y_0)>0$ and $f_{xx}(x_0,y_0)< 0
$, then $(x_0,y_0)$ is a maximum of $f(x,y)$.
If $\Delta_2(x_0,y_0)< 0$, then $(x_0,y_0)$ is not an
extremum of $f(x,y)$.
If $\Delta_2(x_0,y_0)=0$, then the test is inconclusive.
Example Problems with Solutions
In 3D Space
Find all local maximum, minimum, and saddle points of the following
function $$f(x,y)=x^2-xy+y^2-2x+y$$
Solution
We first need to find the domain of $f$, which is just simply
$D=\R^2$ since there is no point in $\R^2$ where the function is
undefined. We then solve for $\nabla f(x,y)=(0,0)$, that is, solve
for the system of equations
The function $f$ has one critical point $(x,y)=(1,0)$. The next step
is to find second partial derivatives of $f$: $$f_{xx}(x,y)=2,
\;f_{xy}(x,y)=1,\;\text{and}\;f_{yy}(x,y)=2$$
At $(1,0)$, we have $f_{xx}(1,0)=2$, $f_{xy}(1,0)=1$
and $f_{yy}(1,0)=2$. Since, $\Delta_2(1,0)=f_{xx}f_{yy}-f_{xy}^2=3>0$
and $f_{xx}(1,0)=2>0$, therefore, $(1,0)$ is a minimum of
$f(x,y)$ and $f_{\min} = f(1, 0) = -1$.
Find all local maximum, minimum, and saddle points of the following
function $$f(x,y)=\sqrt{x^2+y^2}$$
Solution
The first step, again, is to find the domain of $f: D=\R^2$. We
then solve for $\nabla f(x,y)=(0,0)$
If we simply solve for $x$ and $y$, we will immediately get $(x, y)
= (0, 0)$. This solution is invalid because the condition of the system
requires $(x, y)\neq (0,0)$.
Based on the condition of the equations system, the function $f$
has no stationary point, but it has one critical point $(0,0)$
since that point is undefined at both $f_x$ and $f_y$. Notice that
$f(x,y)=\sqrt{x^2+y^2}\geq f(0,0)=0,$, $\forall
(x,y)\in\R^2$. Therefore, by definition, $(0,0)$ is the minimum of
the function and $f_{\min}=0$.
Find all local maximum, minimum, and saddle points of the following
function
$$f(x,y)=xy+\dfrac{50}{x}+\dfrac{20}{y}$$
Solution
Domain of $f$: $D=\R^2\backslash\{(0,0)\}$. We then
solve for $\nabla f(x,y)=(0,0)$
For this system to be well-defined, $x$ and $y$ must not be equal to
zero. We have already defined that in the domain of $f$, so now, we
can simply multiple both sides of the first equations by $x$, both sides
of the second equation by $y$, we will obtain
At $(5,2)$, $f_{xx}=4/5$, $f_{xy}=1$, $f_{yy}=5$,
and thus $\Delta_2=3>0$. Therefore, $(5,2)$ is a local minimum of $f$
and $f_{\min}=30$.
For this particular function, we can verify this quite easily
using the AM-GM inequality. The inequality requires $x$ and $y$ to
be non-negative, which they are, so we don't have to worry about
them. Simply apply the inequality and we may obtain
If $y=0$, substitute into equation (2), we obtain
$x\ln(x^2)=0$, and thus $x=0$ (We reject this solution since
$(0,0)\notin D$) or $\ln(x^2)=0$, which gives two stationary
points $(1,0)$ and $(-1,0)$.
If $x=0$, substitute into equation (1), we obtain
$y\ln(y^2)=0$, and thus $y=0$ (We reject this solution since
$(0,0)\notin D$) or $\ln(y^2)=0,$ which gives two more
stationary points $(0,1)$ and $(0,-1)$.
If $x\neq 0$ and $y\neq 0$, the system of equations above is
equivalent to
Adding $(3)$ and $(4)$, we obtain $2\ln(x^2+y^2)+2=0\;\Rightarrow\;
x^2+y^2=\dfrac{1}{e}$. Substitute this into
(3) (or (4)), we obtain the quadratic equation
$$-1+2ex^2=0\quad\Rightarrow\quad x = \pm
\frac{1}{\sqrt{2e}}$$
If we substitute these solutions to the equation $x^2+y^2=\dfrac{1}{e}$,
we can obtain the $y$'s: If $x = 1/\sqrt{2e}$, then
$$y = \pm
\dfrac{1}{\sqrt{2e}},$$
similarly, if $x = -1/\sqrt{2e}$, then we have two more
of the same $y$
For this problem, it is easier to calculate each second partial derivative
at a point independently (just use a good calculator that allows variable-passing).
We now set up a table:
For $x=k\pi$, since $-1\leq x\leq 7$, we have $x\in \{0,\;\pi,\;2\pi\}$.
This gives us three stationary points.
If $x=0$, then $y=\cos (0)=1\;\Rightarrow\;(0,1)$.
If $x=\pi$, then $y=\cos (\pi)=-1\;\Rightarrow\;(\pi,-1)$.
If $x=2\pi$, then $y=\cos (2\pi)=1\;\Rightarrow\;(2\pi,1)$.
For $y=0$, we then have $y=\cos x=0$, and thus $x=\pi/2$ and
$x=3\pi/2$ (since $-1\leq x\leq 7$), therefore we have two more
stationary point $(\pi/2,0)$ and $(3\pi/2,0)$. We then find the
second partial derivatives of $f$ and $\Delta_2(x,y)$
Since $m$ and $n$ are integers, and therefore the signs of $\Delta_2(x,y)$
depend on the even and odd properties of $m$ and $n$. Thus, we set up
an appropriate table according to their odd and even properties.
From the equations, we see that this equation only has one stationary
point $M=\left(-\dfrac{2}{3},-\dfrac{1}{3},1\right)$.
We then proceed to find all the second derivatives of $f$:
we see that the matrix $H_f(M)$ is positive-definite, therefore $M=\left(-\dfrac{2}{3},-\dfrac{1}{3},1\right)$
is a minimum of $f$ and $$f_{\min}=f\left(-\dfrac{2}{3},-\dfrac{1}{3},1\right)=-\dfrac{4}{3}$$
Find all local maximum, minimum, and saddle points of the following
function
$$f(x,y,z)=x^3+xy+y^2-2xz+2z^2+3y-1$$
Solution
We first need to find the domain of $f$: $D=\R^3$. Next, we solve
the equation $\nabla f(x,y,z)=\mathbf{0}$.
We conclude that $H_f(M)$ is positive definite, therefore $M$ is a minimum
and $f_{\min}=-\frac{3}{2}$.
At
$N=\left(-\dfrac{1}{2},-\dfrac{5}{4},-\dfrac{1}{4}\right)$,