Maxima and Minima of an n-th Dimensional Function - The Hessian Matrix
Local Maximum, Local Minimum, and Extremum
Consider an $n$-dimensional function $z=f(\mathbf{x})$ defined by $f: D\subseteq \R^n \to \R$, where $\mathbf{x}=(x_1, x_2, \ldots, x_n)$ is an $n$-tuple of real numbers. Also consider a point $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ defined in the domain $D\subseteq\R^n$.
- Local Maximum. The function $z=f(\mathbf{x})$ attains its local maximum at $\mathbf{a}$ if there exists a neighborhood $N_\varepsilon(\mathbf{a})\in D$ of $\mathbf{a}$ such that $f(\mathbf{x})\leq f(\mathbf{a})$, $\forall \mathbf{x}\in N_\varepsilon(\mathbf{a}) \backslash \{\mathbf{a}\}$.
- Local Minimum. The function $z=f(\mathbf{x})$ attains its local minimum at $\mathbf{a}$ if there exists a neighborhood $N_\varepsilon(\mathbf{a})\in D$ of $\mathbf{a}$ such that $f(\mathbf{x})\geq f(\mathbf{a})$, $\forall \mathbf{x}\in N_\varepsilon(\mathbf{a}) \backslash \{\mathbf{a}\}$.
- Extremum. The function $z=f(\mathbf{x})$ attains its maximum, or minimum at point $\mathbf{a}$ is generally called an extremum at point $\mathbf{a}$. Point $\mathbf{a}$ might sometimes be called the extreme point of the function and $f(\mathbf{a})$ is called the extreme value of the function $z=f(\mathbf{x})$.
Stationary Points, Critical Points, and Saddle Points
Consider an $n$-dimensional function $z=f(\mathbf{x})$, $f: D\subseteq \R^n \to \R$, where $\mathbf{x}=(x_1, x_2, \ldots, x_n)$ is an $n$-tuple of real numbers. Assume that the function is defined at a point $\mathbf{a} = (a_1, a_2, \ldots, a_n)\in D\subseteq\R^n$.
- Stationary Point.
The points where all of the partial derivatives of $f$ equals 0 is called
the stationary points of $f$. That is,
$$\pd{f}{x_i}(a_1,a_2,...,a_n)=\pd{f(\mathbf{a})}{x_i}=0\quad\forall\; i=1,2,...,n$$
- Critical Point. The point $\mathbf{a}=(a_1, a_2, \ldots ,a_n)\in D$ is called a critical point if at least one of the partial derivatives of $f$ at that point equals 0 or undefined.
- Saddle Point. The point $\mathbf{a}=(a_1,a_2,\ldots,a_n)\in D$ is called a saddle point if it is a stationary point and for all $V_\varepsilon(\mathbf{a})$, and there exists $\mathbf{y}=(y_1, y_2,\ldots,y_n)\in V_\varepsilon$ and $\mathbf{z}=(z_1,z_2,\ldots,z_n)\in V_\varepsilon$ such that $$f(\mathbf{y})< f(\mathbf{a})< f(\mathbf{z}).$$
The Hessian Matrix and the Sylvester Criterion
The Hessian Matrix. Consider a function $z=f(\mathbf{x})$ and $\mathbf{x}$ defined as before, and let $f$ be a function whose second partial derivatives are continuous on its domain $D$. Consider also a point $\mathbf{a}=(a_1,a_2,...,a_n)\in D\subseteq \R^n$. By then, the matrix
is called the Hessian matrix of the function $f$ at point $\mathbf{a}$. Since $f$ has continuous second partial derivatives, $H_f(M)$ must be a symmetric square matrix; that is, $H_f(\mathbf{a})=H_f(\mathbf{a})^{\mathsf{T}}$.
Quadratic Form. A quadratic form, to put simply, is just a second-degree multi-variable polynomial such as
One interesting about quadratic forms is that every symmetric matrix defines a quadratic form; that is, given that $\mathbf{A} = [a]_{ij}$, the polynomial
will results in a quadratic form without requiring us to combining terms. Therefore, $H_f(\mathbf{a})$ can most definitely be treated as a matrix associated with a quadratic form.
Definite Matrix. A symmetric real matrix $\mathbf{A}$ is called positive definite if the associated quadratic form $$\phi( \mathbf{x})=\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}$$ is positive for every column ($n\times 1$), non-zero vector $\mathbf{x}$ in $\R^n$. Similarly, if $\phi(\mathbf{x})$ only yields negative values, then $\mathbf{A}$ is called negative definite. Finally, if $\phi$ produces both negative and positive values, then $\mathbf{A}$ is said to be indefinite. There are also other cases such as positive semi-definite or negative semi-definite matrices. In general,
- $\mathbf{A}$ is positive-definite if and only if $\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}>0,\;\forall \mathbf{x}\in\R^n \backslash\{\mathbf{0}\}$.
- $\mathbf{A}$ is negative-definite if and only if $\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}< 0,\;\forall \mathbf{x}\in\R^n \backslash\{\mathbf{0}\}$.
- $\mathbf{A}$ is semi-positive-definite if and only if $\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}\geq 0,\;\forall \mathbf{x}\in\R^n$.
- $\mathbf{A}$ is semi-negative-definite if and only if $\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}\leq 0,\;\forall \mathbf{x}\in\R^n$.
Sub-matrices. Given an $n\times n$ matrix $\mathbf{A}$ with $(i,j)$-th entry, where $a_{ij}=a_{ji}$. Let $A^{(k)}$ denote the $k\times k$ submatrix taken from the top left corner of $\mathbf{A}$. These matrices are called major sub-matrices of $A$. That is,
In particular,
Let $\Delta_k=\det(\mathbf{A}^{k})\;,1\leq k\leq n$. For instance, $\Delta_1=a_{11}$ and $\Delta_n =\det(\mathbf{A})$.
- $\mathbf{A}$ is positive definite, denote $\mathbf{A}\succ 0$ if and only if $\Delta_1>0,\Delta_2>0,...,\Delta_n>0$.
-
$\mathbf{A}$ is negative definite, denote
$\mathbf{A}\prec 0$ if and only if the signs of
$\Delta_i$ alternates, with $\Delta_1 < 0$; that is,
$$(-1)^1\Delta_1>0, \;(-1)^2\Delta_2>0,\ldots,\;(-1)^n\Delta_n>0.$$
- $\mathbf{A}$ is indefinite if there exists a $\Delta_k$ that breaks both patterns above such that $\Delta_k$ is in the wrong sign.
- The Sylvester criterion is inconclusive ($\mathbf{A}$ can be either positive or negative definite, or indefinite) if there exists a $\Delta_k$ that breaks the first two patterns such that $\Delta_k=0$.
The Maximum/Minimum Algorithm for an n-Variable Function
The General Case
The General Second Derivative Test. To identify all the stationary (and critical points) of $f(\mathbf{x})$ and test if they are the extrema of $f$, we follow the following procedure:
- Find $D(f)$, the domain of $f(\mathbf{x})$. This is for us to determine whether to keep a critical point or not.
-
Solve the equation $\nabla
f(\mathbf{x})=\mathbf{0}$; that is, solve the
system of $n$-equations
\[ \left\{ \begin{array}{cc} \pd{f(\mathbf{x})}{x_1}=f_{x_1}(x_1,x_2,...,x_n)=0 \\[0.1em] \pd{f(\mathbf{x})}{x_2}=f_{x_2}(x_1,x_2,...,x_n)=0 \\[0.1em] \vdots \\[-0.1em] \pd{f(\mathbf{x})}{x_n}=f_{x_n}(x_1,x_2,...,x_n)=0 \end{array} \right., \]to find all stationary points of $f(\mathbf{x})$ (if they exist). Label these points, for instance, $\mathbf{a}_0=(a_1^0, \;a_2^0,...,\;a_n^0)$, $\mathbf{a}_1=(a_1^1,\;a_2^1,\ldots,\;a_n^1),\ldots$. Next, find all points where the partial derivatives $f_{x_1}(\mathbf{x}), f_{x_2}(\mathbf{x}),...,f_{x_n}(\mathbf{x})$ are undefined, we now have the set of all critical points of $f(\mathbf{x})$.
Remark. One reminder to note is that all critical points that need to be considered must lie inside the domain $D$. If not, just discard them. -
If $\mathbf{a_0}$ is a stationary point, calculate its
Hessian matrix at $\mathbf{a_0}$; that is, find
$H_f(\mathbf{a_0})$. After that, apply Sylvester's
Criterion on $H_f(\mathbf{a_0})$ by calculating the
values of $\Delta_1,\Delta_2,...,\Delta_n$ at
$\mathbf{a_0}$.
- If $\mathrm{H}_f(\mathbf{a_0})\succ 0$, then $\mathbf{a_0}$ is a minimum of the function $f(\mathbf{x})$.
- If $\mathrm{H}_f(\mathbf{a_0})\prec 0$, then $\mathbf{a_0}$ is a maximum of the function $f(\mathbf{x})$.
- If $\det (\Delta_k)\neq 0,\;1\leq k\leq n,$ but $\mathrm{H}_f(\mathbf{a_0})$ itself is neither positive nor negative definite, then $\mathbf{a_0}$ is not an extremum. A similar statement is that if there exist two points $\mathbf{y}=(y_1,y_2,...,y_n)$ and $\mathbf{z}=(z_1,z_2,...,z_n)$ such that $\mathbf{y}H_f(\mathbf{a_0})\mathbf{y}^{\mathsf{T}}>0$ and $\mathbf{z}H_f(\mathbf{a_0})\mathbf{z}^{\mathsf{T}}< 0$, then $\mathbf{a_0}$ is not an extremum of $f(\mathbf{x})$.
- If there exists a point $\mathbf{y}=(y_1,y_2,...,y_n)$ such that $\mathbf{y}^{\mathrm{T}}\mathrm{H}_f(\mathbf{a_0})\mathbf{y}=0$; that is, if there is at least one major sub-matrix of $\mathrm{H}_f(\mathbf{a_0})$ equals to 0, then the test is inconclusive.
- Repeat the same process again for the stationary point $\mathbf{a_1}$.
The Special Case: $n=2$
For a 2-variable function $z=f(x,y)$, the maximum/minimum algorithm can be simplified a bit. After finding all stationary and critical points of $f(x,y)$, we can follow these steps to identify the extrema of $f$:
- For each stationary point $(x_0,y_0)$, calculate the second partial derivatives of $f$ at that point: $f_{xx}(x_0,y_0)$, $f_{yy}(x_0,y_0)$, and $f_{xy}(x_0,y_0)$.
-
Calculate
$\Delta_2(x_0,y_0)=f_{xx}(x_0,y_0)f_{yy}(x_0,y_0)-[f_{xy}(x_0,y_0)]^2$.
- If $\Delta_2(x_0,y_0)>0$ and $f_{xx}(x_0,y_0)>0$, then $(x_0,y_0)$ is a minimum of $f(x,y)$.
- If $\Delta_2(x_0,y_0)>0$ and $f_{xx}(x_0,y_0)< 0 $, then $(x_0,y_0)$ is a maximum of $f(x,y)$.
- If $\Delta_2(x_0,y_0)< 0$, then $(x_0,y_0)$ is not an extremum of $f(x,y)$.
- If $\Delta_2(x_0,y_0)=0$, then the test is inconclusive.
Example Problems with Solutions
In 3D Space
Solution
Solution
Based on the condition of the equations system, the function $f$ has no stationary point, but it has one critical point $(0,0)$ since that point is undefined at both $f_x$ and $f_y$. Notice that $f(x,y)=\sqrt{x^2+y^2}\geq f(0,0)=0,$, $\forall (x,y)\in\R^2$. Therefore, by definition, $(0,0)$ is the minimum of the function and $f_{\min}=0$.
Solution
For this particular function, we can verify this quite easily using the AM-GM inequality. The inequality requires $x$ and $y$ to be non-negative, which they are, so we don't have to worry about them. Simply apply the inequality and we may obtain
Solution
- If $y=0$, substitute into equation (2), we obtain $x\ln(x^2)=0$, and thus $x=0$ (We reject this solution since $(0,0)\notin D$) or $\ln(x^2)=0$, which gives two stationary points $(1,0)$ and $(-1,0)$.
- If $x=0$, substitute into equation (1), we obtain $y\ln(y^2)=0$, and thus $y=0$ (We reject this solution since $(0,0)\notin D$) or $\ln(y^2)=0,$ which gives two more stationary points $(0,1)$ and $(0,-1)$.
-
If $x\neq 0$ and $y\neq 0$, the system of equations above is
equivalent to
\begin{equation*} \begin{cases} \ln(x^2+y^2)+\dfrac{2x^2}{x^2+y^2}=0\quad (3)\\[1em] %%% \ln(x^2+y^2)+\dfrac{2y^2}{x^2+y^2}=0\quad (4) \end{cases} \end{equation*}Adding $(3)$ and $(4)$, we obtain $2\ln(x^2+y^2)+2=0\;\Rightarrow\; x^2+y^2=\dfrac{1}{e}$. Substitute this into (3) (or (4)), we obtain the quadratic equation$$-1+2ex^2=0\quad\Rightarrow\quad x = \pm \frac{1}{\sqrt{2e}}$$If we substitute these solutions to the equation $x^2+y^2=\dfrac{1}{e}$, we can obtain the $y$'s: If $x = 1/\sqrt{2e}$, then$$y = \pm \dfrac{1}{\sqrt{2e}},$$similarly, if $x = -1/\sqrt{2e}$, then we have two more of the same $y$$$y = \pm \dfrac{1}{\sqrt{2e}},$$This gives four more stationary points:$$\left(\dfrac{1}{\sqrt{2e}},\dfrac{1}{\sqrt{2e}}\right),\;\left(\dfrac{1}{\sqrt{2e}},-\dfrac{1}{\sqrt{2e}}\right),\;\left(-\dfrac{1}{\sqrt{2e}},\dfrac{1}{\sqrt{2e}}\right),\; \text{and} \;\left(-\dfrac{1}{\sqrt{2e}},-\dfrac{1}{\sqrt{2e}}\right).$$
| Points | $f_{xx}(x_0,y_0)$ | $f_{xy}(x_0,y_0)$ | $f_{yy}(x_0,y_0)$ | $\Delta_2=f_{xx}f_{yy}-f_{xy}^2$ | Conclusion |
| $(0,1)$ | $0$ | $2$ | $0$ | $-4<0$ | Saddle point at $(0,1,0)$ |
| $(0,-1)$ | $0$ | $2$ | $0$ | $-4<0$ | Saddle point at $(0,-1,0)$ |
| $(1,0)$ | $0$ | $2$ | $0$ | $-4<0$ | Saddle point at $(1,0,0)$ |
| $(-1,0)$ | $0$ | $2$ | $0$ | $-4<0$ | Saddle point at $(-1,0,0)$ |
| $\left(\dfrac{1}{\sqrt{2e}},\dfrac{1}{\sqrt{2e}}\right)$ | $2>0$ | $0$ | $2$ | $4>0$ | Local minimum, $f_{\min}=-\frac{1}{2e}$ |
| $\left(\dfrac{1}{\sqrt{2e}},-\dfrac{1}{\sqrt{2e}}\right)$ | $-2<0$ | $0$ | $-2$ | $4>0$ | Local maximum, $f_{\max}=\dfrac{1}{2e}$ |
| $\left(-\dfrac{1}{\sqrt{2e}},\dfrac{1}{\sqrt{2e}}\right)$ | $-2<0$ | $0$ | $-2$ | $4>0$ | Local maximum, $f_{\max}=\dfrac{1}{2e}$ |
| $\left(-\dfrac{1}{\sqrt{2e}},-\dfrac{1}{\sqrt{2e}}\right)$ | $2>0$ | $0$ | $2$ | $4>0$ | Local minimum, $f_{\min}=-\frac{1}{2e}$ |
Solution
- If $x=0$, then $y=\cos (0)=1\;\Rightarrow\;(0,1)$.
- If $x=\pi$, then $y=\cos (\pi)=-1\;\Rightarrow\;(\pi,-1)$.
- If $x=2\pi$, then $y=\cos (2\pi)=1\;\Rightarrow\;(2\pi,1)$.
| Points | $f_{xx}(x_0,y_0)$ | $\Delta_2(x_0,y_0)$ | Conclusion |
| $(0,1)$ | $2>0$ | $4>0$ | Relative minimum, $f_{\min}=-1$ |
| $(\pi,-1)$ | $2>0$ | $4>0$ | Relative minimum, $f_{\min}=-1$ |
| $(2\pi,1)$ | $2>0$ | $4>0$ | Relative minimum, $f_{\min}=-1$ |
| $(\frac{\pi}{2},0)$ | $0$ | $-4<0$ | Saddle point at $(\pi/2,0,0)$ |
| $(\frac{3\pi}{2},0)$ | $0$ | $-4<0$ | Saddle point at $(3\pi/2,0,0)$ |
Solution
| Points | $f_{xx}(x_0,y_0)$ | $\Delta_2(x_0,y_0)$ | Conclusion |
| $(0,0)$ | $-2<0$ | $8>0$ | Local maximum, $f_{\max}=0$ |
| $(0,1)$ | $-2<0$ | $-16<0$ | Saddle Point at $(0,1,-1)$ |
| $(0,-1)$ | $-2<0$ | $-16<0$ | Saddle Point at $(0,-1,-1)$ |
| $\left(\frac{1}{2},0\right)$ | $4>0$ | $-16<0$ | Saddle Point at $(\frac{1}{2},0,-\frac{1}{8})$ |
| $\left(\frac{1}{2},1\right)$ | $4>0$ | $32<0$ | Local minimum, $f_{\min}=-\frac{9}{8}$ |
| $\left(\frac{1}{2},-1\right)$ | $4>0$ | $32>0$ | Local minimum, $f_{\min}=-\frac{9}{8}$ |
| $\left(-\frac{1}{2},0\right)$ | $4>0$ | $-16<0$ | Saddle Point at $(-\frac{1}{2},0,-\frac{1}{8})$ |
| $\left(-\frac{1}{2},1\right)$ | $4>0$ | $32>0$ | Local minimum, $f_{\min}=-\frac{9}{8}$ |
| $\left(-\frac{1}{2},-1\right)$ | $4>0$ | $32>0$ | Local minimum, $f_{\min}=-\frac{9}{8}$ |
Solution
| $m$ | $n$ | $\Delta_2(x,y)$ | $f_{xx}(x_0,y_0)$ | Conclusion |
| Even | Even | $\Delta_2=-16\cos\left(-\dfrac{\pi}{6}\right)=-8\sqrt{3}<0$ | ~ | Saddle Point |
| Odd | Odd | $\Delta_2=16\cos\left(\dfrac{7\pi}{6}\right)=-8\sqrt{3}<0$ | ~ | Saddle Point |
| Even | Odd | $\Delta_2=16\cos\left(-\dfrac{\pi}{6}\right)=8\sqrt{3}>0$ | $\sqrt{3}+2>0$ | Local Minimum, where $$f_{\min}=m\pi-\frac{\pi}{6}-2-\sqrt{3}$$ |
| Odd | Even | $\Delta_2=-16\cos\left(\dfrac{7\pi}{6}\right)=8\sqrt{3}>0$ | $-\sqrt{3}-2<0$ | Local maximum, where $$f_{\max}=m\pi+\frac{\pi}{6}+2+\sqrt{3}$$ |
In 4D Space
Solution
Solution
At $N=\left(-\dfrac{1}{2},-\dfrac{5}{4},-\dfrac{1}{4}\right)$,
More Articles