Systems of nonlinear algebraic equations¶
Implicit time discretization methods for a system of ODEs, or a PDE, lead to systems of nonlinear algebraic equations, written compactly as
where \(u\) is a vector of unknowns \(u=(u_0,\ldots,u_N)\), and \(F\) is a vector function: \(F=(F_0,\ldots,F_N)\). The system at the end of the section Systems of ODEs fits this notation with \(N=2\), \(F_0(u)\) given by the left-hand side of (18), while \(F_1(u)\) is the left-hand side of (19).
Sometimes the equation system has a special structure because of the underlying problem, e.g.,
with \(A(u)\) as an \((N+1)\times (N+1)\) matrix function of \(u\) and \(b\) as a vector function: \(b=(b_0,\ldots,b_N)\).
We shall next explain how Picard iteration and Newton’s method can be applied to systems like \(F(u)=0\) and \(A(u)u=b(u)\). The exposition has a focus on ideas and practical computations. More theoretical considerations, including quite general results on convergence properties of these methods, can be found in Kelley [Ref1].
Picard iteration¶
We cannot apply Picard iteration to nonlinear equations unless there is some special structure. For the commonly arising case \(A(u)u=b(u)\) we can linearize the product \(A(u)u\) to \(A(u^{-})u\) and \(b(u)\) as \(b(u^{-})\). That is, we use the most previously computed approximation in \(A\) and \(b\) to arrive at a linear system for \(u\):
A relaxed iteration takes the form
In other words, we solve a system of nonlinear algebraic equations as a sequence of linear systems.
Algorithm for relaxed Picard iteration
Given \(A(u)u=b(u)\) and an initial guess \(u^{-}\), iterate until convergence:
- solve \(A(u^{-})u^* = b(u^{-})\) with respect to \(u^*\)
- \(u = \omega u^* + (1-\omega) u^{-}\)
- \(u^{-}\ \leftarrow\ u\)
“Until convergence” means that the iteration is stopped when the change in the unknown, \(||u - u^{-}||\), or the residual \(||A(u)u-b(u)||\), is sufficiently small, see the section Stopping criteria for more details.
Newton’s method¶
The natural starting point for Newton’s method is the general nonlinear vector equation \(F(u)=0\). As for a scalar equation, the idea is to approximate \(F\) around a known value \(u^{-}\) by a linear function \(\hat F\), calculated from the first two terms of a Taylor expansion of \(F\). In the multi-variate case these two terms become
where \(J\) is the Jacobian of \(F\), defined by
So, the original nonlinear system is approximated by
which is linear in \(u\) and can be solved in a two-step procedure: first solve \(J\delta u = -F(u^{-})\) with respect to the vector \(\delta u\) and then update \(u = u^{-} + \delta u\). A relaxation parameter can easily be incorporated:
Algorithm for Newton’s method
Given \(F(u)=0\) and an initial guess \(u^{-}\), iterate until convergence:
- solve \(J\delta u = -F(u^{-})\) with respect to \(\delta u\)
- \(u = u^{-} + \omega\delta u\)
- \(u^{-}\ \leftarrow\ u\)
For the special system with structure \(A(u)u=b(u)\),
one gets
We realize that the Jacobian needed in Newton’s method consists of \(A(u^{-})\) as in the Picard iteration plus two additional terms arising from the differentiation. Using the notation \(A^{\prime}(u)\) for \(\partial A/\partial u\) (a quantity with three indices: \(\partial A_{i,k}/\partial u_j\)), and \(b^{\prime}(u)\) for \(\partial b/\partial u\) (a quantity with two indices: \(\partial b_i/\partial u_j\)), we can write the linear system to be solved as
or
Rearranging the terms demonstrates the difference from the system solved in each Picard iteration:
Here we have inserted a parameter \(\gamma\) such that \(\gamma=0\) gives the Picard system and \(\gamma=1\) gives the Newton system. Such a parameter can be handy in software to easily switch between the methods.
Combined algorithm for Picard and Newton iteration
Given \(A(u)\), \(b(u)\), and an initial guess \(u^{-}\), iterate until convergence:
- solve \((A(u^{-}) + \gamma(A^{\prime}(u^{-})u^{-} - b^{\prime}(u^{-})))\delta u = -A(u^{-})u^{-} + b(u^{-})\) with respect to \(\delta u\)
- \(u = u^{-} + \omega\delta u\)
- \(u^{-}\ \leftarrow\ u\)
\(\gamma =1\) gives a Newton method while \(\gamma =0\) corresponds to Picard iteration.
Stopping criteria¶
Let \(||\cdot||\) be the standard Euclidean vector norm. Four termination criteria are much in use:
- Absolute change in solution: \(||u - u^{-}||\leq \epsilon_u\)
- Relative change in solution: \(||u - u^{-}||\leq \epsilon_u ||u_0||\), where \(u_0\) denotes the start value of \(u^{-}\) in the iteration
- Absolute residual: \(||F(u)|| \leq \epsilon_r\)
- Relative residual: \(||F(u)|| \leq \epsilon_r ||F(u_0)||\)
To prevent divergent iterations to run forever, one terminates the iterations when the current number of iterations \(k\) exceeds a maximum value \(k_{\max}\).
The relative criteria are most used since they are not sensitive to the characteristic size of \(u\). Nevertheless, the relative criteria can be misleading when the initial start value for the iteration is very close to the solution, since an unnecessary reduction in the error measure is enforced. In such cases the absolute criteria work better. It is common to combine the absolute and relative measures of the size of the residual, as in
where \(\epsilon_{rr}\) is the tolerance in the relative criterion and \(\epsilon_{ra}\) is the tolerance in the absolute criterion. With a very good initial guess for the iteration (typically the solution of a differential equation at the previous time level), the term \(||F(u_0)||\) is small and \(\epsilon_{ra}\) is the dominating tolerance. Otherwise, \(\epsilon_{rr} ||F(u_0)||\) and the relative criterion dominates.
With the change in solution as criterion we can formulate a combined absolute and relative measure of the change in the solution:
The ultimate termination criterion, combining the residual and the change in solution with a test on the maximum number of iterations, can be expressed as
Example: A nonlinear ODE model from epidemiology¶
The simplest model of the spreading of a disease, such as a flu, takes the form of a \(2\times 2\) ODE system
where \(S(t)\) is the number of people who can get ill (susceptibles) and \(I(t)\) is the number of people who are ill (infected). The constants \(\beta >0\) and \(\nu >0\) must be given along with initial conditions \(S(0)\) and \(I(0)\).
Implicit time discretization¶
A Crank-Nicolson scheme leads to a \(2\times 2\) system of nonlinear algebraic equations in the unknowns \(S^{n+1}\) and \(I^{n+1}\):
Introducing \(S\) for \(S^{n+1}\), \(S^{(1)}\) for \(S^n\), \(I\) for \(I^{n+1}\), \(I^{(1)}\) for \(I^n\), we can rewrite the system as
A Picard iteration¶
We assume that we have approximations \(S^{-}\) and \(I^{-}\) to \(S\) and \(I\). A way of linearizing the only nonlinear term \(SI\) is to write \(I^{-}S\) in the \(F_S=0\) equation and \(S^{-}I\) in the \(F_I=0\) equation, which also decouples the equations. Solving the resulting linear equations with respect to the unknowns \(S\) and \(I\) gives
Before a new iteration, we must update \(S^{-}\ \leftarrow\ S\) and \(I^{-}\ \leftarrow\ I\).
Newton’s method¶
The nonlinear system (28)-(29) can be written as \(F(u)=0\) with \(F=(F_S,F_I)\) and \(u=(S,I)\). The Jacobian becomes
The Newton system \(J(u^{-})\delta u = -F(u^{-})\) to be solved in each iteration is then
Remark. For this particular system of ODEs, explicit time integration methods work very well. Even a Forward Euler scheme is fine, but the 4-th order Runge-Kutta method is an excellent balance between high accuracy, high efficiency, and simplicity.