.. !split

.. _ch:diffu:

Diffusion equations          (1)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

.. index::
   single: diffusion equation; 1D

.. index:: heat equation

.. index:: diffusion coefficient

.. index::
   single: diffusion equation; diffusion coefficient

The famous *diffusion equation*, also known as the *heat equation*,
reads

.. math::
         \frac{\partial u}{\partial t} =
        {\alpha} \frac{\partial^2 u}{\partial x^2},
        

where :math:`u(x,t)` is the unknown function to be solved for, :math:`x` is a coordinate
in space, and :math:`t` is time. The coefficient :math:`{\alpha}` is the *diffusion
coefficient* and determines how fast :math:`u` changes in time. A quick
short form for the diffusion equation is :math:`u_t = {\alpha} u_{xx}`.

Compared to the wave equation, :math:`u_{tt}=c^2u_{xx}`, which looks very similar,
the diffusion equation features solutions that are very different from
those of the wave equation. Also, the diffusion equation
makes quite different demands to the numerical
methods.

.. index::
   single: diffusion equation; stationary solution

.. index:: stationary solution

.. index:: Laplace equation

Typical diffusion problems may experience rapid change in the very
beginning, but then the evolution of :math:`u` becomes slower and slower.
The solution is usually very smooth, and after some time, one cannot
recognize the initial shape of :math:`u`. This is in sharp contrast to
solutions of the wave equation where the initial shape is preserved in
homogeneous media - the solution is then basically a moving initial
condition. The standard wave equation :math:`u_{tt}=c^2u_{xx}` has solutions
that propagates with speed :math:`c` forever, without changing shape, while
the diffusion equation converges to a *stationary solution* :math:`\bar
u(x)` as :math:`t\rightarrow\infty`. In this limit, :math:`u_t=0`, and :math:`\bar u` is
governed by :math:`\bar u''(x)=0`.  This stationary limit of the diffusion
equation is called the *Laplace* equation and arises in a very wide
range of applications throughout the sciences.

It is possible to solve for :math:`u(x,t)` using an explicit scheme, as we
do in the section :ref:`diffu:pde1:FEsec`, but the time step restrictions
soon become much less favorable than for an explicit scheme applied to
the wave equation. And of more importance, since the solution :math:`u` of
the diffusion equation is very smooth and changes slowly, small time
steps are not convenient and not required by accuracy as the diffusion
process converges to a stationary state. Therefore, implicit schemes
(as described in the section :ref:`diffu:pde1:implicit`) are popular, but
these require solutions of systems of algebraic equations. We shall
use ready-made software for this purpose, but also program some simple
iterative methods.
The exposition is, as usual in this book, very basic and focuses on
the basic ideas and how to implement. More comprehensive mathematical
treatments and classical analysis
of the methods are found in lots of textbooks. A favorite
of ours in this respect is the one by LeVeque [Ref07]_.
The books by Strikwerda [Ref08]_ and by
Lapidus and Pinder [Ref09]_ are also highly recommended
as additional material on the topic.

.. _diffu:pde1:FEsec:

An explicit method for the 1D diffusion equation
================================================

Explicit finite difference methods for the wave equation :math:`u_{tt}=c^2u_{xx}`
can be used, with small modifications, for solving :math:`u_t = {\alpha} u_{xx}`
as well.
The exposition below assumes that the reader is familiar with the
basic ideas of discretization and implementation of wave
equations from the chapter :ref:`ch:wave`. Readers not familiar with the
Forward Euler, Backward Euler, and Crank-Nicolson (or centered or
midpoint) discretization methods in time should consult, e.g.,
the
section `Finite difference methods <http://hplgit.github.io/decay-book/doc/pub/book/sphinx/._book002.html#finite-difference-methods>`__ in [Ref02]_.

The initial-boundary value problem for 1D diffusion
---------------------------------------------------

To obtain a unique solution of the diffusion equation, or equivalently,
to apply numerical methods, we need initial and boundary conditions.
The diffusion equation goes with one initial condition :math:`u(x,0)=I(x)`, where
:math:`I` is a prescribed function. One boundary condition is required at
each point on the boundary, which in 1D means that :math:`u` must be known,
:math:`u_x` must be known, or some combination of them.

.. index::
   single: diffusion equation;  1D, explicit scheme

.. index::
   single: diffusion equation;  1D, initial boundary value problem

.. index::
   single: diffusion equation;  1D, initial condition

.. index::
   single: diffusion equation;  1D, boundary condition

We shall start
with the simplest boundary condition: :math:`u=0`. The complete
initial-boundary value diffusion problem in one
space dimension can then be specified as

.. _Eq:diffu:pde1:

.. math::

    \tag{349}
    \frac{\partial u}{\partial t} =
        {\alpha} \frac{\partial^2 u}{\partial x^2} + f, \quad x\in (0,L),\ t\in (0,T]
        
        
.. _Eq:diffu:pde1:ic:u:

.. math::

    \tag{350}
    u(x,0) = I(x), \quad  x\in [0,L]
        
        
.. _Eq:diffu:pde1:bc:0:

.. math::

    \tag{351}
    u(0,t)  = 0, \quad  t>0,
        
        
.. _Eq:diffu:pde1:bc:L:

.. math::

    \tag{352}
    u(L,t)  = 0, \quad  t>0{\thinspace .}
        
        
With only a first-order derivative in time,
only one *initial condition* is needed, while the second-order
derivative in space leads to a demand for two *boundary conditions*.
We have added a source term :math:`f=f(x,t)`, which is
convenient when testing implementations.

.. index::
   single: diffusion equation; source term

Diffusion equations like :ref:`(349) <Eq:diffu:pde1>` have a wide range of
applications throughout physical, biological, and financial sciences.
One of the most common applications is propagation of heat, where
:math:`u(x,t)` represents the temperature of some substance at point :math:`x` and
time :math:`t`. Other applications are listed in the section :ref:`diffu:app`.

.. _diffu:pde1:FE:

Forward Euler scheme          (1)
---------------------------------

.. index:: explicit discretization methods

.. index::
   single: diffusion equation;  1D, Forward Euler scheme

.. index:: domain

.. index:: mesh points

.. index:: mesh function

.. index:: forward difference approximation

.. index:: central difference approximation

.. index::
   single: diffusion equation;  1D, discrete equations

.. index::
   single: diffusion equation;  1D, Fourier number

.. index::
   single: diffusion equation;  1D, mesh Fourier number

.. index:: dimensionless number

The first step in the discretization procedure is to replace the
domain :math:`[0,L]\times [0,T]` by a set of mesh points. Here we apply
equally spaced mesh points

.. math::
         x_i=i\Delta x,\quad i=0,\ldots,N_x,

and

.. math::
         t_n=n\Delta t,\quad n=0,\ldots,N_t {\thinspace .}  

Moreover, :math:`u^n_i` denotes the mesh function that
approximates :math:`u(x_i,t_n)` for :math:`i=0,\ldots,N_x` and :math:`n=0,\ldots,N_t`.
Requiring the PDE :ref:`(349) <Eq:diffu:pde1>` to be fulfilled at a mesh point :math:`(x_i,t_n)`
leads to the equation

.. _Eq:diffu:pde1:step2:

.. math::

    \tag{353}
    \frac{\partial}{\partial t} u(x_i, t_n) =
        {\alpha}\frac{\partial^2}{\partial x^2} u(x_i, t_n) + f(x_i,t_n),
        
        
The next step is to replace the derivatives by finite difference approximations.
The computationally simplest method arises from
using a forward difference in time and a central difference in
space:

.. _Eq:diffu:pde1:step3a:

.. math::

    \tag{354}
    [D_t^+ u = {\alpha} D_xD_x u + f]^n_i {\thinspace .}
        
        
Written out,

.. _Eq:diffu:pde1:step3b:

.. math::

    \tag{355}
    \frac{u^{n+1}_i-u^n_i}{\Delta t} = {\alpha} \frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2} + f_i^n{\thinspace .}
        
        
We have turned the PDE into algebraic equations, also often called
discrete equations. The key property of the equations is that they
are algebraic, which makes them easy to solve.
As usual, we anticipate that :math:`u^n_i` is already computed such that
:math:`u^{n+1}_i` is the only unknown in :ref:`(355) <Eq:diffu:pde1:step3b>`.
Solving with respect to this unknown is easy:

.. _Eq:diffu:pde1:step4:

.. math::

    \tag{356}
    u^{n+1}_i = u^n_i + F\left(
        u^{n}_{i+1} - 2u^n_i + u^n_{i-1}\right) + \Delta t f_i^n,
        
        
where we have introduced the *mesh Fourier number*:

.. _Eq:_auto144:

.. math::

    \tag{357}
    F = {\alpha}\frac{\Delta t}{\Delta x^2}{\thinspace .}
        
        
.. admonition:: :math:`F` is the key parameter in the discrete diffusion equation

   Note that :math:`F` is a *dimensionless* number that lumps the key physical
   parameter in the problem, :math:`{\alpha}`, and the discretization parameters
   :math:`\Delta x` and :math:`\Delta t` into a single parameter. Properties
   of the numerical method are critically dependent upon the value of
   :math:`F` (see the section :ref:`diffu:pde1:analysis` for details).


The computational algorithm then becomes

 1. compute :math:`u^0_i=I(x_i)` for :math:`i=0,\ldots,N_x`

 2. for :math:`n=0,1,\ldots,N_t`:

   a. apply :ref:`(356) <Eq:diffu:pde1:step4>` for all the internal
      spatial points :math:`i=1,\ldots,N_x-1`

   b. set the boundary values
      :math:`u^{n+1}_i=0` for :math:`i=0` and :math:`i=N_x`

The algorithm is compactly and fully specified in Python:

.. code-block:: python

    import numpy as np
    x = np.linspace(0, L, Nx+1)    # mesh points in space
    dx = x[1] - x[0]
    t = np.linspace(0, T, Nt+1)    # mesh points in time
    dt = t[1] - t[0]
    F = a*dt/dx**2
    u   = np.zeros(Nx+1)           # unknown u at new time level
    u_n = np.zeros(Nx+1)           # u at the previous time level
    
    # Set initial condition u(x,0) = I(x)
    for i in range(0, Nx+1):
        u_n[i] = I(x[i])
    
    for n in range(0, Nt):
        # Compute u at inner mesh points
        for i in range(1, Nx):
            u[i] = u_n[i] + F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) + \ 
    	       dt*f(x[i], t[n])
    
        # Insert boundary conditions
        u[0] = 0;  u[Nx] = 0
    
        # Update u_n before next step
        u_n[:]= u

Note that we use ``a`` for :math:`{\alpha}` in the code, motivated by easy visual
mapping between the variable name and the mathematical symbol in formulas.

We need to state already now that the shown algorithm does not
produce meaningful results unless :math:`F\leq 1/2`. Why is explained in
the section :ref:`diffu:pde1:analysis`.

.. _diffu:pde1:FE:code:

Implementation          (7)
---------------------------

.. index::
   single: diffusion equation;  1D, implementation (FE)

The file `diffu1D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu1D_u0.py>`__
contains a complete function ``solver_FE_simple``
for solving the 1D diffusion equation with :math:`u=0` on the boundary
as specified in the algorithm above:

.. code-block:: python

    import numpy as np
    
    def solver_FE_simple(I, a, f, L, dt, F, T):
        """
        Simplest expression of the computational algorithm
        using the Forward Euler method and explicit Python loops.
        For this method F <= 0.5 for stability.
        """
        import time;  t0 = time.clock()  # For measuring the CPU time
    
        Nt = int(round(T/float(dt)))
        t = np.linspace(0, Nt*dt, Nt+1)   # Mesh points in time
        dx = np.sqrt(a*dt/F)
        Nx = int(round(L/dx))
        x = np.linspace(0, L, Nx+1)       # Mesh points in space
        # Make sure dx and dt are compatible with x and t
        dx = x[1] - x[0]
        dt = t[1] - t[0]
    
        u   = np.zeros(Nx+1)
        u_n = np.zeros(Nx+1)
    
        # Set initial condition u(x,0) = I(x)
        for i in range(0, Nx+1):
            u_n[i] = I(x[i])
    
        for n in range(0, Nt):
            # Compute u at inner mesh points
            for i in range(1, Nx):
                u[i] = u_n[i] + F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) + \ 
                       dt*f(x[i], t[n])
    
            # Insert boundary conditions
            u[0] = 0;  u[Nx] = 0
    
            # Switch variables before next step
            #u_n[:] = u  # safe, but slow
            u_n, u = u, u_n
    
        t1 = time.clock()
        return u_n, x, t, t1-t0  # u_n holds latest u

A faster version, based on vectorization of the finite difference
scheme, is available in the function ``solver_FE``.
The vectorized version replaces the explicit loop

.. code-block:: python

    for i in range(1, Nx):
        u[i] = u_n[i] + F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) \ 
               + dt*f(x[i], t[n])

by arithmetics on displaced slices of the ``u`` array:

.. code-block:: python

    u[1:Nx] = u_n[1:Nx] + F*(u_n[0:Nx-1] - 2*u_n[1:Nx] + u_n[2:Nx+1]) \ 
              + dt*f(x[1:Nx], t[n])
    # or
    u[1:-1] = u_n[1:-1] + F*(u_n[0:-2] - 2*u_n[1:-1] + u_n[2:]) \ 
              + dt*f(x[1:-1], t[n])

For example,
the vectorized version runs 70 times faster than the scalar version
in a case with 100 time steps and a spatial mesh of :math:`10^5` cells.

The ``solver_FE`` function also features a callback function such that the
user can process the solution at each time level. The callback
function looks like ``user_action(u, x, t, n)``, where ``u`` is the array
containing the solution at time level ``n``, ``x`` holds all the
spatial mesh points, while ``t`` holds all the temporal mesh points.
Apart from the vectorized loop over the spatial mesh points, the
callback function, and a bit more complicated setting of the source
``f`` it is not specified (``None``), the ``solver_FE`` function is identical to
``solver_FE_simple`` above:

.. code-block:: python

    def solver_FE(I, a, f, L, dt, F, T,
                  user_action=None, version='scalar'):
        """
        Vectorized implementation of solver_FE_simple.
        """
        import time;  t0 = time.clock()  # for measuring the CPU time
    
        Nt = int(round(T/float(dt)))
        t = np.linspace(0, Nt*dt, Nt+1)   # Mesh points in time
        dx = np.sqrt(a*dt/F)
        Nx = int(round(L/dx))
        x = np.linspace(0, L, Nx+1)       # Mesh points in space
        # Make sure dx and dt are compatible with x and t
        dx = x[1] - x[0]
        dt = t[1] - t[0]
    
        u   = np.zeros(Nx+1)   # solution array
        u_n = np.zeros(Nx+1)   # solution at t-dt
    
        # Set initial condition
        for i in range(0,Nx+1):
            u_n[i] = I(x[i])
    
        if user_action is not None:
            user_action(u_n, x, t, 0)
    
        for n in range(0, Nt):
            # Update all inner points
            if version == 'scalar':
                for i in range(1, Nx):
                    u[i] = u_n[i] +\ 
                           F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) +\ 
                           dt*f(x[i], t[n])
    
            elif version == 'vectorized':
                u[1:Nx] = u_n[1:Nx] +  \ 
                          F*(u_n[0:Nx-1] - 2*u_n[1:Nx] + u_n[2:Nx+1]) +\ 
                          dt*f(x[1:Nx], t[n])
            else:
                raise ValueError('version=%s' % version)
    
            # Insert boundary conditions
            u[0] = 0;  u[Nx] = 0
            if user_action is not None:
                user_action(u, x, t, n+1)
    
            # Switch variables before next step
            u_n, u = u, u_n
    
        t1 = time.clock()
        return t1-t0

.. _diffu:pde1:FE:verify:

Verification          (7)
-------------------------

.. index::
   single: diffusion equation;  1D, verification (FE)

.. _diffu:pde1:FE:verify:exact:

Exact solution of discrete equations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before thinking about running the functions in the previous section,
we need to construct a suitable test example for verification. It
appears that a manufactured solution that is linear in time and at
most quadratic in space fulfills the Forward Euler scheme
exactly. With the restriction that :math:`u=0` for :math:`x=0,L`, we can try the
solution

.. math::
         u(x,t) = 5tx(L-x){\thinspace .}

Inserted in the PDE, it requires a source term

.. math::
         f(x,t) = 10{\alpha} t + 5x(L-x){\thinspace .}

With the formulas from  :ref:`sec:form:fdtn` we can easily check
that the manufactured ``u`` fulfills the scheme:

.. math::
        \begin{align*}
        \lbrack D_t^+ u = {\alpha} D_x D_x u + f\rbrack^n_i &=
        \lbrack 5x(L-x)D_t^+ t  = 5 t{\alpha} D_x D_x (xL-x^2) +\\ 
        &\quad\quad 10{\alpha} t + 5x(L-x)\rbrack^n_i\\ 
        &=
        \lbrack 5x(L-x)  = 5 t{\alpha} (-2) + 10{\alpha} t + 5x(L-x) \rbrack^n_i,
        \end{align*}

which is a 0=0 expression.
The computation of the source term, given any :math:`u`,
is easily automated with ``sympy``:

.. code-block:: python

    import sympy as sym
    x, t, a, L = sym.symbols('x t a L')
    u = x*(L-x)*5*t
    
    def pde(u):
        return sym.diff(u, t) - a*sym.diff(u, x, x)
    
    f = sym.simplify(pde(u))

Now we can choose any expression for ``u`` and automatically
get the suitable source term ``f``. However, the manufactured solution
``u`` will in general
not be exactly reproduced by the scheme: only constant and linear
functions are differentiated correctly by a forward difference, while only
constant, linear, and quadratic functions are differentiated exactly by
a :math:`[D_xD_x u]^n_i` difference.

The numerical code will need to access the ``u`` and ``f`` above
as Python functions. The exact solution is wanted as a Python
function ``u_exact(x, t)``, while the source term is wanted as
``f(x, t)``. The parameters ``a`` and ``L`` in ``u`` and ``f`` above
are symbols and must be replaced by ``float`` objects in a Python
function. This can be done by redefining ``a`` and ``L`` as
``float`` objects and performing substitutions of symbols by
numbers in ``u`` and ``f``. The appropriate code looks like this:

.. code-block:: python

    a = 0.5
    L = 1.5
    u_exact = sym.lambdify(
        [x, t], u.subs('L', L).subs('a', a), modules='numpy')
    f = sym.lambdify(
        [x, t], f.subs('L', L).subs('a', a), modules='numpy')
    I = lambda x: u_exact(x, 0)

Here we also make a function ``I`` for the initial condition.

The idea now is that our manufactured solution should be
exactly reproduced by the code (to machine precision).
For this purpose we make a test function for comparing
the exact and numerical solutions at the end of the
time interval:

.. code-block:: python

    def test_solver_FE():
        # Define u_exact, f, I as explained above
    
        dx = L/3  # 3 cells
        F = 0.5
        dt = F*dx**2
    
        u, x, t, cpu = solver_FE_simple(
            I=I, a=a, f=f, L=L, dt=dt, F=F, T=2)
        u_e = u_exact(x, t[-1])
        diff = abs(u_e - u).max()
        tol = 1E-14
        assert diff < tol, 'max diff solver_FE_simple: %g' % diff
    
        u, x, t, cpu = solver_FE(
            I=I, a=a, f=f, L=L, dt=dt, F=F, T=2,
            user_action=None, version='scalar')
        u_e = u_exact(x, t[-1])
        diff = abs(u_e - u).max()
        tol = 1E-14
        assert diff < tol, 'max diff solver_FE, scalar: %g' % diff
    
        u, x, t, cpu = solver_FE(
            I=I, a=a, f=f, L=L, dt=dt, F=F, T=2,
            user_action=None, version='vectorized')
        u_e = u_exact(x, t[-1])
        diff = abs(u_e - u).max()
        tol = 1E-14
        assert diff < tol, 'max diff solver_FE, vectorized: %g' % diff


.. admonition:: The critical value :math:`F=0.5`

   We emphasize that the value ``F=0.5`` is critical: the tests above
   will fail if ``F`` has a larger value. This is because the Forward
   Euler scheme is unstable for :math:`F>1/2`.
   
   The reader may wonder if
   :math:`F=1/2` is safe or if :math:`F<1/2` should be required. Experiments show
   that :math:`F=1/2` works fine for :math:`u_t={\alpha} u_{xx}`, so
   there is no accumulation of rounding
   errors in this case and hence no need to introduce any safety factor
   to keep :math:`F` away from the limiting value 0.5.


.. _diffu:pde1:FE:verify:convrates:

Checking convergence rates          (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If our chosen exact solution does not satisfy the discrete equations
exactly, we are left with checking the convergence rates, just as we did
previously for the wave equation. However, with the Euler scheme here,
we have different accuracies in time and space, since we use a second
order approximation to the spatial derivative and a first order approximation
to the time derivative. Thus, we must expect different convergence rates in
time and space. For the numerical error,

.. math::
         E = C_t\Delta t^r + C_x\Delta x^p,

we should get convergence rates :math:`r=1` and :math:`p=2` (:math:`C_t` and :math:`C_x` are unknown constants).
As previously,
in the section :ref:`wave:pde2:fd:MMS`,
we simplify matters by introducing a single discretization parameter :math:`h`:

.. math::
         h = \Delta t,\quad \Delta x = Kh^{r/p},

where :math:`K` is any constant. This allows us to factor out only *one*
discretization parameter :math:`h` from the formula:

.. math::
        
        E = C_t h + C_x (K^{r/p})^p = \tilde C h^r,\quad
        \tilde C = C_t + C_sK^r{\thinspace .}

The computed rate :math:`r` should approach 1 with increasing resolution.

It is tempting, for simplicity,
to choose :math:`K=1`, which gives :math:`\Delta x = h^{r/p}`, expected to be
:math:`\sqrt{\Delta t}`. However,
we have to control the stability requirement: :math:`F\leq\frac{1}{2}`,
which means

.. math::
         \frac{{\alpha}\Delta t}{\Delta x^2}\leq\frac{1}{2}\quad\Rightarrow
        \quad \Delta x \geq \sqrt{2{\alpha}}h^{1/2} ,

implying that :math:`K=\sqrt{2{\alpha}}` is our choice in experiments where we
lie on the stability limit :math:`F=1/2`.

.. _diffu:pde1:FE:experiments:

Numerical experiments
---------------------

.. index::
   single: diffusion equation;  1D, numerical experiments

When a test function like the one above runs silently without errors,
we have some evidence for a correct implementation of the numerical
method.  The next step is to do some experiments with more interesting
solutions.

We target a scaled diffusion problem where :math:`x/L` is a new spatial
coordinate and :math:`{\alpha} t/L^2` is a new time coordinate. The source term
:math:`f` is omitted, and :math:`u` is scaled by :math:`\max_{x\in [0,L]}|I(x)|` (see
`Scaling of the diffusion equation <http://hplgit.github.io/scaling-book/doc/pub/book/html/._scaling-book008.html#sec:scale:diffu>`__ [Ref03]_ for details).
The governing PDE is then

.. math::
         \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2},

in the spatial domain :math:`[0,L]`, with boundary conditions :math:`u(0)=u(1)=0`.
Two initial conditions will be tested: a discontinuous plug,

.. math::
         I(x) = \left\lbrace\begin{array}{ll}
        0, & |x-L/2| > 0.1\\ 
        1, & \hbox{otherwise}
        \end{array}\right.

and a smooth Gaussian function,

.. math::
         I(x) = e^{-\frac{1}{2\sigma^2}(x-L/2)^2}{\thinspace .}

The functions ``plug`` and ``gaussian`` in `diffu1D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu1D_u0.py>`__ run the two cases,
respectively:

.. code-block:: python

    def plug(scheme='FE', F=0.5, Nx=50):
        L = 1.
        a = 1.
        T = 0.1
        # Compute dt from Nx and F
        dx = L/Nx;  dt = F/a*dx**2
    
        def I(x):
            """Plug profile as initial condition."""
            if abs(x-L/2.0) > 0.1:
                return 0
            else:
                return 1
    
        cpu = viz(I, a, L, dt, F, T,
                  umin=-0.1, umax=1.1,
                  scheme=scheme, animate=True, framefiles=True)
        print 'CPU time:', cpu
    
    def gaussian(scheme='FE', F=0.5, Nx=50, sigma=0.05):
        L = 1.
        a = 1.
        T = 0.1
        # Compute dt from Nx and F
        dx = L/Nx;  dt = F/a*dx**2
    
        def I(x):
            """Gaussian profile as initial condition."""
            return exp(-0.5*((x-L/2.0)**2)/sigma**2)
    
        u, cpu = viz(I, a, L, dt, F, T,
                     umin=-0.1, umax=1.1,
                     scheme=scheme, animate=True, framefiles=True)
        print 'CPU time:', cpu

These functions make use of the function ``viz`` for running the
solver and visualizing the solution using a callback function
with plotting:

.. code-block:: python

    def viz(I, a, L, dt, F, T, umin, umax,
            scheme='FE', animate=True, framefiles=True):
    
        def plot_u(u, x, t, n):
            plt.plot(x, u, 'r-', axis=[0, L, umin, umax],
                     title='t=%f' % t[n])
            if framefiles:
                plt.savefig('tmp_frame%04d.png' % n)
            if t[n] == 0:
                time.sleep(2)
            elif not framefiles:
                # It takes time to write files so pause is needed
                # for screen only animation
                time.sleep(0.2)
    
        user_action = plot_u if animate else lambda u,x,t,n: None
    
        cpu = eval('solver_'+scheme)(I, a, L, dt, F, T,
                                     user_action=user_action)
        return cpu

Notice that this ``viz`` function stores all the solutions in a
list ``solutions`` in the callback function. Modern computers have
hardly any problem with storing a lot of such solutions for moderate
values of :math:`N_x` in 1D problems, but for 2D and 3D problems, this
technique cannot be used and solutions must be stored in files.

Our experiments employ a time step :math:`\Delta t = 0.0002` and
simulate for :math:`t\in [0,0.1]`. First we try the highest value of
:math:`F`: :math:`F=0.5`. This resolution corresponds to
:math:`N_x=50`. A possible terminal command is

.. code-block:: text

    Terminal> python -c 'from diffu1D_u0 import gaussian
              gaussian("solver_FE", F=0.5, dt=0.0002)'

The :math:`u(x,t)` curve as a function of :math:`x` is shown in Figure
:ref:`diffu:pde1:FE:fig:F=0.5` at four time levels.

.. raw:: html
        
        <div>
        <video  loop controls width='640' height='365' preload='none'>
            <source src='mov-diffu/diffu1D_u0_FE_plug/movie.mp4'  type='video/mp4;  codecs="avc1.42E01E, mp4a.40.2"'>
            <source src='mov-diffu/diffu1D_u0_FE_plug/movie.webm' type='video/webm; codecs="vp8, vorbis"'>
            <source src='mov-diffu/diffu1D_u0_FE_plug/movie.ogg'  type='video/ogg;  codecs="theora, vorbis"'>
        </video>
        </div>
        <p><em></em></p>
        
        <!-- Issue warning if in a Safari browser -->
        <script language="javascript">
        if (!!(window.safari)) {
          document.write("<div style=\"width: 95%%; padding: 10px; border: 1px solid #100; border-radius: 4px;\"><p><font color=\"red\">The above movie will not play in Safari - use Chrome, Firefox, or Opera.</font></p></div>")}
        </script>
        

.. `movie <http://tinyurl.com/pu5uyfn/pub/pub/diffu/html/mov-diffu/diffu1D_u0_FE_plug/movie.ogg>`__

.. Does not work:

.. http://tinyurl.com/pu5uyfn/pub/diffu/html/mov-diffu/diffu1D_u0_FE_plug/movie.ogg

.. Works:

.. https://raw.githubusercontent.com/hplgit/fdm-book/master/doc/.src/book/mov-diffu/diffu1D_u0_FE_plug/movie.ogg

We see that the curves have saw-tooth waves in the beginning of the
simulation. This non-physical noise is smoothed out with time, but
solutions of the diffusion equations are known to be smooth, and
this numerical solution is definitely not smooth.
Lowering :math:`F` helps: :math:`F\leq 0.25` gives a smooth solution, see
Figure :ref:`diffu:pde1:FE:fig:F=0.25`.

.. raw:: html
        
        <div>
        <video  loop controls width='640' height='365' preload='none'>
            <source src='mov-diffu/diffu1D_u0_FE_plug_F025/movie.mp4'  type='video/mp4;  codecs="avc1.42E01E, mp4a.40.2"'>
            <source src='mov-diffu/diffu1D_u0_FE_plug_F025/movie.webm' type='video/webm; codecs="vp8, vorbis"'>
            <source src='mov-diffu/diffu1D_u0_FE_plug_F025/movie.ogg'  type='video/ogg;  codecs="theora, vorbis"'>
        </video>
        </div>
        <p><em></em></p>
        
        <!-- Issue warning if in a Safari browser -->
        <script language="javascript">
        if (!!(window.safari)) {
          document.write("<div style=\"width: 95%%; padding: 10px; border: 1px solid #100; border-radius: 4px;\"><p><font color=\"red\">The above movie will not play in Safari - use Chrome, Firefox, or Opera.</font></p></div>")}
        </script>
        

Increasing :math:`F` slightly beyond the limit 0.5, to :math:`F=0.51`,
gives growing, non-physical instabilities,
as seen in Figure :ref:`diffu:pde1:FE:fig:F=0.51`.

.. _diffu:pde1:FE:fig:F=0.5:

.. figure:: plug_FE_F05.png
   :width: 800

   Forward Euler scheme for :math:`F=0.5`

.. _diffu:pde1:FE:fig:F=0.25:

.. figure:: plug_FE_F025.png
   :width: 800

   Forward Euler scheme for :math:`F=0.25`

.. _diffu:pde1:FE:fig:F=0.51:

.. figure:: plug_FE_F051.png
   :width: 800

   Forward Euler scheme for :math:`F=0.51`

Instead of a discontinuous initial condition we now try the smooth
Gaussian function for :math:`I(x)`. A simulation for :math:`F=0.5`
is shown in Figure :ref:`diffu:pde1:FE:fig:gauss:F=0.5`. Now the numerical solution
is smooth for all times, and this is true for any :math:`F\leq 0.5`.

.. raw:: html
        
        <div>
        <video  loop controls width='640' height='365' preload='none'>
            <source src='mov-diffu/diffu1D_u0_FE_gaussian1/movie.mp4'  type='video/mp4;  codecs="avc1.42E01E, mp4a.40.2"'>
            <source src='mov-diffu/diffu1D_u0_FE_gaussian1/movie.webm' type='video/webm; codecs="vp8, vorbis"'>
            <source src='mov-diffu/diffu1D_u0_FE_gaussian1/movie.ogg'  type='video/ogg;  codecs="theora, vorbis"'>
        </video>
        </div>
        <p><em></em></p>
        
        <!-- Issue warning if in a Safari browser -->
        <script language="javascript">
        if (!!(window.safari)) {
          document.write("<div style=\"width: 95%%; padding: 10px; border: 1px solid #100; border-radius: 4px;\"><p><font color=\"red\">The above movie will not play in Safari - use Chrome, Firefox, or Opera.</font></p></div>")}
        </script>
        

.. _diffu:pde1:FE:fig:gauss:F=0.5:

.. figure:: gaussian_FE_F05.png
   :width: 800

   Forward Euler scheme for :math:`F=0.5`

Experiments with these two choices of :math:`I(x)` reveal some
important observations:

 * The Forward Euler scheme leads to growing solutions if :math:`F>\frac{1}{2}`.

 * :math:`I(x)` as a discontinuous plug leads to a saw tooth-like noise
   for :math:`F=\frac{1}{2}`, which is absent for :math:`F\leq\frac{1}{4}`.

 * The smooth Gaussian initial function leads to a smooth solution
   for all relevant :math:`F` values (:math:`F\leq \frac{1}{2}`).

.. _diffu:pde1:implicit:

Implicit methods for the 1D diffusion equation
==============================================

.. index::
   single: diffusion equation;  1D, implicit schemes

Simulations with the Forward Euler scheme shows that the time step
restriction, :math:`F\leq\frac{1}{2}`, which means :math:`\Delta t \leq \Delta x^2/(2{\alpha})`,
may be relevant in the beginning of the diffusion process, when the
solution changes quite fast, but as time increases, the process slows
down, and a small :math:`\Delta t` may be inconvenient. By using
*implicit schemes*, which lead to coupled systems of linear equations
to be solved at each time level, any size of :math:`\Delta t` is possible
(but the accuracy decreases with increasing :math:`\Delta t`).
The Backward Euler scheme, derived and implemented below, is the
simplest implicit scheme for the diffusion equation.

.. _diffu:pde1:BE:

Backward Euler scheme
---------------------

We now apply a backward difference in time in :ref:`(353) <Eq:diffu:pde1:step2>`,
but the same central difference in space:

.. _Eq:diffu:pde1:step3aBE:

.. math::

    \tag{358}
    [D_t^- u = D_xD_x u + f]^n_i,
        
        
which written out reads

.. _Eq:diffu:pde1:step3bBE:

.. math::

    \tag{359}
    \frac{u^{n}_i-u^{n-1}_i}{\Delta t} = {\alpha}\frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2} + f_i^n{\thinspace .}
        
        
Now we assume :math:`u^{n-1}_i` is already computed, but all quantities at the "new"
time level :math:`n` are unknown. This time it is not possible to solve
with respect to :math:`u_i^{n}` because this value couples to its neighbors
in space, :math:`u^n_{i-1}` and :math:`u^n_{i+1}`, which are also unknown.
Let us examine this fact for the case when :math:`N_x=3`. Equation :ref:`(359) <Eq:diffu:pde1:step3bBE>` written for :math:`i=1,\ldots,Nx-1= 1,2` becomes

.. _Eq:_auto145:

.. math::

    \tag{360}
    \frac{u^{n}_1-u^{n-1}_1}{\Delta t} = {\alpha}\frac{u^{n}_{2} - 2u^n_1 + u^n_{0}}{\Delta x^2} + f_1^n
        
        
.. _Eq:_auto146:

.. math::

    \tag{361}
    \frac{u^{n}_2-u^{n-1}_2}{\Delta t} = {\alpha}\frac{u^{n}_{3} - 2u^n_2 + u^n_{1}}{\Delta x^2} + f_2^n
        
        
The boundary values :math:`u^n_0` and :math:`u^n_3` are known as zero. Collecting the
unknown new values :math:`u^n_1` and :math:`u^n_2` on the left-hand side and multiplying
by :math:`\Delta t` gives

.. _Eq:_auto147:

.. math::

    \tag{362}
    \left(1+  2F\right) u^{n}_1 - F u^{n}_{2}    = u^{n-1}_1 + \Delta t f_1^n,
        
        
.. _Eq:_auto148:

.. math::

    \tag{363}
    - F u^{n}_{1} + \left(1+  2F\right) u^{n}_2  = u^{n-1}_2 + \Delta t f_2^n{\thinspace .}
        
        
This is a coupled :math:`2\times 2` system of algebraic equations for
the unknowns :math:`u^n_1` and :math:`u^n_2`. The equivalent matrix form is

.. math::
         \left(\begin{array}{cc}
        1+  2F &   - F\\ 
        - F    & 1+  2F
        \end{array}\right)
        \left(\begin{array}{c}
        u^{n}_1\\ 
        u^{n}_2
        \end{array}\right)
        =
        \left(\begin{array}{c}
        u^{n-1}_1 + \Delta t f_1^n\\ 
        u^{n-1}_2 + \Delta t f_2^n
        \end{array}\right)
        

.. admonition:: Terminology: implicit vs. explicit methods

   Discretization methods that lead to a coupled system of equations
   for the unknown function at a new time level are said to be
   *implicit methods*.
   The counterpart, *explicit methods*, refers to discretization
   methods where there is a simple explicit formula for the values of
   the unknown function at each of the spatial mesh points at the new
   time level. From an implementational point of view, implicit methods
   are more comprehensive to code since they require
   the solution of coupled equations, i.e., a matrix system, at each time level.
   With explicit methods we have a closed-form formula for the value of
   the unknown at each mesh point.
   
   Very often explicit schemes have a restriction on the size of the time
   step that can be relaxed by using implicit schemes. In fact,
   implicit schemes are frequently unconditionally stable, so the size of the
   time step is governed by accuracy and not by stability. This is the great
   advantage of implicit schemes.


In the general case, :ref:`(359) <Eq:diffu:pde1:step3bBE>` gives rise to
a coupled :math:`(N_x-1)\times (N_x-1)` system of algebraic equations for
all the unknown :math:`u^n_i` at the interior spatial points :math:`i=1,\ldots,N_x-1`.
Collecting the unknowns on the left-hand side,
:ref:`(359) <Eq:diffu:pde1:step3bBE>` can be written

.. _Eq:diffu:pde1:step4BE:

.. math::

    \tag{364}
    - F u^n_{i-1} + \left(1+  2F \right) u^{n}_i - F u^n_{i+1} =
        u_{i-1}^{n-1},
        
        
for :math:`i=1,\ldots,N_x-1`.
One can either view these equations as a system for where the
:math:`u^{n}_i` values at the internal mesh points, :math:`i=1,\ldots,N_x-1`, are
unknown, or we may append the boundary values :math:`u^n_0` and :math:`u^n_{N_x}`
to the system. In the latter case, all :math:`u^n_i` for :math:`i=0,\ldots,N_x`
are considered unknown, and we must add the boundary equations to
the :math:`N_x-1` equations in :ref:`(364) <Eq:diffu:pde1:step4BE>`:

.. _Eq:diffu:pde1:step4BE:BC:0:

.. math::

    \tag{365}
    u_0^n = 0,
        

.. _Eq:diffu:pde1:step4BE:BC:L:

.. math::

    \tag{366}
    u_{N_x}^n = 0{\thinspace .}
        
        
A coupled system of algebraic equations can be written on matrix form,
and this is important if we want to call up ready-made software for
solving the system.  The equations :ref:`(364) <Eq:diffu:pde1:step4BE>`
and :ref:`(365) <Eq:diffu:pde1:step4BE:BC:0>`--:ref:`(366) <Eq:diffu:pde1:step4BE:BC:L>`
correspond to the matrix equation

.. math::
         AU = b

where :math:`U=(u^n_0,\ldots,u^n_{N_x})`, and
the matrix :math:`A` has the following structure:

.. _Eq:diffu:pde1:matrix:sparsity:

.. math::

    \tag{367}
    A =
        \left(
        \begin{array}{cccccccccc}
        A_{0,0} & A_{0,1} & 0
        &\cdots &
        \cdots & \cdots & \cdots &
        \cdots & 0 \\ 
        A_{1,0} & A_{1,1} & A_{1,2} & \ddots &   & &  & &  \vdots \\ 
        0 & A_{2,1} & A_{2,2} & A_{2,3} &
        \ddots & &  &  & \vdots \\ 
        \vdots & \ddots &  & \ddots & \ddots & 0 &  & & \vdots \\ 
        \vdots &  & \ddots & \ddots & \ddots & \ddots & \ddots & & \vdots \\ 
        \vdots & &  & 0 & A_{i,i-1} & A_{i,i} & A_{i,i+1} & \ddots & \vdots \\ 
        \vdots & & &  & \ddots & \ddots & \ddots &\ddots  & 0 \\ 
        \vdots & & & &  &\ddots  & \ddots &\ddots  & A_{N_x-1,N_x} \\ 
        0 &\cdots & \cdots &\cdots & \cdots & \cdots  & 0 & A_{N_x,N_x-1} & A_{N_x,N_x}
        \end{array}
        \right)
        
        
The nonzero elements are given by

.. _Eq:_auto149:

.. math::

    \tag{368}
    A_{i,i-1} = -F
        
        
.. _Eq:_auto150:

.. math::

    \tag{369}
    A_{i,i} = 1+ 2F
        
        
.. _Eq:_auto151:

.. math::

    \tag{370}
    A_{i,i+1} = -F
        
        
in the equations for internal points, :math:`i=1,\ldots,N_x-1`. The first and last
equation correspond to the boundary condition, where we know the solution,
and therefore we must have

.. _Eq:_auto152:

.. math::

    \tag{371}
    A_{0,0} = 1,
        
        
.. _Eq:_auto153:

.. math::

    \tag{372}
    A_{0,1} = 0,
        
        
.. _Eq:_auto154:

.. math::

    \tag{373}
    A_{N_x,N_x-1} = 0,
        
        
.. _Eq:_auto155:

.. math::

    \tag{374}
    A_{N_x,N_x} = 1{\thinspace .}
        
        
The right-hand side :math:`b` is written as

.. _Eq:_auto156:

.. math::

    \tag{375}
    b = \left(\begin{array}{c}
        b_0\\ 
        b_1\\ 
        \vdots\\ 
        b_i\\ 
        \vdots\\ 
        b_{N_x}
        \end{array}\right)
        
        
with

.. _Eq:_auto157:

.. math::

    \tag{376}
    b_0 = 0,
        
        
.. _Eq:_auto158:

.. math::

    \tag{377}
    b_i = u^{n-1}_i,\quad i=1,\ldots,N_x-1,
        
        
.. _Eq:_auto159:

.. math::

    \tag{378}
    b_{N_x} = 0 {\thinspace .}   
        

We observe that the matrix :math:`A` contains quantities that do not change
in time. Therefore, :math:`A` can be formed once and for all before we enter
the recursive formulas for the time evolution.
The right-hand side :math:`b`, however, must be updated at each time step.
This leads to the following computational algorithm, here sketched
with Python code:

.. code-block:: python

    x = np.linspace(0, L, Nx+1)   # mesh points in space
    dx = x[1] - x[0]
    t = np.linspace(0, T, N+1)    # mesh points in time
    u   = np.zeros(Nx+1)          # unknown u at new time level
    u_n = np.zeros(Nx+1)          # u at the previous time level
    
    # Data structures for the linear system
    A = np.zeros((Nx+1, Nx+1))
    b = np.zeros(Nx+1)
    
    for i in range(1, Nx):
        A[i,i-1] = -F
        A[i,i+1] = -F
        A[i,i] = 1 + 2*F
    A[0,0] = A[Nx,Nx] = 1
    
    # Set initial condition u(x,0) = I(x)
    for i in range(0, Nx+1):
        u_n[i] = I(x[i])
    
    import scipy.linalg
    
    for n in range(0, Nt):
        # Compute b and solve linear system
        for i in range(1, Nx):
            b[i] = -u_n[i]
        b[0] = b[Nx] = 0
        u[:] = scipy.linalg.solve(A, b)
    
        # Update u_n before next step
        u_n[:] = u

Regarding verification, the same considerations apply as for the
Forward Euler method (the section :ref:`diffu:pde1:FE:verify`).

.. index::
   single: diffusion equation;  1D, verification (BE)

.. _diffu:pde1:impl:sparse:

Sparse matrix implementation
----------------------------

.. index::
   single: diffusion equation;  1D, tridiagonal matrix

We have seen from :ref:`(367) <Eq:diffu:pde1:matrix:sparsity>` that the matrix
:math:`A` is tridiagonal. The code segment above used a full, dense matrix
representation of :math:`A`, which stores a lot of values we know are zero
beforehand, and worse, the solution algorithm computes with all these
zeros.  With :math:`N_x+1` unknowns, the work by the solution algorithm is
:math:`\frac{1}{3} (N_x+1)^3` and the storage requirements :math:`(N_x+1)^2`. By
utilizing the fact that :math:`A` is tridiagonal and employing corresponding
software tools that work with the three diagonals, the work and
storage demands can be proportional to :math:`N_x` only.  This leads to a
dramatic improvement: with :math:`N_x=200`, which is a realistic resolution,
the code runs about 40,000 times faster and reduces the storage to
just 1.5%! It is no doubt that we should take advantage of the fact
that :math:`A` is tridiagonal.

The key idea is to apply a data structure for a tridiagonal or sparse
matrix. The ``scipy.sparse`` package has relevant utilities. For
example, we can store only the nonzero diagonals of a matrix. The
package also has linear system solvers that operate on sparse matrix
data structures. The code below illustrates how we can store only the
main diagonal and the upper and lower diagonals.

.. code-block:: python

    # Representation of sparse matrix and right-hand side
    main  = np.zeros(Nx+1)
    lower = np.zeros(Nx)
    upper = np.zeros(Nx)
    b     = np.zeros(Nx+1)
    
    # Precompute sparse matrix
    main[:] = 1 + 2*F
    lower[:] = -F
    upper[:] = -F
    # Insert boundary conditions
    main[0] = 1
    main[Nx] = 1
    
    A = scipy.sparse.diags(
        diagonals=[main, lower, upper],
        offsets=[0, -1, 1], shape=(Nx+1, Nx+1),
        format='csr')
    print A.todense()  # Check that A is correct
    
    # Set initial condition
    for i in range(0,Nx+1):
        u_n[i] = I(x[i])
    
    for n in range(0, Nt):
        b = u_n
        b[0] = b[-1] = 0.0  # boundary conditions
        u[:] = scipy.sparse.linalg.spsolve(A, b)
        u_n[:] = u

The ``scipy.sparse.linalg.spsolve`` function utilizes the sparse storage
structure of ``A`` and performs, in this case, a very efficient Gaussian
elimination solve.

The program `diffu1D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu1D_u0.py>`__
contains a function ``solver_BE``, which implements the Backward Euler scheme
sketched above.
As mentioned in the section :ref:`diffu:pde1:FE`,
the functions ``plug`` and ``gaussian``
runs the case with :math:`I(x)` as a discontinuous plug or a smooth
Gaussian function. All experiments point to two characteristic
features of the Backward Euler scheme: 1) it is always stable, and
2) it always gives a smooth, decaying solution.

.. _diffu:pde1:CN:

Crank-Nicolson scheme          (1)
----------------------------------

.. index::
   single: diffusion equation;  1D, Crank-Nicolson scheme

The idea in the Crank-Nicolson scheme is to apply centered
differences in space and time, combined with an average in time.
We demand the PDE to be fulfilled at the spatial mesh points, but
midway between the points in the time mesh:

.. math::
        
        \frac{\partial}{\partial t} u(x_i, t_{n+\frac{1}{2}}) =
        {\alpha}\frac{\partial^2}{\partial x^2}u(x_i, t_{n+\frac{1}{2}}) + f(x_i,t_{n+\frac{1}{2}}),
        

for :math:`i=1,\ldots,N_x-1` and :math:`n=0,\ldots, N_t-1`.

With centered differences in space and time, we get

.. math::
         [D_t u = {\alpha} D_xD_x u + f]^{n+\frac{1}{2}}_i{\thinspace .}

On the right-hand side we get an expression

.. math::
         \frac{1}{\Delta x^2}\left(u^{n+\frac{1}{2}}_{i-1} - 2u^{n+\frac{1}{2}}_i + u^{n+\frac{1}{2}}_{i+1}\right) + f_i^{n+\frac{1}{2}}{\thinspace .}

This expression is problematic since :math:`u^{n+\frac{1}{2}}_i` is not one of
the unknowns we compute. A possibility is to replace :math:`u^{n+\frac{1}{2}}_i`
by an arithmetic average:

.. math::
         u^{n+\frac{1}{2}}_i\approx
        \frac{1}{2}\left(u^{n}_i +u^{n+1}_{i}\right){\thinspace .}
        

In the compact notation, we can use the arithmetic average
notation :math:`\overline{u}^t`:

.. math::
         [D_t u = {\alpha} D_xD_x \overline{u}^t + f]^{n+\frac{1}{2}}_i{\thinspace .}

We can also use an average for :math:`f_i^{n+\frac{1}{2}}`:

.. math::
         [D_t u = {\alpha} D_xD_x \overline{u}^t + \overline{f}^t]^{n+\frac{1}{2}}_i{\thinspace .}

After writing out the differences and average, multiplying by :math:`\Delta t`,
and collecting all unknown terms on the left-hand side, we get

.. math::
        
        u^{n+1}_i - \frac{1}{2} F(u^{n+1}_{i-1} - 2u^{n+1}_i + u^{n+1}_{i+1})
        = u^{n}_i + \frac{1}{2} F(u^{n}_{i-1} - 2u^{n}_i + u^{n}_{i+1})\nonumber
        

.. _Eq:_auto160:

.. math::

    \tag{379}
    \qquad \frac{1}{2} f_i^{n+1} + \frac{1}{2} f_i^n{\thinspace .}
        
        
Also here, as in the Backward Euler scheme, the new unknowns
:math:`u^{n+1}_{i-1}`, :math:`u^{n+1}_{i}`, and :math:`u^{n+1}_{i+1}` are coupled
in a linear system :math:`AU=b`, where :math:`A` has the same structure
as in :ref:`(367) <Eq:diffu:pde1:matrix:sparsity>`, but with slightly
different entries:

.. _Eq:_auto161:

.. math::

    \tag{380}
    A_{i,i-1} = -\frac{1}{2} F
        
        
.. _Eq:_auto162:

.. math::

    \tag{381}
    A_{i,i} = 1 + F
        
        
.. _Eq:_auto163:

.. math::

    \tag{382}
    A_{i,i+1} = -\frac{1}{2} F
        
        
in the equations for internal points, :math:`i=1,\ldots,N_x-1`. The equations
for the boundary points correspond to

.. _Eq:_auto164:

.. math::

    \tag{383}
    A_{0,0} = 1,
        
        
.. _Eq:_auto165:

.. math::

    \tag{384}
    A_{0,1} = 0,
        
        
.. _Eq:_auto166:

.. math::

    \tag{385}
    A_{N_x,N_x-1} = 0,
        
        
.. _Eq:_auto167:

.. math::

    \tag{386}
    A_{N_x,N_x} = 1{\thinspace .}
        
        
The right-hand side :math:`b` has entries

.. _Eq:_auto168:

.. math::

    \tag{387}
    b_0 = 0,
        
        
.. _Eq:_auto169:

.. math::

    \tag{388}
    b_i = u^{n-1}_i + \frac{1}{2}(f_i^n + f_i^{n+1}),\quad i=1,\ldots,N_x-1,
        
        
.. _Eq:_auto170:

.. math::

    \tag{389}
    b_{N_x} = 0 {\thinspace .}   
        

When verifying some implementation of the Crank-Nicolson scheme by convergence rate testing,
one should note that the scheme is second order accurate in both space and time. The numerical
error then reads

.. math::
         E = C_t\Delta t^r + C_x\Delta x^r,

where :math:`r=2` (:math:`C_t` and :math:`C_x` are unknown constants, as before).
When introducing a single discretization parameter, we may now simply choose

.. math::
         h = \Delta x = \Delta t,

which gives

.. math::
         E = C_th^r + C_xh^r = (C_t + C_x)h^r,

where :math:`r` should approach 2 as resolution is increased in the convergence rate computations.

.. index::
   single: diffusion equation;  1D, verification (CN)

.. _diffu:pde1:theta:

The unifying :math:`\theta` rule
--------------------------------

.. index::
   single: diffusion equation;  1D, theta rule

For the equation

.. math::
         \frac{\partial u}{\partial t} = G(u),

where :math:`G(u)` is some
spatial differential operator, the :math:`\theta`-rule
looks like

.. math::
         \frac{u^{n+1}_i - u^n_i}{\Delta t} =
        \theta G(u^{n+1}_i) + (1-\theta) G(u^{n}_i){\thinspace .}

The important feature of this time discretization scheme is that
we can implement one formula and then generate a family of
well-known and widely used schemes:

 * :math:`\theta=0` gives the Forward Euler scheme in time

 * :math:`\theta=1` gives the Backward Euler scheme in time

 * :math:`\theta=\frac{1}{2}` gives the Crank-Nicolson scheme in time

In the compact difference notation, we write the :math:`\theta` rule
as

.. math::
         [D_t u = {\alpha} D_xD_x u]^{n+\theta}{\thinspace .}

We have that :math:`t_{n+\theta} = \theta t_{n+1} + (1-\theta)t_n`.

Applied to the 1D diffusion problem, the :math:`\theta`-rule gives

.. math::
        \begin{align*}
        \frac{u^{n+1}_i-u^n_i}{\Delta t} &=
        {\alpha}\left( \theta \frac{u^{n+1}_{i+1} - 2u^{n+1}_i + u^{n+1}_{i-1}}{\Delta x^2}
        + (1-\theta) \frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2}\right)\\ 
        &\qquad + \theta f_i^{n+1} + (1-\theta)f_i^n
        {\thinspace .}
        \end{align*}

This scheme also leads to a matrix system with entries

.. math::
         A_{i,i-1} = -F\theta,\quad A_{i,i} = 1+2F\theta\quad,
        A_{i,i+1} = -F\theta,

while right-hand side entry :math:`b_i` is

.. math::
         b_i = u^n_{i} + F(1-\theta)
        \frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2} +
        \Delta t\theta f_i^{n+1} + \Delta t(1-\theta)f_i^n{\thinspace .}
        

The corresponding entries for the boundary points are as in the Backward
Euler and Crank-Nicolson schemes listed earlier.

Note that convergence rate testing with implementations of the theta rule must
adjust the error expression according to which of the underlying schemes is actually being run.
That is, if :math:`\theta=0` (i.e., Forward Euler) or :math:`\theta=1` (i.e., Backward Euler), there should
be first order convergence, whereas with :math:`\theta=0.5` (i.e., Crank-Nicolson), one should get
second order convergence (as outlined in previous sections).

.. _diffu:pde1:theta:experiments:

Experiments
-----------

We can repeat the experiments from the section :ref:`diffu:pde1:FE:experiments`
to see if the Backward Euler or Crank-Nicolson schemes have problems
with sawtooth-like noise when starting with a discontinuous initial
condition. We can also verify that we can have :math:`F>\frac{1}{2}`,
which allows larger time steps than in the Forward Euler method.

.. _diffu:pde1:BE:fig:F=0.5:

.. figure:: plug_BE_F05.png
   :width: 800

   Backward Euler scheme for :math:`F=0.5`

The Backward Euler scheme always produces smooth solutions for any :math:`F`.
Figure :ref:`diffu:pde1:BE:fig:F=0.5` shows one example.
Note that the mathematical discontinuity at :math:`t=0` leads to a linear
variation on a mesh, but the approximation to a jump becomes better
as :math:`N_x` increases. In our simulation we specify :math:`\Delta t` and :math:`F`,
and :math:`N_x` is set to :math:`L/\sqrt{{\alpha}\Delta t/F}`. Since :math:`N_x\sim\sqrt{F}`,
the discontinuity looks sharper in the Crank-Nicolson
simulations with larger :math:`F`.

The Crank-Nicolson method produces smooth solutions for small :math:`F`,
:math:`F\leq\frac{1}{2}`, but small noise gets more and more evident as :math:`F`
increases. Figures :ref:`diffu:pde1:CN:fig:F=3` and :ref:`diffu:pde1:CN:fig:F=10`
demonstrate the effect for :math:`F=3` and :math:`F=10`, respectively.
The section :ref:`diffu:pde1:analysis` explains why such noise occur.

.. _diffu:pde1:CN:fig:F=3:

.. figure:: plug_CN_F3.png
   :width: 800

   Crank-Nicolson scheme for :math:`F=3`

.. _diffu:pde1:CN:fig:F=10:

.. figure:: plug_CN_F10.png
   :width: 800

   Crank-Nicolson scheme for :math:`F=10`

The Laplace and Poisson equation
--------------------------------

The Laplace equation, :math:`\nabla^2 u = 0`, and the Poisson equation,
:math:`-\nabla^2 u = f`, occur in numerous applications throughout science and
engineering. In 1D these equations read
:math:`u''(x)=0` and :math:`-u''(x)=f(x)`, respectively.
We can solve 1D variants of the Laplace equations with the listed
software, because we can interpret :math:`u_{xx}=0` as the limiting solution
of :math:`u_t = {\alpha} u_{xx}` when :math:`u` reaches a steady state limit where
:math:`u_t\rightarrow 0`.
Similarly, Poisson's equation :math:`-u_{xx}=f` arises from solving
:math:`u_t = u_{xx} + f` and letting :math:`t\rightarrow\infty` so :math:`u_t\rightarrow 0`.

Technically in a program, we can simulate :math:`t\rightarrow\infty`
by just taking one large time step:
:math:`\Delta t\rightarrow\infty`. In the limit, the Backward Euler
scheme gives

.. math::
         -\frac{u^{n+1}_{i+1} - 2u^{n+1}_i + u^{n+1}_{i-1}}{\Delta x^2} = f^{n+1}_i,

which is nothing but the discretization :math:`[-D_xD_x u = f]^{n+1}_i=0` of
:math:`-u_{xx}=f`.

The result above means that
the Backward Euler scheme can solve the limit equation directly and
hence produce a solution of the 1D Laplace equation.
With the Forward Euler scheme we must do the time stepping since :math:`\Delta t >
\Delta x^2/{\alpha}`
is illegal and leads to instability.
We may interpret this time stepping
as solving the equation system from :math:`-u_{xx}=f` by iterating on a
pseudo time variable.

.. _diffu:pde1:analysis:

Analysis of schemes for the diffusion equation
==============================================

The numerical experiments in the sections :ref:`diffu:pde1:FE:experiments` and :ref:`diffu:pde1:theta:experiments`
reveal that there are some
numerical problems with the Forward Euler and Crank-Nicolson schemes:
sawtooth-like noise is sometimes present in solutions that are,
from a mathematical point of view, expected to be smooth.
This section presents a mathematical analysis that explains the
observed behavior and arrives at criteria for obtaining numerical
solutions that reproduce the qualitative properties of the exact
solutions. In short, we shall explain what is observed in
Figures :ref:`diffu:pde1:FE:fig:F=0.5`-:ref:`diffu:pde1:CN:fig:F=10`.

.. :ref:`diffu:pde1:FE:fig:F=0.5`,

.. :ref:`diffu:pde1:FE:fig:F=0.25`,

.. :ref:`diffu:pde1:FE:fig:F=0.51`,

.. :ref:`diffu:pde1:FE:fig:gauss:F=0.5`,

.. :ref:`diffu:pde1:BE:fig:F=0.5`,

.. :ref:`diffu:pde1:CN:fig:F=3`,

.. and

.. :ref:`diffu:pde1:CN:fig:F=10`.

.. _diffu:pde1:analysis:uex:

Properties of the solution
--------------------------

A particular characteristic of diffusive processes, governed
by an equation like

.. _Eq:diffu:pde1:eq:

.. math::

    \tag{390}
    u_t = {\alpha} u_{xx},
        
        
is that the initial shape :math:`u(x,0)=I(x)` spreads out in space with
time, along with a decaying amplitude.  Three different examples will
illustrate the spreading of :math:`u` in space and the decay in time.

Similarity solution
~~~~~~~~~~~~~~~~~~~

The diffusion equation :ref:`(390) <Eq:diffu:pde1:eq>` admits solutions
that depend on :math:`\eta = (x-c)/\sqrt{4{\alpha} t}` for a given value
of :math:`c`. One particular solution
is

.. _Eq:diffu:pdf1:erf:sol:

.. math::

    \tag{391}
    u(x,t) = a\,\mbox{erf}(\eta) + b,
        
        
where

.. _Eq:diffu:analysis:erf:def:

.. math::

    \tag{392}
    \mbox{erf}(\eta) = \frac{2}{\sqrt{\pi}}\int_0^\eta e^{-\zeta^2}d\zeta,
        
        
is the *error function*, and :math:`a` and :math:`b` are arbitrary constants.
The error function lies in :math:`(-1,1)`, is odd around :math:`\eta =0`, and
goes relatively quickly to :math:`\pm 1`:

.. math::
        \begin{align*}
        \lim_{\eta\rightarrow -\infty}\mbox{erf}(\eta) &=-1,\\ 
        \lim_{\eta\rightarrow \infty}\mbox{erf}(\eta) &=1,\\ 
        \mbox{erf}(\eta) &= -\mbox{erf}(-\eta),\\ 
        \mbox{erf}(0) &=0,\\ 
        \mbox{erf}(2) &=0.99532227,\\ 
        \mbox{erf}(3) &=0.99997791
        {\thinspace .}
        \end{align*}

As :math:`t\rightarrow 0`, the error function approaches a step function centered
at :math:`x=c`. For a diffusion problem posed on the unit interval :math:`[0,1]`,
we may choose the step at :math:`x=1/2` (meaning :math:`c=1/2`), :math:`a=-1/2`, :math:`b=1/2`.
Then

.. _Eq:diffu:analysis:pde1:step:erf:sol:

.. math::

    \tag{393}
    u(x,t) = \frac{1}{2}\left(1 -
        \mbox{erf}\left(\frac{x-\frac{1}{2}}{\sqrt{4{\alpha} t}}\right)\right) =
        \frac{1}{2}\mbox{erfc}\left(\frac{x-\frac{1}{2}}{\sqrt{4{\alpha} t}}\right),
        
        
where we have introduced the *complementary error function*
:math:`\mbox{erfc}(\eta) = 1-\mbox{erf}(\eta)`.
The solution :ref:`(393) <Eq:diffu:analysis:pde1:step:erf:sol>`
implies the boundary conditions

.. _Eq:diffu:analysis:pde1:p1:erf:uL:

.. math::

    \tag{394}
    u(0,t) = \frac{1}{2}\left(1 - \mbox{erf}\left(\frac{-1/2}{\sqrt{4{\alpha} t}}\right)\right),
         
        
.. _Eq:diffu:analysis:pde1:p1:erf:uR:

.. math::

    \tag{395}
    u(1,t) = \frac{1}{2}\left(1 - \mbox{erf}\left(\frac{1/2}{\sqrt{4{\alpha} t}}\right)\right)
        
        {\thinspace .}
        

For small enough :math:`t`, :math:`u(0,t)\approx 1` and :math:`u(1,t)\approx 1`, but as
:math:`t\rightarrow\infty`, :math:`u(x,t)\rightarrow 1/2` on :math:`[0,1]`.

Solution for a Gaussian pulse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The standard diffusion equation :math:`u_t = {\alpha} u_{xx}` admits a
Gaussian function as solution:

.. _Eq:diffu:pde1:sol:Gaussian:

.. math::

    \tag{396}
    u(x,t) = \frac{1}{\sqrt{4\pi{\alpha} t}} \exp{\left({-\frac{(x-c)^2}{4{\alpha} t}}\right)}
        
        {\thinspace .}
        

At :math:`t=0` this is a Dirac delta function, so for computational
purposes one must start to view the solution at some time :math:`t=t_\epsilon>0`.
Replacing :math:`t` by :math:`t_\epsilon +t` in :ref:`(396) <Eq:diffu:pde1:sol:Gaussian>`
makes it easy to operate with a (new) :math:`t` that starts at :math:`t=0`
with an initial condition with a finite width.
The important feature of :ref:`(396) <Eq:diffu:pde1:sol:Gaussian>` is that
the standard deviation :math:`\sigma` of a sharp initial Gaussian pulse
increases in time according to :math:`\sigma = \sqrt{2{\alpha} t}`, making
the pulse diffuse and flatten out.

.. Mention combinations of such kernels to build up a general analytical sol?

.. Or maybe an exercise for verification.

Solution for a sine component
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Also, :ref:`(390) <Eq:diffu:pde1:eq>` admits a solution of the form

.. _Eq:diffu:pde1:sol1:

.. math::

    \tag{397}
    u(x,t) = Qe^{-at}\sin\left( kx\right)
        
        {\thinspace .}
        

The parameters :math:`Q` and :math:`k` can be freely chosen, while
inserting :ref:`(397) <Eq:diffu:pde1:sol1>` in :ref:`(390) <Eq:diffu:pde1:eq>` gives the constraint

.. math::
         a = -{\alpha} k^2
        {\thinspace .}
        

A very important feature is that the initial shape :math:`I(x)=Q\sin kx`
undergoes a damping :math:`\exp{(-{\alpha} k^2t)}`, meaning that
rapid oscillations in space, corresponding to large :math:`k`, are very much
faster dampened than slow oscillations in space, corresponding to small
:math:`k`. This feature leads to a smoothing of the initial condition with time.
(In fact, one can use a few steps of the diffusion equation as
a method for removing noise in signal processing.)
To judge how good a numerical method is, we may look at its ability to
smoothen or dampen the solution in the same way as the PDE does.

The following example illustrates the damping properties of
:ref:`(397) <Eq:diffu:pde1:sol1>`. We consider the specific problem

.. math::
        \begin{align*}
        u_t &= u_{xx},\quad x\in (0,1),\ t\in (0,T],\\ 
        u(0,t) &= u(1,t) = 0,\quad t\in (0,T],\\ 
        u(x,0) & = \sin (\pi x) + 0.1\sin(100\pi x)
        {\thinspace .}
        \end{align*}

The initial condition has been chosen such that adding
two solutions like :ref:`(397) <Eq:diffu:pde1:sol1>` constructs
an analytical solution to the problem:

.. _Eq:diffu:pde1:sol2:

.. math::

    \tag{398}
    u(x,t) = e^{-\pi^2 t}\sin (\pi x) + 0.1e^{-\pi^2 10^4 t}\sin (100\pi x)
        
        {\thinspace .}
        

Figure :ref:`diffu:pde1:fig:damping` illustrates the rapid damping of
rapid oscillations :math:`\sin (100\pi x)` and the very much slower damping of the
slowly varying :math:`\sin (\pi x)` term. After about :math:`t=0.5\cdot10^{-4}` the rapid
oscillations do not have a visible amplitude, while we have to wait
until :math:`t\sim 0.5` before the amplitude of the long wave :math:`\sin (\pi x)`
becomes very small.

.. _diffu:pde1:fig:damping:

.. figure:: diffusion_damping.png
   :width: 800

   *Evolution of the solution of a diffusion problem: initial condition (upper left), 1/100 reduction of the small waves (upper right), 1/10 reduction of the long wave (lower left), and 1/100 reduction of the long wave (lower right)*

.. x/sqrt(t) solution, kernel with integral

Analysis of discrete equations
------------------------------

A counterpart to :ref:`(397) <Eq:diffu:pde1:sol1>` is the complex representation
of the same function:

.. math::
         u(x,t) = Qe^{-at}e^{ikx},

where :math:`i=\sqrt{-1}` is the imaginary unit.
We can add such functions, often referred to as wave components,
to make a Fourier representation
of a general solution of the diffusion equation:

.. _Eq:diffu:pde1:u:Fourier:

.. math::

    \tag{399}
    u(x,t) \approx \sum_{k\in K} b_k e^{-{\alpha} k^2t}e^{ikx},
        
        
where :math:`K` is a set of an infinite number of :math:`k` values needed to construct
the solution. In practice, however, the series is truncated and
:math:`K` is a finite set of :math:`k` values
needed to build a good approximate solution.
Note that :ref:`(398) <Eq:diffu:pde1:sol2>` is a special case of
:ref:`(399) <Eq:diffu:pde1:u:Fourier>` where :math:`K=\{\pi, 100\pi\}`, :math:`b_{\pi}=1`,
and :math:`b_{100\pi}=0.1`.

The amplitudes :math:`b_k` of the individual Fourier waves must be determined
from the initial condition. At :math:`t=0` we have :math:`u\approx\sum_kb_k\exp{(ikx)}`
and find :math:`K` and :math:`b_k` such that

.. _Eq:_auto171:

.. math::

    \tag{400}
    I(x) \approx \sum_{k\in K} b_k e^{ikx}{\thinspace .}
        
        
(The relevant formulas for :math:`b_k` come from Fourier analysis, or
equivalently, a least-squares method for approximating :math:`I(x)`
in a function space with basis :math:`\exp{(ikx)}`.)

Much insight about the behavior of numerical methods can be obtained
by investigating how a wave component :math:`\exp{(-{\alpha} k^2
t)}\exp{(ikx)}` is treated by the numerical scheme. It appears that
such wave components are also solutions of the schemes, but the
damping factor :math:`\exp{(-{\alpha} k^2 t)}` varies among the schemes.  To
ease the forthcoming algebra, we write the damping factor as
:math:`A^n`. The exact amplification factor corresponding to :math:`A` is :math:`{A_{\small\mbox{e}}} =
\exp{(-{\alpha} k^2\Delta t)}`.

.. _diffu:pde1:analysis:details:

Analysis of the finite difference schemes
-----------------------------------------

We have seen that a general solution of the diffusion equation
can be built as a linear combination of basic components

.. math::
         e^{-{\alpha} k^2t}e^{ikx} {\thinspace .}  

A fundamental question is whether such components are also solutions of
the finite difference schemes. This is indeed the case, but the
amplitude :math:`\exp{(-{\alpha} k^2t)}` might be modified (which also happens when
solving the ODE counterpart :math:`u'=-{\alpha} u`).
We therefore look for numerical solutions of the form

.. _Eq:diffu:pde1:analysis:uni:

.. math::

    \tag{401}
    u^n_q = A^n e^{ikq\Delta x} = A^ne^{ikx},
        
        
where the amplification factor :math:`A`
must be determined by inserting the component into an actual scheme.
Note that :math:`A^n` means :math:`A` raised to the power of :math:`n`, :math:`n` being the
index in the time mesh, while the superscript :math:`n` in :math:`u^n_q` just
denotes :math:`u` at time :math:`t_n`.

Stability          (3)
~~~~~~~~~~~~~~~~~~~~~~

The exact amplification factor is :math:`{A_{\small\mbox{e}}}=\exp{(-{\alpha}^2 k^2\Delta t)}`.
We should therefore require :math:`|A| < 1` to have a decaying numerical
solution as well. If
:math:`-1\leq A<0`, :math:`A^n` will change sign from time level to
time level, and we get stable, non-physical oscillations in the numerical
solutions that are not present in the exact solution.

.. index:: amplification factor

Accuracy          (1)
~~~~~~~~~~~~~~~~~~~~~

To determine how accurately a finite difference scheme treats one
wave component :ref:`(401) <Eq:diffu:pde1:analysis:uni>`, we see that the basic
deviation from the exact solution is reflected in how well
:math:`A^n` approximates :math:`{A_{\small\mbox{e}}}^n`,
or how well :math:`A` approximates :math:`{A_{\small\mbox{e}}}`.
We can plot :math:`{A_{\small\mbox{e}}}` and the various expressions for :math:`A`, and we can
make Taylor expansions of :math:`A/{A_{\small\mbox{e}}}` to see the error more analytically.

.. We shall in particular investigate the error :math:`{A_{\small\mbox{e}}} - A` in the

.. amplification factor.

Truncation error          (1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As an alternative to examining the accuracy of the damping of a wave
component, we can perform a general truncation error analysis as
explained in  :ref:`ch:trunc`. Such results are more general, but
less detailed than what we get from the wave component analysis.  The
truncation error can almost always be computed and represents the
error in the numerical model when the exact solution is substituted
into the equations. In particular, the truncation error analysis tells
the order of the scheme, which is of fundamental importance when
verifying codes based on empirical estimation of convergence rates.

.. _diffu:pde1:analysis:FE:

Analysis of the Forward Euler scheme          (2)
-------------------------------------------------

.. 2DO: refer to vib and wave

The Forward Euler finite difference scheme for :math:`u_t = {\alpha} u_{xx}` can
be written as

.. math::
         [D_t^+ u = {\alpha} D_xD_x u]^n_q{\thinspace .}  

Inserting a wave component :ref:`(401) <Eq:diffu:pde1:analysis:uni>`
in the scheme demands calculating the terms

.. math::
         e^{ikq\Delta x}[D_t^+ A]^n = e^{ikq\Delta x}A^n\frac{A-1}{\Delta t},

and

.. math::
         A^nD_xD_x [e^{ikx}]_q = A^n\left( - e^{ikq\Delta x}\frac{4}{\Delta x^2}
        \sin^2\left(\frac{k\Delta x}{2}\right)\right)
        {\thinspace .}  

Inserting these terms in the discrete equation and
dividing by :math:`A^n e^{ikq\Delta x}` leads to

.. math::
        
        \frac{A-1}{\Delta t} = -{\alpha} \frac{4}{\Delta x^2}\sin^2\left(
        \frac{k\Delta x}{2}\right),
        

and consequently

.. _Eq:_auto172:

.. math::

    \tag{402}
    A = 1 -4F\sin^2 p
        
        
where

.. _Eq:_auto173:

.. math::

    \tag{403}
    F = \frac{{\alpha}\Delta t}{\Delta x^2}
        
        
is the *numerical Fourier number*, and :math:`p=k\Delta x/2`.
The complete numerical solution is then

.. _Eq:_auto174:

.. math::

    \tag{404}
    u^n_q = \left(1 -4F\sin^2 p\right)^ne^{ikq\Delta x}
        {\thinspace .}
        
        
Stability          (4)
~~~~~~~~~~~~~~~~~~~~~~

We easily see that :math:`A\leq 1`. However, the :math:`A` can be less than :math:`-1`,
which will lead
to growth of a numerical wave component. The criterion :math:`A\geq -1` implies

.. math::
         4F\sin^2 (p/2)\leq 2
        {\thinspace .}
        

The worst case is when :math:`\sin^2 (p/2)=1`, so a sufficient criterion for
stability is

.. _Eq:_auto175:

.. math::

    \tag{405}
    F\leq {\frac{1}{2}},
        
        
or expressed as a condition on :math:`\Delta t`:

.. _Eq:_auto176:

.. math::

    \tag{406}
    \Delta t\leq \frac{\Delta x^2}{2{\alpha}}{\thinspace .}
        
        
Note that halving the spatial mesh size, :math:`\Delta x \rightarrow {\frac{1}{2}}
\Delta x`, requires :math:`\Delta t` to be reduced by a factor of :math:`1/4`.
The method hence becomes very expensive for fine spatial meshes.

.. 2DO: verification based on exact solutions

Accuracy          (2)
~~~~~~~~~~~~~~~~~~~~~

Since :math:`A` is expressed in terms of :math:`F` and the parameter we now call
:math:`p=k\Delta x/2`, we should also express :math:`{A_{\small\mbox{e}}}` by :math:`F` and :math:`p`. The exponent
in :math:`{A_{\small\mbox{e}}}` is :math:`-{\alpha} k^2\Delta t`, which equals :math:`-F k^2\Delta x^2=-F4p^2`.
Consequently,

.. math::
         {A_{\small\mbox{e}}} = \exp{(-{\alpha} k^2\Delta t)} = \exp{(-4Fp^2)}
        {\thinspace .} 

All our :math:`A` expressions as well as :math:`{A_{\small\mbox{e}}}` are now functions of the two
dimensionless parameters :math:`F` and :math:`p`.

Computing
the Taylor series expansion of :math:`A/{A_{\small\mbox{e}}}` in terms of :math:`F`
can easily be done with aid of ``sympy``:

.. code-block:: python

    def A_exact(F, p):
        return exp(-4*F*p**2)
    
    def A_FE(F, p):
        return 1 - 4*F*sin(p)**2
    
    from sympy import *
    F, p = symbols('F p')
    A_err_FE = A_FE(F, p)/A_exact(F, p)
    print A_err_FE.series(F, 0, 6)

The result is

.. math::
         \frac{A}{{A_{\small\mbox{e}}}} = 1 - 4 F \sin^{2}p + 2F p^{2} - 16F^{2} p^{2} \sin^{2}p + 8 F^{2} p^{4} + \cdots
        

Recalling that :math:`F={\alpha}\Delta t/\Delta x^2`, :math:`p=k\Delta x/2`, and that
:math:`\sin^2p\leq 1`, we
realize that the dominating terms in :math:`A/{A_{\small\mbox{e}}}` are at most

.. math::
         1 - 4{\alpha} \frac{\Delta t}{\Delta x^2} +
        {\alpha}\Delta t - 4{\alpha}^2\Delta t^2
        + {\alpha}^2 \Delta t^2\Delta x^2 + \cdots
        {\thinspace .}
        

Truncation error          (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We follow the theory explained in
 :ref:`ch:trunc`. The recipe is to set up the
scheme in operator notation and use formulas from
 :ref:`trunc:table` to derive an expression for
the residual. The details are documented in
 :ref:`trunc:diffu:1D`. We end up with a truncation error

.. math::
         R^n_i = {\mathcal{O}(\Delta t)} + {\mathcal{O}(\Delta x^2)}{\thinspace .}

Although this is not the true error :math:`{u_{\small\mbox{e}}}(x_i,t_n) - u^n_i`, it indicates
that the true error is of the form

.. math::
         E = C_t\Delta t + C_x\Delta x^2

for two unknown constants :math:`C_t` and :math:`C_x`.

.. _diffu:pde1:analysis:BE:

Analysis of the Backward Euler scheme
-------------------------------------

Discretizing :math:`u_t = {\alpha} u_{xx}` by a Backward Euler scheme,

.. math::
         [D_t^- u = {\alpha} D_xD_x u]^n_q,

and inserting a wave component :ref:`(401) <Eq:diffu:pde1:analysis:uni>`,
leads to calculations similar to those arising from the Forward Euler scheme,
but since

.. math::
         e^{ikq\Delta x}[D_t^- A]^n = A^ne^{ikq\Delta x}\frac{1 - A^{-1}}{\Delta t},

we get

.. math::
        
        \frac{1-A^{-1}}{\Delta t} = -{\alpha} \frac{4}{\Delta x^2}\sin^2\left(
        \frac{k\Delta x}{2}\right),
        

and then

.. _Eq:diffu:pde1:analysis:BE:A:

.. math::

    \tag{407}
    A = \left(1  + 4F\sin^2p\right)^{-1}
        
        {\thinspace .}
        

The complete numerical solution can be written

.. _Eq:_auto177:

.. math::

    \tag{408}
    u^n_q = \left(1  + 4F\sin^2 p\right)^{-n}
        e^{ikq\Delta x} {\thinspace .}
        
        
Stability          (5)
~~~~~~~~~~~~~~~~~~~~~~

We see from :ref:`(407) <Eq:diffu:pde1:analysis:BE:A>` that :math:`0<A<1`, which means
that all numerical wave components are stable and non-oscillatory
for any :math:`\Delta t >0`.

Truncation error          (3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The derivation of the truncation error for the Backward Euler scheme is almost
identical to that for the Forward Euler scheme. We end up with

.. math::
         R^n_i = {\mathcal{O}(\Delta t)} + {\mathcal{O}(\Delta x^2)}{\thinspace .}

.. _diffu:pde1:analysis:CN:

Analysis of the Crank-Nicolson scheme
-------------------------------------

The Crank-Nicolson scheme can be written as

.. math::
         [D_t u = {\alpha} D_xD_x \overline{u}^x]^{n+\frac{1}{2}}_q, 

or

.. math::
         [D_t u]^{n+\frac{1}{2}}_q = \frac{1}{2}{\alpha}\left( [D_xD_x u]^{n}_q +
        [D_xD_x u]^{n+1}_q\right)
        {\thinspace .}
        

Inserting :ref:`(401) <Eq:diffu:pde1:analysis:uni>` in the time derivative approximation
leads to

.. math::
         [D_t A^n e^{ikq\Delta x}]^{n+\frac{1}{2}} = A^{n+\frac{1}{2}} e^{ikq\Delta x}\frac{A^{\frac{1}{2}}-A^{-\frac{1}{2}}}{\Delta t} = A^ne^{ikq\Delta x}\frac{A-1}{\Delta t}
        {\thinspace .} 

Inserting :ref:`(401) <Eq:diffu:pde1:analysis:uni>` in the other terms
and dividing by
:math:`A^ne^{ikq\Delta x}` gives the relation

.. math::
        
        \frac{A-1}{\Delta t} = -\frac{1}{2}{\alpha}\frac{4}{\Delta x^2}
        \sin^2\left(\frac{k\Delta x}{2}\right)
        (1 + A),
        

and after some more algebra,

.. _Eq:_auto178:

.. math::

    \tag{409}
    A = \frac{ 1 - 2F\sin^2p}{1 + 2F\sin^2p}
        {\thinspace .}
        
        
The exact numerical solution is hence

.. _Eq:_auto179:

.. math::

    \tag{410}
    u^n_q = \left(\frac{ 1 - 2F\sin^2p}{1 + 2F\sin^2p}\right)^ne^{ikp\Delta x}
        {\thinspace .}
        
        
Stability          (6)
~~~~~~~~~~~~~~~~~~~~~~

The criteria :math:`A>-1` and :math:`A<1` are fulfilled for any :math:`\Delta t >0`.
Therefore, the solution cannot grow, but it will oscillate if
:math:`1-2F\sin^p < 0`. To avoid such non-physical oscillations, we must demand
:math:`F\leq\frac{1}{2}`.

Truncation error          (4)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The truncation error is derived in
 :ref:`trunc:diffu:1D`:

.. math::
         R^{n+\frac{1}{2}}_i = {\mathcal{O}(\Delta x^2)} + {\mathcal{O}(\Delta t^2)}{\thinspace .}

.. _diffu:pde1:analysis:leapfrog:

Analysis of the Leapfrog scheme
-------------------------------

An attractive feature of the Forward Euler scheme is the explicit
time stepping and no need for solving linear systems. However, the
accuracy in time is only :math:`{\mathcal{O}(\Delta t)}`. We can get an explicit
*second-order* scheme in time by using the Leapfrog method:

.. math::
         [D_{2t} u = {\alpha} D_xDx u + f]^n_i{\thinspace .}

Written out,

.. math::
         u^{n+1} = u^{n-1} + \frac{2{\alpha}\Delta t}{\Delta x^2}
        (u^{n}_{i+1} - 2u^n_i + u^n_{i-1}) + f(x_i,t_n){\thinspace .}

We need some formula for the first step, :math:`u^1_i`, but for that we can use
a Forward Euler step.

Unfortunately, the Leapfrog scheme is always unstable for the
diffusion equation. To see this, we insert a wave component :math:`A^ne^{ikx}`
and get

.. math::
         \frac{A - A^{-1}}{\Delta t} = -{\alpha} \frac{4}{\Delta x^2}\sin^2 p,

or

.. math::
         A^2 + 4F \sin^2 p\, A - 1 = 0,

which has roots

.. math::
         A = -2F\sin^2 p \pm \sqrt{4F^2\sin^4 p + 1}{\thinspace .}

Both roots have :math:`|A|>1` so the always amplitude grows, which is not in
accordance with physics of the problem.
However, for a PDE with a first-order derivative in space, instead of
a second-order one, the Leapfrog scheme performs very well.
Details are provided in the section :ref:`advec:1D:leapfrog`.

Summary of accuracy of amplification factors
--------------------------------------------

We can plot the various amplification factors against :math:`p=k\Delta x/2`
for different choices of the :math:`F` parameter. Figures
:ref:`diffu:pde1:fig:A:err:C20`, :ref:`diffu:pde1:fig:A:err:C0.5`, and
:ref:`diffu:pde1:fig:A:err:C0.1` show how long and small waves are
damped by the various schemes compared to the exact damping. As long
as all schemes are stable, the amplification factor is positive,
except for Crank-Nicolson when :math:`F>0.5`.

.. _diffu:pde1:fig:A:err:C20:

.. figure:: diffusion_A_F20_F2.png
   :width: 800

   *Amplification factors for large time steps*

.. _diffu:pde1:fig:A:err:C0.5:

.. figure:: diffusion_A_F05_F025.png
   :width: 800

   *Amplification factors for time steps around the Forward Euler stability limit*

.. _diffu:pde1:fig:A:err:C0.1:

.. figure:: diffusion_A_F01_F001.png
   :width: 800

   *Amplification factors for small time steps*

The effect of negative amplification factors is that :math:`A^n` changes
sign from one time level to the next, thereby giving rise to
oscillations in time in an animation of the solution. We see from
Figure :ref:`diffu:pde1:fig:A:err:C20` that for :math:`F=20`, waves with
:math:`p\geq \pi/4` undergo a damping close to :math:`-1`, which means that the
amplitude does not decay and that the wave component jumps up and down
(flips amplitude) in time. For :math:`F=2` we have a damping of a factor of
0.5 from one time level to the next, which is very much smaller than
the exact damping. Short waves will therefore fail to be effectively
dampened.  These waves will manifest themselves as high frequency
oscillatory noise in the solution.

A value :math:`p=\pi/4` corresponds to four mesh points per wave length of
:math:`e^{ikx}`, while :math:`p=\pi/2` implies only two points per wave length,
which is the smallest number of points we can have to represent the
wave on the mesh.

To demonstrate the oscillatory behavior of the Crank-Nicolson scheme,
we choose an initial condition that leads to short waves with
significant amplitude. A discontinuous :math:`I(x)` will in particular serve
this purpose: Figures :ref:`diffu:pde1:CN:fig:F=3` and
:ref:`diffu:pde1:CN:fig:F=10` correspond to :math:`F=3` and :math:`F=10`,
respectively, and we see how short waves pollute the overall solution.

.. _diffu:2D:analysis:

Analysis of the 2D diffusion equation
-------------------------------------

We first consider the 2D diffusion equation

.. math::
         u_{t} = {\alpha}(u_{xx} + u_{yy}),

which has Fourier component solutions of the form

.. math::
         u(x,y,t) = Ae^{-{\alpha} k^2t}e^{i(k_x x + k_yy)},

and the schemes have discrete versions of this Fourier component:

.. math::
         u^{n}_{q,r} = A\xi^{n}e^{i(k_x q\Delta x + k_y r\Delta y)}{\thinspace .}

The Forward Euler scheme          (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the Forward Euler discretization,

.. math::
         [D_t^+u = {\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^n,

we get

.. math::
        
        \frac{\xi - 1}{\Delta t}
        =
        -{\alpha}\frac{4}{\Delta x^2}\sin^2\left(\frac{k_x\Delta x}{2}\right) -
        {\alpha}\frac{4}{\Delta y^2}\sin^2\left(\frac{k_y\Delta y}{2}\right){\thinspace .}
        

Introducing

.. math::
         p_x = \frac{k_x\Delta x}{2},\quad p_y = \frac{k_y\Delta y}{2},

we can write the equation for :math:`\xi` more compactly as

.. math::
        
        \frac{\xi - 1}{\Delta t}
        =
        -{\alpha}\frac{4}{\Delta x^2}\sin^2 p_x -
        {\alpha}\frac{4}{\Delta y^2}\sin^2 p_y,
        

and solve for :math:`\xi`:

.. _Eq:diffu:2D:analysis:xi:

.. math::

    \tag{411}
    \xi = 1 - 4F_x\sin^2 p_x - 4F_y\sin^2 p_y{\thinspace .}
        
        
The complete numerical solution for a wave component is

.. _Eq:diffu:2D:analysis:FE:numexact:

.. math::

    \tag{412}
    u^{n}_{q,r} = A(1 - 4F_x\sin^2 p_x - 4F_y\sin^2 p_y)^n
        e^{i(k_xp\Delta x + k_yq\Delta y)}{\thinspace .}
        
        
For stability we demand :math:`-1\leq\xi\leq 1`, and :math:`-1\leq\xi` is the
critical limit, since clearly :math:`\xi \leq 1`, and the worst case
happens when the sines are at their maximum. The stability criterion
becomes

.. _Eq:diffu:2D:analysis:FE:stab:

.. math::

    \tag{413}
    F_x + F_y \leq \frac{1}{2}{\thinspace .}
        
        
For the special, yet common, case :math:`\Delta x=\Delta y=h`, the
stability criterion can be written as

.. math::
         \Delta t \leq \frac{h^2}{2d{\alpha}},

where :math:`d` is the number of space dimensions: :math:`d=1,2,3`.

The Backward Euler scheme          (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Backward Euler method,

.. math::
         [D_t^-u = {\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^n,

results in

.. math::
         1 - \xi^{-1} = - 4F_x \sin^2 p_x - 4F_y \sin^2 p_y, 

and

.. math::
         \xi = (1 + 4F_x \sin^2 p_x + 4F_y \sin^2 p_y)^{-1},

which is always in :math:`(0,1]`. The solution for a wave component becomes

.. _Eq:diffu:2D:analysis:BN:numexact:

.. math::

    \tag{414}
    u^{n}_{q,r} = A(1 + 4F_x\sin^2 p_x + 4F_y\sin^2 p_y)^{-n}
        e^{i(k_xq\Delta x + k_yr\Delta y)}{\thinspace .}
        
        
The Crank-Nicolson scheme          (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With a Crank-Nicolson discretization,

.. math::
         [D_tu]^{n+\frac{1}{2}}_{i,j} =
        \frac{1}{2} [{\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^{n+1} +
        \frac{1}{2} [{\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^n,

we have, after some algebra,

.. math::
         \xi = \frac{1 - 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{1 + 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{\thinspace .}

The fraction on the right-hand side is always less than 1, so stability
in the sense of non-growing wave components is guaranteed for all
physical and numerical parameters. However,
the fraction can become negative and result in non-physical
oscillations. This phenomenon happens when

.. math::
         F_x\sin^2 p_x + F_x\sin^2p_y > \frac{1}{2}{\thinspace .}

A criterion against non-physical oscillations is therefore

.. math::
         F_x + F_y \leq \frac{1}{2}, 

which is the same limit as the stability criterion for the Forward Euler
scheme.

The exact discrete solution is

.. _Eq:diffu:2D:analysis:CN:numexact:

.. math::

    \tag{415}
    u^{n}_{q,r} = A
        \left(
        \frac{1 - 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{1 + 2(F_x\sin^2 p_x + F_x\sin^2p_y)}
        \right)^n
        e^{i(k_xq\Delta x + k_yr\Delta y)}{\thinspace .}
        
        
Explanation of numerical artifacts
----------------------------------

The behavior of the solution generated by Forward Euler discretization in time (and centered
differences in space) is summarized at the end of
the section :ref:`diffu:pde1:FE:experiments`. Can we from the analysis
above explain the behavior?

We may start by looking at Figure :ref:`diffu:pde1:FE:fig:F=0.51`
where :math:`F=0.51`. The figure shows that the solution is unstable and
grows in time. The stability limit for such growth is :math:`F=0.5` and
since the :math:`F` in this simulation is slightly larger, growth is
unavoidable.

Figure :ref:`diffu:pde1:FE:fig:F=0.5` has unexpected features:
we would expect the solution of the diffusion equation to be
smooth, but the graphs in Figure :ref:`diffu:pde1:FE:fig:F=0.5`
contain non-smooth noise. Turning to Figure
:ref:`diffu:pde1:FE:fig:gauss:F=0.5`,  which has a quite similar
initial condition, we see that the curves are indeed smooth.
The problem with the results in Figure :ref:`diffu:pde1:FE:fig:F=0.5`
is that the initial condition is discontinuous. To represent it, we
need a significant amplitude on the shortest waves in the mesh.
However, for :math:`F=0.5`, the shortest wave (:math:`p=\pi/2`) gives
the amplitude in the numerical solution as :math:`(1-4F)^n`, which oscillates
between negative and positive values at subsequent time levels
for :math:`F>\frac{1}{4}`. Since the shortest waves have visible amplitudes in
the solution profile, the oscillations becomes visible. The
smooth initial condition in Figure :ref:`diffu:pde1:FE:fig:gauss:F=0.5`,
on the other hand, leads to very small amplitudes of the shortest waves.
That these waves then oscillate in a non-physical way for
:math:`F=0.5` is not a visible effect. The oscillations
in time in the amplitude :math:`(1-4F)^n` disappear for :math:`F\leq\frac{1}{4}`,
and that is why also the discontinuous initial condition always leads to
smooth solutions in Figure :ref:`diffu:pde1:FE:fig:F=0.25`, where
:math:`F=\frac{1}{4}`.

Turning the attention to the Backward Euler scheme and the experiments
in Figure :ref:`diffu:pde1:BE:fig:F=0.5`, we see that even the discontinuous
initial condition gives smooth solutions for :math:`F=0.5` (and in fact all other
:math:`F` values). From the exact expression of the numerical amplitude,
:math:`(1  + 4F\sin^2p)^{-1}`, we realize that this factor can never flip between
positive and negative values, and no instabilities can occur. The conclusion
is that the Backward Euler scheme always produces smooth solutions.
Also, the Backward Euler scheme guarantees that the solution cannot grow
in time (unless we add a source term to the PDE, but that is meant to
represent a physically relevant growth).

Finally, we have some small, strange artifacts when simulating the
development of the initial plug profile with the Crank-Nicolson scheme,
see Figure :ref:`diffu:pde1:CN:fig:F=10`, where :math:`F=3`.
The Crank-Nicolson scheme cannot give growing amplitudes, but it may
give oscillating amplitudes in time. The critical factor is
:math:`1 - 2F\sin^2p`, which for the shortest waves (:math:`p=\pi/2`) indicates
a stability limit :math:`F=0.5`. With the discontinuous initial condition, we have
enough amplitude on the shortest waves so their wrong behavior is visible,
and this is what we see as small instabilities in
Figure :ref:`diffu:pde1:CN:fig:F=10`. The only remedy is to lower the :math:`F` value.

Exercises          (6)
======================

.. --- begin exercise ---

.. _diffu:exer:1D:gaussian:symmetric:

Exercise 3.1: Explore symmetry in a 1D problem
----------------------------------------------

This exercise simulates the exact solution :ref:`(396) <Eq:diffu:pde1:sol:Gaussian>`.
Suppose for simplicity that :math:`c=0`.

**a)**
Formulate an initial-boundary value problem that has
:ref:`(396) <Eq:diffu:pde1:sol:Gaussian>` as solution in the domain :math:`[-L,L]`.
Use the exact solution :ref:`(396) <Eq:diffu:pde1:sol:Gaussian>` as Dirichlet
condition at the boundaries.
Simulate the diffusion of the Gaussian peak. Observe that the
solution is symmetric around :math:`x=0`.

**b)**
Show from :ref:`(396) <Eq:diffu:pde1:sol:Gaussian>` that :math:`u_x(c,t)=0`.
Since the solution is symmetric around :math:`x=c=0`, we can solve the
numerical problem in half of the domain, using a *symmetry boundary condition*
:math:`u_x=0` at :math:`x=0`. Set up the
initial-boundary value problem in this case. Simulate the
diffusion problem in :math:`[0,L]` and compare with the solution in a).

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

Filename: ``diffu_symmetric_gaussian``.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:1D:ux:onesided:

Exercise 3.2: Investigate approximation errors from a :math:`u_x=0` boundary condition
--------------------------------------------------------------------------------------

We consider the problem solved in :ref:`diffu:exer:1D:gaussian:symmetric`
part b). The boundary condition :math:`u_x(0,t)=0` can be implemented in
two ways: 1) by a standard symmetric finite difference :math:`[D_{2x}u]_i^n=0`,
or 2) by a one-sided difference :math:`[D^+u=0]^n_i=0`.
Investigate the effect of these two conditions on the
convergence rate in space.

.. --- begin hint in exercise ---

**Hint.**
If you use a Forward Euler scheme, choose a discretization parameter
:math:`h=\Delta t = \Delta x^2` and assume the error goes like :math:`E\sim h^r`.
The error in the scheme is :math:`{\mathcal{O}(\Delta t,\Delta x^2)}` so one should
expect that the estimated :math:`r` approaches 1. The question is if
a one-sided difference approximation to :math:`u_x(0,t)=0` destroys this
convergence rate.

.. --- end hint in exercise ---

Filename: ``diffu_onesided_fd``.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:1D:openBC:

Exercise 3.3: Experiment with open boundary conditions in 1D
------------------------------------------------------------

We address diffusion of a Gaussian function
as in :ref:`diffu:exer:1D:gaussian:symmetric`,
in the domain :math:`[0,L]`,
but now we shall explore different types of boundary
conditions on :math:`x=L`. In real-life problems we do not know
the exact solution on :math:`x=L` and must use something simpler.

**a)**
Imagine that we want to solve the problem numerically on
:math:`[0,L]`, with a symmetry boundary condition :math:`u_x=0` at :math:`x=0`,
but we do not know the exact solution and cannot of that
reason assign a correct Dirichlet condition at :math:`x=L`.
One idea is to simply set :math:`u(L,t)=0` since this will be an
accurate approximation before the diffused pulse reaches :math:`x=L`
and even thereafter it might be a satisfactory condition if the exact :math:`u` has
a small value.
Let :math:`{u_{\small\mbox{e}}}` be the exact solution and let :math:`u` be the solution
of :math:`u_t={\alpha} u_{xx}` with an initial Gaussian pulse and
the boundary conditions :math:`u_x(0,t)=u(L,t)=0`. Derive a diffusion
problem for the error :math:`e={u_{\small\mbox{e}}} - u`. Solve this problem
numerically using an exact Dirichlet condition at :math:`x=L`.
Animate the evolution of the error and make a curve plot of
the error measure

.. math::
         E(t)=\sqrt{\frac{\int_0^L e^2dx}{\int_0^L udx}}{\thinspace .} 

Is this a suitable error measure for the present problem?

**b)**
Instead of using :math:`u(L,t)=0` as approximate boundary condition for
letting the diffused Gaussian pulse move out of our finite domain,
one may try :math:`u_x(L,t)=0` since the solution for large :math:`t` is
quite flat. Argue that this condition gives a completely wrong
asymptotic solution as :math:`t\rightarrow 0`. To do this,
integrate the diffusion equation from :math:`0` to :math:`L`, integrate
:math:`u_{xx}` by parts (or use Gauss' divergence theorem in 1D) to
arrive at the important property

.. math::
         \frac{d}{dt}\int_{0}^L u(x,t)dx = 0,

implying that :math:`\int_0^Ludx` must be constant in time, and therefore

.. math::
         \int_{0}^L u(x,t)dx = \int_{0}^LI(x)dx{\thinspace .} 

The integral of the initial pulse is 1.

**c)**
Another idea for an artificial boundary condition at :math:`x=L`
is to use a cooling law

.. _Eq:diffu:pde1:Gaussian:xL:cooling:

.. math::

    \tag{416}
    -{\alpha} u_x = q(u - u_S),
        
        
where :math:`q` is an unknown heat transfer coefficient and :math:`u_S` is
the surrounding temperature in the medium outside of :math:`[0,L]`.
(Note that arguing that :math:`u_S` is approximately :math:`u(L,t)` gives
the :math:`u_x=0` condition from the previous subexercise that is
qualitatively wrong for large :math:`t`.)
Develop a diffusion problem for the error in the solution using
:ref:`(416) <Eq:diffu:pde1:Gaussian:xL:cooling>` as boundary condition.
Assume one can take :math:`u_S=0` "outside the domain" since
:math:`{u_{\small\mbox{e}}}\rightarrow 0` as :math:`x\rightarrow\infty`.
Find a function :math:`q=q(t)` such that the exact solution
obeys the condition :ref:`(416) <Eq:diffu:pde1:Gaussian:xL:cooling>`.
Test some constant values of :math:`q` and animate how the corresponding
error function behaves. Also compute :math:`E(t)` curves as defined above.

Filename: ``diffu_open_BC``.

.. --- end exercise ---

.. --- begin exercise ---

Exercise 3.4: Simulate a diffused Gaussian peak in 2D/3D
--------------------------------------------------------

**a)**
Generalize :ref:`(396) <Eq:diffu:pde1:sol:Gaussian>` to multi dimensions by
assuming that one-dimensional solutions can be multiplied to solve
:math:`u_t = {\alpha}\nabla^2 u`. Set :math:`c=0` such that the peak of
the Gaussian is at the origin.

**b)**
One can from the exact solution show
that :math:`u_x=0` on :math:`x=0`, :math:`u_y=0` on :math:`y=0`, and :math:`u_z=0` on :math:`z=0`.
The approximately correct condition :math:`u=0` can be set
on the remaining boundaries (say :math:`x=L`, :math:`y=L`, :math:`z=L`), cf. :ref:`diffu:exer:1D:openBC`.
Simulate a 2D case and make an animation of the diffused Gaussian peak.

**c)**
The formulation in b) makes use of symmetry of the solution such that we
can solve the problem in the first quadrant (2D) or octant (3D) only.
To check that the symmetry assumption is correct, formulate the problem
without symmetry in a domain :math:`[-L,L]\times [L,L]` in 2D. Use :math:`u=0` as
approximately correct boundary condition. Simulate the same case as
in b), but in a four times as large domain. Make an animation and compare
it with the one in b).

Filename: ``diffu_symmetric_gaussian_2D``.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:uterm:

Exercise 3.5: Examine stability of a diffusion model with a source term
-----------------------------------------------------------------------

Consider a diffusion equation with a linear :math:`u` term:

.. math::
         u_t = {\alpha} u_{xx} + \beta u{\thinspace .}

**a)**
Derive in detail the Forward Euler, Backward Euler,
and Crank-Nicolson schemes for this type of diffusion model.
Thereafter, formulate a :math:`\theta`-rule to summarize the three schemes.

**b)**
Assume a solution like :ref:`(397) <Eq:diffu:pde1:sol1>` and find the relation
between :math:`a`, :math:`k`, :math:`{\alpha}`, and :math:`\beta`.

.. --- begin hint in exercise ---

**Hint.**
Insert :ref:`(397) <Eq:diffu:pde1:sol1>` in the PDE problem.

.. --- end hint in exercise ---

**c)**
Calculate the stability of the Forward Euler scheme. Design
numerical experiments to confirm the results.

.. --- begin hint in exercise ---

**Hint.**
Insert the discrete counterpart to :ref:`(397) <Eq:diffu:pde1:sol1>` in the
numerical scheme. Run experiments at the stability limit and slightly above.

.. --- end hint in exercise ---

**d)**
Repeat c) for the Backward Euler scheme.

**e)**
Repeat c) for the Crank-Nicolson scheme.

**f)**
How does the extra term :math:`bu` impact the accuracy of the three schemes?

.. --- begin hint in exercise ---

**Hint.**
For analysis of the accuracy,
compare the numerical and exact amplification factors, in
graphs and/or by Taylor series expansion.

.. --- end hint in exercise ---

Filename: ``diffu_stability_uterm``.

.. --- end exercise ---

.. _diffu:varcoeff:

Diffusion in heterogeneous media
================================

.. index::
   single: diffusion coefficient; non-constant

Diffusion in heterogeneous media normally implies a non-constant
diffusion coefficient :math:`\alpha = \alpha (x)`.
A 1D diffusion model with such a variable diffusion coefficient reads

.. _Eq:diffu:pde2:

.. math::

    \tag{417}
    \frac{\partial u}{\partial t} =
        \frac{\partial}{\partial x}\left( \alpha (x) \frac{\partial u}{\partial x}
        \right) + f(x,t), \quad x\in (0,L),\ t\in (0,T],
        
        
.. _Eq:diffu:pde2:ic:u:

.. math::

    \tag{418}
    u(x,0) = I(x), \quad  x\in [0,L],
        
        
.. _Eq:diffu:pde2:bc:0:

.. math::

    \tag{419}
    u(0,t)  = U_0, \quad  t>0,
        
        
.. _Eq:diffu:pde2:bc:L:

.. math::

    \tag{420}
    u(L,t)  = U_L, \quad  t>0.
        
        
A short form of the diffusion equation with variable coefficients is
:math:`u_t = (\alpha u_x)_x`.

.. _diffu:varcoeff:discr:

Discretization          (2)
---------------------------

We can discretize :ref:`(417) <Eq:diffu:pde2>` by a :math:`\theta`-rule in time
and centered differences in space:

.. math::
         \lbrack D_t u\rbrack^{n+\frac{1}{2}}_i = \theta\lbrack D_x(\overline{{\alpha}}^x
        D_x u) + f\rbrack^{n+1}_i +
        (1-\theta)\lbrack D_x(\overline{{\alpha}}^x
        D_x u) + f\rbrack^{n}_i{\thinspace .}

Written out, this becomes

.. math::
        \begin{align*}
        \frac{u^{n+1}_i-u^{n}_i}{\Delta t} &=
        \theta\frac{1}{\Delta x^2}
        ({\alpha}_{i+\frac{1}{2}}(u^{n+1}_{i+1} - u^{n+1}_{i})
        - {\alpha}_{i-\frac{1}{2}}(u^{n+1}_i - u^{n+1}_{i+1})) +\\ 
        &\quad (1-\theta)\frac{1}{\Delta x^2}
        ({\alpha}_{i+\frac{1}{2}}(u^{n}_{i+1} - u^{n}_{i})
        - {\alpha}_{i-\frac{1}{2}}(u^{n}_i - u^{n}_{i+1})) +\\ 
        &\quad \theta f_i^{n+1} + (1-\theta)f_i^{n},
        \end{align*}

where, e.g., an arithmetic mean can to be used for :math:`{\alpha}_{i+\frac{1}{2}}`:

.. math::
         {\alpha}_{i+\frac{1}{2}} = \frac{1}{2}({\alpha}_i + {\alpha}_{i+1}){\thinspace .}

.. _diffu:varcoeff:impl:

Implementation          (8)
---------------------------

.. index::
   single: diffusion equation;  1D, Implementation

Suitable code for solving the discrete equations is very similar to
what we created for a constant :math:`{\alpha}`.
Since the Fourier number has no meaning for varying
:math:`{\alpha}`, we introduce a related parameter :math:`D=\Delta t /\Delta x^2`.

.. code-block:: python

    def solver_theta(I, a, L, Nx, D, T, theta=0.5, u_L=1, u_R=0,
                     user_action=None):
        x = linspace(0, L, Nx+1)   # mesh points in space
        dx = x[1] - x[0]
        dt = D*dx**2
        Nt = int(round(T/float(dt)))
        t = linspace(0, T, Nt+1)   # mesh points in time
    
        u   = zeros(Nx+1)   # solution array at t[n+1]
        u_n = zeros(Nx+1)   # solution at t[n]
    
        Dl = 0.5*D*theta
        Dr = 0.5*D*(1-theta)
    
        # Representation of sparse matrix and right-hand side
        diagonal = zeros(Nx+1)
        lower    = zeros(Nx)
        upper    = zeros(Nx)
        b        = zeros(Nx+1)
    
        # Precompute sparse matrix (scipy format)
        diagonal[1:-1] = 1 + Dl*(a[2:] + 2*a[1:-1] + a[:-2])
        lower[:-1] = -Dl*(a[1:-1] + a[:-2])
        upper[1:]  = -Dl*(a[2:] + a[1:-1])
        # Insert boundary conditions
        diagonal[0] = 1
        upper[0] = 0
        diagonal[Nx] = 1
        lower[-1] = 0
    
        A = scipy.sparse.diags(
            diagonals=[diagonal, lower, upper],
            offsets=[0, -1, 1],
            shape=(Nx+1, Nx+1),
            format='csr')
    
        # Set initial condition
        for i in range(0,Nx+1):
            u_n[i] = I(x[i])
    
        if user_action is not None:
            user_action(u_n, x, t, 0)
    
        # Time loop
        for n in range(0, Nt):
            b[1:-1] = u_n[1:-1] + Dr*(
                (a[2:] + a[1:-1])*(u_n[2:] - u_n[1:-1]) -
                (a[1:-1] + a[0:-2])*(u_n[1:-1] - u_n[:-2]))
            # Boundary conditions
            b[0]  = u_L(t[n+1])
            b[-1] = u_R(t[n+1])
            # Solve
            u[:] = scipy.sparse.linalg.spsolve(A, b)
    
            if user_action is not None:
                user_action(u, x, t, n+1)
    
            # Switch variables before next step
            u_n, u = u, u_n

The code is found in the file `diffu1D_vc.py <http://tinyurl.com/nu656p2/diffu/diffu1D_vc.py>`__.

.. _diffu:varcoeff:stationary:

Stationary solution
-------------------

.. index::
   single: diffusion equation; stationary solution

As :math:`t\rightarrow\infty`, the solution of the
problem :ref:`(417) <Eq:diffu:pde2>`-:ref:`(420) <Eq:diffu:pde2:bc:L>`
will approach
a stationary limit where :math:`\partial u/\partial t=0`. The governing
equation is then

.. _Eq:diffu:fd2:pde:st:

.. math::

    \tag{421}
    \frac{d}{dx}\left(\alpha\frac{du}{dx}\right) =0,
        
        
with boundary conditions :math:`u(0)=U_0` and :math:`u(L)=u_L`.
It is possible to obtain an exact solution of :ref:`(421) <Eq:diffu:fd2:pde:st>`
for any :math:`\alpha`. Integrating twice and applying the boundary conditions
to determine the integration constants gives

.. _Eq:diffu:fd2:pde:st:sol:

.. math::

    \tag{422}
    u(x) = U_0 + (U_L-U_0)\frac{\int_0^x (\alpha(\xi))^{-1}d\xi}{\int_0^L (\alpha(\xi))^{-1}d\xi}
        {\thinspace .}
        
        
.. _diffu:varcoeff:piecewise:

Piecewise constant medium
-------------------------

.. index::
   single: diffusion coefficient;  piecewise constant

Consider a medium built of :math:`M` layers. The layer boundaries
are denoted :math:`b_0, \ldots, b_M`,
where :math:`b_0=0` and :math:`b_M=L`.
If the layers potentially have different material properties, but
these properties are constant within each layer, we can express :math:`\alpha` as a
*piecewise constant function* according to

.. _Eq:diffu:fd2:pde:st:pc:alpha:

.. math::

    \tag{423}
    \alpha (x) = \left\lbrace\begin{array}{ll}
        \alpha_0,& b_0 \leq x < b_1,\\ 
        \vdots &\\ 
        \alpha_i,& b_i \leq x < b_{i+1},\\ 
        \vdots &\\ 
        \alpha_{M-1},& b_{M-1} \leq x \leq b_M.
        \end{array}\right.
        
        
The exact solution :ref:`(422) <Eq:diffu:fd2:pde:st:sol>` in case of such a
piecewise constant :math:`\alpha` function is easy to derive. Assume that
:math:`x` is in the :math:`m`-th layer: :math:`x\in [b_m, b_{m+1}]`. In the integral
:math:`\int_0^x (a(\xi))^{-1}d\xi` we must integrate through the first
:math:`m-1` layers and then add the contribution from the remaining part
:math:`x-b_m` into the :math:`m`-th layer:

.. _Eq:diffu:fd2:pde:st:sol:pc:

.. math::

    \tag{424}
    u(x) = U_0 + (U_L-U_0)
        \frac{\sum_{j=0}^{m-1} (b_{j+1}-b_j)/\alpha(b_j) + (x-b_m)/\alpha(b_m)}{\sum_{j=0}^{M-1} (b_{j+1}-b_j)/\alpha(b_j)}
        
        
**Remark.**
It may sound strange to have a discontinuous :math:`\alpha` in a differential
equation where one is to differentiate, but a discontinuous :math:`\alpha`
is compensated by a discontinuous :math:`u_x` such that :math:`\alpha u_x` is
continuous and therefore can be differentiated as :math:`(\alpha u_x)_x`.

.. _diffu:varcoeff:impl:piecewise:

Implementation of diffusion in a piecewise constant medium
----------------------------------------------------------

.. index::
   single: diffusion equation;  implementation

Programming with piecewise function definitions quickly becomes
cumbersome as the most naive approach is to test for which interval
:math:`x` lies, and then start evaluating a formula like
:ref:`(424) <Eq:diffu:fd2:pde:st:sol:pc>`. In Python, vectorized expressions may
help to speed up the computations.
The convenience classes ``PiecewiseConstant`` and
``IntegratedPiecewiseConstant`` in the `Heaviside <http://tinyurl.com/nu656p2/diffu/Heaviside.py>`__
module were made to simplify programming with
functions like :ref:`(423) <Eq:diffu:fd2:pde:st:pc:alpha>` and expressions like
:ref:`(424) <Eq:diffu:fd2:pde:st:sol:pc>`. These utilities not only represent
piecewise constant functions, but also *smoothed* versions of them
where the discontinuities can be smoothed out in a controlled fashion.

The ``PiecewiseConstant`` class is created by sending in the domain as a
2-tuple or 2-list and a ``data`` object describing the boundaries
:math:`b_0,\ldots,b_M` and the corresponding function values
:math:`\alpha_0,\ldots,\alpha_{M-1}`. More precisely, ``data`` is a nested
list, where ``data[i][0]`` holds :math:`b_i` and ``data[i][1]`` holds the
corresponding value :math:`\alpha_i`, for :math:`i=0,\ldots,M-1`. Given :math:`b_i` and
:math:`\alpha_i` in arrays ``b`` and ``a``, it is easy to fill out the nested
list ``data``.

In our application, we want to represent :math:`\alpha` and :math:`1/\alpha`
as piecewise constant functions, in addition to the :math:`u(x)` function
which involves the integrals of :math:`1/\alpha`. A class creating the
functions we need and a method for evaluating :math:`u`, can take the
form

.. code-block:: python

    class SerialLayers:
        """
        b: coordinates of boundaries of layers, b[0] is left boundary
        and b[-1] is right boundary of the domain [0,L].
        a: values of the functions in each layer (len(a) = len(b)-1).
        U_0: u(x) value at left boundary x=0=b[0].
        U_L: u(x) value at right boundary x=L=b[0].
        """
    
        def __init__(self, a, b, U_0, U_L, eps=0):
            self.a, self.b = np.asarray(a), np.asarray(b)
            self.eps = eps  # smoothing parameter for smoothed a
            self.U_0, self.U_L = U_0, U_L
    
            a_data = [[bi, ai] for bi, ai in zip(self.b, self.a)]
            domain = [b[0], b[-1]]
            self.a_func = PiecewiseConstant(domain, a_data, eps)
    
            # inv_a = 1/a is needed in formulas
            inv_a_data = [[bi, 1./ai] for bi, ai in zip(self.b, self.a)]
            self.inv_a_func = \ 
                 PiecewiseConstant(domain, inv_a_data, eps)
            self.integral_of_inv_a_func = \ 
                 IntegratedPiecewiseConstant(domain, inv_a_data, eps)
            # Denominator in the exact formula is constant
            self.inv_a_0L = self.integral_of_inv_a_func(b[-1])
    
        def __call__(self, x):
            solution = self.U_0 + (self.U_L-self.U_0)*\ 
                       self.integral_of_inv_a_func(x)/self.inv_a_0L
            return solution

A visualization method is also convenient to have. Below we plot :math:`u(x)`
along with :math:`\alpha (x)` (which works well as long as :math:`\max \alpha(x)`
is of the same size as :math:`\max u = \max(U_0,U_L)`).

.. code-block:: python

    class SerialLayers:
        ...
    
        def plot(self):
            x, y_a = self.a_func.plot()
            x = np.asarray(x); y_a = np.asarray(y_a)
            y_u = self.u_exact(x)
            import matplotlib.pyplot as plt
            plt.figure()
            plt.plot(x, y_u, 'b')
            plt.hold('on')  # Matlab style
            plt.plot(x, y_a, 'r')
            ymin = -0.1
            ymax = 1.2*max(y_u.max(), y_a.max())
            plt.axis([x[0], x[-1], ymin, ymax])
            plt.legend(['solution $u$', 'coefficient $a$'], loc='upper left')
            if self.eps > 0:
                plt.title('Smoothing eps: %s' % self.eps)
            plt.savefig('tmp.pdf')
            plt.savefig('tmp.png')
            plt.show()

Figure :ref:`diffu:fd2:pde:st:sol:pc:fig1` shows the case where

.. code-block:: python

    b = [0, 0.25, 0.5, 1]   # material boundaries
    a = [0.2, 0.4, 4]       # material values
    U_0 = 0.5;  U_L = 5     # boundary conditions

.. _diffu:fd2:pde:st:sol:pc:fig1:

.. figure:: flow_in_layers_case1.png
   :width: 400

   *Solution of the stationary diffusion equation corresponding to a piecewise constant diffusion coefficient*

By adding the ``eps`` parameter to the constructor of the ``SerialLayers``
class, we can experiment with smoothed versions of :math:`\alpha` and see
the (small) impact on :math:`u`. Figure :ref:`diffu:fd2:pde:st:sol:pc:fig2`
shows the result.

.. _diffu:fd2:pde:st:sol:pc:fig2:

.. figure:: flow_in_layers_case1_eps.png
   :width: 400

   Solution of the stationary diffusion equation corresponding to a *smoothed* piecewise constant diffusion coefficient

.. _diffu:fd2:radial:

Axi-symmetric diffusion
-----------------------

.. index::
   single: diffusion equation; axi-symmetric diffusion

Suppose we have a diffusion process taking place in a straight tube
with radius :math:`R`. We assume axi-symmetry such that :math:`u` is just a
function of :math:`r` and :math:`t`, :math:`r` being the radial distance from the center
axis of the tube to a point. With such axi-symmetry it is
advantageous to introduce *cylindrical coordinates* :math:`r`, :math:`\theta`, and
:math:`z`, where :math:`z` is in the direction of the tube and :math:`(r,\theta)` are
polar coordinates in a cross section. Axi-symmetry means that all
quantities are independent of :math:`\theta`. From the relations :math:`x=\cos\theta`,
:math:`y=\sin\theta`, and :math:`z=z`, between Cartesian and cylindrical coordinates,
one can (with some effort) derive the diffusion equation in cylindrical
coordinates, which with axi-symmetry takes the form

.. math::
        
        \frac{\partial u}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}
        \left(r{\alpha}(r,z)\frac{\partial u}{\partial r}\right) + \frac{\partial}{\partial z}
        \left(\alpha(r,z)\frac{\partial u}{\partial z}\right) + f(r,z,t){\thinspace .}

Let us assume that :math:`u` does not change along the tube axis so it
suffices to compute variations in a cross section. Then :math:`\partial u/\partial
z = 0` and the we have a 1D diffusion equation in the radial coordinate
:math:`r` and time :math:`t`. In particular, we shall address the initial-boundary
value problem

.. _Eq:diffu:fd2:radial:PDE:

.. math::

    \tag{425}
    \frac{\partial u}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}
        \left(r{\alpha}(r)\frac{\partial u}{\partial r}\right) + f(t),  r\in (0,R),\ t\in (0,T],
        
        
.. _Eq:diffu:fd2:radial:symmr0:

.. math::

    \tag{426}
    \frac{\partial u}{\partial r}(0,t) = 0,  t\in (0,T],
        
        
.. _Eq:diffu:fd2:radial:uR:

.. math::

    \tag{427}
    u(R,t) = 0,  t\in (0,T],
        
        
.. _Eq:diffu:fd2:radial:initial:

.. math::

    \tag{428}
    u(r,0) = I(r),  r\in [0,R].
        
        
The condition :ref:`(426) <Eq:diffu:fd2:radial:symmr0>` is a necessary symmetry condition
at :math:`r=0`, while :ref:`(427) <Eq:diffu:fd2:radial:uR>` could be any Dirichlet
or Neumann condition (or Robin condition in case of cooling or heating).

The finite difference approximation will need the discretized version
of the PDE for :math:`r=0` (just as we use the PDE at the boundary when
implementing Neumann conditions). However, discretizing the PDE at
:math:`r=0` poses a problem because of the :math:`1/r` factor. We therefore need
to work out the PDE for discretization at :math:`r=0` with care.
Let us, for the case of constant :math:`{\alpha}`, expand the spatial derivative term to

.. math::
         \alpha\frac{\partial^2 u}{\partial r^2} + \alpha\frac{1}{r}\frac{\partial u}{\partial r}{\thinspace .}

The last term faces a difficulty at :math:`r=0`, since it becomes a :math:`0/0` expression
caused by the symmetry condition at :math:`r=0`.
However, L'Hosptial's rule can be used:

.. math::
         \lim_{r\rightarrow 0} \frac{1}{r}\frac{\partial u}{\partial r}
        = \frac{\partial^2 u}{\partial r^2}{\thinspace .}

The PDE at :math:`r=0` therefore becomes

.. _Eq:diffu:fd2:radial:eq_PDEr0:aconst:

.. math::

    \tag{429}
    \frac{\partial u}{\partial t} = 2{\alpha}\frac{\partial^2 u}{\partial r^2}
        + f(t){\thinspace .}
        
        
For a variable coefficient :math:`{\alpha}(r)` the expanded spatial derivative term reads

.. math::
         {\alpha}(r)\frac{\partial^2 u}{\partial r^2} +
        \frac{1}{r}({\alpha}(r) + r{\alpha}'(r))\frac{\partial u}{\partial r}{\thinspace .}

We are interested in this expression for :math:`r=0`. A necessary condition
for :math:`u` to be axi-symmetric is that all input data, including :math:`\alpha`,
must also be axi-symmetric, implying that :math:`\alpha'(0)=0` (the second
term vanishes anyway because of :math:`r=0`). The limit of interest is

.. math::
         \lim_{r\rightarrow 0}
        \frac{1}{r}{\alpha}(r)\frac{\partial u}{\partial r} =
        {\alpha}(0)\frac{\partial^2 u}{\partial r^2}{\thinspace .}
        

The PDE at :math:`r=0` now looks like

.. _Eq:diffu:fd2:radial:eq_PDEr0:avar:

.. math::

    \tag{430}
    \frac{\partial u}{\partial t} = 2{\alpha}(0)
        \frac{\partial^2 u}{\partial r^2}
        + f(t),
        
        
so there is no essential difference between the constant coefficient
and variable coefficient cases.

The second-order derivative in :ref:`(429) <Eq:diffu:fd2:radial:eq_PDEr0:aconst>`
and :ref:`(430) <Eq:diffu:fd2:radial:eq_PDEr0:avar>`
is discretized in the usual way.

.. math::
         2{\alpha}\frac{\partial^2}{\partial r^2}u(r_0,t_n) \approx
        [2{\alpha} 2D_rD_r u]^n_0 =
        2{\alpha} \frac{u^{n}_{1} - 2u^{n}_0 + u^n_{-1}}{\Delta r^2}{\thinspace .}

The fictitious value :math:`u^n_{-1}` can be eliminated using the discrete
symmetry condition

.. math::
         [D_{2r} u =0]^n_0 \quad\Rightarrow\quad u^n_{-1} = u^n_1,

which then gives the modified approximation to the term with the second-order derivative
of :math:`u` in :math:`r` at :math:`r=0`:

.. _Eq:_auto180:

.. math::

    \tag{431}
    4{\alpha} \frac{u^{n}_{1} - u^{n}_0}{\Delta r^2}{\thinspace .}
        
        
The discretization of the term with the second-order derivative in :math:`r` at any
internal mesh point is straightforward:

.. math::
        \begin{align*}
        \left[\frac{1}{r}\frac{\partial}{\partial r}
        \left(r{\alpha}\frac{\partial u}{\partial r}\right)\right]_i^n
        & \approx [r^{-1} D_r (r {\alpha} D_r u)]_i^n\\ 
        &= \frac{1}{r_i}\frac{1}{\Delta r^2}\left(
        r_{i+\frac{1}{2}}{\alpha}_{i+\frac{1}{2}}(u_{i+1}^n - u_i^n) - r_{i-\frac{1}{2}}{\alpha}_{i-\frac{1}{2}}(u_{i}^n - u_{i-1}^n)\right){\thinspace .}
        \end{align*}

To complete the discretization, we need a scheme in time, but that can
be done as before and does not interfere with the discretization in space.

.. _diffu:fd2:spherical:

Spherically-symmetric diffusion
-------------------------------

.. index::
   single: diffusion equation; spherically-symmetric diffusion

Discretization in spherical coordinates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let us now pose the problem from the section :ref:`diffu:fd2:radial`
in spherical coordinates, where :math:`u` only depends on the radial coordinate
:math:`r` and time :math:`t`. That is, we have spherical symmetry.
For simplicity we restrict the diffusion coefficient :math:`{\alpha}` to be
a constant. The PDE reads

.. _Eq:_auto181:

.. math::

    \tag{432}
    \frac{\partial u}{\partial t} = \frac{{\alpha}}{r^\gamma}\frac{\partial}{\partial r}
        \left(r^\gamma\frac{\partial u}{\partial r}\right) + f(t),
        
        
for :math:`r\in (0,R)` and :math:`t\in (0,T]`. The parameter :math:`\gamma` is 2 for
spherically-symmetric problems and 1 for axi-symmetric problems.
The boundary and initial conditions
have the same mathematical form as
in :ref:`(425) <Eq:diffu:fd2:radial:PDE>`-:ref:`(428) <Eq:diffu:fd2:radial:initial>`.

Since the PDE in spherical coordinates has the same form as the PDE
in the section :ref:`diffu:fd2:radial`, just with the :math:`\gamma` parameter
being different, we can use the same discretization approach.
At the origin :math:`r=0` we get problems with the term

.. math::
         \frac{\gamma}{r}\frac{\partial u}{\partial t},

but L'Hosptial's rule shows that this term equals :math:`\gamma\partial^2 u/
\partial r^2`, and the PDE at :math:`r=0` becomes

.. _Eq:_auto182:

.. math::

    \tag{433}
    \frac{\partial u}{\partial t} = (\gamma+1){\alpha}\frac{\partial^2 u}{\partial r^2}
        + f(t){\thinspace .}
        
        
The associated discrete form is then

.. _Eq:_auto183:

.. math::

    \tag{434}
    [D_t u = \frac{1}{2} (\gamma+1){\alpha}([D_rD_r \overline{u}^t + \overline{f}^t]^n_i,
        
        
for a Crank-Nicolson scheme.

Discretization in Cartesian coordinates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The spherically-symmetric spatial derivative can be transformed to
the Cartesian counterpart by introducing

.. math::
         v(r,t) = ru(r,t){\thinspace .}

Inserting :math:`u=v/r` in

.. math::
         \frac{1}{r^2}\frac{\partial}{\partial r}
        \left({\alpha}(r)r^2\frac{\partial u}{\partial r}\right), 

yields

.. math::
         r\left(\frac{d {\alpha}}{dr}\frac{\partial v}{\partial r} +
        {\alpha}\frac{\partial^2 v}{\partial r^2}\right) - \frac{d {\alpha}}{dr}v
        {\thinspace .}
        

The two terms in the parenthesis can be combined to

.. math::
         r\frac{\partial}{\partial r}\left( {\alpha}\frac{\partial v}{\partial r}\right){\thinspace .}
        

The PDE for :math:`v` takes the form

.. _Eq:_auto184:

.. math::

    \tag{435}
    \frac{\partial v}{\partial t} = \frac{\partial}{\partial r}\left( {\alpha}
        \frac{\partial v}{\partial r}\right) - \frac{1}{r}\frac{d{\alpha}}{dr}v + rf(r,t),
        \quad r\in (0,R),\ t\in (0,T]{\thinspace .}
        
        
For :math:`\alpha` constant we immediately realize that we can reuse a
solver in Cartesian coordinates to compute :math:`v`. With variable :math:`\alpha`,
a "reaction" term :math:`v/r` needs to be added to the Cartesian solver.
The boundary condition :math:`\partial u/\partial r=0` at :math:`r=0`, implied
by symmetry, forces :math:`v(0,t)=0`, because

.. math::
         \frac{\partial u}{\partial r} = \frac{1}{r^2}\left(
        r\frac{\partial v}{\partial r} - v\right) = 0,\quad r=0{\thinspace .}
        

.. _diffu:2D:

Diffusion in 2D
===============

.. index::
   single: diffusion equation; 2D

We now address a diffusion in two space dimensions:

.. _Eq:_auto185:

.. math::

    \tag{436}
    \frac{\partial u}{\partial t}  = {\alpha}\left(
        \frac{\partial^2 u}{\partial x^2} +
        \frac{\partial^2 u}{\partial x^2}\right) + f(x,y),
        
        
in a domain

.. math::
         (x,y)\in (0,L_x)\times (0,L_y),\ t\in (0,T], 

with :math:`u=0` on the boundary and :math:`u(x,y,0)=I(x,y)` as initial condition.

.. _diffu:2D:discr:

Discretization          (3)
---------------------------

For generality, it is natural to use a :math:`\theta`-rule for the time
discretization. Standard, second-order accurate finite differences are
used for the spatial derivatives. We sample the PDE at a space-time
point :math:`(i,j,n+\frac{1}{2})` and apply the difference approximations:

.. math::
        
        \lbrack D_t u\rbrack^{n+\frac{1}{2}} =
        \theta \lbrack {\alpha} (D_xD_x u + D_yD_yu) + f\rbrack^{n+1} + \nonumber
        

.. _Eq:_auto186:

.. math::

    \tag{437}
    \quad (1-\theta)\lbrack {\alpha} (D_xD_x u + D_yD_y u) + f\rbrack^{n}{\thinspace .}
        
        
Written out,

.. math::
        
         \frac{u^{n+1}_{i,j}-u^n_{i,j}}{\Delta t} =\nonumber
        

.. math::
          
        \qquad \theta ({\alpha}
        (\frac{u^{n+1}_{i-1,j} - 2^{n+1}_{i,j} + u^{n+1}_{i+1,j}}{\Delta x^2} +
        \frac{u^{n+1}_{i,j-1} - 2^{n+1}_{i,j} + u^{n+1}_{i,j+1}}{\Delta y^2}) +
        f^{n+1}_{i,j})
        + \nonumber
        

.. _Eq:_auto187:

.. math::

    \tag{438}
    \qquad (1-\theta)({\alpha}
        (\frac{u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i+1,j}}{\Delta x^2} +
        \frac{u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1}}{\Delta y^2}) +
        f^{n}_{i,j})
        
        
We collect the unknowns on the left-hand side

.. math::
        
         u^{n+1}_{i,j} -
        \theta\left(
        F_x
        (u^{n+1}_{i-1,j} - 2^{n+1}_{i,j} + u^{n+1}_{i,j}) +
        F_y
        (u^{n+1}_{i,j-1} - 2^{n+1}_{i,j} + u^{n+1}_{i,j+1})\right)
        = \nonumber
        

.. math::
          
        \qquad
        (1-\theta)\left(
        F_x
        (u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i,j}) +
        F_y
        (u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1})\right) + \nonumber
        

.. _Eq:diffu:2D:theta_scheme2:

.. math::

    \tag{439}
    \qquad \theta \Delta t f^{n+1}_{i,j} + (1-\theta) \Delta t f^{n}_{i,j}
        + u^n_{i,j},
        
        
where

.. math::
         F_x = \frac{{\alpha}\Delta t}{\Delta x^2},\quad F_y = \frac{{\alpha}\Delta t}{\Delta y^2},

are the Fourier numbers in :math:`x` and :math:`y` direction, respectively.

.. _diffu:2D:fig:mesh3x2:

.. figure:: mesh3x2.png
   :width: 500

   *3x2 2D mesh*

.. _diffu:2D:numbering:

Numbering of mesh points versus equations and unknowns
------------------------------------------------------

.. index::
   single: diffusion equation; 2D, numbering of mesh points

.. Nx=3, Ny=2

The equations :ref:`(439) <Eq:diffu:2D:theta_scheme2>` are coupled at the new
time level :math:`n+1`. That is, we must solve a system of (linear) algebraic
equations, which we will write as :math:`Ac=b`, where :math:`A` is the coefficient
matrix, :math:`c` is the vector of unknowns, and :math:`b` is the right-hand side.

Let us examine the equations in :math:`Ac=b` on a mesh with :math:`N_x=3` and
:math:`N_y=2` cells in each direction.  The spatial mesh is depicted in
Figure :ref:`diffu:2D:fig:mesh3x2`.  The equations at the boundary just
implement the boundary condition :math:`u=0`:

.. math::
        \begin{align*}
        & u^{n+1}_{0,0}=
        u^{n+1}_{1,0}=
        u^{n+1}_{2,0}=
        u^{n+1}_{3,0}=
        u^{n+1}_{0,1}=\\ 
        & u^{n+1}_{3,1}=
        u^{n+1}_{0,2}=
        u^{n+1}_{1,2}=
        u^{n+1}_{2,2}=
        u^{n+1}_{3,2}= 0{\thinspace .}
        \end{align*}

We are left with two interior points, with :math:`i=1`, :math:`j=1` and :math:`i=2`, :math:`j=1`.
The corresponding equations are

.. math::
        \begin{align*}
        & u^{n+1}_{i,j} -
        \theta\left(
        F_x
        (u^{n+1}_{i-1,j} - 2^{n+1}_{i,j} + u^{n+1}_{i,j}) +
        F_y
        (u^{n+1}_{i,j-1} - 2^{n+1}_{i,j} + u^{n+1}_{i,j+1})\right)
        =  \\ 
        &\qquad
        (1-\theta)\left(
        F_x
        (u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i,j}) +
        F_y
        (u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1})\right) + \\ 
        &\qquad \theta \Delta t f^{n+1}_{i,j} + (1-\theta) \Delta t f^{n}_{i,j}
        + u^n_{i,j},
        \end{align*}

There are in total 12 unknowns :math:`u^{n+1}_{i,j}` for :math:`i=0,1,2,3` and
:math:`j=0,1,2`.  To solve the equations, we need to form a matrix system :math:`Ac=b`.
In that system, the solution vector :math:`c` can only have one index. Thus,
we need a numbering of the unknowns with one
index, not two as used in the mesh. We introduce a mapping :math:`m(i,j)`
from a mesh point with indices :math:`(i,j)` to the corresponding unknown
:math:`p` in the equation system:

.. math::
         p = m(i,j) = j(N_x+1) + i{\thinspace .}

When :math:`i` and :math:`j` run through their values, we see the following mapping
to :math:`p`:

.. math::
        \begin{align*}
        &(0,0)\rightarrow 0,\ 
        (0,1)\rightarrow 1,\ 
        (0,2)\rightarrow 2,\ 
        (0,3)\rightarrow 3,\\ 
        &(1,0)\rightarrow 4,\ 
        (1,1)\rightarrow 5,\ 
        (1,2)\rightarrow 6,\ 
        (1,3)\rightarrow 7,\\ 
        &(2,0)\rightarrow 8,\ 
        (2,1)\rightarrow 9,\ 
        (2,2)\rightarrow 10,\ 
        (2,3)\rightarrow 11{\thinspace .}
        \end{align*}

That is, we number the points along the :math:`x` axis, starting with :math:`y=0`,
and then progress one "horizontal" mesh line at a time.
In Figure :ref:`diffu:2D:fig:mesh3x2` you can see that the :math:`(i,j)` and the
corresponding single index (:math:`p`) are listed for each mesh point.

We could equally well have numbered the equations in other ways, e.g.,
let the :math:`j` index be the fastest varying index:
:math:`p = m(i,j) = i(N_y+1) + j`.

Let us form the coefficient matrix :math:`A`, or more precisely, insert a
matrix element (according Python's convention with zero as base
index) for each of the nonzero elements in :math:`A` (the indices
run through the values of :math:`p`, i.e., :math:`p=0,\ldots,11`):

.. math::
        
        {\tiny
        \left(\begin{array}{cccccccccccc}
        (0,0) &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   & (1,1) &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   &   0   & (2,2) &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   &   0   &   0   & (3,3) &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   &   0   &   0   &   0   & (4,4) &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   & (5,1) &   0   &   0   & (5,4) & (5,5) & (5,6) &   0   &   0   & (5,9) &   0    &   0    \\ 
          0   &   0   & (6,2) &   0   &   0   & (6,5) & (6,6) & (6,7) &   0   &   0   & (6,10) &   0    \\ 
          0   &   0   &   0   &   0   &   0   &   0   &   0   & (7,7) &   0   &   0   &   0    &   0    \\ 
          0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   & (8,8) &   0   &   0    &   0    \\ 
          0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   & (9,9) &   0    &   0    \\ 
           0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   & (10,10) &    0    \\ 
           0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0    & (11,11) \\ 
        \end{array}\right)
        }
        

Here is a more compact visualization of the coefficient matrix where we
insert dots for zeros and bullets for non-zero elements:

.. math::
        
        \footnotesize
        \left(\begin{array}{cccccccccccc}
        \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \bullet & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \bullet & \cdot & \cdot \\ 
        \cdot & \cdot & \bullet & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \bullet & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet \\ 
        \end{array}\right)
        

It is clearly seen that most of the elements are zero. This is a general
feature of coefficient matrices arising from discretizing PDEs by
finite difference methods. We say that the matrix is *sparse*.

.. index::
   single: diffusion equation; 2D, sparse matrix

Let :math:`A_{p,q}` be the value of element :math:`(p,q)` in the coefficient matrix :math:`A`,
where :math:`p` and :math:`q` now correspond to the numbering of the unknowns in the
equation system.
We have :math:`A_{p,q}=1` for :math:`p=q=0,1,2,3,4,7,8,9,10,11`, corresponding
to all the known boundary values. Let :math:`p` be :math:`m(i,j)`, i.e.,
the single index corresponding to mesh point :math:`(i,j)`. Then we have

.. _Eq:_auto188:

.. math::

    \tag{440}
    A_{m(i,j),m(i,j)} = A_{p,p} =
        1 +
        \theta (F_x + F_y),
        
        
.. _Eq:_auto189:

.. math::

    \tag{441}
    A_{p, m(i-1,j)} = A_{p,p-1} = -\theta F_x,
        
        
.. _Eq:_auto190:

.. math::

    \tag{442}
    A_{p, m(i+1,j)} = A_{p,p+1} = -\theta F_x,
        
        
.. _Eq:_auto191:

.. math::

    \tag{443}
    A_{p, m(i,j-1)} = A_{p, p-(N_x+1)} = -\theta F_y,
        
        
.. _Eq:_auto192:

.. math::

    \tag{444}
    A_{p, m(i,j+1)} = A_{p, p+(N_x+1)} = -\theta F_y,
        
        
.. _Eq:_auto193:

.. math::

    \tag{445}
    for the equations associated with the two interior mesh points.
At these interior points, the single index :math:`p` takes on the
specific values :math:`p=5,6`, corresponding to the
values :math:`(1,1)` and :math:`(1,2)` of the pair :math:`(i,j)`.

The above values for :math:`A_{p,q}` can be inserted in the matrix:

.. math::
        
        {\tiny
        \left(\begin{array}{cccccccccccc}
        1 &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   & 1 &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   &   0   & 1 &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   &   0   &   0   & 1 &   0   &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   &   0   &   0   &   0   & 1 &   0   &   0   &   0   &   0   &   0   &   0    &   0    \\ 
          0   & -\theta F_y &   0   &   0   & -\theta F_x & 1+2\theta F_x & -\theta F_x &   0   &   0   & -\theta F_y &   0    &   0    \\ 
          0   &   0   & -\theta F_y &   0   &   0   & -\theta F_x & 1+2\theta F_x & -\theta F_x &   0   &   0   & -\theta F_y &   0    \\ 
          0   &   0   &   0   &   0   &   0   &   0   &   0   & 1 &   0   &   0   &   0    &   0    \\ 
          0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   & 1 &   0   &   0    &   0    \\ 
          0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   &   0   & 1 &   0    &   0    \\ 
           0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   & 1 &    0    \\ 
           0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0   &    0    & 1 \\ 
        \end{array}\right)
        }
        

The corresponding right-hand side vector in the equation system has
the entries :math:`b_p`, where :math:`p` numbers the equations. We have

.. math::
         b_0=b_1=b_2=b_3=b_4=b_7=b_8=b_9=b_{10}=b_{11}=0,

for the boundary values. For the equations associated with the
interior points, we get for :math:`p=5,6`, corresponding to :math:`i=1,2` and :math:`j=1`:

.. math::
        \begin{align*}
        b_p &= u_i +
        (1-\theta)\left(
        F_x
        (u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i,j}) +
        F_y
        (u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1})\right) + \\ 
        &\qquad \theta \Delta t f^{n+1}_{i,j} + (1-\theta) \Delta t f^{n}_{i,j}{\thinspace .}
        \end{align*}

Recall that :math:`p=m(i,j)=j(N_x+1)+j` in this expression.

We can, as an alternative, leave the boundary mesh points out of the
matrix system. For a mesh with :math:`N_x=3` and :math:`N_y=2` there are only two
internal mesh points whose unknowns will enter the matrix system.
We must now number the unknowns at the interior points:

.. math::
         p = (j-1)(N_x-1) + i,

for :math:`i=1,\ldots,N_x-1`, :math:`j=1,\ldots,N_y-1`.

.. Nx=4, Ny=3

.. _diffu:2D:fig:mesh4x3:

.. figure:: mesh4x3.png
   :width: 700

   *4x3 2D mesh*

We can continue with illustrating a bit larger mesh, :math:`N_x=4` and :math:`N_y=3`,
see Figure :ref:`diffu:2D:fig:mesh4x3`. The corresponding coefficient matrix
with dots for zeros and bullets for non-zeroes looks as follows (values at boundary points are included in the equation system):

.. math::
        
        {\tiny
        \left(\begin{array}{cccccccccccccccccccc}
        \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot \\ 
        \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet \\ 
        \end{array}\right)
        }
        

.. index::
   single: diffusion equation; 2D, banded matrix


.. admonition:: The coefficient matrix is banded

   Besides being sparse, we observe that the coefficient matrix is *banded*:
   it has five distinct bands. We have the diagonal :math:`A_{i,i}`, the
   subdiagonal :math:`A_{i-1,j}`, the superdiagonal :math:`A_{i,i+1}`, a lower
   diagonal :math:`A_{i,i-(Nx+1)}`, and an upper diagonal :math:`A_{i,i+(Nx+1)}`.
   The other matrix entries are known to be zero. With :math:`N_x+1=N_y+1=N`,
   only a fraction :math:`5N^{-2}` of the matrix entries are nonzero, so the
   matrix is clearly very sparse for relevant :math:`N` values.
   The more we can compute with the nonzeros only, the faster the solution
   methods will be.


.. _diffu:2D:alg:

Algorithm for setting up the coefficient matrix
-----------------------------------------------

We looked at a specific mesh in the previous section, formulated
the equations, and saw what the corresponding coefficient matrix and
right-hand side are. Now our aim is to set up a general algorithm, for any
choice of :math:`N_x` and :math:`N_y`, that produces the coefficient matrix and
the right-hand side vector.
We start with a zero matrix and vector, run through each mesh point,
and fill in the values depending on whether the mesh point is an interior
point or on the boundary.

 * for :math:`i=0,\ldots,N_x`

  * for :math:`j=0,\ldots, N_y`

    * :math:`p=j(N_x+1)+i`

    * if point :math:`(i,j)` is on the boundary:

      * :math:`A_{p,p}=1`, :math:`b_p=0`

    * else:

      * fill :math:`A_{p,m(i-1,j)}`, :math:`A_{p,m(i+1,j)}`, :math:`A_{p,m(i,j)}`, :math:`A_{p,m(i,j-1)}`, :math:`A_{p,m(i,j+1)}`, and :math:`b_p`

To ease the test on whether :math:`(i,j)` is on the boundary or not, we can
split the loops a bit, starting with the boundary line :math:`j=0`, then
treat the interior lines :math:`1\leq j<N_y`, and finally treat the boundary
line :math:`j=N_y`:

 * for :math:`i=0,\ldots,N_x`

  * boundary :math:`j=0`: :math:`p=j(N_x+1)+i`, :math:`A_{p,p}=1`

 * for :math:`j=0,\ldots,N_y`

  * boundary :math:`i=0`: :math:`p=j(N_x+1)+i`, :math:`A_{p,p}=1`

  * for :math:`i=1,\ldots, N_x-1`

    * interior point :math:`p=j(N_x+1)+i`

    * fill :math:`A_{p,m(i-1,j)}`, :math:`A_{p,m(i+1,j)}`, :math:`A_{p,m(i,j)}`, :math:`A_{p,m(i,j-1)}`, :math:`A_{p,m(i,j+1)}`, and :math:`b_p`

  * boundary :math:`i=N_x`: :math:`p=j(N_x+1)+i`, :math:`A_{p,p}=1`

 * for :math:`i=0,\ldots,N_x`

  * boundary :math:`j=N_y`: :math:`p=j(N_x+1)+i`, :math:`A_{p,p}=1`

The right-hand side is set up as follows.

 * for :math:`i=0,\ldots,N_x`

  * boundary :math:`j=0`: :math:`p=j(N_x+1)+i`, :math:`b_p=0`

 * for :math:`j=0,\ldots,N_y`

  * boundary :math:`i=0`: :math:`p=j(N_x+1)+i`, :math:`b_p=0`

  * for :math:`i=1,\ldots, N_x-1`

    * interior point :math:`p=j(N_x+1)+i`

    * fill :math:`b_p`

  * boundary :math:`i=N_x`: :math:`p=j(N_x+1)+i`, :math:`b_p=0`

 * for :math:`i=0,\ldots,N_x`

  * boundary :math:`j=N_y`: :math:`p=j(N_x+1)+i`, :math:`b_p=0`

.. _diffu:2D:impl:dense:

Implementation with a dense coefficient matrix
----------------------------------------------

.. index::
   single: diffusion equation; 2D, implementation

The goal now is to map the algorithms in the previous section to
Python code. One should, for computational efficiency reasons, take
advantage of the fact that the coefficient matrix is sparse and/or
banded, i.e., take advantage of all the zeros. However, we first demonstrate
how to fill an :math:`N\times N` dense square matrix, where :math:`N` is the number
of unknowns, here :math:`N=(N_x+1)(N_y+1)`. The dense matrix is much easier
to understand than the sparse matrix case.

.. code-block:: python

    import numpy as np
    
    def solver_dense(
        I, a, f, Lx, Ly, Nx, Ny, dt, T, theta=0.5, user_action=None):
        """
        Solve u_t = a*(u_xx + u_yy) + f, u(x,y,0)=I(x,y), with u=0
        on the boundary, on [0,Lx]x[0,Ly]x[0,T], with time step dt,
        using the theta-scheme.
        """
        x = np.linspace(0, Lx, Nx+1)       # mesh points in x dir
        y = np.linspace(0, Ly, Ny+1)       # mesh points in y dir
        dx = x[1] - x[0]
        dy = y[1] - y[0]
    
        dt = float(dt)                    # avoid integer division
        Nt = int(round(T/float(dt)))
        t = np.linspace(0, Nt*dt, Nt+1)   # mesh points in time
    
        # Mesh Fourier numbers in each direction
        Fx = a*dt/dx**2
        Fy = a*dt/dy**2

The :math:`u^{n+1}_{i,j}` and :math:`u^n_{i,j}` mesh functions are represented
by their spatial values at the mesh points:

.. code-block:: python

    u   = np.zeros((Nx+1, Ny+1))      # unknown u at new time level
    u_n = np.zeros((Nx+1, Ny+1))      # u at the previous time level

It is a good habit (for extensions) to
introduce index sets for all mesh points:

.. code-block:: python

    Ix = range(0, Nx+1)
    Iy = range(0, Ny+1)
    It = range(0, Nt+1)

The initial condition is easy to fill in:

.. code-block:: python

    # Load initial condition into u_n
    for i in Ix:
        for j in Iy:
            u_n[i,j] = I(x[i], y[j])

The memory for the coefficient matrix and right-hand side vector
is allocated by

.. code-block:: python

    N = (Nx+1)*(Ny+1)  # no of unknowns
    A = np.zeros((N, N))
    b = np.zeros(N)

The filling of ``A`` goes like this:

.. code-block:: python

    m = lambda i, j: j*(Nx+1) + i
    
    # Equations corresponding to j=0, i=0,1,... (u known)
    j = 0
    for i in Ix:
        p = m(i,j);  A[p, p] = 1
    
    # Loop over all internal mesh points in y diretion
    # and all mesh points in x direction
    for j in Iy[1:-1]:
        i = 0;  p = m(i,j);  A[p, p] = 1   # Boundary
        for i in Ix[1:-1]:                 # Interior points
            p = m(i,j)
            A[p, m(i,j-1)] = - theta*Fy
            A[p, m(i-1,j)] = - theta*Fx
            A[p, p]        = 1 + 2*theta*(Fx+Fy)
            A[p, m(i+1,j)] = - theta*Fx
            A[p, m(i,j+1)] = - theta*Fy
        i = Nx;  p = m(i,j);  A[p, p] = 1  # Boundary
    # Equations corresponding to j=Ny, i=0,1,... (u known)
    j = Ny
    for i in Ix:
        p = m(i,j);  A[p, p] = 1

Since ``A`` is independent of time, it can be filled once and for all before
the time loop. The right-hand side vector must be filled at each
time level inside the time loop:

.. code-block:: python

    import scipy.linalg
    
    for n in It[0:-1]:
        # Compute b
        j = 0
        for i in Ix:
            p = m(i,j);  b[p] = 0           # Boundary
        for j in Iy[1:-1]:
            i = 0;  p = m(i,j);  b[p] = 0   # Boundary
            for i in Ix[1:-1]:              # Interior points
                p = m(i,j)
                b[p] = u_n[i,j] + \ 
                  (1-theta)*(
                  Fx*(u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) +\ 
                  Fy*(u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ 
                    + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ 
                  (1-theta)*dt*f(i*dx,j*dy,n*dt)
            i = Nx;  p = m(i,j);  b[p] = 0  # Boundary
        j = Ny
        for i in Ix:
            p = m(i,j);  b[p] = 0           # Boundary
    
        # Solve matrix system A*c = b
        c = scipy.linalg.solve(A, b)
    
        # Fill u with vector c
        for i in Ix:
            for j in Iy:
                u[i,j] = c[m(i,j)]
    
        # Update u_n before next step
        u_n, u = u, u_n

We use ``solve`` from ``scipy.linalg`` and not from ``numpy.linalg``. The difference
is stated below.


.. admonition:: ``scipy.linalg`` versus ``numpy.linalg``

   Quote from the `SciPy documentation <http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html>`__:
   
   ``scipy.linalg`` contains all the functions in ``numpy.linalg``
   plus some other more advanced ones not contained in ``numpy.linalg``.
   
   Another advantage of using ``scipy.linalg`` over numpy.linalg is that it is always compiled with BLAS/LAPACK support, while for NumPy this is optional. Therefore, the SciPy version might be faster depending on how NumPy was installed.
   
   Therefore, unless you don't want to add SciPy as a dependency to your NumPy program, use ``scipy.linalg`` instead of ``numpy.linalg``.


The code shown above is available in the ``solver_dense`` function
in the file `diffu2D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu2D_u0.py>`__, differing only
in the boundary conditions, which in the code can be an arbitrary function along
each side of the domain.

We do not bother to look at vectorized versions of filling ``A`` since
a dense matrix is just used of pedagogical reasons for the very first
implementation. Vectorization will be treated when ``A`` has a sparse
matrix representation, as in the section :ref:`diffu:2D:impl:sparse`.


.. admonition:: How to debug the computation of :math:`A` and :math:`b`

   A good starting point for debugging the filling of :math:`A` and :math:`b` is
   to choose a very coarse mesh, say :math:`N_x=N_y=2`, where there is just
   one internal mesh point, compute the equations by hand, and
   print out ``A`` and ``b`` for comparison in the code. If wrong elements
   in ``A`` or ``b`` occur, print out each assignment to elements in
   ``A`` and ``b`` inside the loops and compare with what you expect.


To let the user store, analyze, or visualize the solution at each
time level, we include a callback function, named ``user_action``,
to be called before the time loop and in each pass in that loop.
The function has the signature

.. code-block:: python

    user_action(u, x, xv, y, yv, t, n)

where ``u`` is a two-dimensional array holding the solution at time level ``n``
and time ``t[n]``. The :math:`x` and :math:`y` coordinates of the mesh points are given by
the arrays ``x`` and ``y``, respectively. The arrays ``xv`` and ``yv`` are
vectorized representations of the mesh points such that vectorized
function evaluations can be invoked. The ``xv`` and ``yv`` arrays are
defined by

.. code-block:: python

    xv = x[:,np.newaxis]
    yv = y[np.newaxis,:]

One can then evaluate, e.g., :math:`f(x,y,t)` at all internal mesh points at time
level ``n`` by first evaluating :math:`f` at all points,

.. code-block:: python

    f_a = f(xv, yv, t[n])

and then use slices to extract a view of the values at the internal
mesh points: ``f_a[1:-1,1:-1]``.
The next section features an example on writing a ``user_action``
callback function.

.. _diffu:2D:verify:

Verification: exact numerical solution
--------------------------------------

.. index::
   single: diffusion equation; 2D, verification (exact num. sol.)

A good test example to start with is one that preserves the solution
:math:`u=0`, i.e., :math:`f=0` and :math:`I(x,y)=0`. This trivial solution can uncover
some bugs.

The first real test example is based on having an exact solution of
the discrete equations. This solution is linear in time and quadratic
in space:

.. math::
         u(x,y,t) = 5tx(L_x-x)y(y-L_y){\thinspace .}

Inserting this manufactured solution in the PDE shows that the
source term :math:`f` must be

.. math::
         f(x,y,t) = 5x(L_x-x)y(y-L_y) + 10{\alpha} t (x(L_x-x)+ y(y-L_y)){\thinspace .}

We can use the ``user_action`` function to compare the numerical solution
with the exact solution at each time level. A suitable helper function
for checking the solution goes like this:

.. code-block:: text

    def quadratic(theta, Nx, Ny):
    
        def u_exact(x, y, t):
            return 5*t*x*(Lx-x)*y*(Ly-y)
        def I(x, y):
            return u_exact(x, y, 0)
        def f(x, y, t):
            return 5*x*(Lx-x)*y*(Ly-y) + 10*a*t*(y*(Ly-y)+x*(Lx-x))
    
        # Use rectangle to detect errors in switching i and j in scheme
        Lx = 0.75
        Ly = 1.5
        a = 3.5
        dt = 0.5
        T = 2
    
        def assert_no_error(u, x, xv, y, yv, t, n):
            """Assert zero error at all mesh points."""
            u_e = u_exact(xv, yv, t[n])
            diff = abs(u - u_e).max()
            tol = 1E-12
            msg = 'diff=%g, step %d, time=%g' % (diff, n, t[n])
            print msg
            assert diff < tol, msg
    
        solver_dense(
            I, a, f, Lx, Ly, Nx, Ny,
            dt, T, theta, user_action=assert_no_error)

A true test function for checking the quadratic solution for several
different meshes and :math:`\theta` values can take the form

.. code-block:: python

    def test_quadratic():
        # For each of the three schemes (theta = 1, 0.5, 0), a series of
        # meshes are tested (Nx > Ny and Nx < Ny)
        for theta in [1, 0.5, 0]:
            for Nx in range(2, 6, 2):
                for Ny in range(2, 6, 2):
                    print 'testing for %dx%d mesh' % (Nx, Ny)
                    quadratic(theta, Nx, Ny)

.. _diffu:2D:convrate:

Verification: convergence rates          (2)
--------------------------------------------

.. index::
   single: diffusion equation; 2D, verification (conv. rates)

For 2D verification with convergence rate computations, the expressions
and computations just build naturally on what we saw for 1D diffusion.
Truncation error analysis and other forms of error analysis point to a
numerical error formula like

.. math::
         E = C_t\Delta t^p + C_x\Delta x^2 + C_y\Delta y^2,

where :math:`p`, :math:`C_t`, :math:`C_x`, and :math:`C_y` are constants. Often, the analysis of
a Crank-Nicolson
method can show that :math:`p=2`, while the Forward and Backward Euler schemes
have :math:`p=1`.

When checking the error formula empirically, we need to reduce it to
a form :math:`E=Ch^r` with a single discretization parameter :math:`h` and some
rate :math:`r` to be estimated. For the Backward Euler method,
where :math:`p=1`, we can introduce a single discretization parameter
according to

.. math::
         h = \Delta x^2 =  \Delta y^2,\quad h = K^{-1}\Delta t,

where :math:`K` is a constant. The error formula then becomes

.. math::
         E = C_t Kh + C_xh + C_y = \tilde C h,\quad \tilde C = C_tK + C_x + C_y{\thinspace .}

The simplest choice is obviously :math:`K=1`. With the Forward Euler method, however,
stability requires :math:`\Delta t = hK \leq h/(4{\alpha})`, so :math:`K\leq 1/(4{\alpha})`.

For the Crank-Nicolson method, :math:`p=2`, and we can simply choose

.. math::
         h = \Delta x = \Delta y = \Delta t,

since there is no restriction on :math:`\Delta t` in terms of :math:`\Delta x` and
:math:`\Delta y`.

A frequently used error measure is the :math:`\ell^2` norm of the error mesh
point values. The section :ref:`wave:pde2:fd:MMS` and the formula
:ref:`(185) <Eq:wave:pde2:fd:MMS:E:l2>` shows the error measure for a 1D
time-dependent problem. The extension to the current 2D problem
reads

.. math::
         E = \left(\Delta t\Delta x\Delta y \sum_{n=0}^{N_t}
        \sum_{i=0}^{N_x}\sum_{j=0}^{N_y}({u_{\small\mbox{e}}}(x_i,y_j,t_n)
        - u^n_{i,j})^2\right)^{\frac{1}{2}}{\thinspace .}

One attractive manufactured solution is

.. math::
         {u_{\small\mbox{e}}} = e^{-pt}\sin(k_xx)\sin(k_yy),\quad k_x=\frac{\pi}{L_x},
        k_y=\frac{\pi}{L_y},

where :math:`p` can be arbitrary. The required source term is

.. math::
         f = ({\alpha}(k_x^2 + k_y^2) - p){u_{\small\mbox{e}}}{\thinspace .}

The function ``convergence_rates`` in
`diffu2D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu2D_u0.py>`__ implements a convergence
rate test. Two potential difficulties are important to be aware of:

1. The error formula is assumed to be
   correct when :math:`h\rightarrow 0`, so for coarse meshes the estimated rate
   :math:`r` may be somewhat away from the expected value.
   Fine meshes may lead to prohibitively long execution times.

2. Choosing :math:`p={\alpha} (k_x^2 + k_y^2)` in the manufactured solution above
   seems attractive (:math:`f=0`), but leads to a slower approach to the
   asymptotic range where the error formula is valid (i.e., :math:`r`
   fluctuates and needs finer meshes to stabilize).

.. _diffu:2D:impl:sparse:

Implementation with a sparse coefficient matrix
-----------------------------------------------

.. index::
   single: diffusion equation; 2D, implementation (sparse)

We used a sparse matrix implementation in the section :ref:`diffu:pde1:impl:sparse`
for a 1D problem with a tridiagonal matrix. The present matrix, arising
from a 2D problem, has five diagonals, but we can use the same
sparse matrix data structure ``scipy.sparse.diags``.

Understanding the diagonals
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let us look closer at the diagonals in the example with a :math:`4\times 3` mesh
as depicted in Figure :ref:`diffu:2D:fig:mesh4x3` and its associated matrix
visualized by dots for zeros and bullets for nonzeros. From the example
mesh, we may generalize to an :math:`N_x\times N_y` mesh.

.. math::
        
        {\tiny
        \begin{array}{lcccccccccccccccccccc}
        0 =m(0,0) & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        1 = m(1,0) & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        2 = m(2,0) & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        3 = m(3,0) & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        N_x=m(N_x,0) & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        N_x+1=m(0,1) & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        (N_x+1)+1=m(1,1) & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        (N_x+1)+2=m(2,1) & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        (N_x+1)+3=m(3,1) & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        (N_x+1)+N_x=m(N_x,1) & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        2(N_x+1)=m(0,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        2(N_x+1)+1=m(1,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ 
        2(N_x+1)+2=m(2,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ 
        2(N_x+1)+3=m(3,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot \\ 
        2(N_x+1)+N_x=m(N_x,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot \\ 
        N_y(N_x+1)=m(0,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot \\ 
        N_y(N_x+1)+1=m(1,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ 
        N_y(N_x+1)+2=m(2,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ 
        N_y(N_x+1)+3=m(3,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot \\ 
        N_y(N_x+1)+N_x=m(N_x,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet \\ 
        \end{array}
        }
        

The main diagonal has :math:`N=(N_x+1)(N_y+1)` elements, while the sub- and
super-diagonals have :math:`N-1` elements. By looking at the matrix above,
we realize that the lower diagonal starts in row :math:`N_x+1` and goes to
row :math:`N`, so its length is :math:`N-(N_x+1)`. Similarly, the upper diagonal
starts at row 0 and lasts to row :math:`N-(N_x+1)`, so it has the same length.
Based on this information, we declare the diagonals by

.. code-block:: python

    main   = np.zeros(N)            # diagonal
    lower  = np.zeros(N-1)          # subdiagonal
    upper  = np.zeros(N-1)          # superdiagonal
    lower2 = np.zeros(N-(Nx+1))     # lower diagonal
    upper2 = np.zeros(N-(Nx+1))     # upper diagonal
    b      = np.zeros(N)            # right-hand side

Filling the diagonals
~~~~~~~~~~~~~~~~~~~~~

We run through all mesh points and fill in elements on the various
diagonals. The line of mesh points corresponding to :math:`j=0` are all
on the boundary, and only the main diagonal gets a contribution:

.. code-block:: python

    m = lambda i, j: j*(Nx+1) + i
    j = 0; main[m(0,j):m(Nx+1,j)] = 1  # j=0 boundary line

Then we run through all interior :math:`j=\hbox{const}` lines of mesh points.
The first and the last point on each line, :math:`i=0` and :math:`i=N_x`, correspond
to boundary points:

.. code-block:: python

    for j in Iy[1:-1]:             # Interior mesh lines j=1,...,Ny-1
        i = 0;   main[m(i,j)] = 1
        i = Nx;  main[m(i,j)] = 1  # Boundary

For the interior mesh points :math:`i=1,\ldots,N_x-1` on a mesh line :math:`y=\hbox{const}`
we can start with the main diagonal. The entries to be filled go from
:math:`i=1` to :math:`i=N_x-1` so the relevant slice in the ``main`` vector is
``m(1,j):m(Nx,j)``:

.. code-block:: python

    main[m(1,j):m(Nx,j)] = 1 + 2*theta*(Fx+Fy)

The ``upper`` array for the superdiagonal has its index 0 corresponding to
row 0 in the matrix, and the array entries
to be set go from :math:`m(1,j)` to :math:`m(N_x-1,j)`:

.. code-block:: python

    upper[m(1,j):m(Nx,j)] = - theta*Fx

The subdiagonal (``lower`` array), however, has its index 0
corresponding to row 1, so there is an offset of 1 in indices compared to
the matrix. The first nonzero occurs (interior point) at a mesh line :math:`j=\hbox{const}` corresponding to matrix row :math:`m(1,j)`, and the corresponding array index
in ``lower`` is then :math:`m(1,j)`. To fill the entries from :math:`m(1,j)` to :math:`m(N_x-1,j)`
we set the following slice in ``lower``:

.. code-block:: python

    lower_offset = 1
    lower[m(1,j)-lower_offset:m(Nx,j)-lower_offset] = - theta*Fx

For the upper diagonal, its index 0 corresponds to matrix row 0, so there
is no offset and we can set the entries correspondingly to ``upper``:

.. code-block:: python

    upper2[m(1,j):m(Nx,j)] = - theta*Fy

The ``lower2`` diagonal, however, has its first index 0 corresponding to row
:math:`N_x+1`, so here we need to subtract the offset :math:`N_x+1`:

.. code-block:: python

    lower2_offset = Nx+1
    lower2[m(1,j)-lower2_offset:m(Nx,j)-lower2_offset] = - theta*Fy

We can now summarize the above code lines for setting the entries in
the sparse matrix representation of the coefficient matrix:

.. code-block:: python

    lower_offset = 1
    lower2_offset = Nx+1
    m = lambda i, j: j*(Nx+1) + i
    
    j = 0; main[m(0,j):m(Nx+1,j)] = 1  # j=0 boundary line
    for j in Iy[1:-1]:             # Interior mesh lines j=1,...,Ny-1
        i = 0;   main[m(i,j)] = 1  # Boundary
        i = Nx;  main[m(i,j)] = 1  # Boundary
        # Interior i points: i=1,...,N_x-1
        lower2[m(1,j)-lower2_offset:m(Nx,j)-lower2_offset] = - theta*Fy
        lower[m(1,j)-lower_offset:m(Nx,j)-lower_offset] = - theta*Fx
        main[m(1,j):m(Nx,j)] = 1 + 2*theta*(Fx+Fy)
        upper[m(1,j):m(Nx,j)] = - theta*Fx
        upper2[m(1,j):m(Nx,j)] = - theta*Fy
    j = Ny; main[m(0,j):m(Nx+1,j)] = 1  # Boundary line

The next task is to create the sparse matrix from these diagonals:

.. code-block:: python

    import scipy.sparse
    
    A = scipy.sparse.diags(
        diagonals=[main, lower, upper, lower2, upper2],
        offsets=[0, -lower_offset, lower_offset,
                 -lower2_offset, lower2_offset],
        shape=(N, N), format='csr')

Filling the right-hand side; scalar version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Setting the entries in the right-hand side is easier since there are no
offsets in the array to take into account. The is in fact similar to
the one previously shown when we used a dense matrix representation
(the right-hand side vector is, of course, independent of what type of
representation we use for the coefficient matrix). The complete time
loop goes as follows.

.. code-block:: python

    import scipy.sparse.linalg
    
    for n in It[0:-1]:
        # Compute b
        j = 0
        for i in Ix:
            p = m(i,j);  b[p] = 0                     # Boundary
        for j in Iy[1:-1]:
            i = 0;  p = m(i,j);  b[p] = 0             # Boundary
            for i in Ix[1:-1]:
                p = m(i,j)                            # Interior
                b[p] = u_n[i,j] + \ 
                  (1-theta)*(
                  Fx*(u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) +\ 
                  Fy*(u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ 
                    + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ 
                  (1-theta)*dt*f(i*dx,j*dy,n*dt)
            i = Nx;  p = m(i,j);  b[p] = 0            # Boundary
        j = Ny
        for i in Ix:
            p = m(i,j);  b[p] = 0                     # Boundary
    
        # Solve matrix system A*c = b
        c = scipy.sparse.linalg.spsolve(A, b)
    
        # Fill u with vector c
        for i in Ix:
            for j in Iy:
                u[i,j] = c[m(i,j)]
    
        # Update u_n before next step
        u_n, u = u, u_n

Filling the right-hand side; vectorized version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since we use a sparse matrix and try to speed up the computations, we
should examine the loops and see if some can be easily removed by
vectorization. In the filling of :math:`A` we have already used vectorized
expressions at each :math:`j=\hbox{const}` line of mesh points. We can
very easily do the same in the code above and remove the need for
loops over the ``i`` index:

.. code-block:: python

    for n in It[0:-1]:
        # Compute b, vectorized version
    
        # Precompute f in array so we can make slices
        f_a_np1 = f(xv, yv, t[n+1])
        f_a_n   = f(xv, yv, t[n])
    
        j = 0; b[m(0,j):m(Nx+1,j)] = 0     # Boundary
        for j in Iy[1:-1]:
            i = 0;   p = m(i,j);  b[p] = 0 # Boundary
            i = Nx;  p = m(i,j);  b[p] = 0 # Boundary
            imin = Ix[1]
            imax = Ix[-1]  # for slice, max i index is Ix[-1]-1
            b[m(imin,j):m(imax,j)] = u_n[imin:imax,j] + \ 
                  (1-theta)*(Fx*(
              u_n[imin+1:imax+1,j] -
            2*u_n[imin:imax,j] + \ 
              u_n[imin-1:imax-1,j]) +
                             Fy*(
              u_n[imin:imax,j+1] -
            2*u_n[imin:imax,j] +
              u_n[imin:imax,j-1])) + \ 
                theta*dt*f_a_np1[imin:imax,j] + \ 
              (1-theta)*dt*f_a_n[imin:imax,j]
        j = Ny;  b[m(0,j):m(Nx+1,j)] = 0 # Boundary
    
        # Solve matrix system A*c = b
        c = scipy.sparse.linalg.spsolve(A, b)
    
        # Fill u with vector c
        u[:,:] = c.reshape(Ny+1,Nx+1).T
    
        # Update u_n before next step
        u_n, u = u, u_n

The most tricky part of this code snippet is the loading of values from
the one-dimensional array ``c``
into the two-dimensional array ``u``. With our numbering of unknowns
from left to right along "horizontal" mesh lines, the correct
reordering of the one-dimensional array ``c`` as a two-dimensional array
requires first a reshaping to an ``(Ny+1,Nx+1)`` two-dimensional
array and then taking the transpose. The result is an ``(Nx+1,Ny+1)``
array compatible with ``u`` both in size and appearance of the function values.

The ``spsolve`` function in ``scipy.sparse.linalg`` is an efficient
version of Gaussian elimination suited for matrices described by
diagonals.  The algorithm is known as *sparse Gaussian elimination*,
and ``spsolve`` calls up a well-tested C code called `SuperLU <http://crd-legacy.lbl.gov/~xiaoye/SuperLU/>`__.

The complete code utilizing ``spsolve``
is found in the ``solver_sparse`` function in the file
`diffu2D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu2D_u0.py>`__.

Verification          (8)
~~~~~~~~~~~~~~~~~~~~~~~~~

We can easily extend the function ``quadratic`` from
the section :ref:`diffu:2D:verify` to include a test of the
``solver_sparse`` function as well.

.. code-block:: python

    def quadratic(theta, Nx, Ny):
        ...
        t, cpu = solver_sparse(
            I, a, f, Lx, Ly, Nx, Ny,
            dt, T, theta, user_action=assert_no_error)

The Jacobi iterative method
---------------------------

.. index:: Jacobi method

So far we have created a matrix and right-hand side of a linear system
:math:`Ac=b` and solved the system for :math:`c` by calling an exact algorithm
based on Gaussian elimination. A much simpler implementation, which
requires no memory for the coefficient matrix :math:`A`, arises if we solve
the system by *iterative* methods. These methods are only approximate,
and the core algorithm is repeated many times until the solution is
considered to be converged.

Numerical scheme and linear system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To illustrate the idea of the Jacobi method, we simplify the numerical scheme to the
Backward Euler case, :math:`\theta=1`, so there are fewer terms to write:

.. math::
        
         u^{n+1}_{i,j} -
        \left(
        F_x
        (u^{n+1}_{i-1,j} - 2u^{n+1}_{i,j} + u^{n+1}_{i,j}) +
        F_y
        (u^{n+1}_{i,j-1} - 2u^{n+1}_{i,j} + u^{n+1}_{i,j+1})\right)
        = \nonumber
        

.. _Eq:diffu:2D:BE_scheme:

.. math::

    \tag{446}
    \qquad u^n_{i,j} + \Delta t f^{n+1}_{i,j}
        
        
The idea of the *Jacobi* iterative method is to introduce an iteration,
here with index :math:`r`, where we in each iteration treat :math:`u^{n+1}_{i,j}`
as unknown, but use values from the previous iteration for
the other unknowns :math:`u^{n+1}_{i\pm 1,j\pm 1}`.

Iterations
~~~~~~~~~~

Let :math:`u^{n+1,r}_{i,j}`
be the approximation to :math:`u^{n+1}_{i,j}` in iteration :math:`r`, for all
relevant :math:`i` and :math:`j` indices. We first solve with respect to
:math:`u^{n+1}_{i,j}` to get the equation to solve:

.. math::
        
         u^{n+1}_{i,j} = (1+2F_x +2F_y)^{-1}
        \left(
        F_x
        (u^{n+1}_{i-1,j} + u^{n+1}_{i,j}) +
        F_y
        (u^{n+1}_{i,j-1} + u^{n+1}_{i,j+1})\right) +
        \nonumber
        

.. _Eq:diffu:2D:Jacobi0:

.. math::

    \tag{447}
    \qquad
        u^n_{i,j} + \Delta t f^{n+1}_{i,j}
        
        
The iteration is introduced by using iteration index :math:`r`, for computed values,
on the right-hand side and :math:`r+1` (unknown in this iteration) on the left-hand
side:

.. math::
        
         u^{n+1,r+1}_{i,j} = (1+2F_x +2F_y)^{-1}\left(
        F_x
        (u^{n+1,r}_{i-1,j} + u^{n+1,r}_{i,j}) +
        F_y
        (u^{n+1,r}_{i,j-1} + u^{n+1,r}_{i,j+1})\right)
        \nonumber
        

.. _Eq:diffu:2D:Jacobi:

.. math::

    \tag{448}
    \qquad
        + u^n_{i,j} + \Delta t f^{n+1}_{i,j}
        
        
Initial guess
~~~~~~~~~~~~~

We start the iteration with the computed values at the previous time level:

.. _Eq:diffu:2D:iter:startvector:

.. math::

    \tag{449}
    u^{n+1,0}_{i,j} = u^{n}_{i,j},\quad i=0,\ldots,N_x,\ j=0,\ldots,N_y{\thinspace .}
        
        
Relaxation          (1)
~~~~~~~~~~~~~~~~~~~~~~~

.. index:: relaxation

A common technique in iterative methods is to introduce a *relaxation*,
which means that the new approximation is a weighted mean of the
approximation as suggested by the algorithm and the previous approximation.
Naming the quantity on the left-hand side of :ref:`(448) <Eq:diffu:2D:Jacobi>`
as :math:`u^{n+1,*}_{i,j}`, a new approximation based on relaxation reads

.. _Eq:diffu:2D:iter:relaxation:

.. math::

    \tag{450}
    u^{n+1,r+1} = \omega u^{n+1,*}_{i,j} + (1-\omega) u^{n+1,r}_{i,j}{\thinspace .}
        
        
Under-relaxation means :math:`\omega < 1`, while over-relaxation has
:math:`\omega > 1`.

Stopping criteria          (1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The iteration can be stopped when the change from one iteration to the
next is sufficiently small (:math:`\epsilon`), using either an infinity norm,

.. _Eq:_auto194:

.. math::

    \tag{451}
    \max_{i,j}\left\vert u^{n+1,r+1}_{i,j}-u^{n+1,r}_{i,j}
        \right\vert \leq \epsilon,
        
        
or an :math:`L^2` norm,

.. _Eq:_auto195:

.. math::

    \tag{452}
    \left(\Delta x\Delta y\sum_{i,j} (u^{n+1,r+1}_{i,j}-u^{n+1,r}_{i,j})^2
        \right)^{\frac{1}{2}} \leq \epsilon{\thinspace .}
        
        
Another widely used criterion measures how well the equations are solved
by looking at the residual (essentially :math:`b-Ac^{r+1}` if :math:`c^{r+1}` is
the approximation to the solution in iteration :math:`r+1`).
The residual, defined in terms of the finite difference stencil, is

.. math::
        
        R_{i,j} = u^{n+1,r+1}_{i,j} -
        (F_x(u^{n+1,r+1}_{i-1,j} - 2u^{n+1,r+1}_{i,j} + u^{n+1,r+1}_{i,j}) +\nonumber
        

.. math::
          
        \quad\quad F_y(u^{n+1,r+1}_{i,j-1} - 2u^{n+1,r+1}_{i,j} + u^{n+1,r+1}_{i,j+1}))
        - \nonumber
        

.. _Eq:diffu:2D:residual:

.. math::

    \tag{453}
    \qquad u^n_{i,j} -  \Delta t f^{n+1}_{i,j}
        
        
One can then iterate until the norm of the mesh function :math:`R_{i,j}`
is less than some tolerance:

.. _Eq:_auto196:

.. math::

    \tag{454}
    \left(\Delta x\Delta y\sum_{i,j} R_{i,j}^2
        \right)^{\frac{1}{2}} \leq \epsilon{\thinspace .}
        
        
Code-friendly notation
~~~~~~~~~~~~~~~~~~~~~~

To make the mathematics as close as possible to what we will write in
a computer program, we may introduce some new notation: :math:`u_{i,j}` is a
short notation for :math:`u^{n+1,r+1}_{i,j}`, :math:`u^{-}_{i,j}` is a short
notation for :math:`u^{n+1,r}_{i,j}`, and :math:`u^{(s)}_{i,j}` denotes
:math:`u^{n+1-s}_{i,j}`.  That is, :math:`u_{i,j}` is the unknown, :math:`u^{-}_{i,j}`
is its most recently computed approximation, and :math:`s` counts time
levels backwards in time. The Jacobi method
:ref:`(448) <Eq:diffu:2D:Jacobi>`) takes the following form with the new
notation:

.. math::
        
         u^{*}_{i,j} = (1+2F_x +2F_y)^{-1}((
        F_x
        (u^{-}_{i-1,j} + u^{-}_{i,j}) +
        F_y
        (u^{n+1,r}_{i,j-1} + u^{n+1,r}_{i,j+1})) +
        \nonumber
        

.. _Eq:diffu:2D:Jacobi2:

.. math::

    \tag{455}
    \qquad
        u^{(1)}_{i,j} + \Delta t f^{n+1}_{i,j})
        
        
Generalization of the scheme
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can also quite easily introduce the :math:`\theta` rule for discretization in
time and write up the Jacobi iteration in that case as well:

.. math::
        
         u^{*}_{i,j} = (1+ 2\theta(F_x +F_y))^{-1}(\theta(
        F_x
        (u^{-}_{i-1,j} + u^{-}_{i,j}) +
        F_y
        (u^{-}_{i,j-1} + u^{-}_{i,j+1})) +
        \nonumber
        

.. math::
          \qquad
        u^{(1)}_{i,j} + \theta \Delta t f^{n+1}_{i,j}
        + (1-\theta)\Delta t f^n_{i,j} + \nonumber
        

.. _Eq:diffu:2D:Jacobi3:

.. math::

    \tag{456}
    \qquad (1-\theta)(
        F_x(u^{(1)}_{i-1,j}-2u^{(1)}_{i,j} + u^{(1)}_{i+1,j}) +
        F_y(u^{(1)}_{i,j-1}-2u^{(1)}_{i,j} + u^{(1)}_{i,j+1}))){\thinspace .}
        
        
The final update of :math:`u` applies relaxation:

.. math::
         u_{i,j} = \omega u^{*}_{i,j} + (1-\omega)u^{-}_{i,j}{\thinspace .}

.. _diffu:2D:Jacobi:impl:

Implementation of the Jacobi method
-----------------------------------

The Jacobi method needs no coefficient matrix and right-hand side
vector, but it needs an array for :math:`u` in the previous iteration.  We
call this array ``u_``, using the notation at the end of the previous
section (at the same time level). The unknown itself is called ``u``,
while ``u_n`` is the computed solution one time level back in time.
With a :math:`\theta` rule in time, the time loop can be coded like this:

.. code-block:: python

    for n in It[0:-1]:
        # Solve linear system by Jacobi iteration at time level n+1
        u_[:,:] = u_n  # Start value
        converged = False
        r = 0
        while not converged:
            if version == 'scalar':
                j = 0
                for i in Ix:
                    u[i,j] = U_0y(t[n+1])           # Boundary
                for j in Iy[1:-1]:
                    i = 0;   u[i,j] = U_0x(t[n+1])  # Boundary
                    i = Nx;  u[i,j] = U_Lx(t[n+1])  # Boundary
    		# Interior points
                    for i in Ix[1:-1]:
                        u_new = 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*(
                            Fx*(u_[i+1,j] + u_[i-1,j]) +
                            Fy*(u_[i,j+1] + u_[i,j-1])) + \ 
                        u_n[i,j] + \ 
                        (1-theta)*(Fx*(
                        u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) +
                          Fy*(
                        u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ 
                          + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ 
                        (1-theta)*dt*f(i*dx,j*dy,n*dt))
                        u[i,j] = omega*u_new + (1-omega)*u_[i,j]
                j = Ny
                for i in Ix:
                    u[i,j] = U_Ly(t[n+1])      # Boundary
    
            elif version == 'vectorized':
                j = 0;  u[:,j] = U_0y(t[n+1])  # Boundary
                i = 0;  u[i,:] = U_0x(t[n+1])  # Boundary
                i = Nx; u[i,:] = U_Lx(t[n+1])  # Boundary
                j = Ny; u[:,j] = U_Ly(t[n+1])  # Boundary
    	    # Internal points
                f_a_np1 = f(xv, yv, t[n+1])
                f_a_n   = f(xv, yv, t[n])
                u_new = 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*(Fx*(
                  u_[2:,1:-1] + u_[:-2,1:-1]) +
                    Fy*(
                  u_[1:-1,2:] + u_[1:-1,:-2])) +\ 
                u_n[1:-1,1:-1] + \ 
                  (1-theta)*(Fx*(
                  u_n[2:,1:-1] - 2*u_n[1:-1,1:-1] + u_n[:-2,1:-1]) +\ 
                    Fy*(
                  u_n[1:-1,2:] - 2*u_n[1:-1,1:-1] + u_n[1:-1,:-2]))\ 
                  + theta*dt*f_a_np1[1:-1,1:-1] + \ 
                  (1-theta)*dt*f_a_n[1:-1,1:-1])
                u[1:-1,1:-1] = omega*u_new + (1-omega)*u_[1:-1,1:-1]
            r += 1
            converged = np.abs(u-u_).max() < tol or r >= max_iter
            u_[:,:] = u
    
        # Update u_n before next step
        u_n, u = u, u_n

The vectorized version should be quite straightforward to understand
once one has an understanding of how a standard 2D finite stencil
is vectorized.

The first natural verification is to use the test problem
in the function ``quadratic`` from
the section :ref:`diffu:2D:verify`. This problem is known to have no
approximation error, but any iterative method will produce an
approximate solution with unknown error. For a tolerance :math:`10^{-k}`
in the iterative method, we can, e.g., use a slightly larger
tolerance :math:`10^{-(k-1)}`
for the difference between the exact and the computed solution.

.. code-block:: python

    def quadratic(theta, Nx, Ny):
        ...
        def assert_small_error(u, x, xv, y, yv, t, n):
            """Assert small error for iterative methods."""
            u_e = u_exact(xv, yv, t[n])
            diff = abs(u - u_e).max()
            tol = 1E-4
            msg = 'diff=%g, step %d, time=%g' % (diff, n, t[n])
            assert diff < tol, msg
    
        for version in 'scalar', 'vectorized':
            for theta in 1, 0.5:
                print 'testing Jacobi, %s version, theta=%g' % \ 
                      (version, theta)
                t, cpu = solver_Jacobi(
                    I=I, a=a, f=f, Lx=Lx, Ly=Ly, Nx=Nx, Ny=Ny,
                    dt=dt, T=T, theta=theta,
                    U_0x=0, U_0y=0, U_Lx=0, U_Ly=0,
                    user_action=assert_small_error,
                    version=version, iteration='Jacobi',
                    omega=1.0, max_iter=100, tol=1E-5)

Even for a very coarse :math:`4\times 4` mesh, the Jacobi method requires
26 iterations to reach a tolerance of :math:`10^{-5}`,
which is quite many iterations, given that there are only 25 unknowns.

.. _diffu:2D:Jacobi:impl:hill:

Test problem: diffusion of a sine hill
--------------------------------------

It can be shown that

.. _Eq:diffu:2D:Jacobi:impl:hill:uex:

.. math::

    \tag{457}
    {u_{\small\mbox{e}}} = Ae^{-{\alpha}\pi^2(L_x^{-2} + L_y^{-2})t}
        \sin\left(\frac{\pi}{L_x}x\right)\sin\left(\frac{\pi}{L_y}y\right),
        
        
is a solution of the 2D homogeneous diffusion equation
:math:`u_t = {\alpha}(u_{xx}+u_{yy})` in
a rectangle :math:`[0,L_x]\times [0,L_y]`, for any value of the amplitude :math:`A`.
This solution vanishes at the boundaries,
and the initial condition is the product of two sines.
We may choose :math:`A=1` for simplicity.

It is difficult to know if our solver based on the Jacobi method works
properly since we are faced with two sources of errors: one from the
discretization, :math:`E_\Delta`, and one from the iterative Jacobi method,
:math:`E_i`. The total error in the computed :math:`u` can be represented as

.. math::
         E_u = E_\Delta + E_i{\thinspace .}

One error measure is to look at the maximum value, which is obtained for
the midpoint :math:`x=L_x/2` and :math:`y=L_x/2`. This midpoint is represented in
the discrete ``u`` if :math:`N_x` and :math:`N_y` are even numbers. We can then
compute :math:`E_u` as :math:`E_u = |\max {u_{\small\mbox{e}}} - \max u|`, when we know an exact
solution :math:`{u_{\small\mbox{e}}}` of the problem.

What about :math:`E_\Delta`? If we use the maximum value as a measure of the
error, we have in fact analytical insight into the approximation error
in this particular problem. According to the section :ref:`diffu:2D:analysis`, the exact solution
:ref:`(457) <Eq:diffu:2D:Jacobi:impl:hill:uex>` of the PDE problem is also an
exact solution of the discrete equations, except that the damping
factor in time is different. More precisely,
:ref:`(414) <Eq:diffu:2D:analysis:BN:numexact>` and
:ref:`(415) <Eq:diffu:2D:analysis:CN:numexact>` are solutions of the discrete
problem for :math:`\theta=1` (Backward Euler) and :math:`\theta=\frac{1}{2}`
(Crank-Nicolson), respectively.  The factors raised to the power :math:`n`
is the numerical amplitude, and the errors in these factors become

.. math::
        \begin{align*}
        E_\Delta &= e^{-{\alpha} k^2t} - \left(
        \frac{1 - 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{1 + 2(F_x\sin^2 p_x + F_x\sin^2p_y)}
        \right)^n,\quad \theta=\frac{1}{2},\\ 
        E_\Delta &= e^{-{\alpha} k^2t} -
        (1 + 4F_x\sin^2 p_x + 4F_y\sin^2 p_y)^{-n},\quad\theta=1{\thinspace .}
        \end{align*}

We are now in a position to compute :math:`E_i` numerically. That is, we can
compute the error due to iterative solution of the linear system and
see if it corresponds to the convergence tolerance used in the method.
Note that the convergence is based on measuring the difference in
two consecutive approximations, which is not exactly the error
due to the iteration, but it is a kind of measure, and it should
have about the same size as :math:`E_i`.

The function ``demo_classic_iterative`` in `diffu2D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu2D_u0.py>`__ implements the idea above (also for the
methods in the section :ref:`diffu:2D:SOR`). The value of :math:`E_i` is in
particular printed at each time level. By changing the tolerance in
the convergence criterion in the Jacobi method, we can see that :math:`E_i`
is of the same order of magnitude as the prescribed tolerance in the
Jacobi method.  For example: :math:`E_\Delta\sim 10^{-2}` with :math:`N_x=N_y=10`
and :math:`\theta=\frac{1}{2}`, as long as :math:`\max u` has some significant size
(:math:`\max u > 0.02`). An appropriate value of the tolerance is then
:math:`10^{-3}`, such that the error in the Jacobi method does not become
bigger than the discretization error. In that case, :math:`E_i` is around
:math:`5\cdot 10^{-3}`.  The corresponding number of Jacobi iterations (with
:math:`\omega=1`) varies from 31 to 12 during the time simulation (for :math:`\max
u > 0.02`). Changing the tolerance to :math:`10^{-5}` causes many more
iterations (61 to 42) without giving any contribution to the overall
accuracy, because the total error is dominated by :math:`E_\Delta`.

Also, with an :math:`N_x=N_y=20`, the spatial accuracy increases and many
more iterations are needed (143 to 45), but the dominating error is
from the time discretization. However, with such a finer spatial mesh,
a higher tolerance in the convergence criterion :math:`10^{-4}` is needed to
keep :math:`E_i\sim 10^{-3}`.  More experiments show the disadvantage of the
very simple Jacobi iteration method: the number of iterations
increases with the number of unknowns, keeping the tolerance fixed,
but the tolerance should also be lowered to avoid the iteration error
to dominate the total error. A small adjustment of the Jacobi method,
as described in the section :ref:`diffu:2D:SOR`, provides a better method.

.. _diffu:2D:Jacobi_vs_FE:

The relaxed Jacobi method and its relation to the Forward Euler method
----------------------------------------------------------------------

We shall now show that solving the Poisson equation :math:`-{\alpha}\nabla^2 u =
f` by the Jacobi iterative method is in fact equivalent to using a
Forward Euler scheme on :math:`u_t = {\alpha}\nabla^2 u + f` and letting
:math:`t\rightarrow\infty`.

A Forward Euler discretization of the 2D diffusion equation,

.. math::
         \lbrack D_t^+ u = {\alpha} (D_xD_x u + D_yD_y u) + f\rbrack^n_{i,j},

can be written out as

.. math::
         u_{i,j}^{n+1} = u_{i,j}^n + \frac{\Delta t}{{\alpha} h^2}
        \left( u_{i-1,j}^n + u_{i+1,j}^n + u_{i,j-1}^n + u_{i,j+1}^n - 4u_{i,j}^n
        + h^2f_{i,j}\right),

where :math:`h=\Delta x = \Delta y` has been introduced for simplicity. The
scheme can be reordered as

.. math::
         u_{i,j}^{n+1} = \left(1 - \omega\right) u_{i,j}^n
        + \frac{1}{4}\omega
        \left( u_{i-1,j}^n + u_{i+1,j}^n + u_{i,j-1}^n + u_{i,j+1}^n - 4u_{i,j}^n
        + h^2f_{i,j}\right),

with

.. math::
         \omega = 4\frac{\Delta t}{{\alpha} h^2},

but this latter form is nothing but the relaxed Jacobi method applied to

.. math::
         [D_xD_x u + D_yD_y u = -f]^n_{i,j}{\thinspace .}

From the equivalence above we know a couple of things about the Jacobi
method for solving :math:`-\nabla^2 u = f`:

1. The method is unstable if :math:`\omega > 1` (since the Forward Euler method
   is then unstable).

2. The convergence is really slow as the iteration index increases (coming from the fact that
   the Forward Euler scheme requires many small time steps to reach the stationary solution).

These observations are quite disappointing: if we already have a
time-dependent diffusion problem and want to take larger time steps by
an implicit time discretization method, we will with the Jacobi method
end up with something close to a slow Forward Euler simulation of the
original problem at each time level.  Nevertheless, the are two
reasons for why the Jacobi method remains a fundamental building block
for solving linear systems arising from PDEs: 1) a couple of
iterations remove large parts of the error and this is effectively
used in the very efficient class of multigrid methods; and 2) the idea
of the Jacobi method can be developed into more efficient methods,
especially the SOR method, which is treated next.

.. _diffu:2D:SOR:

The Gauss-Seidel and SOR methods
--------------------------------

.. index:: Gauss-Seidel method

If we update the mesh points according to the Jacobi method
:ref:`(447) <Eq:diffu:2D:Jacobi0>` for a Backward Euler discretization with a
loop over :math:`i=1,\ldots,N_x-1` and :math:`j=1,\ldots,N_y-1`, we realize that
when :math:`u^{n+1,r+1}_{i,j}` is computed, :math:`u^{n+1,r+1}_{i-1,j}` and
:math:`u^{n+1,r+1}_{i,j-1}` are already computed, so these new values can be
used rather than :math:`u^{n+1,r}_{i-1,j}` and :math:`u^{n+1,r}_{i,j-1}`
(respectively) in the formula for :math:`u^{n+1,r+1}_{i,j}`.  This idea
gives rise to the *Gauss-Seidel* iteration method, which
mathematically is just a small adjustment of :ref:`(447) <Eq:diffu:2D:Jacobi0>`:

.. math::
        
         u^{n+1,r+1}_{i,j} = (1+2F_x +2F_y)^{-1}((\nonumber
        

.. _Eq:diffu:2D:SOR:eq:

.. math::

    \tag{458}
    \qquad
        F_x
        (u^{n+1,r+1}_{i-1,j} + u^{n+1,r}_{i,j}) +
        F_y
        (u^{n+1,r+1}_{i,j-1} + u^{n+1,r}_{i,j+1})) +
        u^n_{i,j} + \Delta t f^{n+1}_{i,j}){\thinspace .}
        
        
Observe that the way we access the mesh points in the formula
:ref:`(458) <Eq:diffu:2D:SOR:eq>` is important: points with :math:`i-1` must be
computed before points with :math:`i`, and points with :math:`j-1` must be
computed before points with :math:`j`. Any sequence of mesh points can be
used in the Gauss-Seidel method, but the particular math formula must
distinguish between already visited points in the current iteration
and the points not yet visited.

.. index:: SOR method

The idea of relaxation :ref:`(450) <Eq:diffu:2D:iter:relaxation>` can equally
well be applied to the Gauss-Seidel method. Actually, the Gauss-Seidel
method with an arbitrary :math:`0<\omega\leq 2` has its own name: the
*Successive Over-Relaxation* method, abbreviated as SOR.

The SOR method for a :math:`\theta` rule discretization, with the
shortened :math:`u` and :math:`u^{-}` notation, can be written

.. math::
        
         u^{*}_{i,j} = (1+ 2\theta(F_x +F_y))^{-1}(\theta(
        F_x
        (u_{i-1,j} + u^{-}_{i,j}) +
        F_y
        (u_{i,j-1} + u^{-}_{i,j+1})) +
        \nonumber
        

.. math::
          \qquad
        u^{(1)}_{i,j} + \theta \Delta t f^{n+1}_{i,j}
        + (1-\theta)\Delta t f^n_{i,j} + \nonumber
        

.. _Eq:diffu:2D:SOR3:

.. math::

    \tag{459}
    \qquad (1-\theta)(
        F_x(u^{(1)}_{i-1,j}-2u^{(1)}_{i,j} + u^{(1)}_{i+1,j}) +
        F_y(u^{(1)}_{i,j-1}-2u^{(1)}_{i,j} + u^{(1)}_{i,j+1}))),
        
        
.. _Eq:_auto197:

.. math::

    \tag{460}
    u_{i,j} = \omega u^{*}_{i,j} + (1-\omega)u^{-}_{i,j}
        
        
The sequence of mesh points in :ref:`(459) <Eq:diffu:2D:SOR3>` is
:math:`i=1,\ldots,N_x-1`, :math:`j=1,\ldots,N_y-1` (but whether :math:`i` runs faster
or slower than :math:`j` does not matter).

.. _diffu:2D:SOR:impl:scalar:

Scalar implementation of the SOR method
---------------------------------------

Since the Jacobi and Gauss-Seidel methods with relaxation
are so similar, we can easily make a common code for the two:

.. code-block:: python

    for n in It[0:-1]:
        # Solve linear system by Jacobi/SOR iteration at time level n+1
        u_[:,:] = u_n  # Start value
        converged = False
        r = 0
        while not converged:
            if version == 'scalar':
                if iteration == 'Jacobi':
                    u__ = u_
                elif iteration == 'SOR':
                    u__ = u
                j = 0
                for i in Ix:
                    u[i,j] = U_0y(t[n+1])  # Boundary
                for j in Iy[1:-1]:
                    i = 0;   u[i,j] = U_0x(t[n+1])  # Boundary
                    i = Nx;  u[i,j] = U_Lx(t[n+1])  # Boundary
                    for i in Ix[1:-1]:
                        u_new = 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*(
                            Fx*(u_[i+1,j] + u__[i-1,j]) +
                            Fy*(u_[i,j+1] + u__[i,j-1])) + \ 
                        u_n[i,j] + (1-theta)*(
                          Fx*(
                        u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) +
                          Fy*(
                        u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ 
                          + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ 
                        (1-theta)*dt*f(i*dx,j*dy,n*dt))
                        u[i,j] = omega*u_new + (1-omega)*u_[i,j]
                    j = Ny
                    for i in Ix:
                        u[i,j] = U_Ly(t[n+1])  # boundary
            r += 1
            converged = np.abs(u-u_).max() < tol or r >= max_iter
            u_[:,:] = u
    
        u_n, u = u, u_n  # Get ready for next iteration

The idea here is to introduce ``u__`` to be used for already computed
values (``u``) in the Gauss-Seidel/SOR version of the implementation, or
just values from the previous iteration (``u_``) in case of the Jacobi method.

.. _diffu:2D:SOR:impl:vectorized:

Vectorized implementation of the SOR method
-------------------------------------------

Vectorizing the Gauss-Seidel iteration step turns out to be non-trivial.
The problem is that vectorized operations typically imply
operations on arrays where the sequence we visit the elements in does
not matter. In particular, this principle makes vectorized code trivial to
parallelize. However, in the Gauss-Seidel algorithm the sequence we
visit the elements in the arrays does matter, and it is well known that
the basic method as explained above cannot be parallelized.
Therefore, also vectorization will require new thinking.

.. index:: red-black numbering

The strategy for vectorizing (and parallelizing) the Gauss-Seidel
method is to use a special numbering of the mesh points called
red-black numbering: every other point is red or black as in a
checkerboard pattern. This numbering requires :math:`N_x` and :math:`N_y` to
be even numbers.
Here is an example of a :math:`6\times 6` mesh:

.. code-block:: text

    r b r b r b r
    b r b r b r b
    r b r b r b r
    b r b r b r b
    r b r b r b r
    b r b r b r b
    r b r b r b r

The idea now is to first update all the red points. Each formula for
updating a red point involves only the black neighbors. Thereafter, we
update all the black points, and at each black point, only the
recently computed red points are involved.

The scalar implementation of the red-black numbered Gauss-Seidel
method is really compact, since we can update values directly in
``u`` (that guarantees that we use the most recently computed
values). Here is the relevant code for the Backward Euler
scheme in time and without a source term:

.. code-block:: python

    # Update internal points
    for sweep in 'red', 'black':
        for j in range(1, Ny, 1):
            if sweep == 'red':
                start = 1 if j % 2 == 1 else 2
            elif sweep == 'black':
                start = 2 if j % 2 == 1 else 1
            for i in range(start, Nx, 2):
    	    u[i,j] = 1.0/(1.0 + 2*(Fx + Fy))*(
                         Fx*(u[i+1,j] + u[i-1,j]) +
                         Fy*(u[i,j+1] + u[i,j-1]) + u_n[i,j])

The vectorized version must be based on slices. Looking at a typical
red-black pattern, e.g.,

.. code-block:: text

    r b r b r b r
    b r b r b r b
    r b r b r b r
    b r b r b r b
    r b r b r b r
    b r b r b r b
    r b r b r b r

we want to update the internal points (marking boundary points with
``x``):

.. code-block:: text

    x x x x x x x
    x r b r b r x
    x b r b r b x
    x r b r b r x
    x b r b r b x
    x r b r b r x
    x x x x x x x

It is impossible to make one slice that picks out all the internal
red points. Instead, we need two slices. The first involves points
marked with ``R``:

.. code-block:: text

    x x x x x x x
    x R b R b R x
    x b r b r b x
    x R b R b R x
    x b r b r b x
    x R b R b R x
    x x x x x x x

This slice is specified as ``1::2`` for ``i`` and ``1::2`` for ``j``, or with
``slice`` objects:

.. code-block:: python

    i = slice(1, None, 2);  j = slice(1, None, 2)

The second slice involves the red points with ``R``:

.. code-block:: text

    x x x x x x x
    x r b r b r x
    x b R b R b x
    x r b r b r x
    x b R b R b x
    x r b r b r x
    x x x x x x x

The slices are

.. code-block:: python

    i = slice(2, None, 2);  j = slice(2, None, 2)

For the black points, the first slice involves the ``B`` points:

.. code-block:: text

    x x x x x x x
    x r B r B r x
    x b r b r b x
    x r B r B r x
    x b r b r b x
    x r B r B r x
    x x x x x x x

with slice objects

.. code-block:: python

    i = slice(2, None, 2);  j = slice(1, None, 2)

The second set of black points is shown here:

.. code-block:: text

    x x x x x x x
    x r b r b r x
    x B r B r B x
    x r b r b r x
    x B r B r B x
    x r b r b r x
    x x x x x x x

with slice objects

.. code-block:: python

    i = slice(1, None, 2);  j = slice(2, None, 2)

That is, we need four sets of slices. The simplest way of implementing
the algorithm is to make a function with variables for the slices
representing :math:`i`, :math:`i-1`, :math:`i+1`, :math:`j`, :math:`j-1`, and :math:`j+1`, here called
``ic`` ("i center"), ``im1`` ("i minus 1", ``ip1`` ("i plus 1"), ``jc``, ``jm1``,
and ``jp1``, respectively.

.. code-block:: python

    def update(u_, u_n, ic, im1, ip1, jc, jm1, jp1):
        return \ 
           1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*(
               Fx*(u_[ip1,jc] + u_[im1,jc]) +
               Fy*(u_[ic,jp1] + u_[ic,jm1])) +\ 
           u_n[ic,jc] + (1-theta)*(
             Fx*(u_n[ip1,jc] - 2*u_n[ic,jc] + u_n[im1,jc]) +\ 
             Fy*(u_n[ic,jp1] - 2*u_n[ic,jc] + u_n[ic,jm1]))+\ 
             theta*dt*f_a_np1[ic,jc] + \ 
             (1-theta)*dt*f_a_n[ic,jc])

The formula returned from ``update`` is to be compared with
:ref:`(459) <Eq:diffu:2D:SOR3>`.

The relaxed Jacobi iteration can be implemented by

.. code-block:: python

    ic  = jc  = slice(1,-1)
    im1 = jm1 = slice(0,-2)
    ip1 = jp1 = slice(2,None)
    u_new[ic,jc] = update(
        u_, u_n, ic, im1, ip1, jc, jm1, jp1)
    u[ic,jc] = omega*u_new[ic,jc] + (1-omega)*u_[ic,jc]

The Gauss-Seidel (or SOR) updates need four different steps.
The ``ic`` and ``jc`` slices are specified above. For each of these,
we must specify the corresponding ``im1``, ``ip1``, ``jm1``, and ``jp1``
slices. The code below contains the details.

.. code-block:: python

    # Red points
    ic  = slice(1,-1,2)
    im1 = slice(0,-2,2)
    ip1 = slice(2,None,2)
    jc  = slice(1,-1,2)
    jm1 = slice(0,-2,2)
    jp1 = slice(2,None,2)
    u_new[ic,jc] = update(
        u_new, u_n, ic, im1, ip1, jc, jm1, jp1)
    
    ic  = slice(2,-1,2)
    im1 = slice(1,-2,2)
    ip1 = slice(3,None,2)
    jc  = slice(2,-1,2)
    jm1 = slice(1,-2,2)
    jp1 = slice(3,None,2)
    u_new[ic,jc] = update(
        u_new, u_n, ic, im1, ip1, jc, jm1, jp1)
    
    # Black points
    ic  = slice(2,-1,2)
    im1 = slice(1,-2,2)
    ip1 = slice(3,None,2)
    jc  = slice(1,-1,2)
    jm1 = slice(0,-2,2)
    jp1 = slice(2,None,2)
    u_new[ic,jc] = update(
        u_new, u_n, ic, im1, ip1, jc, jm1, jp1)
    
    ic  = slice(1,-1,2)
    im1 = slice(0,-2,2)
    ip1 = slice(2,None,2)
    jc  = slice(2,-1,2)
    jm1 = slice(1,-2,2)
    jp1 = slice(3,None,2)
    u_new[ic,jc] = update(
        u_new, u_n, ic, im1, ip1, jc, jm1, jp1)
    
    # Relax
    c = slice(1,-1)
    u[c,c] = omega*u_new[c,c] + (1-omega)*u_[c,c]

The function ``solver_classic_iterative`` in
`diffu2D_u0.py <http://tinyurl.com/nu656p2/diffu/diffu2D_u0.py>`__
contains a unified implementation of the relaxed Jacobi and SOR
methods in scalar and vectorized versions using the techniques
explained above.

.. _diffu:2D:direct_vs_iter:

Direct versus iterative methods
-------------------------------

Direct methods
~~~~~~~~~~~~~~

There are two classes of methods for solving linear systems: direct methods
and iterative methods. Direct methods are based on variants of the
Gaussian elimination procedure and will produce an exact solution (in
exact arithmetics) in an a priori known number of steps.
Iterative methods, on the other hand, produce an approximate solution,
and the amount of work for reaching a given accuracy is usually not
known.

.. index:: LU factorization

.. index:: Cholesky factorization

The most common direct method today is to use the *LU factorization*
procedure to factor the coefficient matrix :math:`A` as the product of a
lower-triangular matrix :math:`L` (with unit diagonal terms)
and an upper-triangular matrix :math:`U`:
:math:`A=LU`. As soon as we have :math:`L` and :math:`U`, a system of equations :math:`LUc=b`
is easy to solve because of the triangular nature of :math:`L` and :math:`U`. We
first solve :math:`Ly=b` for :math:`y` (forward substitution), and
thereafter we find :math:`c` from solving :math:`Uc=y` (backward
substitution).  When :math:`A` is a dense :math:`N\times N` matrix, the LU
factorization costs :math:`\frac{1}{3}N^3` arithmetic operations, while the
forward and backward substitution steps each require of
the order :math:`N^2` arithmetic operations.
That is, factorization dominates the costs, while the substitution steps are
cheap.

Symmetric, positive definite coefficient matrices often arise when
discretizing PDEs. In this case, the LU factorization becomes :math:`A=LL^T`,
and the associated algorithm is known as *Cholesky factorization*.
Most linear algebra software offers highly optimized implementations of
LU and Cholesky factorization as well as forward and backward
substitution (``scipy.linalg`` is the relevant Python package).

Finite difference discretizations lead to sparse coefficient matrices.
An extreme case arose in the section :ref:`diffu:pde1:BE` where :math:`A` was
tridiagonal. For a tridiagonal matrix, the amount of arithmetic
operations in the LU and Cholesky factorization algorithms is just of
the order :math:`N`, not :math:`N^3`. Tridiagonal matrices are special cases of
*banded matrices*, where the matrices contain just a set of diagonal
bands.  Finite difference methods on regularly numbered rectangular
and box-shaped meshes give rise to such banded matrices, with 5 bands
in 2D and 7 in 3D for diffusion problems.  Gaussian elimination only
needs to work within the bands, leading to much more efficient
algorithms.

If :math:`A_{i,j}=0` for :math:`j> i+p` and :math:`j< i-p`, :math:`p` is the *half-bandwidth*
of the matrix. We have in our 2D problem :math:`p=N_x+2`, while in 3D,
:math:`p=(N_x+1)(N_y+1)+2`. The cost of Gaussian elimination is then
:math:`{\mathcal{O}(Np^2)}`, so with :math:`p\ll N`, we see that banded matrices are much
more efficient to compute with. By reordering the unknowns in clever
ways, one can reduce the work of Gaussian elimination
further. Fortunately, the Python programmer has access to such
algorithms through the ``scipy.sparse.linalg`` package.

Although a direct method is an exact algorithm, rounding errors
may in practice accumulate and pollute the solution. The effect
grows with the size of the linear system, so both for accuracy and
efficiency, iterative methods are better suited than direct methods
for solving really large linear systems.

Iterative methods
~~~~~~~~~~~~~~~~~

The Jacobi and SOR iterative methods belong to a class of iterative methods
where the idea is to solve :math:`Au=b` by splitting A into two parts, :math:`A=M-N`,
such that solving systems :math:`Mu=c` is easy and efficient. With the splitting,
we get a system

.. math::
         Mu = Nu + b,

which suggests an iterative method

.. math::
         Mu^{r+1} = Nu^{r} + b,\quad r=0,1,2,\ldots,

where :math:`u^{r+1}` is a new approximation to :math:`u` in the :math:`r+1`-th iteration. To
initiate the iteration, we need a start vector :math:`u^0`.

The Jacobi and SOR methods are based on splitting :math:`A` into a lower
tridiagonal part :math:`L`, the diagonal :math:`D`, and an upper tridiagonal part :math:`U`,
such that :math:`A=L+D+U`. The Jacobi method corresponds to :math:`M=D` and :math:`N=-L-U`.
The Gauss-Seidel method employs :math:`M=L+D` and :math:`N=-U`, while the SOR
method corresponds to

.. math::
         M= \frac{1}{\omega}D + L,\quad N = \frac{1-\omega}{\omega}D - U{\thinspace .}

The relaxed Jacobi method has similar expressions:

.. math::
         M = \frac{1}{\omega}D,\quad N = \frac{1-\omega}{\omega}D - L - U{\thinspace .}

With the matrix forms of the Jacobi and SOR methods as written above,
we could in an implementation alternatively fill the matrix :math:`A` with
entries and call general implementations of the Jacobi or SOR methods
that work on a system :math:`Au=b`. However, this is almost never done since
forming the matrix :math:`A` requires quite some code and storing :math:`A` in the
computer's memory is unnecessary. It is much easier to just apply the
Jacobi and SOR ideas to the finite difference stencils directly in an
implementation, as we have shown in detail.

Nevertheless, the matrix formulation of the Jacobi and SOR methods have
been important for analyzing their convergence behavior. One can show that
the error :math:`u^r-u` fulfills :math:`u^r-u = G^r(u^0-u)`, where :math:`G=M^{-1}N`
and :math:`G^k` is a matrix exponential. For the method to converge,
:math:`\lim_{r\rightarrow\infty}||G^r||=0` is a necessary and sufficient condition.
This implies that the *spectral radius* of :math:`G` must be less than one.
Since :math:`G` is directly related to the finite difference scheme for the
underlying PDE problem, one can in principle compute the spectral radius.
For a given PDE problem, however, this is not a practical strategy, since it
is very difficult to develop useful formulas.
Analysis of model problems, usually
related to the Poisson equation,
reveals some trends of interest: the convergence rate of the Jacobi
method goes like :math:`h^2`, while that of SOR with an optimal :math:`\omega` goes
like :math:`h`, where :math:`h` is the spatial spacing: :math:`h=\Delta x=\Delta y`.
That is, the efficiency of the Jacobi method quickly deteriorates with
the increasing mesh resolution, and SOR is much to be preferred
(even if the optimal
:math:`\omega` remains an open question). We refer to
Chapter 4 of [Ref10]_ for more information on the convergence theory.
One important result is that if :math:`A` is symmetric and positive definite,
then SOR will converge for any :math:`0<\omega <2`.

The optimal :math:`\omega` parameter can be theoretically established for
a Poisson problem as

.. _Eq:_auto198:

.. math::

    \tag{461}
    \omega_{o} = \frac{2}{1 + \sqrt{1-\varrho^2}},\quad \varrho = \frac{\cos(\pi/N_x) + (\Delta x/\Delta y)^2\cos(\pi/N_y)}{1 + (\Delta x/\Delta y)^2}{\thinspace .}
        
        
This formula can be used as a guide also in other problems.

The Jacobi and the SOR methods have their great advantage of being
trivial to implement, so they are obviously popular of this reason.
However, the slow convergence of these methods limits the popularity to fairly
small linear systems (i.e., coarse meshes). As soon as the matrix size
grows, one is better off with more sophisticated iterative methods
like the preconditioned Conjugate gradient method, which we now turn to.

Finally, we mention that there is a variant of the SOR method, called
Symmetric Successive Overrelaxation method, known as SSOR, where
one runs a standard SOR sweep through the mesh points and then a
new sweep but visiting the points in reverse order.

.. _diffu:2D:CG:

The Conjugate gradient method
-----------------------------

.. index:: conjugate gradient method

There is no simple intuitive derivation of the Conjugate gradient
method, so we refer to the many excellent expositions in the
literature for the idea of the method and how the algorithm is
derived. In particular, we recommend the books
[Ref11]_ [Ref12]_ [Ref10]_ [Ref13]_.
A brief overview is
provided in the `Wikipedia article <https://en.wikipedia.org/wiki/Conjugate_gradient_method>`__. Here, we
just state the pros and cons of the method from a user's perspective
and how we utilize it in code.

The original Conjugate gradient method is limited to linear systems :math:`Au=b`,
where :math:`A` is a symmetric and positive definite matrix. There are, however,
extensions of the method to non-symmetric matrices.

.. , so when we use the

.. term *Conjugate gradient method* hereafter, we usually mean the family of

.. related methods that can be applied

.. to most matrix systems arising from discretizing

.. PDEs. When we need to distinuish between methods for

.. the symmetric and non-symmetric

.. cases, we use the terms *original Conjugate gradient method* and

.. *conjugate gradient-like methods*, respectively.

.. index:: preconditioning

A major advantage of all conjugate gradient methods is that the matrix
:math:`A` is only used in matrix-vector products, so we do not need form and
store :math:`A` if we can provide code for computing a matrix-vector product
:math:`Au`.  Another important feature is that the algorithm is very easy to
vectorize and parallelize.  The primary downside of the method is that
it convergences slowly unless one has an effective *preconditioner*
for the system. That is, instead of solving :math:`Au=b`, we try to solve
:math:`M^{-1}Au=M^{-1}b` in the hope that the method works better for this
*preconditioned* system. The matrix :math:`M` is the *preconditioner* or
preconditioning matrix.  Now we need to perform matrix-vector products
:math:`y = M^{-1}Au`, which is done in two steps: first the matrix-vector
product :math:`v=Au` is carried out and then the system :math:`My=v` must be
solved.  Therefore, :math:`M` must be cheap to compute and systems :math:`My=v`
must be cheap to solve.

A perfect preconditioner is :math:`M=A`, but in each iteration in the
Conjugate gradient method one then has so solve a system with :math:`A` as
coefficient matrix! A key idea is to let :math:`M` be some kind of *cheap
approximation* to :math:`A`.  The simplest preconditioner is to set :math:`M=D`,
where :math:`D` is the diagonal of :math:`A`. This choice means running one Jacobi
iteration as preconditioner. :ref:`diffu:exer:splitting_prec`
shows that the Jacobi and SOR methods can also be viewed as
preconditioners.

Constructing good preconditioners is a scientific field on its
own. Here we shall treat the topic just very briefly. For a user
having access to the ``scipy.sparse.linalg`` library, there are
Conjugate gradient methods and preconditioners readily available:

 * For positive definite, symmetric systems: ``cg`` (the Conjugate gradient method)

 * For symmetric systems: ``minres`` (Minimum residual method)

 * For non-symmetric systems:

  * ``gmres`` (GMRES: Generalized minimum residual method)

  * ``bicg`` (BiConjugate gradient method)

  * ``bicgstab`` (Stabilized BiConjugate gradient method)

  * ``cgs`` (Conjugate gradient squared method)

  * ``qmr`` (Quasi-minimal residual iteration)

 * Preconditioner: ``spilu`` (Sparse, incomplete LU factorization)

The ILU preconditioner is an attractive all-round type of
preconditioner that is suitable for most problems on serial
computers. A more efficient preconditioner is the multigrid method,
and algebraic multigrid is also an all-round choice as
preconditioner. The Python package `PyAMG <https://github.com/pyamg/pyamg>`__ offers efficient implementations of
the algebraic multigrid method, to be used both as a preconditioner
and as a stand-alone iterative method.

The matrix arising from implicit time discretization methods of the
diffusion equation is symmetric and positive definite so we can use
the Conjugate gradient method (``cg``), typically in combination with an
ILU preconditioner.  The code is very similar to the one we created
when solving the linear system by sparse Gaussian elimination, the
main difference is that we now allow for calling up the Conjugate
gradient function as an alternative solver.

.. code-block:: python

    def solver_sparse(
        I, a, f, Lx, Ly, Nx, Ny, dt, T, theta=0.5,
        U_0x=0, U_0y=0, U_Lx=0, U_Ly=0, user_action=None,
        method='direct', CG_prec='ILU', CG_tol=1E-5):
        """
        Full solver for the model problem using the theta-rule
        difference approximation in time. Sparse matrix with
        dedicated Gaussian elimination algorithm (method='direct')
        or ILU preconditioned Conjugate Gradients (method='CG' with
        tolerance CG_tol and preconditioner CG_prec ('ILU' or None)).
        """
        # Set up data structures as shown before
    
        # Precompute sparse matrix
        ...
    
        A = scipy.sparse.diags(
            diagonals=[main, lower, upper, lower2, upper2],
            offsets=[0, -lower_offset, lower_offset,
                     -lower2_offset, lower2_offset],
            shape=(N, N), format='csc')
    
        if method == 'CG':
            if CG_prec == 'ILU':
                # Find ILU preconditioner (constant in time)
                A_ilu = scipy.sparse.linalg.spilu(A)  # SuperLU defaults
                M = scipy.sparse.linalg.LinearOperator(
                    shape=(N, N), matvec=A_ilu.solve)
            else:
                M = None
            CG_iter = []  # No of CG iterations at time level n
    
        # Time loop
        for n in It[0:-1]:
            # Compute b, vectorized version
    
            # Solve matrix system A*c = b
            if method == 'direct':
                c = scipy.sparse.linalg.spsolve(A, b)
            elif method == 'CG':
                x0 = u_n.T.reshape(N)  # Start vector is u_n
                CG_iter.append(0)
    
                def CG_callback(c_k):
                    """Trick to count the no of iterations in CG."""
                    CG_iter[-1] += 1
    
                c, info = scipy.sparse.linalg.cg(
                    A, b, x0=x0, tol=CG_tol, maxiter=N, M=M,
                    callback=CG_callback)
    
            # Fill u with vector c
            # Update u_n before next step
            u_n, u = u, u_n

The number of iterations in the Conjugate gradient method is of interest,
but is unfortunately not available from the ``cg`` function. Therefore,
we perform a trick: in each iteration a user function ``CG_callback``
is called where we accumulate the number of iterations in a list
``CG_iter``.

What is the recommended method for solving linear systems?
----------------------------------------------------------

There is no clear answer to this question. If you have enough memory
and computing time available, direct methods such as ``spsolve`` are
to be preferred since they are easy to use and finds almost an exact
solution. However, in larger 2D and in 3D problems, direct methods
usually run too slowly or required too much memory, so one is forced
to use iterative methods. The fastest and most reliable methods are
in the Conjugate Gradient family, but these requires suitable
preconditioners. ILU is an all-round preconditioner, but it is not
suited for parallel computing. The Jacobi and SOR iterative methods
are easy to implement, and popular of that reason, but run slowly. Jacobi
iteration is not an option in real problems, but SOR may be.

.. As we have showed, Jacobi is equivalent to running a Forward Euler scheme

.. until the stationary state is reached

.. _diffu:randomwalk:

Random walk
===========

.. index:: random walk

Models leading to diffusion equations, see the section :ref:`diffu:app`, are
usually based on reasoning with *averaged* physical quantities such as
concentration, temperature, and velocity.  The underlying physical
processes involve complicated microscopic movement of atoms and
molecules, but an average of a large number of molecules is performed
in a small volume before the modeling starts, and the averaged
quantity inside this volume is assigned as a point value at the
centroid of the volume. This means that concentration, temperature,
and velocity at a space-time point represent averages around the
point in a small time interval and small spatial volume.

Random walk is a principally different kind of modeling procedure
compared to the reasoning behind partial differential equations.  The
idea in random walk is to have a large number of "particles" that
undergo random movements. Averaging can then be used afterwards to
compute macroscopic quantities like concentration. The "particles"
and their random movement represent a very simplified microscopic
behavior of molecules, much simpler and computationally much more
efficient than direct `molecular simulation <https://en.wikipedia.org/wiki/Molecular_dynamics>`__, yet the random
walk model has been very powerful to describe a wide range of
phenomena, including heat conduction, quantum mechanics, polymer
chains, population genetics, neuroscience, hazard games, and pricing
of financial instruments.

It can be shown that random walk, when averaged, produces models that
are mathematically equivalent to diffusion equations. This is the
primary reason why we treat random walk in this chapter: two very
different algorithms (finite difference stencils and random walk)
solve the same type of problems. The simplicity of the random walk
algorithm makes it particularly attractive for solving diffusion
equations on massively parallel computers.
The exposition here is as simple as possible, and good thorough
derivation of the models is provided by Hjorth-Jensen [Ref14]_.

.. _diffu:randomwalk:1D:

Random walk in 1D
-----------------

Imagine that we have some particles that perform random moves, either
to the right or to the left. We may flip a coin to decide the movement
of each particle, say head implies movement to the right and tail
means movement to the left. Each move is one unit length.  Physicists
use the term *random walk* for this type of movement.
The movement is also known as `drunkard's walk <https://en.wikipedia.org/wiki/The_Drunkard%27s_Walk>`__.
You may try this yourself: flip the coin and make one step to the left
or right, and repeat the process.

We introduce the symbol :math:`N` for the number of steps in a random walk.
Figure :ref:`diffu:randomwalk:1D:fig:ensemble` shows four different
random walks with :math:`N=200`.

.. _diffu:randomwalk:1D:fig:ensemble:

.. figure:: rw1D_ensemble4.png
   :width: 800

   *Ensemble of 4 random walks, each with 200 steps*

.. _diffu:randomwalk:1D:EVar:

Statistical considerations
--------------------------

Let :math:`S_k` be the stochastic variable representing a step to the left
or to the right in step number :math:`k`. We have that :math:`S_k=-1` with
probability :math:`p` and :math:`S_k=1` with probability :math:`q=1-p`. The variable
:math:`S_k` is known as a `Bernoulli variable <https://en.wikipedia.org/wiki/Bernoulli_distribution>`__. The
expectation of :math:`S_k` is

.. math::
         {\hbox{E}\lbrack S_k \rbrack} = p\cdot (-1) + q\cdot 1 = 1 - 2p,

and the variance is

.. math::
         {\hbox{Var}\lbrack S_k \rbrack} = {\hbox{E}\lbrack S_k^2 \rbrack} - {\hbox{E}\lbrack S_k \rbrack}^2 = 1 - (1-2p)^2 = 4p(1-p){\thinspace .}

The position after :math:`k` steps is another stochastic variable

.. math::
         \bar X_k = \sum_{i=0}^{k-1} S_i{\thinspace .}

The expected position is

.. math::
         {\hbox{E}\lbrack \bar X_k \rbrack} =
        {\hbox{E}\lbrack \sum_{i=0 \rbrack}^{k-1} S_i} = \sum_{i=0}^{k-1} {\hbox{E}\lbrack S_i \rbrack}= k(1-2p){\thinspace .}

All the :math:`S_k` variables are independent. The variance therefore becomes

.. math::
         {\hbox{Var}\lbrack \bar X_k \rbrack} = {\hbox{Var}\lbrack \sum_{i=0 \rbrack}^{k-1} S_i} = \sum_{i=0}^{k-1} {\hbox{Var}\lbrack S_i \rbrack}=
        k4p(1-p){\thinspace .}

We see that :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}` is proportional with the number of steps :math:`k`.
For the very important case :math:`p=q=\frac{1}{2}`, :math:`{\hbox{E}\lbrack \bar X_k \rbrack}=0` and
:math:`{\hbox{Var}\lbrack \bar X_k \rbrack}=k`.

How can we estimate :math:`{\hbox{E}\lbrack \bar X_k \rbrack}=0` and :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}=N`?
We must have many random walks of the type in
Figure :ref:`diffu:randomwalk:1D:fig:ensemble`. For a given :math:`k`, say :math:`k=100`,
we find all the values of :math:`\bar X_k`, name them :math:`\bar x_{0,k}`, :math:`\bar x_{1,k}`,
:math:`\bar x_{2,k}`, and so on. The empirical estimate of :math:`{\hbox{E}\lbrack \bar X_k \rbrack}` is the
average,

.. math::
         {\hbox{E}\lbrack \bar X_k \rbrack} \approx = \frac{1}{W}\sum_{j=0}^{W-1} \bar x_{j,k},

while an empirical estimate of :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}` is

.. math::
         {\hbox{Var}\lbrack \bar X_k \rbrack} \approx \frac{1}{W}\sum_{j=0}^{W-1} (\bar x_{j,k})^2 -
        \left(\frac{1}{W}\sum_{j=0}^{W-1} \bar x_{j,k}\right)^2{\thinspace .}

That is, we take the statistics for a given :math:`K` across the ensemble
of random walks ("vertically" in
Figure :ref:`diffu:randomwalk:1D:fig:ensemble`). The key quantities
to record are :math:`\sum_i \bar x_{i,k}` and :math:`\sum_i \bar x_{i,k}^2`.

.. _diffu:randomwalk:1D:code1:

Playing around with some code
-----------------------------

Scalar code          (1)
~~~~~~~~~~~~~~~~~~~~~~~~

Python has a ``random`` module for drawing random numbers, and this
module has a function ``uniform(a, b)`` for drawing a uniformly
distributed random number in the interval :math:`[a,b)`.  If an event
happens with probability :math:`p`, we can simulate this on the computer by
drawing a random number :math:`r` in :math:`[0,1)`, because then :math:`r\leq p` with
probability :math:`p` and :math:`r>p` with probability :math:`1-p`:

.. code-block:: python

    import random
    r = random.uniform(0, 1)
    if r <= p:
        # Event happens
    else:
        # Event does not happen

A random walk with :math:`N` steps, starting at :math:`x_0`, where we move
to the left with probability :math:`p` and to the right
with probability :math:`1-p` can now be implemented by

.. code-block:: python

    import random, numpy as np
    
    def random_walk1D(x0, N, p):
        """1D random walk with 1 particle."""
        # Store position in step k in position[k]
        position = np.zeros(N)
        position[0] = x0
        current_pos = x0
        for k in range(N-1):
            r = random.uniform(0, 1)
            if r <= p:
                current_pos -= 1
            else:
                current_pos += 1
            position[k+1] = current_pos
        return position

.. index:: vectorization

Vectorized code          (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since :math:`N` is supposed to be large and we want to repeat the process for
many particles, we should speed up the code as much as possible.
Vectorization is the obvious technique here: we draw all the random
numbers at once with aid of ``numpy``, and then we formulate vector
operations to get rid of the loop over the steps (``k``).
The ``numpy.random`` module has vectorized versions of the functions in
Python's built-in ``random`` module. For example, ``numpy.random.uniform(a, b, N)``
returns ``N`` random numbers uniformly distributed between ``a`` (included)
and ``b`` (not included).

We can then make an array of all the steps in a random walk: if
the random number is less than or equal to :math:`p`, the step is :math:`-1`,
otherwise the step is :math:`1`:

.. code-block:: python

    r = np.random.uniform(0, 1, size=N)
    steps = np.where(r <= p, -1, 1)

The value of ``position[k]`` is the sum of all steps up to step ``k``.
Such sums are often needed in vectorized algorithms and therefore
available by the ``numpy.cumsum`` function:

.. code-block:: python

    >>> import numpy as np
    >>> np.cumsum(np.array([1,3,4,6]))
    array([ 1,  4,  8, 14])

The resulting array in this demo has elements :math:`1`, :math:`1+3=4`, :math:`1+3+4=8`,
and :math:`1+3+4+6=14`.

We can now vectorize the ``random_walk1D`` function:

.. code-block:: python

    def random_walk1D_vec(x0, N, p):
        """Vectorized version of random_walk1D."""
        # Store position in step k in position[k]
        position = np.zeros(N+1)
        position[0] = x0
        r = np.random.uniform(0, 1, size=N)
        steps = np.where(r <= p, -1, 1)
        position[1:] = x0 + np.cumsum(steps)
        return position

This code runs about 10 times faster than the scalar version.
With a parallel ``numpy`` library, the code can also automatically take
advantage of hardware for parallel computing because each of the four
array operations can be trivially parallelized.

.. index:: seed (random numbers)

Fixing the random sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~

During software development with random numbers it is advantageous to
always generate the same sequence of random numbers as this may help
debugging processes. To fix the sequence, we set a *seed* of the random
number generator to some chosen integer, e.g.,

.. code-block:: python

    np.random.seed(10)

Calls to ``random_walk1D_vec`` give positions of the particle as
depicted in Figure :ref:`diffu:randomwalk:1D:code1:fig1`. The particle starts
at the origin and moves with :math:`p=\frac{1}{2}`. Since the seed is the same,
the plot to the left is just a magnification of the first 1,000 steps in
the plot to the right.

.. demo_random_walk1D produced the plots

.. _diffu:randomwalk:1D:code1:fig1:

.. figure:: rw1D_1sample.png
   :width: 800

   *1,000 (left) and 50,000 (right) steps of a random walk*

.. index:: verification

Verification          (9)
~~~~~~~~~~~~~~~~~~~~~~~~~

When we have a scalar and a vectorized code, it is always a good idea to
develop a unit test for checking that they produce the same result.
A problem in the present context is that the two versions apply to different
random number generators. For a test to be meaningful, we need to fix
the seed and use the same generator. This means that the scalar version
must either use ``np.random`` or have this as an option. An option
is the most flexible choice:

.. code-block:: text

    import random
    
    def random_walk1D(x0, N, p, random=random):
        ...
        r = random.uniform(0, 1)

Using ``random=np.random``, the ``r`` variable gets computed
by ``np.random.uniform``, and the sequence of random numbers will be
the same as in the vectorized version that employs the same generator
(given that the seed is also the same). A proper test function may be
to check that the positions in the walk are the same in the scalar and
vectorized implementations:

.. code-block:: python

    def test_random_walk1D():
        # For fixed seed, check that scalar and vectorized versions
        # produce the same result
        x0 = 2;  N = 4;  p = 0.6
        np.random.seed(10)
        scalar_computed = random_walk1D(x0, N, p, random=np.random)
        np.random.seed(10)
        vectorized_computed = random_walk1D_vec(x0, N, p)
        assert (scalar_computed == vectorized_computed).all()

Note that we employ ``==`` for arrays with real numbers, which is normally
an inadequate test due to rounding errors, but in the present case,
all arithmetics consists of adding or subtracting one, so these operations
are expected to have no rounding errors. Comparing two ``numpy`` arrays
with ``==`` results in a boolean array, so we need to call the ``all()``
method to ensure that all elements are ``True``, i.e., that all elements
in the two arrays match each other pairwise.

.. _diffu:randomwalk:1D:pde:

Equivalence with diffusion
--------------------------

.. index:: diffusion limit of random walk

The original random walk algorithm can be said to work with
dimensionless coordinates :math:`\bar x_i = -N + i`, :math:`i=0,1,\ldots, 2N+1`
(:math:`i\in [-N,N]`), and :math:`\bar t_n=n`, :math:`n=0,1,\ldots,N`.  A mesh with
spacings :math:`\Delta x` and :math:`\Delta t` with dimensions can be introduced
by

.. math::
         x_i = X_0 + \bar x_i \Delta x,\quad t_n = \bar t_n\Delta t{\thinspace .}

If we implement the algorithm with dimensionless coordinates, we can just
use this rescaling to obtain the movement in a coordinate system
without unit spacings.

Let :math:`P^{n+1}_i` be the probability of finding the particle at mesh point
:math:`\bar x_i` at time :math:`\bar t_{n+1}`. We can reach mesh point :math:`(i,n+1)` in two
ways: either coming in from the left from :math:`(i-1,n)` or from the
right (:math:`i+1,n)`. Each has probability :math:`\frac{1}{2}` (if we assume
:math:`p=q=\frac{1}{2}`). The fundamental equation for :math:`P^{n+1}_i` is

.. _Eq:diffu:randomwalk:1D:pde:Markov:

.. math::

    \tag{462}
    P^{n+1}_i = \frac{1}{2} P^{n}_{i-1} + \frac{1}{2} P^{n}_{i+1}{\thinspace .}
        
        
(This equation is easiest to understand if one looks at the random walk
as a Markov process and applies the transition probabilities, but this is
beyond scope of the present text.)

Subtracting :math:`P^{n}_i` from (:ref:`diffu:randomwalk:1D`) results
in

.. math::
        
        P^{n+1}_i - P^{n}_i = \frac{1}{2} (P^{n}_{i-1} -2P^{n}_i + \frac{1}{2} P^{n}_{i+1}){\thinspace .}
        

Readers who have seen the Forward Euler discretization of a 1D
diffusion equation recognize this scheme as very close to such a
discretization. We have

.. math::
         \frac{\partial}{\partial t}P(x_i,t_{n})
        = \frac{P^{n+1}_i - P^{n}_i}{\Delta t} + {\mathcal{O}(\Delta t)},

or in dimensionless coordinates

.. math::
         \frac{\partial}{\partial\bar t}P(\bar x_i,\bar t_n)
        \approx P^{n+1}_i - P^{n}_i{\thinspace .}

Similarly, we have

.. math::
        \begin{align*}
        \frac{\partial^2}{\partial x^2}P(x_i,t_n) &=
        \frac{P^{n}_{i-1} -2P^{n}_i + \frac{1}{2} P^{n}_{i+1}}{\Delta x^2}
        + {\mathcal{O}(\Delta x^2)},\\ 
        \frac{\partial^2}{\partial x^2}P(\bar x_i,\bar t_n) &\approx
        P^{n}_{i-1} -2P^{n}_i + \frac{1}{2} P^{n}_{i+1}{\thinspace .}
        \end{align*}

Equation (:ref:`diffu:randomwalk:1D`) is therefore equivalent with
the dimensionless diffusion equation

.. _Eq:diffu:randomwalk:1D:pde:dimless:

.. math::

    \tag{463}
    \frac{\partial P}{\partial\bar t} = \frac{1}{2}
        \frac{\partial^2 P}{\partial \bar x^2},
        
        
or the diffusion equation

.. _Eq:diffu:randomwalk:1D:pde:dim:

.. math::

    \tag{464}
    \frac{\partial P}{\partial t} = D\frac{\partial^2 P}{\partial x^2},
        
        
with diffusion coefficient

.. math::
         D = \frac{\Delta x^2}{2\Delta t}{\thinspace .}

This derivation shows the tight link between random walk and diffusion.
If we keep track of where the particle is, and repeat the process
many times, or run the algorithms for lots of particles, the histogram
of the positions will approximate the solution of the diffusion equation
for the local probability :math:`P^n_i`.

Suppose all the random walks start at the origin. Then the initial
condition for the probability distribution is the Dirac delta
function :math:`\delta(x)`. The solution of :ref:`(463) <Eq:diffu:randomwalk:1D:pde:dimless>`
can be shown to be

.. _Eq:diffu:randomwalk:1D:pde:dimless:sol:

.. math::

    \tag{465}
    \bar P(\bar x,\bar t) = \frac{1}{\sqrt{4\pi{\alpha} t}}e^{-\frac{x^2}{4{\alpha} t}},
        
        
where :math:`{\alpha} = \frac{1}{2}`.

Implementation of multiple walks
--------------------------------

Our next task is to implement an ensemble of walks (for statistics,
see the section :ref:`diffu:randomwalk:1D:EVar`) and also provide data from
the walks such that we can compute the probabilities of the positions
as introduced in the previous section. An appropriate representation
of probabilities :math:`P^n_i` are histograms (with :math:`i` along the :math:`x` axis)
for a few selected values of :math:`n`.

To estimate the expectation and variance of the random walks,
the section :ref:`diffu:randomwalk:1D:EVar` points to recording
:math:`\sum_j x_{j,k}` and :math:`\sum_j x_{j,k}^2`, where :math:`x_{j,k}` is the
position at time/step level :math:`k` in random walk number :math:`j`.
The histogram of positions needs the individual values :math:`x_{i,k}`
for all :math:`i` values and some selected :math:`k` values.

We introduce ``position[k]`` to hold :math:`\sum_j x_{j,k}`,
``position2[k]`` to hold :math:`\sum_j (x_{j,k})^2`, and
``pos_hist[i,k]`` to hold :math:`x_{i,k}`. A selection of :math:`k` values can be
specified by saying how many, ``num_times``, and let them be equally
spaced through time:

.. code-block:: python

    pos_hist_times = [(N//num_times)*i for i in range(num_times)]

This is one of the few situations we want integer division (``//``) or
real division rounded to an integer.

Scalar version
~~~~~~~~~~~~~~

Our scalar implementation of running ``num_walks`` random walks may go
like this:

.. code-block:: python

    def random_walks1D(x0, N, p, num_walks=1, num_times=1,
                       random=random):
        """Simulate num_walks random walks from x0 with N steps."""
        position = np.zeros(N+1)    # Accumulated positions
        position[0] = x0*num_walks
        position2 = np.zeros(N+1)   # Accumulated positions**2
        position2[0] = x0**2*num_walks
        # Histogram at num_times selected time points
        pos_hist = np.zeros((num_walks, num_times))
        pos_hist_times = [(N//num_times)*i for i in range(num_times)]
        #print 'save hist:', post_hist_times
    
        for n in range(num_walks):
            num_times_counter = 0
            current_pos = x0
            for k in range(N):
                if k in pos_hist_times:
                    #print 'save, k:', k, num_times_counter, n
                    pos_hist[n,num_times_counter] = current_pos
                    num_times_counter += 1
                # current_pos corresponds to step k+1
                r = random.uniform(0, 1)
                if r <= p:
                    current_pos -= 1
                else:
                    current_pos += 1
                position [k+1] += current_pos
                position2[k+1] += current_pos**2
        return position, position2, pos_hist, np.array(pos_hist_times)

Vectorized version
~~~~~~~~~~~~~~~~~~

We have already vectorized a single random walk. The additional
challenge here is to vectorize the computation of the data for the
histogram, ``pos_hist``, but given the selected steps in ``pos_hist_times``,
we can find the corresponding positions by indexing with the
list ``pos_hist_times``: ``position[post_hist_times]``, which are to be
inserted in ``pos_hist[n,:]``.

.. code-block:: python

    def random_walks1D_vec1(x0, N, p, num_walks=1, num_times=1):
        """Vectorized version of random_walks1D."""
        position  = np.zeros(N+1)    # Accumulated positions
        position2 = np.zeros(N+1)    # Accumulated positions**2
        walk = np.zeros(N+1)         # Positions of current walk
        walk[0] = x0
        # Histogram at num_times selected time points
        pos_hist = np.zeros((num_walks, num_times))
        pos_hist_times = [(N//num_times)*i for i in range(num_times)]
    
        for n in range(num_walks):
            r = np.random.uniform(0, 1, size=N)
            steps = np.where(r <= p, -1, 1)
            walk[1:] = x0 + np.cumsum(steps)  # Positions of this walk
            position  += walk
            position2 += walk**2
            pos_hist[n,:] = walk[pos_hist_times]
        return position, position2, pos_hist, np.array(pos_hist_times)

Improved vectorized version
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Looking at the vectorized version above, we still have one potentially
long Python loop over ``n``. Normally, ``num_walks`` will be much larger than ``N``.
The vectorization of the loop over ``N`` certainly speeds up the program,
but if we think of vectorization as also a way to parallelize the code,
all the independent walks (the ``n`` loop) can be executed in parallel.
Therefore, we should include this loop as well in the vectorized
expressions, at the expense of using more memory.

We introduce the array ``walks`` to hold the :math:`N+1` steps of all the walks:
each row represents the steps in one walk.

.. code-block:: python

    walks = np.zeros((num_walks, N+1))  # Positions of each walk
    walks[:,0] = x0

Since all the steps are independent, we can just make one long
vector of enough random numbers (``N*num_walks``), translate these
numbers to :math:`\pm 1`, then we reshape the array such that the steps
of each walk are stored in the rows.

.. code-block:: python

    r = np.random.uniform(0, 1, size=N*num_walks)
    steps = np.where(r <= p, -1, 1).reshape(num_walks, N)

The next step is to sum up the steps in each walk. We need the
``np.cumsum`` function for this, with the argument ``axis=1`` for
indicating a sum across the columns:

.. code-block:: python

    >>> a = np.arange(6).reshape(2,3)
    >>> a
    array([[0, 1, 2],
           [3, 4, 5]])
    >>> np.cumsum(a, axis=1)
    array([[ 0,  1,  3],
           [ 3,  7, 12]])

Now ``walks`` can be computed by

.. code-block:: python

    walks[:,1:] = x0 + np.cumsum(steps, axis=1)

The ``position`` vector is the sum of all the walks. That is, we want to
sum all the rows, obtained by

.. code-block:: python

    position  = np.sum(walks, axis=0)

A corresponding expression computes the squares of the positions.
Finally, we need to compute ``pos_hist``, but that is a matter of
grabbing some of the walks (according to ``pos_hist_times``):

.. code-block:: python

    pos_hist[:,:] = walks[:,pos_hist_times]

The complete vectorized algorithm without any loop can now be
summarized:

.. code-block:: python

    def random_walks1D_vec2(x0, N, p, num_walks=1, num_times=1):
        """Vectorized version of random_walks1D; no loops."""
        position  = np.zeros(N+1)    # Accumulated positions
        position2 = np.zeros(N+1)    # Accumulated positions**2
        walks = np.zeros((num_walks, N+1))  # Positions of each walk
        walks[:,0] = x0
        # Histogram at num_times selected time points
        pos_hist = np.zeros((num_walks, num_times))
        pos_hist_times = [(N//num_times)*i for i in range(num_times)]
    
        r = np.random.uniform(0, 1, size=N*num_walks)
        steps = np.where(r <= p, -1, 1).reshape(num_walks, N)
        walks[:,1:] = x0 + np.cumsum(steps, axis=1)
        position  = np.sum(walks,    axis=0)
        position2 = np.sum(walks**2, axis=0)
        pos_hist[:,:] = walks[:,pos_hist_times]
        return position, position2, pos_hist, np.array(pos_hist_times)

What is the gain of the vectorized implementations? One important gain
is that each vectorized operation can be automatically parallelized
if one applies a parallel ``numpy`` library like `Numba <http://numba.pydata.org>`__. On a single CPU, however, the speed up of the vectorized operations
is also significant. With :math:`N=1,000` and 50,000 repeated walks,
the two vectorized versions run about 25 and 18 times faster than the scalar
version, with ``random_walks1D_vec1`` being fastest.

.. The downside of vectorization of random walks is the large

.. arrays that arise, especially the version in ``random_walks1D_vec2``.

.. CPU:

.. N=1000, 50K walks

.. scalar: 42, vec1: 1.7, vec2: 1.7, factors 25 and 25

Remark on vectorized code and parallelization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Our first attempt on vectorization removed the loop over the :math:`N` steps in
a single walk. However, the number of walks is usually much larger than
:math:`N`, because of the need for accurate statistics. Therefore, we should
rather remove the loop over all walks. It turns out, from our efficiency
experiments, that the function ``random_walks1D_vec2`` (with no loops) is
slower than ``random_walks1D_vec1``. This is a bit surprising and may be
explained by less efficiency in the statements involving very large
arrays, containing all steps for all walks at once.

From a parallelization and improved vectorization point of view, it
would be more natural to switch the sequence of the loops in the
serial code such that the shortest loop is the outer loop:

.. code-block:: python

    def random_walks1D2(x0, N, p, num_walks=1, num_times=1, ...):
        ...
        current_pos = x0 + np.zeros(num_walks)
        num_times_counter = -1
    
        for k in range(N):
            if k in pos_hist_times:
    	    num_times_counter += 1
    	    store_hist = True
    	else:
    	    store_hist = False
    
            for n in range(num_walks):
                # current_pos corresponds to step k+1
                r = random.uniform(0, 1)
    	    if r <= p:
                    current_pos[n] -= 1
                else:
                    current_pos[n] += 1
                position [k+1] += current_pos[n]
                position2[k+1] += current_pos[n]**2
                if store_hist:
                    pos_hist[n,num_times_counter] = current_pos[n]
        return position, position2, pos_hist, np.array(pos_hist_times)

The vectorized version of this code, where we just vectorize the
loop over ``n``, becomes

.. code-block:: python

    def random_walks1D2_vec1(x0, N, p, num_walks=1, num_times=1):
        """Vectorized version of random_walks1D2."""
        position  = np.zeros(N+1)    # Accumulated positions
        position2 = np.zeros(N+1)    # Accumulated positions**2
        # Histogram at num_times selected time points
        pos_hist = np.zeros((num_walks, num_times))
        pos_hist_times = [(N//num_times)*i for i in range(num_times)]
    
        current_pos = np.zeros(num_walks)
        current_pos[0] = x0
        num_times_counter = -1
    
        for k in range(N):
            if k in pos_hist_times:
    	    num_times_counter += 1
    	    store_hist = True  # Store histogram data for this k
    	else:
    	    store_hist = False
    
            # Move all walks one step
            r = np.random.uniform(0, 1, size=num_walks)
            steps = np.where(r <= p, -1, 1)
            current_pos += steps
            position[k+1]  = np.sum(current_pos)
            position2[k+1] = np.sum(current_pos**2)
            if store_hist:
                pos_hist[:,num_times_counter] = current_pos
        return position, position2, pos_hist, np.array(pos_hist_times)

This function runs significantly faster than the ``random_walks1D_vec1``
function above, typically 1.7 times faster. The code is also more appropriate
in a parallel computing context since each vectorized statement
can work with data of size ``num_walks`` over the compute units, repeated ``N``
times (compared with data of size ``N``, repeated ``num_walks`` times, in
``random_walks1D_vec1``).

The scalar code with switched loops, ``random_walks1D2`` runs a bit slower
than the original code in ``random_walks1D``, so with the longest loop as
the inner loop, the vectorized function ``random_walks1D2_vec1``
is almost 60 times faster than the scalar counterpart, while the
code ``random_walks1D_vec2`` without loops is only around 18 times faster.
Taking into account the very large arrays required by the latter function,
we end up with ``random_walks1D2_vec1`` as the preferred implementation.

Test function
~~~~~~~~~~~~~

During program development, it is highly recommended to carry out
computations by hand for, e.g., ``N=4`` and ``num_walks=3``.  Normally,
this is done by executing the program with these parameters and
checking with pen and paper that the computations make sense.  The
next step is to use this test for correctness in a formal test
function.

First, we need to check that the simulation of multiple random walks
reproduces the results of ``random_walk1D``, ``random_walk1D_vec1``, and
``random_walk1D_vec2`` for the first walk, if the seed is the
same. Second, we run three random walks (``N=4``) with the scalar and
the two vectorized versions and check that the returned arrays are
identical.

For this type of test to be successful, we must be sure that exactly
the same set of random numbers are used in the three versions, a fact
that requires the same random number generator and the same seed, of
course, but also the same sequence of computations.  This is not
obviously the case with the three ``random_walk1D*`` functions we
have presented. The critical issue in ``random_walk1D_vec1`` is that the
first random numbers are used for the first walk, the second set of
random numbers is used for the second walk and so, to be compatible
with how the random numbers are used in the function ``random_walk1D``.
For the function ``random_walk1D_vec2`` the situation is a bit more
complicated since we generate all the random numbers at once.
However, the critical step now is the reshaping of the array returned
from ``np.where``: we must reshape as ``(num_walks, N)`` to ensure that
the first ``N`` random numbers are used for the first walk, the next ``N``
numbers are used for the second walk, and so on.

We arrive at the test function below.

.. code-block:: python

    def test_random_walks1D():
        # For fixed seed, check that scalar and vectorized versions
        # produce the same result
        x0 = 0;  N = 4;  p = 0.5
    
        # First, check that random_walks1D for 1 walk reproduces
        # the walk in random_walk1D
        num_walks = 1
        np.random.seed(10)
        computed = random_walks1D(
            x0, N, p, num_walks, random=np.random)
        np.random.seed(10)
        expected = random_walk1D(
            x0, N, p, random=np.random)
        assert (computed[0] == expected).all()
    
        # Same for vectorized versions
        np.random.seed(10)
        computed = random_walks1D_vec1(x0, N, p, num_walks)
        np.random.seed(10)
        expected = random_walk1D_vec(x0, N, p)
        assert (computed[0] == expected).all()
        np.random.seed(10)
        computed = random_walks1D_vec2(x0, N, p, num_walks)
        np.random.seed(10)
        expected = random_walk1D_vec(x0, N, p)
        assert (computed[0] == expected).all()
    
        # Second, check multiple walks: scalar == vectorized
        num_walks = 3
        num_times = N
        np.random.seed(10)
        serial_computed = random_walks1D(
            x0, N, p, num_walks, num_times, random=np.random)
        np.random.seed(10)
        vectorized1_computed = random_walks1D_vec1(
            x0, N, p, num_walks, num_times)
        np.random.seed(10)
        vectorized2_computed = random_walks1D_vec2(
            x0, N, p, num_walks, num_times)
        # positions: [0, 1, 0, 1, 2]
        # Can test without tolerance since everything is +/- 1
        return_values = ['pos', 'pos2', 'pos_hist', 'pos_hist_times']
        for s, v, r in zip(serial_computed,
                           vectorized1_computed,
                           return_values):
            msg = '%s: %s (serial) vs %s (vectorized)' % (r, s, v)
            assert (s == v).all(), msg
        for s, v, r in zip(serial_computed,
                           vectorized2_computed,
                           return_values):
            msg = '%s: %s (serial) vs %s (vectorized)' % (r, s, v)
            assert (s == v).all(), msg

Such test functions are indispensable for further development of the code
as we can at any time test whether the basic computations remain correct or not.
This is particularly important in stochastic simulations since without
test functions and fixed seeds, we always experience variations from run to
run, and it can be very difficult to spot bugs through averaged statistical
quantities.

Demonstration of multiple walks
-------------------------------

Assuming now that the code works, we can just scale up the number of
steps in each walk and the number of walks. The latter influences the
accuracy of the statistical estimates.  Figure
:ref:`diffu:randomwalk:1D:fig:demo1:EX` shows the impact of the number
of walks on the expectation, which should approach zero.  Figure
:ref:`diffu:randomwalk:1D:fig:demo1:VarX` displays the corresponding
estimate of the variance of the position, which should grow linearly
with the number of steps. It does, seemingly very accurately, but
notice that the scale on the :math:`y` axis is so much larger than for the
expectation, so irregularities due to the stochastic nature of the
process become so much less visible in the variance plots.  The
probability of finding a particle at a certain position at time (or
step) 800 is shown in Figure
:ref:`diffu:randomwalk:1D:fig:demo1:HistX`. The dashed red line is the
theoretical distribution :ref:`(465) <Eq:diffu:randomwalk:1D:pde:dimless:sol>`
arising from solving the diffusion equation
:ref:`(463) <Eq:diffu:randomwalk:1D:pde:dimless>` instead. As always, we realize
that one needs significantly more statistical samples to estimate a
histogram accurately than the expectation or variance.

.. _diffu:randomwalk:1D:fig:demo1:EX:

.. figure:: rw1D_EX_100_10000_100000_1000000.png
   :width: 800

   *Estimated expected value for 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right)*

.. _diffu:randomwalk:1D:fig:demo1:VarX:

.. figure:: rw1D_VarX_100_10000_100000_1000000.png
   :width: 800

   *Estimated variance over 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right)*

.. _diffu:randomwalk:1D:fig:demo1:HistX:

.. figure:: rw1D_HistX_100_10000_100000_1000000.png
   :width: 800

   *Estimated probability distribution at step 800, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right)*

.. _diffu:randomwalk:1D:avplotter:

Ascii visualization of 1D random walk
-------------------------------------

.. index:: scitools.avplotter

.. index:: Plotter class (SciTools)

.. index:: interrupt a program by Ctrl+c

If we want to study (very) long time series of random walks, it can be
convenient to plot the position in a terminal window with the time axis
pointing downwards. The module ``avplotter`` in SciTools has a class ``Plotter``
for plotting functions in the terminal window with the aid of ascii symbols
only. Below is the code required to visualize a simple random walk,
starting at the origin, and considered over
when the point :math:`x=-1` is reached. We use a spacing :math:`\Delta x = 0.05` (so
:math:`x=-1` corresponds to :math:`i=-20`).

.. code-block:: python

    def run_random_walk():
        from scitools.avplotter import Plotter
        import time, numpy as np
        p = Plotter(-1, 1, width=75)   # Horizontal axis: 75 chars wide
        dx = 0.05
        np.random.seed(10)
    
        x = 0
        while True:
            random_step = 1 if np.random.random() > 0.5 else -1
            x = x + dx*random_step
            if x < -1:
                break                  # Destination reached!
            print p.plot(0, x)
    
            # Allow Ctrl+c to abort the simulation
            try:
                time.sleep(0.1)  # Wait for interrupt
            except KeyboardInterrupt:
                print 'Interrupted by Ctrl+c'
                break

Observe that we implement an infinite loop, but allow a smooth interrupt
of the program by ``Ctrl+c`` through Python's ``KeyboardInterrupt``
exception. This is a useful recipe that can be used in many occasions!

The output looks typically like

.. code-block:: text

                                *         |
                                  *       |
                                *         |
                                  *       |
                                *         |
                                  *       |
                                    *     |
                                  *       |
                                *         |
                              *           |
                            *             |
                          *               |
                            *             |
                          *               |
                         *                |
                       *                  |
                     *                    |
                   *                      |

Positions beyond the limits of the :math:`x` axis appear with a value.
`A long file <http://bit.ly/1UbULeH>`__
contains the complete ascii plot corresponding to the
function ``run_random_walk`` above.

.. "https://github.com/hplgit/fdm-book/blob/master/doc/.src/chapters/diffu/fig-diffu/rw_ascii.txt"

.. _diffu:randomwalk:1D:ode:

Random walk as a stochastic equation
------------------------------------

.. index:: stochastic difference equation

The (dimensionless) position in a random walk, :math:`\bar X_k`, can be expressed as
a stochastic difference equation:

.. _Eq:diffu:randomwalk:1D:ode:x:

.. math::

    \tag{466}
    \bar X_k = \bar X_{k-1} + s, \quad x_0=0,
        
        
where :math:`s` is a `Bernoulli variable <https://en.wikipedia.org/wiki/Bernoulli_distribution>`__,
taking on the two values :math:`s=-1` and :math:`s=1`
with equal probability:

.. math::
         \hbox{P}(s=1)=\frac{1}{2},\quad \hbox{P}(s=-1)=\frac{1}{2}{\thinspace .}

The :math:`s` variable in a step is independent of the :math:`s` variable in other steps.

The difference equation expresses essentially the sum of independent
Bernoulli variables.  Because of the central limit theorem, :math:`X_k`,
will then be normally distributed with expectation :math:`k{\hbox{E}\lbrack s \rbrack}` and
:math:`k{\hbox{Var}\lbrack s \rbrack}`.  The expectation and variance of a Bernoulli variable with
values :math:`r=0` and :math:`r=1` are :math:`p` and :math:`p(1-p)`, respectively.
The variable :math:`s=2r-1` then has expectation
:math:`2{\hbox{E}\lbrack r \rbrack}-1=2p-1=0` and variance :math:`2^2{\hbox{Var}\lbrack r \rbrack}=4p(1-p)=1`.  The position
:math:`X_k` is normally distributed with zero expectation and variance :math:`k`,
as we found in the section :ref:`diffu:randomwalk:1D:EVar`.

The central limit theorem tells that as long as :math:`k` is not small,
the distribution of :math:`X_k` remains the same if
we replace the Bernoulli variable :math:`s` by any other stochastic variable with
the same expectation and variance. In particular, we may let :math:`s` be a
standardized Gaussian variable (zero mean, unit variance).

.. Let us introduce

.. !bt

.. x_k = \Delta x\bar x_k,\]

.. !et

.. such that we take steps of length :math:`\Delta x`.

Dividing :ref:`(466) <Eq:diffu:randomwalk:1D:ode:x>` by :math:`\Delta t` gives

.. math::
         \frac{\bar X_k - \bar X_{k-1}}{\Delta t} = \frac{1}{\Delta t} s{\thinspace .}

In the limit :math:`\Delta t\rightarrow 0`, :math:`s/\Delta t` approaches a white noise
stochastic process.
With :math:`\bar X(t)` as the continuous process in the limit
:math:`\Delta t\rightarrow 0` (:math:`X_k\rightarrow X(t_k)`),
we formally get the stochastic differential equation

.. index:: stochastic ODE

.. index:: Fokker-Planck equation

.. index:: Wiener process

.. _Eq:_auto199:

.. math::

    \tag{467}
    d\bar X = dW,
        
        
where :math:`W(t)` is a `Wiener process <https://en.wikipedia.org/wiki/Wiener_process>`__. Then :math:`X` is also a
Wiener process. It follows from the stochastic ODE :math:`dX=dW` that the
probability distribution of :math:`X` is given by the `Fokker-Planck
equation <https://en.wikipedia.org/wiki/Fokker-Planck_equation>`__
:ref:`(463) <Eq:diffu:randomwalk:1D:pde:dimless>`.  In other words, the key
results for random walk we found earlier can alternatively be
derived via a stochastic ordinary differential equation and its
related Fokker-Planck equation.

Random walk in 2D
-----------------

The most obvious generalization of 1D random walk to two spatial
dimensions is to allow movements to the north, east, south, and west,
with equal probability :math:`\frac{1}{4}`.

.. code-block:: python

    def random_walk2D(x0, N, p, random=random):
        """2D random walk with 1 particle and N moves: N, E, W, S."""
        # Store position in step k in position[k]
        d = len(x0)
        position = np.zeros((N+1, d))
        position[0,:] = x0
        current_pos = np.array(x0, dtype=float)
        for k in range(N):
            r = random.uniform(0, 1)
            if r <= 0.25:
                current_pos += np.array([0, 1])   # Move north
            elif 0.25 < r <= 0.5:
                current_pos += np.array([1, 0])   # Move east
            elif 0.5 < r <= 0.75:
                current_pos += np.array([0, -1])  # Move south
            else:
                current_pos += np.array([-1, 0])  # Move west
            position[k+1,:] = current_pos
        return position

The left plot in Figure :ref:`diffu:randomwalk:2D:fig:rect_vs_diag` provides
an example on 200 steps with this kind of walk. We may refer to this walk
as a walk on a *rectangular mesh* as we move from any spatial
mesh point :math:`(i,j)` to one of its four neighbors in the rectangular directions:
:math:`(i+1,j)`, :math:`(i-1,j)`, :math:`(i,j+1)`, or :math:`(i,j-1)`.

.. _diffu:randomwalk:2D:fig:rect_vs_diag:

.. figure:: rw2D_sample200.png
   :width: 800

   *Random walks in 2D with 200 steps: rectangular mesh (left) and diagonal mesh (right)*

Random walk in any number of space dimensions
---------------------------------------------

From a programming point of view, especially when implementing a random
walk in any number of dimensions, it is more natural to consider a walk
in the diagonal directions NW, NE, SW, and SE. On a two-dimensional spatial mesh
it means that we go from :math:`(i,j)` to either :math:`(i+1,j+1)`, :math:`(i-1,j+1)`,
:math:`(i+1,j-1)`, or :math:`(i-1,j-1)`. We can with such a *diagonal mesh*
(see right plot in Figure :ref:`diffu:randomwalk:2D:fig:rect_vs_diag`)
draw a Bernoulli variable for the step in each spatial direction and
trivially write code that works in any number of spatial directions:

.. code-block:: python

    def random_walkdD(x0, N, p, random=random):
        """Any-D (diagonal) random walk with 1 particle and N moves."""
        # Store position in step k in position[k]
        d = len(x0)
        position = np.zeros((N+1, d))
        position[0,:] = x0
        current_pos = np.array(x0, dtype=float)
        for k in range(N):
            for i in range(d):
                r = random.uniform(0, 1)
                if r <= p:
                    current_pos[i] -= 1
                else:
                    current_pos[i] += 1
            position[k+1,:] = current_pos
        return position

A vectorized version is desired. We follow the ideas from the section :ref:`diffu:randomwalk:1D:code1`, but each step is now a vector in :math:`d`
spatial dimensions. We therefore need to draw :math:`Nd` random numbers in ``r``,
compute steps in the various directions through ``np.where(r <=p, -1, 1)``
(each step being :math:`-1` or :math:`1`),
and then we can reshape this array to an :math:`N\times d` array of step
*vectors*. Doing an ``np.cumsum`` summation along axis 0 will add
the vectors, as this demo shows:

.. code-block:: python

    >>> a = np.arange(6).reshape(3,2)
    >>> a
    array([[0, 1],
           [2, 3],
           [4, 5]])
    >>> np.cumsum(a, axis=0)
    array([[ 0,  1],
           [ 2,  4],
           [ 6,  9]])

With such summation of step vectors, we get all the positions to be
filled in the ``position`` array:

.. code-block:: python

    def random_walkdD_vec(x0, N, p):
        """Vectorized version of random_walkdD."""
        d = len(x0)
        # Store position in step k in position[k]
        position = np.zeros((N+1,d))
        position[0] = np.array(x0, dtype=float)
        r = np.random.uniform(0, 1, size=N*d)
        steps = np.where(r <= p, -1, 1).reshape(N,d)
        position[1:,:] = x0 + np.cumsum(steps, axis=0)
        return position

.. _diffu:randomwalk:2D:fig:samples:

.. figure:: rw2D_samples_5000.png
   :width: 800

   *Four random walks with 5000 steps in 2D*

Multiple random walks in any number of space dimensions
-------------------------------------------------------

As we did in 1D, we extend one single walk to a number of walks (``num_walks``
in the code).

Scalar code          (2)
~~~~~~~~~~~~~~~~~~~~~~~~

As always, we start with implementing the scalar case:

.. code-block:: python

    def random_walksdD(x0, N, p, num_walks=1, num_times=1,
                       random=random):
        """Simulate num_walks random walks from x0 with N steps."""
        d = len(x0)
        position  = np.zeros((N+1, d))   # Accumulated positions
        position2 = np.zeros((N+1, d))   # Accumulated positions**2
        # Histogram at num_times selected time points
        pos_hist = np.zeros((num_walks, num_times, d))
        pos_hist_times = [(N//num_times)*i for i in range(num_times)]
    
        for n in range(num_walks):
            num_times_counter = 0
            current_pos = np.array(x0, dtype=float)
            for k in range(N):
                if k in pos_hist_times:
                    pos_hist[n,num_times_counter,:] = current_pos
                    num_times_counter += 1
                # current_pos corresponds to step k+1
                for i in range(d):
                    r = random.uniform(0, 1)
                    if r <= p:
                        current_pos[i] -= 1
                    else:
                        current_pos[i] += 1
                position [k+1,:] += current_pos
                position2[k+1,:] += current_pos**2
        return position, position2, pos_hist, np.array(pos_hist_times)

Vectorized code          (3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Significant speed-ups can be obtained by vectorization. We get rid of
the loops in the previous function and arrive at the following vectorized
code.

.. code-block:: python

    def random_walksdD_vec(x0, N, p, num_walks=1, num_times=1):
        """Vectorized version of random_walks1D; no loops."""
        d = len(x0)
        position  = np.zeros((N+1, d))  # Accumulated positions
        position2 = np.zeros((N+1, d))  # Accumulated positions**2
        walks = np.zeros((num_walks, N+1, d))  # Positions of each walk
        walks[:,0,:] = x0
        # Histogram at num_times selected time points
        pos_hist = np.zeros((num_walks, num_times, d))
        pos_hist_times = [(N//num_times)*i for i in range(num_times)]
    
        r = np.random.uniform(0, 1, size=N*num_walks*d)
        steps = np.where(r <= p, -1, 1).reshape(num_walks, N, d)
        walks[:,1:,:] = x0 + np.cumsum(steps, axis=1)
        position  = np.sum(walks,    axis=0)
        position2 = np.sum(walks**2, axis=0)
        pos_hist[:,:,:] = walks[:,pos_hist_times,:]
        return position, position2, pos_hist, np.array(pos_hist_times)

.. 2D N=4M

.. scalar/vec (1 walk): 25

.. 2D N=5000

.. num_walks=10K

.. scalar/vec=50

.. 3D: 40

.. 1D: 100

.. scalar is about the same for 1D, 2D, 3D, but vec is 0.3, 0.7, 1.0

.. _diffu:app:

Applications
============

.. _diffu:app:substance:

Diffusion of a substance
------------------------

The first process to be considered is a substance that gets
transported through a fluid at rest by pure diffusion. We consider an
arbitrary volume :math:`V` of this fluid, containing the substance with
concentration function :math:`c(\boldsymbol{x},t)`. Physically, we can think of a very
small volume with centroid :math:`\boldsymbol{x}` at time :math:`t` and assign the ratio of
the volume of the substance and the total volume to :math:`c(\boldsymbol{x}, t)`.  This
means that the mass of the substance in a small volume :math:`\Delta V` is
approximately :math:`\varrho c\Delta V`, where :math:`\varrho` is the density of
the substance. Consequently, the total mass of the substance inside
the volume :math:`V` is the sum of all :math:`\varrho c\Delta V`, which becomes
the volume integral :math:`\int_V\varrho cdV`.

Let us reason how the mass of the substance changes and thereby derive
a PDE governing the concentration :math:`c`.  Suppose the substance flows
out of :math:`V` with a flux :math:`\boldsymbol{q}`.  If :math:`\Delta S` is a small part of the
boundary :math:`\partial V` of :math:`V`, the volume of the substance flowing out
through :math:`dS` in a small time interval :math:`\Delta t` is :math:`\varrho
\boldsymbol{q}\cdot\boldsymbol{n} \Delta t \Delta S`, where :math:`\boldsymbol{n}` is an outward
unit normal to the boundary :math:`\partial V`, see Figure
:ref:`diffu:app:substance:fig1`.  We realize that only the normal
component of :math:`\boldsymbol{q}` is able to transport mass in and out of :math:`V`.  The
total outflow of the mass of the substance in a small time interval
:math:`\Delta t` becomes the surface integral

.. math::
         \int\limits_{\partial V} \varrho\boldsymbol{q}\cdot\boldsymbol{n} \Delta t\,  dS{\thinspace .}

Assuming conservation of mass, this outflow of mass must be balanced by
a loss of mass inside the volume. The increase of mass inside the volume,
during a small time interval :math:`\Delta t`, is

.. math::
         \int\limits_V \varrho (c(\boldsymbol{x},t+\Delta t) - c(\boldsymbol{x},t)) dV,

assuming :math:`\varrho` is constant, which is reasonable. The outflow of mass
balances the loss of mass in :math:`V`, which is the increase with a minus sign.
Setting the two contributions equal to each other ensures balance of
mass inside :math:`V`. Dividing by :math:`\Delta t` gives

.. math::
         \int\limits_V \varrho \frac{c(\boldsymbol{x},t+\Delta t) - c(\boldsymbol{x},t)}{\Delta t} dV =
        - \int\limits_{\partial V} \varrho\boldsymbol{q}\cdot\boldsymbol{n}\, dS{\thinspace .}

Note the minus sign on the right-hand side: the left-hand side expresses
loss of mass, while the integral on the right-hand side is the gain of
mass.

Now, letting :math:`\Delta t\rightarrow 0`, we have

.. math::
         \frac{c(\boldsymbol{x},t+\Delta t) - c(\boldsymbol{x},t)}{\Delta t} \rightarrow
        \frac{\partial c}{\partial t},

so

.. _Eq:diffu:app:substance:integral:

.. math::

    \tag{468}
    \int\limits_V \varrho \frac{\partial c}{\partial t}dV +
        \int\limits_{\partial V} \varrho\boldsymbol{q}\cdot\boldsymbol{n}\, dS = 0{\thinspace .}
        
        
To arrive at a PDE,
we express the surface integral as a volume integral using Gauss'
divergence theorem:

.. math::
        
        \int\limits_V (\varrho \frac{\partial c}{\partial t} +
        \nabla\cdot(\varrho \boldsymbol{q}))dV = 0{\thinspace .}
        

Since :math:`\varrho` is constant, we can divide by this quantity.  If the
integral is to vanish for an arbitrary volume :math:`V`, the integrand must
vanish too, and we get the mass conservation PDE for the substance:

.. _Eq:diffu:app:substance:massconv:

.. math::

    \tag{469}
    \frac{\partial c}{\partial t} +
        \nabla\cdot \boldsymbol{q} = 0{\thinspace .}
        
        
.. _diffu:app:substance:fig1:

.. figure:: continuum.png
   :width: 200

   *An arbitrary volume of a fluid*

A fundamental problem is that this is a scalar PDE for four unknowns:
:math:`c` and the three components of :math:`\boldsymbol{q}`. We therefore need additional
equations. Here, Fick's law comes at rescue: it models how the flux
:math:`\boldsymbol{q}` of the substance is related to the concentration :math:`c`. Diffusion
is recognized by mass flowing from regions with high concentration to
regions of low concentration. This principle suggests that :math:`\boldsymbol{q}` is
proportional to the negative gradient of :math:`c`:

.. _Eq:diffu:app:substance:Fick:

.. math::

    \tag{470}
    \boldsymbol{q} = -{\alpha}\nabla c,
        
        
where :math:`{\alpha}` is an empirically determined constant.
The relation :ref:`(470) <Eq:diffu:app:substance:Fick>` is known as Fick's law.
Inserting :ref:`(470) <Eq:diffu:app:substance:Fick>` in
:ref:`(469) <Eq:diffu:app:substance:massconv>` gives a scalar PDE for the
concentration :math:`c`:

.. _Eq:diffu:app:substance:PDE:

.. math::

    \tag{471}
    \frac{\partial c}{\partial t} = {\alpha}\nabla^2 c{\thinspace .}
        
        
.. _diffu:app:heat:

Heat conduction
---------------

Heat conduction is a well-known diffusion process. The governing PDE
is in this case based on the first law of thermodynamics: the increase
in energy of a system is equal to the work done on the system, plus
the supplied heat. Here, we shall consider media at rest and neglect work
done on the system. The principle then reduces to a balance between
increase in internal energy and supplied heat flow by conduction.

Let :math:`e(x,t)` be the *internal energy* per unit mass. The increase of
the internal energy in a small volume :math:`\Delta V` in a small time
interval :math:`\Delta t` is then

.. math::
         \varrho (e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t))\Delta V,

where :math:`\varrho` is the density of the material subject to heat conduction.
In an arbitrary volume :math:`V`, as depicted in
Figure :ref:`diffu:app:substance:fig1`, the corresponding increase in internal
energy becomes the volume integral

.. math::
         \int\limits_V \varrho (e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t))dV{\thinspace .}

This increase in internal energy is balanced by heat supplied by
conduction. Let :math:`\boldsymbol{q}` be the heat flow per time unit. Through the
surface :math:`\partial V` of :math:`V` the following amount of heat flows out
of :math:`V` during a time interval :math:`\Delta t`:

.. math::
        
        \int\limits_{\partial V} \boldsymbol{q}\cdot\boldsymbol{n}\Delta t\, dS{\thinspace .}

The simplified version of the first law of thermodynamics then states that

.. math::
        
        \int\limits_V \varrho (e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t))dV = -
        \int\limits_{\partial V} \boldsymbol{q}\cdot\boldsymbol{n}\Delta t\, dS{\thinspace .}
        

The minus sign on the right-hand side ensures that the integral there
models net *inflow* of heat (since :math:`\boldsymbol{n}` is an outward unit normal,
:math:`\boldsymbol{q}\cdot\boldsymbol{n}` models *outflow*).
Dividing by :math:`\Delta t` and notifying that

.. math::
         \lim_{\Delta t\rightarrow 0}
        \frac{e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t)}{\Delta t} =
        \frac{\partial e}{\partial t},

we get (in the limit :math:`\Delta t\rightarrow 0`)

.. math::
        
        \int\limits_V \varrho \frac{\partial e}{\partial t} dV +
        \int\limits_{\partial V} \boldsymbol{q}\cdot\boldsymbol{n}\Delta t\, dS = 0{\thinspace .}
        

This is the integral equation for heat conduction, but we aim at a PDE.
The next step is therefore to transform the surface integral to a
volume integral via Gauss' divergence theorem. The result is

.. math::
        
        \int\limits_V\left( \varrho \frac{\partial e}{\partial t}
        + \nabla\cdot\boldsymbol{q}\right) dV = 0{\thinspace .}
        

If this equality is to hold for all volumes :math:`V`, the integrand must vanish,
and we have the PDE

.. _Eq:diffu:app:heat:PDE1:

.. math::

    \tag{472}
    \varrho \frac{\partial e}{\partial t} = -\nabla\cdot\boldsymbol{q}{\thinspace .}
        
        
Sometimes the supplied heat can come from the medium itself. This is
the case, for instance, when radioactive rock generates heat.  Let us
add this effect. If :math:`f(\boldsymbol{x},t)` is the supplied heat per unit volume per
unit time, the heat supplied in a small volume is :math:`f\Delta t\Delta V`,
and inside an arbitrary volume :math:`V` the supplied generated heat becomes

.. math::
         \int\limits_V f\Delta t dV{\thinspace .}

Adding this to the integral statement of the (simplified) first law of
thermodynamics, and continuing the derivation, leads to the PDE

.. _Eq:diffu:app:heat:PDE2:

.. math::

    \tag{473}
    \varrho \frac{\partial e}{\partial t} = -\nabla\cdot\boldsymbol{q} + f{\thinspace .}
        
        
There are four unknown scalar fields: :math:`e` and :math:`\boldsymbol{q}`. Moreover, the
temperature :math:`T`, which is our primary quantity to compute, does not
enter the model yet. We need an additional equation, called the
*equation of state*, relating :math:`e`, :math:`V=1/\varrho=`, and :math:`T`:
:math:`e=e(V,T)`. By the chain rule we have

.. math::
         \frac{\partial e}{\partial t} =
        \left.\frac{\partial e}{\partial T}\right\vert_{V}
        \frac{\partial T}{\partial t} +
        \left.\frac{\partial e}{\partial V}\right\vert_{T}
        \frac{\partial V}{\partial t}{\thinspace .}

The first coefficient :math:`\partial e/\partial T` is called *specific heat
capacity at constant volume*, denoted by :math:`c_v`:

.. math::
         c_v = \left.\frac{\partial e}{\partial T}\right\vert_{V}{\thinspace .}

The specific heat capacity will in general vary with :math:`T`, but taking it as a
constant is a good approximation in many applications.

The term :math:`\partial e/\partial V` models effects due to compressibility
and volume expansion. These effects are often small and can be
neglected.  We shall do so here. Using :math:`\partial e/\partial t =
c_v\partial T/\partial t` in the PDE gives

.. math::
         \varrho c_v\frac{\partial T}{\partial t} = -\nabla\cdot\boldsymbol{q} + f{\thinspace .}

We still have four unknown scalar fields (:math:`T` and :math:`\boldsymbol{q}`). To close the
system, we need a relation between the heat flux :math:`\boldsymbol{q}` and the temperature :math:`T`
called Fourier's law:

.. math::
         \boldsymbol{q} = -k\nabla T,

which simply states that heat flows from hot to cold areas, along the
path of greatest variation. In a solid medium, :math:`k` depends on the
material of the medium, and in multi-material media one must regard
:math:`k` as spatially dependent. In a fluid, it is common to assume that
:math:`k` is constant. The value of :math:`k` reflects how easy heat is
conducted through the medium, and :math:`k` is named the *coefficient of
heat conduction*.

We have now one scalar PDE for the unknown temperature field :math:`T(\boldsymbol{x},t)`:

.. _Eq:diffu:app:heat:PDE:

.. math::

    \tag{474}
    \varrho  c_v\frac{\partial T}{\partial t} = \nabla\cdot(k\nabla T) + f{\thinspace .}
        
        
.. _diffu:app:porous:

Porous media flow
-----------------

The requirement of mass balance for
flow of a single, incompressible fluid through a deformable (elastic) porous
medium leads to the equation

.. math::
        
        S\frac{\partial p}{\partial t} + \nabla\cdot(\boldsymbol{q} - \alpha\frac{\partial\boldsymbol{u}}{\partial t}) = 0,
        

where :math:`p` is the fluid pressure, :math:`\boldsymbol{q}` is the fluid velocity, :math:`\boldsymbol{u}`
is the displacement (deformation) of the medium, :math:`S` is the storage
coefficient of the medium (related to the compressibility of the
fluid and the material in the medium), and :math:`\alpha` is another coefficient.
In many circumstances, the last term with :math:`\boldsymbol{u}` can be neglected,
an assumption that decouples the equation above from a model for
the deformation of the medium. The famous Darcy's law relates
:math:`\boldsymbol{q}` to :math:`p`:

.. math::
         \boldsymbol{q} = -\frac{K}{\mu}(\nabla p - \varrho\boldsymbol{g}),

where :math:`K` is the permeability of the medium, :math:`\mu` is the dynamic
viscosity of the fluid, :math:`\varrho` is the density of the fluid, and :math:`\boldsymbol{g}`
is the acceleration of gravity, here taken as :math:`\boldsymbol{g} = -g\boldsymbol{k}`.
Combining the two equations results in the diffusion model

.. _Eq:diffu:app:porous:PDE:

.. math::

    \tag{475}
    S\frac{\partial p}{\partial t} = \mu^{-1}\nabla(K\nabla p) + \frac{\varrho g}{\mu}\frac{\partial K}{\partial z}{\thinspace .}
        
        
Boundary conditions consist of specifying :math:`p` or :math:`\boldsymbol{q}\cdot\boldsymbol{n}` at
(normal velocity) each point of the boundary.

Potential fluid flow
--------------------

Let :math:`\boldsymbol{v}` be the velocity of a fluid. The condition :math:`\nabla\times\boldsymbol{v} =0`
is relevant for many flows, especially in geophysics when viscous effects
are negligible. From vector calculus it is known
that :math:`\nabla\times\boldsymbol{v} =0` implies that :math:`v` can be derived
from a scalar potential field :math:`\phi`: :math:`\boldsymbol{v} = \nabla\phi`. If the
fluid is incompressible, :math:`\nabla\cdot\boldsymbol{v} = 0`, it follows that
:math:`\nabla\cdot\nabla\phi = 0`, or

.. _Eq:_auto200:

.. math::

    \tag{476}
    \nabla^2\phi = 0{\thinspace .}
        
        
This Laplace equation is sufficient for determining :math:`\phi` and thereby
describe the fluid motion. This type of flow is known as
`potential flow <https://en.wikipedia.org/wiki/Potential_flow>`__.
One very important application where potential flow is a good model
is water waves.
As boundary condition we must prescribe
:math:`\boldsymbol{v}\cdot\boldsymbol{n} =\partial\phi/\partial n`. This gives rise to
what is known as a pure Neumann problem and will cause numerical
difficulties because :math:`\phi` and :math:`\phi` plus any constant are two
solutions of the problem. The simplest remedy is to fix the value of
:math:`\phi` at a point.

Streamlines for 2D fluid flow
-----------------------------

The streamlines in a two-dimensional
stationary fluid flow are lines tangential to the flow.
The `stream function <https://en.wikipedia.org/wiki/Stream_function>`__
:math:`\psi` is often introduced in two-dimensional flow
such that its contour
lines, :math:`\psi = \hbox{const}`, gives the streamlines. The relation
between :math:`\psi` and the velocity field :math:`\boldsymbol{v}=(u,v)` is

.. math::
         u = \frac{\partial\psi}{\partial y},\quad v = -
        \frac{\partial\psi}{\partial x}{\thinspace .}

It follows that :math:`\nabla\boldsymbol{v} = \psi_{yx} - \psi_{xy}=0`, so the stream function
can only be used for incompressible flows.
Since

.. math::
        \nabla\times\boldsymbol{v} = \left(\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right)\boldsymbol{k} \equiv \omega\boldsymbol{k},

we can derive the relation

.. _Eq:_auto201:

.. math::

    \tag{477}
    \nabla^2\psi = -\omega,
        
        
which is a governing equation for the stream function :math:`\psi(x,y)`
if the vorticity :math:`\omega` is known.

The potential of an electric field
----------------------------------

Under the assumption of time independence, Maxwell's equations
for the electric field :math:`\boldsymbol{E}` become

.. math::
        \begin{align*}
        \nabla\cdot\boldsymbol{E} &= \frac{\rho}{\epsilon_0},\\ 
        \nabla\times\boldsymbol{E} &= 0,
        \end{align*}

where :math:`\rho` is the electric charge density and :math:`\epsilon_0` is
the electric permittivity of free space (i.e., vacuum).
Since :math:`\nabla\times\boldsymbol{E}=0`, :math:`\boldsymbol{E}` can be derived from a potential
:math:`\varphi`, :math:`\boldsymbol{E} = -\nabla\varphi`. The electric field potential is
therefore governed by the Poisson equation

.. _Eq:_auto202:

.. math::

    \tag{478}
    \nabla^2\varphi = -\frac{\rho}{\epsilon_0}{\thinspace .}
        
        
If the medium is heterogeneous, :math:`\rho` will depend on the spatial location :math:`\boldsymbol{r}`.
Also, :math:`\epsilon_0` must be exchanged with an electric permittivity function
:math:`\epsilon(\boldsymbol{r})`.

Each point of the boundary must be accompanied by, either a Dirichlet condition
:math:`\varphi(\boldsymbol{r}) = \varphi_D(\boldsymbol{r})`, or a Neumann condition
:math:`\frac{\partial\varphi(\boldsymbol{r})}{\partial n} = \varphi_N(\boldsymbol{r})`.

.. _diffu:app:Couette:

Development of flow between two flat plates
-------------------------------------------

Diffusion equations may also arise as simplified versions of other
mathematical models, especially in fluid flow. Consider a fluid
flowing between two flat, parallel plates. The velocity is
uni-directional, say along the :math:`z` axis, and depends only on the
distance :math:`x` from the plates; :math:`\boldsymbol{u} = u(x,t)\boldsymbol{k}`.  The flow is governed
by the Navier-Stokes equations,

.. math::
        \begin{align*}
        \varrho\frac{\partial\boldsymbol{u}}{\partial t} + \varrho\boldsymbol{u}\cdot\nabla\boldsymbol{u} &= -\nabla p
        + \mu\nabla^2\boldsymbol{u} + \varrho\boldsymbol{f},\\ 
        \nabla\cdot\boldsymbol{u} &= 0,
        \end{align*}

where :math:`p` is the pressure field, unknown along with the velocity :math:`\boldsymbol{u}`,
:math:`\varrho` is the fluid density, :math:`\mu` the dynamic viscosity, and :math:`\boldsymbol{f}`
is some external body force. The geometric restrictions of flow between
two flat plates puts restrictions on the velocity, :math:`\boldsymbol{u} = u(x,t)\boldsymbol{i}`,
and the :math:`z` component of the Navier-Stokes equations collapses to
a diffusion equation:

.. math::
        
        \varrho\frac{\partial u}{\partial t} = - \frac{\partial p}{\partial z} +
        \mu\frac{\partial^2 u}{\partial z^2}
        + \varrho f_z,
        

if :math:`f_z` is the component of :math:`\boldsymbol{f}` in the :math:`z` direction.

The boundary conditions are derived from the fact that the fluid sticks to
the plates, which means :math:`\boldsymbol{u}=0` at the plates. Say the location of the
plates are :math:`z=0` and :math:`z=L`. We then have

.. math::
         u(0,t)=u(L,t)=0{\thinspace .}

One can easily show that :math:`\partial p/\partial z` must be a constant or
just a function of time :math:`t`. We set :math:`\partial p/\partial z = -\beta(t)`.
The body force could be a component of gravity, if desired, set as
:math:`f_z = \gamma g`. Switching from :math:`z` to :math:`x` as independent variable gives
a very standard one-dimensional diffusion equation:

.. math::
        
        \varrho\frac{\partial u}{\partial t} =
        \mu\frac{\partial^2 u}{\partial z^2} + \beta(t) + \varrho\gamma g,\quad
        x\in [0,L],\ t\in (0,T]{\thinspace .}
        

The boundary conditions are

.. math::
         u(0,t)=u(L,t)=0,

while some initial condition

.. math::
         u(x,0) = I(x)

must also be prescribed.

The flow is driven by either the pressure gradient :math:`\beta` or gravity, or
a combination of both. One may also consider one moving plate that
drives the fluid. If the plate at :math:`x=L` moves with velocity :math:`U_L(t)`, we have
the adjusted boundary condition

.. math::
         u(L,t) = U_L(t){\thinspace .}

.. _diffu:app:pipeflow:

Flow in a straight tube
-----------------------

Now we consider viscous fluid flow in a straight tube with radius :math:`R`
and rigid walls.
The governing equations are the Navier-Stokes equations, but as
in the section :ref:`diffu:app:Couette`, it is natural to assume that the
velocity is directed along the tube, and that it is axi-symmetric.
These assumptions reduced the velocity field to :math:`\boldsymbol{u} = u(r,x,t)\boldsymbol{i}`, if the
:math:`x` axis is directed along the tube. From the equation of continuity,
:math:`\nabla\cdot\boldsymbol{u} = 0`, we see that :math:`u` must be independent of :math:`x`.
Inserting :math:`\boldsymbol{u} = u(r,t)\boldsymbol{i}` in the Navier-Stokes equations, expressed
in axi-symmetric cylindrical coordinates, results in

.. _Eq:diffu:app:pipeflow:pde:

.. math::

    \tag{479}
    \varrho\frac{\partial u}{\partial t} =
        \mu\frac{1}{r}\frac{\partial}{\partial r}\left(
        r\frac{\partial u}{\partial r}\right) + \beta(t) + \varrho\gamma g,\quad
        r\in [0,R],\ t\in (0,T]{\thinspace .}
        
        
Here, :math:`\beta(t) = -\partial p/\partial x` is the pressure gradient along
the tube. The associated boundary condition is :math:`u(R,t)=0`.

Tribology: thin film fluid flow
-------------------------------

Thin fluid films are extremely important inside machinery to reduce friction
between gliding surfaces. The mathematical model for the fluid motion takes
the form of a diffusion problem and is quickly derived here.
We consider two solid surfaces whose distance is described by a
gap function :math:`h(x,y)`. The space between these surfaces is filled with
a fluid with dynamic viscosity :math:`\mu`. The fluid may move partially
because of pressure gradients and partially because the surfaces
move. Let :math:`U\boldsymbol{i} + V\boldsymbol{j}` be the relative velocity of the two surfaces and
:math:`p` the pressure in the fluid. The mathematical model builds on two
principles: 1) conservation of mass, 2) assumption of locally quasi-static
flow between flat plates.

The conservation of mass equation reads :math:`\nabla\cdot\boldsymbol{u}`, where :math:`\boldsymbol{u}` is
the local fluid velocity. For thin films the detailed variation between
the surfaces is not of interest, so :math:`\nabla\cdot\boldsymbol{u} = 0` is integrated
(average) in the direction perpendicular to the surfaces. This gives
rise to the alternative mass conservation equation

.. math::
         \nabla\cdot\boldsymbol{q} = 0,\quad \boldsymbol{q} = \int\limits_0^{h(x,y)}\boldsymbol{u} dz,

where :math:`z` is the coordinate perpendicular to the surfaces, and :math:`\boldsymbol{q}`
is then the volume flux in the fluid gap.

Locally, we may assume that we have steady flow between two flat
surfaces, with a pressure gradient and where the lower surface is at rest
and the upper moves with velocity :math:`U\boldsymbol{i} + V\boldsymbol{j}`.
The corresponding mathematical problem is actually the limit problem
in the section :ref:`diffu:app:Couette` as :math:`t\rightarrow\infty`. The limit
problem can be solved analytically, and the local volume flux becomes

.. math::
         \boldsymbol{q}(x,y,z) = \int\limits_0^{h}\boldsymbol{u}(x,y,z) dz
        = -\frac{h^3}{12\mu}\nabla p + \frac{1}{2} Uh\boldsymbol{i} + \frac{1}{2} Vh\boldsymbol{j}{\thinspace .}

The idea is to use this expression locally
also when the surfaces are not flat,
but slowly varying, and if :math:`U`, :math:`V`, or :math:`p` varies in time, provided the
time variation is sufficiently slow. This is a common quasi-static
approximation much used in mathematical modeling.

Inserting the expression for :math:`\boldsymbol{q}` via :math:`p`, :math:`U`, and :math:`V` in the
equation :math:`\nabla\boldsymbol{q} = 0` gives a diffusion PDE for :math:`p`:

.. _Eq:_auto203:

.. math::

    \tag{480}
    \nabla\cdot\left(\frac{h^3}{12\mu}\nabla p\right) =
        \frac{1}{2}\frac{\partial}{\partial x}(hU)
        + \frac{1}{2}\frac{\partial}{\partial x}(hV){\thinspace .}
        
        
The boundary conditions must involve :math:`p` or :math:`\boldsymbol{q}` at the boundary.

Propagation of electrical signals in the brain
----------------------------------------------

.. `<http://icwww.epfl.ch/~gerstner/SPNM/node17.html>`_

.. `<http://www.uio.no/studier/emner/matnat/ifi/INF5610/h09/Lecture04.pdf>`_

.. `<http://people.mbi.ohio-state.edu/schwemmer.2/Publications/Schwemmer_Dissertation_Final.pdf>`_

.. The book by Peskin & ...

One can make a model of how electrical signals are propagated along the
neuronal fibers that receive synaptic inputs in the brain. The signal
propagation is one-dimensional and can, in the simplest cases, be
governed by the `Cable equation <http://en.wikipedia.org/wiki/Cable_equation>`__:

.. math::
        
        c_m \frac{\partial V}{\partial t} =
        \frac{1}{r_l}\frac{\partial^2 V}{\partial x^2} - \frac{1}{r_m}V
        label{}
        

where :math:`V(x,t)` is the voltage to be determined,
:math:`c_m` is capacitance of the neuronal fiber, while
:math:`r_l` and :math:`r_m` are measures of the resistance.
The boundary conditions are often taken as :math:`V=0` at a short circuit or open end,
:math:`\partial V/\partial x=0` at a sealed end, or :math:`\partial V/\partial x \propto
V` where there is an injection of current.

Exercises          (7)
======================

.. --- begin exercise ---

.. _diffu:exer:CN:Rannacher:

Exercise 3.6: Stabilizing the Crank-Nicolson method by Rannacher time stepping
------------------------------------------------------------------------------

It is well known that the Crank-Nicolson method may give rise to
non-physical oscillations in the solution of diffusion equations
if the initial data exhibit jumps (see the section :ref:`diffu:pde1:analysis:CN`).
Rannacher [Ref15]_ suggested a stabilizing technique
consisting of using the Backward Euler scheme for the first two
time steps with step length :math:`\frac{1}{2}\Delta t`. One can generalize
this idea to taking :math:`2m` time steps of size :math:`\frac{1}{2}\Delta t` with
the Backward Euler method and then continuing with the
Crank-Nicolson method, which is of second-order in time.
The idea is that the high frequencies of the initial solution are
quickly damped out, and the Backward Euler scheme treats these
high frequencies correctly. Thereafter, the high frequency content of
the solution is gone and the Crank-Nicolson method will do well.

Test this idea for :math:`m=1,2,3` on a diffusion problem with a
discontinuous initial condition. Measure the convergence rate using
the solution :ref:`(393) <Eq:diffu:analysis:pde1:step:erf:sol>` with the boundary
conditions
:ref:`(394) <Eq:diffu:analysis:pde1:p1:erf:uL>`-:ref:`(395) <Eq:diffu:analysis:pde1:p1:erf:uR>`
for :math:`t` values such that the conditions are in the vicinity of :math:`\pm 1`.
For example, :math:`t< 5a 1.6\cdot 10^{-2}` makes the solution diffusion from
a step to almost a straight line. The
program ``diffu_erf_sol.py`` shows how to compute the analytical
solution.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:energy:estimates:

Project 3.7: Energy estimates for diffusion problems
----------------------------------------------------

.. index:: energy estimates (diffusion)

This project concerns so-called *energy estimates* for diffusion problems
that can be used for qualitative analytical insight and for
verification of implementations.

**a)**
We start with a 1D homogeneous diffusion equation with zero Dirichlet
conditions:

.. _Eq:diffu:exer:estimates:p1:pdf:

.. math::

    \tag{481}
    u_t = \alpha u_xx,  x\in \Omega =(0,L),\ t\in (0,T],
         
        
.. _Eq:diffu:exer:estimates:p1:bc:

.. math::

    \tag{482}
    u(0,t) = u(L,t) = 0,  t\in (0,T],
        
        
.. _Eq:diffu:exer:estimates:p1:ic:

.. math::

    \tag{483}
    u(x,0) = I(x),  x\in [0,L]
        
        {\thinspace .}
        

The energy estimate for this problem reads

.. _Eq:diffu:exer:estimates:p1:result:

.. math::

    \tag{484}
    ||u||_{L^2} \leq ||I||_{L^2},
        
        
where the :math:`||\cdot ||_{L^2}` norm is defined by

.. _Eq:diffu:exer:estimates:L2:

.. math::

    \tag{485}
    ||g||_{L^2} = \sqrt{\int_0^L g^2dx}{\thinspace .}
        
        
The quantify  :math:`||u||_{L^2}` or :math:`\frac{1}{2} ||u||_{L^2}` is known
as the *energy* of the solution, although it is not the physical
energy of the system. A mathematical tradition has introduced the
notion *energy* in this context.

The estimate :ref:`(484) <Eq:diffu:exer:estimates:p1:result>` says that the
"size of $u$" never exceeds that of the initial condition,
or more precisely, it says that the area under the :math:`u` curve decreases
with time.

To show :ref:`(484) <Eq:diffu:exer:estimates:p1:result>`, multiply the PDE
by :math:`u` and integrate from :math:`0` to :math:`L`. Use that :math:`uu_t` can be
expressed as the time derivative of :math:`u^2` and that :math:`u_xxu` can
integrated by parts to form an integrand :math:`u_x^2`. Show that
the time derivative of :math:`||u||_{L^2}^2` must be less than or equal
to zero. Integrate this expression and derive
:ref:`(484) <Eq:diffu:exer:estimates:p1:result>`.

.. `<http://www.ann.jussieu.fr/~frey/cours/UdC/ma691/ma691_ch6.pdf>`_

**b)**
Now we address a slightly different problem,

.. _Eq:diffu:exer:estimates:p2:pdf:

.. math::

    \tag{486}
    u_t = \alpha u_xx + f(x,t),  x\in \Omega =(0,L),\ t\in (0,T],
         
        
.. _Eq:diffu:exer:estimates:p2:bc:

.. math::

    \tag{487}
    u(0,t) = u(L,t) = 0,  t\in (0,T],
        
        
.. _Eq:diffu:exer:estimates:p2:ic:

.. math::

    \tag{488}
    u(x,0) = 0,  x\in [0,L]
        
        {\thinspace .}
        

The associated energy estimate is

.. _Eq:diffu:exer:estimates:p2:result:

.. math::

    \tag{489}
    ||u||_{L^2} \leq ||f||_{L^2}{\thinspace .}
        
        
(This result is more difficult to derive.)

Now consider the compound problem with an initial condition :math:`I(x)` and
a right-hand side :math:`f(x,t)`:

.. _Eq:diffu:exer:estimates:p3:pdf:

.. math::

    \tag{490}
    u_t = \alpha u_xx + f(x,t),  x\in \Omega =(0,L),\ t\in (0,T],
         
        
.. _Eq:diffu:exer:estimates:p3:bc:

.. math::

    \tag{491}
    u(0,t) = u(L,t) = 0,  t\in (0,T],
        
        
.. _Eq:diffu:exer:estimates:p3:ic:

.. math::

    \tag{492}
    u(x,0) = I(x),  x\in [0,L]
        
        {\thinspace .}
        

Show that if :math:`w_1` fulfills
:ref:`(481) <Eq:diffu:exer:estimates:p1:pdf>`-:ref:`(483) <Eq:diffu:exer:estimates:p1:ic>`
and :math:`w_2` fulfills
:ref:`(486) <Eq:diffu:exer:estimates:p2:pdf>`-:ref:`(488) <Eq:diffu:exer:estimates:p2:ic>`,
then :math:`u=w_1 + w_2` is the solution of
:ref:`(490) <Eq:diffu:exer:estimates:p3:pdf>`-:ref:`(492) <Eq:diffu:exer:estimates:p3:ic>`.
Using the triangle inequality for norms,

.. math::
         ||a + b|| \leq ||a|| + ||b||,

show that the energy estimate for
:ref:`(490) <Eq:diffu:exer:estimates:p3:pdf>`-:ref:`(492) <Eq:diffu:exer:estimates:p3:ic>`
becomes

.. _Eq:diffu:exer:estimates:p3:result:

.. math::

    \tag{493}
    ||u||_{L^2} \leq ||I||_{L^2} + ||f||_{L^2}{\thinspace .}
        
        
**c)**
One application of :ref:`(493) <Eq:diffu:exer:estimates:p3:result>` is to prove uniqueness
of the solution.
Suppose :math:`u_1` and :math:`u_2` both fulfill
:ref:`(490) <Eq:diffu:exer:estimates:p3:pdf>`-:ref:`(492) <Eq:diffu:exer:estimates:p3:ic>`.
Show that :math:`u=u_1 - u_2` then fulfills
:ref:`(490) <Eq:diffu:exer:estimates:p3:pdf>`-:ref:`(492) <Eq:diffu:exer:estimates:p3:ic>`
with :math:`f=0` and :math:`I=0`. Use :ref:`(493) <Eq:diffu:exer:estimates:p3:result>`
to deduce that the energy must be zero for all times and therefore
that :math:`u_1=u_2`, which proves that the solution is unique.

**d)**
Generalize :ref:`(493) <Eq:diffu:exer:estimates:p3:result>` to a 2D/3D
diffusion equation :math:`u_t = \nabla\cdot (\alpha \nabla u)` for :math:`x\in\Omega`.

.. --- begin hint in exercise ---

**Hint.**
Use integration by parts in multi dimensions:

.. math::
         \int_\Omega u \nabla\cdot (\alpha\nabla u){\, \mathrm{d}x} =
        - \int_\Omega \alpha \nabla u\cdot\nabla u{\, \mathrm{d}x}
        + \int_{\partial\Omega} u \alpha\frac{\partial u}{\partial n},

where :math:`\frac{\partial u}{\partial n} = \boldsymbol{n}\cdot\nabla u`,
:math:`\boldsymbol{n}` being the outward unit normal to the boundary :math:`\partial\Omega`
of the domain :math:`\Omega`.

.. --- end hint in exercise ---

**e)**
Now we also consider the multi-dimensional PDE :math:`u_t =
\nabla\cdot (\alpha \nabla u)`. Integrate both sides over :math:`\Omega`
and use Gauss' divergence theorem, :math:`\int_\Omega \nabla\cdot\boldsymbol{q}{\, \mathrm{d}x}
= \int_{\partial\Omega}\boldsymbol{q}\cdot\boldsymbol{n}{\, \mathrm{d}s}` for a vector field
:math:`\boldsymbol{q}`. Show that if we have homogeneous Neumann conditions
on the boundary, :math:`\partial u/\partial n=0`, area under the
:math:`u` surface remains constant in time and

.. _Eq:diffu:exer:estimates:p4:result:

.. math::

    \tag{494}
    \int_{\Omega} u{\, \mathrm{d}x} = \int_{\Omega} I{\, \mathrm{d}x}
        {\thinspace .}
        
        
**f)**
Establish a code in 1D, 2D, or 3D that can solve a diffusion equation with a
source term :math:`f`, initial condition :math:`I`, and zero Dirichlet or
Neumann conditions on the whole boundary.

We can use :ref:`(493) <Eq:diffu:exer:estimates:p3:result>`
and :ref:`(494) <Eq:diffu:exer:estimates:p4:result>` as a partial verification
of the code. Choose some functions :math:`f` and :math:`I` and
check that :ref:`(493) <Eq:diffu:exer:estimates:p3:result>` is obeyed at any
time when zero Dirichlet conditions are used.
Iterate over the same :math:`I` functions and check that
:ref:`(494) <Eq:diffu:exer:estimates:p4:result>` is fulfilled
when using zero Neumann conditions.

**g)**
Make a list of some possible bugs in the code, such as indexing errors
in arrays, failure to set the correct boundary conditions,
evaluation of a term at a wrong time level, and similar.
For each of the bugs, see if the verification tests from the previous
subexercise pass or fail. This investigation shows how strong
the energy estimates and the estimate :ref:`(494) <Eq:diffu:exer:estimates:p4:result>`
are for pointing out errors in the implementation.

Filename: ``diffu_energy``.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:splitting_prec:

Exercise 3.8: Splitting methods and preconditioning
---------------------------------------------------

.. index:: Richardson iteration

.. index:: preconditioning

In the section :ref:`diffu:2D:direct_vs_iter`, we outlined a class of
iterative methods for :math:`Au=b` based on splitting :math:`A` into :math:`A=M-N`
and introducing the iteration

.. math::
         Mu^{k} = Nu^k + b{\thinspace .}

The very simplest splitting is :math:`M=I`, where :math:`I` is the identity
matrix. Show that this choice corresponds to the iteration

.. _Eq:diffu:exer:splitting_prec:simplest:

.. math::

    \tag{495}
    u^k = u^{k-1} + r^{k-1},\quad r^{k-1} = b - Au^{k-1},
        
        
where :math:`r^{k-1}` is the residual in the linear system in iteration
:math:`k-1`. The formula :ref:`(495) <Eq:diffu:exer:splitting_prec:simplest>` is known
as Richardson's iteration.
Show that if we apply the simple iteration method
:ref:`(495) <Eq:diffu:exer:splitting_prec:simplest>` to the *preconditioned*
system :math:`M^{-1}Au=M^{-1}b`, we arrive at the Jacobi method by choosing
:math:`M=D` (the diagonal of :math:`A`) as preconditioner and the SOR method by
choosing :math:`M=\omega^{-1}D + L` (:math:`L` being the lower triangular part of
:math:`A`).  This equivalence shows that we can apply one iteration of the
Jacobi or SOR method as preconditioner.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:earthosc:

Problem 3.9: Oscillating surface temperature of the earth
---------------------------------------------------------

Consider a day-and-night or seasonal variation in temperature at
the surface of the earth. How deep down in the ground will the
surface oscillations reach? For simplicity, we model only the
vertical variation along a coordinate :math:`x`, where :math:`x=0` at the
surface, and :math:`x` increases as we go down in the ground.
The temperature is governed by the heat equation

.. math::
         \varrho c_v\frac{\partial T}{\partial t} = \nabla\cdot(k\nabla T),

in some spatial domain :math:`x\in [0,L]`, where :math:`L` is chosen large enough such
that we can assume that :math:`T` is approximately constant, independent of the surface
oscillations, for :math:`x>L`. The parameters :math:`\varrho`, :math:`c_v`, and :math:`k` are the
density, the specific heat capacity at constant volume, and the
heat conduction coefficient, respectively.

**a)**
Derive the mathematical model for computing :math:`T(x,t)`.
Assume the surface oscillations to be sinusoidal around some mean
temperature :math:`T_m`. Let :math:`T=T_m` initially. At :math:`x=L`, assume :math:`T\approx T_m`.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**b)**
Scale the model in a) assuming :math:`k` is constant. Use a time scale
:math:`t_c = \omega^{-1}` and a length scale :math:`x_c = \sqrt{2{\alpha}/\omega}`,
where :math:`{\alpha} = k/(\varrho c_v)`. The primary unknown can be scaled
as :math:`\frac{T-T_m}{2A}`.

Show that the scaled PDE is

.. math::
        
        \frac{\partial u}{\partial \bar t} =
        \frac{1}{2}\frac{\partial^2 u}{\partial x^2},

with initial condition :math:`u(\bar x,0) = 0`,
left boundary condition
:math:`u(0,\bar t)  = \sin(\bar t)`,
and right boundary condition
:math:`u(\bar L,\bar t)  = 0`. The bar indicates a dimensionless quantity.

Show that :math:`u(\bar x, \bar t)=e^{-\bar x}\sin (\bar x - \bar t)` is a
solution that fulfills the PDE and the boundary condition at :math:`\bar x
=0` (this is the solution we will experience as :math:`\bar
t\rightarrow\infty` and :math:`L\rightarrow\infty`).  Conclude that an
appropriate domain for :math:`x` is :math:`[0,4]` if a damping :math:`e^{-4}\approx
0.18` is appropriate for implementing :math:`\bar u\approx\hbox{const}`;
increasing to :math:`[0,6]` damps :math:`\bar u` to 0.0025.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**c)**
Compute the scaled temperature and make animations comparing two solutions
with :math:`\bar L=4` and :math:`\bar L=8`, respectively (keep :math:`\Delta x` the same).

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:bloodflow:

Problem 3.10: Oscillating and pulsating flow in tubes
-----------------------------------------------------

We consider flow in a straight tube with radius :math:`R` and straight walls.
The flow is driven by a pressure gradient :math:`\beta(t)`. The effect of
gravity can be neglected. The mathematical problem reads

.. _Eq:_auto204:

.. math::

    \tag{496}
    \varrho\frac{\partial u}{\partial t} =
        \mu\frac{1}{r}\frac{\partial}{\partial r}\left(
        r\frac{\partial u}{\partial r}\right) + \beta(t),\quad
         r\in [0,R],\ t\in (0,T],
        
        
.. _Eq:_auto205:

.. math::

    \tag{497}
    u(r,0) = I(r),\quad  r\in [0,R],
        
        
.. _Eq:_auto206:

.. math::

    \tag{498}
    u(R,t) = 0,\quad  t\in (0,T],
        
        
.. _Eq:_auto207:

.. math::

    \tag{499}
    \frac{\partial u}{\partial r}(0,t) = 0,\quad  t\in (0,T].
        
        
We consider two models for :math:`\beta(t)`. One plain, sinusoidal oscillation:

.. _Eq:_auto208:

.. math::

    \tag{500}
    \beta = A\sin(\omega t),
        
        
and one with periodic pulses,

.. _Eq:_auto209:

.. math::

    \tag{501}
    \beta = A\sin^{16}(\omega t),
        
        
Note that both models can be written as :math:`\beta = A\sin^m(\omega t)`, with
:math:`m=1` and :math:`m=16`, respectively.

**a)**
Scale the mathematical model, using the viscous time scale :math:`\varrho R^2/\mu`.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**b)**
Implement the scaled model from a), using the unifying :math:`\theta` scheme
in time and centered differences in space.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**c)**
Verify the implementation in b) using a manufactured solution that is
quadratic in :math:`r` and linear in :math:`t`. Make a corresponding test function.

.. --- begin hint in exercise ---

**Hint.**
You need to include an extra source term
in the equation to allow for such tests. Let the spatial variation be
:math:`1-r^2` such that the boundary condition is fulfilled.

.. --- end hint in exercise ---

**d)**
Make animations for :math:`m=1,16` and :math:`\alpha=1,0.1`. Choose :math:`T` such that
the motion has reached a steady state (non-visible changes from period to
period in :math:`u`).

**e)**
For :math:`\alpha\gg 1`, the scaling in a) is not good, because the
characteristic time for changes (due to the pressure) is much smaller
than the viscous diffusion time scale (:math:`\alpha` becomes large).
We should in this case base
the short time scale on :math:`1/\omega`. Scale the model again, and
make an animation for :math:`m=1,16` and :math:`\alpha = 10`.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

Filename: ``axisymm_flow``.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:welding:

Problem 3.11: Scaling a welding problem
---------------------------------------

Welding equipment makes a very localized heat source that moves in
time. We shall investigate the heating due to welding and choose, for
maximum simplicity, a one-dimensional heat equation with a fixed
temperature at the ends, and we neglect melting.  We shall scale the
problem, and besides solving such a problem numerically, the aim is to
investigate the appropriateness of alternative scalings.

The governing PDE problem reads

.. math::
        \begin{alignat*}{2}
        \varrho c\frac{\partial u}{\partial t} &= k\frac{\partial^2 u}{\partial x^2}
        + f, & x\in (0,L),\ t\in (0,T),\\ 
        u(x,0) &= U_s, & x\in [0,L],\\ 
        u(0,t) = u(L,t) &= 0, & t\in (0,T].
        \end{alignat*}

Here, :math:`u` is the temperature, :math:`\varrho` the density of the material,
:math:`c` a heat capacity, :math:`k` the heat conduction coefficient, :math:`f` is
the heat source from the welding equipment, and :math:`U_s` is the
initial constant (room) temperature in the material.

A possible model for the heat source is a moving Gaussian function:

.. math::
         f = A\exp{\left(-\frac{1}{2}\left(\frac{x-vt}{\sigma}\right)^2\right)},

where :math:`A` is the strength, :math:`\sigma` is a parameter governing how
peak-shaped (or localized in space) the heat source is, and
:math:`v` is the velocity (in positive :math:`x` direction) of the source.

**a)**
Let :math:`x_c`, :math:`t_c`, :math:`u_c`, and :math:`f_c` be scales, i.e., characteristic
sizes, of :math:`x`, :math:`t`, :math:`u`, and :math:`f`, respectively. The natural choice of
:math:`x_c` and :math:`f_c` is :math:`L` and :math:`A`, since these make the scaled :math:`x` and
:math:`f` in the interval :math:`[0,1]`.  If each of the three terms in the PDE
are equally important, we can find :math:`t_c` and :math:`u_c` by demanding that
the coefficients in the scaled PDE are all equal to unity.  Perform
this scaling. Use scaled quantities in the arguments for the
exponential function in :math:`f` too and show that

.. math::
         \bar f= e^{-\frac{1}{2}\beta^2(\bar x -\gamma \bar t)^2},

where :math:`\beta` and :math:`\gamma` are dimensionless numbers. Give an
interpretation of :math:`\beta` and :math:`\gamma`.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**b)**
Argue that for large :math:`\gamma` we should base the time scale on the
movement of the heat source. Show that this gives rise to the scaled
PDE

.. math::
         \frac{\partial\bar u}{\partial\bar t} =
        \gamma^{-1}\frac{\partial^2\bar u}{\partial\bar x^2}
        + \bar f,
        

and

.. math::
         \bar f = \exp{(-\frac{1}{2}\beta^2(\bar x - \bar t)^2)}{\thinspace .}

Discuss when the scalings in a) and b) are appropriate.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**c)**
One aim with scaling is to get a solution that lies in the interval
:math:`[-1,1]`. This is not always the case when :math:`u_c` is based on a scale
involving a source term, as we do in a) and b).  However, from the
scaled PDE we realize that if we replace :math:`\bar f` with :math:`\delta\bar f`,
where :math:`\delta` is a dimensionless factor, this corresponds to
replacing :math:`u_c` by :math:`u_c/\delta`. So, if we observe that :math:`\bar
u\sim1/\delta` in simulations, we can just replace :math:`\bar f` by :math:`\delta
\bar f` in the scaled PDE.

Use this trick and implement the two scaled models. Reuse software for
the diffusion equation (e.g., the ``solver`` function in
``diffu1D_vc.py``).  Make a function ``run(gamma, beta=10, delta=40,
scaling=1, animate=False)`` that runs the model with the given
:math:`\gamma`, :math:`\beta`, and :math:`\delta` parameters as well as an indicator
``scaling`` that is 1 for the scaling in a) and 2 for the scaling in
b). The last argument can be used to turn screen animations on or off.

Experiments show that with :math:`\gamma=1` and :math:`\beta=10`, :math:`\delta =20`
is appropriate. Then :math:`\max |\bar u|` will be larger than 4 for :math:`\gamma
=40`, but that is acceptable.

Equip the ``run`` function with visualization, both animation of :math:`\bar u`
and :math:`\bar f`, and plots with :math:`\bar u` and :math:`\bar f` for :math:`t=0.2` and :math:`t=0.5`.

.. --- begin hint in exercise ---

**Hint.**
Since the amplitudes of :math:`\bar u` and :math:`\bar f` differs by a factor :math:`\delta`,
it is attractive to plot :math:`\bar f/\delta` together with :math:`\bar u`.

.. --- end hint in exercise ---

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

**d)**
Use the software in c) to investigate :math:`\gamma=0.2,1,5,40` for the
two scalings. Discuss the results.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

.. ===== Exercise: Radial heat conduction out of offshore pipelines =====

.. Easy to make something out of the ideas/5620/apps/offshore... mekit

.. paper where one has a multi-walled radial heat conduction equation.

.. Can, as in the paper, use one cell per material. Coupling to soil

.. outside with many parameters given. The discussion of the Fourier

.. number is interesting - I guess time changes here relates to

.. BCs on the inner wall because the gas suddenly has a different

.. temperature? Could be a good project perhaps; anyway, the theory

.. can be written up.

Filename: ``welding``.

.. --- end exercise ---

.. --- begin exercise ---

.. _diffu:exer:axisymm:

Exercise 3.12: Implement a Forward Euler scheme for axi-symmetric diffusion
---------------------------------------------------------------------------

Based on the discussion in the section :ref:`diffu:fd2:radial`, derive in detail
the discrete equations for a Forward Euler in time, centered in space,
finite difference method for axi-symmetric diffusion. The
diffusion coefficient may be a function of the radial coordinate.
At the outer boundary :math:`r=R`, we may have either a Dirichlet or Robin
condition.
Implement this scheme. Construct appropriate test problems.

.. removed !bsol ... !esol environment (because of the command-line option --without_solutions)

Filename: ``FE_axisym``.

.. --- end exercise ---