.. !split .. _ch:diffu: Diffusion equations (1) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% .. index:: single: diffusion equation; 1D .. index:: heat equation .. index:: diffusion coefficient .. index:: single: diffusion equation; diffusion coefficient The famous *diffusion equation*, also known as the *heat equation*, reads .. math:: \frac{\partial u}{\partial t} = {\alpha} \frac{\partial^2 u}{\partial x^2}, where :math:`u(x,t)` is the unknown function to be solved for, :math:`x` is a coordinate in space, and :math:`t` is time. The coefficient :math:`{\alpha}` is the *diffusion coefficient* and determines how fast :math:`u` changes in time. A quick short form for the diffusion equation is :math:`u_t = {\alpha} u_{xx}`. Compared to the wave equation, :math:`u_{tt}=c^2u_{xx}`, which looks very similar, the diffusion equation features solutions that are very different from those of the wave equation. Also, the diffusion equation makes quite different demands to the numerical methods. .. index:: single: diffusion equation; stationary solution .. index:: stationary solution .. index:: Laplace equation Typical diffusion problems may experience rapid change in the very beginning, but then the evolution of :math:`u` becomes slower and slower. The solution is usually very smooth, and after some time, one cannot recognize the initial shape of :math:`u`. This is in sharp contrast to solutions of the wave equation where the initial shape is preserved in homogeneous media - the solution is then basically a moving initial condition. The standard wave equation :math:`u_{tt}=c^2u_{xx}` has solutions that propagates with speed :math:`c` forever, without changing shape, while the diffusion equation converges to a *stationary solution* :math:`\bar u(x)` as :math:`t\rightarrow\infty`. In this limit, :math:`u_t=0`, and :math:`\bar u` is governed by :math:`\bar u''(x)=0`. This stationary limit of the diffusion equation is called the *Laplace* equation and arises in a very wide range of applications throughout the sciences. It is possible to solve for :math:`u(x,t)` using an explicit scheme, as we do in the section :ref:`diffu:pde1:FEsec`, but the time step restrictions soon become much less favorable than for an explicit scheme applied to the wave equation. And of more importance, since the solution :math:`u` of the diffusion equation is very smooth and changes slowly, small time steps are not convenient and not required by accuracy as the diffusion process converges to a stationary state. Therefore, implicit schemes (as described in the section :ref:`diffu:pde1:implicit`) are popular, but these require solutions of systems of algebraic equations. We shall use ready-made software for this purpose, but also program some simple iterative methods. The exposition is, as usual in this book, very basic and focuses on the basic ideas and how to implement. More comprehensive mathematical treatments and classical analysis of the methods are found in lots of textbooks. A favorite of ours in this respect is the one by LeVeque [Ref07]_. The books by Strikwerda [Ref08]_ and by Lapidus and Pinder [Ref09]_ are also highly recommended as additional material on the topic. .. _diffu:pde1:FEsec: An explicit method for the 1D diffusion equation ================================================ Explicit finite difference methods for the wave equation :math:`u_{tt}=c^2u_{xx}` can be used, with small modifications, for solving :math:`u_t = {\alpha} u_{xx}` as well. The exposition below assumes that the reader is familiar with the basic ideas of discretization and implementation of wave equations from the chapter :ref:`ch:wave`. Readers not familiar with the Forward Euler, Backward Euler, and Crank-Nicolson (or centered or midpoint) discretization methods in time should consult, e.g., the section `Finite difference methods `__ in [Ref02]_. The initial-boundary value problem for 1D diffusion --------------------------------------------------- To obtain a unique solution of the diffusion equation, or equivalently, to apply numerical methods, we need initial and boundary conditions. The diffusion equation goes with one initial condition :math:`u(x,0)=I(x)`, where :math:`I` is a prescribed function. One boundary condition is required at each point on the boundary, which in 1D means that :math:`u` must be known, :math:`u_x` must be known, or some combination of them. .. index:: single: diffusion equation; 1D, explicit scheme .. index:: single: diffusion equation; 1D, initial boundary value problem .. index:: single: diffusion equation; 1D, initial condition .. index:: single: diffusion equation; 1D, boundary condition We shall start with the simplest boundary condition: :math:`u=0`. The complete initial-boundary value diffusion problem in one space dimension can then be specified as .. _Eq:diffu:pde1: .. math:: \tag{349} \frac{\partial u}{\partial t} = {\alpha} \frac{\partial^2 u}{\partial x^2} + f, \quad x\in (0,L),\ t\in (0,T] .. _Eq:diffu:pde1:ic:u: .. math:: \tag{350} u(x,0) = I(x), \quad x\in [0,L] .. _Eq:diffu:pde1:bc:0: .. math:: \tag{351} u(0,t) = 0, \quad t>0, .. _Eq:diffu:pde1:bc:L: .. math:: \tag{352} u(L,t) = 0, \quad t>0{\thinspace .} With only a first-order derivative in time, only one *initial condition* is needed, while the second-order derivative in space leads to a demand for two *boundary conditions*. We have added a source term :math:`f=f(x,t)`, which is convenient when testing implementations. .. index:: single: diffusion equation; source term Diffusion equations like :ref:`(349) ` have a wide range of applications throughout physical, biological, and financial sciences. One of the most common applications is propagation of heat, where :math:`u(x,t)` represents the temperature of some substance at point :math:`x` and time :math:`t`. Other applications are listed in the section :ref:`diffu:app`. .. _diffu:pde1:FE: Forward Euler scheme (1) --------------------------------- .. index:: explicit discretization methods .. index:: single: diffusion equation; 1D, Forward Euler scheme .. index:: domain .. index:: mesh points .. index:: mesh function .. index:: forward difference approximation .. index:: central difference approximation .. index:: single: diffusion equation; 1D, discrete equations .. index:: single: diffusion equation; 1D, Fourier number .. index:: single: diffusion equation; 1D, mesh Fourier number .. index:: dimensionless number The first step in the discretization procedure is to replace the domain :math:`[0,L]\times [0,T]` by a set of mesh points. Here we apply equally spaced mesh points .. math:: x_i=i\Delta x,\quad i=0,\ldots,N_x, and .. math:: t_n=n\Delta t,\quad n=0,\ldots,N_t {\thinspace .} Moreover, :math:`u^n_i` denotes the mesh function that approximates :math:`u(x_i,t_n)` for :math:`i=0,\ldots,N_x` and :math:`n=0,\ldots,N_t`. Requiring the PDE :ref:`(349) ` to be fulfilled at a mesh point :math:`(x_i,t_n)` leads to the equation .. _Eq:diffu:pde1:step2: .. math:: \tag{353} \frac{\partial}{\partial t} u(x_i, t_n) = {\alpha}\frac{\partial^2}{\partial x^2} u(x_i, t_n) + f(x_i,t_n), The next step is to replace the derivatives by finite difference approximations. The computationally simplest method arises from using a forward difference in time and a central difference in space: .. _Eq:diffu:pde1:step3a: .. math:: \tag{354} [D_t^+ u = {\alpha} D_xD_x u + f]^n_i {\thinspace .} Written out, .. _Eq:diffu:pde1:step3b: .. math:: \tag{355} \frac{u^{n+1}_i-u^n_i}{\Delta t} = {\alpha} \frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2} + f_i^n{\thinspace .} We have turned the PDE into algebraic equations, also often called discrete equations. The key property of the equations is that they are algebraic, which makes them easy to solve. As usual, we anticipate that :math:`u^n_i` is already computed such that :math:`u^{n+1}_i` is the only unknown in :ref:`(355) `. Solving with respect to this unknown is easy: .. _Eq:diffu:pde1:step4: .. math:: \tag{356} u^{n+1}_i = u^n_i + F\left( u^{n}_{i+1} - 2u^n_i + u^n_{i-1}\right) + \Delta t f_i^n, where we have introduced the *mesh Fourier number*: .. _Eq:_auto144: .. math:: \tag{357} F = {\alpha}\frac{\Delta t}{\Delta x^2}{\thinspace .} .. admonition:: :math:`F` is the key parameter in the discrete diffusion equation Note that :math:`F` is a *dimensionless* number that lumps the key physical parameter in the problem, :math:`{\alpha}`, and the discretization parameters :math:`\Delta x` and :math:`\Delta t` into a single parameter. Properties of the numerical method are critically dependent upon the value of :math:`F` (see the section :ref:`diffu:pde1:analysis` for details). The computational algorithm then becomes 1. compute :math:`u^0_i=I(x_i)` for :math:`i=0,\ldots,N_x` 2. for :math:`n=0,1,\ldots,N_t`: a. apply :ref:`(356) ` for all the internal spatial points :math:`i=1,\ldots,N_x-1` b. set the boundary values :math:`u^{n+1}_i=0` for :math:`i=0` and :math:`i=N_x` The algorithm is compactly and fully specified in Python: .. code-block:: python import numpy as np x = np.linspace(0, L, Nx+1) # mesh points in space dx = x[1] - x[0] t = np.linspace(0, T, Nt+1) # mesh points in time dt = t[1] - t[0] F = a*dt/dx**2 u = np.zeros(Nx+1) # unknown u at new time level u_n = np.zeros(Nx+1) # u at the previous time level # Set initial condition u(x,0) = I(x) for i in range(0, Nx+1): u_n[i] = I(x[i]) for n in range(0, Nt): # Compute u at inner mesh points for i in range(1, Nx): u[i] = u_n[i] + F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) + \ dt*f(x[i], t[n]) # Insert boundary conditions u[0] = 0; u[Nx] = 0 # Update u_n before next step u_n[:]= u Note that we use ``a`` for :math:`{\alpha}` in the code, motivated by easy visual mapping between the variable name and the mathematical symbol in formulas. We need to state already now that the shown algorithm does not produce meaningful results unless :math:`F\leq 1/2`. Why is explained in the section :ref:`diffu:pde1:analysis`. .. _diffu:pde1:FE:code: Implementation (7) --------------------------- .. index:: single: diffusion equation; 1D, implementation (FE) The file `diffu1D_u0.py `__ contains a complete function ``solver_FE_simple`` for solving the 1D diffusion equation with :math:`u=0` on the boundary as specified in the algorithm above: .. code-block:: python import numpy as np def solver_FE_simple(I, a, f, L, dt, F, T): """ Simplest expression of the computational algorithm using the Forward Euler method and explicit Python loops. For this method F <= 0.5 for stability. """ import time; t0 = time.clock() # For measuring the CPU time Nt = int(round(T/float(dt))) t = np.linspace(0, Nt*dt, Nt+1) # Mesh points in time dx = np.sqrt(a*dt/F) Nx = int(round(L/dx)) x = np.linspace(0, L, Nx+1) # Mesh points in space # Make sure dx and dt are compatible with x and t dx = x[1] - x[0] dt = t[1] - t[0] u = np.zeros(Nx+1) u_n = np.zeros(Nx+1) # Set initial condition u(x,0) = I(x) for i in range(0, Nx+1): u_n[i] = I(x[i]) for n in range(0, Nt): # Compute u at inner mesh points for i in range(1, Nx): u[i] = u_n[i] + F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) + \ dt*f(x[i], t[n]) # Insert boundary conditions u[0] = 0; u[Nx] = 0 # Switch variables before next step #u_n[:] = u # safe, but slow u_n, u = u, u_n t1 = time.clock() return u_n, x, t, t1-t0 # u_n holds latest u A faster version, based on vectorization of the finite difference scheme, is available in the function ``solver_FE``. The vectorized version replaces the explicit loop .. code-block:: python for i in range(1, Nx): u[i] = u_n[i] + F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) \ + dt*f(x[i], t[n]) by arithmetics on displaced slices of the ``u`` array: .. code-block:: python u[1:Nx] = u_n[1:Nx] + F*(u_n[0:Nx-1] - 2*u_n[1:Nx] + u_n[2:Nx+1]) \ + dt*f(x[1:Nx], t[n]) # or u[1:-1] = u_n[1:-1] + F*(u_n[0:-2] - 2*u_n[1:-1] + u_n[2:]) \ + dt*f(x[1:-1], t[n]) For example, the vectorized version runs 70 times faster than the scalar version in a case with 100 time steps and a spatial mesh of :math:`10^5` cells. The ``solver_FE`` function also features a callback function such that the user can process the solution at each time level. The callback function looks like ``user_action(u, x, t, n)``, where ``u`` is the array containing the solution at time level ``n``, ``x`` holds all the spatial mesh points, while ``t`` holds all the temporal mesh points. Apart from the vectorized loop over the spatial mesh points, the callback function, and a bit more complicated setting of the source ``f`` it is not specified (``None``), the ``solver_FE`` function is identical to ``solver_FE_simple`` above: .. code-block:: python def solver_FE(I, a, f, L, dt, F, T, user_action=None, version='scalar'): """ Vectorized implementation of solver_FE_simple. """ import time; t0 = time.clock() # for measuring the CPU time Nt = int(round(T/float(dt))) t = np.linspace(0, Nt*dt, Nt+1) # Mesh points in time dx = np.sqrt(a*dt/F) Nx = int(round(L/dx)) x = np.linspace(0, L, Nx+1) # Mesh points in space # Make sure dx and dt are compatible with x and t dx = x[1] - x[0] dt = t[1] - t[0] u = np.zeros(Nx+1) # solution array u_n = np.zeros(Nx+1) # solution at t-dt # Set initial condition for i in range(0,Nx+1): u_n[i] = I(x[i]) if user_action is not None: user_action(u_n, x, t, 0) for n in range(0, Nt): # Update all inner points if version == 'scalar': for i in range(1, Nx): u[i] = u_n[i] +\ F*(u_n[i-1] - 2*u_n[i] + u_n[i+1]) +\ dt*f(x[i], t[n]) elif version == 'vectorized': u[1:Nx] = u_n[1:Nx] + \ F*(u_n[0:Nx-1] - 2*u_n[1:Nx] + u_n[2:Nx+1]) +\ dt*f(x[1:Nx], t[n]) else: raise ValueError('version=%s' % version) # Insert boundary conditions u[0] = 0; u[Nx] = 0 if user_action is not None: user_action(u, x, t, n+1) # Switch variables before next step u_n, u = u, u_n t1 = time.clock() return t1-t0 .. _diffu:pde1:FE:verify: Verification (7) ------------------------- .. index:: single: diffusion equation; 1D, verification (FE) .. _diffu:pde1:FE:verify:exact: Exact solution of discrete equations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before thinking about running the functions in the previous section, we need to construct a suitable test example for verification. It appears that a manufactured solution that is linear in time and at most quadratic in space fulfills the Forward Euler scheme exactly. With the restriction that :math:`u=0` for :math:`x=0,L`, we can try the solution .. math:: u(x,t) = 5tx(L-x){\thinspace .} Inserted in the PDE, it requires a source term .. math:: f(x,t) = 10{\alpha} t + 5x(L-x){\thinspace .} With the formulas from :ref:`sec:form:fdtn` we can easily check that the manufactured ``u`` fulfills the scheme: .. math:: \begin{align*} \lbrack D_t^+ u = {\alpha} D_x D_x u + f\rbrack^n_i &= \lbrack 5x(L-x)D_t^+ t = 5 t{\alpha} D_x D_x (xL-x^2) +\\ &\quad\quad 10{\alpha} t + 5x(L-x)\rbrack^n_i\\ &= \lbrack 5x(L-x) = 5 t{\alpha} (-2) + 10{\alpha} t + 5x(L-x) \rbrack^n_i, \end{align*} which is a 0=0 expression. The computation of the source term, given any :math:`u`, is easily automated with ``sympy``: .. code-block:: python import sympy as sym x, t, a, L = sym.symbols('x t a L') u = x*(L-x)*5*t def pde(u): return sym.diff(u, t) - a*sym.diff(u, x, x) f = sym.simplify(pde(u)) Now we can choose any expression for ``u`` and automatically get the suitable source term ``f``. However, the manufactured solution ``u`` will in general not be exactly reproduced by the scheme: only constant and linear functions are differentiated correctly by a forward difference, while only constant, linear, and quadratic functions are differentiated exactly by a :math:`[D_xD_x u]^n_i` difference. The numerical code will need to access the ``u`` and ``f`` above as Python functions. The exact solution is wanted as a Python function ``u_exact(x, t)``, while the source term is wanted as ``f(x, t)``. The parameters ``a`` and ``L`` in ``u`` and ``f`` above are symbols and must be replaced by ``float`` objects in a Python function. This can be done by redefining ``a`` and ``L`` as ``float`` objects and performing substitutions of symbols by numbers in ``u`` and ``f``. The appropriate code looks like this: .. code-block:: python a = 0.5 L = 1.5 u_exact = sym.lambdify( [x, t], u.subs('L', L).subs('a', a), modules='numpy') f = sym.lambdify( [x, t], f.subs('L', L).subs('a', a), modules='numpy') I = lambda x: u_exact(x, 0) Here we also make a function ``I`` for the initial condition. The idea now is that our manufactured solution should be exactly reproduced by the code (to machine precision). For this purpose we make a test function for comparing the exact and numerical solutions at the end of the time interval: .. code-block:: python def test_solver_FE(): # Define u_exact, f, I as explained above dx = L/3 # 3 cells F = 0.5 dt = F*dx**2 u, x, t, cpu = solver_FE_simple( I=I, a=a, f=f, L=L, dt=dt, F=F, T=2) u_e = u_exact(x, t[-1]) diff = abs(u_e - u).max() tol = 1E-14 assert diff < tol, 'max diff solver_FE_simple: %g' % diff u, x, t, cpu = solver_FE( I=I, a=a, f=f, L=L, dt=dt, F=F, T=2, user_action=None, version='scalar') u_e = u_exact(x, t[-1]) diff = abs(u_e - u).max() tol = 1E-14 assert diff < tol, 'max diff solver_FE, scalar: %g' % diff u, x, t, cpu = solver_FE( I=I, a=a, f=f, L=L, dt=dt, F=F, T=2, user_action=None, version='vectorized') u_e = u_exact(x, t[-1]) diff = abs(u_e - u).max() tol = 1E-14 assert diff < tol, 'max diff solver_FE, vectorized: %g' % diff .. admonition:: The critical value :math:`F=0.5` We emphasize that the value ``F=0.5`` is critical: the tests above will fail if ``F`` has a larger value. This is because the Forward Euler scheme is unstable for :math:`F>1/2`. The reader may wonder if :math:`F=1/2` is safe or if :math:`F<1/2` should be required. Experiments show that :math:`F=1/2` works fine for :math:`u_t={\alpha} u_{xx}`, so there is no accumulation of rounding errors in this case and hence no need to introduce any safety factor to keep :math:`F` away from the limiting value 0.5. .. _diffu:pde1:FE:verify:convrates: Checking convergence rates (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If our chosen exact solution does not satisfy the discrete equations exactly, we are left with checking the convergence rates, just as we did previously for the wave equation. However, with the Euler scheme here, we have different accuracies in time and space, since we use a second order approximation to the spatial derivative and a first order approximation to the time derivative. Thus, we must expect different convergence rates in time and space. For the numerical error, .. math:: E = C_t\Delta t^r + C_x\Delta x^p, we should get convergence rates :math:`r=1` and :math:`p=2` (:math:`C_t` and :math:`C_x` are unknown constants). As previously, in the section :ref:`wave:pde2:fd:MMS`, we simplify matters by introducing a single discretization parameter :math:`h`: .. math:: h = \Delta t,\quad \Delta x = Kh^{r/p}, where :math:`K` is any constant. This allows us to factor out only *one* discretization parameter :math:`h` from the formula: .. math:: E = C_t h + C_x (K^{r/p})^p = \tilde C h^r,\quad \tilde C = C_t + C_sK^r{\thinspace .} The computed rate :math:`r` should approach 1 with increasing resolution. It is tempting, for simplicity, to choose :math:`K=1`, which gives :math:`\Delta x = h^{r/p}`, expected to be :math:`\sqrt{\Delta t}`. However, we have to control the stability requirement: :math:`F\leq\frac{1}{2}`, which means .. math:: \frac{{\alpha}\Delta t}{\Delta x^2}\leq\frac{1}{2}\quad\Rightarrow \quad \Delta x \geq \sqrt{2{\alpha}}h^{1/2} , implying that :math:`K=\sqrt{2{\alpha}}` is our choice in experiments where we lie on the stability limit :math:`F=1/2`. .. _diffu:pde1:FE:experiments: Numerical experiments --------------------- .. index:: single: diffusion equation; 1D, numerical experiments When a test function like the one above runs silently without errors, we have some evidence for a correct implementation of the numerical method. The next step is to do some experiments with more interesting solutions. We target a scaled diffusion problem where :math:`x/L` is a new spatial coordinate and :math:`{\alpha} t/L^2` is a new time coordinate. The source term :math:`f` is omitted, and :math:`u` is scaled by :math:`\max_{x\in [0,L]}|I(x)|` (see `Scaling of the diffusion equation `__ [Ref03]_ for details). The governing PDE is then .. math:: \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, in the spatial domain :math:`[0,L]`, with boundary conditions :math:`u(0)=u(1)=0`. Two initial conditions will be tested: a discontinuous plug, .. math:: I(x) = \left\lbrace\begin{array}{ll} 0, & |x-L/2| > 0.1\\ 1, & \hbox{otherwise} \end{array}\right. and a smooth Gaussian function, .. math:: I(x) = e^{-\frac{1}{2\sigma^2}(x-L/2)^2}{\thinspace .} The functions ``plug`` and ``gaussian`` in `diffu1D_u0.py `__ run the two cases, respectively: .. code-block:: python def plug(scheme='FE', F=0.5, Nx=50): L = 1. a = 1. T = 0.1 # Compute dt from Nx and F dx = L/Nx; dt = F/a*dx**2 def I(x): """Plug profile as initial condition.""" if abs(x-L/2.0) > 0.1: return 0 else: return 1 cpu = viz(I, a, L, dt, F, T, umin=-0.1, umax=1.1, scheme=scheme, animate=True, framefiles=True) print 'CPU time:', cpu def gaussian(scheme='FE', F=0.5, Nx=50, sigma=0.05): L = 1. a = 1. T = 0.1 # Compute dt from Nx and F dx = L/Nx; dt = F/a*dx**2 def I(x): """Gaussian profile as initial condition.""" return exp(-0.5*((x-L/2.0)**2)/sigma**2) u, cpu = viz(I, a, L, dt, F, T, umin=-0.1, umax=1.1, scheme=scheme, animate=True, framefiles=True) print 'CPU time:', cpu These functions make use of the function ``viz`` for running the solver and visualizing the solution using a callback function with plotting: .. code-block:: python def viz(I, a, L, dt, F, T, umin, umax, scheme='FE', animate=True, framefiles=True): def plot_u(u, x, t, n): plt.plot(x, u, 'r-', axis=[0, L, umin, umax], title='t=%f' % t[n]) if framefiles: plt.savefig('tmp_frame%04d.png' % n) if t[n] == 0: time.sleep(2) elif not framefiles: # It takes time to write files so pause is needed # for screen only animation time.sleep(0.2) user_action = plot_u if animate else lambda u,x,t,n: None cpu = eval('solver_'+scheme)(I, a, L, dt, F, T, user_action=user_action) return cpu Notice that this ``viz`` function stores all the solutions in a list ``solutions`` in the callback function. Modern computers have hardly any problem with storing a lot of such solutions for moderate values of :math:`N_x` in 1D problems, but for 2D and 3D problems, this technique cannot be used and solutions must be stored in files. Our experiments employ a time step :math:`\Delta t = 0.0002` and simulate for :math:`t\in [0,0.1]`. First we try the highest value of :math:`F`: :math:`F=0.5`. This resolution corresponds to :math:`N_x=50`. A possible terminal command is .. code-block:: text Terminal> python -c 'from diffu1D_u0 import gaussian gaussian("solver_FE", F=0.5, dt=0.0002)' The :math:`u(x,t)` curve as a function of :math:`x` is shown in Figure :ref:`diffu:pde1:FE:fig:F=0.5` at four time levels. .. raw:: html

.. `movie `__ .. Does not work: .. http://tinyurl.com/pu5uyfn/pub/diffu/html/mov-diffu/diffu1D_u0_FE_plug/movie.ogg .. Works: .. https://raw.githubusercontent.com/hplgit/fdm-book/master/doc/.src/book/mov-diffu/diffu1D_u0_FE_plug/movie.ogg We see that the curves have saw-tooth waves in the beginning of the simulation. This non-physical noise is smoothed out with time, but solutions of the diffusion equations are known to be smooth, and this numerical solution is definitely not smooth. Lowering :math:`F` helps: :math:`F\leq 0.25` gives a smooth solution, see Figure :ref:`diffu:pde1:FE:fig:F=0.25`. .. raw:: html

Increasing :math:`F` slightly beyond the limit 0.5, to :math:`F=0.51`, gives growing, non-physical instabilities, as seen in Figure :ref:`diffu:pde1:FE:fig:F=0.51`. .. _diffu:pde1:FE:fig:F=0.5: .. figure:: plug_FE_F05.png :width: 800 Forward Euler scheme for :math:`F=0.5` .. _diffu:pde1:FE:fig:F=0.25: .. figure:: plug_FE_F025.png :width: 800 Forward Euler scheme for :math:`F=0.25` .. _diffu:pde1:FE:fig:F=0.51: .. figure:: plug_FE_F051.png :width: 800 Forward Euler scheme for :math:`F=0.51` Instead of a discontinuous initial condition we now try the smooth Gaussian function for :math:`I(x)`. A simulation for :math:`F=0.5` is shown in Figure :ref:`diffu:pde1:FE:fig:gauss:F=0.5`. Now the numerical solution is smooth for all times, and this is true for any :math:`F\leq 0.5`. .. raw:: html

.. _diffu:pde1:FE:fig:gauss:F=0.5: .. figure:: gaussian_FE_F05.png :width: 800 Forward Euler scheme for :math:`F=0.5` Experiments with these two choices of :math:`I(x)` reveal some important observations: * The Forward Euler scheme leads to growing solutions if :math:`F>\frac{1}{2}`. * :math:`I(x)` as a discontinuous plug leads to a saw tooth-like noise for :math:`F=\frac{1}{2}`, which is absent for :math:`F\leq\frac{1}{4}`. * The smooth Gaussian initial function leads to a smooth solution for all relevant :math:`F` values (:math:`F\leq \frac{1}{2}`). .. _diffu:pde1:implicit: Implicit methods for the 1D diffusion equation ============================================== .. index:: single: diffusion equation; 1D, implicit schemes Simulations with the Forward Euler scheme shows that the time step restriction, :math:`F\leq\frac{1}{2}`, which means :math:`\Delta t \leq \Delta x^2/(2{\alpha})`, may be relevant in the beginning of the diffusion process, when the solution changes quite fast, but as time increases, the process slows down, and a small :math:`\Delta t` may be inconvenient. By using *implicit schemes*, which lead to coupled systems of linear equations to be solved at each time level, any size of :math:`\Delta t` is possible (but the accuracy decreases with increasing :math:`\Delta t`). The Backward Euler scheme, derived and implemented below, is the simplest implicit scheme for the diffusion equation. .. _diffu:pde1:BE: Backward Euler scheme --------------------- We now apply a backward difference in time in :ref:`(353) `, but the same central difference in space: .. _Eq:diffu:pde1:step3aBE: .. math:: \tag{358} [D_t^- u = D_xD_x u + f]^n_i, which written out reads .. _Eq:diffu:pde1:step3bBE: .. math:: \tag{359} \frac{u^{n}_i-u^{n-1}_i}{\Delta t} = {\alpha}\frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2} + f_i^n{\thinspace .} Now we assume :math:`u^{n-1}_i` is already computed, but all quantities at the "new" time level :math:`n` are unknown. This time it is not possible to solve with respect to :math:`u_i^{n}` because this value couples to its neighbors in space, :math:`u^n_{i-1}` and :math:`u^n_{i+1}`, which are also unknown. Let us examine this fact for the case when :math:`N_x=3`. Equation :ref:`(359) ` written for :math:`i=1,\ldots,Nx-1= 1,2` becomes .. _Eq:_auto145: .. math:: \tag{360} \frac{u^{n}_1-u^{n-1}_1}{\Delta t} = {\alpha}\frac{u^{n}_{2} - 2u^n_1 + u^n_{0}}{\Delta x^2} + f_1^n .. _Eq:_auto146: .. math:: \tag{361} \frac{u^{n}_2-u^{n-1}_2}{\Delta t} = {\alpha}\frac{u^{n}_{3} - 2u^n_2 + u^n_{1}}{\Delta x^2} + f_2^n The boundary values :math:`u^n_0` and :math:`u^n_3` are known as zero. Collecting the unknown new values :math:`u^n_1` and :math:`u^n_2` on the left-hand side and multiplying by :math:`\Delta t` gives .. _Eq:_auto147: .. math:: \tag{362} \left(1+ 2F\right) u^{n}_1 - F u^{n}_{2} = u^{n-1}_1 + \Delta t f_1^n, .. _Eq:_auto148: .. math:: \tag{363} - F u^{n}_{1} + \left(1+ 2F\right) u^{n}_2 = u^{n-1}_2 + \Delta t f_2^n{\thinspace .} This is a coupled :math:`2\times 2` system of algebraic equations for the unknowns :math:`u^n_1` and :math:`u^n_2`. The equivalent matrix form is .. math:: \left(\begin{array}{cc} 1+ 2F & - F\\ - F & 1+ 2F \end{array}\right) \left(\begin{array}{c} u^{n}_1\\ u^{n}_2 \end{array}\right) = \left(\begin{array}{c} u^{n-1}_1 + \Delta t f_1^n\\ u^{n-1}_2 + \Delta t f_2^n \end{array}\right) .. admonition:: Terminology: implicit vs. explicit methods Discretization methods that lead to a coupled system of equations for the unknown function at a new time level are said to be *implicit methods*. The counterpart, *explicit methods*, refers to discretization methods where there is a simple explicit formula for the values of the unknown function at each of the spatial mesh points at the new time level. From an implementational point of view, implicit methods are more comprehensive to code since they require the solution of coupled equations, i.e., a matrix system, at each time level. With explicit methods we have a closed-form formula for the value of the unknown at each mesh point. Very often explicit schemes have a restriction on the size of the time step that can be relaxed by using implicit schemes. In fact, implicit schemes are frequently unconditionally stable, so the size of the time step is governed by accuracy and not by stability. This is the great advantage of implicit schemes. In the general case, :ref:`(359) ` gives rise to a coupled :math:`(N_x-1)\times (N_x-1)` system of algebraic equations for all the unknown :math:`u^n_i` at the interior spatial points :math:`i=1,\ldots,N_x-1`. Collecting the unknowns on the left-hand side, :ref:`(359) ` can be written .. _Eq:diffu:pde1:step4BE: .. math:: \tag{364} - F u^n_{i-1} + \left(1+ 2F \right) u^{n}_i - F u^n_{i+1} = u_{i-1}^{n-1}, for :math:`i=1,\ldots,N_x-1`. One can either view these equations as a system for where the :math:`u^{n}_i` values at the internal mesh points, :math:`i=1,\ldots,N_x-1`, are unknown, or we may append the boundary values :math:`u^n_0` and :math:`u^n_{N_x}` to the system. In the latter case, all :math:`u^n_i` for :math:`i=0,\ldots,N_x` are considered unknown, and we must add the boundary equations to the :math:`N_x-1` equations in :ref:`(364) `: .. _Eq:diffu:pde1:step4BE:BC:0: .. math:: \tag{365} u_0^n = 0, .. _Eq:diffu:pde1:step4BE:BC:L: .. math:: \tag{366} u_{N_x}^n = 0{\thinspace .} A coupled system of algebraic equations can be written on matrix form, and this is important if we want to call up ready-made software for solving the system. The equations :ref:`(364) ` and :ref:`(365) `--:ref:`(366) ` correspond to the matrix equation .. math:: AU = b where :math:`U=(u^n_0,\ldots,u^n_{N_x})`, and the matrix :math:`A` has the following structure: .. _Eq:diffu:pde1:matrix:sparsity: .. math:: \tag{367} A = \left( \begin{array}{cccccccccc} A_{0,0} & A_{0,1} & 0 &\cdots & \cdots & \cdots & \cdots & \cdots & 0 \\ A_{1,0} & A_{1,1} & A_{1,2} & \ddots & & & & & \vdots \\ 0 & A_{2,1} & A_{2,2} & A_{2,3} & \ddots & & & & \vdots \\ \vdots & \ddots & & \ddots & \ddots & 0 & & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & \ddots & \ddots & & \vdots \\ \vdots & & & 0 & A_{i,i-1} & A_{i,i} & A_{i,i+1} & \ddots & \vdots \\ \vdots & & & & \ddots & \ddots & \ddots &\ddots & 0 \\ \vdots & & & & &\ddots & \ddots &\ddots & A_{N_x-1,N_x} \\ 0 &\cdots & \cdots &\cdots & \cdots & \cdots & 0 & A_{N_x,N_x-1} & A_{N_x,N_x} \end{array} \right) The nonzero elements are given by .. _Eq:_auto149: .. math:: \tag{368} A_{i,i-1} = -F .. _Eq:_auto150: .. math:: \tag{369} A_{i,i} = 1+ 2F .. _Eq:_auto151: .. math:: \tag{370} A_{i,i+1} = -F in the equations for internal points, :math:`i=1,\ldots,N_x-1`. The first and last equation correspond to the boundary condition, where we know the solution, and therefore we must have .. _Eq:_auto152: .. math:: \tag{371} A_{0,0} = 1, .. _Eq:_auto153: .. math:: \tag{372} A_{0,1} = 0, .. _Eq:_auto154: .. math:: \tag{373} A_{N_x,N_x-1} = 0, .. _Eq:_auto155: .. math:: \tag{374} A_{N_x,N_x} = 1{\thinspace .} The right-hand side :math:`b` is written as .. _Eq:_auto156: .. math:: \tag{375} b = \left(\begin{array}{c} b_0\\ b_1\\ \vdots\\ b_i\\ \vdots\\ b_{N_x} \end{array}\right) with .. _Eq:_auto157: .. math:: \tag{376} b_0 = 0, .. _Eq:_auto158: .. math:: \tag{377} b_i = u^{n-1}_i,\quad i=1,\ldots,N_x-1, .. _Eq:_auto159: .. math:: \tag{378} b_{N_x} = 0 {\thinspace .} We observe that the matrix :math:`A` contains quantities that do not change in time. Therefore, :math:`A` can be formed once and for all before we enter the recursive formulas for the time evolution. The right-hand side :math:`b`, however, must be updated at each time step. This leads to the following computational algorithm, here sketched with Python code: .. code-block:: python x = np.linspace(0, L, Nx+1) # mesh points in space dx = x[1] - x[0] t = np.linspace(0, T, N+1) # mesh points in time u = np.zeros(Nx+1) # unknown u at new time level u_n = np.zeros(Nx+1) # u at the previous time level # Data structures for the linear system A = np.zeros((Nx+1, Nx+1)) b = np.zeros(Nx+1) for i in range(1, Nx): A[i,i-1] = -F A[i,i+1] = -F A[i,i] = 1 + 2*F A[0,0] = A[Nx,Nx] = 1 # Set initial condition u(x,0) = I(x) for i in range(0, Nx+1): u_n[i] = I(x[i]) import scipy.linalg for n in range(0, Nt): # Compute b and solve linear system for i in range(1, Nx): b[i] = -u_n[i] b[0] = b[Nx] = 0 u[:] = scipy.linalg.solve(A, b) # Update u_n before next step u_n[:] = u Regarding verification, the same considerations apply as for the Forward Euler method (the section :ref:`diffu:pde1:FE:verify`). .. index:: single: diffusion equation; 1D, verification (BE) .. _diffu:pde1:impl:sparse: Sparse matrix implementation ---------------------------- .. index:: single: diffusion equation; 1D, tridiagonal matrix We have seen from :ref:`(367) ` that the matrix :math:`A` is tridiagonal. The code segment above used a full, dense matrix representation of :math:`A`, which stores a lot of values we know are zero beforehand, and worse, the solution algorithm computes with all these zeros. With :math:`N_x+1` unknowns, the work by the solution algorithm is :math:`\frac{1}{3} (N_x+1)^3` and the storage requirements :math:`(N_x+1)^2`. By utilizing the fact that :math:`A` is tridiagonal and employing corresponding software tools that work with the three diagonals, the work and storage demands can be proportional to :math:`N_x` only. This leads to a dramatic improvement: with :math:`N_x=200`, which is a realistic resolution, the code runs about 40,000 times faster and reduces the storage to just 1.5%! It is no doubt that we should take advantage of the fact that :math:`A` is tridiagonal. The key idea is to apply a data structure for a tridiagonal or sparse matrix. The ``scipy.sparse`` package has relevant utilities. For example, we can store only the nonzero diagonals of a matrix. The package also has linear system solvers that operate on sparse matrix data structures. The code below illustrates how we can store only the main diagonal and the upper and lower diagonals. .. code-block:: python # Representation of sparse matrix and right-hand side main = np.zeros(Nx+1) lower = np.zeros(Nx) upper = np.zeros(Nx) b = np.zeros(Nx+1) # Precompute sparse matrix main[:] = 1 + 2*F lower[:] = -F upper[:] = -F # Insert boundary conditions main[0] = 1 main[Nx] = 1 A = scipy.sparse.diags( diagonals=[main, lower, upper], offsets=[0, -1, 1], shape=(Nx+1, Nx+1), format='csr') print A.todense() # Check that A is correct # Set initial condition for i in range(0,Nx+1): u_n[i] = I(x[i]) for n in range(0, Nt): b = u_n b[0] = b[-1] = 0.0 # boundary conditions u[:] = scipy.sparse.linalg.spsolve(A, b) u_n[:] = u The ``scipy.sparse.linalg.spsolve`` function utilizes the sparse storage structure of ``A`` and performs, in this case, a very efficient Gaussian elimination solve. The program `diffu1D_u0.py `__ contains a function ``solver_BE``, which implements the Backward Euler scheme sketched above. As mentioned in the section :ref:`diffu:pde1:FE`, the functions ``plug`` and ``gaussian`` runs the case with :math:`I(x)` as a discontinuous plug or a smooth Gaussian function. All experiments point to two characteristic features of the Backward Euler scheme: 1) it is always stable, and 2) it always gives a smooth, decaying solution. .. _diffu:pde1:CN: Crank-Nicolson scheme (1) ---------------------------------- .. index:: single: diffusion equation; 1D, Crank-Nicolson scheme The idea in the Crank-Nicolson scheme is to apply centered differences in space and time, combined with an average in time. We demand the PDE to be fulfilled at the spatial mesh points, but midway between the points in the time mesh: .. math:: \frac{\partial}{\partial t} u(x_i, t_{n+\frac{1}{2}}) = {\alpha}\frac{\partial^2}{\partial x^2}u(x_i, t_{n+\frac{1}{2}}) + f(x_i,t_{n+\frac{1}{2}}), for :math:`i=1,\ldots,N_x-1` and :math:`n=0,\ldots, N_t-1`. With centered differences in space and time, we get .. math:: [D_t u = {\alpha} D_xD_x u + f]^{n+\frac{1}{2}}_i{\thinspace .} On the right-hand side we get an expression .. math:: \frac{1}{\Delta x^2}\left(u^{n+\frac{1}{2}}_{i-1} - 2u^{n+\frac{1}{2}}_i + u^{n+\frac{1}{2}}_{i+1}\right) + f_i^{n+\frac{1}{2}}{\thinspace .} This expression is problematic since :math:`u^{n+\frac{1}{2}}_i` is not one of the unknowns we compute. A possibility is to replace :math:`u^{n+\frac{1}{2}}_i` by an arithmetic average: .. math:: u^{n+\frac{1}{2}}_i\approx \frac{1}{2}\left(u^{n}_i +u^{n+1}_{i}\right){\thinspace .} In the compact notation, we can use the arithmetic average notation :math:`\overline{u}^t`: .. math:: [D_t u = {\alpha} D_xD_x \overline{u}^t + f]^{n+\frac{1}{2}}_i{\thinspace .} We can also use an average for :math:`f_i^{n+\frac{1}{2}}`: .. math:: [D_t u = {\alpha} D_xD_x \overline{u}^t + \overline{f}^t]^{n+\frac{1}{2}}_i{\thinspace .} After writing out the differences and average, multiplying by :math:`\Delta t`, and collecting all unknown terms on the left-hand side, we get .. math:: u^{n+1}_i - \frac{1}{2} F(u^{n+1}_{i-1} - 2u^{n+1}_i + u^{n+1}_{i+1}) = u^{n}_i + \frac{1}{2} F(u^{n}_{i-1} - 2u^{n}_i + u^{n}_{i+1})\nonumber .. _Eq:_auto160: .. math:: \tag{379} \qquad \frac{1}{2} f_i^{n+1} + \frac{1}{2} f_i^n{\thinspace .} Also here, as in the Backward Euler scheme, the new unknowns :math:`u^{n+1}_{i-1}`, :math:`u^{n+1}_{i}`, and :math:`u^{n+1}_{i+1}` are coupled in a linear system :math:`AU=b`, where :math:`A` has the same structure as in :ref:`(367) `, but with slightly different entries: .. _Eq:_auto161: .. math:: \tag{380} A_{i,i-1} = -\frac{1}{2} F .. _Eq:_auto162: .. math:: \tag{381} A_{i,i} = 1 + F .. _Eq:_auto163: .. math:: \tag{382} A_{i,i+1} = -\frac{1}{2} F in the equations for internal points, :math:`i=1,\ldots,N_x-1`. The equations for the boundary points correspond to .. _Eq:_auto164: .. math:: \tag{383} A_{0,0} = 1, .. _Eq:_auto165: .. math:: \tag{384} A_{0,1} = 0, .. _Eq:_auto166: .. math:: \tag{385} A_{N_x,N_x-1} = 0, .. _Eq:_auto167: .. math:: \tag{386} A_{N_x,N_x} = 1{\thinspace .} The right-hand side :math:`b` has entries .. _Eq:_auto168: .. math:: \tag{387} b_0 = 0, .. _Eq:_auto169: .. math:: \tag{388} b_i = u^{n-1}_i + \frac{1}{2}(f_i^n + f_i^{n+1}),\quad i=1,\ldots,N_x-1, .. _Eq:_auto170: .. math:: \tag{389} b_{N_x} = 0 {\thinspace .} When verifying some implementation of the Crank-Nicolson scheme by convergence rate testing, one should note that the scheme is second order accurate in both space and time. The numerical error then reads .. math:: E = C_t\Delta t^r + C_x\Delta x^r, where :math:`r=2` (:math:`C_t` and :math:`C_x` are unknown constants, as before). When introducing a single discretization parameter, we may now simply choose .. math:: h = \Delta x = \Delta t, which gives .. math:: E = C_th^r + C_xh^r = (C_t + C_x)h^r, where :math:`r` should approach 2 as resolution is increased in the convergence rate computations. .. index:: single: diffusion equation; 1D, verification (CN) .. _diffu:pde1:theta: The unifying :math:`\theta` rule -------------------------------- .. index:: single: diffusion equation; 1D, theta rule For the equation .. math:: \frac{\partial u}{\partial t} = G(u), where :math:`G(u)` is some spatial differential operator, the :math:`\theta`-rule looks like .. math:: \frac{u^{n+1}_i - u^n_i}{\Delta t} = \theta G(u^{n+1}_i) + (1-\theta) G(u^{n}_i){\thinspace .} The important feature of this time discretization scheme is that we can implement one formula and then generate a family of well-known and widely used schemes: * :math:`\theta=0` gives the Forward Euler scheme in time * :math:`\theta=1` gives the Backward Euler scheme in time * :math:`\theta=\frac{1}{2}` gives the Crank-Nicolson scheme in time In the compact difference notation, we write the :math:`\theta` rule as .. math:: [D_t u = {\alpha} D_xD_x u]^{n+\theta}{\thinspace .} We have that :math:`t_{n+\theta} = \theta t_{n+1} + (1-\theta)t_n`. Applied to the 1D diffusion problem, the :math:`\theta`-rule gives .. math:: \begin{align*} \frac{u^{n+1}_i-u^n_i}{\Delta t} &= {\alpha}\left( \theta \frac{u^{n+1}_{i+1} - 2u^{n+1}_i + u^{n+1}_{i-1}}{\Delta x^2} + (1-\theta) \frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2}\right)\\ &\qquad + \theta f_i^{n+1} + (1-\theta)f_i^n {\thinspace .} \end{align*} This scheme also leads to a matrix system with entries .. math:: A_{i,i-1} = -F\theta,\quad A_{i,i} = 1+2F\theta\quad, A_{i,i+1} = -F\theta, while right-hand side entry :math:`b_i` is .. math:: b_i = u^n_{i} + F(1-\theta) \frac{u^{n}_{i+1} - 2u^n_i + u^n_{i-1}}{\Delta x^2} + \Delta t\theta f_i^{n+1} + \Delta t(1-\theta)f_i^n{\thinspace .} The corresponding entries for the boundary points are as in the Backward Euler and Crank-Nicolson schemes listed earlier. Note that convergence rate testing with implementations of the theta rule must adjust the error expression according to which of the underlying schemes is actually being run. That is, if :math:`\theta=0` (i.e., Forward Euler) or :math:`\theta=1` (i.e., Backward Euler), there should be first order convergence, whereas with :math:`\theta=0.5` (i.e., Crank-Nicolson), one should get second order convergence (as outlined in previous sections). .. _diffu:pde1:theta:experiments: Experiments ----------- We can repeat the experiments from the section :ref:`diffu:pde1:FE:experiments` to see if the Backward Euler or Crank-Nicolson schemes have problems with sawtooth-like noise when starting with a discontinuous initial condition. We can also verify that we can have :math:`F>\frac{1}{2}`, which allows larger time steps than in the Forward Euler method. .. _diffu:pde1:BE:fig:F=0.5: .. figure:: plug_BE_F05.png :width: 800 Backward Euler scheme for :math:`F=0.5` The Backward Euler scheme always produces smooth solutions for any :math:`F`. Figure :ref:`diffu:pde1:BE:fig:F=0.5` shows one example. Note that the mathematical discontinuity at :math:`t=0` leads to a linear variation on a mesh, but the approximation to a jump becomes better as :math:`N_x` increases. In our simulation we specify :math:`\Delta t` and :math:`F`, and :math:`N_x` is set to :math:`L/\sqrt{{\alpha}\Delta t/F}`. Since :math:`N_x\sim\sqrt{F}`, the discontinuity looks sharper in the Crank-Nicolson simulations with larger :math:`F`. The Crank-Nicolson method produces smooth solutions for small :math:`F`, :math:`F\leq\frac{1}{2}`, but small noise gets more and more evident as :math:`F` increases. Figures :ref:`diffu:pde1:CN:fig:F=3` and :ref:`diffu:pde1:CN:fig:F=10` demonstrate the effect for :math:`F=3` and :math:`F=10`, respectively. The section :ref:`diffu:pde1:analysis` explains why such noise occur. .. _diffu:pde1:CN:fig:F=3: .. figure:: plug_CN_F3.png :width: 800 Crank-Nicolson scheme for :math:`F=3` .. _diffu:pde1:CN:fig:F=10: .. figure:: plug_CN_F10.png :width: 800 Crank-Nicolson scheme for :math:`F=10` The Laplace and Poisson equation -------------------------------- The Laplace equation, :math:`\nabla^2 u = 0`, and the Poisson equation, :math:`-\nabla^2 u = f`, occur in numerous applications throughout science and engineering. In 1D these equations read :math:`u''(x)=0` and :math:`-u''(x)=f(x)`, respectively. We can solve 1D variants of the Laplace equations with the listed software, because we can interpret :math:`u_{xx}=0` as the limiting solution of :math:`u_t = {\alpha} u_{xx}` when :math:`u` reaches a steady state limit where :math:`u_t\rightarrow 0`. Similarly, Poisson's equation :math:`-u_{xx}=f` arises from solving :math:`u_t = u_{xx} + f` and letting :math:`t\rightarrow\infty` so :math:`u_t\rightarrow 0`. Technically in a program, we can simulate :math:`t\rightarrow\infty` by just taking one large time step: :math:`\Delta t\rightarrow\infty`. In the limit, the Backward Euler scheme gives .. math:: -\frac{u^{n+1}_{i+1} - 2u^{n+1}_i + u^{n+1}_{i-1}}{\Delta x^2} = f^{n+1}_i, which is nothing but the discretization :math:`[-D_xD_x u = f]^{n+1}_i=0` of :math:`-u_{xx}=f`. The result above means that the Backward Euler scheme can solve the limit equation directly and hence produce a solution of the 1D Laplace equation. With the Forward Euler scheme we must do the time stepping since :math:`\Delta t > \Delta x^2/{\alpha}` is illegal and leads to instability. We may interpret this time stepping as solving the equation system from :math:`-u_{xx}=f` by iterating on a pseudo time variable. .. _diffu:pde1:analysis: Analysis of schemes for the diffusion equation ============================================== The numerical experiments in the sections :ref:`diffu:pde1:FE:experiments` and :ref:`diffu:pde1:theta:experiments` reveal that there are some numerical problems with the Forward Euler and Crank-Nicolson schemes: sawtooth-like noise is sometimes present in solutions that are, from a mathematical point of view, expected to be smooth. This section presents a mathematical analysis that explains the observed behavior and arrives at criteria for obtaining numerical solutions that reproduce the qualitative properties of the exact solutions. In short, we shall explain what is observed in Figures :ref:`diffu:pde1:FE:fig:F=0.5`-:ref:`diffu:pde1:CN:fig:F=10`. .. :ref:`diffu:pde1:FE:fig:F=0.5`, .. :ref:`diffu:pde1:FE:fig:F=0.25`, .. :ref:`diffu:pde1:FE:fig:F=0.51`, .. :ref:`diffu:pde1:FE:fig:gauss:F=0.5`, .. :ref:`diffu:pde1:BE:fig:F=0.5`, .. :ref:`diffu:pde1:CN:fig:F=3`, .. and .. :ref:`diffu:pde1:CN:fig:F=10`. .. _diffu:pde1:analysis:uex: Properties of the solution -------------------------- A particular characteristic of diffusive processes, governed by an equation like .. _Eq:diffu:pde1:eq: .. math:: \tag{390} u_t = {\alpha} u_{xx}, is that the initial shape :math:`u(x,0)=I(x)` spreads out in space with time, along with a decaying amplitude. Three different examples will illustrate the spreading of :math:`u` in space and the decay in time. Similarity solution ~~~~~~~~~~~~~~~~~~~ The diffusion equation :ref:`(390) ` admits solutions that depend on :math:`\eta = (x-c)/\sqrt{4{\alpha} t}` for a given value of :math:`c`. One particular solution is .. _Eq:diffu:pdf1:erf:sol: .. math:: \tag{391} u(x,t) = a\,\mbox{erf}(\eta) + b, where .. _Eq:diffu:analysis:erf:def: .. math:: \tag{392} \mbox{erf}(\eta) = \frac{2}{\sqrt{\pi}}\int_0^\eta e^{-\zeta^2}d\zeta, is the *error function*, and :math:`a` and :math:`b` are arbitrary constants. The error function lies in :math:`(-1,1)`, is odd around :math:`\eta =0`, and goes relatively quickly to :math:`\pm 1`: .. math:: \begin{align*} \lim_{\eta\rightarrow -\infty}\mbox{erf}(\eta) &=-1,\\ \lim_{\eta\rightarrow \infty}\mbox{erf}(\eta) &=1,\\ \mbox{erf}(\eta) &= -\mbox{erf}(-\eta),\\ \mbox{erf}(0) &=0,\\ \mbox{erf}(2) &=0.99532227,\\ \mbox{erf}(3) &=0.99997791 {\thinspace .} \end{align*} As :math:`t\rightarrow 0`, the error function approaches a step function centered at :math:`x=c`. For a diffusion problem posed on the unit interval :math:`[0,1]`, we may choose the step at :math:`x=1/2` (meaning :math:`c=1/2`), :math:`a=-1/2`, :math:`b=1/2`. Then .. _Eq:diffu:analysis:pde1:step:erf:sol: .. math:: \tag{393} u(x,t) = \frac{1}{2}\left(1 - \mbox{erf}\left(\frac{x-\frac{1}{2}}{\sqrt{4{\alpha} t}}\right)\right) = \frac{1}{2}\mbox{erfc}\left(\frac{x-\frac{1}{2}}{\sqrt{4{\alpha} t}}\right), where we have introduced the *complementary error function* :math:`\mbox{erfc}(\eta) = 1-\mbox{erf}(\eta)`. The solution :ref:`(393) ` implies the boundary conditions .. _Eq:diffu:analysis:pde1:p1:erf:uL: .. math:: \tag{394} u(0,t) = \frac{1}{2}\left(1 - \mbox{erf}\left(\frac{-1/2}{\sqrt{4{\alpha} t}}\right)\right), .. _Eq:diffu:analysis:pde1:p1:erf:uR: .. math:: \tag{395} u(1,t) = \frac{1}{2}\left(1 - \mbox{erf}\left(\frac{1/2}{\sqrt{4{\alpha} t}}\right)\right) {\thinspace .} For small enough :math:`t`, :math:`u(0,t)\approx 1` and :math:`u(1,t)\approx 1`, but as :math:`t\rightarrow\infty`, :math:`u(x,t)\rightarrow 1/2` on :math:`[0,1]`. Solution for a Gaussian pulse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The standard diffusion equation :math:`u_t = {\alpha} u_{xx}` admits a Gaussian function as solution: .. _Eq:diffu:pde1:sol:Gaussian: .. math:: \tag{396} u(x,t) = \frac{1}{\sqrt{4\pi{\alpha} t}} \exp{\left({-\frac{(x-c)^2}{4{\alpha} t}}\right)} {\thinspace .} At :math:`t=0` this is a Dirac delta function, so for computational purposes one must start to view the solution at some time :math:`t=t_\epsilon>0`. Replacing :math:`t` by :math:`t_\epsilon +t` in :ref:`(396) ` makes it easy to operate with a (new) :math:`t` that starts at :math:`t=0` with an initial condition with a finite width. The important feature of :ref:`(396) ` is that the standard deviation :math:`\sigma` of a sharp initial Gaussian pulse increases in time according to :math:`\sigma = \sqrt{2{\alpha} t}`, making the pulse diffuse and flatten out. .. Mention combinations of such kernels to build up a general analytical sol? .. Or maybe an exercise for verification. Solution for a sine component ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Also, :ref:`(390) ` admits a solution of the form .. _Eq:diffu:pde1:sol1: .. math:: \tag{397} u(x,t) = Qe^{-at}\sin\left( kx\right) {\thinspace .} The parameters :math:`Q` and :math:`k` can be freely chosen, while inserting :ref:`(397) ` in :ref:`(390) ` gives the constraint .. math:: a = -{\alpha} k^2 {\thinspace .} A very important feature is that the initial shape :math:`I(x)=Q\sin kx` undergoes a damping :math:`\exp{(-{\alpha} k^2t)}`, meaning that rapid oscillations in space, corresponding to large :math:`k`, are very much faster dampened than slow oscillations in space, corresponding to small :math:`k`. This feature leads to a smoothing of the initial condition with time. (In fact, one can use a few steps of the diffusion equation as a method for removing noise in signal processing.) To judge how good a numerical method is, we may look at its ability to smoothen or dampen the solution in the same way as the PDE does. The following example illustrates the damping properties of :ref:`(397) `. We consider the specific problem .. math:: \begin{align*} u_t &= u_{xx},\quad x\in (0,1),\ t\in (0,T],\\ u(0,t) &= u(1,t) = 0,\quad t\in (0,T],\\ u(x,0) & = \sin (\pi x) + 0.1\sin(100\pi x) {\thinspace .} \end{align*} The initial condition has been chosen such that adding two solutions like :ref:`(397) ` constructs an analytical solution to the problem: .. _Eq:diffu:pde1:sol2: .. math:: \tag{398} u(x,t) = e^{-\pi^2 t}\sin (\pi x) + 0.1e^{-\pi^2 10^4 t}\sin (100\pi x) {\thinspace .} Figure :ref:`diffu:pde1:fig:damping` illustrates the rapid damping of rapid oscillations :math:`\sin (100\pi x)` and the very much slower damping of the slowly varying :math:`\sin (\pi x)` term. After about :math:`t=0.5\cdot10^{-4}` the rapid oscillations do not have a visible amplitude, while we have to wait until :math:`t\sim 0.5` before the amplitude of the long wave :math:`\sin (\pi x)` becomes very small. .. _diffu:pde1:fig:damping: .. figure:: diffusion_damping.png :width: 800 *Evolution of the solution of a diffusion problem: initial condition (upper left), 1/100 reduction of the small waves (upper right), 1/10 reduction of the long wave (lower left), and 1/100 reduction of the long wave (lower right)* .. x/sqrt(t) solution, kernel with integral Analysis of discrete equations ------------------------------ A counterpart to :ref:`(397) ` is the complex representation of the same function: .. math:: u(x,t) = Qe^{-at}e^{ikx}, where :math:`i=\sqrt{-1}` is the imaginary unit. We can add such functions, often referred to as wave components, to make a Fourier representation of a general solution of the diffusion equation: .. _Eq:diffu:pde1:u:Fourier: .. math:: \tag{399} u(x,t) \approx \sum_{k\in K} b_k e^{-{\alpha} k^2t}e^{ikx}, where :math:`K` is a set of an infinite number of :math:`k` values needed to construct the solution. In practice, however, the series is truncated and :math:`K` is a finite set of :math:`k` values needed to build a good approximate solution. Note that :ref:`(398) ` is a special case of :ref:`(399) ` where :math:`K=\{\pi, 100\pi\}`, :math:`b_{\pi}=1`, and :math:`b_{100\pi}=0.1`. The amplitudes :math:`b_k` of the individual Fourier waves must be determined from the initial condition. At :math:`t=0` we have :math:`u\approx\sum_kb_k\exp{(ikx)}` and find :math:`K` and :math:`b_k` such that .. _Eq:_auto171: .. math:: \tag{400} I(x) \approx \sum_{k\in K} b_k e^{ikx}{\thinspace .} (The relevant formulas for :math:`b_k` come from Fourier analysis, or equivalently, a least-squares method for approximating :math:`I(x)` in a function space with basis :math:`\exp{(ikx)}`.) Much insight about the behavior of numerical methods can be obtained by investigating how a wave component :math:`\exp{(-{\alpha} k^2 t)}\exp{(ikx)}` is treated by the numerical scheme. It appears that such wave components are also solutions of the schemes, but the damping factor :math:`\exp{(-{\alpha} k^2 t)}` varies among the schemes. To ease the forthcoming algebra, we write the damping factor as :math:`A^n`. The exact amplification factor corresponding to :math:`A` is :math:`{A_{\small\mbox{e}}} = \exp{(-{\alpha} k^2\Delta t)}`. .. _diffu:pde1:analysis:details: Analysis of the finite difference schemes ----------------------------------------- We have seen that a general solution of the diffusion equation can be built as a linear combination of basic components .. math:: e^{-{\alpha} k^2t}e^{ikx} {\thinspace .} A fundamental question is whether such components are also solutions of the finite difference schemes. This is indeed the case, but the amplitude :math:`\exp{(-{\alpha} k^2t)}` might be modified (which also happens when solving the ODE counterpart :math:`u'=-{\alpha} u`). We therefore look for numerical solutions of the form .. _Eq:diffu:pde1:analysis:uni: .. math:: \tag{401} u^n_q = A^n e^{ikq\Delta x} = A^ne^{ikx}, where the amplification factor :math:`A` must be determined by inserting the component into an actual scheme. Note that :math:`A^n` means :math:`A` raised to the power of :math:`n`, :math:`n` being the index in the time mesh, while the superscript :math:`n` in :math:`u^n_q` just denotes :math:`u` at time :math:`t_n`. Stability (3) ~~~~~~~~~~~~~~~~~~~~~~ The exact amplification factor is :math:`{A_{\small\mbox{e}}}=\exp{(-{\alpha}^2 k^2\Delta t)}`. We should therefore require :math:`|A| < 1` to have a decaying numerical solution as well. If :math:`-1\leq A<0`, :math:`A^n` will change sign from time level to time level, and we get stable, non-physical oscillations in the numerical solutions that are not present in the exact solution. .. index:: amplification factor Accuracy (1) ~~~~~~~~~~~~~~~~~~~~~ To determine how accurately a finite difference scheme treats one wave component :ref:`(401) `, we see that the basic deviation from the exact solution is reflected in how well :math:`A^n` approximates :math:`{A_{\small\mbox{e}}}^n`, or how well :math:`A` approximates :math:`{A_{\small\mbox{e}}}`. We can plot :math:`{A_{\small\mbox{e}}}` and the various expressions for :math:`A`, and we can make Taylor expansions of :math:`A/{A_{\small\mbox{e}}}` to see the error more analytically. .. We shall in particular investigate the error :math:`{A_{\small\mbox{e}}} - A` in the .. amplification factor. Truncation error (1) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As an alternative to examining the accuracy of the damping of a wave component, we can perform a general truncation error analysis as explained in :ref:`ch:trunc`. Such results are more general, but less detailed than what we get from the wave component analysis. The truncation error can almost always be computed and represents the error in the numerical model when the exact solution is substituted into the equations. In particular, the truncation error analysis tells the order of the scheme, which is of fundamental importance when verifying codes based on empirical estimation of convergence rates. .. _diffu:pde1:analysis:FE: Analysis of the Forward Euler scheme (2) ------------------------------------------------- .. 2DO: refer to vib and wave The Forward Euler finite difference scheme for :math:`u_t = {\alpha} u_{xx}` can be written as .. math:: [D_t^+ u = {\alpha} D_xD_x u]^n_q{\thinspace .} Inserting a wave component :ref:`(401) ` in the scheme demands calculating the terms .. math:: e^{ikq\Delta x}[D_t^+ A]^n = e^{ikq\Delta x}A^n\frac{A-1}{\Delta t}, and .. math:: A^nD_xD_x [e^{ikx}]_q = A^n\left( - e^{ikq\Delta x}\frac{4}{\Delta x^2} \sin^2\left(\frac{k\Delta x}{2}\right)\right) {\thinspace .} Inserting these terms in the discrete equation and dividing by :math:`A^n e^{ikq\Delta x}` leads to .. math:: \frac{A-1}{\Delta t} = -{\alpha} \frac{4}{\Delta x^2}\sin^2\left( \frac{k\Delta x}{2}\right), and consequently .. _Eq:_auto172: .. math:: \tag{402} A = 1 -4F\sin^2 p where .. _Eq:_auto173: .. math:: \tag{403} F = \frac{{\alpha}\Delta t}{\Delta x^2} is the *numerical Fourier number*, and :math:`p=k\Delta x/2`. The complete numerical solution is then .. _Eq:_auto174: .. math:: \tag{404} u^n_q = \left(1 -4F\sin^2 p\right)^ne^{ikq\Delta x} {\thinspace .} Stability (4) ~~~~~~~~~~~~~~~~~~~~~~ We easily see that :math:`A\leq 1`. However, the :math:`A` can be less than :math:`-1`, which will lead to growth of a numerical wave component. The criterion :math:`A\geq -1` implies .. math:: 4F\sin^2 (p/2)\leq 2 {\thinspace .} The worst case is when :math:`\sin^2 (p/2)=1`, so a sufficient criterion for stability is .. _Eq:_auto175: .. math:: \tag{405} F\leq {\frac{1}{2}}, or expressed as a condition on :math:`\Delta t`: .. _Eq:_auto176: .. math:: \tag{406} \Delta t\leq \frac{\Delta x^2}{2{\alpha}}{\thinspace .} Note that halving the spatial mesh size, :math:`\Delta x \rightarrow {\frac{1}{2}} \Delta x`, requires :math:`\Delta t` to be reduced by a factor of :math:`1/4`. The method hence becomes very expensive for fine spatial meshes. .. 2DO: verification based on exact solutions Accuracy (2) ~~~~~~~~~~~~~~~~~~~~~ Since :math:`A` is expressed in terms of :math:`F` and the parameter we now call :math:`p=k\Delta x/2`, we should also express :math:`{A_{\small\mbox{e}}}` by :math:`F` and :math:`p`. The exponent in :math:`{A_{\small\mbox{e}}}` is :math:`-{\alpha} k^2\Delta t`, which equals :math:`-F k^2\Delta x^2=-F4p^2`. Consequently, .. math:: {A_{\small\mbox{e}}} = \exp{(-{\alpha} k^2\Delta t)} = \exp{(-4Fp^2)} {\thinspace .} All our :math:`A` expressions as well as :math:`{A_{\small\mbox{e}}}` are now functions of the two dimensionless parameters :math:`F` and :math:`p`. Computing the Taylor series expansion of :math:`A/{A_{\small\mbox{e}}}` in terms of :math:`F` can easily be done with aid of ``sympy``: .. code-block:: python def A_exact(F, p): return exp(-4*F*p**2) def A_FE(F, p): return 1 - 4*F*sin(p)**2 from sympy import * F, p = symbols('F p') A_err_FE = A_FE(F, p)/A_exact(F, p) print A_err_FE.series(F, 0, 6) The result is .. math:: \frac{A}{{A_{\small\mbox{e}}}} = 1 - 4 F \sin^{2}p + 2F p^{2} - 16F^{2} p^{2} \sin^{2}p + 8 F^{2} p^{4} + \cdots Recalling that :math:`F={\alpha}\Delta t/\Delta x^2`, :math:`p=k\Delta x/2`, and that :math:`\sin^2p\leq 1`, we realize that the dominating terms in :math:`A/{A_{\small\mbox{e}}}` are at most .. math:: 1 - 4{\alpha} \frac{\Delta t}{\Delta x^2} + {\alpha}\Delta t - 4{\alpha}^2\Delta t^2 + {\alpha}^2 \Delta t^2\Delta x^2 + \cdots {\thinspace .} Truncation error (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We follow the theory explained in :ref:`ch:trunc`. The recipe is to set up the scheme in operator notation and use formulas from :ref:`trunc:table` to derive an expression for the residual. The details are documented in :ref:`trunc:diffu:1D`. We end up with a truncation error .. math:: R^n_i = {\mathcal{O}(\Delta t)} + {\mathcal{O}(\Delta x^2)}{\thinspace .} Although this is not the true error :math:`{u_{\small\mbox{e}}}(x_i,t_n) - u^n_i`, it indicates that the true error is of the form .. math:: E = C_t\Delta t + C_x\Delta x^2 for two unknown constants :math:`C_t` and :math:`C_x`. .. _diffu:pde1:analysis:BE: Analysis of the Backward Euler scheme ------------------------------------- Discretizing :math:`u_t = {\alpha} u_{xx}` by a Backward Euler scheme, .. math:: [D_t^- u = {\alpha} D_xD_x u]^n_q, and inserting a wave component :ref:`(401) `, leads to calculations similar to those arising from the Forward Euler scheme, but since .. math:: e^{ikq\Delta x}[D_t^- A]^n = A^ne^{ikq\Delta x}\frac{1 - A^{-1}}{\Delta t}, we get .. math:: \frac{1-A^{-1}}{\Delta t} = -{\alpha} \frac{4}{\Delta x^2}\sin^2\left( \frac{k\Delta x}{2}\right), and then .. _Eq:diffu:pde1:analysis:BE:A: .. math:: \tag{407} A = \left(1 + 4F\sin^2p\right)^{-1} {\thinspace .} The complete numerical solution can be written .. _Eq:_auto177: .. math:: \tag{408} u^n_q = \left(1 + 4F\sin^2 p\right)^{-n} e^{ikq\Delta x} {\thinspace .} Stability (5) ~~~~~~~~~~~~~~~~~~~~~~ We see from :ref:`(407) ` that :math:`00`. Truncation error (3) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The derivation of the truncation error for the Backward Euler scheme is almost identical to that for the Forward Euler scheme. We end up with .. math:: R^n_i = {\mathcal{O}(\Delta t)} + {\mathcal{O}(\Delta x^2)}{\thinspace .} .. _diffu:pde1:analysis:CN: Analysis of the Crank-Nicolson scheme ------------------------------------- The Crank-Nicolson scheme can be written as .. math:: [D_t u = {\alpha} D_xD_x \overline{u}^x]^{n+\frac{1}{2}}_q, or .. math:: [D_t u]^{n+\frac{1}{2}}_q = \frac{1}{2}{\alpha}\left( [D_xD_x u]^{n}_q + [D_xD_x u]^{n+1}_q\right) {\thinspace .} Inserting :ref:`(401) ` in the time derivative approximation leads to .. math:: [D_t A^n e^{ikq\Delta x}]^{n+\frac{1}{2}} = A^{n+\frac{1}{2}} e^{ikq\Delta x}\frac{A^{\frac{1}{2}}-A^{-\frac{1}{2}}}{\Delta t} = A^ne^{ikq\Delta x}\frac{A-1}{\Delta t} {\thinspace .} Inserting :ref:`(401) ` in the other terms and dividing by :math:`A^ne^{ikq\Delta x}` gives the relation .. math:: \frac{A-1}{\Delta t} = -\frac{1}{2}{\alpha}\frac{4}{\Delta x^2} \sin^2\left(\frac{k\Delta x}{2}\right) (1 + A), and after some more algebra, .. _Eq:_auto178: .. math:: \tag{409} A = \frac{ 1 - 2F\sin^2p}{1 + 2F\sin^2p} {\thinspace .} The exact numerical solution is hence .. _Eq:_auto179: .. math:: \tag{410} u^n_q = \left(\frac{ 1 - 2F\sin^2p}{1 + 2F\sin^2p}\right)^ne^{ikp\Delta x} {\thinspace .} Stability (6) ~~~~~~~~~~~~~~~~~~~~~~ The criteria :math:`A>-1` and :math:`A<1` are fulfilled for any :math:`\Delta t >0`. Therefore, the solution cannot grow, but it will oscillate if :math:`1-2F\sin^p < 0`. To avoid such non-physical oscillations, we must demand :math:`F\leq\frac{1}{2}`. Truncation error (4) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The truncation error is derived in :ref:`trunc:diffu:1D`: .. math:: R^{n+\frac{1}{2}}_i = {\mathcal{O}(\Delta x^2)} + {\mathcal{O}(\Delta t^2)}{\thinspace .} .. _diffu:pde1:analysis:leapfrog: Analysis of the Leapfrog scheme ------------------------------- An attractive feature of the Forward Euler scheme is the explicit time stepping and no need for solving linear systems. However, the accuracy in time is only :math:`{\mathcal{O}(\Delta t)}`. We can get an explicit *second-order* scheme in time by using the Leapfrog method: .. math:: [D_{2t} u = {\alpha} D_xDx u + f]^n_i{\thinspace .} Written out, .. math:: u^{n+1} = u^{n-1} + \frac{2{\alpha}\Delta t}{\Delta x^2} (u^{n}_{i+1} - 2u^n_i + u^n_{i-1}) + f(x_i,t_n){\thinspace .} We need some formula for the first step, :math:`u^1_i`, but for that we can use a Forward Euler step. Unfortunately, the Leapfrog scheme is always unstable for the diffusion equation. To see this, we insert a wave component :math:`A^ne^{ikx}` and get .. math:: \frac{A - A^{-1}}{\Delta t} = -{\alpha} \frac{4}{\Delta x^2}\sin^2 p, or .. math:: A^2 + 4F \sin^2 p\, A - 1 = 0, which has roots .. math:: A = -2F\sin^2 p \pm \sqrt{4F^2\sin^4 p + 1}{\thinspace .} Both roots have :math:`|A|>1` so the always amplitude grows, which is not in accordance with physics of the problem. However, for a PDE with a first-order derivative in space, instead of a second-order one, the Leapfrog scheme performs very well. Details are provided in the section :ref:`advec:1D:leapfrog`. Summary of accuracy of amplification factors -------------------------------------------- We can plot the various amplification factors against :math:`p=k\Delta x/2` for different choices of the :math:`F` parameter. Figures :ref:`diffu:pde1:fig:A:err:C20`, :ref:`diffu:pde1:fig:A:err:C0.5`, and :ref:`diffu:pde1:fig:A:err:C0.1` show how long and small waves are damped by the various schemes compared to the exact damping. As long as all schemes are stable, the amplification factor is positive, except for Crank-Nicolson when :math:`F>0.5`. .. _diffu:pde1:fig:A:err:C20: .. figure:: diffusion_A_F20_F2.png :width: 800 *Amplification factors for large time steps* .. _diffu:pde1:fig:A:err:C0.5: .. figure:: diffusion_A_F05_F025.png :width: 800 *Amplification factors for time steps around the Forward Euler stability limit* .. _diffu:pde1:fig:A:err:C0.1: .. figure:: diffusion_A_F01_F001.png :width: 800 *Amplification factors for small time steps* The effect of negative amplification factors is that :math:`A^n` changes sign from one time level to the next, thereby giving rise to oscillations in time in an animation of the solution. We see from Figure :ref:`diffu:pde1:fig:A:err:C20` that for :math:`F=20`, waves with :math:`p\geq \pi/4` undergo a damping close to :math:`-1`, which means that the amplitude does not decay and that the wave component jumps up and down (flips amplitude) in time. For :math:`F=2` we have a damping of a factor of 0.5 from one time level to the next, which is very much smaller than the exact damping. Short waves will therefore fail to be effectively dampened. These waves will manifest themselves as high frequency oscillatory noise in the solution. A value :math:`p=\pi/4` corresponds to four mesh points per wave length of :math:`e^{ikx}`, while :math:`p=\pi/2` implies only two points per wave length, which is the smallest number of points we can have to represent the wave on the mesh. To demonstrate the oscillatory behavior of the Crank-Nicolson scheme, we choose an initial condition that leads to short waves with significant amplitude. A discontinuous :math:`I(x)` will in particular serve this purpose: Figures :ref:`diffu:pde1:CN:fig:F=3` and :ref:`diffu:pde1:CN:fig:F=10` correspond to :math:`F=3` and :math:`F=10`, respectively, and we see how short waves pollute the overall solution. .. _diffu:2D:analysis: Analysis of the 2D diffusion equation ------------------------------------- We first consider the 2D diffusion equation .. math:: u_{t} = {\alpha}(u_{xx} + u_{yy}), which has Fourier component solutions of the form .. math:: u(x,y,t) = Ae^{-{\alpha} k^2t}e^{i(k_x x + k_yy)}, and the schemes have discrete versions of this Fourier component: .. math:: u^{n}_{q,r} = A\xi^{n}e^{i(k_x q\Delta x + k_y r\Delta y)}{\thinspace .} The Forward Euler scheme (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For the Forward Euler discretization, .. math:: [D_t^+u = {\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^n, we get .. math:: \frac{\xi - 1}{\Delta t} = -{\alpha}\frac{4}{\Delta x^2}\sin^2\left(\frac{k_x\Delta x}{2}\right) - {\alpha}\frac{4}{\Delta y^2}\sin^2\left(\frac{k_y\Delta y}{2}\right){\thinspace .} Introducing .. math:: p_x = \frac{k_x\Delta x}{2},\quad p_y = \frac{k_y\Delta y}{2}, we can write the equation for :math:`\xi` more compactly as .. math:: \frac{\xi - 1}{\Delta t} = -{\alpha}\frac{4}{\Delta x^2}\sin^2 p_x - {\alpha}\frac{4}{\Delta y^2}\sin^2 p_y, and solve for :math:`\xi`: .. _Eq:diffu:2D:analysis:xi: .. math:: \tag{411} \xi = 1 - 4F_x\sin^2 p_x - 4F_y\sin^2 p_y{\thinspace .} The complete numerical solution for a wave component is .. _Eq:diffu:2D:analysis:FE:numexact: .. math:: \tag{412} u^{n}_{q,r} = A(1 - 4F_x\sin^2 p_x - 4F_y\sin^2 p_y)^n e^{i(k_xp\Delta x + k_yq\Delta y)}{\thinspace .} For stability we demand :math:`-1\leq\xi\leq 1`, and :math:`-1\leq\xi` is the critical limit, since clearly :math:`\xi \leq 1`, and the worst case happens when the sines are at their maximum. The stability criterion becomes .. _Eq:diffu:2D:analysis:FE:stab: .. math:: \tag{413} F_x + F_y \leq \frac{1}{2}{\thinspace .} For the special, yet common, case :math:`\Delta x=\Delta y=h`, the stability criterion can be written as .. math:: \Delta t \leq \frac{h^2}{2d{\alpha}}, where :math:`d` is the number of space dimensions: :math:`d=1,2,3`. The Backward Euler scheme (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The Backward Euler method, .. math:: [D_t^-u = {\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^n, results in .. math:: 1 - \xi^{-1} = - 4F_x \sin^2 p_x - 4F_y \sin^2 p_y, and .. math:: \xi = (1 + 4F_x \sin^2 p_x + 4F_y \sin^2 p_y)^{-1}, which is always in :math:`(0,1]`. The solution for a wave component becomes .. _Eq:diffu:2D:analysis:BN:numexact: .. math:: \tag{414} u^{n}_{q,r} = A(1 + 4F_x\sin^2 p_x + 4F_y\sin^2 p_y)^{-n} e^{i(k_xq\Delta x + k_yr\Delta y)}{\thinspace .} The Crank-Nicolson scheme (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With a Crank-Nicolson discretization, .. math:: [D_tu]^{n+\frac{1}{2}}_{i,j} = \frac{1}{2} [{\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^{n+1} + \frac{1}{2} [{\alpha}(D_xD_x u + D_yD_y u)]_{i,j}^n, we have, after some algebra, .. math:: \xi = \frac{1 - 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{1 + 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{\thinspace .} The fraction on the right-hand side is always less than 1, so stability in the sense of non-growing wave components is guaranteed for all physical and numerical parameters. However, the fraction can become negative and result in non-physical oscillations. This phenomenon happens when .. math:: F_x\sin^2 p_x + F_x\sin^2p_y > \frac{1}{2}{\thinspace .} A criterion against non-physical oscillations is therefore .. math:: F_x + F_y \leq \frac{1}{2}, which is the same limit as the stability criterion for the Forward Euler scheme. The exact discrete solution is .. _Eq:diffu:2D:analysis:CN:numexact: .. math:: \tag{415} u^{n}_{q,r} = A \left( \frac{1 - 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{1 + 2(F_x\sin^2 p_x + F_x\sin^2p_y)} \right)^n e^{i(k_xq\Delta x + k_yr\Delta y)}{\thinspace .} Explanation of numerical artifacts ---------------------------------- The behavior of the solution generated by Forward Euler discretization in time (and centered differences in space) is summarized at the end of the section :ref:`diffu:pde1:FE:experiments`. Can we from the analysis above explain the behavior? We may start by looking at Figure :ref:`diffu:pde1:FE:fig:F=0.51` where :math:`F=0.51`. The figure shows that the solution is unstable and grows in time. The stability limit for such growth is :math:`F=0.5` and since the :math:`F` in this simulation is slightly larger, growth is unavoidable. Figure :ref:`diffu:pde1:FE:fig:F=0.5` has unexpected features: we would expect the solution of the diffusion equation to be smooth, but the graphs in Figure :ref:`diffu:pde1:FE:fig:F=0.5` contain non-smooth noise. Turning to Figure :ref:`diffu:pde1:FE:fig:gauss:F=0.5`, which has a quite similar initial condition, we see that the curves are indeed smooth. The problem with the results in Figure :ref:`diffu:pde1:FE:fig:F=0.5` is that the initial condition is discontinuous. To represent it, we need a significant amplitude on the shortest waves in the mesh. However, for :math:`F=0.5`, the shortest wave (:math:`p=\pi/2`) gives the amplitude in the numerical solution as :math:`(1-4F)^n`, which oscillates between negative and positive values at subsequent time levels for :math:`F>\frac{1}{4}`. Since the shortest waves have visible amplitudes in the solution profile, the oscillations becomes visible. The smooth initial condition in Figure :ref:`diffu:pde1:FE:fig:gauss:F=0.5`, on the other hand, leads to very small amplitudes of the shortest waves. That these waves then oscillate in a non-physical way for :math:`F=0.5` is not a visible effect. The oscillations in time in the amplitude :math:`(1-4F)^n` disappear for :math:`F\leq\frac{1}{4}`, and that is why also the discontinuous initial condition always leads to smooth solutions in Figure :ref:`diffu:pde1:FE:fig:F=0.25`, where :math:`F=\frac{1}{4}`. Turning the attention to the Backward Euler scheme and the experiments in Figure :ref:`diffu:pde1:BE:fig:F=0.5`, we see that even the discontinuous initial condition gives smooth solutions for :math:`F=0.5` (and in fact all other :math:`F` values). From the exact expression of the numerical amplitude, :math:`(1 + 4F\sin^2p)^{-1}`, we realize that this factor can never flip between positive and negative values, and no instabilities can occur. The conclusion is that the Backward Euler scheme always produces smooth solutions. Also, the Backward Euler scheme guarantees that the solution cannot grow in time (unless we add a source term to the PDE, but that is meant to represent a physically relevant growth). Finally, we have some small, strange artifacts when simulating the development of the initial plug profile with the Crank-Nicolson scheme, see Figure :ref:`diffu:pde1:CN:fig:F=10`, where :math:`F=3`. The Crank-Nicolson scheme cannot give growing amplitudes, but it may give oscillating amplitudes in time. The critical factor is :math:`1 - 2F\sin^2p`, which for the shortest waves (:math:`p=\pi/2`) indicates a stability limit :math:`F=0.5`. With the discontinuous initial condition, we have enough amplitude on the shortest waves so their wrong behavior is visible, and this is what we see as small instabilities in Figure :ref:`diffu:pde1:CN:fig:F=10`. The only remedy is to lower the :math:`F` value. Exercises (6) ====================== .. --- begin exercise --- .. _diffu:exer:1D:gaussian:symmetric: Exercise 3.1: Explore symmetry in a 1D problem ---------------------------------------------- This exercise simulates the exact solution :ref:`(396) `. Suppose for simplicity that :math:`c=0`. **a)** Formulate an initial-boundary value problem that has :ref:`(396) ` as solution in the domain :math:`[-L,L]`. Use the exact solution :ref:`(396) ` as Dirichlet condition at the boundaries. Simulate the diffusion of the Gaussian peak. Observe that the solution is symmetric around :math:`x=0`. **b)** Show from :ref:`(396) ` that :math:`u_x(c,t)=0`. Since the solution is symmetric around :math:`x=c=0`, we can solve the numerical problem in half of the domain, using a *symmetry boundary condition* :math:`u_x=0` at :math:`x=0`. Set up the initial-boundary value problem in this case. Simulate the diffusion problem in :math:`[0,L]` and compare with the solution in a). .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) Filename: ``diffu_symmetric_gaussian``. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:1D:ux:onesided: Exercise 3.2: Investigate approximation errors from a :math:`u_x=0` boundary condition -------------------------------------------------------------------------------------- We consider the problem solved in :ref:`diffu:exer:1D:gaussian:symmetric` part b). The boundary condition :math:`u_x(0,t)=0` can be implemented in two ways: 1) by a standard symmetric finite difference :math:`[D_{2x}u]_i^n=0`, or 2) by a one-sided difference :math:`[D^+u=0]^n_i=0`. Investigate the effect of these two conditions on the convergence rate in space. .. --- begin hint in exercise --- **Hint.** If you use a Forward Euler scheme, choose a discretization parameter :math:`h=\Delta t = \Delta x^2` and assume the error goes like :math:`E\sim h^r`. The error in the scheme is :math:`{\mathcal{O}(\Delta t,\Delta x^2)}` so one should expect that the estimated :math:`r` approaches 1. The question is if a one-sided difference approximation to :math:`u_x(0,t)=0` destroys this convergence rate. .. --- end hint in exercise --- Filename: ``diffu_onesided_fd``. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:1D:openBC: Exercise 3.3: Experiment with open boundary conditions in 1D ------------------------------------------------------------ We address diffusion of a Gaussian function as in :ref:`diffu:exer:1D:gaussian:symmetric`, in the domain :math:`[0,L]`, but now we shall explore different types of boundary conditions on :math:`x=L`. In real-life problems we do not know the exact solution on :math:`x=L` and must use something simpler. **a)** Imagine that we want to solve the problem numerically on :math:`[0,L]`, with a symmetry boundary condition :math:`u_x=0` at :math:`x=0`, but we do not know the exact solution and cannot of that reason assign a correct Dirichlet condition at :math:`x=L`. One idea is to simply set :math:`u(L,t)=0` since this will be an accurate approximation before the diffused pulse reaches :math:`x=L` and even thereafter it might be a satisfactory condition if the exact :math:`u` has a small value. Let :math:`{u_{\small\mbox{e}}}` be the exact solution and let :math:`u` be the solution of :math:`u_t={\alpha} u_{xx}` with an initial Gaussian pulse and the boundary conditions :math:`u_x(0,t)=u(L,t)=0`. Derive a diffusion problem for the error :math:`e={u_{\small\mbox{e}}} - u`. Solve this problem numerically using an exact Dirichlet condition at :math:`x=L`. Animate the evolution of the error and make a curve plot of the error measure .. math:: E(t)=\sqrt{\frac{\int_0^L e^2dx}{\int_0^L udx}}{\thinspace .} Is this a suitable error measure for the present problem? **b)** Instead of using :math:`u(L,t)=0` as approximate boundary condition for letting the diffused Gaussian pulse move out of our finite domain, one may try :math:`u_x(L,t)=0` since the solution for large :math:`t` is quite flat. Argue that this condition gives a completely wrong asymptotic solution as :math:`t\rightarrow 0`. To do this, integrate the diffusion equation from :math:`0` to :math:`L`, integrate :math:`u_{xx}` by parts (or use Gauss' divergence theorem in 1D) to arrive at the important property .. math:: \frac{d}{dt}\int_{0}^L u(x,t)dx = 0, implying that :math:`\int_0^Ludx` must be constant in time, and therefore .. math:: \int_{0}^L u(x,t)dx = \int_{0}^LI(x)dx{\thinspace .} The integral of the initial pulse is 1. **c)** Another idea for an artificial boundary condition at :math:`x=L` is to use a cooling law .. _Eq:diffu:pde1:Gaussian:xL:cooling: .. math:: \tag{416} -{\alpha} u_x = q(u - u_S), where :math:`q` is an unknown heat transfer coefficient and :math:`u_S` is the surrounding temperature in the medium outside of :math:`[0,L]`. (Note that arguing that :math:`u_S` is approximately :math:`u(L,t)` gives the :math:`u_x=0` condition from the previous subexercise that is qualitatively wrong for large :math:`t`.) Develop a diffusion problem for the error in the solution using :ref:`(416) ` as boundary condition. Assume one can take :math:`u_S=0` "outside the domain" since :math:`{u_{\small\mbox{e}}}\rightarrow 0` as :math:`x\rightarrow\infty`. Find a function :math:`q=q(t)` such that the exact solution obeys the condition :ref:`(416) `. Test some constant values of :math:`q` and animate how the corresponding error function behaves. Also compute :math:`E(t)` curves as defined above. Filename: ``diffu_open_BC``. .. --- end exercise --- .. --- begin exercise --- Exercise 3.4: Simulate a diffused Gaussian peak in 2D/3D -------------------------------------------------------- **a)** Generalize :ref:`(396) ` to multi dimensions by assuming that one-dimensional solutions can be multiplied to solve :math:`u_t = {\alpha}\nabla^2 u`. Set :math:`c=0` such that the peak of the Gaussian is at the origin. **b)** One can from the exact solution show that :math:`u_x=0` on :math:`x=0`, :math:`u_y=0` on :math:`y=0`, and :math:`u_z=0` on :math:`z=0`. The approximately correct condition :math:`u=0` can be set on the remaining boundaries (say :math:`x=L`, :math:`y=L`, :math:`z=L`), cf. :ref:`diffu:exer:1D:openBC`. Simulate a 2D case and make an animation of the diffused Gaussian peak. **c)** The formulation in b) makes use of symmetry of the solution such that we can solve the problem in the first quadrant (2D) or octant (3D) only. To check that the symmetry assumption is correct, formulate the problem without symmetry in a domain :math:`[-L,L]\times [L,L]` in 2D. Use :math:`u=0` as approximately correct boundary condition. Simulate the same case as in b), but in a four times as large domain. Make an animation and compare it with the one in b). Filename: ``diffu_symmetric_gaussian_2D``. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:uterm: Exercise 3.5: Examine stability of a diffusion model with a source term ----------------------------------------------------------------------- Consider a diffusion equation with a linear :math:`u` term: .. math:: u_t = {\alpha} u_{xx} + \beta u{\thinspace .} **a)** Derive in detail the Forward Euler, Backward Euler, and Crank-Nicolson schemes for this type of diffusion model. Thereafter, formulate a :math:`\theta`-rule to summarize the three schemes. **b)** Assume a solution like :ref:`(397) ` and find the relation between :math:`a`, :math:`k`, :math:`{\alpha}`, and :math:`\beta`. .. --- begin hint in exercise --- **Hint.** Insert :ref:`(397) ` in the PDE problem. .. --- end hint in exercise --- **c)** Calculate the stability of the Forward Euler scheme. Design numerical experiments to confirm the results. .. --- begin hint in exercise --- **Hint.** Insert the discrete counterpart to :ref:`(397) ` in the numerical scheme. Run experiments at the stability limit and slightly above. .. --- end hint in exercise --- **d)** Repeat c) for the Backward Euler scheme. **e)** Repeat c) for the Crank-Nicolson scheme. **f)** How does the extra term :math:`bu` impact the accuracy of the three schemes? .. --- begin hint in exercise --- **Hint.** For analysis of the accuracy, compare the numerical and exact amplification factors, in graphs and/or by Taylor series expansion. .. --- end hint in exercise --- Filename: ``diffu_stability_uterm``. .. --- end exercise --- .. _diffu:varcoeff: Diffusion in heterogeneous media ================================ .. index:: single: diffusion coefficient; non-constant Diffusion in heterogeneous media normally implies a non-constant diffusion coefficient :math:`\alpha = \alpha (x)`. A 1D diffusion model with such a variable diffusion coefficient reads .. _Eq:diffu:pde2: .. math:: \tag{417} \frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left( \alpha (x) \frac{\partial u}{\partial x} \right) + f(x,t), \quad x\in (0,L),\ t\in (0,T], .. _Eq:diffu:pde2:ic:u: .. math:: \tag{418} u(x,0) = I(x), \quad x\in [0,L], .. _Eq:diffu:pde2:bc:0: .. math:: \tag{419} u(0,t) = U_0, \quad t>0, .. _Eq:diffu:pde2:bc:L: .. math:: \tag{420} u(L,t) = U_L, \quad t>0. A short form of the diffusion equation with variable coefficients is :math:`u_t = (\alpha u_x)_x`. .. _diffu:varcoeff:discr: Discretization (2) --------------------------- We can discretize :ref:`(417) ` by a :math:`\theta`-rule in time and centered differences in space: .. math:: \lbrack D_t u\rbrack^{n+\frac{1}{2}}_i = \theta\lbrack D_x(\overline{{\alpha}}^x D_x u) + f\rbrack^{n+1}_i + (1-\theta)\lbrack D_x(\overline{{\alpha}}^x D_x u) + f\rbrack^{n}_i{\thinspace .} Written out, this becomes .. math:: \begin{align*} \frac{u^{n+1}_i-u^{n}_i}{\Delta t} &= \theta\frac{1}{\Delta x^2} ({\alpha}_{i+\frac{1}{2}}(u^{n+1}_{i+1} - u^{n+1}_{i}) - {\alpha}_{i-\frac{1}{2}}(u^{n+1}_i - u^{n+1}_{i+1})) +\\ &\quad (1-\theta)\frac{1}{\Delta x^2} ({\alpha}_{i+\frac{1}{2}}(u^{n}_{i+1} - u^{n}_{i}) - {\alpha}_{i-\frac{1}{2}}(u^{n}_i - u^{n}_{i+1})) +\\ &\quad \theta f_i^{n+1} + (1-\theta)f_i^{n}, \end{align*} where, e.g., an arithmetic mean can to be used for :math:`{\alpha}_{i+\frac{1}{2}}`: .. math:: {\alpha}_{i+\frac{1}{2}} = \frac{1}{2}({\alpha}_i + {\alpha}_{i+1}){\thinspace .} .. _diffu:varcoeff:impl: Implementation (8) --------------------------- .. index:: single: diffusion equation; 1D, Implementation Suitable code for solving the discrete equations is very similar to what we created for a constant :math:`{\alpha}`. Since the Fourier number has no meaning for varying :math:`{\alpha}`, we introduce a related parameter :math:`D=\Delta t /\Delta x^2`. .. code-block:: python def solver_theta(I, a, L, Nx, D, T, theta=0.5, u_L=1, u_R=0, user_action=None): x = linspace(0, L, Nx+1) # mesh points in space dx = x[1] - x[0] dt = D*dx**2 Nt = int(round(T/float(dt))) t = linspace(0, T, Nt+1) # mesh points in time u = zeros(Nx+1) # solution array at t[n+1] u_n = zeros(Nx+1) # solution at t[n] Dl = 0.5*D*theta Dr = 0.5*D*(1-theta) # Representation of sparse matrix and right-hand side diagonal = zeros(Nx+1) lower = zeros(Nx) upper = zeros(Nx) b = zeros(Nx+1) # Precompute sparse matrix (scipy format) diagonal[1:-1] = 1 + Dl*(a[2:] + 2*a[1:-1] + a[:-2]) lower[:-1] = -Dl*(a[1:-1] + a[:-2]) upper[1:] = -Dl*(a[2:] + a[1:-1]) # Insert boundary conditions diagonal[0] = 1 upper[0] = 0 diagonal[Nx] = 1 lower[-1] = 0 A = scipy.sparse.diags( diagonals=[diagonal, lower, upper], offsets=[0, -1, 1], shape=(Nx+1, Nx+1), format='csr') # Set initial condition for i in range(0,Nx+1): u_n[i] = I(x[i]) if user_action is not None: user_action(u_n, x, t, 0) # Time loop for n in range(0, Nt): b[1:-1] = u_n[1:-1] + Dr*( (a[2:] + a[1:-1])*(u_n[2:] - u_n[1:-1]) - (a[1:-1] + a[0:-2])*(u_n[1:-1] - u_n[:-2])) # Boundary conditions b[0] = u_L(t[n+1]) b[-1] = u_R(t[n+1]) # Solve u[:] = scipy.sparse.linalg.spsolve(A, b) if user_action is not None: user_action(u, x, t, n+1) # Switch variables before next step u_n, u = u, u_n The code is found in the file `diffu1D_vc.py `__. .. _diffu:varcoeff:stationary: Stationary solution ------------------- .. index:: single: diffusion equation; stationary solution As :math:`t\rightarrow\infty`, the solution of the problem :ref:`(417) `-:ref:`(420) ` will approach a stationary limit where :math:`\partial u/\partial t=0`. The governing equation is then .. _Eq:diffu:fd2:pde:st: .. math:: \tag{421} \frac{d}{dx}\left(\alpha\frac{du}{dx}\right) =0, with boundary conditions :math:`u(0)=U_0` and :math:`u(L)=u_L`. It is possible to obtain an exact solution of :ref:`(421) ` for any :math:`\alpha`. Integrating twice and applying the boundary conditions to determine the integration constants gives .. _Eq:diffu:fd2:pde:st:sol: .. math:: \tag{422} u(x) = U_0 + (U_L-U_0)\frac{\int_0^x (\alpha(\xi))^{-1}d\xi}{\int_0^L (\alpha(\xi))^{-1}d\xi} {\thinspace .} .. _diffu:varcoeff:piecewise: Piecewise constant medium ------------------------- .. index:: single: diffusion coefficient; piecewise constant Consider a medium built of :math:`M` layers. The layer boundaries are denoted :math:`b_0, \ldots, b_M`, where :math:`b_0=0` and :math:`b_M=L`. If the layers potentially have different material properties, but these properties are constant within each layer, we can express :math:`\alpha` as a *piecewise constant function* according to .. _Eq:diffu:fd2:pde:st:pc:alpha: .. math:: \tag{423} \alpha (x) = \left\lbrace\begin{array}{ll} \alpha_0,& b_0 \leq x < b_1,\\ \vdots &\\ \alpha_i,& b_i \leq x < b_{i+1},\\ \vdots &\\ \alpha_{M-1},& b_{M-1} \leq x \leq b_M. \end{array}\right. The exact solution :ref:`(422) ` in case of such a piecewise constant :math:`\alpha` function is easy to derive. Assume that :math:`x` is in the :math:`m`-th layer: :math:`x\in [b_m, b_{m+1}]`. In the integral :math:`\int_0^x (a(\xi))^{-1}d\xi` we must integrate through the first :math:`m-1` layers and then add the contribution from the remaining part :math:`x-b_m` into the :math:`m`-th layer: .. _Eq:diffu:fd2:pde:st:sol:pc: .. math:: \tag{424} u(x) = U_0 + (U_L-U_0) \frac{\sum_{j=0}^{m-1} (b_{j+1}-b_j)/\alpha(b_j) + (x-b_m)/\alpha(b_m)}{\sum_{j=0}^{M-1} (b_{j+1}-b_j)/\alpha(b_j)} **Remark.** It may sound strange to have a discontinuous :math:`\alpha` in a differential equation where one is to differentiate, but a discontinuous :math:`\alpha` is compensated by a discontinuous :math:`u_x` such that :math:`\alpha u_x` is continuous and therefore can be differentiated as :math:`(\alpha u_x)_x`. .. _diffu:varcoeff:impl:piecewise: Implementation of diffusion in a piecewise constant medium ---------------------------------------------------------- .. index:: single: diffusion equation; implementation Programming with piecewise function definitions quickly becomes cumbersome as the most naive approach is to test for which interval :math:`x` lies, and then start evaluating a formula like :ref:`(424) `. In Python, vectorized expressions may help to speed up the computations. The convenience classes ``PiecewiseConstant`` and ``IntegratedPiecewiseConstant`` in the `Heaviside `__ module were made to simplify programming with functions like :ref:`(423) ` and expressions like :ref:`(424) `. These utilities not only represent piecewise constant functions, but also *smoothed* versions of them where the discontinuities can be smoothed out in a controlled fashion. The ``PiecewiseConstant`` class is created by sending in the domain as a 2-tuple or 2-list and a ``data`` object describing the boundaries :math:`b_0,\ldots,b_M` and the corresponding function values :math:`\alpha_0,\ldots,\alpha_{M-1}`. More precisely, ``data`` is a nested list, where ``data[i][0]`` holds :math:`b_i` and ``data[i][1]`` holds the corresponding value :math:`\alpha_i`, for :math:`i=0,\ldots,M-1`. Given :math:`b_i` and :math:`\alpha_i` in arrays ``b`` and ``a``, it is easy to fill out the nested list ``data``. In our application, we want to represent :math:`\alpha` and :math:`1/\alpha` as piecewise constant functions, in addition to the :math:`u(x)` function which involves the integrals of :math:`1/\alpha`. A class creating the functions we need and a method for evaluating :math:`u`, can take the form .. code-block:: python class SerialLayers: """ b: coordinates of boundaries of layers, b[0] is left boundary and b[-1] is right boundary of the domain [0,L]. a: values of the functions in each layer (len(a) = len(b)-1). U_0: u(x) value at left boundary x=0=b[0]. U_L: u(x) value at right boundary x=L=b[0]. """ def __init__(self, a, b, U_0, U_L, eps=0): self.a, self.b = np.asarray(a), np.asarray(b) self.eps = eps # smoothing parameter for smoothed a self.U_0, self.U_L = U_0, U_L a_data = [[bi, ai] for bi, ai in zip(self.b, self.a)] domain = [b[0], b[-1]] self.a_func = PiecewiseConstant(domain, a_data, eps) # inv_a = 1/a is needed in formulas inv_a_data = [[bi, 1./ai] for bi, ai in zip(self.b, self.a)] self.inv_a_func = \ PiecewiseConstant(domain, inv_a_data, eps) self.integral_of_inv_a_func = \ IntegratedPiecewiseConstant(domain, inv_a_data, eps) # Denominator in the exact formula is constant self.inv_a_0L = self.integral_of_inv_a_func(b[-1]) def __call__(self, x): solution = self.U_0 + (self.U_L-self.U_0)*\ self.integral_of_inv_a_func(x)/self.inv_a_0L return solution A visualization method is also convenient to have. Below we plot :math:`u(x)` along with :math:`\alpha (x)` (which works well as long as :math:`\max \alpha(x)` is of the same size as :math:`\max u = \max(U_0,U_L)`). .. code-block:: python class SerialLayers: ... def plot(self): x, y_a = self.a_func.plot() x = np.asarray(x); y_a = np.asarray(y_a) y_u = self.u_exact(x) import matplotlib.pyplot as plt plt.figure() plt.plot(x, y_u, 'b') plt.hold('on') # Matlab style plt.plot(x, y_a, 'r') ymin = -0.1 ymax = 1.2*max(y_u.max(), y_a.max()) plt.axis([x[0], x[-1], ymin, ymax]) plt.legend(['solution $u$', 'coefficient $a$'], loc='upper left') if self.eps > 0: plt.title('Smoothing eps: %s' % self.eps) plt.savefig('tmp.pdf') plt.savefig('tmp.png') plt.show() Figure :ref:`diffu:fd2:pde:st:sol:pc:fig1` shows the case where .. code-block:: python b = [0, 0.25, 0.5, 1] # material boundaries a = [0.2, 0.4, 4] # material values U_0 = 0.5; U_L = 5 # boundary conditions .. _diffu:fd2:pde:st:sol:pc:fig1: .. figure:: flow_in_layers_case1.png :width: 400 *Solution of the stationary diffusion equation corresponding to a piecewise constant diffusion coefficient* By adding the ``eps`` parameter to the constructor of the ``SerialLayers`` class, we can experiment with smoothed versions of :math:`\alpha` and see the (small) impact on :math:`u`. Figure :ref:`diffu:fd2:pde:st:sol:pc:fig2` shows the result. .. _diffu:fd2:pde:st:sol:pc:fig2: .. figure:: flow_in_layers_case1_eps.png :width: 400 Solution of the stationary diffusion equation corresponding to a *smoothed* piecewise constant diffusion coefficient .. _diffu:fd2:radial: Axi-symmetric diffusion ----------------------- .. index:: single: diffusion equation; axi-symmetric diffusion Suppose we have a diffusion process taking place in a straight tube with radius :math:`R`. We assume axi-symmetry such that :math:`u` is just a function of :math:`r` and :math:`t`, :math:`r` being the radial distance from the center axis of the tube to a point. With such axi-symmetry it is advantageous to introduce *cylindrical coordinates* :math:`r`, :math:`\theta`, and :math:`z`, where :math:`z` is in the direction of the tube and :math:`(r,\theta)` are polar coordinates in a cross section. Axi-symmetry means that all quantities are independent of :math:`\theta`. From the relations :math:`x=\cos\theta`, :math:`y=\sin\theta`, and :math:`z=z`, between Cartesian and cylindrical coordinates, one can (with some effort) derive the diffusion equation in cylindrical coordinates, which with axi-symmetry takes the form .. math:: \frac{\partial u}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r} \left(r{\alpha}(r,z)\frac{\partial u}{\partial r}\right) + \frac{\partial}{\partial z} \left(\alpha(r,z)\frac{\partial u}{\partial z}\right) + f(r,z,t){\thinspace .} Let us assume that :math:`u` does not change along the tube axis so it suffices to compute variations in a cross section. Then :math:`\partial u/\partial z = 0` and the we have a 1D diffusion equation in the radial coordinate :math:`r` and time :math:`t`. In particular, we shall address the initial-boundary value problem .. _Eq:diffu:fd2:radial:PDE: .. math:: \tag{425} \frac{\partial u}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r} \left(r{\alpha}(r)\frac{\partial u}{\partial r}\right) + f(t), r\in (0,R),\ t\in (0,T], .. _Eq:diffu:fd2:radial:symmr0: .. math:: \tag{426} \frac{\partial u}{\partial r}(0,t) = 0, t\in (0,T], .. _Eq:diffu:fd2:radial:uR: .. math:: \tag{427} u(R,t) = 0, t\in (0,T], .. _Eq:diffu:fd2:radial:initial: .. math:: \tag{428} u(r,0) = I(r), r\in [0,R]. The condition :ref:`(426) ` is a necessary symmetry condition at :math:`r=0`, while :ref:`(427) ` could be any Dirichlet or Neumann condition (or Robin condition in case of cooling or heating). The finite difference approximation will need the discretized version of the PDE for :math:`r=0` (just as we use the PDE at the boundary when implementing Neumann conditions). However, discretizing the PDE at :math:`r=0` poses a problem because of the :math:`1/r` factor. We therefore need to work out the PDE for discretization at :math:`r=0` with care. Let us, for the case of constant :math:`{\alpha}`, expand the spatial derivative term to .. math:: \alpha\frac{\partial^2 u}{\partial r^2} + \alpha\frac{1}{r}\frac{\partial u}{\partial r}{\thinspace .} The last term faces a difficulty at :math:`r=0`, since it becomes a :math:`0/0` expression caused by the symmetry condition at :math:`r=0`. However, L'Hosptial's rule can be used: .. math:: \lim_{r\rightarrow 0} \frac{1}{r}\frac{\partial u}{\partial r} = \frac{\partial^2 u}{\partial r^2}{\thinspace .} The PDE at :math:`r=0` therefore becomes .. _Eq:diffu:fd2:radial:eq_PDEr0:aconst: .. math:: \tag{429} \frac{\partial u}{\partial t} = 2{\alpha}\frac{\partial^2 u}{\partial r^2} + f(t){\thinspace .} For a variable coefficient :math:`{\alpha}(r)` the expanded spatial derivative term reads .. math:: {\alpha}(r)\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}({\alpha}(r) + r{\alpha}'(r))\frac{\partial u}{\partial r}{\thinspace .} We are interested in this expression for :math:`r=0`. A necessary condition for :math:`u` to be axi-symmetric is that all input data, including :math:`\alpha`, must also be axi-symmetric, implying that :math:`\alpha'(0)=0` (the second term vanishes anyway because of :math:`r=0`). The limit of interest is .. math:: \lim_{r\rightarrow 0} \frac{1}{r}{\alpha}(r)\frac{\partial u}{\partial r} = {\alpha}(0)\frac{\partial^2 u}{\partial r^2}{\thinspace .} The PDE at :math:`r=0` now looks like .. _Eq:diffu:fd2:radial:eq_PDEr0:avar: .. math:: \tag{430} \frac{\partial u}{\partial t} = 2{\alpha}(0) \frac{\partial^2 u}{\partial r^2} + f(t), so there is no essential difference between the constant coefficient and variable coefficient cases. The second-order derivative in :ref:`(429) ` and :ref:`(430) ` is discretized in the usual way. .. math:: 2{\alpha}\frac{\partial^2}{\partial r^2}u(r_0,t_n) \approx [2{\alpha} 2D_rD_r u]^n_0 = 2{\alpha} \frac{u^{n}_{1} - 2u^{n}_0 + u^n_{-1}}{\Delta r^2}{\thinspace .} The fictitious value :math:`u^n_{-1}` can be eliminated using the discrete symmetry condition .. math:: [D_{2r} u =0]^n_0 \quad\Rightarrow\quad u^n_{-1} = u^n_1, which then gives the modified approximation to the term with the second-order derivative of :math:`u` in :math:`r` at :math:`r=0`: .. _Eq:_auto180: .. math:: \tag{431} 4{\alpha} \frac{u^{n}_{1} - u^{n}_0}{\Delta r^2}{\thinspace .} The discretization of the term with the second-order derivative in :math:`r` at any internal mesh point is straightforward: .. math:: \begin{align*} \left[\frac{1}{r}\frac{\partial}{\partial r} \left(r{\alpha}\frac{\partial u}{\partial r}\right)\right]_i^n & \approx [r^{-1} D_r (r {\alpha} D_r u)]_i^n\\ &= \frac{1}{r_i}\frac{1}{\Delta r^2}\left( r_{i+\frac{1}{2}}{\alpha}_{i+\frac{1}{2}}(u_{i+1}^n - u_i^n) - r_{i-\frac{1}{2}}{\alpha}_{i-\frac{1}{2}}(u_{i}^n - u_{i-1}^n)\right){\thinspace .} \end{align*} To complete the discretization, we need a scheme in time, but that can be done as before and does not interfere with the discretization in space. .. _diffu:fd2:spherical: Spherically-symmetric diffusion ------------------------------- .. index:: single: diffusion equation; spherically-symmetric diffusion Discretization in spherical coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let us now pose the problem from the section :ref:`diffu:fd2:radial` in spherical coordinates, where :math:`u` only depends on the radial coordinate :math:`r` and time :math:`t`. That is, we have spherical symmetry. For simplicity we restrict the diffusion coefficient :math:`{\alpha}` to be a constant. The PDE reads .. _Eq:_auto181: .. math:: \tag{432} \frac{\partial u}{\partial t} = \frac{{\alpha}}{r^\gamma}\frac{\partial}{\partial r} \left(r^\gamma\frac{\partial u}{\partial r}\right) + f(t), for :math:`r\in (0,R)` and :math:`t\in (0,T]`. The parameter :math:`\gamma` is 2 for spherically-symmetric problems and 1 for axi-symmetric problems. The boundary and initial conditions have the same mathematical form as in :ref:`(425) `-:ref:`(428) `. Since the PDE in spherical coordinates has the same form as the PDE in the section :ref:`diffu:fd2:radial`, just with the :math:`\gamma` parameter being different, we can use the same discretization approach. At the origin :math:`r=0` we get problems with the term .. math:: \frac{\gamma}{r}\frac{\partial u}{\partial t}, but L'Hosptial's rule shows that this term equals :math:`\gamma\partial^2 u/ \partial r^2`, and the PDE at :math:`r=0` becomes .. _Eq:_auto182: .. math:: \tag{433} \frac{\partial u}{\partial t} = (\gamma+1){\alpha}\frac{\partial^2 u}{\partial r^2} + f(t){\thinspace .} The associated discrete form is then .. _Eq:_auto183: .. math:: \tag{434} [D_t u = \frac{1}{2} (\gamma+1){\alpha}([D_rD_r \overline{u}^t + \overline{f}^t]^n_i, for a Crank-Nicolson scheme. Discretization in Cartesian coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The spherically-symmetric spatial derivative can be transformed to the Cartesian counterpart by introducing .. math:: v(r,t) = ru(r,t){\thinspace .} Inserting :math:`u=v/r` in .. math:: \frac{1}{r^2}\frac{\partial}{\partial r} \left({\alpha}(r)r^2\frac{\partial u}{\partial r}\right), yields .. math:: r\left(\frac{d {\alpha}}{dr}\frac{\partial v}{\partial r} + {\alpha}\frac{\partial^2 v}{\partial r^2}\right) - \frac{d {\alpha}}{dr}v {\thinspace .} The two terms in the parenthesis can be combined to .. math:: r\frac{\partial}{\partial r}\left( {\alpha}\frac{\partial v}{\partial r}\right){\thinspace .} The PDE for :math:`v` takes the form .. _Eq:_auto184: .. math:: \tag{435} \frac{\partial v}{\partial t} = \frac{\partial}{\partial r}\left( {\alpha} \frac{\partial v}{\partial r}\right) - \frac{1}{r}\frac{d{\alpha}}{dr}v + rf(r,t), \quad r\in (0,R),\ t\in (0,T]{\thinspace .} For :math:`\alpha` constant we immediately realize that we can reuse a solver in Cartesian coordinates to compute :math:`v`. With variable :math:`\alpha`, a "reaction" term :math:`v/r` needs to be added to the Cartesian solver. The boundary condition :math:`\partial u/\partial r=0` at :math:`r=0`, implied by symmetry, forces :math:`v(0,t)=0`, because .. math:: \frac{\partial u}{\partial r} = \frac{1}{r^2}\left( r\frac{\partial v}{\partial r} - v\right) = 0,\quad r=0{\thinspace .} .. _diffu:2D: Diffusion in 2D =============== .. index:: single: diffusion equation; 2D We now address a diffusion in two space dimensions: .. _Eq:_auto185: .. math:: \tag{436} \frac{\partial u}{\partial t} = {\alpha}\left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial x^2}\right) + f(x,y), in a domain .. math:: (x,y)\in (0,L_x)\times (0,L_y),\ t\in (0,T], with :math:`u=0` on the boundary and :math:`u(x,y,0)=I(x,y)` as initial condition. .. _diffu:2D:discr: Discretization (3) --------------------------- For generality, it is natural to use a :math:`\theta`-rule for the time discretization. Standard, second-order accurate finite differences are used for the spatial derivatives. We sample the PDE at a space-time point :math:`(i,j,n+\frac{1}{2})` and apply the difference approximations: .. math:: \lbrack D_t u\rbrack^{n+\frac{1}{2}} = \theta \lbrack {\alpha} (D_xD_x u + D_yD_yu) + f\rbrack^{n+1} + \nonumber .. _Eq:_auto186: .. math:: \tag{437} \quad (1-\theta)\lbrack {\alpha} (D_xD_x u + D_yD_y u) + f\rbrack^{n}{\thinspace .} Written out, .. math:: \frac{u^{n+1}_{i,j}-u^n_{i,j}}{\Delta t} =\nonumber .. math:: \qquad \theta ({\alpha} (\frac{u^{n+1}_{i-1,j} - 2^{n+1}_{i,j} + u^{n+1}_{i+1,j}}{\Delta x^2} + \frac{u^{n+1}_{i,j-1} - 2^{n+1}_{i,j} + u^{n+1}_{i,j+1}}{\Delta y^2}) + f^{n+1}_{i,j}) + \nonumber .. _Eq:_auto187: .. math:: \tag{438} \qquad (1-\theta)({\alpha} (\frac{u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i+1,j}}{\Delta x^2} + \frac{u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1}}{\Delta y^2}) + f^{n}_{i,j}) We collect the unknowns on the left-hand side .. math:: u^{n+1}_{i,j} - \theta\left( F_x (u^{n+1}_{i-1,j} - 2^{n+1}_{i,j} + u^{n+1}_{i,j}) + F_y (u^{n+1}_{i,j-1} - 2^{n+1}_{i,j} + u^{n+1}_{i,j+1})\right) = \nonumber .. math:: \qquad (1-\theta)\left( F_x (u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i,j}) + F_y (u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1})\right) + \nonumber .. _Eq:diffu:2D:theta_scheme2: .. math:: \tag{439} \qquad \theta \Delta t f^{n+1}_{i,j} + (1-\theta) \Delta t f^{n}_{i,j} + u^n_{i,j}, where .. math:: F_x = \frac{{\alpha}\Delta t}{\Delta x^2},\quad F_y = \frac{{\alpha}\Delta t}{\Delta y^2}, are the Fourier numbers in :math:`x` and :math:`y` direction, respectively. .. _diffu:2D:fig:mesh3x2: .. figure:: mesh3x2.png :width: 500 *3x2 2D mesh* .. _diffu:2D:numbering: Numbering of mesh points versus equations and unknowns ------------------------------------------------------ .. index:: single: diffusion equation; 2D, numbering of mesh points .. Nx=3, Ny=2 The equations :ref:`(439) ` are coupled at the new time level :math:`n+1`. That is, we must solve a system of (linear) algebraic equations, which we will write as :math:`Ac=b`, where :math:`A` is the coefficient matrix, :math:`c` is the vector of unknowns, and :math:`b` is the right-hand side. Let us examine the equations in :math:`Ac=b` on a mesh with :math:`N_x=3` and :math:`N_y=2` cells in each direction. The spatial mesh is depicted in Figure :ref:`diffu:2D:fig:mesh3x2`. The equations at the boundary just implement the boundary condition :math:`u=0`: .. math:: \begin{align*} & u^{n+1}_{0,0}= u^{n+1}_{1,0}= u^{n+1}_{2,0}= u^{n+1}_{3,0}= u^{n+1}_{0,1}=\\ & u^{n+1}_{3,1}= u^{n+1}_{0,2}= u^{n+1}_{1,2}= u^{n+1}_{2,2}= u^{n+1}_{3,2}= 0{\thinspace .} \end{align*} We are left with two interior points, with :math:`i=1`, :math:`j=1` and :math:`i=2`, :math:`j=1`. The corresponding equations are .. math:: \begin{align*} & u^{n+1}_{i,j} - \theta\left( F_x (u^{n+1}_{i-1,j} - 2^{n+1}_{i,j} + u^{n+1}_{i,j}) + F_y (u^{n+1}_{i,j-1} - 2^{n+1}_{i,j} + u^{n+1}_{i,j+1})\right) = \\ &\qquad (1-\theta)\left( F_x (u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i,j}) + F_y (u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1})\right) + \\ &\qquad \theta \Delta t f^{n+1}_{i,j} + (1-\theta) \Delta t f^{n}_{i,j} + u^n_{i,j}, \end{align*} There are in total 12 unknowns :math:`u^{n+1}_{i,j}` for :math:`i=0,1,2,3` and :math:`j=0,1,2`. To solve the equations, we need to form a matrix system :math:`Ac=b`. In that system, the solution vector :math:`c` can only have one index. Thus, we need a numbering of the unknowns with one index, not two as used in the mesh. We introduce a mapping :math:`m(i,j)` from a mesh point with indices :math:`(i,j)` to the corresponding unknown :math:`p` in the equation system: .. math:: p = m(i,j) = j(N_x+1) + i{\thinspace .} When :math:`i` and :math:`j` run through their values, we see the following mapping to :math:`p`: .. math:: \begin{align*} &(0,0)\rightarrow 0,\ (0,1)\rightarrow 1,\ (0,2)\rightarrow 2,\ (0,3)\rightarrow 3,\\ &(1,0)\rightarrow 4,\ (1,1)\rightarrow 5,\ (1,2)\rightarrow 6,\ (1,3)\rightarrow 7,\\ &(2,0)\rightarrow 8,\ (2,1)\rightarrow 9,\ (2,2)\rightarrow 10,\ (2,3)\rightarrow 11{\thinspace .} \end{align*} That is, we number the points along the :math:`x` axis, starting with :math:`y=0`, and then progress one "horizontal" mesh line at a time. In Figure :ref:`diffu:2D:fig:mesh3x2` you can see that the :math:`(i,j)` and the corresponding single index (:math:`p`) are listed for each mesh point. We could equally well have numbered the equations in other ways, e.g., let the :math:`j` index be the fastest varying index: :math:`p = m(i,j) = i(N_y+1) + j`. Let us form the coefficient matrix :math:`A`, or more precisely, insert a matrix element (according Python's convention with zero as base index) for each of the nonzero elements in :math:`A` (the indices run through the values of :math:`p`, i.e., :math:`p=0,\ldots,11`): .. math:: {\tiny \left(\begin{array}{cccccccccccc} (0,0) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & (1,1) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & (2,2) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & (3,3) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & (4,4) & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & (5,1) & 0 & 0 & (5,4) & (5,5) & (5,6) & 0 & 0 & (5,9) & 0 & 0 \\ 0 & 0 & (6,2) & 0 & 0 & (6,5) & (6,6) & (6,7) & 0 & 0 & (6,10) & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & (7,7) & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & (8,8) & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & (9,9) & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & (10,10) & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & (11,11) \\ \end{array}\right) } Here is a more compact visualization of the coefficient matrix where we insert dots for zeros and bullets for non-zero elements: .. math:: \footnotesize \left(\begin{array}{cccccccccccc} \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \bullet & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \bullet & \cdot & \cdot \\ \cdot & \cdot & \bullet & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \bullet & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet \\ \end{array}\right) It is clearly seen that most of the elements are zero. This is a general feature of coefficient matrices arising from discretizing PDEs by finite difference methods. We say that the matrix is *sparse*. .. index:: single: diffusion equation; 2D, sparse matrix Let :math:`A_{p,q}` be the value of element :math:`(p,q)` in the coefficient matrix :math:`A`, where :math:`p` and :math:`q` now correspond to the numbering of the unknowns in the equation system. We have :math:`A_{p,q}=1` for :math:`p=q=0,1,2,3,4,7,8,9,10,11`, corresponding to all the known boundary values. Let :math:`p` be :math:`m(i,j)`, i.e., the single index corresponding to mesh point :math:`(i,j)`. Then we have .. _Eq:_auto188: .. math:: \tag{440} A_{m(i,j),m(i,j)} = A_{p,p} = 1 + \theta (F_x + F_y), .. _Eq:_auto189: .. math:: \tag{441} A_{p, m(i-1,j)} = A_{p,p-1} = -\theta F_x, .. _Eq:_auto190: .. math:: \tag{442} A_{p, m(i+1,j)} = A_{p,p+1} = -\theta F_x, .. _Eq:_auto191: .. math:: \tag{443} A_{p, m(i,j-1)} = A_{p, p-(N_x+1)} = -\theta F_y, .. _Eq:_auto192: .. math:: \tag{444} A_{p, m(i,j+1)} = A_{p, p+(N_x+1)} = -\theta F_y, .. _Eq:_auto193: .. math:: \tag{445} for the equations associated with the two interior mesh points. At these interior points, the single index :math:`p` takes on the specific values :math:`p=5,6`, corresponding to the values :math:`(1,1)` and :math:`(1,2)` of the pair :math:`(i,j)`. The above values for :math:`A_{p,q}` can be inserted in the matrix: .. math:: {\tiny \left(\begin{array}{cccccccccccc} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -\theta F_y & 0 & 0 & -\theta F_x & 1+2\theta F_x & -\theta F_x & 0 & 0 & -\theta F_y & 0 & 0 \\ 0 & 0 & -\theta F_y & 0 & 0 & -\theta F_x & 1+2\theta F_x & -\theta F_x & 0 & 0 & -\theta F_y & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right) } The corresponding right-hand side vector in the equation system has the entries :math:`b_p`, where :math:`p` numbers the equations. We have .. math:: b_0=b_1=b_2=b_3=b_4=b_7=b_8=b_9=b_{10}=b_{11}=0, for the boundary values. For the equations associated with the interior points, we get for :math:`p=5,6`, corresponding to :math:`i=1,2` and :math:`j=1`: .. math:: \begin{align*} b_p &= u_i + (1-\theta)\left( F_x (u^{n}_{i-1,j} - 2^{n}_{i,j} + u^{n}_{i,j}) + F_y (u^{n}_{i,j-1} - 2^{n}_{i,j} + u^{n}_{i,j+1})\right) + \\ &\qquad \theta \Delta t f^{n+1}_{i,j} + (1-\theta) \Delta t f^{n}_{i,j}{\thinspace .} \end{align*} Recall that :math:`p=m(i,j)=j(N_x+1)+j` in this expression. We can, as an alternative, leave the boundary mesh points out of the matrix system. For a mesh with :math:`N_x=3` and :math:`N_y=2` there are only two internal mesh points whose unknowns will enter the matrix system. We must now number the unknowns at the interior points: .. math:: p = (j-1)(N_x-1) + i, for :math:`i=1,\ldots,N_x-1`, :math:`j=1,\ldots,N_y-1`. .. Nx=4, Ny=3 .. _diffu:2D:fig:mesh4x3: .. figure:: mesh4x3.png :width: 700 *4x3 2D mesh* We can continue with illustrating a bit larger mesh, :math:`N_x=4` and :math:`N_y=3`, see Figure :ref:`diffu:2D:fig:mesh4x3`. The corresponding coefficient matrix with dots for zeros and bullets for non-zeroes looks as follows (values at boundary points are included in the equation system): .. math:: {\tiny \left(\begin{array}{cccccccccccccccccccc} \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet \\ \end{array}\right) } .. index:: single: diffusion equation; 2D, banded matrix .. admonition:: The coefficient matrix is banded Besides being sparse, we observe that the coefficient matrix is *banded*: it has five distinct bands. We have the diagonal :math:`A_{i,i}`, the subdiagonal :math:`A_{i-1,j}`, the superdiagonal :math:`A_{i,i+1}`, a lower diagonal :math:`A_{i,i-(Nx+1)}`, and an upper diagonal :math:`A_{i,i+(Nx+1)}`. The other matrix entries are known to be zero. With :math:`N_x+1=N_y+1=N`, only a fraction :math:`5N^{-2}` of the matrix entries are nonzero, so the matrix is clearly very sparse for relevant :math:`N` values. The more we can compute with the nonzeros only, the faster the solution methods will be. .. _diffu:2D:alg: Algorithm for setting up the coefficient matrix ----------------------------------------------- We looked at a specific mesh in the previous section, formulated the equations, and saw what the corresponding coefficient matrix and right-hand side are. Now our aim is to set up a general algorithm, for any choice of :math:`N_x` and :math:`N_y`, that produces the coefficient matrix and the right-hand side vector. We start with a zero matrix and vector, run through each mesh point, and fill in the values depending on whether the mesh point is an interior point or on the boundary. * for :math:`i=0,\ldots,N_x` * for :math:`j=0,\ldots, N_y` * :math:`p=j(N_x+1)+i` * if point :math:`(i,j)` is on the boundary: * :math:`A_{p,p}=1`, :math:`b_p=0` * else: * fill :math:`A_{p,m(i-1,j)}`, :math:`A_{p,m(i+1,j)}`, :math:`A_{p,m(i,j)}`, :math:`A_{p,m(i,j-1)}`, :math:`A_{p,m(i,j+1)}`, and :math:`b_p` To ease the test on whether :math:`(i,j)` is on the boundary or not, we can split the loops a bit, starting with the boundary line :math:`j=0`, then treat the interior lines :math:`1\leq j`__: ``scipy.linalg`` contains all the functions in ``numpy.linalg`` plus some other more advanced ones not contained in ``numpy.linalg``. Another advantage of using ``scipy.linalg`` over numpy.linalg is that it is always compiled with BLAS/LAPACK support, while for NumPy this is optional. Therefore, the SciPy version might be faster depending on how NumPy was installed. Therefore, unless you don't want to add SciPy as a dependency to your NumPy program, use ``scipy.linalg`` instead of ``numpy.linalg``. The code shown above is available in the ``solver_dense`` function in the file `diffu2D_u0.py `__, differing only in the boundary conditions, which in the code can be an arbitrary function along each side of the domain. We do not bother to look at vectorized versions of filling ``A`` since a dense matrix is just used of pedagogical reasons for the very first implementation. Vectorization will be treated when ``A`` has a sparse matrix representation, as in the section :ref:`diffu:2D:impl:sparse`. .. admonition:: How to debug the computation of :math:`A` and :math:`b` A good starting point for debugging the filling of :math:`A` and :math:`b` is to choose a very coarse mesh, say :math:`N_x=N_y=2`, where there is just one internal mesh point, compute the equations by hand, and print out ``A`` and ``b`` for comparison in the code. If wrong elements in ``A`` or ``b`` occur, print out each assignment to elements in ``A`` and ``b`` inside the loops and compare with what you expect. To let the user store, analyze, or visualize the solution at each time level, we include a callback function, named ``user_action``, to be called before the time loop and in each pass in that loop. The function has the signature .. code-block:: python user_action(u, x, xv, y, yv, t, n) where ``u`` is a two-dimensional array holding the solution at time level ``n`` and time ``t[n]``. The :math:`x` and :math:`y` coordinates of the mesh points are given by the arrays ``x`` and ``y``, respectively. The arrays ``xv`` and ``yv`` are vectorized representations of the mesh points such that vectorized function evaluations can be invoked. The ``xv`` and ``yv`` arrays are defined by .. code-block:: python xv = x[:,np.newaxis] yv = y[np.newaxis,:] One can then evaluate, e.g., :math:`f(x,y,t)` at all internal mesh points at time level ``n`` by first evaluating :math:`f` at all points, .. code-block:: python f_a = f(xv, yv, t[n]) and then use slices to extract a view of the values at the internal mesh points: ``f_a[1:-1,1:-1]``. The next section features an example on writing a ``user_action`` callback function. .. _diffu:2D:verify: Verification: exact numerical solution -------------------------------------- .. index:: single: diffusion equation; 2D, verification (exact num. sol.) A good test example to start with is one that preserves the solution :math:`u=0`, i.e., :math:`f=0` and :math:`I(x,y)=0`. This trivial solution can uncover some bugs. The first real test example is based on having an exact solution of the discrete equations. This solution is linear in time and quadratic in space: .. math:: u(x,y,t) = 5tx(L_x-x)y(y-L_y){\thinspace .} Inserting this manufactured solution in the PDE shows that the source term :math:`f` must be .. math:: f(x,y,t) = 5x(L_x-x)y(y-L_y) + 10{\alpha} t (x(L_x-x)+ y(y-L_y)){\thinspace .} We can use the ``user_action`` function to compare the numerical solution with the exact solution at each time level. A suitable helper function for checking the solution goes like this: .. code-block:: text def quadratic(theta, Nx, Ny): def u_exact(x, y, t): return 5*t*x*(Lx-x)*y*(Ly-y) def I(x, y): return u_exact(x, y, 0) def f(x, y, t): return 5*x*(Lx-x)*y*(Ly-y) + 10*a*t*(y*(Ly-y)+x*(Lx-x)) # Use rectangle to detect errors in switching i and j in scheme Lx = 0.75 Ly = 1.5 a = 3.5 dt = 0.5 T = 2 def assert_no_error(u, x, xv, y, yv, t, n): """Assert zero error at all mesh points.""" u_e = u_exact(xv, yv, t[n]) diff = abs(u - u_e).max() tol = 1E-12 msg = 'diff=%g, step %d, time=%g' % (diff, n, t[n]) print msg assert diff < tol, msg solver_dense( I, a, f, Lx, Ly, Nx, Ny, dt, T, theta, user_action=assert_no_error) A true test function for checking the quadratic solution for several different meshes and :math:`\theta` values can take the form .. code-block:: python def test_quadratic(): # For each of the three schemes (theta = 1, 0.5, 0), a series of # meshes are tested (Nx > Ny and Nx < Ny) for theta in [1, 0.5, 0]: for Nx in range(2, 6, 2): for Ny in range(2, 6, 2): print 'testing for %dx%d mesh' % (Nx, Ny) quadratic(theta, Nx, Ny) .. _diffu:2D:convrate: Verification: convergence rates (2) -------------------------------------------- .. index:: single: diffusion equation; 2D, verification (conv. rates) For 2D verification with convergence rate computations, the expressions and computations just build naturally on what we saw for 1D diffusion. Truncation error analysis and other forms of error analysis point to a numerical error formula like .. math:: E = C_t\Delta t^p + C_x\Delta x^2 + C_y\Delta y^2, where :math:`p`, :math:`C_t`, :math:`C_x`, and :math:`C_y` are constants. Often, the analysis of a Crank-Nicolson method can show that :math:`p=2`, while the Forward and Backward Euler schemes have :math:`p=1`. When checking the error formula empirically, we need to reduce it to a form :math:`E=Ch^r` with a single discretization parameter :math:`h` and some rate :math:`r` to be estimated. For the Backward Euler method, where :math:`p=1`, we can introduce a single discretization parameter according to .. math:: h = \Delta x^2 = \Delta y^2,\quad h = K^{-1}\Delta t, where :math:`K` is a constant. The error formula then becomes .. math:: E = C_t Kh + C_xh + C_y = \tilde C h,\quad \tilde C = C_tK + C_x + C_y{\thinspace .} The simplest choice is obviously :math:`K=1`. With the Forward Euler method, however, stability requires :math:`\Delta t = hK \leq h/(4{\alpha})`, so :math:`K\leq 1/(4{\alpha})`. For the Crank-Nicolson method, :math:`p=2`, and we can simply choose .. math:: h = \Delta x = \Delta y = \Delta t, since there is no restriction on :math:`\Delta t` in terms of :math:`\Delta x` and :math:`\Delta y`. A frequently used error measure is the :math:`\ell^2` norm of the error mesh point values. The section :ref:`wave:pde2:fd:MMS` and the formula :ref:`(185) ` shows the error measure for a 1D time-dependent problem. The extension to the current 2D problem reads .. math:: E = \left(\Delta t\Delta x\Delta y \sum_{n=0}^{N_t} \sum_{i=0}^{N_x}\sum_{j=0}^{N_y}({u_{\small\mbox{e}}}(x_i,y_j,t_n) - u^n_{i,j})^2\right)^{\frac{1}{2}}{\thinspace .} One attractive manufactured solution is .. math:: {u_{\small\mbox{e}}} = e^{-pt}\sin(k_xx)\sin(k_yy),\quad k_x=\frac{\pi}{L_x}, k_y=\frac{\pi}{L_y}, where :math:`p` can be arbitrary. The required source term is .. math:: f = ({\alpha}(k_x^2 + k_y^2) - p){u_{\small\mbox{e}}}{\thinspace .} The function ``convergence_rates`` in `diffu2D_u0.py `__ implements a convergence rate test. Two potential difficulties are important to be aware of: 1. The error formula is assumed to be correct when :math:`h\rightarrow 0`, so for coarse meshes the estimated rate :math:`r` may be somewhat away from the expected value. Fine meshes may lead to prohibitively long execution times. 2. Choosing :math:`p={\alpha} (k_x^2 + k_y^2)` in the manufactured solution above seems attractive (:math:`f=0`), but leads to a slower approach to the asymptotic range where the error formula is valid (i.e., :math:`r` fluctuates and needs finer meshes to stabilize). .. _diffu:2D:impl:sparse: Implementation with a sparse coefficient matrix ----------------------------------------------- .. index:: single: diffusion equation; 2D, implementation (sparse) We used a sparse matrix implementation in the section :ref:`diffu:pde1:impl:sparse` for a 1D problem with a tridiagonal matrix. The present matrix, arising from a 2D problem, has five diagonals, but we can use the same sparse matrix data structure ``scipy.sparse.diags``. Understanding the diagonals ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let us look closer at the diagonals in the example with a :math:`4\times 3` mesh as depicted in Figure :ref:`diffu:2D:fig:mesh4x3` and its associated matrix visualized by dots for zeros and bullets for nonzeros. From the example mesh, we may generalize to an :math:`N_x\times N_y` mesh. .. math:: {\tiny \begin{array}{lcccccccccccccccccccc} 0 =m(0,0) & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 1 = m(1,0) & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 2 = m(2,0) & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 3 = m(3,0) & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ N_x=m(N_x,0) & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ N_x+1=m(0,1) & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ (N_x+1)+1=m(1,1) & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ (N_x+1)+2=m(2,1) & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ (N_x+1)+3=m(3,1) & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ (N_x+1)+N_x=m(N_x,1) & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 2(N_x+1)=m(0,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 2(N_x+1)+1=m(1,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ 2(N_x+1)+2=m(2,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ 2(N_x+1)+3=m(3,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \bullet & \bullet & \bullet & \cdot & \cdot & \cdot & \bullet & \cdot \\ 2(N_x+1)+N_x=m(N_x,2)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot & \cdot \\ N_y(N_x+1)=m(0,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot & \cdot \\ N_y(N_x+1)+1=m(1,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot & \cdot \\ N_y(N_x+1)+2=m(2,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot & \cdot \\ N_y(N_x+1)+3=m(3,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet & \cdot \\ N_y(N_x+1)+N_x=m(N_x,N_y)& \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \bullet \\ \end{array} } The main diagonal has :math:`N=(N_x+1)(N_y+1)` elements, while the sub- and super-diagonals have :math:`N-1` elements. By looking at the matrix above, we realize that the lower diagonal starts in row :math:`N_x+1` and goes to row :math:`N`, so its length is :math:`N-(N_x+1)`. Similarly, the upper diagonal starts at row 0 and lasts to row :math:`N-(N_x+1)`, so it has the same length. Based on this information, we declare the diagonals by .. code-block:: python main = np.zeros(N) # diagonal lower = np.zeros(N-1) # subdiagonal upper = np.zeros(N-1) # superdiagonal lower2 = np.zeros(N-(Nx+1)) # lower diagonal upper2 = np.zeros(N-(Nx+1)) # upper diagonal b = np.zeros(N) # right-hand side Filling the diagonals ~~~~~~~~~~~~~~~~~~~~~ We run through all mesh points and fill in elements on the various diagonals. The line of mesh points corresponding to :math:`j=0` are all on the boundary, and only the main diagonal gets a contribution: .. code-block:: python m = lambda i, j: j*(Nx+1) + i j = 0; main[m(0,j):m(Nx+1,j)] = 1 # j=0 boundary line Then we run through all interior :math:`j=\hbox{const}` lines of mesh points. The first and the last point on each line, :math:`i=0` and :math:`i=N_x`, correspond to boundary points: .. code-block:: python for j in Iy[1:-1]: # Interior mesh lines j=1,...,Ny-1 i = 0; main[m(i,j)] = 1 i = Nx; main[m(i,j)] = 1 # Boundary For the interior mesh points :math:`i=1,\ldots,N_x-1` on a mesh line :math:`y=\hbox{const}` we can start with the main diagonal. The entries to be filled go from :math:`i=1` to :math:`i=N_x-1` so the relevant slice in the ``main`` vector is ``m(1,j):m(Nx,j)``: .. code-block:: python main[m(1,j):m(Nx,j)] = 1 + 2*theta*(Fx+Fy) The ``upper`` array for the superdiagonal has its index 0 corresponding to row 0 in the matrix, and the array entries to be set go from :math:`m(1,j)` to :math:`m(N_x-1,j)`: .. code-block:: python upper[m(1,j):m(Nx,j)] = - theta*Fx The subdiagonal (``lower`` array), however, has its index 0 corresponding to row 1, so there is an offset of 1 in indices compared to the matrix. The first nonzero occurs (interior point) at a mesh line :math:`j=\hbox{const}` corresponding to matrix row :math:`m(1,j)`, and the corresponding array index in ``lower`` is then :math:`m(1,j)`. To fill the entries from :math:`m(1,j)` to :math:`m(N_x-1,j)` we set the following slice in ``lower``: .. code-block:: python lower_offset = 1 lower[m(1,j)-lower_offset:m(Nx,j)-lower_offset] = - theta*Fx For the upper diagonal, its index 0 corresponds to matrix row 0, so there is no offset and we can set the entries correspondingly to ``upper``: .. code-block:: python upper2[m(1,j):m(Nx,j)] = - theta*Fy The ``lower2`` diagonal, however, has its first index 0 corresponding to row :math:`N_x+1`, so here we need to subtract the offset :math:`N_x+1`: .. code-block:: python lower2_offset = Nx+1 lower2[m(1,j)-lower2_offset:m(Nx,j)-lower2_offset] = - theta*Fy We can now summarize the above code lines for setting the entries in the sparse matrix representation of the coefficient matrix: .. code-block:: python lower_offset = 1 lower2_offset = Nx+1 m = lambda i, j: j*(Nx+1) + i j = 0; main[m(0,j):m(Nx+1,j)] = 1 # j=0 boundary line for j in Iy[1:-1]: # Interior mesh lines j=1,...,Ny-1 i = 0; main[m(i,j)] = 1 # Boundary i = Nx; main[m(i,j)] = 1 # Boundary # Interior i points: i=1,...,N_x-1 lower2[m(1,j)-lower2_offset:m(Nx,j)-lower2_offset] = - theta*Fy lower[m(1,j)-lower_offset:m(Nx,j)-lower_offset] = - theta*Fx main[m(1,j):m(Nx,j)] = 1 + 2*theta*(Fx+Fy) upper[m(1,j):m(Nx,j)] = - theta*Fx upper2[m(1,j):m(Nx,j)] = - theta*Fy j = Ny; main[m(0,j):m(Nx+1,j)] = 1 # Boundary line The next task is to create the sparse matrix from these diagonals: .. code-block:: python import scipy.sparse A = scipy.sparse.diags( diagonals=[main, lower, upper, lower2, upper2], offsets=[0, -lower_offset, lower_offset, -lower2_offset, lower2_offset], shape=(N, N), format='csr') Filling the right-hand side; scalar version ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Setting the entries in the right-hand side is easier since there are no offsets in the array to take into account. The is in fact similar to the one previously shown when we used a dense matrix representation (the right-hand side vector is, of course, independent of what type of representation we use for the coefficient matrix). The complete time loop goes as follows. .. code-block:: python import scipy.sparse.linalg for n in It[0:-1]: # Compute b j = 0 for i in Ix: p = m(i,j); b[p] = 0 # Boundary for j in Iy[1:-1]: i = 0; p = m(i,j); b[p] = 0 # Boundary for i in Ix[1:-1]: p = m(i,j) # Interior b[p] = u_n[i,j] + \ (1-theta)*( Fx*(u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) +\ Fy*(u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ (1-theta)*dt*f(i*dx,j*dy,n*dt) i = Nx; p = m(i,j); b[p] = 0 # Boundary j = Ny for i in Ix: p = m(i,j); b[p] = 0 # Boundary # Solve matrix system A*c = b c = scipy.sparse.linalg.spsolve(A, b) # Fill u with vector c for i in Ix: for j in Iy: u[i,j] = c[m(i,j)] # Update u_n before next step u_n, u = u, u_n Filling the right-hand side; vectorized version ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since we use a sparse matrix and try to speed up the computations, we should examine the loops and see if some can be easily removed by vectorization. In the filling of :math:`A` we have already used vectorized expressions at each :math:`j=\hbox{const}` line of mesh points. We can very easily do the same in the code above and remove the need for loops over the ``i`` index: .. code-block:: python for n in It[0:-1]: # Compute b, vectorized version # Precompute f in array so we can make slices f_a_np1 = f(xv, yv, t[n+1]) f_a_n = f(xv, yv, t[n]) j = 0; b[m(0,j):m(Nx+1,j)] = 0 # Boundary for j in Iy[1:-1]: i = 0; p = m(i,j); b[p] = 0 # Boundary i = Nx; p = m(i,j); b[p] = 0 # Boundary imin = Ix[1] imax = Ix[-1] # for slice, max i index is Ix[-1]-1 b[m(imin,j):m(imax,j)] = u_n[imin:imax,j] + \ (1-theta)*(Fx*( u_n[imin+1:imax+1,j] - 2*u_n[imin:imax,j] + \ u_n[imin-1:imax-1,j]) + Fy*( u_n[imin:imax,j+1] - 2*u_n[imin:imax,j] + u_n[imin:imax,j-1])) + \ theta*dt*f_a_np1[imin:imax,j] + \ (1-theta)*dt*f_a_n[imin:imax,j] j = Ny; b[m(0,j):m(Nx+1,j)] = 0 # Boundary # Solve matrix system A*c = b c = scipy.sparse.linalg.spsolve(A, b) # Fill u with vector c u[:,:] = c.reshape(Ny+1,Nx+1).T # Update u_n before next step u_n, u = u, u_n The most tricky part of this code snippet is the loading of values from the one-dimensional array ``c`` into the two-dimensional array ``u``. With our numbering of unknowns from left to right along "horizontal" mesh lines, the correct reordering of the one-dimensional array ``c`` as a two-dimensional array requires first a reshaping to an ``(Ny+1,Nx+1)`` two-dimensional array and then taking the transpose. The result is an ``(Nx+1,Ny+1)`` array compatible with ``u`` both in size and appearance of the function values. The ``spsolve`` function in ``scipy.sparse.linalg`` is an efficient version of Gaussian elimination suited for matrices described by diagonals. The algorithm is known as *sparse Gaussian elimination*, and ``spsolve`` calls up a well-tested C code called `SuperLU `__. The complete code utilizing ``spsolve`` is found in the ``solver_sparse`` function in the file `diffu2D_u0.py `__. Verification (8) ~~~~~~~~~~~~~~~~~~~~~~~~~ We can easily extend the function ``quadratic`` from the section :ref:`diffu:2D:verify` to include a test of the ``solver_sparse`` function as well. .. code-block:: python def quadratic(theta, Nx, Ny): ... t, cpu = solver_sparse( I, a, f, Lx, Ly, Nx, Ny, dt, T, theta, user_action=assert_no_error) The Jacobi iterative method --------------------------- .. index:: Jacobi method So far we have created a matrix and right-hand side of a linear system :math:`Ac=b` and solved the system for :math:`c` by calling an exact algorithm based on Gaussian elimination. A much simpler implementation, which requires no memory for the coefficient matrix :math:`A`, arises if we solve the system by *iterative* methods. These methods are only approximate, and the core algorithm is repeated many times until the solution is considered to be converged. Numerical scheme and linear system ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To illustrate the idea of the Jacobi method, we simplify the numerical scheme to the Backward Euler case, :math:`\theta=1`, so there are fewer terms to write: .. math:: u^{n+1}_{i,j} - \left( F_x (u^{n+1}_{i-1,j} - 2u^{n+1}_{i,j} + u^{n+1}_{i,j}) + F_y (u^{n+1}_{i,j-1} - 2u^{n+1}_{i,j} + u^{n+1}_{i,j+1})\right) = \nonumber .. _Eq:diffu:2D:BE_scheme: .. math:: \tag{446} \qquad u^n_{i,j} + \Delta t f^{n+1}_{i,j} The idea of the *Jacobi* iterative method is to introduce an iteration, here with index :math:`r`, where we in each iteration treat :math:`u^{n+1}_{i,j}` as unknown, but use values from the previous iteration for the other unknowns :math:`u^{n+1}_{i\pm 1,j\pm 1}`. Iterations ~~~~~~~~~~ Let :math:`u^{n+1,r}_{i,j}` be the approximation to :math:`u^{n+1}_{i,j}` in iteration :math:`r`, for all relevant :math:`i` and :math:`j` indices. We first solve with respect to :math:`u^{n+1}_{i,j}` to get the equation to solve: .. math:: u^{n+1}_{i,j} = (1+2F_x +2F_y)^{-1} \left( F_x (u^{n+1}_{i-1,j} + u^{n+1}_{i,j}) + F_y (u^{n+1}_{i,j-1} + u^{n+1}_{i,j+1})\right) + \nonumber .. _Eq:diffu:2D:Jacobi0: .. math:: \tag{447} \qquad u^n_{i,j} + \Delta t f^{n+1}_{i,j} The iteration is introduced by using iteration index :math:`r`, for computed values, on the right-hand side and :math:`r+1` (unknown in this iteration) on the left-hand side: .. math:: u^{n+1,r+1}_{i,j} = (1+2F_x +2F_y)^{-1}\left( F_x (u^{n+1,r}_{i-1,j} + u^{n+1,r}_{i,j}) + F_y (u^{n+1,r}_{i,j-1} + u^{n+1,r}_{i,j+1})\right) \nonumber .. _Eq:diffu:2D:Jacobi: .. math:: \tag{448} \qquad + u^n_{i,j} + \Delta t f^{n+1}_{i,j} Initial guess ~~~~~~~~~~~~~ We start the iteration with the computed values at the previous time level: .. _Eq:diffu:2D:iter:startvector: .. math:: \tag{449} u^{n+1,0}_{i,j} = u^{n}_{i,j},\quad i=0,\ldots,N_x,\ j=0,\ldots,N_y{\thinspace .} Relaxation (1) ~~~~~~~~~~~~~~~~~~~~~~~ .. index:: relaxation A common technique in iterative methods is to introduce a *relaxation*, which means that the new approximation is a weighted mean of the approximation as suggested by the algorithm and the previous approximation. Naming the quantity on the left-hand side of :ref:`(448) ` as :math:`u^{n+1,*}_{i,j}`, a new approximation based on relaxation reads .. _Eq:diffu:2D:iter:relaxation: .. math:: \tag{450} u^{n+1,r+1} = \omega u^{n+1,*}_{i,j} + (1-\omega) u^{n+1,r}_{i,j}{\thinspace .} Under-relaxation means :math:`\omega < 1`, while over-relaxation has :math:`\omega > 1`. Stopping criteria (1) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The iteration can be stopped when the change from one iteration to the next is sufficiently small (:math:`\epsilon`), using either an infinity norm, .. _Eq:_auto194: .. math:: \tag{451} \max_{i,j}\left\vert u^{n+1,r+1}_{i,j}-u^{n+1,r}_{i,j} \right\vert \leq \epsilon, or an :math:`L^2` norm, .. _Eq:_auto195: .. math:: \tag{452} \left(\Delta x\Delta y\sum_{i,j} (u^{n+1,r+1}_{i,j}-u^{n+1,r}_{i,j})^2 \right)^{\frac{1}{2}} \leq \epsilon{\thinspace .} Another widely used criterion measures how well the equations are solved by looking at the residual (essentially :math:`b-Ac^{r+1}` if :math:`c^{r+1}` is the approximation to the solution in iteration :math:`r+1`). The residual, defined in terms of the finite difference stencil, is .. math:: R_{i,j} = u^{n+1,r+1}_{i,j} - (F_x(u^{n+1,r+1}_{i-1,j} - 2u^{n+1,r+1}_{i,j} + u^{n+1,r+1}_{i,j}) +\nonumber .. math:: \quad\quad F_y(u^{n+1,r+1}_{i,j-1} - 2u^{n+1,r+1}_{i,j} + u^{n+1,r+1}_{i,j+1})) - \nonumber .. _Eq:diffu:2D:residual: .. math:: \tag{453} \qquad u^n_{i,j} - \Delta t f^{n+1}_{i,j} One can then iterate until the norm of the mesh function :math:`R_{i,j}` is less than some tolerance: .. _Eq:_auto196: .. math:: \tag{454} \left(\Delta x\Delta y\sum_{i,j} R_{i,j}^2 \right)^{\frac{1}{2}} \leq \epsilon{\thinspace .} Code-friendly notation ~~~~~~~~~~~~~~~~~~~~~~ To make the mathematics as close as possible to what we will write in a computer program, we may introduce some new notation: :math:`u_{i,j}` is a short notation for :math:`u^{n+1,r+1}_{i,j}`, :math:`u^{-}_{i,j}` is a short notation for :math:`u^{n+1,r}_{i,j}`, and :math:`u^{(s)}_{i,j}` denotes :math:`u^{n+1-s}_{i,j}`. That is, :math:`u_{i,j}` is the unknown, :math:`u^{-}_{i,j}` is its most recently computed approximation, and :math:`s` counts time levels backwards in time. The Jacobi method :ref:`(448) `) takes the following form with the new notation: .. math:: u^{*}_{i,j} = (1+2F_x +2F_y)^{-1}(( F_x (u^{-}_{i-1,j} + u^{-}_{i,j}) + F_y (u^{n+1,r}_{i,j-1} + u^{n+1,r}_{i,j+1})) + \nonumber .. _Eq:diffu:2D:Jacobi2: .. math:: \tag{455} \qquad u^{(1)}_{i,j} + \Delta t f^{n+1}_{i,j}) Generalization of the scheme ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We can also quite easily introduce the :math:`\theta` rule for discretization in time and write up the Jacobi iteration in that case as well: .. math:: u^{*}_{i,j} = (1+ 2\theta(F_x +F_y))^{-1}(\theta( F_x (u^{-}_{i-1,j} + u^{-}_{i,j}) + F_y (u^{-}_{i,j-1} + u^{-}_{i,j+1})) + \nonumber .. math:: \qquad u^{(1)}_{i,j} + \theta \Delta t f^{n+1}_{i,j} + (1-\theta)\Delta t f^n_{i,j} + \nonumber .. _Eq:diffu:2D:Jacobi3: .. math:: \tag{456} \qquad (1-\theta)( F_x(u^{(1)}_{i-1,j}-2u^{(1)}_{i,j} + u^{(1)}_{i+1,j}) + F_y(u^{(1)}_{i,j-1}-2u^{(1)}_{i,j} + u^{(1)}_{i,j+1}))){\thinspace .} The final update of :math:`u` applies relaxation: .. math:: u_{i,j} = \omega u^{*}_{i,j} + (1-\omega)u^{-}_{i,j}{\thinspace .} .. _diffu:2D:Jacobi:impl: Implementation of the Jacobi method ----------------------------------- The Jacobi method needs no coefficient matrix and right-hand side vector, but it needs an array for :math:`u` in the previous iteration. We call this array ``u_``, using the notation at the end of the previous section (at the same time level). The unknown itself is called ``u``, while ``u_n`` is the computed solution one time level back in time. With a :math:`\theta` rule in time, the time loop can be coded like this: .. code-block:: python for n in It[0:-1]: # Solve linear system by Jacobi iteration at time level n+1 u_[:,:] = u_n # Start value converged = False r = 0 while not converged: if version == 'scalar': j = 0 for i in Ix: u[i,j] = U_0y(t[n+1]) # Boundary for j in Iy[1:-1]: i = 0; u[i,j] = U_0x(t[n+1]) # Boundary i = Nx; u[i,j] = U_Lx(t[n+1]) # Boundary # Interior points for i in Ix[1:-1]: u_new = 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*( Fx*(u_[i+1,j] + u_[i-1,j]) + Fy*(u_[i,j+1] + u_[i,j-1])) + \ u_n[i,j] + \ (1-theta)*(Fx*( u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) + Fy*( u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ (1-theta)*dt*f(i*dx,j*dy,n*dt)) u[i,j] = omega*u_new + (1-omega)*u_[i,j] j = Ny for i in Ix: u[i,j] = U_Ly(t[n+1]) # Boundary elif version == 'vectorized': j = 0; u[:,j] = U_0y(t[n+1]) # Boundary i = 0; u[i,:] = U_0x(t[n+1]) # Boundary i = Nx; u[i,:] = U_Lx(t[n+1]) # Boundary j = Ny; u[:,j] = U_Ly(t[n+1]) # Boundary # Internal points f_a_np1 = f(xv, yv, t[n+1]) f_a_n = f(xv, yv, t[n]) u_new = 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*(Fx*( u_[2:,1:-1] + u_[:-2,1:-1]) + Fy*( u_[1:-1,2:] + u_[1:-1,:-2])) +\ u_n[1:-1,1:-1] + \ (1-theta)*(Fx*( u_n[2:,1:-1] - 2*u_n[1:-1,1:-1] + u_n[:-2,1:-1]) +\ Fy*( u_n[1:-1,2:] - 2*u_n[1:-1,1:-1] + u_n[1:-1,:-2]))\ + theta*dt*f_a_np1[1:-1,1:-1] + \ (1-theta)*dt*f_a_n[1:-1,1:-1]) u[1:-1,1:-1] = omega*u_new + (1-omega)*u_[1:-1,1:-1] r += 1 converged = np.abs(u-u_).max() < tol or r >= max_iter u_[:,:] = u # Update u_n before next step u_n, u = u, u_n The vectorized version should be quite straightforward to understand once one has an understanding of how a standard 2D finite stencil is vectorized. The first natural verification is to use the test problem in the function ``quadratic`` from the section :ref:`diffu:2D:verify`. This problem is known to have no approximation error, but any iterative method will produce an approximate solution with unknown error. For a tolerance :math:`10^{-k}` in the iterative method, we can, e.g., use a slightly larger tolerance :math:`10^{-(k-1)}` for the difference between the exact and the computed solution. .. code-block:: python def quadratic(theta, Nx, Ny): ... def assert_small_error(u, x, xv, y, yv, t, n): """Assert small error for iterative methods.""" u_e = u_exact(xv, yv, t[n]) diff = abs(u - u_e).max() tol = 1E-4 msg = 'diff=%g, step %d, time=%g' % (diff, n, t[n]) assert diff < tol, msg for version in 'scalar', 'vectorized': for theta in 1, 0.5: print 'testing Jacobi, %s version, theta=%g' % \ (version, theta) t, cpu = solver_Jacobi( I=I, a=a, f=f, Lx=Lx, Ly=Ly, Nx=Nx, Ny=Ny, dt=dt, T=T, theta=theta, U_0x=0, U_0y=0, U_Lx=0, U_Ly=0, user_action=assert_small_error, version=version, iteration='Jacobi', omega=1.0, max_iter=100, tol=1E-5) Even for a very coarse :math:`4\times 4` mesh, the Jacobi method requires 26 iterations to reach a tolerance of :math:`10^{-5}`, which is quite many iterations, given that there are only 25 unknowns. .. _diffu:2D:Jacobi:impl:hill: Test problem: diffusion of a sine hill -------------------------------------- It can be shown that .. _Eq:diffu:2D:Jacobi:impl:hill:uex: .. math:: \tag{457} {u_{\small\mbox{e}}} = Ae^{-{\alpha}\pi^2(L_x^{-2} + L_y^{-2})t} \sin\left(\frac{\pi}{L_x}x\right)\sin\left(\frac{\pi}{L_y}y\right), is a solution of the 2D homogeneous diffusion equation :math:`u_t = {\alpha}(u_{xx}+u_{yy})` in a rectangle :math:`[0,L_x]\times [0,L_y]`, for any value of the amplitude :math:`A`. This solution vanishes at the boundaries, and the initial condition is the product of two sines. We may choose :math:`A=1` for simplicity. It is difficult to know if our solver based on the Jacobi method works properly since we are faced with two sources of errors: one from the discretization, :math:`E_\Delta`, and one from the iterative Jacobi method, :math:`E_i`. The total error in the computed :math:`u` can be represented as .. math:: E_u = E_\Delta + E_i{\thinspace .} One error measure is to look at the maximum value, which is obtained for the midpoint :math:`x=L_x/2` and :math:`y=L_x/2`. This midpoint is represented in the discrete ``u`` if :math:`N_x` and :math:`N_y` are even numbers. We can then compute :math:`E_u` as :math:`E_u = |\max {u_{\small\mbox{e}}} - \max u|`, when we know an exact solution :math:`{u_{\small\mbox{e}}}` of the problem. What about :math:`E_\Delta`? If we use the maximum value as a measure of the error, we have in fact analytical insight into the approximation error in this particular problem. According to the section :ref:`diffu:2D:analysis`, the exact solution :ref:`(457) ` of the PDE problem is also an exact solution of the discrete equations, except that the damping factor in time is different. More precisely, :ref:`(414) ` and :ref:`(415) ` are solutions of the discrete problem for :math:`\theta=1` (Backward Euler) and :math:`\theta=\frac{1}{2}` (Crank-Nicolson), respectively. The factors raised to the power :math:`n` is the numerical amplitude, and the errors in these factors become .. math:: \begin{align*} E_\Delta &= e^{-{\alpha} k^2t} - \left( \frac{1 - 2(F_x\sin^2 p_x + F_x\sin^2p_y)}{1 + 2(F_x\sin^2 p_x + F_x\sin^2p_y)} \right)^n,\quad \theta=\frac{1}{2},\\ E_\Delta &= e^{-{\alpha} k^2t} - (1 + 4F_x\sin^2 p_x + 4F_y\sin^2 p_y)^{-n},\quad\theta=1{\thinspace .} \end{align*} We are now in a position to compute :math:`E_i` numerically. That is, we can compute the error due to iterative solution of the linear system and see if it corresponds to the convergence tolerance used in the method. Note that the convergence is based on measuring the difference in two consecutive approximations, which is not exactly the error due to the iteration, but it is a kind of measure, and it should have about the same size as :math:`E_i`. The function ``demo_classic_iterative`` in `diffu2D_u0.py `__ implements the idea above (also for the methods in the section :ref:`diffu:2D:SOR`). The value of :math:`E_i` is in particular printed at each time level. By changing the tolerance in the convergence criterion in the Jacobi method, we can see that :math:`E_i` is of the same order of magnitude as the prescribed tolerance in the Jacobi method. For example: :math:`E_\Delta\sim 10^{-2}` with :math:`N_x=N_y=10` and :math:`\theta=\frac{1}{2}`, as long as :math:`\max u` has some significant size (:math:`\max u > 0.02`). An appropriate value of the tolerance is then :math:`10^{-3}`, such that the error in the Jacobi method does not become bigger than the discretization error. In that case, :math:`E_i` is around :math:`5\cdot 10^{-3}`. The corresponding number of Jacobi iterations (with :math:`\omega=1`) varies from 31 to 12 during the time simulation (for :math:`\max u > 0.02`). Changing the tolerance to :math:`10^{-5}` causes many more iterations (61 to 42) without giving any contribution to the overall accuracy, because the total error is dominated by :math:`E_\Delta`. Also, with an :math:`N_x=N_y=20`, the spatial accuracy increases and many more iterations are needed (143 to 45), but the dominating error is from the time discretization. However, with such a finer spatial mesh, a higher tolerance in the convergence criterion :math:`10^{-4}` is needed to keep :math:`E_i\sim 10^{-3}`. More experiments show the disadvantage of the very simple Jacobi iteration method: the number of iterations increases with the number of unknowns, keeping the tolerance fixed, but the tolerance should also be lowered to avoid the iteration error to dominate the total error. A small adjustment of the Jacobi method, as described in the section :ref:`diffu:2D:SOR`, provides a better method. .. _diffu:2D:Jacobi_vs_FE: The relaxed Jacobi method and its relation to the Forward Euler method ---------------------------------------------------------------------- We shall now show that solving the Poisson equation :math:`-{\alpha}\nabla^2 u = f` by the Jacobi iterative method is in fact equivalent to using a Forward Euler scheme on :math:`u_t = {\alpha}\nabla^2 u + f` and letting :math:`t\rightarrow\infty`. A Forward Euler discretization of the 2D diffusion equation, .. math:: \lbrack D_t^+ u = {\alpha} (D_xD_x u + D_yD_y u) + f\rbrack^n_{i,j}, can be written out as .. math:: u_{i,j}^{n+1} = u_{i,j}^n + \frac{\Delta t}{{\alpha} h^2} \left( u_{i-1,j}^n + u_{i+1,j}^n + u_{i,j-1}^n + u_{i,j+1}^n - 4u_{i,j}^n + h^2f_{i,j}\right), where :math:`h=\Delta x = \Delta y` has been introduced for simplicity. The scheme can be reordered as .. math:: u_{i,j}^{n+1} = \left(1 - \omega\right) u_{i,j}^n + \frac{1}{4}\omega \left( u_{i-1,j}^n + u_{i+1,j}^n + u_{i,j-1}^n + u_{i,j+1}^n - 4u_{i,j}^n + h^2f_{i,j}\right), with .. math:: \omega = 4\frac{\Delta t}{{\alpha} h^2}, but this latter form is nothing but the relaxed Jacobi method applied to .. math:: [D_xD_x u + D_yD_y u = -f]^n_{i,j}{\thinspace .} From the equivalence above we know a couple of things about the Jacobi method for solving :math:`-\nabla^2 u = f`: 1. The method is unstable if :math:`\omega > 1` (since the Forward Euler method is then unstable). 2. The convergence is really slow as the iteration index increases (coming from the fact that the Forward Euler scheme requires many small time steps to reach the stationary solution). These observations are quite disappointing: if we already have a time-dependent diffusion problem and want to take larger time steps by an implicit time discretization method, we will with the Jacobi method end up with something close to a slow Forward Euler simulation of the original problem at each time level. Nevertheless, the are two reasons for why the Jacobi method remains a fundamental building block for solving linear systems arising from PDEs: 1) a couple of iterations remove large parts of the error and this is effectively used in the very efficient class of multigrid methods; and 2) the idea of the Jacobi method can be developed into more efficient methods, especially the SOR method, which is treated next. .. _diffu:2D:SOR: The Gauss-Seidel and SOR methods -------------------------------- .. index:: Gauss-Seidel method If we update the mesh points according to the Jacobi method :ref:`(447) ` for a Backward Euler discretization with a loop over :math:`i=1,\ldots,N_x-1` and :math:`j=1,\ldots,N_y-1`, we realize that when :math:`u^{n+1,r+1}_{i,j}` is computed, :math:`u^{n+1,r+1}_{i-1,j}` and :math:`u^{n+1,r+1}_{i,j-1}` are already computed, so these new values can be used rather than :math:`u^{n+1,r}_{i-1,j}` and :math:`u^{n+1,r}_{i,j-1}` (respectively) in the formula for :math:`u^{n+1,r+1}_{i,j}`. This idea gives rise to the *Gauss-Seidel* iteration method, which mathematically is just a small adjustment of :ref:`(447) `: .. math:: u^{n+1,r+1}_{i,j} = (1+2F_x +2F_y)^{-1}((\nonumber .. _Eq:diffu:2D:SOR:eq: .. math:: \tag{458} \qquad F_x (u^{n+1,r+1}_{i-1,j} + u^{n+1,r}_{i,j}) + F_y (u^{n+1,r+1}_{i,j-1} + u^{n+1,r}_{i,j+1})) + u^n_{i,j} + \Delta t f^{n+1}_{i,j}){\thinspace .} Observe that the way we access the mesh points in the formula :ref:`(458) ` is important: points with :math:`i-1` must be computed before points with :math:`i`, and points with :math:`j-1` must be computed before points with :math:`j`. Any sequence of mesh points can be used in the Gauss-Seidel method, but the particular math formula must distinguish between already visited points in the current iteration and the points not yet visited. .. index:: SOR method The idea of relaxation :ref:`(450) ` can equally well be applied to the Gauss-Seidel method. Actually, the Gauss-Seidel method with an arbitrary :math:`0<\omega\leq 2` has its own name: the *Successive Over-Relaxation* method, abbreviated as SOR. The SOR method for a :math:`\theta` rule discretization, with the shortened :math:`u` and :math:`u^{-}` notation, can be written .. math:: u^{*}_{i,j} = (1+ 2\theta(F_x +F_y))^{-1}(\theta( F_x (u_{i-1,j} + u^{-}_{i,j}) + F_y (u_{i,j-1} + u^{-}_{i,j+1})) + \nonumber .. math:: \qquad u^{(1)}_{i,j} + \theta \Delta t f^{n+1}_{i,j} + (1-\theta)\Delta t f^n_{i,j} + \nonumber .. _Eq:diffu:2D:SOR3: .. math:: \tag{459} \qquad (1-\theta)( F_x(u^{(1)}_{i-1,j}-2u^{(1)}_{i,j} + u^{(1)}_{i+1,j}) + F_y(u^{(1)}_{i,j-1}-2u^{(1)}_{i,j} + u^{(1)}_{i,j+1}))), .. _Eq:_auto197: .. math:: \tag{460} u_{i,j} = \omega u^{*}_{i,j} + (1-\omega)u^{-}_{i,j} The sequence of mesh points in :ref:`(459) ` is :math:`i=1,\ldots,N_x-1`, :math:`j=1,\ldots,N_y-1` (but whether :math:`i` runs faster or slower than :math:`j` does not matter). .. _diffu:2D:SOR:impl:scalar: Scalar implementation of the SOR method --------------------------------------- Since the Jacobi and Gauss-Seidel methods with relaxation are so similar, we can easily make a common code for the two: .. code-block:: python for n in It[0:-1]: # Solve linear system by Jacobi/SOR iteration at time level n+1 u_[:,:] = u_n # Start value converged = False r = 0 while not converged: if version == 'scalar': if iteration == 'Jacobi': u__ = u_ elif iteration == 'SOR': u__ = u j = 0 for i in Ix: u[i,j] = U_0y(t[n+1]) # Boundary for j in Iy[1:-1]: i = 0; u[i,j] = U_0x(t[n+1]) # Boundary i = Nx; u[i,j] = U_Lx(t[n+1]) # Boundary for i in Ix[1:-1]: u_new = 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*( Fx*(u_[i+1,j] + u__[i-1,j]) + Fy*(u_[i,j+1] + u__[i,j-1])) + \ u_n[i,j] + (1-theta)*( Fx*( u_n[i+1,j] - 2*u_n[i,j] + u_n[i-1,j]) + Fy*( u_n[i,j+1] - 2*u_n[i,j] + u_n[i,j-1]))\ + theta*dt*f(i*dx,j*dy,(n+1)*dt) + \ (1-theta)*dt*f(i*dx,j*dy,n*dt)) u[i,j] = omega*u_new + (1-omega)*u_[i,j] j = Ny for i in Ix: u[i,j] = U_Ly(t[n+1]) # boundary r += 1 converged = np.abs(u-u_).max() < tol or r >= max_iter u_[:,:] = u u_n, u = u, u_n # Get ready for next iteration The idea here is to introduce ``u__`` to be used for already computed values (``u``) in the Gauss-Seidel/SOR version of the implementation, or just values from the previous iteration (``u_``) in case of the Jacobi method. .. _diffu:2D:SOR:impl:vectorized: Vectorized implementation of the SOR method ------------------------------------------- Vectorizing the Gauss-Seidel iteration step turns out to be non-trivial. The problem is that vectorized operations typically imply operations on arrays where the sequence we visit the elements in does not matter. In particular, this principle makes vectorized code trivial to parallelize. However, in the Gauss-Seidel algorithm the sequence we visit the elements in the arrays does matter, and it is well known that the basic method as explained above cannot be parallelized. Therefore, also vectorization will require new thinking. .. index:: red-black numbering The strategy for vectorizing (and parallelizing) the Gauss-Seidel method is to use a special numbering of the mesh points called red-black numbering: every other point is red or black as in a checkerboard pattern. This numbering requires :math:`N_x` and :math:`N_y` to be even numbers. Here is an example of a :math:`6\times 6` mesh: .. code-block:: text r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r The idea now is to first update all the red points. Each formula for updating a red point involves only the black neighbors. Thereafter, we update all the black points, and at each black point, only the recently computed red points are involved. The scalar implementation of the red-black numbered Gauss-Seidel method is really compact, since we can update values directly in ``u`` (that guarantees that we use the most recently computed values). Here is the relevant code for the Backward Euler scheme in time and without a source term: .. code-block:: python # Update internal points for sweep in 'red', 'black': for j in range(1, Ny, 1): if sweep == 'red': start = 1 if j % 2 == 1 else 2 elif sweep == 'black': start = 2 if j % 2 == 1 else 1 for i in range(start, Nx, 2): u[i,j] = 1.0/(1.0 + 2*(Fx + Fy))*( Fx*(u[i+1,j] + u[i-1,j]) + Fy*(u[i,j+1] + u[i,j-1]) + u_n[i,j]) The vectorized version must be based on slices. Looking at a typical red-black pattern, e.g., .. code-block:: text r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r b r we want to update the internal points (marking boundary points with ``x``): .. code-block:: text x x x x x x x x r b r b r x x b r b r b x x r b r b r x x b r b r b x x r b r b r x x x x x x x x It is impossible to make one slice that picks out all the internal red points. Instead, we need two slices. The first involves points marked with ``R``: .. code-block:: text x x x x x x x x R b R b R x x b r b r b x x R b R b R x x b r b r b x x R b R b R x x x x x x x x This slice is specified as ``1::2`` for ``i`` and ``1::2`` for ``j``, or with ``slice`` objects: .. code-block:: python i = slice(1, None, 2); j = slice(1, None, 2) The second slice involves the red points with ``R``: .. code-block:: text x x x x x x x x r b r b r x x b R b R b x x r b r b r x x b R b R b x x r b r b r x x x x x x x x The slices are .. code-block:: python i = slice(2, None, 2); j = slice(2, None, 2) For the black points, the first slice involves the ``B`` points: .. code-block:: text x x x x x x x x r B r B r x x b r b r b x x r B r B r x x b r b r b x x r B r B r x x x x x x x x with slice objects .. code-block:: python i = slice(2, None, 2); j = slice(1, None, 2) The second set of black points is shown here: .. code-block:: text x x x x x x x x r b r b r x x B r B r B x x r b r b r x x B r B r B x x r b r b r x x x x x x x x with slice objects .. code-block:: python i = slice(1, None, 2); j = slice(2, None, 2) That is, we need four sets of slices. The simplest way of implementing the algorithm is to make a function with variables for the slices representing :math:`i`, :math:`i-1`, :math:`i+1`, :math:`j`, :math:`j-1`, and :math:`j+1`, here called ``ic`` ("i center"), ``im1`` ("i minus 1", ``ip1`` ("i plus 1"), ``jc``, ``jm1``, and ``jp1``, respectively. .. code-block:: python def update(u_, u_n, ic, im1, ip1, jc, jm1, jp1): return \ 1.0/(1.0 + 2*theta*(Fx + Fy))*(theta*( Fx*(u_[ip1,jc] + u_[im1,jc]) + Fy*(u_[ic,jp1] + u_[ic,jm1])) +\ u_n[ic,jc] + (1-theta)*( Fx*(u_n[ip1,jc] - 2*u_n[ic,jc] + u_n[im1,jc]) +\ Fy*(u_n[ic,jp1] - 2*u_n[ic,jc] + u_n[ic,jm1]))+\ theta*dt*f_a_np1[ic,jc] + \ (1-theta)*dt*f_a_n[ic,jc]) The formula returned from ``update`` is to be compared with :ref:`(459) `. The relaxed Jacobi iteration can be implemented by .. code-block:: python ic = jc = slice(1,-1) im1 = jm1 = slice(0,-2) ip1 = jp1 = slice(2,None) u_new[ic,jc] = update( u_, u_n, ic, im1, ip1, jc, jm1, jp1) u[ic,jc] = omega*u_new[ic,jc] + (1-omega)*u_[ic,jc] The Gauss-Seidel (or SOR) updates need four different steps. The ``ic`` and ``jc`` slices are specified above. For each of these, we must specify the corresponding ``im1``, ``ip1``, ``jm1``, and ``jp1`` slices. The code below contains the details. .. code-block:: python # Red points ic = slice(1,-1,2) im1 = slice(0,-2,2) ip1 = slice(2,None,2) jc = slice(1,-1,2) jm1 = slice(0,-2,2) jp1 = slice(2,None,2) u_new[ic,jc] = update( u_new, u_n, ic, im1, ip1, jc, jm1, jp1) ic = slice(2,-1,2) im1 = slice(1,-2,2) ip1 = slice(3,None,2) jc = slice(2,-1,2) jm1 = slice(1,-2,2) jp1 = slice(3,None,2) u_new[ic,jc] = update( u_new, u_n, ic, im1, ip1, jc, jm1, jp1) # Black points ic = slice(2,-1,2) im1 = slice(1,-2,2) ip1 = slice(3,None,2) jc = slice(1,-1,2) jm1 = slice(0,-2,2) jp1 = slice(2,None,2) u_new[ic,jc] = update( u_new, u_n, ic, im1, ip1, jc, jm1, jp1) ic = slice(1,-1,2) im1 = slice(0,-2,2) ip1 = slice(2,None,2) jc = slice(2,-1,2) jm1 = slice(1,-2,2) jp1 = slice(3,None,2) u_new[ic,jc] = update( u_new, u_n, ic, im1, ip1, jc, jm1, jp1) # Relax c = slice(1,-1) u[c,c] = omega*u_new[c,c] + (1-omega)*u_[c,c] The function ``solver_classic_iterative`` in `diffu2D_u0.py `__ contains a unified implementation of the relaxed Jacobi and SOR methods in scalar and vectorized versions using the techniques explained above. .. _diffu:2D:direct_vs_iter: Direct versus iterative methods ------------------------------- Direct methods ~~~~~~~~~~~~~~ There are two classes of methods for solving linear systems: direct methods and iterative methods. Direct methods are based on variants of the Gaussian elimination procedure and will produce an exact solution (in exact arithmetics) in an a priori known number of steps. Iterative methods, on the other hand, produce an approximate solution, and the amount of work for reaching a given accuracy is usually not known. .. index:: LU factorization .. index:: Cholesky factorization The most common direct method today is to use the *LU factorization* procedure to factor the coefficient matrix :math:`A` as the product of a lower-triangular matrix :math:`L` (with unit diagonal terms) and an upper-triangular matrix :math:`U`: :math:`A=LU`. As soon as we have :math:`L` and :math:`U`, a system of equations :math:`LUc=b` is easy to solve because of the triangular nature of :math:`L` and :math:`U`. We first solve :math:`Ly=b` for :math:`y` (forward substitution), and thereafter we find :math:`c` from solving :math:`Uc=y` (backward substitution). When :math:`A` is a dense :math:`N\times N` matrix, the LU factorization costs :math:`\frac{1}{3}N^3` arithmetic operations, while the forward and backward substitution steps each require of the order :math:`N^2` arithmetic operations. That is, factorization dominates the costs, while the substitution steps are cheap. Symmetric, positive definite coefficient matrices often arise when discretizing PDEs. In this case, the LU factorization becomes :math:`A=LL^T`, and the associated algorithm is known as *Cholesky factorization*. Most linear algebra software offers highly optimized implementations of LU and Cholesky factorization as well as forward and backward substitution (``scipy.linalg`` is the relevant Python package). Finite difference discretizations lead to sparse coefficient matrices. An extreme case arose in the section :ref:`diffu:pde1:BE` where :math:`A` was tridiagonal. For a tridiagonal matrix, the amount of arithmetic operations in the LU and Cholesky factorization algorithms is just of the order :math:`N`, not :math:`N^3`. Tridiagonal matrices are special cases of *banded matrices*, where the matrices contain just a set of diagonal bands. Finite difference methods on regularly numbered rectangular and box-shaped meshes give rise to such banded matrices, with 5 bands in 2D and 7 in 3D for diffusion problems. Gaussian elimination only needs to work within the bands, leading to much more efficient algorithms. If :math:`A_{i,j}=0` for :math:`j> i+p` and :math:`j< i-p`, :math:`p` is the *half-bandwidth* of the matrix. We have in our 2D problem :math:`p=N_x+2`, while in 3D, :math:`p=(N_x+1)(N_y+1)+2`. The cost of Gaussian elimination is then :math:`{\mathcal{O}(Np^2)}`, so with :math:`p\ll N`, we see that banded matrices are much more efficient to compute with. By reordering the unknowns in clever ways, one can reduce the work of Gaussian elimination further. Fortunately, the Python programmer has access to such algorithms through the ``scipy.sparse.linalg`` package. Although a direct method is an exact algorithm, rounding errors may in practice accumulate and pollute the solution. The effect grows with the size of the linear system, so both for accuracy and efficiency, iterative methods are better suited than direct methods for solving really large linear systems. Iterative methods ~~~~~~~~~~~~~~~~~ The Jacobi and SOR iterative methods belong to a class of iterative methods where the idea is to solve :math:`Au=b` by splitting A into two parts, :math:`A=M-N`, such that solving systems :math:`Mu=c` is easy and efficient. With the splitting, we get a system .. math:: Mu = Nu + b, which suggests an iterative method .. math:: Mu^{r+1} = Nu^{r} + b,\quad r=0,1,2,\ldots, where :math:`u^{r+1}` is a new approximation to :math:`u` in the :math:`r+1`-th iteration. To initiate the iteration, we need a start vector :math:`u^0`. The Jacobi and SOR methods are based on splitting :math:`A` into a lower tridiagonal part :math:`L`, the diagonal :math:`D`, and an upper tridiagonal part :math:`U`, such that :math:`A=L+D+U`. The Jacobi method corresponds to :math:`M=D` and :math:`N=-L-U`. The Gauss-Seidel method employs :math:`M=L+D` and :math:`N=-U`, while the SOR method corresponds to .. math:: M= \frac{1}{\omega}D + L,\quad N = \frac{1-\omega}{\omega}D - U{\thinspace .} The relaxed Jacobi method has similar expressions: .. math:: M = \frac{1}{\omega}D,\quad N = \frac{1-\omega}{\omega}D - L - U{\thinspace .} With the matrix forms of the Jacobi and SOR methods as written above, we could in an implementation alternatively fill the matrix :math:`A` with entries and call general implementations of the Jacobi or SOR methods that work on a system :math:`Au=b`. However, this is almost never done since forming the matrix :math:`A` requires quite some code and storing :math:`A` in the computer's memory is unnecessary. It is much easier to just apply the Jacobi and SOR ideas to the finite difference stencils directly in an implementation, as we have shown in detail. Nevertheless, the matrix formulation of the Jacobi and SOR methods have been important for analyzing their convergence behavior. One can show that the error :math:`u^r-u` fulfills :math:`u^r-u = G^r(u^0-u)`, where :math:`G=M^{-1}N` and :math:`G^k` is a matrix exponential. For the method to converge, :math:`\lim_{r\rightarrow\infty}||G^r||=0` is a necessary and sufficient condition. This implies that the *spectral radius* of :math:`G` must be less than one. Since :math:`G` is directly related to the finite difference scheme for the underlying PDE problem, one can in principle compute the spectral radius. For a given PDE problem, however, this is not a practical strategy, since it is very difficult to develop useful formulas. Analysis of model problems, usually related to the Poisson equation, reveals some trends of interest: the convergence rate of the Jacobi method goes like :math:`h^2`, while that of SOR with an optimal :math:`\omega` goes like :math:`h`, where :math:`h` is the spatial spacing: :math:`h=\Delta x=\Delta y`. That is, the efficiency of the Jacobi method quickly deteriorates with the increasing mesh resolution, and SOR is much to be preferred (even if the optimal :math:`\omega` remains an open question). We refer to Chapter 4 of [Ref10]_ for more information on the convergence theory. One important result is that if :math:`A` is symmetric and positive definite, then SOR will converge for any :math:`0<\omega <2`. The optimal :math:`\omega` parameter can be theoretically established for a Poisson problem as .. _Eq:_auto198: .. math:: \tag{461} \omega_{o} = \frac{2}{1 + \sqrt{1-\varrho^2}},\quad \varrho = \frac{\cos(\pi/N_x) + (\Delta x/\Delta y)^2\cos(\pi/N_y)}{1 + (\Delta x/\Delta y)^2}{\thinspace .} This formula can be used as a guide also in other problems. The Jacobi and the SOR methods have their great advantage of being trivial to implement, so they are obviously popular of this reason. However, the slow convergence of these methods limits the popularity to fairly small linear systems (i.e., coarse meshes). As soon as the matrix size grows, one is better off with more sophisticated iterative methods like the preconditioned Conjugate gradient method, which we now turn to. Finally, we mention that there is a variant of the SOR method, called Symmetric Successive Overrelaxation method, known as SSOR, where one runs a standard SOR sweep through the mesh points and then a new sweep but visiting the points in reverse order. .. _diffu:2D:CG: The Conjugate gradient method ----------------------------- .. index:: conjugate gradient method There is no simple intuitive derivation of the Conjugate gradient method, so we refer to the many excellent expositions in the literature for the idea of the method and how the algorithm is derived. In particular, we recommend the books [Ref11]_ [Ref12]_ [Ref10]_ [Ref13]_. A brief overview is provided in the `Wikipedia article `__. Here, we just state the pros and cons of the method from a user's perspective and how we utilize it in code. The original Conjugate gradient method is limited to linear systems :math:`Au=b`, where :math:`A` is a symmetric and positive definite matrix. There are, however, extensions of the method to non-symmetric matrices. .. , so when we use the .. term *Conjugate gradient method* hereafter, we usually mean the family of .. related methods that can be applied .. to most matrix systems arising from discretizing .. PDEs. When we need to distinuish between methods for .. the symmetric and non-symmetric .. cases, we use the terms *original Conjugate gradient method* and .. *conjugate gradient-like methods*, respectively. .. index:: preconditioning A major advantage of all conjugate gradient methods is that the matrix :math:`A` is only used in matrix-vector products, so we do not need form and store :math:`A` if we can provide code for computing a matrix-vector product :math:`Au`. Another important feature is that the algorithm is very easy to vectorize and parallelize. The primary downside of the method is that it convergences slowly unless one has an effective *preconditioner* for the system. That is, instead of solving :math:`Au=b`, we try to solve :math:`M^{-1}Au=M^{-1}b` in the hope that the method works better for this *preconditioned* system. The matrix :math:`M` is the *preconditioner* or preconditioning matrix. Now we need to perform matrix-vector products :math:`y = M^{-1}Au`, which is done in two steps: first the matrix-vector product :math:`v=Au` is carried out and then the system :math:`My=v` must be solved. Therefore, :math:`M` must be cheap to compute and systems :math:`My=v` must be cheap to solve. A perfect preconditioner is :math:`M=A`, but in each iteration in the Conjugate gradient method one then has so solve a system with :math:`A` as coefficient matrix! A key idea is to let :math:`M` be some kind of *cheap approximation* to :math:`A`. The simplest preconditioner is to set :math:`M=D`, where :math:`D` is the diagonal of :math:`A`. This choice means running one Jacobi iteration as preconditioner. :ref:`diffu:exer:splitting_prec` shows that the Jacobi and SOR methods can also be viewed as preconditioners. Constructing good preconditioners is a scientific field on its own. Here we shall treat the topic just very briefly. For a user having access to the ``scipy.sparse.linalg`` library, there are Conjugate gradient methods and preconditioners readily available: * For positive definite, symmetric systems: ``cg`` (the Conjugate gradient method) * For symmetric systems: ``minres`` (Minimum residual method) * For non-symmetric systems: * ``gmres`` (GMRES: Generalized minimum residual method) * ``bicg`` (BiConjugate gradient method) * ``bicgstab`` (Stabilized BiConjugate gradient method) * ``cgs`` (Conjugate gradient squared method) * ``qmr`` (Quasi-minimal residual iteration) * Preconditioner: ``spilu`` (Sparse, incomplete LU factorization) The ILU preconditioner is an attractive all-round type of preconditioner that is suitable for most problems on serial computers. A more efficient preconditioner is the multigrid method, and algebraic multigrid is also an all-round choice as preconditioner. The Python package `PyAMG `__ offers efficient implementations of the algebraic multigrid method, to be used both as a preconditioner and as a stand-alone iterative method. The matrix arising from implicit time discretization methods of the diffusion equation is symmetric and positive definite so we can use the Conjugate gradient method (``cg``), typically in combination with an ILU preconditioner. The code is very similar to the one we created when solving the linear system by sparse Gaussian elimination, the main difference is that we now allow for calling up the Conjugate gradient function as an alternative solver. .. code-block:: python def solver_sparse( I, a, f, Lx, Ly, Nx, Ny, dt, T, theta=0.5, U_0x=0, U_0y=0, U_Lx=0, U_Ly=0, user_action=None, method='direct', CG_prec='ILU', CG_tol=1E-5): """ Full solver for the model problem using the theta-rule difference approximation in time. Sparse matrix with dedicated Gaussian elimination algorithm (method='direct') or ILU preconditioned Conjugate Gradients (method='CG' with tolerance CG_tol and preconditioner CG_prec ('ILU' or None)). """ # Set up data structures as shown before # Precompute sparse matrix ... A = scipy.sparse.diags( diagonals=[main, lower, upper, lower2, upper2], offsets=[0, -lower_offset, lower_offset, -lower2_offset, lower2_offset], shape=(N, N), format='csc') if method == 'CG': if CG_prec == 'ILU': # Find ILU preconditioner (constant in time) A_ilu = scipy.sparse.linalg.spilu(A) # SuperLU defaults M = scipy.sparse.linalg.LinearOperator( shape=(N, N), matvec=A_ilu.solve) else: M = None CG_iter = [] # No of CG iterations at time level n # Time loop for n in It[0:-1]: # Compute b, vectorized version # Solve matrix system A*c = b if method == 'direct': c = scipy.sparse.linalg.spsolve(A, b) elif method == 'CG': x0 = u_n.T.reshape(N) # Start vector is u_n CG_iter.append(0) def CG_callback(c_k): """Trick to count the no of iterations in CG.""" CG_iter[-1] += 1 c, info = scipy.sparse.linalg.cg( A, b, x0=x0, tol=CG_tol, maxiter=N, M=M, callback=CG_callback) # Fill u with vector c # Update u_n before next step u_n, u = u, u_n The number of iterations in the Conjugate gradient method is of interest, but is unfortunately not available from the ``cg`` function. Therefore, we perform a trick: in each iteration a user function ``CG_callback`` is called where we accumulate the number of iterations in a list ``CG_iter``. What is the recommended method for solving linear systems? ---------------------------------------------------------- There is no clear answer to this question. If you have enough memory and computing time available, direct methods such as ``spsolve`` are to be preferred since they are easy to use and finds almost an exact solution. However, in larger 2D and in 3D problems, direct methods usually run too slowly or required too much memory, so one is forced to use iterative methods. The fastest and most reliable methods are in the Conjugate Gradient family, but these requires suitable preconditioners. ILU is an all-round preconditioner, but it is not suited for parallel computing. The Jacobi and SOR iterative methods are easy to implement, and popular of that reason, but run slowly. Jacobi iteration is not an option in real problems, but SOR may be. .. As we have showed, Jacobi is equivalent to running a Forward Euler scheme .. until the stationary state is reached .. _diffu:randomwalk: Random walk =========== .. index:: random walk Models leading to diffusion equations, see the section :ref:`diffu:app`, are usually based on reasoning with *averaged* physical quantities such as concentration, temperature, and velocity. The underlying physical processes involve complicated microscopic movement of atoms and molecules, but an average of a large number of molecules is performed in a small volume before the modeling starts, and the averaged quantity inside this volume is assigned as a point value at the centroid of the volume. This means that concentration, temperature, and velocity at a space-time point represent averages around the point in a small time interval and small spatial volume. Random walk is a principally different kind of modeling procedure compared to the reasoning behind partial differential equations. The idea in random walk is to have a large number of "particles" that undergo random movements. Averaging can then be used afterwards to compute macroscopic quantities like concentration. The "particles" and their random movement represent a very simplified microscopic behavior of molecules, much simpler and computationally much more efficient than direct `molecular simulation `__, yet the random walk model has been very powerful to describe a wide range of phenomena, including heat conduction, quantum mechanics, polymer chains, population genetics, neuroscience, hazard games, and pricing of financial instruments. It can be shown that random walk, when averaged, produces models that are mathematically equivalent to diffusion equations. This is the primary reason why we treat random walk in this chapter: two very different algorithms (finite difference stencils and random walk) solve the same type of problems. The simplicity of the random walk algorithm makes it particularly attractive for solving diffusion equations on massively parallel computers. The exposition here is as simple as possible, and good thorough derivation of the models is provided by Hjorth-Jensen [Ref14]_. .. _diffu:randomwalk:1D: Random walk in 1D ----------------- Imagine that we have some particles that perform random moves, either to the right or to the left. We may flip a coin to decide the movement of each particle, say head implies movement to the right and tail means movement to the left. Each move is one unit length. Physicists use the term *random walk* for this type of movement. The movement is also known as `drunkard's walk `__. You may try this yourself: flip the coin and make one step to the left or right, and repeat the process. We introduce the symbol :math:`N` for the number of steps in a random walk. Figure :ref:`diffu:randomwalk:1D:fig:ensemble` shows four different random walks with :math:`N=200`. .. _diffu:randomwalk:1D:fig:ensemble: .. figure:: rw1D_ensemble4.png :width: 800 *Ensemble of 4 random walks, each with 200 steps* .. _diffu:randomwalk:1D:EVar: Statistical considerations -------------------------- Let :math:`S_k` be the stochastic variable representing a step to the left or to the right in step number :math:`k`. We have that :math:`S_k=-1` with probability :math:`p` and :math:`S_k=1` with probability :math:`q=1-p`. The variable :math:`S_k` is known as a `Bernoulli variable `__. The expectation of :math:`S_k` is .. math:: {\hbox{E}\lbrack S_k \rbrack} = p\cdot (-1) + q\cdot 1 = 1 - 2p, and the variance is .. math:: {\hbox{Var}\lbrack S_k \rbrack} = {\hbox{E}\lbrack S_k^2 \rbrack} - {\hbox{E}\lbrack S_k \rbrack}^2 = 1 - (1-2p)^2 = 4p(1-p){\thinspace .} The position after :math:`k` steps is another stochastic variable .. math:: \bar X_k = \sum_{i=0}^{k-1} S_i{\thinspace .} The expected position is .. math:: {\hbox{E}\lbrack \bar X_k \rbrack} = {\hbox{E}\lbrack \sum_{i=0 \rbrack}^{k-1} S_i} = \sum_{i=0}^{k-1} {\hbox{E}\lbrack S_i \rbrack}= k(1-2p){\thinspace .} All the :math:`S_k` variables are independent. The variance therefore becomes .. math:: {\hbox{Var}\lbrack \bar X_k \rbrack} = {\hbox{Var}\lbrack \sum_{i=0 \rbrack}^{k-1} S_i} = \sum_{i=0}^{k-1} {\hbox{Var}\lbrack S_i \rbrack}= k4p(1-p){\thinspace .} We see that :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}` is proportional with the number of steps :math:`k`. For the very important case :math:`p=q=\frac{1}{2}`, :math:`{\hbox{E}\lbrack \bar X_k \rbrack}=0` and :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}=k`. How can we estimate :math:`{\hbox{E}\lbrack \bar X_k \rbrack}=0` and :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}=N`? We must have many random walks of the type in Figure :ref:`diffu:randomwalk:1D:fig:ensemble`. For a given :math:`k`, say :math:`k=100`, we find all the values of :math:`\bar X_k`, name them :math:`\bar x_{0,k}`, :math:`\bar x_{1,k}`, :math:`\bar x_{2,k}`, and so on. The empirical estimate of :math:`{\hbox{E}\lbrack \bar X_k \rbrack}` is the average, .. math:: {\hbox{E}\lbrack \bar X_k \rbrack} \approx = \frac{1}{W}\sum_{j=0}^{W-1} \bar x_{j,k}, while an empirical estimate of :math:`{\hbox{Var}\lbrack \bar X_k \rbrack}` is .. math:: {\hbox{Var}\lbrack \bar X_k \rbrack} \approx \frac{1}{W}\sum_{j=0}^{W-1} (\bar x_{j,k})^2 - \left(\frac{1}{W}\sum_{j=0}^{W-1} \bar x_{j,k}\right)^2{\thinspace .} That is, we take the statistics for a given :math:`K` across the ensemble of random walks ("vertically" in Figure :ref:`diffu:randomwalk:1D:fig:ensemble`). The key quantities to record are :math:`\sum_i \bar x_{i,k}` and :math:`\sum_i \bar x_{i,k}^2`. .. _diffu:randomwalk:1D:code1: Playing around with some code ----------------------------- Scalar code (1) ~~~~~~~~~~~~~~~~~~~~~~~~ Python has a ``random`` module for drawing random numbers, and this module has a function ``uniform(a, b)`` for drawing a uniformly distributed random number in the interval :math:`[a,b)`. If an event happens with probability :math:`p`, we can simulate this on the computer by drawing a random number :math:`r` in :math:`[0,1)`, because then :math:`r\leq p` with probability :math:`p` and :math:`r>p` with probability :math:`1-p`: .. code-block:: python import random r = random.uniform(0, 1) if r <= p: # Event happens else: # Event does not happen A random walk with :math:`N` steps, starting at :math:`x_0`, where we move to the left with probability :math:`p` and to the right with probability :math:`1-p` can now be implemented by .. code-block:: python import random, numpy as np def random_walk1D(x0, N, p): """1D random walk with 1 particle.""" # Store position in step k in position[k] position = np.zeros(N) position[0] = x0 current_pos = x0 for k in range(N-1): r = random.uniform(0, 1) if r <= p: current_pos -= 1 else: current_pos += 1 position[k+1] = current_pos return position .. index:: vectorization Vectorized code (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since :math:`N` is supposed to be large and we want to repeat the process for many particles, we should speed up the code as much as possible. Vectorization is the obvious technique here: we draw all the random numbers at once with aid of ``numpy``, and then we formulate vector operations to get rid of the loop over the steps (``k``). The ``numpy.random`` module has vectorized versions of the functions in Python's built-in ``random`` module. For example, ``numpy.random.uniform(a, b, N)`` returns ``N`` random numbers uniformly distributed between ``a`` (included) and ``b`` (not included). We can then make an array of all the steps in a random walk: if the random number is less than or equal to :math:`p`, the step is :math:`-1`, otherwise the step is :math:`1`: .. code-block:: python r = np.random.uniform(0, 1, size=N) steps = np.where(r <= p, -1, 1) The value of ``position[k]`` is the sum of all steps up to step ``k``. Such sums are often needed in vectorized algorithms and therefore available by the ``numpy.cumsum`` function: .. code-block:: python >>> import numpy as np >>> np.cumsum(np.array([1,3,4,6])) array([ 1, 4, 8, 14]) The resulting array in this demo has elements :math:`1`, :math:`1+3=4`, :math:`1+3+4=8`, and :math:`1+3+4+6=14`. We can now vectorize the ``random_walk1D`` function: .. code-block:: python def random_walk1D_vec(x0, N, p): """Vectorized version of random_walk1D.""" # Store position in step k in position[k] position = np.zeros(N+1) position[0] = x0 r = np.random.uniform(0, 1, size=N) steps = np.where(r <= p, -1, 1) position[1:] = x0 + np.cumsum(steps) return position This code runs about 10 times faster than the scalar version. With a parallel ``numpy`` library, the code can also automatically take advantage of hardware for parallel computing because each of the four array operations can be trivially parallelized. .. index:: seed (random numbers) Fixing the random sequence ~~~~~~~~~~~~~~~~~~~~~~~~~~ During software development with random numbers it is advantageous to always generate the same sequence of random numbers as this may help debugging processes. To fix the sequence, we set a *seed* of the random number generator to some chosen integer, e.g., .. code-block:: python np.random.seed(10) Calls to ``random_walk1D_vec`` give positions of the particle as depicted in Figure :ref:`diffu:randomwalk:1D:code1:fig1`. The particle starts at the origin and moves with :math:`p=\frac{1}{2}`. Since the seed is the same, the plot to the left is just a magnification of the first 1,000 steps in the plot to the right. .. demo_random_walk1D produced the plots .. _diffu:randomwalk:1D:code1:fig1: .. figure:: rw1D_1sample.png :width: 800 *1,000 (left) and 50,000 (right) steps of a random walk* .. index:: verification Verification (9) ~~~~~~~~~~~~~~~~~~~~~~~~~ When we have a scalar and a vectorized code, it is always a good idea to develop a unit test for checking that they produce the same result. A problem in the present context is that the two versions apply to different random number generators. For a test to be meaningful, we need to fix the seed and use the same generator. This means that the scalar version must either use ``np.random`` or have this as an option. An option is the most flexible choice: .. code-block:: text import random def random_walk1D(x0, N, p, random=random): ... r = random.uniform(0, 1) Using ``random=np.random``, the ``r`` variable gets computed by ``np.random.uniform``, and the sequence of random numbers will be the same as in the vectorized version that employs the same generator (given that the seed is also the same). A proper test function may be to check that the positions in the walk are the same in the scalar and vectorized implementations: .. code-block:: python def test_random_walk1D(): # For fixed seed, check that scalar and vectorized versions # produce the same result x0 = 2; N = 4; p = 0.6 np.random.seed(10) scalar_computed = random_walk1D(x0, N, p, random=np.random) np.random.seed(10) vectorized_computed = random_walk1D_vec(x0, N, p) assert (scalar_computed == vectorized_computed).all() Note that we employ ``==`` for arrays with real numbers, which is normally an inadequate test due to rounding errors, but in the present case, all arithmetics consists of adding or subtracting one, so these operations are expected to have no rounding errors. Comparing two ``numpy`` arrays with ``==`` results in a boolean array, so we need to call the ``all()`` method to ensure that all elements are ``True``, i.e., that all elements in the two arrays match each other pairwise. .. _diffu:randomwalk:1D:pde: Equivalence with diffusion -------------------------- .. index:: diffusion limit of random walk The original random walk algorithm can be said to work with dimensionless coordinates :math:`\bar x_i = -N + i`, :math:`i=0,1,\ldots, 2N+1` (:math:`i\in [-N,N]`), and :math:`\bar t_n=n`, :math:`n=0,1,\ldots,N`. A mesh with spacings :math:`\Delta x` and :math:`\Delta t` with dimensions can be introduced by .. math:: x_i = X_0 + \bar x_i \Delta x,\quad t_n = \bar t_n\Delta t{\thinspace .} If we implement the algorithm with dimensionless coordinates, we can just use this rescaling to obtain the movement in a coordinate system without unit spacings. Let :math:`P^{n+1}_i` be the probability of finding the particle at mesh point :math:`\bar x_i` at time :math:`\bar t_{n+1}`. We can reach mesh point :math:`(i,n+1)` in two ways: either coming in from the left from :math:`(i-1,n)` or from the right (:math:`i+1,n)`. Each has probability :math:`\frac{1}{2}` (if we assume :math:`p=q=\frac{1}{2}`). The fundamental equation for :math:`P^{n+1}_i` is .. _Eq:diffu:randomwalk:1D:pde:Markov: .. math:: \tag{462} P^{n+1}_i = \frac{1}{2} P^{n}_{i-1} + \frac{1}{2} P^{n}_{i+1}{\thinspace .} (This equation is easiest to understand if one looks at the random walk as a Markov process and applies the transition probabilities, but this is beyond scope of the present text.) Subtracting :math:`P^{n}_i` from (:ref:`diffu:randomwalk:1D`) results in .. math:: P^{n+1}_i - P^{n}_i = \frac{1}{2} (P^{n}_{i-1} -2P^{n}_i + \frac{1}{2} P^{n}_{i+1}){\thinspace .} Readers who have seen the Forward Euler discretization of a 1D diffusion equation recognize this scheme as very close to such a discretization. We have .. math:: \frac{\partial}{\partial t}P(x_i,t_{n}) = \frac{P^{n+1}_i - P^{n}_i}{\Delta t} + {\mathcal{O}(\Delta t)}, or in dimensionless coordinates .. math:: \frac{\partial}{\partial\bar t}P(\bar x_i,\bar t_n) \approx P^{n+1}_i - P^{n}_i{\thinspace .} Similarly, we have .. math:: \begin{align*} \frac{\partial^2}{\partial x^2}P(x_i,t_n) &= \frac{P^{n}_{i-1} -2P^{n}_i + \frac{1}{2} P^{n}_{i+1}}{\Delta x^2} + {\mathcal{O}(\Delta x^2)},\\ \frac{\partial^2}{\partial x^2}P(\bar x_i,\bar t_n) &\approx P^{n}_{i-1} -2P^{n}_i + \frac{1}{2} P^{n}_{i+1}{\thinspace .} \end{align*} Equation (:ref:`diffu:randomwalk:1D`) is therefore equivalent with the dimensionless diffusion equation .. _Eq:diffu:randomwalk:1D:pde:dimless: .. math:: \tag{463} \frac{\partial P}{\partial\bar t} = \frac{1}{2} \frac{\partial^2 P}{\partial \bar x^2}, or the diffusion equation .. _Eq:diffu:randomwalk:1D:pde:dim: .. math:: \tag{464} \frac{\partial P}{\partial t} = D\frac{\partial^2 P}{\partial x^2}, with diffusion coefficient .. math:: D = \frac{\Delta x^2}{2\Delta t}{\thinspace .} This derivation shows the tight link between random walk and diffusion. If we keep track of where the particle is, and repeat the process many times, or run the algorithms for lots of particles, the histogram of the positions will approximate the solution of the diffusion equation for the local probability :math:`P^n_i`. Suppose all the random walks start at the origin. Then the initial condition for the probability distribution is the Dirac delta function :math:`\delta(x)`. The solution of :ref:`(463) ` can be shown to be .. _Eq:diffu:randomwalk:1D:pde:dimless:sol: .. math:: \tag{465} \bar P(\bar x,\bar t) = \frac{1}{\sqrt{4\pi{\alpha} t}}e^{-\frac{x^2}{4{\alpha} t}}, where :math:`{\alpha} = \frac{1}{2}`. Implementation of multiple walks -------------------------------- Our next task is to implement an ensemble of walks (for statistics, see the section :ref:`diffu:randomwalk:1D:EVar`) and also provide data from the walks such that we can compute the probabilities of the positions as introduced in the previous section. An appropriate representation of probabilities :math:`P^n_i` are histograms (with :math:`i` along the :math:`x` axis) for a few selected values of :math:`n`. To estimate the expectation and variance of the random walks, the section :ref:`diffu:randomwalk:1D:EVar` points to recording :math:`\sum_j x_{j,k}` and :math:`\sum_j x_{j,k}^2`, where :math:`x_{j,k}` is the position at time/step level :math:`k` in random walk number :math:`j`. The histogram of positions needs the individual values :math:`x_{i,k}` for all :math:`i` values and some selected :math:`k` values. We introduce ``position[k]`` to hold :math:`\sum_j x_{j,k}`, ``position2[k]`` to hold :math:`\sum_j (x_{j,k})^2`, and ``pos_hist[i,k]`` to hold :math:`x_{i,k}`. A selection of :math:`k` values can be specified by saying how many, ``num_times``, and let them be equally spaced through time: .. code-block:: python pos_hist_times = [(N//num_times)*i for i in range(num_times)] This is one of the few situations we want integer division (``//``) or real division rounded to an integer. Scalar version ~~~~~~~~~~~~~~ Our scalar implementation of running ``num_walks`` random walks may go like this: .. code-block:: python def random_walks1D(x0, N, p, num_walks=1, num_times=1, random=random): """Simulate num_walks random walks from x0 with N steps.""" position = np.zeros(N+1) # Accumulated positions position[0] = x0*num_walks position2 = np.zeros(N+1) # Accumulated positions**2 position2[0] = x0**2*num_walks # Histogram at num_times selected time points pos_hist = np.zeros((num_walks, num_times)) pos_hist_times = [(N//num_times)*i for i in range(num_times)] #print 'save hist:', post_hist_times for n in range(num_walks): num_times_counter = 0 current_pos = x0 for k in range(N): if k in pos_hist_times: #print 'save, k:', k, num_times_counter, n pos_hist[n,num_times_counter] = current_pos num_times_counter += 1 # current_pos corresponds to step k+1 r = random.uniform(0, 1) if r <= p: current_pos -= 1 else: current_pos += 1 position [k+1] += current_pos position2[k+1] += current_pos**2 return position, position2, pos_hist, np.array(pos_hist_times) Vectorized version ~~~~~~~~~~~~~~~~~~ We have already vectorized a single random walk. The additional challenge here is to vectorize the computation of the data for the histogram, ``pos_hist``, but given the selected steps in ``pos_hist_times``, we can find the corresponding positions by indexing with the list ``pos_hist_times``: ``position[post_hist_times]``, which are to be inserted in ``pos_hist[n,:]``. .. code-block:: python def random_walks1D_vec1(x0, N, p, num_walks=1, num_times=1): """Vectorized version of random_walks1D.""" position = np.zeros(N+1) # Accumulated positions position2 = np.zeros(N+1) # Accumulated positions**2 walk = np.zeros(N+1) # Positions of current walk walk[0] = x0 # Histogram at num_times selected time points pos_hist = np.zeros((num_walks, num_times)) pos_hist_times = [(N//num_times)*i for i in range(num_times)] for n in range(num_walks): r = np.random.uniform(0, 1, size=N) steps = np.where(r <= p, -1, 1) walk[1:] = x0 + np.cumsum(steps) # Positions of this walk position += walk position2 += walk**2 pos_hist[n,:] = walk[pos_hist_times] return position, position2, pos_hist, np.array(pos_hist_times) Improved vectorized version ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Looking at the vectorized version above, we still have one potentially long Python loop over ``n``. Normally, ``num_walks`` will be much larger than ``N``. The vectorization of the loop over ``N`` certainly speeds up the program, but if we think of vectorization as also a way to parallelize the code, all the independent walks (the ``n`` loop) can be executed in parallel. Therefore, we should include this loop as well in the vectorized expressions, at the expense of using more memory. We introduce the array ``walks`` to hold the :math:`N+1` steps of all the walks: each row represents the steps in one walk. .. code-block:: python walks = np.zeros((num_walks, N+1)) # Positions of each walk walks[:,0] = x0 Since all the steps are independent, we can just make one long vector of enough random numbers (``N*num_walks``), translate these numbers to :math:`\pm 1`, then we reshape the array such that the steps of each walk are stored in the rows. .. code-block:: python r = np.random.uniform(0, 1, size=N*num_walks) steps = np.where(r <= p, -1, 1).reshape(num_walks, N) The next step is to sum up the steps in each walk. We need the ``np.cumsum`` function for this, with the argument ``axis=1`` for indicating a sum across the columns: .. code-block:: python >>> a = np.arange(6).reshape(2,3) >>> a array([[0, 1, 2], [3, 4, 5]]) >>> np.cumsum(a, axis=1) array([[ 0, 1, 3], [ 3, 7, 12]]) Now ``walks`` can be computed by .. code-block:: python walks[:,1:] = x0 + np.cumsum(steps, axis=1) The ``position`` vector is the sum of all the walks. That is, we want to sum all the rows, obtained by .. code-block:: python position = np.sum(walks, axis=0) A corresponding expression computes the squares of the positions. Finally, we need to compute ``pos_hist``, but that is a matter of grabbing some of the walks (according to ``pos_hist_times``): .. code-block:: python pos_hist[:,:] = walks[:,pos_hist_times] The complete vectorized algorithm without any loop can now be summarized: .. code-block:: python def random_walks1D_vec2(x0, N, p, num_walks=1, num_times=1): """Vectorized version of random_walks1D; no loops.""" position = np.zeros(N+1) # Accumulated positions position2 = np.zeros(N+1) # Accumulated positions**2 walks = np.zeros((num_walks, N+1)) # Positions of each walk walks[:,0] = x0 # Histogram at num_times selected time points pos_hist = np.zeros((num_walks, num_times)) pos_hist_times = [(N//num_times)*i for i in range(num_times)] r = np.random.uniform(0, 1, size=N*num_walks) steps = np.where(r <= p, -1, 1).reshape(num_walks, N) walks[:,1:] = x0 + np.cumsum(steps, axis=1) position = np.sum(walks, axis=0) position2 = np.sum(walks**2, axis=0) pos_hist[:,:] = walks[:,pos_hist_times] return position, position2, pos_hist, np.array(pos_hist_times) What is the gain of the vectorized implementations? One important gain is that each vectorized operation can be automatically parallelized if one applies a parallel ``numpy`` library like `Numba `__. On a single CPU, however, the speed up of the vectorized operations is also significant. With :math:`N=1,000` and 50,000 repeated walks, the two vectorized versions run about 25 and 18 times faster than the scalar version, with ``random_walks1D_vec1`` being fastest. .. The downside of vectorization of random walks is the large .. arrays that arise, especially the version in ``random_walks1D_vec2``. .. CPU: .. N=1000, 50K walks .. scalar: 42, vec1: 1.7, vec2: 1.7, factors 25 and 25 Remark on vectorized code and parallelization ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Our first attempt on vectorization removed the loop over the :math:`N` steps in a single walk. However, the number of walks is usually much larger than :math:`N`, because of the need for accurate statistics. Therefore, we should rather remove the loop over all walks. It turns out, from our efficiency experiments, that the function ``random_walks1D_vec2`` (with no loops) is slower than ``random_walks1D_vec1``. This is a bit surprising and may be explained by less efficiency in the statements involving very large arrays, containing all steps for all walks at once. From a parallelization and improved vectorization point of view, it would be more natural to switch the sequence of the loops in the serial code such that the shortest loop is the outer loop: .. code-block:: python def random_walks1D2(x0, N, p, num_walks=1, num_times=1, ...): ... current_pos = x0 + np.zeros(num_walks) num_times_counter = -1 for k in range(N): if k in pos_hist_times: num_times_counter += 1 store_hist = True else: store_hist = False for n in range(num_walks): # current_pos corresponds to step k+1 r = random.uniform(0, 1) if r <= p: current_pos[n] -= 1 else: current_pos[n] += 1 position [k+1] += current_pos[n] position2[k+1] += current_pos[n]**2 if store_hist: pos_hist[n,num_times_counter] = current_pos[n] return position, position2, pos_hist, np.array(pos_hist_times) The vectorized version of this code, where we just vectorize the loop over ``n``, becomes .. code-block:: python def random_walks1D2_vec1(x0, N, p, num_walks=1, num_times=1): """Vectorized version of random_walks1D2.""" position = np.zeros(N+1) # Accumulated positions position2 = np.zeros(N+1) # Accumulated positions**2 # Histogram at num_times selected time points pos_hist = np.zeros((num_walks, num_times)) pos_hist_times = [(N//num_times)*i for i in range(num_times)] current_pos = np.zeros(num_walks) current_pos[0] = x0 num_times_counter = -1 for k in range(N): if k in pos_hist_times: num_times_counter += 1 store_hist = True # Store histogram data for this k else: store_hist = False # Move all walks one step r = np.random.uniform(0, 1, size=num_walks) steps = np.where(r <= p, -1, 1) current_pos += steps position[k+1] = np.sum(current_pos) position2[k+1] = np.sum(current_pos**2) if store_hist: pos_hist[:,num_times_counter] = current_pos return position, position2, pos_hist, np.array(pos_hist_times) This function runs significantly faster than the ``random_walks1D_vec1`` function above, typically 1.7 times faster. The code is also more appropriate in a parallel computing context since each vectorized statement can work with data of size ``num_walks`` over the compute units, repeated ``N`` times (compared with data of size ``N``, repeated ``num_walks`` times, in ``random_walks1D_vec1``). The scalar code with switched loops, ``random_walks1D2`` runs a bit slower than the original code in ``random_walks1D``, so with the longest loop as the inner loop, the vectorized function ``random_walks1D2_vec1`` is almost 60 times faster than the scalar counterpart, while the code ``random_walks1D_vec2`` without loops is only around 18 times faster. Taking into account the very large arrays required by the latter function, we end up with ``random_walks1D2_vec1`` as the preferred implementation. Test function ~~~~~~~~~~~~~ During program development, it is highly recommended to carry out computations by hand for, e.g., ``N=4`` and ``num_walks=3``. Normally, this is done by executing the program with these parameters and checking with pen and paper that the computations make sense. The next step is to use this test for correctness in a formal test function. First, we need to check that the simulation of multiple random walks reproduces the results of ``random_walk1D``, ``random_walk1D_vec1``, and ``random_walk1D_vec2`` for the first walk, if the seed is the same. Second, we run three random walks (``N=4``) with the scalar and the two vectorized versions and check that the returned arrays are identical. For this type of test to be successful, we must be sure that exactly the same set of random numbers are used in the three versions, a fact that requires the same random number generator and the same seed, of course, but also the same sequence of computations. This is not obviously the case with the three ``random_walk1D*`` functions we have presented. The critical issue in ``random_walk1D_vec1`` is that the first random numbers are used for the first walk, the second set of random numbers is used for the second walk and so, to be compatible with how the random numbers are used in the function ``random_walk1D``. For the function ``random_walk1D_vec2`` the situation is a bit more complicated since we generate all the random numbers at once. However, the critical step now is the reshaping of the array returned from ``np.where``: we must reshape as ``(num_walks, N)`` to ensure that the first ``N`` random numbers are used for the first walk, the next ``N`` numbers are used for the second walk, and so on. We arrive at the test function below. .. code-block:: python def test_random_walks1D(): # For fixed seed, check that scalar and vectorized versions # produce the same result x0 = 0; N = 4; p = 0.5 # First, check that random_walks1D for 1 walk reproduces # the walk in random_walk1D num_walks = 1 np.random.seed(10) computed = random_walks1D( x0, N, p, num_walks, random=np.random) np.random.seed(10) expected = random_walk1D( x0, N, p, random=np.random) assert (computed[0] == expected).all() # Same for vectorized versions np.random.seed(10) computed = random_walks1D_vec1(x0, N, p, num_walks) np.random.seed(10) expected = random_walk1D_vec(x0, N, p) assert (computed[0] == expected).all() np.random.seed(10) computed = random_walks1D_vec2(x0, N, p, num_walks) np.random.seed(10) expected = random_walk1D_vec(x0, N, p) assert (computed[0] == expected).all() # Second, check multiple walks: scalar == vectorized num_walks = 3 num_times = N np.random.seed(10) serial_computed = random_walks1D( x0, N, p, num_walks, num_times, random=np.random) np.random.seed(10) vectorized1_computed = random_walks1D_vec1( x0, N, p, num_walks, num_times) np.random.seed(10) vectorized2_computed = random_walks1D_vec2( x0, N, p, num_walks, num_times) # positions: [0, 1, 0, 1, 2] # Can test without tolerance since everything is +/- 1 return_values = ['pos', 'pos2', 'pos_hist', 'pos_hist_times'] for s, v, r in zip(serial_computed, vectorized1_computed, return_values): msg = '%s: %s (serial) vs %s (vectorized)' % (r, s, v) assert (s == v).all(), msg for s, v, r in zip(serial_computed, vectorized2_computed, return_values): msg = '%s: %s (serial) vs %s (vectorized)' % (r, s, v) assert (s == v).all(), msg Such test functions are indispensable for further development of the code as we can at any time test whether the basic computations remain correct or not. This is particularly important in stochastic simulations since without test functions and fixed seeds, we always experience variations from run to run, and it can be very difficult to spot bugs through averaged statistical quantities. Demonstration of multiple walks ------------------------------- Assuming now that the code works, we can just scale up the number of steps in each walk and the number of walks. The latter influences the accuracy of the statistical estimates. Figure :ref:`diffu:randomwalk:1D:fig:demo1:EX` shows the impact of the number of walks on the expectation, which should approach zero. Figure :ref:`diffu:randomwalk:1D:fig:demo1:VarX` displays the corresponding estimate of the variance of the position, which should grow linearly with the number of steps. It does, seemingly very accurately, but notice that the scale on the :math:`y` axis is so much larger than for the expectation, so irregularities due to the stochastic nature of the process become so much less visible in the variance plots. The probability of finding a particle at a certain position at time (or step) 800 is shown in Figure :ref:`diffu:randomwalk:1D:fig:demo1:HistX`. The dashed red line is the theoretical distribution :ref:`(465) ` arising from solving the diffusion equation :ref:`(463) ` instead. As always, we realize that one needs significantly more statistical samples to estimate a histogram accurately than the expectation or variance. .. _diffu:randomwalk:1D:fig:demo1:EX: .. figure:: rw1D_EX_100_10000_100000_1000000.png :width: 800 *Estimated expected value for 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right)* .. _diffu:randomwalk:1D:fig:demo1:VarX: .. figure:: rw1D_VarX_100_10000_100000_1000000.png :width: 800 *Estimated variance over 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right)* .. _diffu:randomwalk:1D:fig:demo1:HistX: .. figure:: rw1D_HistX_100_10000_100000_1000000.png :width: 800 *Estimated probability distribution at step 800, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right)* .. _diffu:randomwalk:1D:avplotter: Ascii visualization of 1D random walk ------------------------------------- .. index:: scitools.avplotter .. index:: Plotter class (SciTools) .. index:: interrupt a program by Ctrl+c If we want to study (very) long time series of random walks, it can be convenient to plot the position in a terminal window with the time axis pointing downwards. The module ``avplotter`` in SciTools has a class ``Plotter`` for plotting functions in the terminal window with the aid of ascii symbols only. Below is the code required to visualize a simple random walk, starting at the origin, and considered over when the point :math:`x=-1` is reached. We use a spacing :math:`\Delta x = 0.05` (so :math:`x=-1` corresponds to :math:`i=-20`). .. code-block:: python def run_random_walk(): from scitools.avplotter import Plotter import time, numpy as np p = Plotter(-1, 1, width=75) # Horizontal axis: 75 chars wide dx = 0.05 np.random.seed(10) x = 0 while True: random_step = 1 if np.random.random() > 0.5 else -1 x = x + dx*random_step if x < -1: break # Destination reached! print p.plot(0, x) # Allow Ctrl+c to abort the simulation try: time.sleep(0.1) # Wait for interrupt except KeyboardInterrupt: print 'Interrupted by Ctrl+c' break Observe that we implement an infinite loop, but allow a smooth interrupt of the program by ``Ctrl+c`` through Python's ``KeyboardInterrupt`` exception. This is a useful recipe that can be used in many occasions! The output looks typically like .. code-block:: text * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | Positions beyond the limits of the :math:`x` axis appear with a value. `A long file `__ contains the complete ascii plot corresponding to the function ``run_random_walk`` above. .. "https://github.com/hplgit/fdm-book/blob/master/doc/.src/chapters/diffu/fig-diffu/rw_ascii.txt" .. _diffu:randomwalk:1D:ode: Random walk as a stochastic equation ------------------------------------ .. index:: stochastic difference equation The (dimensionless) position in a random walk, :math:`\bar X_k`, can be expressed as a stochastic difference equation: .. _Eq:diffu:randomwalk:1D:ode:x: .. math:: \tag{466} \bar X_k = \bar X_{k-1} + s, \quad x_0=0, where :math:`s` is a `Bernoulli variable `__, taking on the two values :math:`s=-1` and :math:`s=1` with equal probability: .. math:: \hbox{P}(s=1)=\frac{1}{2},\quad \hbox{P}(s=-1)=\frac{1}{2}{\thinspace .} The :math:`s` variable in a step is independent of the :math:`s` variable in other steps. The difference equation expresses essentially the sum of independent Bernoulli variables. Because of the central limit theorem, :math:`X_k`, will then be normally distributed with expectation :math:`k{\hbox{E}\lbrack s \rbrack}` and :math:`k{\hbox{Var}\lbrack s \rbrack}`. The expectation and variance of a Bernoulli variable with values :math:`r=0` and :math:`r=1` are :math:`p` and :math:`p(1-p)`, respectively. The variable :math:`s=2r-1` then has expectation :math:`2{\hbox{E}\lbrack r \rbrack}-1=2p-1=0` and variance :math:`2^2{\hbox{Var}\lbrack r \rbrack}=4p(1-p)=1`. The position :math:`X_k` is normally distributed with zero expectation and variance :math:`k`, as we found in the section :ref:`diffu:randomwalk:1D:EVar`. The central limit theorem tells that as long as :math:`k` is not small, the distribution of :math:`X_k` remains the same if we replace the Bernoulli variable :math:`s` by any other stochastic variable with the same expectation and variance. In particular, we may let :math:`s` be a standardized Gaussian variable (zero mean, unit variance). .. Let us introduce .. !bt .. x_k = \Delta x\bar x_k,\] .. !et .. such that we take steps of length :math:`\Delta x`. Dividing :ref:`(466) ` by :math:`\Delta t` gives .. math:: \frac{\bar X_k - \bar X_{k-1}}{\Delta t} = \frac{1}{\Delta t} s{\thinspace .} In the limit :math:`\Delta t\rightarrow 0`, :math:`s/\Delta t` approaches a white noise stochastic process. With :math:`\bar X(t)` as the continuous process in the limit :math:`\Delta t\rightarrow 0` (:math:`X_k\rightarrow X(t_k)`), we formally get the stochastic differential equation .. index:: stochastic ODE .. index:: Fokker-Planck equation .. index:: Wiener process .. _Eq:_auto199: .. math:: \tag{467} d\bar X = dW, where :math:`W(t)` is a `Wiener process `__. Then :math:`X` is also a Wiener process. It follows from the stochastic ODE :math:`dX=dW` that the probability distribution of :math:`X` is given by the `Fokker-Planck equation `__ :ref:`(463) `. In other words, the key results for random walk we found earlier can alternatively be derived via a stochastic ordinary differential equation and its related Fokker-Planck equation. Random walk in 2D ----------------- The most obvious generalization of 1D random walk to two spatial dimensions is to allow movements to the north, east, south, and west, with equal probability :math:`\frac{1}{4}`. .. code-block:: python def random_walk2D(x0, N, p, random=random): """2D random walk with 1 particle and N moves: N, E, W, S.""" # Store position in step k in position[k] d = len(x0) position = np.zeros((N+1, d)) position[0,:] = x0 current_pos = np.array(x0, dtype=float) for k in range(N): r = random.uniform(0, 1) if r <= 0.25: current_pos += np.array([0, 1]) # Move north elif 0.25 < r <= 0.5: current_pos += np.array([1, 0]) # Move east elif 0.5 < r <= 0.75: current_pos += np.array([0, -1]) # Move south else: current_pos += np.array([-1, 0]) # Move west position[k+1,:] = current_pos return position The left plot in Figure :ref:`diffu:randomwalk:2D:fig:rect_vs_diag` provides an example on 200 steps with this kind of walk. We may refer to this walk as a walk on a *rectangular mesh* as we move from any spatial mesh point :math:`(i,j)` to one of its four neighbors in the rectangular directions: :math:`(i+1,j)`, :math:`(i-1,j)`, :math:`(i,j+1)`, or :math:`(i,j-1)`. .. _diffu:randomwalk:2D:fig:rect_vs_diag: .. figure:: rw2D_sample200.png :width: 800 *Random walks in 2D with 200 steps: rectangular mesh (left) and diagonal mesh (right)* Random walk in any number of space dimensions --------------------------------------------- From a programming point of view, especially when implementing a random walk in any number of dimensions, it is more natural to consider a walk in the diagonal directions NW, NE, SW, and SE. On a two-dimensional spatial mesh it means that we go from :math:`(i,j)` to either :math:`(i+1,j+1)`, :math:`(i-1,j+1)`, :math:`(i+1,j-1)`, or :math:`(i-1,j-1)`. We can with such a *diagonal mesh* (see right plot in Figure :ref:`diffu:randomwalk:2D:fig:rect_vs_diag`) draw a Bernoulli variable for the step in each spatial direction and trivially write code that works in any number of spatial directions: .. code-block:: python def random_walkdD(x0, N, p, random=random): """Any-D (diagonal) random walk with 1 particle and N moves.""" # Store position in step k in position[k] d = len(x0) position = np.zeros((N+1, d)) position[0,:] = x0 current_pos = np.array(x0, dtype=float) for k in range(N): for i in range(d): r = random.uniform(0, 1) if r <= p: current_pos[i] -= 1 else: current_pos[i] += 1 position[k+1,:] = current_pos return position A vectorized version is desired. We follow the ideas from the section :ref:`diffu:randomwalk:1D:code1`, but each step is now a vector in :math:`d` spatial dimensions. We therefore need to draw :math:`Nd` random numbers in ``r``, compute steps in the various directions through ``np.where(r <=p, -1, 1)`` (each step being :math:`-1` or :math:`1`), and then we can reshape this array to an :math:`N\times d` array of step *vectors*. Doing an ``np.cumsum`` summation along axis 0 will add the vectors, as this demo shows: .. code-block:: python >>> a = np.arange(6).reshape(3,2) >>> a array([[0, 1], [2, 3], [4, 5]]) >>> np.cumsum(a, axis=0) array([[ 0, 1], [ 2, 4], [ 6, 9]]) With such summation of step vectors, we get all the positions to be filled in the ``position`` array: .. code-block:: python def random_walkdD_vec(x0, N, p): """Vectorized version of random_walkdD.""" d = len(x0) # Store position in step k in position[k] position = np.zeros((N+1,d)) position[0] = np.array(x0, dtype=float) r = np.random.uniform(0, 1, size=N*d) steps = np.where(r <= p, -1, 1).reshape(N,d) position[1:,:] = x0 + np.cumsum(steps, axis=0) return position .. _diffu:randomwalk:2D:fig:samples: .. figure:: rw2D_samples_5000.png :width: 800 *Four random walks with 5000 steps in 2D* Multiple random walks in any number of space dimensions ------------------------------------------------------- As we did in 1D, we extend one single walk to a number of walks (``num_walks`` in the code). Scalar code (2) ~~~~~~~~~~~~~~~~~~~~~~~~ As always, we start with implementing the scalar case: .. code-block:: python def random_walksdD(x0, N, p, num_walks=1, num_times=1, random=random): """Simulate num_walks random walks from x0 with N steps.""" d = len(x0) position = np.zeros((N+1, d)) # Accumulated positions position2 = np.zeros((N+1, d)) # Accumulated positions**2 # Histogram at num_times selected time points pos_hist = np.zeros((num_walks, num_times, d)) pos_hist_times = [(N//num_times)*i for i in range(num_times)] for n in range(num_walks): num_times_counter = 0 current_pos = np.array(x0, dtype=float) for k in range(N): if k in pos_hist_times: pos_hist[n,num_times_counter,:] = current_pos num_times_counter += 1 # current_pos corresponds to step k+1 for i in range(d): r = random.uniform(0, 1) if r <= p: current_pos[i] -= 1 else: current_pos[i] += 1 position [k+1,:] += current_pos position2[k+1,:] += current_pos**2 return position, position2, pos_hist, np.array(pos_hist_times) Vectorized code (3) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Significant speed-ups can be obtained by vectorization. We get rid of the loops in the previous function and arrive at the following vectorized code. .. code-block:: python def random_walksdD_vec(x0, N, p, num_walks=1, num_times=1): """Vectorized version of random_walks1D; no loops.""" d = len(x0) position = np.zeros((N+1, d)) # Accumulated positions position2 = np.zeros((N+1, d)) # Accumulated positions**2 walks = np.zeros((num_walks, N+1, d)) # Positions of each walk walks[:,0,:] = x0 # Histogram at num_times selected time points pos_hist = np.zeros((num_walks, num_times, d)) pos_hist_times = [(N//num_times)*i for i in range(num_times)] r = np.random.uniform(0, 1, size=N*num_walks*d) steps = np.where(r <= p, -1, 1).reshape(num_walks, N, d) walks[:,1:,:] = x0 + np.cumsum(steps, axis=1) position = np.sum(walks, axis=0) position2 = np.sum(walks**2, axis=0) pos_hist[:,:,:] = walks[:,pos_hist_times,:] return position, position2, pos_hist, np.array(pos_hist_times) .. 2D N=4M .. scalar/vec (1 walk): 25 .. 2D N=5000 .. num_walks=10K .. scalar/vec=50 .. 3D: 40 .. 1D: 100 .. scalar is about the same for 1D, 2D, 3D, but vec is 0.3, 0.7, 1.0 .. _diffu:app: Applications ============ .. _diffu:app:substance: Diffusion of a substance ------------------------ The first process to be considered is a substance that gets transported through a fluid at rest by pure diffusion. We consider an arbitrary volume :math:`V` of this fluid, containing the substance with concentration function :math:`c(\boldsymbol{x},t)`. Physically, we can think of a very small volume with centroid :math:`\boldsymbol{x}` at time :math:`t` and assign the ratio of the volume of the substance and the total volume to :math:`c(\boldsymbol{x}, t)`. This means that the mass of the substance in a small volume :math:`\Delta V` is approximately :math:`\varrho c\Delta V`, where :math:`\varrho` is the density of the substance. Consequently, the total mass of the substance inside the volume :math:`V` is the sum of all :math:`\varrho c\Delta V`, which becomes the volume integral :math:`\int_V\varrho cdV`. Let us reason how the mass of the substance changes and thereby derive a PDE governing the concentration :math:`c`. Suppose the substance flows out of :math:`V` with a flux :math:`\boldsymbol{q}`. If :math:`\Delta S` is a small part of the boundary :math:`\partial V` of :math:`V`, the volume of the substance flowing out through :math:`dS` in a small time interval :math:`\Delta t` is :math:`\varrho \boldsymbol{q}\cdot\boldsymbol{n} \Delta t \Delta S`, where :math:`\boldsymbol{n}` is an outward unit normal to the boundary :math:`\partial V`, see Figure :ref:`diffu:app:substance:fig1`. We realize that only the normal component of :math:`\boldsymbol{q}` is able to transport mass in and out of :math:`V`. The total outflow of the mass of the substance in a small time interval :math:`\Delta t` becomes the surface integral .. math:: \int\limits_{\partial V} \varrho\boldsymbol{q}\cdot\boldsymbol{n} \Delta t\, dS{\thinspace .} Assuming conservation of mass, this outflow of mass must be balanced by a loss of mass inside the volume. The increase of mass inside the volume, during a small time interval :math:`\Delta t`, is .. math:: \int\limits_V \varrho (c(\boldsymbol{x},t+\Delta t) - c(\boldsymbol{x},t)) dV, assuming :math:`\varrho` is constant, which is reasonable. The outflow of mass balances the loss of mass in :math:`V`, which is the increase with a minus sign. Setting the two contributions equal to each other ensures balance of mass inside :math:`V`. Dividing by :math:`\Delta t` gives .. math:: \int\limits_V \varrho \frac{c(\boldsymbol{x},t+\Delta t) - c(\boldsymbol{x},t)}{\Delta t} dV = - \int\limits_{\partial V} \varrho\boldsymbol{q}\cdot\boldsymbol{n}\, dS{\thinspace .} Note the minus sign on the right-hand side: the left-hand side expresses loss of mass, while the integral on the right-hand side is the gain of mass. Now, letting :math:`\Delta t\rightarrow 0`, we have .. math:: \frac{c(\boldsymbol{x},t+\Delta t) - c(\boldsymbol{x},t)}{\Delta t} \rightarrow \frac{\partial c}{\partial t}, so .. _Eq:diffu:app:substance:integral: .. math:: \tag{468} \int\limits_V \varrho \frac{\partial c}{\partial t}dV + \int\limits_{\partial V} \varrho\boldsymbol{q}\cdot\boldsymbol{n}\, dS = 0{\thinspace .} To arrive at a PDE, we express the surface integral as a volume integral using Gauss' divergence theorem: .. math:: \int\limits_V (\varrho \frac{\partial c}{\partial t} + \nabla\cdot(\varrho \boldsymbol{q}))dV = 0{\thinspace .} Since :math:`\varrho` is constant, we can divide by this quantity. If the integral is to vanish for an arbitrary volume :math:`V`, the integrand must vanish too, and we get the mass conservation PDE for the substance: .. _Eq:diffu:app:substance:massconv: .. math:: \tag{469} \frac{\partial c}{\partial t} + \nabla\cdot \boldsymbol{q} = 0{\thinspace .} .. _diffu:app:substance:fig1: .. figure:: continuum.png :width: 200 *An arbitrary volume of a fluid* A fundamental problem is that this is a scalar PDE for four unknowns: :math:`c` and the three components of :math:`\boldsymbol{q}`. We therefore need additional equations. Here, Fick's law comes at rescue: it models how the flux :math:`\boldsymbol{q}` of the substance is related to the concentration :math:`c`. Diffusion is recognized by mass flowing from regions with high concentration to regions of low concentration. This principle suggests that :math:`\boldsymbol{q}` is proportional to the negative gradient of :math:`c`: .. _Eq:diffu:app:substance:Fick: .. math:: \tag{470} \boldsymbol{q} = -{\alpha}\nabla c, where :math:`{\alpha}` is an empirically determined constant. The relation :ref:`(470) ` is known as Fick's law. Inserting :ref:`(470) ` in :ref:`(469) ` gives a scalar PDE for the concentration :math:`c`: .. _Eq:diffu:app:substance:PDE: .. math:: \tag{471} \frac{\partial c}{\partial t} = {\alpha}\nabla^2 c{\thinspace .} .. _diffu:app:heat: Heat conduction --------------- Heat conduction is a well-known diffusion process. The governing PDE is in this case based on the first law of thermodynamics: the increase in energy of a system is equal to the work done on the system, plus the supplied heat. Here, we shall consider media at rest and neglect work done on the system. The principle then reduces to a balance between increase in internal energy and supplied heat flow by conduction. Let :math:`e(x,t)` be the *internal energy* per unit mass. The increase of the internal energy in a small volume :math:`\Delta V` in a small time interval :math:`\Delta t` is then .. math:: \varrho (e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t))\Delta V, where :math:`\varrho` is the density of the material subject to heat conduction. In an arbitrary volume :math:`V`, as depicted in Figure :ref:`diffu:app:substance:fig1`, the corresponding increase in internal energy becomes the volume integral .. math:: \int\limits_V \varrho (e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t))dV{\thinspace .} This increase in internal energy is balanced by heat supplied by conduction. Let :math:`\boldsymbol{q}` be the heat flow per time unit. Through the surface :math:`\partial V` of :math:`V` the following amount of heat flows out of :math:`V` during a time interval :math:`\Delta t`: .. math:: \int\limits_{\partial V} \boldsymbol{q}\cdot\boldsymbol{n}\Delta t\, dS{\thinspace .} The simplified version of the first law of thermodynamics then states that .. math:: \int\limits_V \varrho (e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t))dV = - \int\limits_{\partial V} \boldsymbol{q}\cdot\boldsymbol{n}\Delta t\, dS{\thinspace .} The minus sign on the right-hand side ensures that the integral there models net *inflow* of heat (since :math:`\boldsymbol{n}` is an outward unit normal, :math:`\boldsymbol{q}\cdot\boldsymbol{n}` models *outflow*). Dividing by :math:`\Delta t` and notifying that .. math:: \lim_{\Delta t\rightarrow 0} \frac{e(\boldsymbol{x},t+\Delta t) - e(\boldsymbol{x},t)}{\Delta t} = \frac{\partial e}{\partial t}, we get (in the limit :math:`\Delta t\rightarrow 0`) .. math:: \int\limits_V \varrho \frac{\partial e}{\partial t} dV + \int\limits_{\partial V} \boldsymbol{q}\cdot\boldsymbol{n}\Delta t\, dS = 0{\thinspace .} This is the integral equation for heat conduction, but we aim at a PDE. The next step is therefore to transform the surface integral to a volume integral via Gauss' divergence theorem. The result is .. math:: \int\limits_V\left( \varrho \frac{\partial e}{\partial t} + \nabla\cdot\boldsymbol{q}\right) dV = 0{\thinspace .} If this equality is to hold for all volumes :math:`V`, the integrand must vanish, and we have the PDE .. _Eq:diffu:app:heat:PDE1: .. math:: \tag{472} \varrho \frac{\partial e}{\partial t} = -\nabla\cdot\boldsymbol{q}{\thinspace .} Sometimes the supplied heat can come from the medium itself. This is the case, for instance, when radioactive rock generates heat. Let us add this effect. If :math:`f(\boldsymbol{x},t)` is the supplied heat per unit volume per unit time, the heat supplied in a small volume is :math:`f\Delta t\Delta V`, and inside an arbitrary volume :math:`V` the supplied generated heat becomes .. math:: \int\limits_V f\Delta t dV{\thinspace .} Adding this to the integral statement of the (simplified) first law of thermodynamics, and continuing the derivation, leads to the PDE .. _Eq:diffu:app:heat:PDE2: .. math:: \tag{473} \varrho \frac{\partial e}{\partial t} = -\nabla\cdot\boldsymbol{q} + f{\thinspace .} There are four unknown scalar fields: :math:`e` and :math:`\boldsymbol{q}`. Moreover, the temperature :math:`T`, which is our primary quantity to compute, does not enter the model yet. We need an additional equation, called the *equation of state*, relating :math:`e`, :math:`V=1/\varrho=`, and :math:`T`: :math:`e=e(V,T)`. By the chain rule we have .. math:: \frac{\partial e}{\partial t} = \left.\frac{\partial e}{\partial T}\right\vert_{V} \frac{\partial T}{\partial t} + \left.\frac{\partial e}{\partial V}\right\vert_{T} \frac{\partial V}{\partial t}{\thinspace .} The first coefficient :math:`\partial e/\partial T` is called *specific heat capacity at constant volume*, denoted by :math:`c_v`: .. math:: c_v = \left.\frac{\partial e}{\partial T}\right\vert_{V}{\thinspace .} The specific heat capacity will in general vary with :math:`T`, but taking it as a constant is a good approximation in many applications. The term :math:`\partial e/\partial V` models effects due to compressibility and volume expansion. These effects are often small and can be neglected. We shall do so here. Using :math:`\partial e/\partial t = c_v\partial T/\partial t` in the PDE gives .. math:: \varrho c_v\frac{\partial T}{\partial t} = -\nabla\cdot\boldsymbol{q} + f{\thinspace .} We still have four unknown scalar fields (:math:`T` and :math:`\boldsymbol{q}`). To close the system, we need a relation between the heat flux :math:`\boldsymbol{q}` and the temperature :math:`T` called Fourier's law: .. math:: \boldsymbol{q} = -k\nabla T, which simply states that heat flows from hot to cold areas, along the path of greatest variation. In a solid medium, :math:`k` depends on the material of the medium, and in multi-material media one must regard :math:`k` as spatially dependent. In a fluid, it is common to assume that :math:`k` is constant. The value of :math:`k` reflects how easy heat is conducted through the medium, and :math:`k` is named the *coefficient of heat conduction*. We have now one scalar PDE for the unknown temperature field :math:`T(\boldsymbol{x},t)`: .. _Eq:diffu:app:heat:PDE: .. math:: \tag{474} \varrho c_v\frac{\partial T}{\partial t} = \nabla\cdot(k\nabla T) + f{\thinspace .} .. _diffu:app:porous: Porous media flow ----------------- The requirement of mass balance for flow of a single, incompressible fluid through a deformable (elastic) porous medium leads to the equation .. math:: S\frac{\partial p}{\partial t} + \nabla\cdot(\boldsymbol{q} - \alpha\frac{\partial\boldsymbol{u}}{\partial t}) = 0, where :math:`p` is the fluid pressure, :math:`\boldsymbol{q}` is the fluid velocity, :math:`\boldsymbol{u}` is the displacement (deformation) of the medium, :math:`S` is the storage coefficient of the medium (related to the compressibility of the fluid and the material in the medium), and :math:`\alpha` is another coefficient. In many circumstances, the last term with :math:`\boldsymbol{u}` can be neglected, an assumption that decouples the equation above from a model for the deformation of the medium. The famous Darcy's law relates :math:`\boldsymbol{q}` to :math:`p`: .. math:: \boldsymbol{q} = -\frac{K}{\mu}(\nabla p - \varrho\boldsymbol{g}), where :math:`K` is the permeability of the medium, :math:`\mu` is the dynamic viscosity of the fluid, :math:`\varrho` is the density of the fluid, and :math:`\boldsymbol{g}` is the acceleration of gravity, here taken as :math:`\boldsymbol{g} = -g\boldsymbol{k}`. Combining the two equations results in the diffusion model .. _Eq:diffu:app:porous:PDE: .. math:: \tag{475} S\frac{\partial p}{\partial t} = \mu^{-1}\nabla(K\nabla p) + \frac{\varrho g}{\mu}\frac{\partial K}{\partial z}{\thinspace .} Boundary conditions consist of specifying :math:`p` or :math:`\boldsymbol{q}\cdot\boldsymbol{n}` at (normal velocity) each point of the boundary. Potential fluid flow -------------------- Let :math:`\boldsymbol{v}` be the velocity of a fluid. The condition :math:`\nabla\times\boldsymbol{v} =0` is relevant for many flows, especially in geophysics when viscous effects are negligible. From vector calculus it is known that :math:`\nabla\times\boldsymbol{v} =0` implies that :math:`v` can be derived from a scalar potential field :math:`\phi`: :math:`\boldsymbol{v} = \nabla\phi`. If the fluid is incompressible, :math:`\nabla\cdot\boldsymbol{v} = 0`, it follows that :math:`\nabla\cdot\nabla\phi = 0`, or .. _Eq:_auto200: .. math:: \tag{476} \nabla^2\phi = 0{\thinspace .} This Laplace equation is sufficient for determining :math:`\phi` and thereby describe the fluid motion. This type of flow is known as `potential flow `__. One very important application where potential flow is a good model is water waves. As boundary condition we must prescribe :math:`\boldsymbol{v}\cdot\boldsymbol{n} =\partial\phi/\partial n`. This gives rise to what is known as a pure Neumann problem and will cause numerical difficulties because :math:`\phi` and :math:`\phi` plus any constant are two solutions of the problem. The simplest remedy is to fix the value of :math:`\phi` at a point. Streamlines for 2D fluid flow ----------------------------- The streamlines in a two-dimensional stationary fluid flow are lines tangential to the flow. The `stream function `__ :math:`\psi` is often introduced in two-dimensional flow such that its contour lines, :math:`\psi = \hbox{const}`, gives the streamlines. The relation between :math:`\psi` and the velocity field :math:`\boldsymbol{v}=(u,v)` is .. math:: u = \frac{\partial\psi}{\partial y},\quad v = - \frac{\partial\psi}{\partial x}{\thinspace .} It follows that :math:`\nabla\boldsymbol{v} = \psi_{yx} - \psi_{xy}=0`, so the stream function can only be used for incompressible flows. Since .. math:: \nabla\times\boldsymbol{v} = \left(\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right)\boldsymbol{k} \equiv \omega\boldsymbol{k}, we can derive the relation .. _Eq:_auto201: .. math:: \tag{477} \nabla^2\psi = -\omega, which is a governing equation for the stream function :math:`\psi(x,y)` if the vorticity :math:`\omega` is known. The potential of an electric field ---------------------------------- Under the assumption of time independence, Maxwell's equations for the electric field :math:`\boldsymbol{E}` become .. math:: \begin{align*} \nabla\cdot\boldsymbol{E} &= \frac{\rho}{\epsilon_0},\\ \nabla\times\boldsymbol{E} &= 0, \end{align*} where :math:`\rho` is the electric charge density and :math:`\epsilon_0` is the electric permittivity of free space (i.e., vacuum). Since :math:`\nabla\times\boldsymbol{E}=0`, :math:`\boldsymbol{E}` can be derived from a potential :math:`\varphi`, :math:`\boldsymbol{E} = -\nabla\varphi`. The electric field potential is therefore governed by the Poisson equation .. _Eq:_auto202: .. math:: \tag{478} \nabla^2\varphi = -\frac{\rho}{\epsilon_0}{\thinspace .} If the medium is heterogeneous, :math:`\rho` will depend on the spatial location :math:`\boldsymbol{r}`. Also, :math:`\epsilon_0` must be exchanged with an electric permittivity function :math:`\epsilon(\boldsymbol{r})`. Each point of the boundary must be accompanied by, either a Dirichlet condition :math:`\varphi(\boldsymbol{r}) = \varphi_D(\boldsymbol{r})`, or a Neumann condition :math:`\frac{\partial\varphi(\boldsymbol{r})}{\partial n} = \varphi_N(\boldsymbol{r})`. .. _diffu:app:Couette: Development of flow between two flat plates ------------------------------------------- Diffusion equations may also arise as simplified versions of other mathematical models, especially in fluid flow. Consider a fluid flowing between two flat, parallel plates. The velocity is uni-directional, say along the :math:`z` axis, and depends only on the distance :math:`x` from the plates; :math:`\boldsymbol{u} = u(x,t)\boldsymbol{k}`. The flow is governed by the Navier-Stokes equations, .. math:: \begin{align*} \varrho\frac{\partial\boldsymbol{u}}{\partial t} + \varrho\boldsymbol{u}\cdot\nabla\boldsymbol{u} &= -\nabla p + \mu\nabla^2\boldsymbol{u} + \varrho\boldsymbol{f},\\ \nabla\cdot\boldsymbol{u} &= 0, \end{align*} where :math:`p` is the pressure field, unknown along with the velocity :math:`\boldsymbol{u}`, :math:`\varrho` is the fluid density, :math:`\mu` the dynamic viscosity, and :math:`\boldsymbol{f}` is some external body force. The geometric restrictions of flow between two flat plates puts restrictions on the velocity, :math:`\boldsymbol{u} = u(x,t)\boldsymbol{i}`, and the :math:`z` component of the Navier-Stokes equations collapses to a diffusion equation: .. math:: \varrho\frac{\partial u}{\partial t} = - \frac{\partial p}{\partial z} + \mu\frac{\partial^2 u}{\partial z^2} + \varrho f_z, if :math:`f_z` is the component of :math:`\boldsymbol{f}` in the :math:`z` direction. The boundary conditions are derived from the fact that the fluid sticks to the plates, which means :math:`\boldsymbol{u}=0` at the plates. Say the location of the plates are :math:`z=0` and :math:`z=L`. We then have .. math:: u(0,t)=u(L,t)=0{\thinspace .} One can easily show that :math:`\partial p/\partial z` must be a constant or just a function of time :math:`t`. We set :math:`\partial p/\partial z = -\beta(t)`. The body force could be a component of gravity, if desired, set as :math:`f_z = \gamma g`. Switching from :math:`z` to :math:`x` as independent variable gives a very standard one-dimensional diffusion equation: .. math:: \varrho\frac{\partial u}{\partial t} = \mu\frac{\partial^2 u}{\partial z^2} + \beta(t) + \varrho\gamma g,\quad x\in [0,L],\ t\in (0,T]{\thinspace .} The boundary conditions are .. math:: u(0,t)=u(L,t)=0, while some initial condition .. math:: u(x,0) = I(x) must also be prescribed. The flow is driven by either the pressure gradient :math:`\beta` or gravity, or a combination of both. One may also consider one moving plate that drives the fluid. If the plate at :math:`x=L` moves with velocity :math:`U_L(t)`, we have the adjusted boundary condition .. math:: u(L,t) = U_L(t){\thinspace .} .. _diffu:app:pipeflow: Flow in a straight tube ----------------------- Now we consider viscous fluid flow in a straight tube with radius :math:`R` and rigid walls. The governing equations are the Navier-Stokes equations, but as in the section :ref:`diffu:app:Couette`, it is natural to assume that the velocity is directed along the tube, and that it is axi-symmetric. These assumptions reduced the velocity field to :math:`\boldsymbol{u} = u(r,x,t)\boldsymbol{i}`, if the :math:`x` axis is directed along the tube. From the equation of continuity, :math:`\nabla\cdot\boldsymbol{u} = 0`, we see that :math:`u` must be independent of :math:`x`. Inserting :math:`\boldsymbol{u} = u(r,t)\boldsymbol{i}` in the Navier-Stokes equations, expressed in axi-symmetric cylindrical coordinates, results in .. _Eq:diffu:app:pipeflow:pde: .. math:: \tag{479} \varrho\frac{\partial u}{\partial t} = \mu\frac{1}{r}\frac{\partial}{\partial r}\left( r\frac{\partial u}{\partial r}\right) + \beta(t) + \varrho\gamma g,\quad r\in [0,R],\ t\in (0,T]{\thinspace .} Here, :math:`\beta(t) = -\partial p/\partial x` is the pressure gradient along the tube. The associated boundary condition is :math:`u(R,t)=0`. Tribology: thin film fluid flow ------------------------------- Thin fluid films are extremely important inside machinery to reduce friction between gliding surfaces. The mathematical model for the fluid motion takes the form of a diffusion problem and is quickly derived here. We consider two solid surfaces whose distance is described by a gap function :math:`h(x,y)`. The space between these surfaces is filled with a fluid with dynamic viscosity :math:`\mu`. The fluid may move partially because of pressure gradients and partially because the surfaces move. Let :math:`U\boldsymbol{i} + V\boldsymbol{j}` be the relative velocity of the two surfaces and :math:`p` the pressure in the fluid. The mathematical model builds on two principles: 1) conservation of mass, 2) assumption of locally quasi-static flow between flat plates. The conservation of mass equation reads :math:`\nabla\cdot\boldsymbol{u}`, where :math:`\boldsymbol{u}` is the local fluid velocity. For thin films the detailed variation between the surfaces is not of interest, so :math:`\nabla\cdot\boldsymbol{u} = 0` is integrated (average) in the direction perpendicular to the surfaces. This gives rise to the alternative mass conservation equation .. math:: \nabla\cdot\boldsymbol{q} = 0,\quad \boldsymbol{q} = \int\limits_0^{h(x,y)}\boldsymbol{u} dz, where :math:`z` is the coordinate perpendicular to the surfaces, and :math:`\boldsymbol{q}` is then the volume flux in the fluid gap. Locally, we may assume that we have steady flow between two flat surfaces, with a pressure gradient and where the lower surface is at rest and the upper moves with velocity :math:`U\boldsymbol{i} + V\boldsymbol{j}`. The corresponding mathematical problem is actually the limit problem in the section :ref:`diffu:app:Couette` as :math:`t\rightarrow\infty`. The limit problem can be solved analytically, and the local volume flux becomes .. math:: \boldsymbol{q}(x,y,z) = \int\limits_0^{h}\boldsymbol{u}(x,y,z) dz = -\frac{h^3}{12\mu}\nabla p + \frac{1}{2} Uh\boldsymbol{i} + \frac{1}{2} Vh\boldsymbol{j}{\thinspace .} The idea is to use this expression locally also when the surfaces are not flat, but slowly varying, and if :math:`U`, :math:`V`, or :math:`p` varies in time, provided the time variation is sufficiently slow. This is a common quasi-static approximation much used in mathematical modeling. Inserting the expression for :math:`\boldsymbol{q}` via :math:`p`, :math:`U`, and :math:`V` in the equation :math:`\nabla\boldsymbol{q} = 0` gives a diffusion PDE for :math:`p`: .. _Eq:_auto203: .. math:: \tag{480} \nabla\cdot\left(\frac{h^3}{12\mu}\nabla p\right) = \frac{1}{2}\frac{\partial}{\partial x}(hU) + \frac{1}{2}\frac{\partial}{\partial x}(hV){\thinspace .} The boundary conditions must involve :math:`p` or :math:`\boldsymbol{q}` at the boundary. Propagation of electrical signals in the brain ---------------------------------------------- .. ``_ .. ``_ .. ``_ .. The book by Peskin & ... One can make a model of how electrical signals are propagated along the neuronal fibers that receive synaptic inputs in the brain. The signal propagation is one-dimensional and can, in the simplest cases, be governed by the `Cable equation `__: .. math:: c_m \frac{\partial V}{\partial t} = \frac{1}{r_l}\frac{\partial^2 V}{\partial x^2} - \frac{1}{r_m}V label{} where :math:`V(x,t)` is the voltage to be determined, :math:`c_m` is capacitance of the neuronal fiber, while :math:`r_l` and :math:`r_m` are measures of the resistance. The boundary conditions are often taken as :math:`V=0` at a short circuit or open end, :math:`\partial V/\partial x=0` at a sealed end, or :math:`\partial V/\partial x \propto V` where there is an injection of current. Exercises (7) ====================== .. --- begin exercise --- .. _diffu:exer:CN:Rannacher: Exercise 3.6: Stabilizing the Crank-Nicolson method by Rannacher time stepping ------------------------------------------------------------------------------ It is well known that the Crank-Nicolson method may give rise to non-physical oscillations in the solution of diffusion equations if the initial data exhibit jumps (see the section :ref:`diffu:pde1:analysis:CN`). Rannacher [Ref15]_ suggested a stabilizing technique consisting of using the Backward Euler scheme for the first two time steps with step length :math:`\frac{1}{2}\Delta t`. One can generalize this idea to taking :math:`2m` time steps of size :math:`\frac{1}{2}\Delta t` with the Backward Euler method and then continuing with the Crank-Nicolson method, which is of second-order in time. The idea is that the high frequencies of the initial solution are quickly damped out, and the Backward Euler scheme treats these high frequencies correctly. Thereafter, the high frequency content of the solution is gone and the Crank-Nicolson method will do well. Test this idea for :math:`m=1,2,3` on a diffusion problem with a discontinuous initial condition. Measure the convergence rate using the solution :ref:`(393) ` with the boundary conditions :ref:`(394) `-:ref:`(395) ` for :math:`t` values such that the conditions are in the vicinity of :math:`\pm 1`. For example, :math:`t< 5a 1.6\cdot 10^{-2}` makes the solution diffusion from a step to almost a straight line. The program ``diffu_erf_sol.py`` shows how to compute the analytical solution. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:energy:estimates: Project 3.7: Energy estimates for diffusion problems ---------------------------------------------------- .. index:: energy estimates (diffusion) This project concerns so-called *energy estimates* for diffusion problems that can be used for qualitative analytical insight and for verification of implementations. **a)** We start with a 1D homogeneous diffusion equation with zero Dirichlet conditions: .. _Eq:diffu:exer:estimates:p1:pdf: .. math:: \tag{481} u_t = \alpha u_xx, x\in \Omega =(0,L),\ t\in (0,T], .. _Eq:diffu:exer:estimates:p1:bc: .. math:: \tag{482} u(0,t) = u(L,t) = 0, t\in (0,T], .. _Eq:diffu:exer:estimates:p1:ic: .. math:: \tag{483} u(x,0) = I(x), x\in [0,L] {\thinspace .} The energy estimate for this problem reads .. _Eq:diffu:exer:estimates:p1:result: .. math:: \tag{484} ||u||_{L^2} \leq ||I||_{L^2}, where the :math:`||\cdot ||_{L^2}` norm is defined by .. _Eq:diffu:exer:estimates:L2: .. math:: \tag{485} ||g||_{L^2} = \sqrt{\int_0^L g^2dx}{\thinspace .} The quantify :math:`||u||_{L^2}` or :math:`\frac{1}{2} ||u||_{L^2}` is known as the *energy* of the solution, although it is not the physical energy of the system. A mathematical tradition has introduced the notion *energy* in this context. The estimate :ref:`(484) ` says that the "size of $u$" never exceeds that of the initial condition, or more precisely, it says that the area under the :math:`u` curve decreases with time. To show :ref:`(484) `, multiply the PDE by :math:`u` and integrate from :math:`0` to :math:`L`. Use that :math:`uu_t` can be expressed as the time derivative of :math:`u^2` and that :math:`u_xxu` can integrated by parts to form an integrand :math:`u_x^2`. Show that the time derivative of :math:`||u||_{L^2}^2` must be less than or equal to zero. Integrate this expression and derive :ref:`(484) `. .. ``_ **b)** Now we address a slightly different problem, .. _Eq:diffu:exer:estimates:p2:pdf: .. math:: \tag{486} u_t = \alpha u_xx + f(x,t), x\in \Omega =(0,L),\ t\in (0,T], .. _Eq:diffu:exer:estimates:p2:bc: .. math:: \tag{487} u(0,t) = u(L,t) = 0, t\in (0,T], .. _Eq:diffu:exer:estimates:p2:ic: .. math:: \tag{488} u(x,0) = 0, x\in [0,L] {\thinspace .} The associated energy estimate is .. _Eq:diffu:exer:estimates:p2:result: .. math:: \tag{489} ||u||_{L^2} \leq ||f||_{L^2}{\thinspace .} (This result is more difficult to derive.) Now consider the compound problem with an initial condition :math:`I(x)` and a right-hand side :math:`f(x,t)`: .. _Eq:diffu:exer:estimates:p3:pdf: .. math:: \tag{490} u_t = \alpha u_xx + f(x,t), x\in \Omega =(0,L),\ t\in (0,T], .. _Eq:diffu:exer:estimates:p3:bc: .. math:: \tag{491} u(0,t) = u(L,t) = 0, t\in (0,T], .. _Eq:diffu:exer:estimates:p3:ic: .. math:: \tag{492} u(x,0) = I(x), x\in [0,L] {\thinspace .} Show that if :math:`w_1` fulfills :ref:`(481) `-:ref:`(483) ` and :math:`w_2` fulfills :ref:`(486) `-:ref:`(488) `, then :math:`u=w_1 + w_2` is the solution of :ref:`(490) `-:ref:`(492) `. Using the triangle inequality for norms, .. math:: ||a + b|| \leq ||a|| + ||b||, show that the energy estimate for :ref:`(490) `-:ref:`(492) ` becomes .. _Eq:diffu:exer:estimates:p3:result: .. math:: \tag{493} ||u||_{L^2} \leq ||I||_{L^2} + ||f||_{L^2}{\thinspace .} **c)** One application of :ref:`(493) ` is to prove uniqueness of the solution. Suppose :math:`u_1` and :math:`u_2` both fulfill :ref:`(490) `-:ref:`(492) `. Show that :math:`u=u_1 - u_2` then fulfills :ref:`(490) `-:ref:`(492) ` with :math:`f=0` and :math:`I=0`. Use :ref:`(493) ` to deduce that the energy must be zero for all times and therefore that :math:`u_1=u_2`, which proves that the solution is unique. **d)** Generalize :ref:`(493) ` to a 2D/3D diffusion equation :math:`u_t = \nabla\cdot (\alpha \nabla u)` for :math:`x\in\Omega`. .. --- begin hint in exercise --- **Hint.** Use integration by parts in multi dimensions: .. math:: \int_\Omega u \nabla\cdot (\alpha\nabla u){\, \mathrm{d}x} = - \int_\Omega \alpha \nabla u\cdot\nabla u{\, \mathrm{d}x} + \int_{\partial\Omega} u \alpha\frac{\partial u}{\partial n}, where :math:`\frac{\partial u}{\partial n} = \boldsymbol{n}\cdot\nabla u`, :math:`\boldsymbol{n}` being the outward unit normal to the boundary :math:`\partial\Omega` of the domain :math:`\Omega`. .. --- end hint in exercise --- **e)** Now we also consider the multi-dimensional PDE :math:`u_t = \nabla\cdot (\alpha \nabla u)`. Integrate both sides over :math:`\Omega` and use Gauss' divergence theorem, :math:`\int_\Omega \nabla\cdot\boldsymbol{q}{\, \mathrm{d}x} = \int_{\partial\Omega}\boldsymbol{q}\cdot\boldsymbol{n}{\, \mathrm{d}s}` for a vector field :math:`\boldsymbol{q}`. Show that if we have homogeneous Neumann conditions on the boundary, :math:`\partial u/\partial n=0`, area under the :math:`u` surface remains constant in time and .. _Eq:diffu:exer:estimates:p4:result: .. math:: \tag{494} \int_{\Omega} u{\, \mathrm{d}x} = \int_{\Omega} I{\, \mathrm{d}x} {\thinspace .} **f)** Establish a code in 1D, 2D, or 3D that can solve a diffusion equation with a source term :math:`f`, initial condition :math:`I`, and zero Dirichlet or Neumann conditions on the whole boundary. We can use :ref:`(493) ` and :ref:`(494) ` as a partial verification of the code. Choose some functions :math:`f` and :math:`I` and check that :ref:`(493) ` is obeyed at any time when zero Dirichlet conditions are used. Iterate over the same :math:`I` functions and check that :ref:`(494) ` is fulfilled when using zero Neumann conditions. **g)** Make a list of some possible bugs in the code, such as indexing errors in arrays, failure to set the correct boundary conditions, evaluation of a term at a wrong time level, and similar. For each of the bugs, see if the verification tests from the previous subexercise pass or fail. This investigation shows how strong the energy estimates and the estimate :ref:`(494) ` are for pointing out errors in the implementation. Filename: ``diffu_energy``. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:splitting_prec: Exercise 3.8: Splitting methods and preconditioning --------------------------------------------------- .. index:: Richardson iteration .. index:: preconditioning In the section :ref:`diffu:2D:direct_vs_iter`, we outlined a class of iterative methods for :math:`Au=b` based on splitting :math:`A` into :math:`A=M-N` and introducing the iteration .. math:: Mu^{k} = Nu^k + b{\thinspace .} The very simplest splitting is :math:`M=I`, where :math:`I` is the identity matrix. Show that this choice corresponds to the iteration .. _Eq:diffu:exer:splitting_prec:simplest: .. math:: \tag{495} u^k = u^{k-1} + r^{k-1},\quad r^{k-1} = b - Au^{k-1}, where :math:`r^{k-1}` is the residual in the linear system in iteration :math:`k-1`. The formula :ref:`(495) ` is known as Richardson's iteration. Show that if we apply the simple iteration method :ref:`(495) ` to the *preconditioned* system :math:`M^{-1}Au=M^{-1}b`, we arrive at the Jacobi method by choosing :math:`M=D` (the diagonal of :math:`A`) as preconditioner and the SOR method by choosing :math:`M=\omega^{-1}D + L` (:math:`L` being the lower triangular part of :math:`A`). This equivalence shows that we can apply one iteration of the Jacobi or SOR method as preconditioner. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:earthosc: Problem 3.9: Oscillating surface temperature of the earth --------------------------------------------------------- Consider a day-and-night or seasonal variation in temperature at the surface of the earth. How deep down in the ground will the surface oscillations reach? For simplicity, we model only the vertical variation along a coordinate :math:`x`, where :math:`x=0` at the surface, and :math:`x` increases as we go down in the ground. The temperature is governed by the heat equation .. math:: \varrho c_v\frac{\partial T}{\partial t} = \nabla\cdot(k\nabla T), in some spatial domain :math:`x\in [0,L]`, where :math:`L` is chosen large enough such that we can assume that :math:`T` is approximately constant, independent of the surface oscillations, for :math:`x>L`. The parameters :math:`\varrho`, :math:`c_v`, and :math:`k` are the density, the specific heat capacity at constant volume, and the heat conduction coefficient, respectively. **a)** Derive the mathematical model for computing :math:`T(x,t)`. Assume the surface oscillations to be sinusoidal around some mean temperature :math:`T_m`. Let :math:`T=T_m` initially. At :math:`x=L`, assume :math:`T\approx T_m`. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **b)** Scale the model in a) assuming :math:`k` is constant. Use a time scale :math:`t_c = \omega^{-1}` and a length scale :math:`x_c = \sqrt{2{\alpha}/\omega}`, where :math:`{\alpha} = k/(\varrho c_v)`. The primary unknown can be scaled as :math:`\frac{T-T_m}{2A}`. Show that the scaled PDE is .. math:: \frac{\partial u}{\partial \bar t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2}, with initial condition :math:`u(\bar x,0) = 0`, left boundary condition :math:`u(0,\bar t) = \sin(\bar t)`, and right boundary condition :math:`u(\bar L,\bar t) = 0`. The bar indicates a dimensionless quantity. Show that :math:`u(\bar x, \bar t)=e^{-\bar x}\sin (\bar x - \bar t)` is a solution that fulfills the PDE and the boundary condition at :math:`\bar x =0` (this is the solution we will experience as :math:`\bar t\rightarrow\infty` and :math:`L\rightarrow\infty`). Conclude that an appropriate domain for :math:`x` is :math:`[0,4]` if a damping :math:`e^{-4}\approx 0.18` is appropriate for implementing :math:`\bar u\approx\hbox{const}`; increasing to :math:`[0,6]` damps :math:`\bar u` to 0.0025. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **c)** Compute the scaled temperature and make animations comparing two solutions with :math:`\bar L=4` and :math:`\bar L=8`, respectively (keep :math:`\Delta x` the same). .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:bloodflow: Problem 3.10: Oscillating and pulsating flow in tubes ----------------------------------------------------- We consider flow in a straight tube with radius :math:`R` and straight walls. The flow is driven by a pressure gradient :math:`\beta(t)`. The effect of gravity can be neglected. The mathematical problem reads .. _Eq:_auto204: .. math:: \tag{496} \varrho\frac{\partial u}{\partial t} = \mu\frac{1}{r}\frac{\partial}{\partial r}\left( r\frac{\partial u}{\partial r}\right) + \beta(t),\quad r\in [0,R],\ t\in (0,T], .. _Eq:_auto205: .. math:: \tag{497} u(r,0) = I(r),\quad r\in [0,R], .. _Eq:_auto206: .. math:: \tag{498} u(R,t) = 0,\quad t\in (0,T], .. _Eq:_auto207: .. math:: \tag{499} \frac{\partial u}{\partial r}(0,t) = 0,\quad t\in (0,T]. We consider two models for :math:`\beta(t)`. One plain, sinusoidal oscillation: .. _Eq:_auto208: .. math:: \tag{500} \beta = A\sin(\omega t), and one with periodic pulses, .. _Eq:_auto209: .. math:: \tag{501} \beta = A\sin^{16}(\omega t), Note that both models can be written as :math:`\beta = A\sin^m(\omega t)`, with :math:`m=1` and :math:`m=16`, respectively. **a)** Scale the mathematical model, using the viscous time scale :math:`\varrho R^2/\mu`. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **b)** Implement the scaled model from a), using the unifying :math:`\theta` scheme in time and centered differences in space. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **c)** Verify the implementation in b) using a manufactured solution that is quadratic in :math:`r` and linear in :math:`t`. Make a corresponding test function. .. --- begin hint in exercise --- **Hint.** You need to include an extra source term in the equation to allow for such tests. Let the spatial variation be :math:`1-r^2` such that the boundary condition is fulfilled. .. --- end hint in exercise --- **d)** Make animations for :math:`m=1,16` and :math:`\alpha=1,0.1`. Choose :math:`T` such that the motion has reached a steady state (non-visible changes from period to period in :math:`u`). **e)** For :math:`\alpha\gg 1`, the scaling in a) is not good, because the characteristic time for changes (due to the pressure) is much smaller than the viscous diffusion time scale (:math:`\alpha` becomes large). We should in this case base the short time scale on :math:`1/\omega`. Scale the model again, and make an animation for :math:`m=1,16` and :math:`\alpha = 10`. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) Filename: ``axisymm_flow``. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:welding: Problem 3.11: Scaling a welding problem --------------------------------------- Welding equipment makes a very localized heat source that moves in time. We shall investigate the heating due to welding and choose, for maximum simplicity, a one-dimensional heat equation with a fixed temperature at the ends, and we neglect melting. We shall scale the problem, and besides solving such a problem numerically, the aim is to investigate the appropriateness of alternative scalings. The governing PDE problem reads .. math:: \begin{alignat*}{2} \varrho c\frac{\partial u}{\partial t} &= k\frac{\partial^2 u}{\partial x^2} + f, & x\in (0,L),\ t\in (0,T),\\ u(x,0) &= U_s, & x\in [0,L],\\ u(0,t) = u(L,t) &= 0, & t\in (0,T]. \end{alignat*} Here, :math:`u` is the temperature, :math:`\varrho` the density of the material, :math:`c` a heat capacity, :math:`k` the heat conduction coefficient, :math:`f` is the heat source from the welding equipment, and :math:`U_s` is the initial constant (room) temperature in the material. A possible model for the heat source is a moving Gaussian function: .. math:: f = A\exp{\left(-\frac{1}{2}\left(\frac{x-vt}{\sigma}\right)^2\right)}, where :math:`A` is the strength, :math:`\sigma` is a parameter governing how peak-shaped (or localized in space) the heat source is, and :math:`v` is the velocity (in positive :math:`x` direction) of the source. **a)** Let :math:`x_c`, :math:`t_c`, :math:`u_c`, and :math:`f_c` be scales, i.e., characteristic sizes, of :math:`x`, :math:`t`, :math:`u`, and :math:`f`, respectively. The natural choice of :math:`x_c` and :math:`f_c` is :math:`L` and :math:`A`, since these make the scaled :math:`x` and :math:`f` in the interval :math:`[0,1]`. If each of the three terms in the PDE are equally important, we can find :math:`t_c` and :math:`u_c` by demanding that the coefficients in the scaled PDE are all equal to unity. Perform this scaling. Use scaled quantities in the arguments for the exponential function in :math:`f` too and show that .. math:: \bar f= e^{-\frac{1}{2}\beta^2(\bar x -\gamma \bar t)^2}, where :math:`\beta` and :math:`\gamma` are dimensionless numbers. Give an interpretation of :math:`\beta` and :math:`\gamma`. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **b)** Argue that for large :math:`\gamma` we should base the time scale on the movement of the heat source. Show that this gives rise to the scaled PDE .. math:: \frac{\partial\bar u}{\partial\bar t} = \gamma^{-1}\frac{\partial^2\bar u}{\partial\bar x^2} + \bar f, and .. math:: \bar f = \exp{(-\frac{1}{2}\beta^2(\bar x - \bar t)^2)}{\thinspace .} Discuss when the scalings in a) and b) are appropriate. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **c)** One aim with scaling is to get a solution that lies in the interval :math:`[-1,1]`. This is not always the case when :math:`u_c` is based on a scale involving a source term, as we do in a) and b). However, from the scaled PDE we realize that if we replace :math:`\bar f` with :math:`\delta\bar f`, where :math:`\delta` is a dimensionless factor, this corresponds to replacing :math:`u_c` by :math:`u_c/\delta`. So, if we observe that :math:`\bar u\sim1/\delta` in simulations, we can just replace :math:`\bar f` by :math:`\delta \bar f` in the scaled PDE. Use this trick and implement the two scaled models. Reuse software for the diffusion equation (e.g., the ``solver`` function in ``diffu1D_vc.py``). Make a function ``run(gamma, beta=10, delta=40, scaling=1, animate=False)`` that runs the model with the given :math:`\gamma`, :math:`\beta`, and :math:`\delta` parameters as well as an indicator ``scaling`` that is 1 for the scaling in a) and 2 for the scaling in b). The last argument can be used to turn screen animations on or off. Experiments show that with :math:`\gamma=1` and :math:`\beta=10`, :math:`\delta =20` is appropriate. Then :math:`\max |\bar u|` will be larger than 4 for :math:`\gamma =40`, but that is acceptable. Equip the ``run`` function with visualization, both animation of :math:`\bar u` and :math:`\bar f`, and plots with :math:`\bar u` and :math:`\bar f` for :math:`t=0.2` and :math:`t=0.5`. .. --- begin hint in exercise --- **Hint.** Since the amplitudes of :math:`\bar u` and :math:`\bar f` differs by a factor :math:`\delta`, it is attractive to plot :math:`\bar f/\delta` together with :math:`\bar u`. .. --- end hint in exercise --- .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) **d)** Use the software in c) to investigate :math:`\gamma=0.2,1,5,40` for the two scalings. Discuss the results. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) .. ===== Exercise: Radial heat conduction out of offshore pipelines ===== .. Easy to make something out of the ideas/5620/apps/offshore... mekit .. paper where one has a multi-walled radial heat conduction equation. .. Can, as in the paper, use one cell per material. Coupling to soil .. outside with many parameters given. The discussion of the Fourier .. number is interesting - I guess time changes here relates to .. BCs on the inner wall because the gas suddenly has a different .. temperature? Could be a good project perhaps; anyway, the theory .. can be written up. Filename: ``welding``. .. --- end exercise --- .. --- begin exercise --- .. _diffu:exer:axisymm: Exercise 3.12: Implement a Forward Euler scheme for axi-symmetric diffusion --------------------------------------------------------------------------- Based on the discussion in the section :ref:`diffu:fd2:radial`, derive in detail the discrete equations for a Forward Euler in time, centered in space, finite difference method for axi-symmetric diffusion. The diffusion coefficient may be a function of the radial coordinate. At the outer boundary :math:`r=R`, we may have either a Dirichlet or Robin condition. Implement this scheme. Construct appropriate test problems. .. removed !bsol ... !esol environment (because of the command-line option --without_solutions) Filename: ``FE_axisym``. .. --- end exercise ---