.. !split

.. _wave:2D3D:impl:

Implementation  (3)
===================


.. index::
   single: wave equation; 2D, implementation


We shall now describe in detail various Python implementations
for solving a standard 2D, linear wave equation with constant
wave velocity and :math:`u=0` on the
boundary. The wave equation is to be solved
in the space-time domain :math:`\Omega\times (0,T]`,
where :math:`\Omega = (0,L_x)\times (0,L_y)` is a rectangular spatial
domain. More precisely,
the complete initial-boundary value problem is defined by


.. math::
        
        u_t = c^2(u_{xx} + u_{yy}) + f(x,y,t),\quad (x,y)\in \Omega,\ t\in (0,T],
        

.. math::
          
        u(x,y,0) = I(x,y),\quad (x,y)\in\Omega,
        

.. math::
          
        u_t(x,y,0) = V(x,y),\quad (x,y)\in\Omega,
        

.. math::
          
        u = 0,\quad (x,y)\in\partial\Omega,\ t\in (0,T],
        

where :math:`\partial\Omega` is the boundary of :math:`\Omega`, in this case
the four sides of the rectangle :math:`[0,L_x]\times [0,L_y]`:
:math:`x=0`, :math:`x=L_x`, :math:`y=0`, and :math:`y=L_y`.

The PDE is discretized as

.. math::
         [D_t D_t u = c^2(D_xD_x u + D_yD_y u) + f]^n_{i,j},
        

which leads to an explicit updating formula to be implemented in a
program:


.. math::
        
        u^{n+1} = -u^{n-1}_{i,j} + 2u^n_{i,j} + \nonumber
        

.. _Eq:wave:2D3D:impl1:2Du0:ueq:discrete:

.. _Eq:wave:2D3D:impl1:2Du0:ueq:discrete:

.. math::
   :label: wave:2D3D:impl1:2Du0:ueq:discrete
          
        \quad C_x^2(
        u^{n}_{i+1,j} - 2u^{n}_{i,j} + u^{n}_{i-1,j}) + C_y^2
        (u^{n}_{i,j+1} - 2u^{n}_{i,j} + u^{n}_{i,j-1}) + \Delta t^2 f_{i,j}^n,
        
        
for all interior mesh points :math:`i\in{{\mathcal{I^i}_x}}` and
:math:`j\in{{\mathcal{I^i}_y}}`, and for :math:`n\in{{\mathcal{I^+}_t}}`.
The constants :math:`C_x` and :math:`C_y` are defined as


.. math::
         C_x = c\frac{\Delta t}{\Delta x},\quad C_x = c\frac{\Delta t}{\Delta y}
        {\thinspace .}
        

At the boundary we simply set :math:`u^{n+1}_{i,j}=0` for
:math:`i=0`, :math:`j=0,\ldots,N_y`; :math:`i=N_x`, :math:`j=0,\ldots,N_y`;
:math:`j=0`, :math:`i=0,\ldots,N_x`; and :math:`j=N_y`, :math:`i=0,\ldots,N_x`.
For the first step, :math:`n=0`, :eq:`wave:2D3D:impl1:2Du0:ueq:discrete`
is combined with the discretization of the initial condition :math:`u_t=V`,
:math:`[D_{2t} u = V]^0_{i,j}` to obtain a special formula for
:math:`u^1_{i,j}` at the interior mesh points:


.. math::
        
        u^{1} = u^0_{i,j} + \Delta t V_{i,j} + \nonumber
        

.. _Eq:wave:2D3D:impl1:2Du0:ueq:discrete:

.. _Eq:wave:2D3D:impl1:2Du0:ueq:discrete:

.. math::
   :label: wave:2D3D:impl1:2Du0:ueq:discrete
          
        \quad {\frac{1}{2}}C_x^2(
        u^{0}_{i+1,j} - 2u^{0}_{i,j} + u^{0}_{i-1,j}) + {\frac{1}{2}}C_y^2
        (u^{0}_{i,j+1} - 2u^{0}_{i,j} + u^{0}_{i,j-1}) + \frac{1}{2}\Delta t^2f_{i,j}^n,
        
        
The algorithm is very similar to the one in 1D:

1. Set initial condition :math:`u^0_{i,j}=I(x_i,y_j)`

2. Compute :math:`u^1_{i,j}` from :eq:`wave:2D3D:impl1:2Du0:ueq:discrete`

3. Set :math:`u^1_{i,j}=0` for the boundaries :math:`i=0,N_x`, :math:`j=0,N_y`

4. For :math:`n=1,2,\ldots,N_t`:

 1. Find :math:`u^{n+1}_{i,j}` from :eq:`wave:2D3D:impl1:2Du0:ueq:discrete`
    for all internal mesh points, :math:`i\in{{\mathcal{I^i}_x}}`, :math:`j\in{{\mathcal{I^i}_y}}`

 2. Set :math:`u^{n+1}_{i,j}=0` for the boundaries :math:`i=0,N_x`, :math:`j=0,N_y`


.. _wave2D3D:impl:scalar:

Scalar computations
-------------------

The ``solver`` function for a 2D case with constant wave velocity and
:math:`u=0` as boundary condition follows the setup from the similar
function for the 1D case in ``wave1D_u0_s.py``, but there are
a few necessary extensions. The code is in the program
`wave2D_u0.py <http://tinyurl.com/jvzzcfn/wave/wave2D_u0/wave2D_u0.py>`_.

Domain and mesh
~~~~~~~~~~~~~~~

The spatial domain is now :math:`[0,L_x]\times [0,L_y]`, specified
by the arguments ``Lx`` and ``Ly``. Similarly, the number of mesh
points in the :math:`x` and :math:`y` directions,
:math:`N_x` and :math:`N_y`, become the arguments ``Nx`` and ``Ny``.
In multi-dimensional problems it makes less sense to specify a
Courant number as the wave velocity is a vector and the mesh spacings
may differ in the various spatial directions.
We therefore give :math:`\Delta t` explicitly. The signature of
the ``solver`` function is then


.. code-block:: python

        def solver(I, V, f, c, Lx, Ly, Nx, Ny, dt, T,
                   user_action=None, version='scalar'):

Key parameters used in the calculations are created as

.. code-block:: python

        x = linspace(0, Lx, Nx+1)                  # mesh points in x dir
        y = linspace(0, Ly, Ny+1)                  # mesh points in y dir
        dx = x[1] - x[0]
        dy = y[1] - y[0]
        Nt = int(round(T/float(dt)))
        t = linspace(0, N*dt, N+1)                 # mesh points in time
        Cx2 = (c*dt/dx)**2;  Cy2 = (c*dt/dy)**2    # help variables
        dt2 = dt**2


Solution arrays
~~~~~~~~~~~~~~~

We store :math:`u^{n+1}_{i,j}`, :math:`u^{n}_{i,j}`, and
:math:`u^{n-1}_{i,j}` in three two-dimensional arrays,


.. code-block:: python

        u   = zeros((Nx+1,Ny+1))   # solution array
        u_1 = zeros((Nx+1,Ny+1))   # solution at t-dt
        u_2 = zeros((Nx+1,Ny+1))   # solution at t-2*dt

where :math:`u^{n+1}_{i,j}` corresponds to ``u[i,j]``,
:math:`u^{n}_{i,j}` to ``u_1[i,j]``, and
:math:`u^{n-1}_{i,j}` to ``u_2[i,j]``


.. index:: index set notation


Index sets
~~~~~~~~~~

It is also convenient to introduce the index sets (cf. The section :ref:`wave:indexset`)


.. code-block:: python

        Ix = range(0, u.shape[0])
        Iy = range(0, u.shape[1])
        It = range(0, t.shape[0])


Computing the solution
~~~~~~~~~~~~~~~~~~~~~~

Inserting the initial
condition ``I`` in ``u_1`` and making a callback to the user in terms of
the ``user_action`` function is a straightforward generalization of
the 1D code from the section :ref:`wave:string:impl`:


.. code-block:: python

        for i in Ix:
            for j in Iy:
                u_1[i,j] = I(x[i], y[j])
        
        if user_action is not None:
            user_action(u_1, x, xv, y, yv, t, 0)

The ``user_action`` function has additional arguments compared to the
1D case. The arguments ``xv`` and ``yv`` fact will be commented
upon in the section :ref:`wave2D3D:impl:vectorized`.

The key finite difference formula :ref:`(6.5) <Eq:wave:2D3D:models:unp1>`
for updating the solution at
a time level is implemented in a separate function as


.. code-block:: python

        def advance_scalar(u, u_1, u_2, f, x, y, t, n, Cx2, Cy2, dt,
                           V=None, step1=False):
            Ix = range(0, u.shape[0]);  Iy = range(0, u.shape[1])
            dt2 = dt**2
            if step1:
                Cx2 = 0.5*Cx2;  Cy2 = 0.5*Cy2; dt2 = 0.5*dt2
                D1 = 1;  D2 = 0
            else:
                D1 = 2;  D2 = 1
            for i in Ix[1:-1]:
                for j in Iy[1:-1]:
                    u_xx = u_1[i-1,j] - 2*u_1[i,j] + u_1[i+1,j]
                    u_yy = u_1[i,j-1] - 2*u_1[i,j] + u_1[i,j+1]
                    u[i,j] = D1*u_1[i,j] - D2*u_2[i,j] + \ 
                             Cx2*u_xx + Cy2*u_yy + dt2*f(x[i], y[j], t[n])
                    if step1:
                        u[i,j] += dt*V(x[i], y[j])
            # Boundary condition u=0
            j = Iy[0]
            for i in Ix: u[i,j] = 0
            j = Iy[-1]
            for i in Ix: u[i,j] = 0
            i = Ix[0]
            for j in Iy: u[i,j] = 0
            i = Ix[-1]
            for j in Iy: u[i,j] = 0
            return u

The ``step1`` variable has been introduced to allow the formula to be
reused for first step :math:`u^1_{i,j}`:


.. code-block:: python

        u = advance_scalar(u, u_1, u_2, f, x, y, t,
                           n, Cx2, Cy2, dt, V, step1=True)

Below, we will make many alternative implementations of the
``advance_scalar`` function to speed up the code since most of
the CPU time in simulations is spent in this function.


.. _wave2D3D:impl:vectorized:

Vectorized computations
-----------------------

The scalar code above turns out to be extremely slow for large 2D
meshes, and probably useless in 3D beyond debugging of small test cases.
Vectorization is therefore a must for multi-dimensional
finite difference computations in Python. For example,
with a mesh consisting of :math:`30\times 30` cells, vectorization
brings down the CPU time by a factor of 70 (!).

In the vectorized case we must be able to evaluate user-given functions
like :math:`I(x,y)` and :math:`f(x,y,t)`, provided as Python functions ``I(x,y)``
and ``f(x,y,t)``, for the entire mesh in one array operation.
Having the one-dimensional coordinate arrays ``x`` and ``y`` is not
sufficient: these must be extended to vectorized versions,


.. code-block:: python

        from numpy import newaxis
        xv = x[:,newaxis]
        yv = y[newaxis,:]
        # or
        xv = x.reshape((x.size, 1))
        yv = y.reshape((1, y.size))

This is a standard required technique when evaluating functions over
a 2D mesh, say ``sin(xv)*cos(xv)``, which then gives a result with shape
``(Nx+1,Ny+1)``.

With the ``xv`` and ``yv`` arrays for vectorized computing,
setting the initial condition is just a matter of


.. code-block:: python

        u_1[:,:] = I(xv, yv)

One could also have written ``u_1 = I(xv, yv)`` and let ``u_1`` point
to a new object, but vectorized operations often makes use of
direct insertion in the original array through ``u_1[:,:]`` because
sometimes not all of the array is to be filled by such a function
evaluation. This is the case with the computational scheme for :math:`u^{n+1}_{i,j}`:


.. code-block:: python

        def advance_vectorized(u, u_1, u_2, f_a, Cx2, Cy2, dt,
                               V=None, step1=False):
            dt2 = dt**2
            if step1:
                Cx2 = 0.5*Cx2;  Cy2 = 0.5*Cy2; dt2 = 0.5*dt2
                D1 = 1;  D2 = 0
            else:
                D1 = 2;  D2 = 1
            u_xx = u_1[:-2,1:-1] - 2*u_1[1:-1,1:-1] + u_1[2:,1:-1]
            u_yy = u_1[1:-1,:-2] - 2*u_1[1:-1,1:-1] + u_1[1:-1,2:]
            u[1:-1,1:-1] = D1*u_1[1:-1,1:-1] - D2*u_2[1:-1,1:-1] + \ 
                           Cx2*u_xx + Cy2*u_yy + dt2*f_a[1:-1,1:-1]
            if step1:
                u[1:-1,1:-1] += dt*V[1:-1, 1:-1]
            # Boundary condition u=0
            j = 0
            u[:,j] = 0
            j = u.shape[1]-1
            u[:,j] = 0
            i = 0
            u[i,:] = 0
            i = u.shape[0]-1
            u[i,:] = 0
            return u


Array slices in 2D are more complicated to understand than those in
1D, but the logic from 1D applies to each dimension separately.
For example, when doing :math:`u^{n}_{i,j} - u^{n}_{i-1,j}` for :math:`i\in{{\mathcal{I^+}_x}}`,
we just keep ``j`` constant and make a slice in the first index:
``u_1[1:,j] - u_1[:-1,j]``, exactly as in 1D. The ``1:`` slice
specifies all the indices :math:`i=1,2,\ldots,N_x` (up to the last
valid index),
while ``:-1`` specifies the relevant indices for the second term:
:math:`0,1,\ldots,N_x-1` (up to, but not including the last index).

In the above code segment, the situation is slightly more complicated,
because each displaced slice in one direction is
accompanied by a ``1:-1`` slice in the other direction. The reason is
that we only work with the internal points for the index that is
kept constant in a difference.

The boundary conditions along the four sides makes use of
a slice consisting of all indices along a boundary:


.. code-block:: python

        u[: ,0] = 0
        u[:,Ny] = 0
        u[0 ,:] = 0
        u[Nx,:] = 0


The ``f`` function is in the above vectorized update of ``u`` first computed
as an array over all mesh points:

.. code-block:: text


        f_a = f(xv, yv, t[n])

We could, alternatively, used the call ``f(xv, yv, t[n])[1:-1,1:-1]``
in the last term of the update statement, but other implementations
in compiled languages benefit from having ``f`` available in an array
rather than calling our Python function ``f(x,y,t)`` for
every point.

Also in the ``advance_vectorized`` function we have introduced a
boolean ``step1`` to reuse the formula for the first time step
in the same way as we did with ``advance_scalar``.
We refer to the ``solver`` function in ``wave2D_u0.py``
for the details on how the overall algorithm is implemented.

The callback function now has the arguments
``u, x, xv, y, yv, t, n``. The inclusion of ``xv`` and ``yv`` makes it
easy to, e.g., compute an exact 2D solution in the callback function
and compute errors, through an expression like
``u - exact_solution(xv, yv, t[n])``.

.. _wave2D3D:impl:verify:

Verification  (3)
-----------------

Testing a quadratic solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 1D solution from the section :ref:`wave:pde2:fd:verify:quadratic` can be
generalized to multi-dimensions and provides a test case where the
exact solution also fulfills the discrete equations such that we know
(to machine precision) what numbers the solver function should
produce. In 2D we use the following generalization of
:ref:`(2.21) <Eq:wave:pde2:fd:verify:quadratic:uex>`:


.. _Eq:wave2D3D:impl:verify:quadratic:

.. math::
   :label: wave2D3D:impl:verify:quadratic
        
        {u_{\small\mbox{e}}}(x,y,t) = x(L_x-x)y(L_y-y)(1+{\frac{1}{2}}t)
        {\thinspace .}
        
        
This solution fulfills the PDE problem if :math:`I(x,y)={u_{\small\mbox{e}}}(x,y,0)`,
:math:`V=\frac{1}{2}{u_{\small\mbox{e}}}(x,y,0)`, and :math:`f=2c^2(1+{\frac{1}{2}}t)(y(L_y-y) +
x(L_x-x))`. To show that :math:`{u_{\small\mbox{e}}}` also solves the discrete equations,
we start with the general results :math:`[D_t D_t 1]^n=0`, :math:`[D_t D_t t]^n=0`,
and :math:`[D_t D_t t^2]=2`, and use these to compute


.. math::
         [D_xD_x {u_{\small\mbox{e}}}]^n_{i,j} = [y(L_y-y)(1+{\frac{1}{2}}t) D_xD_x x(L_x-x)]^n_{i,j}
        = y_j(L_y-y_j)(1+{\frac{1}{2}}t_n)2{\thinspace .}
        

A similar calculation must be carried out for the :math:`[D_yD_y
{u_{\small\mbox{e}}}]^n_{i,j}` and :math:`[D_tD_t {u_{\small\mbox{e}}}]^n_{i,j}` terms.  One must also show
that the quadratic solution fits the special formula for
:math:`u^1_{i,j}`. The details are left as :ref:`wave:exer:quadratic:2D`.
The ``test_quadratic`` function in the
`wave2D_u0.py <http://tinyurl.com/jvzzcfn/wave/wave2D_u0/wave2D_u0.py>`_
program implements this verification as a nose test.


.. _wave2D3D:impl:Cython:

Migrating loops to Cython
=========================


.. index:: Cython


Although vectorization can bring down the CPU time dramatically
compared with scalar code, there is still some factor 5-10 to win
in these types of applications
by implementing the finite difference scheme in compiled code,
typically in Fortran, C, or C++. This can quite easily be done by
adding a little extra code to our program. Cython is an extension of
Python that offers the easiest way to nail our Python loops in the
scalar code down to machine code and the efficiency of C.

Cython can be viewed as an extended Python language where variables are
declared with types and where functions are marked to be implemented in C.
Migrating Python code to Cython is done by copying the desired code
segments to functions (or classes) and placing them in one or more separate
files with extension ``.pyx``.

Declaring variables and annotating the code
-------------------------------------------

Our starting point is the plain ``advance_scalar`` function for a scalar
implementation of the updating algorithm for new values
:math:`u^{n+1}_{i,j}`:


.. code-block:: python

        def advance_scalar(u, u_1, u_2, f, x, y, t, n, Cx2, Cy2, dt,
                           V=None, step1=False):
            Ix = range(0, u.shape[0]);  Iy = range(0, u.shape[1])
            dt2 = dt**2
            if step1:
                Cx2 = 0.5*Cx2;  Cy2 = 0.5*Cy2; dt2 = 0.5*dt2
                D1 = 1;  D2 = 0
            else:
                D1 = 2;  D2 = 1
            for i in Ix[1:-1]:
                for j in Iy[1:-1]:
                    u_xx = u_1[i-1,j] - 2*u_1[i,j] + u_1[i+1,j]
                    u_yy = u_1[i,j-1] - 2*u_1[i,j] + u_1[i,j+1]
                    u[i,j] = D1*u_1[i,j] - D2*u_2[i,j] + \ 
                             Cx2*u_xx + Cy2*u_yy + dt2*f(x[i], y[j], t[n])
                    if step1:
                        u[i,j] += dt*V(x[i], y[j])
            # Boundary condition u=0
            j = Iy[0]
            for i in Ix: u[i,j] = 0
            j = Iy[-1]
            for i in Ix: u[i,j] = 0
            i = Ix[0]
            for j in Iy: u[i,j] = 0
            i = Ix[-1]
            for j in Iy: u[i,j] = 0
            return u


We simply take
a copy of this function and put it in a file ``wave2D_u0_loop_cy.pyx``.
The relevant Cython implementation arises from declaring variables with
types and adding some important annotations to speed up array
computing in Cython. Let us first list the complete code in the
``.pyx`` file:


.. code-block:: cython

        import numpy as np
        cimport numpy as np
        cimport cython
        ctypedef np.float64_t DT    # data type
        
        @cython.boundscheck(False)  # turn off array bounds check
        @cython.wraparound(False)   # turn off negative indices (u[-1,-1])
        cpdef advance(
            np.ndarray[DT, ndim=2, mode='c'] u,
            np.ndarray[DT, ndim=2, mode='c'] u_1,
            np.ndarray[DT, ndim=2, mode='c'] u_2,
            np.ndarray[DT, ndim=2, mode='c'] f,
            double Cx2, double Cy2, double dt2):
        
            cdef int Ix_start = 0
            cdef int Iy_start = 0
            cdef int Ix_end = u.shape[0]-1
            cdef int Iy_end = u.shape[1]-1
            cdef int i, j
            cdef double u_xx, u_yy
        
            for i in range(Ix_start+1, Ix_end):
                for j in range(Iy_start+1, Iy_end):
                    u_xx = u_1[i-1,j] - 2*u_1[i,j] + u_1[i+1,j]
                    u_yy = u_1[i,j-1] - 2*u_1[i,j] + u_1[i,j+1]
                    u[i,j] = 2*u_1[i,j] - u_2[i,j] + \
                             Cx2*u_xx + Cy2*u_yy + dt2*f[i,j]
            # Boundary condition u=0
            j = Iy_start
            for i in range(Ix_start, Ix_end+1): u[i,j] = 0
            j = Iy_end
            for i in range(Ix_start, Ix_end+1): u[i,j] = 0
            i = Ix_start
            for j in range(Iy_start, Iy_end+1): u[i,j] = 0
            i = Iy_end
            for j in range(Iy_start, Iy_end+1): u[i,j] = 0
            return u


.. index:: declaration of variables in Cython


This example may act as a recipe on how to transform array-intensive
code with loops into Cython.

1. Variables are declared with types: for example,
   ``double v`` in the argument list instead of just ``v``, and ``cdef double v``
   for a variable ``v`` in the body of the function.
   A Python ``float`` object is declared as ``double`` for
   translation to C by Cython, while an ``int`` object is
   declared by ``int``.

2. Arrays need a comprehensive type declaration involving

   * the type ``np.ndarray``,

   * the data type of the elements, here 64-bit floats,
     abbreviated as ``DT`` through ``ctypedef np.float64_t DT``
     (instead of ``DT`` we could use the full name of the
     data type: ``np.float64_t``, which is a Cython-defined type),

   * the dimensions of the array, here ``ndim=2`` and ``ndim=1``,

   * specification of contiguous memory for the array (``mode='c'``).


3. Functions declared with ``cpdef`` are translated to C but also
   accessible from Python.

4. In addition to the standard ``numpy`` import we also need a special
   Cython import of ``numpy``: ``cimport numpy as np``, to appear *after*
   the standard import.

5. By default, array indices are checked to be within their legal
   limits. To speed up the code one should turn off this feature
   for a specific function by placing ``@cython.boundscheck(False)``
   above the function header.

6. Also by default, array indices can be negative (counting from the
   end), but this feature has a performance penalty and is therefore
   here turned off by writing ``@cython.wraparound(False)`` right above
   the function header.

7. The use of index sets ``Ix`` and ``Iy`` in the scalar code cannot be
   successfully translated to C. One reason is that constructions like
   ``Ix[1:-1]`` involve negative indices, and these are now turned
   off. Another reason is that Cython loops must take the form ``for i
   in xrange`` or ``for i in range`` for being translated into efficient
   C loops. We have therefore introduced ``Ix_start`` as ``Ix[0]``
   and ``Ix_end`` as ``Ix[-1]`` to hold the start and end of the values
   of index :math:`i`. Similar variables are introduced for the :math:`j` index.
   A loop ``for i in Ix`` is with these new variables written as
   ``for i in range(Ix_start, Ix_end+1)``.


.. admonition:: Array declaration syntax in Cython

   We have used the syntax ``np.ndarray[DT, ndim=2, mode='c']`` to declare
   ``numpy`` arrays in Cython. There is a simpler, alternative syntax, employing
   `typed memory views <http://docs.cython.org/src/userguide/memoryviews.html>`_,
   where the declaration looks like ``double [:,:]``.
   However, the full support for this functionality is not yet ready, and
   in this text we use the full array declaration syntax.


Visual inspection of the C translation
--------------------------------------


.. index:: cython -a (Python-C translation in HTML)


Cython can visually explain how successfully it can translate a code from
Python to C. The command


.. code-block:: console

        Terminal> cython -a wave2D_u0_loop_cy.pyx

produces an HTML file ``wave2D_u0_loop_cy.html``, which can be loaded into
a web browser to illustrate which lines of the code that have been
translated to C. Figure :ref:`wave:2D3D:impl:fig:cython:ma1` shows
the illustrated code. Yellow lines indicate the lines that Cython did not manage
to translate to efficient C code and that remain in Python.
For the present code we see that Cython is able to translate all the
loops with array computing to C, which is our primary goal.


.. _wave:2D3D:impl:fig:cython:ma1:

.. figure:: wave2D_u0_loop_cy1.png
   :width: 500

   *Visual illustration of Cython's ability to translate Python to C*


You can also inspect the generated C code directly, as it appears
in the file ``wave2D_u0_loop_cy.c``. Nevertheless,
understanding this C code requires some
familiarity with writing Python extension modules in C by hand.
Deep down in the file we can see in detail how the compute-intensive
statements are translated some complex C code that is quite different from
what we a human would write (at least if a direct correspondence to
the mathematics was in mind).


Building the extension module  (1)
----------------------------------


.. index:: C extension module


.. index:: setup.py


.. index:: distutils


Cython code must be translated to C, compiled, and linked to form what
is known in the Python world as a *C extension module*.
This is usually done by making a ``setup.py`` script, which
is the standard way of building and installing Python software.
For an extension module arising from Cython code, the following
``setup.py`` script is all we need to build and install the module:


.. code-block:: python

        from distutils.core import setup
        from distutils.extension import Extension
        from Cython.Distutils import build_ext
        
        cymodule = 'wave2D_u0_loop_cy'
        setup(
          name=cymodule
          ext_modules=[Extension(cymodule, [cymodule + '.pyx'],)],
          cmdclass={'build_ext': build_ext},
        )

We run the script by


.. code-block:: console

        Terminal> python setup.py build_ext --inplace

The ``--inplace`` option makes the extension module available in the
current directory as the file ``wave2D_u0_loop_cy.so``. This
file acts as a normal Python module that can be imported and inspected:


        >>> import wave2D_u0_loop_cy
        >>> dir(wave2D_u0_loop_cy)
        ['__builtins__', '__doc__', '__file__', '__name__',
         '__package__', '__test__', 'advance', 'np']

The important output from the ``dir`` function is our Cython function
``advance`` (the module also features the imported ``numpy`` module under
the name ``np`` as well as many standard Python objects with double
underscores in their names).

The ``setup.py`` file makes use of the ``distutils`` package in Python
and Cython's extension of this package.
These tools know how Python was built on the computer and will
use compatible compiler(s) and options when building other code
in Cython, C, or C++. Quite some experience with building large
program systems is needed to do the build process manually, so using
a ``setup.py`` script is strongly recommended.


.. admonition:: Simplified build of a Cython module

   When there is no need to link the C code with special libraries,
   Cython offers a shortcut for generating and importing the extension
   module:
   
   
   .. code-block:: python
   
           import pyximport; pyximport.install()
   
   This makes the ``setup.py`` script redundant. However, in the ``wave2D_u0.py``
   code we do not use ``pyximport`` and require an explicit build process
   of this and many other modules.


Calling the Cython function from Python
---------------------------------------

The ``wave2D_u0_loop_cy``
module contains our ``advance`` function, which we now may call from
the Python program for the wave equation:


.. code-block:: python

        import wave2D_u0_loop_cy
        advance = wave2D_u0_loop_cy.advance
        ...
        for n in It[1:-1:                  # time loop
            f_a[:,:] = f(xv, yv, t[n])     # precompute, size as u
            u = advance(u, u_1, u_2, f_a, x, y, t, Cx2, Cy2, dt2)


Efficiency  (1)
~~~~~~~~~~~~~~~

For a mesh consisting of :math:`120\times 120` cells, the scalar Python code
require 1370 CPU time units, the vectorized version requires 5.5,
while the Cython version requires only 1! For a smaller mesh with
:math:`60\times 60` cells Cython is about 1000 times faster than the scalar
Python code, and the vectorized version is about 6 times slower than
the Cython version.

.. In 3D these numbers are even more favorable.


Migrating loops to Fortran
==========================

Instead of relying on Cython's (excellent) ability to translate Python to C,
we can invoke a compiled language directly and write the loops ourselves.
Let us start with Fortran 77, because this is a language with more
convenient array handling than C (or plain C++). Or more precisely,
we can with ease program with the same multi-dimensional indices
in the Fortran code as in the ``numpy``
arrays in the Python code, while in C these arrays are
one-dimensional and requires us to reduce multi-dimensional indices
to a single index.

.. Fortran compilers

.. build on 60 years of intensive research on how to optimize loops with

.. array computations.


The Fortran subroutine
----------------------


.. index:: wrapper code


.. index:: Fortran subroutine


We write a Fortran subroutine ``advance`` in a file
`wave2D_u0_loop_f77.f <http://tinyurl.com/jvzzcfn/wave/wave2D_u0/wave2D_u0_loop_f77.f>`_
for implementing the updating formula
:eq:`wave:2D3D:impl1:2Du0:ueq:discrete` and setting the solution to zero
at the boundaries:


.. code-block:: fortran

              subroutine advance(u, u_1, u_2, f, Cx2, Cy2, dt2, Nx, Ny)
              integer Nx, Ny
              real*8 u(0:Nx,0:Ny), u_1(0:Nx,0:Ny), u_2(0:Nx,0:Ny)
              real*8 f(0:Nx, 0:Ny), Cx2, Cy2, dt2
              integer i, j
        Cf2py intent(in, out) u
        
        C     Scheme at interior points
              do j = 1, Ny-1
                 do i = 1, Nx-1
                    u(i,j) = 2*u_1(i,j) - u_2(i,j) +
             &      Cx2*(u_1(i-1,j) - 2*u_1(i,j) + u_1(i+1,j)) +
             &      Cy2*(u_1(i,j-1) - 2*u_1(i,j) + u_1(i,j+1)) +
             &      dt2*f(i,j)
                 end do
              end do
        
        C     Boundary conditions
              j = 0
              do i = 0, Nx
                 u(i,j) = 0
              end do
              j = Ny
              do i = 0, Nx
                 u(i,j) = 0
              end do
              i = 0
              do j = 0, Ny
                 u(i,j) = 0
              end do
              i = Nx
              do j = 0, Ny
                 u(i,j) = 0
              end do
              return
              end

This code is plain Fortran 77, except for the special ``Cf2py`` comment
line, which here specifies that ``u`` is both an input argument *and*
an object to be returned from the ``advance`` routine. Or more
precisely, Fortran is not able return an array from a function,
but we need a *wrapper code* in C for the Fortran subroutine to enable
calling it from Python, and in this wrapper code one can return ``u``
to the calling Python code.


.. admonition:: Remark

   It is not strictly necessary to return ``u`` to the calling Python
   code since the ``advance`` function will modify the elements of ``u``,
   but the convention in Python is to get all output from a function
   as returned values. That is, the right way of calling the above
   Fortran subroutine from Python is
   
   
   .. code-block:: python
   
           u = advance(u, u_1, u_2, f, Cx2, Cy2, dt2)
   
   The less encouraged style, which works and resembles the way the
   Fortran subroutine is called from Fortran, reads
   
   
   .. code-block:: python
   
           advance(u, u_1, u_2, f, Cx2, Cy2, dt2)


Building the Fortran module with f2py
-------------------------------------

The nice feature of writing loops in Fortran is that the tool ``f2py``
can with very little work produce a C extension module such that
we can call the Fortran version of ``advance`` from Python.
The necessary commands to run are


.. code-block:: console

        Terminal> f2py -m wave2D_u0_loop_f77 -h wave2D_u0_loop_f77.pyf \ 
                  --overwrite-signature wave2D_u0_loop_f77.f
        Terminal> f2py -c wave2D_u0_loop_f77.pyf --build-dir build_f77 \ 
                  -DF2PY_REPORT_ON_ARRAY_COPY=1 wave2D_u0_loop_f77.f

The first command asks ``f2py`` to interpret the Fortran code and make
a Fortran 90
specification of the extension module in the file
``wave2D_u0_loop_f77.pyf``. The second command makes
``f2py`` generate all necessary
wrapper code, compile our Fortran file and the wrapper code,
and finally build the module.
The build process takes place in the specified subdirectory ``build_f77``
so that files can be inspected if something goes wrong.
The option ``-DF2PY_REPORT_ON_ARRAY_COPY=1`` makes ``f2py`` write a message
for every array that is copied in the communication between Fortran and Python,
which is very useful for avoiding unnecessary array copying (see below).
The name of the module file
is ``wave2D_u0_loop_f77.so``, and this file can be imported and inspected
as any other
Python module:


        >>> import wave2D_u0_loop_f77
        >>> dir(wave2D_u0_loop_f77)
        ['__doc__', '__file__', '__name__', '__package__',
         '__version__', 'advance']
        >>> print wave2D_u0_loop_f77.__doc__
        This module 'wave2D_u0_loop_f77' is auto-generated with f2py....
        Functions:
          u = advance(u,u_1,u_2,f,cx2,cy2,dt2,
              nx=(shape(u,0)-1),ny=(shape(u,1)-1))


.. admonition:: Examine the doc strings

   Printing the doc strings of the module and its functions is
   extremely important after having created a module with ``f2py``,
   because ``f2py`` makes Python interfaces to the Fortran functions
   that are different from how the functions are declared in
   the Fortran code (!). The rationale for this behavior is that
   ``f2py`` creates *Pythonic* interfaces such that Fortran routines
   can be called in the same way as one calls Python functions.
   Output data from Python functions is always returned
   to the calling code, but this is technically impossible in Fortran.
   Also, arrays in Python are passed to Python functions without
   their dimensions because that information is packed with the array
   data in the array objects, but this is not possible in Fortran.
   Therefore, ``f2py`` removes array dimensions from the argument list,
   and ``f2py`` makes it possible to
   return objects back to Python.


Let us follow the advice of examining the doc strings
and take a close look at
the documentation ``f2py`` has generated for our Fortran ``advance``
subroutine:


        >>> print wave2D_u0_loop_f77.advance.__doc__
        This module 'wave2D_u0_loop_f77' is auto-generated with f2py
        Functions:
          u = advance(u,u_1,u_2,f,cx2,cy2,dt2,
                      nx=(shape(u,0)-1),ny=(shape(u,1)-1))
        .
        advance - Function signature:
          u = advance(u,u_1,u_2,f,cx2,cy2,dt2,[nx,ny])
        Required arguments:
          u : input rank-2 array('d') with bounds (nx + 1,ny + 1)
          u_1 : input rank-2 array('d') with bounds (nx + 1,ny + 1)
          u_2 : input rank-2 array('d') with bounds (nx + 1,ny + 1)
          f : input rank-2 array('d') with bounds (nx + 1,ny + 1)
          cx2 : input float
          cy2 : input float
          dt2 : input float
        Optional arguments:
          nx := (shape(u,0)-1) input int
          ny := (shape(u,1)-1) input int
        Return objects:
          u : rank-2 array('d') with bounds (nx + 1,ny + 1)

Here we see that the ``nx`` and ``ny`` parameters declared in
Fortran are optional arguments that can be omitted when calling
``advance`` from Python.

We strongly recommend to print out the
documentation of *every* Fortran function to be called from Python
and make sure the call syntax is exactly as listed in the
documentation.


How to avoid array copying
--------------------------


.. index:: row-major ordering


.. index:: column-major ordering


.. index:: Fortran array storage


.. index:: C/Python array storage


Multi-dimensional arrays are stored as a stream of numbers in memory.
For a two-dimensional array consisting of rows and columns there are
two ways of creating such a stream: *row-major ordering*, which means
that rows are stored consecutively in memory, or *column-major
ordering*, which means that the columns are stored one after each other.
All programming languages inherited from C, including Python, apply
the row-major ordering, but Fortran uses column-major storage.
Thinking of a two-dimensional array in Python or C
as a matrix, it means that Fortran
works with the transposed matrix.

Fortunately, ``f2py`` creates extra code so that accessing ``u(i,j)`` in
the Fortran subroutine corresponds to the element ``u[i,j]`` in the
underlying ``numpy`` array (without the extra code, ``u(i,j)`` in Fortran
would access ``u[j,i]`` in the ``numpy`` array).  Technically, ``f2py``
takes a copy of our ``numpy`` array and reorders the data before
sending the array to Fortran. Such copying can be costly. For 2D wave
simulations on a :math:`60\times 60` grid the overhead of copying is a
factor of 5, which means that almost the whole performance gain of
Fortran over vectorized ``numpy`` code is lost!

To avoid having ``f2py`` to copy
arrays with C storage to the corresponding Fortran storage, we declare
the arrays with Fortran storage:


.. code-block:: python

        order = 'Fortran' if version == 'f77' else 'C'
        u   = zeros((Nx+1,Ny+1), order=order)   # solution array
        u_1 = zeros((Nx+1,Ny+1), order=order)   # solution at t-dt
        u_2 = zeros((Nx+1,Ny+1), order=order)   # solution at t-2*dt


In the compile and build step of using ``f2py``, it is recommended to add
an extra option for making ``f2py`` report on array copying:


.. code-block:: console

        Terminal> f2py -c wave2D_u0_loop_f77.pyf --build-dir build_f77 \ 
                  -DF2PY_REPORT_ON_ARRAY_COPY=1 wave2D_u0_loop_f77.f


It can sometimes be a challenge to track down which array that causes
a copying. There are two principal reasons for copying array data:
either the array does not have Fortran storage or the element types do
not match those declared in the Fortran code. The latter cause is
usually effectively eliminated by using ``real*8`` data in the Fortran
code and ``float64`` (the default ``float`` type in ``numpy``) in the arrays
on the Python side. The former reason is more common, and to check
whether an array before a Fortran call has the right storage one can
print the result of ``isfortran(a)``, which is ``True`` if the array ``a``
has Fortran storage.

Let us look at an example where we face problems with array storage.
A typical problem in the ``wave2D_u0.py`` code is
to set


.. code-block:: python

        f_a = f(xv, yv, t[n])

before the call to the Fortran ``advance`` routine. This computation creates
a new array with C storage. An undesired copy of ``f_a`` will be produced
when sending ``f_a`` to a Fortran routine.
There are two remedies, either direct insertion
of data in an array with Fortran storage,

.. code-block:: python

        f_a = zeros((Nx+1, Ny+1), order='Fortran')
        ...
        f_a[:,:] = f(xv, yv, t[n])

or remaking the ``f(xv, yv, t[n])`` array,

.. code-block:: python

        f_a = asarray(f(xv, yv, t[n]), order='Fortran')

The former remedy is most efficient if the ``asarray`` operation is to
be performed a large number of times.

Efficiency  (2)
~~~~~~~~~~~~~~~

The efficiency of this Fortran code is very similar to the Cython code.
There is usually nothing more to gain, from a computational efficiency
point of view, by implementing the *complete* Python program in Fortran
or C. That will just be a lot more code for all administering work
that is needed in scientific software, especially if we extend our
sample program ``wave2D_u0.py`` to handle a real scientific problem.
Then only a small portion will consist of loops with intensive
array calculations. These can be migrated to Cython or Fortran as
explained, while the rest of the programming can be more conveniently
done in Python.


Migrating loops to C via Cython
===============================

The computationally intensive loops can alternatively be implemented
in C code. Just as Fortran calls for care regarding the storage of
two-dimensional arrays, working with two-dimensional arrays in C
is a bit tricky. The reason is that
``numpy`` arrays are viewed as one-dimensional arrays when
transferred to C, while C programmers will think of ``u``, ``u_1``, and
``u_2`` as two dimensional arrays and index them like ``u[i][j]``.
The C code must declare ``u`` as ``double* u`` and translate an index
pair ``[i][j]`` to a corresponding single index when ``u`` is
viewed as one-dimensional. This translation requires knowledge of
how the numbers in ``u`` are stored in memory.

Translating index pairs to single indices
-----------------------------------------

Two-dimensional ``numpy`` arrays with the default C storage are stored
row by row. In general, multi-dimensional arrays with C storage are
stored such that the last index has the fastest variation, then the
next last index, and so on, ending up with the slowest variation
in the first index. For a two-dimensional ``u`` declared as
``zeros((Nx+1,Ny+1))`` in Python, the individual elements are stored
in the following order:


.. code-block:: python

        u[0,0], u[0,1], u[0,2], ..., u[0,Ny], u[1,0], u[1,1], ...,
        u[1,Ny], u[2,0], ..., u[Nx,0], u[Nx,1], ..., u[Nx, Ny]


Viewing ``u`` as one-dimensional, the index pair :math:`(i,j)` translates
to :math:`i(N_y+1)+j`. So, where a C programmer would naturally write
an index ``u[i][j]``, the indexing must read ``u[i*(Ny+1) + j]``.
This is tedious to write, so it can be handy to define a C macro,


.. code-block:: c

        #define idx(i,j) (i)*(Ny+1) + j

so that we can write ``u[idx(i,j)]``, which reads much better and is
easier to debug.


.. admonition:: Be careful with macro definitions

   Macros just perform simple text substitutions:
   ``idx(hello,world)`` is expanded to ``(hello)*(Ny+1) + world``.
   The parenthesis in ``(i)`` are essential - using the natural mathematical
   formula ``i*(Ny+1) + j`` in the macro definition,
   ``idx(i-1,j)`` would expand to ``i-1*(Ny+1) + j``, which is the wrong
   formula. Macros are handy, but requires careful use.
   In C++, inline functions are safer and replace the need for macros.


The complete C code
-------------------

The C version of our function ``advance`` can be coded as follows.


.. code-block:: c

        #define idx(i,j) (i)*(Ny+1) + j
        
        void advance(double* u, double* u_1, double* u_2, double* f,
        	     double Cx2, double Cy2, double dt2,
        	     int Nx, int Ny)
        {
          int i, j;
          /* Scheme at interior points */
          for (i=1; i<=Nx-1; i++) {
            for (j=1; j<=Ny-1; j++) {
                u[idx(i,j)] = 2*u_1[idx(i,j)] - u_2[idx(i,j)] +
                Cx2*(u_1[idx(i-1,j)] - 2*u_1[idx(i,j)] + u_1[idx(i+1,j)]) +
                Cy2*(u_1[idx(i,j-1)] - 2*u_1[idx(i,j)] + u_1[idx(i,j+1)]) +
                dt2*f[idx(i,j)];
        	}
            }
          }
          /* Boundary conditions */
          j = 0;  for (i=0; i<=Nx; i++) u[idx(i,j)] = 0;
          j = Ny; for (i=0; i<=Nx; i++) u[idx(i,j)] = 0;
          i = 0;  for (j=0; j<=Ny; j++) u[idx(i,j)] = 0;
          i = Nx; for (j=0; j<=Ny; j++) u[idx(i,j)] = 0;
        }


The Cython interface file
-------------------------

All the code above appears in a file `wave2D_u0_loop_c.c <http://tinyurl.com/jvzzcfn/wave//wave2D_u0/wave2D_u0_loop_c.c>`_.
We need to compile this file together with C wrapper code such that
``advance`` can be called from Python. Cython can be used to generate
appropriate wrapper code.
The relevant Cython code for interfacing C is
placed in a file with extension ``.pyx``. Here this file, called
`wave2D_u0_loop_c_cy.pyx <http://tinyurl.com/jvzzcfn/wave/wave2D_u0/wave2D_u0_loop_c_cy.pyx>`_, looks like


.. code-block:: cython

        import numpy as np
        cimport numpy as np
        cimport cython
        
        cdef extern from "wave2D_u0_loop_c.h":
            void advance(double* u, double* u_1, double* u_2, double* f,
                         double Cx2, double Cy2, double dt2,
                         int Nx, int Ny)
        
        @cython.boundscheck(False)
        @cython.wraparound(False)
        def advance_cwrap(
            np.ndarray[double, ndim=2, mode='c'] u,
            np.ndarray[double, ndim=2, mode='c'] u_1,
            np.ndarray[double, ndim=2, mode='c'] u_2,
            np.ndarray[double, ndim=2, mode='c'] f,
            double Cx2, double Cy2, double dt2):
            advance(&u[0,0], &u_1[0,0], &u_2[0,0], &f[0,0],
                    Cx2, Cy2, dt2,
                    u.shape[0]-1, u.shape[1]-1)
            return u

We first declare the C functions to be interfaced.
These must also appear in a C header file, `wave2D_u0_loop_c.h <http://tinyurl.com/jvzzcfn/wave/wave2D_u0/wave2D_u0_loop_c.h>`_,


.. code-block:: c++

        extern void advance(double* u, double* u_1, double* u_2, double* f,
        		    double Cx2, double Cy2, double dt2,
        		    int Nx, int Ny);

The next step is to write a Cython function with Python objects as arguments.
The name ``advance`` is already used for the C function so the function
to be called from Python is named ``advance_cwrap``. The contents of
this function is simply a call to the ``advance`` version in C. To this end,
the right information from the Python objects must be passed on as
arguments to ``advance``. Arrays are sent with their C pointers to the
first element, obtained in Cython as ``&u[0,0]`` (the ``&`` takes the
address of a C variable). The ``Nx`` and ``Ny`` arguments in ``advance`` are
easily obtained from the shape of the ``numpy`` array ``u``.
Finally, ``u`` must be returned such that we can set ``u = advance(...)``
in Python.

Building the extension module  (2)
----------------------------------

It remains to build the extension module. An appropriate
``setup.py`` file is


.. code-block:: python

        from distutils.core import setup
        from distutils.extension import Extension
        from Cython.Distutils import build_ext
        
        sources = ['wave2D_u0_loop_c.c', 'wave2D_u0_loop_c_cy.pyx']
        module = 'wave2D_u0_loop_c_cy'
        setup(
          name=module,
          ext_modules=[Extension(module, sources,
                                 libraries=[], # C libs to link with
                                 )],
          cmdclass={'build_ext': build_ext},
        )

All we need to specify is the ``.c`` file(s) and the ``.pyx`` interface
file. Cython is automatically run to generate the necessary wrapper
code. Files are then compiled and linked to an extension module
residing in the file ``wave2D_u0_loop_c_cy.so``. Here is a
session with running ``setup.py``
and examining the resulting module in Python


.. code-block:: console

        Terminal> python setup.py build_ext --inplace
        Terminal> python
        >>> import wave2D_u0_loop_c_cy as m
        >>> dir(m)
        ['__builtins__', '__doc__', '__file__', '__name__', '__package__',
         '__test__', 'advance_cwrap', 'np']

The call to the C version of ``advance`` can go like this in Python:


.. code-block:: python

        import wave2D_u0_loop_c_cy
        advance = wave2D_u0_loop_c_cy.advance_cwrap
        ...
        f_a[:,:] = f(xv, yv, t[n])
        u = advance(u, u_1, u_2, f_a, Cx2, Cy2, dt2)


Efficiency  (3)
~~~~~~~~~~~~~~~

In this example, the C and Fortran code runs at the same speed, and there
are no significant differences in the efficiency of the wrapper code.
The overhead implied by the wrapper code is negligible as long as
we do not work with very small meshes and consequently little numerical
work in the ``advance`` function.

Migrating loops to C via f2py
=============================

An alternative to using Cython for interfacing C code is to apply
``f2py``. The C code is the same, just the details of specifying how
it is to be called from Python differ. The ``f2py`` tool requires
the call specification to be a Fortran 90 module defined in a ``.pyf``
file. This file was automatically generated when we interfaced a
Fortran subroutine. With a C function we need to write this module
ourselves, or we can use a trick and let ``f2py`` generate it for us.
The trick consists in writing the signature of the C function with
Fortran syntax and place it in a Fortran file, here
``wave2D_u0_loop_c_f2py_signature.f``:


.. code-block:: fortran

              subroutine advance(u, u_1, u_2, f, Cx2, Cy2, dt2, Nx, Ny)
        Cf2py intent(c) advance
              integer Nx, Ny, N
              real*8 u(0:Nx,0:Ny), u_1(0:Nx,0:Ny), u_2(0:Nx,0:Ny)
              real*8 f(0:Nx, 0:Ny), Cx2, Cy2, dt2
        Cf2py intent(in, out) u
        Cf2py intent(c) u, u_1, u_2, f, Cx2, Cy2, dt2, Nx, Ny
              return
              end

Note that we need a special ``f2py`` instruction, through a ``Cf2py``
comment line, for telling that all the function arguments are
C variables. We also need to specify that the function is actually
in C: ``intent(c) advance``.

Since ``f2py`` is just concerned with the function signature and not the
complete contents of the function body, it can easily generate the
Fortran 90 module specification based solely on the signature above:


.. code-block:: console

        Terminal> f2py -m wave2D_u0_loop_c_f2py \ 
                  -h wave2D_u0_loop_c_f2py.pyf --overwrite-signature \ 
                  wave2D_u0_loop_c_f2py_signature.f

The compile and build step is as for the Fortran code, except that we
list C files instead of Fortran files:


.. code-block:: console

        Terminal> f2py -c wave2D_u0_loop_c_f2py.pyf \ 
                  --build-dir tmp_build_c \ 
                  -DF2PY_REPORT_ON_ARRAY_COPY=1 wave2D_u0_loop_c.c

As when interfacing Fortran code with ``f2py``, we need to print out
the doc string to see the exact call syntax from the Python side.
This doc string is identical for the C and Fortran versions of
``advance``.

.. No worries with transposed storage, copy of arrays can only take

.. place if the type don't match


.. ===== Migrating loops to C via Instant =====


Migrating loops to C++ via f2py
-------------------------------

C++ is a much more versatile language than C or Fortran and has over
the last two decades become very popular for numerical computing.
Many will therefore prefer to migrate compute-intensive Python code
to C++. This is, in principle, easy: just write the desired C++ code
and use some tool for interfacing it from Python. A tool like
`SWIG <http://swig.org/>`_ can interpret the C++ code and generate
interfaces for a wide range of
languages, including Python, Perl, Ruby, and Java.
However, SWIG is a comprehensive tool with a correspondingly
steep learning curve. Alternative tools, such as
`Boost Python <http://www.boost.org/doc/libs/1_51_0/libs/python/doc/index.html>`_, `SIP <http://riverbankcomputing.co.uk/software/sip/intro>`_,
and `Shiboken <http://qt-project.org/wiki/Category:LanguageBindings::PySide::Shiboken>`_
are similarly comprehensive. Simpler tools include
`PyBindGen <http://code.google.com/p/pybindgen/>`_,

.. More tools for interfacing C++ from Python


A technically much easier way of interfacing C++ code is to drop the
possibility to use C++ classes directly from Python, but instead
make a C interface to the C++ code. The C interface can be handled
by ``f2py`` as shown in the example with pure C code. Such a solution
means that classes in Python and C++ cannot be mixed and that only
primitive data types like numbers, strings, and arrays can be
transferred between Python and C++. Actually, this is often a very
good solution because it forces the C++ code to work on array data,
which usually gives faster code than if fancy data structures with
classes are used. The arrays coming from Python, and looking like
plain C/C++ arrays, can be efficiently wrapped in more user-friendly
C++ array classes in the C++ code, if desired.

.. __Remaining.__

.. Use some array class. Key issue: ``extern "C"`` declaration of C++

.. function in the C file with the interface we want to wrap.


Using classes to implement a simulator
======================================

 * Introduce classes ``Mesh``, ``Function``, ``Problem``, ``Solver``, ``Visualizer``,
   ``File``
.. communicate with compiled code by ensuring that classes work with arrays

.. ===== Callbacks to Python from Fortran or C =====

.. 5631 projects:

.. Drop f_a, call py from f77

.. Call py with cpdef f from Cython, but hardcode function name

.. As above, but transfer function to Cython

.. Instant

.. test_quadratic for all

.. Think of extensions: Neumann with modified stencil or ghost cells,

.. variable coefficients, damping, ...


Exercises  (3)
==============


.. --- begin exercise ---

.. _wave:exer:quadratic:2D:

Exercise 10: Check that a solution fulfills the discrete model
--------------------------------------------------------------

Carry out all mathematical details to show that
:eq:`wave2D3D:impl:verify:quadratic` is indeed a solution of the
discrete model for a 2D wave equation with :math:`u=0` on the boundary.
One must check the boundary conditions, the initial conditions,
the general discrete equation at a time level and the special
version of this equation for the first time level.
Filename: ``check_quadratic_solution.pdf``.

.. --- end exercise ---


.. --- begin exercise ---

.. _wave:exer:mesh3D:calculus:

Project 11: Calculus with 2D/3D mesh functions
----------------------------------------------

The goal of this project is to redo
:ref:`wave:exer:mesh1D:calculus` with 2D and 3D
mesh functions (:math:`f_{i,j}` and :math:`_{fi,j,k}`).

**Differentiation.**
The differentiation results in a discrete gradient
function, which in the 2D case can be represented by a three-dimensional
array ``df[d,i,j]`` where ``d`` represents the direction of
the derivative and ``i`` and ``j`` are mesh point counters in 2D
(the 3D counterpart is ``df[d,i,j,k]``).

**Integration.**
The integral of a 2D mesh function :math:`f_{i,j}` is defined as


.. math::
         F_{i,j} = \int_{y_0}^{y_j} \int_{x_0}^{x_i} f(x,y)dxdy,

where :math:`f(x,y)` is a function that takes on the values of the
discrete mesh function :math:`f_{i,j}` at the mesh points, but can also
be evaluated in between the mesh points. The particular variation
between mesh points can be taken as bilinear, but this is not
important as we will use a product Trapezoidal rule to approximate
the integral over a cell in the mesh and then we only need to
evaluate :math:`f(x,y)` at the mesh points.

Suppose :math:`F_{i,j}` is computed. The calculation of :math:`F_{i+1,j}`
is then


.. math::
        
        F_{i+1,j} &= F_{i,j} + \int_{x_i}^{x_{i+1}}\int_{y_0}^{y_j} f(x,y)dydx\\ 
        &\approx \Delta x \int_{y_0}^{y_j} f(x_{i+\frac{1}{2}},y)dy\\ 
        & \approx \Delta x \frac{1}{2}\left(
        \int_{y_0}^{y_j} f(x_{i},y)dy
        + \int_{y_0}^{y_j} f(x_{i+1},y)dy\right)
        

The integrals in the :math:`y` direction can be approximated by a Trapezoidal
rule. A similar idea can be used to compute :math:`F_{i,j+1}`. Thereafter,
:math:`F_{i+1,j+1}` can be computed by adding the integral over the final
corner cell to :math:`F_{i+1,j} + F_{i,j+1} - F_{i,j}`. Carry out the
details of these computations and extend the ideas to 3D.
Filename: ``mesh_calculus_3D.py``.

.. --- end exercise ---


.. --- begin exercise ---

.. _wave:app:exer:wave2D:Neumann:

Exercise 12: Implement Neumann conditions in 2D
-----------------------------------------------

Modify the `wave2D_u0.py <http://tinyurl.com/jvzzcfn/wave/wave2D_u0/wave2D_u0.py>`_
program, which solves the 2D wave equation :math:`u_{tt}=c^2(u_{xx}+u_{yy})`
with constant wave velocity :math:`c` and :math:`u=0` on the boundary, to have
Neumann boundary conditions: :math:`\partial u/\partial n=0`.
Include both scalar code (for debugging and reference) and
vectorized code (for speed).

To test the code, use :math:`u=1.2` as solution (:math:`I(x,y)=1.2`, :math:`V=f=0`, and
:math:`c` arbitrary), which should be exactly reproduced with any mesh
as long as the stability criterion is satisfied.
Another test is to use the plug-shaped pulse
in the ``pulse`` function from the section :ref:`wave:pde2:software`
and the `wave1D_dn_vc.py <http://tinyurl.com/jvzzcfn/wave/wave1D_dn_vc.py>`_
program. This pulse
is exactly propagated in 1D if :math:`c\Delta t/\Delta x=1`. Check
that also the 2D program can propagate this pulse exactly
in :math:`x` direction (:math:`c\Delta t/\Delta x=1`, :math:`\Delta y` arbitrary)
and :math:`y` direction (:math:`c\Delta t/\Delta y=1`, :math:`\Delta x` arbitrary).
Filename: ``wave2D_dn.py``.

.. --- end exercise ---


.. --- begin exercise ---

.. _wave:exer:3D:f77:cy:efficiency:

Exercise 13: Test the efficiency of compiled loops in 3D
--------------------------------------------------------

Extend the ``wave2D_u0.py`` code and the Cython, Fortran, and C versions to 3D.
Set up an efficiency experiment to determine the relative efficiency of
pure scalar Python code, vectorized code, Cython-compiled loops,
Fortran-compiled loops, and C-compiled loops.
Normalize the CPU time for each mesh by the fastest version.
Filename: ``wave3D_u0.py``.

.. --- end exercise ---