How to automatically generate Jupyter Notebooks

Hans Petter Langtangen [1, 2]

[1] Simula
[2] University of Oslo

May 14, 2015

Table of contents

Sample problem
The ascii notebook format
      Cell delimiter lines
      Include lines
      Mako processing
      Example on syntax
The compiled file
The generator code

Summary. This note explains how to write your own notebook generator in Python, so that you can write a notebook in plain ascii in your favorite editor and use handy tools such as preprocessors to introduce variables and other programming constructs into the text, as well as to run computations.

Ascii input is particularly useful if you have LaTeX code that you want to reuse in notebooks. You must then translate the LaTeX code to the syntax described here and run the compiler described below.

Sample problem

The notebook generator will be demonstrated through a specific example: writing a little report where we 1) present a differential equation, 2) solve it by SymPy, and 3) show Python code for the solution and some computations. We show how the SymPy calculations can be done on the fly while compiling the document: results stored in Python variables by the SymPy calculations are propagated automatically into the text. (This functionality is quite similar to PythonTeX, but based on a standard template language, Mako, instead of quite comprehensive LaTeX code.)

The ascii notebook format

Cell delimiter lines

We go for a very simple format: lines starting with ----- are delimiter lines between cells. Cells are written either in plain Markdown or as a set of statements in a programming language, depending on whether the cell is a text or a code cell.

Delimiter lines with an extension text x, as in -----x, indicate a code cell in language x, where x is a short name for the language, typically the file extension: py for Python, f for Fortran, c for C, cpp for C++, js for JavaScript, sh for Bash or another Unix shell, sys for the console (terminal window), java for Java, tex for LaTeX or TeX, html for HTML, etc.

If x is followed by -t, as in -----py-t, the cell is not an executable code cell, but a standard Markdown cell where the code is typeset statically within triple backticks, as usual in Markdown. (Sometimes one wants to show code that is not intended to be executed.)
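The three kinds of delimiter lines can be told apart with one regular expression. The following is a minimal sketch, not the generator's actual code; the function name classify_delimiter and the labels 'markdown', 'code', and 'static-code' are chosen here for illustration only.

```python
import re

def classify_delimiter(line):
    """Return (kind, shortname) for a delimiter line, or None otherwise.

    kind is 'markdown' for -----, 'code' for -----x, and
    'static-code' for -----x-t (labels invented for this sketch).
    """
    m = re.match(r'-----([a-z0-9]+)?(-t)?\s*$', line)
    if m is None:
        return None          # ordinary line, not a delimiter
    shortname, static = m.group(1), m.group(2)
    if shortname is None:
        return ('markdown', None)
    return ('static-code' if static else 'code', shortname)

if __name__ == '__main__':
    for line in ['-----', '-----py', '-----sys-t', 'plain text']:
        print(repr(line), classify_delimiter(line))
```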

Include lines

It is handy to include other files in a document so we invent the syntax #include "filename" at the beginning of a line to include a file with name filename.
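A minimal sketch of how such include lines can be expanded is shown next. The helper name expand_includes and the read_file argument are assumptions of this sketch: passing in the function that fetches a file makes the example runnable without any files on disk.

```python
def expand_includes(text, read_file):
    """Replace each line starting with #include "name" by the file's text.

    read_file is any callable that maps a filename to its contents
    (an assumption of this sketch, made to keep it easy to test).
    """
    out = []
    for line in text.splitlines():
        if line.startswith('#include "'):
            filename = line.split('"')[1]   # text between the first quotes
            out.append(read_file(filename))
        else:
            out.append(line)
    return '\n'.join(out)

# A dictionary plays the role of the file system in this demo
files = {'myprog.py': 'x = 1\ny = 2'}
doc = 'intro\n#include "myprog.py"\nend'
expanded = expand_includes(doc, files.__getitem__)

if __name__ == '__main__':
    print(expanded)
```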

Tip: the include syntax can be extended. Since we are in charge of parsing the ascii input file, we can invent more sophisticated syntax for including files. For example, we may specify a start and an end line, either by numbers or preferably by regular expressions (which tend to be more robust in that they change less often than line numbers). The syntax could be like

#include "myprog.py" fromto: from_regex@to_regex

However, this extension is not incorporated in the first version of the notebook generator. We just mention the possibility.
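To make the idea concrete, here is one way the extension could be sketched. None of this is part of the generator: the names parse_include and extract_fromto are hypothetical, and the semantics (start at the first line matching from_regex, stop just before the next line matching to_regex) are an assumption made for this example.

```python
import re

def parse_include(line):
    """Split '#include "file" fromto: a@b' into (file, a, b).

    a and b are None when no fromto: part is given.
    """
    m = re.match(r'#include "([^"]+)"(?:\s+fromto:\s*(.*?)@(.*))?\s*$', line)
    if m is None:
        raise ValueError('not an include line: %r' % line)
    return m.groups()

def extract_fromto(text, from_regex, to_regex):
    """Return the lines from the first match of from_regex up to, but
    not including, the next line matching to_regex (assumed semantics)."""
    lines = text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if start is None:
            if re.search(from_regex, line):
                start = i
        elif re.search(to_regex, line):
            return '\n'.join(lines[start:i])
    if start is None:
        raise ValueError('no line matches %r' % from_regex)
    return '\n'.join(lines[start:])   # to_regex never matched: go to the end

source = ('import sys\n'
          'def solve():\n'
          '    return 42\n'
          '\n'
          'if __name__ == "__main__":\n'
          '    solve()')

if __name__ == '__main__':
    name, a, b = parse_include(
        '#include "myprog.py" fromto: def solve@if __name__')
    print(extract_fromto(source, a, b))
```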

Mako processing

It is also very handy to run the text through a preprocessor that is a full-fledged template programming language of the type that is popular in the web world. Here we have chosen Mako. Running the text through Mako enables the use of variables, if-tests, and loops, to mention the most common constructs. Pure Python functions can be defined inside <% and %> and called in the text. Mako applies the syntax ${var} for variables and ${myfunc(arg1, arg2=None)} for function calls.

Tip: put Python code in a separate file for testing! Having lots of Python code inside <% and %> Mako tags is not recommended as debugging can be a nightmare. Instead, put the code in a file, say myprog.py, and just include it:

<%
#include "myprog.py"
%>

Then you can debug myprog.py as standard Python code, but call up its functions and use its global variables in the document's text (!).
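The Mako constructs mentioned in this section can be tried out in isolation. The sketch below assumes that Mako is installed and returns None otherwise; the template text and the names TEMPLATE, double, and render_demo are made up for this illustration.

```python
# Demo template: a Python block, two substitutions, and an if-test
TEMPLATE = """\
<%
def double(x):
    return 2*x
%>
Initial condition: y(0)=${IC}
Doubled: ${double(IC)}
% if IC > 1:
The initial value exceeds one.
% endif
"""

def render_demo(**variables):
    """Render TEMPLATE with Mako; return None if Mako is not installed."""
    try:
        from mako.template import Template
    except ImportError:
        return None
    template = Template(text=TEMPLATE, strict_undefined=True)
    return template.render(**variables)

if __name__ == '__main__':
    print(render_demo(IC=2))
```

With strict_undefined=True, a variable that is used in the text but not supplied to render leads to an error instead of being silently rendered as undefined, which is usually what one wants when compiling a document.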

Example on syntax

Let us show a very simple document with some code, some math, and use of include and Mako. The task is to solve a differential equation by SymPy on the fly in the code and use SymPy output directly in the text. For this goal, we write the SymPy code in a separate file where a dump function can be used for heavy printing of intermediate results, while a global variable allow_printing determines whether printing is turned on or off: we want it on when debugging, but off when compiling our document.

The document starts with an author, his address, and the date, where author and address are Mako variables we can specify on the command line when compiling the document. This text is a Markdown cell and therefore starts with -----:

-----
# Test of Jupyter Notebook generator

**${NAME}**, ${ADDRESS}

**May 14, 2015**

Note that a line starting with a double ## is a Mako comment line and will not be a part of the final output from Mako, whereas a single # starts a Markdown heading. IC is another variable that must be specified on the command line (and fed to Mako) for the initial condition of the differential equation.

Next follows the math part, where we include SymPy code to solve the math problem. The SymPy code is in the file .solve_dyeqy.py:

# Solve y'=y, y(0)=2 by sympy
# This file is intended for being included via mako
# in a document, but it is much easier to debug the
# python code in a separate standard .py file.
# Then we just include this file in the document inside
# <% ... %> mako directives and set allow_printing=False.

def dump(var):
    if allow_printing:
        print(var)

def solve():
    """Solve y'=y, y(0)=2."""
    import sympy as sym
    t = sym.symbols('t', real=True, positive=True)
    y = sym.symbols('y', cls=sym.Function)
    # Solve differential equation using dsolve
    eq = sym.diff(y(t), t) - y(t)
    dump(eq)
    sol = sym.dsolve(eq)
    dump(sol)
    y = sol.rhs          # grab right-hand side of equation
    # Determine integration constant C1 from initial condition
    C1 = sym.symbols('C1')
    y0 = 2
    eq = y.subs(t, 0) - y0  # equation for initial condition
    dump(eq)
    sol = sym.solve(eq, C1)     # solve wrt C1
    dump(sol)
    y = y.subs(C1, sol[0])  # insert C1=2 in solution
    dump(y)
    y_func = sym.lambdify([t], y, modules='numpy')
    return sym.latex(y), sym.latex(sol[0]), y_func

if __name__ == '__main__':
    allow_printing = True
    solve()

This is just standard Python code. The __name__ variable equals __builtin__ when we run this code inside Mako, so the test block is then inactive. Instead, we can define allow_printing = False, call solve(), and store its output in variables such that we can access them in the running text. Here is the syntax:

## Math

This is a test notebook where we solve the following math
problem:

$$
y' = y,\quad y(0)=${IC}
$$

## Solve the problem by SymPy
<%
## Make sure to test the Python file first!
#include ".solve_dyeqy.py"
allow_printing = False
y_expr, C_expr, y_func = solve()
%>

The equation is separable, and we find by standard methods
that

$$
y(t) = ${y_expr}.
$$
The integration constant is found from the initial condition
$y(0)=${IC}$ and equals in this case $${C_expr}$.

Note how we in the middle of math expressions use Mako variables taken from both the command line, such as ${IC}, and from the Python code, such as ${y_expr} and ${C_expr}!

We can now define some code cells for execution. We want to create Python code for the solution, using the SymPy variable y_expr and SymPy's ability to write the expression as a numerical Python function, here called y(t). Note that the delimiter for a Python code cell is -----py.

## Code

We implement the evaulation of $y(t)$ in Python:

<%
## We use sympy to convert y_expr to a string to
## be returned as Python code
import sympy
## Note that we have y_func which is a real Python
## function, but here we make a similar one: y(t)
## so the user can see it.
%>

-----py
from numpy import exp

def y(t):
    return ${sympy.printing.lambdarepr.lambdarepr(y_expr)}

# Try values
y(0)
-----py
y(1), 2*exp(1)
-----py
y(2), 2*exp(2)

Remark. The way we use Mako here hides the computations by SymPy. Sometimes this is the desired behavior in a text. On other occasions, however, you may want to show all the SymPy steps, and then you can do that explicitly in notebook cells. If you want to show just some selected steps, you can show the code as static Markdown code using the delimiter -----py-t instead of -----py.

Finally, we show how to compile the ascii file into a Jupyter Notebook, using a console cell that is to be shown as plain Markdown Bash code.

## Compilation
-----
This is how we run the notebook generator on files with
extension `.aipynb`:

## console cell, but typeset as pure code (-t extension)
-----sys-t
Terminal> ipynb_generator.py myfile.aipynb  MYVAR=4 GRADE='excellent'

What we have not shown here is the ability to call Python functions in the text. We could, if it were sensible, call the solve function in the text, e.g., as in ...and the solution becomes ${solve()[0]}.

The compiled file

We compile our example file by the following command:

Terminal> python ipynb_generator.py .test1.aipynb NAME="Core Dump" \
          ADDRESS="Seg. Fault Ltd and Univ. of C. Space" \
          IC=2

Note that some Mako variables are supposed to be given on the command line, here three, while others are defined in Python code within <% and %> tags in the document.
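The NAME=value arguments on the command line can be collected into a dictionary before they are handed to Mako. A minimal sketch follows; parse_variables is a hypothetical helper name, and splitting on the first = only is a choice made here so that values may themselves contain the = character. Note that all values arrive as strings.

```python
def parse_variables(args):
    """Turn ['NAME=Core Dump', 'IC=2'] into {'NAME': 'Core Dump', 'IC': '2'}."""
    variables = {}
    for arg in args:
        if '=' not in arg:
            raise ValueError('expected NAME=value, got %r' % arg)
        key, value = arg.split('=', 1)   # split on the first = only
        variables[key] = value
    return variables

if __name__ == '__main__':
    import sys
    # Mimics sys.argv[2:] in the generator: everything after the filename
    print(parse_variables(sys.argv[2:]))
```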

The output of the ipynb_generator.py command above is a notebook file .test1.ipynb. The file looks like this:

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "# Test of Jupyter Notebook generator\n",
    "\n",
    "**Core Dump**, Seg. Fault Ltd and Univ. of C. Space\n",
    "\n",
    "**May 14, 2015**\n",
    "\n",
    "\n",
    "This is a test notebook where we solve the following math\n",
    "problem:\n",
    "\n",
    "$$\n",
    "y' = y,\\quad y(0)=2\n",
    "$$\n",
    "\n",
    "\n",
    "\n",
    "The equation is separable, and we find by standard methods\n",
    "that\n",
    "\n",
    "$$\n",
    "y(t) = 2 e^{t}.\n",
    "$$\n",
    "The integration constant is found from the initial condition\n",
    "$y(0)=2$ and equals in this case $2$.\n",
    "\n",
    "\n",
    "We implement the evaulation of $y(t)$ in Python:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from numpy import exp\n",
    "\n",
    "def y(t):\n",
    "    return 2*exp(t)\n",
    "\n",
    "# Try values\n",
    "y(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "y(1), 2*exp(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "y(2), 2*exp(2)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "This is how we run the notebook generator on files with\n",
    "extension `.aipynb`:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "```Python\n",
    "\n",
    "Terminal> ipynb_generator.py myfile.aipynb\n",
    "```\n"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 0
}

The notebook file resides on GitHub and can be automatically rendered there.
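Since a notebook file is plain JSON, it can be inspected with the standard json module. The sketch below works on a shortened, made-up notebook string rather than the file above; the function name summarize is hypothetical.

```python
import json

# A shortened notebook in format version 4 (contents made up for the demo)
notebook_json = '''{
 "cells": [
  {"cell_type": "markdown", "metadata": {},
   "source": ["# Title\\n", "Some text"]},
  {"cell_type": "code", "execution_count": null, "metadata": {},
   "outputs": [], "source": ["y(1), 2*exp(1)"]}
 ],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 0
}'''

def summarize(nb_text):
    """Return a list of (cell_type, source) pairs for a notebook string."""
    nb = json.loads(nb_text)
    summary = []
    for cell in nb['cells']:
        source = cell['source']
        if not isinstance(source, str):
            source = ''.join(source)   # source may be a list of lines
        summary.append((cell['cell_type'], source))
    return summary

if __name__ == '__main__':
    for cell_type, source in summarize(notebook_json):
        print(cell_type, repr(source))
```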

The generator code

We shall now list the code that translates the ascii input, with the special syntax explained, into a notebook. The algorithmic steps are

  1. Read the file.
  2. Find all include lines and include the corresponding text.
  3. Run Mako on the file.
  4. Make a cells list of all cells, i.e., detect the beginning of a new cell by the delimiter line -----. Each element of cells is a 3-list holding the cell type, a description, and all the lines of the cell.
  5. When making a new element in cells, see if the delimiter line has a language specification and therefore is a code cell, or if it is a plain Markdown code cell.
  6. Go through cells and join separate lines in each cell into a string.
  7. Import functions from IPython.nbformat.v4 for translating the information in the cells list into a cell list nb_cells suitable for the notebook.
  8. Write the nb_cells list to JSON format.

The read function, which performs steps 2-6 on the text of the file, looks as follows.

import os
import re
import sys

def read(text, argv=sys.argv[2:]):
    lines = text.splitlines()
    # First read all include statements
    for i in range(len(lines)):
        if lines[i].startswith('#include "'):
            filename = lines[i].split('"')[1]
            with open(filename, 'r') as f:
                include_text = f.read()
            lines[i] = include_text
    text = '\n'.join(lines)

    # Run Mako
    mako_kwargs = {}
    for arg in argv:
        key, value = arg.split('=', 1)  # values may contain '='
        mako_kwargs[key] = value

    encoding = 'utf-8'
    try:
        import mako
        has_mako = True
    except ImportError:
        print('Cannot import mako - mako is not run')
        has_mako = False

    if has_mako:
        from mako.template import Template
        from mako.lookup import TemplateLookup
        lookup = TemplateLookup(directories=[os.curdir])

        # Mako wants unicode text; decode only if we have raw bytes
        if isinstance(text, bytes):
            text = text.decode(encoding)
        temp = Template(text=text, lookup=lookup,
                        strict_undefined=True)
        text = temp.render(**mako_kwargs)

    # Parse the cells
    lines = text.splitlines()
    cells = []
    inside = None    # indicates which type of cell we are inside
    fullname = None  # full language name in code cells
    for line in lines:
        if line.startswith('-----'):
            # New cell, what type?
            m = re.search(r'-----([a-z0-9-]+)?', line)
            if m:
                shortname = m.group(1)
                if shortname:
                    # Check if code is to be typeset as static
                    # Markdown code (e.g., shortname=py-t)
                    astext = shortname[-2:] == '-t'
                    if astext:
                        # Markdown
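                        shortname = shortname[:-2]
                        # Look up the language for the code fence; without
                        # this, the name from a previous cell is reused
                        if shortname in shortname2language:
                            fullname = shortname2language[shortname]
                        inside = 'markdown'
                        cells.append(['markdown', 'code', ['\n']])
                        cells[-1][2].append('```%s\n' % fullname)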
                    else:
                        # Code cell
                        if shortname in shortname2language:
                            fullname = shortname2language[shortname]
                        inside = 'codecell'
                        cells.append(['codecell', fullname, []])
                else:
                    # Markdown cell
                    inside = 'markdown'
                    cells.append(['markdown', 'text', ['\n']])
            else:
                raise SyntaxError(
                    'Wrong syntax of cell delimiter:\n%s'
                    % repr(line))
        else:
            # Ordinary line in a cell
            if inside in ('markdown', 'codecell'):
                cells[-1][2].append(line)
            else:
                raise SyntaxError(
                    'line\n  %s\nhas not beginning cell delimiter'
                    % line)
    # Merge the lines in each cell to a string
    for i in range(len(cells)):
        if cells[i][0] == 'markdown' and cells[i][1] == 'code':
            # Add an ending ``` of code
            cells[i][2].append('```\n')
        cells[i][2] = '\n'.join(cells[i][2])
    return cells

The line fullname = shortname2language[shortname] is not easy to understand unless we have the definition of the dictionary

# Mapping of shortnames like py to full language
# name like python used by markdown/pandoc
shortname2language = dict(
    py='Python', ipy='Python', pyshell='Python', cy='Python',
    c='C', cpp='Cpp', f='Fortran', f95='Fortran95',
    rb='Ruby', pl='Perl', sh='Shell', js='JavaScript', html='HTML',
    tex='Tex', sys='Bash',
    )
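To see what the cells list looks like for a small input, the parsing steps can be condensed into a few lines. This is a simplified sketch and not the read function above: there is no include or Mako processing, the names split_cells and LANGUAGES are hypothetical, and unknown short names are simply passed through unchanged.

```python
import re

LANGUAGES = dict(py='Python', sys='Bash')  # excerpt of shortname2language

def split_cells(text):
    """Split text into 3-lists [cell type, description, source string]."""
    cells = []
    for line in text.splitlines():
        m = re.match(r'-----([a-z0-9]+)?(-t)?\s*$', line)
        if m:
            shortname, static = m.group(1), m.group(2)
            language = LANGUAGES.get(shortname, shortname)
            if shortname is None:
                cells.append(['markdown', 'text', []])
            elif static:
                # Static code: a Markdown cell with a fenced code block
                cells.append(['markdown', 'code', ['```%s' % language]])
            else:
                cells.append(['codecell', language, []])
        elif cells:
            cells[-1][2].append(line)
        else:
            raise SyntaxError('line %r precedes the first delimiter' % line)
    for cell in cells:
        if cell[:2] == ['markdown', 'code']:
            cell[2].append('```')          # close the fenced code block
        cell[2] = '\n'.join(cell[2])       # merge the lines into a string
    return cells

if __name__ == '__main__':
    demo = '-----\nSome *text*\n-----py\ny(0)\n-----sys-t\nTerminal> ls'
    for cell in split_cells(demo):
        print(cell)
```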

The translation from a cells list to the similar list needed by the IPython notebook writing functions is taken care of in the following function:

def write(cells):
    """Turn cells list into valid IPython notebook code."""
    # Use nbformat functionality for writing the notebook. Newer
    # installations provide it as the standalone nbformat package,
    # older ones as IPython.nbformat.
    try:
        from nbformat import writes
        from nbformat.v4 import (
            new_code_cell, new_markdown_cell, new_notebook)
    except ImportError:
        from IPython.nbformat import writes
        from IPython.nbformat.v4 import (
            new_code_cell, new_markdown_cell, new_notebook)
    nb_cells = []

    for cell_tp, language, block in cells:
        if cell_tp == 'markdown':
            nb_cells.append(new_markdown_cell(source=block))
        elif cell_tp == 'codecell':
            nb_cells.append(new_code_cell(source=block))

    nb = new_notebook(cells=nb_cells)
    filestr = writes(nb, version=4)
    return filestr
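For readers who want to see what kind of structure the nbformat functions build, the same layout can be assembled by hand with the standard json module. This is only a sketch under the assumption that the format version 4 fields shown in the compiled file are sufficient; cells_to_notebook_json is a hypothetical name, and unlike nbformat it performs no validation of the notebook.

```python
import json

def cells_to_notebook_json(cells):
    """Build notebook JSON (format version 4) from a cells list of
    [cell type, description, source string] elements."""
    nb_cells = []
    for cell_type, _description, source in cells:
        if cell_type == 'markdown':
            nb_cells.append({'cell_type': 'markdown', 'metadata': {},
                             'source': source})
        elif cell_type == 'codecell':
            nb_cells.append({'cell_type': 'code', 'execution_count': None,
                             'metadata': {}, 'outputs': [],
                             'source': source})
    nb = {'cells': nb_cells, 'metadata': {},
          'nbformat': 4, 'nbformat_minor': 0}
    return json.dumps(nb, indent=1, sort_keys=True)

if __name__ == '__main__':
    demo = [['markdown', 'text', 'Some *text*'],
            ['codecell', 'Python', 'y(1), 2*exp(1)']]
    print(cells_to_notebook_json(demo))
```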

A driver or main program is needed:

def driver():
    """Compile a document and its variables."""
    try:
        filename = sys.argv[1]
        with open(filename, 'r') as f:
            text = f.read()
    except (IndexError, IOError) as e:
        print('Usage: %s filename' % sys.argv[0])
        print(e)
        sys.exit(1)
    cells = read(text, argv=sys.argv[2:])
    filestr = write(cells)
    # Replace the input extension (e.g., .aipynb) by .ipynb
    filename = os.path.splitext(filename)[0] + '.ipynb'
    with open(filename, 'w') as f:
        f.write(filestr)

The true file supports notebook format versions 3 and 4 and also contains a lot of logging statements to aid debugging.
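The name of the output file follows from the input name by swapping the extension, so that .test1.aipynb becomes .test1.ipynb as in the example above. A small sketch of this step, with notebook_filename as a hypothetical helper name:

```python
import os

def notebook_filename(input_filename):
    """Replace the extension of the input file (e.g., .aipynb) by .ipynb."""
    root, _extension = os.path.splitext(input_filename)
    return root + '.ipynb'

if __name__ == '__main__':
    for name in ['.test1.aipynb', 'myfile.aipynb']:
        print(name, '->', notebook_filename(name))
```

Using os.path.splitext rather than slicing off a fixed number of characters keeps the function independent of the length of the input extension.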

Summary. Hopefully, this example has shown
  1. how to create your own input format for writing Jupyter notebooks,
  2. how you can extend such a format with a preprocessor like Mako,
  3. how the IPython.nbformat functions can be used for writing notebooks.