Programming with sound

Sound on a computer is nothing but a sequence of numbers. As an example, consider the famous A tone at 440 Hz. Physically, this is an oscillation of a tuning fork, loudspeaker, string or another mechanical medium that makes the surrounding air also oscillate and transport the sound as a compression wave. This wave may hit our ears and through complicated physiological processes be transformed to an electrical signal that the brain can recognize as sound. Mathematically, the oscillations are described by a sine function of time: $\begin{equation} s(t) = A\sin\left( 2\pi f t\right), \tag{41} \end{equation}$ where $A$ is the amplitude or strength of the sound and $f$ is the frequency (440 Hz for the A in our example). In a computer, $s(t)$ is represented at discrete points of time. CD quality means 44100 samples per second. Other sample rates are also possible, so we introduce $r$ as the sample rate. An $f$ Hz tone lasting for $m$ seconds with sample rate $r$ can then be computed as the sequence $\begin{equation} s_n = A\sin\left( 2\pi f {n\over r}\right), \quad n=0,1,\ldots, m\cdot r\tp \tag{42} \end{equation}$ With Numerical Python this computation is straightforward and very efficient. Introducing some more explanatory variable names than $r$ , $A$ , and $m$ , we can write a function for generating a note:

import numpy as np

def note(frequency, length, amplitude=1, sample_rate=44100):
    time_points = np.linspace(0, length, length*sample_rate)
    data = np.sin(2*np.pi*frequency*time_points)
    data = amplitude*data
    return data

Writing sound to file

The note function above generates an array of float data representing a note. The sound card in the computer cannot play these data, because the card assumes that the information about the oscillations appears as a sequence of two-byte integers. With an array's astype method we can easily convert our data to two-byte integers instead of floats:

data = data.astype(numpy.int16)

That is, the name of the two-byte integer data type in numpy is int16 (two bytes are 16 bits). The maximum value of a two-byte integer is

$2^{15}-1$ , so this is also the maximum amplitude. Assuming that amplitude in the note function is a relative measure of intensity, such that the value lies between 0 and 1, we must adjust this amplitude to the scale of two-byte integers:

max_amplitude = 2**15 - 1
data = max_amplitude*data

The data array of int16 numbers can be written to a file and played as an ordinary file in CD quality. Such a file is known as a wave file or simply a WAV file since the extension is .wav. Python has a module wave for creating such files. Given an array of sound, data, we have in SciTools a module sound with a function write for writing the data to a WAV file (using functionality from the wave module):

import scitools.sound
scitools.sound.write(data, 'Atone.wav')

You can now use your favorite music player to play the Atone.wav file, or you can play it from within a Python program using

scitools.sound.play('Atone.wav')

The write function can take more arguments and write, e.g., a stereo file with two channels, but we do not dive into these details here.

Reading sound from file

Given a sound signal in a WAV file, we can easily read this signal into an array and mathematically manipulate the data in the array to change the flavor of the sound, e.g., add echo, treble, or bass. The recipe for reading a WAV file with name filename is

data = scitools.sound.read(filename)

The data array has elements of type int16. Often we want to compute with this array, and then we need elements of float type, obtained by the conversion

data = data.astype(float)

The write function automatically transforms the element type back to int16 if we have not done this explicitly.

One operation that we can easily do is adding an echo. Mathematically this means that we add a damped delayed sound, where the original sound has weight $\beta$ and the delayed part has weight $1-\beta$ , such that the overall amplitude is not altered. Let $d$ be the delay in seconds. With a sampling rate $r$ the number of indices in the delay becomes $dr$ , which we denote by $b$ . Given an original sound sequence $s_n$ , the sound with echo is the sequence $\begin{equation} e_n = \beta s_n + (1-\beta) s_{n-b}\tp \tag{43} \end{equation}$ We cannot start $n$ at 0 since $e_0=s_{0-b}=s_{-b}$ which is a value outside the sound data. Therefore we define $e_n=s_n$ for $n=0,1,\ldots,b$ , and add the echo thereafter. A simple loop can do this (again we use descriptive variable names instead of the mathematical symbols introduced):

def add_echo(data, beta=0.8, delay=0.002, sample_rate=44100):
    newdata = data.copy()
    shift = int(delay*sample_rate)  # b (math symbol)
    for i in range(shift, len(data)):
        newdata[i] = beta*data[i] + (1-beta)*data[i-shift]
    return newdata

The problem with this function is that it runs slowly, especially when we have sound clips lasting several seconds (recall that for CD quality we need 44100 numbers per second). It is therefore necessary to vectorize the implementation of the difference equation for adding echo. The update is then based on adding slices:

newdata[shift:] = beta*data[shift:] + \ 
                  (1-beta)*data[:len(data)-shift]

Playing many notes

How do we generate a melody mathematically in a computer program? With the note function we can generate a note with a certain amplitude, frequency, and duration. The note is represented as an array. Putting sound arrays for different notes after each other will make up a melody. If we have several sound arrays data1, data2, data3, $\ldots$ , we can make a new array consisting of the elements in the first array followed by the elements of the next array followed by the elements in the next array and so forth:

data = numpy.concatenate((data1, data2, data3, ...))

The frequency of a note that is $h$ half tones up from a base frequency $f$ is given by $f2^{h/12}$ . With the tone A at 440 Hz, we can define notes and the corresponding frequencies as

base_freq = 440.0
notes = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E',
         'F', 'F#', 'G', 'G#']
notes2freq = {notes[i]: base_freq*2**(i/12.0)
              for i in range(len(notes))}

With the notes to frequency mapping a melody can be made as a series of notes with specified duration:

l = .2  # basic duration unit
tones = [('E', 3*l), ('D', l), ('C#', 2*l), ('B', 2*l), ('A', 2*l),
         ('B', 2*l), ('C#', 2*l), ('D', 2*l), ('E', 3*l),
         ('F#', l), ('E', 2*l), ('D', 2*l), ('C#', 4*l)]

samples = []
for tone, duration in tones :
    s = note(notes2freq[tone], duration)
    samples.append(s)

data = np.concatenate(samples)
data *= 2**15-1
scitools.sound.write(data, "melody.wav")

Playing the resulting file melody.wav reveals that this is the opening of the most-played tune during international cross country skiing competitions.

All the notes had the same amplitude in this example, but more dynamics can easily be added by letting the elements in tones be triplets with tone, duration, and amplitude. The basic code above is found in the file melody.py.

Music of a sequence

Problem

The purpose of this example is to listen to the sound generated by two mathematical sequences. The first one is given by an explicit formula, constructed to oscillate around 0 with decreasing amplitude: $\begin{equation} x_n = e^{-4n/N}\sin(8\pi n/N)\tp \tag{44} \end{equation}$ The other sequence is generated by the difference equation (13) for logistic growth, repeated here for convenience: $\begin{equation} x_n = x_{n-1} + q x_{n-1}\left(1 - x_{n-1}\right),\quad x = x_0\tp \tag{45} \end{equation}$ We let $x_0=0.01$ and $q=2$ . This leads to fast initial growth toward the limit 1, and then oscillations around this limit (this problem is studied in Exercise 19: Demonstrate oscillatory solutions of the logistic equation).

The absolute value of the sequence elements $x_n$ are of size between 0 and 1, approximately. We want to transform these sequence elements to tones, using the techniques of the section Programming with sound. First we convert $x_n$ to a frequency the human ear can hear. The transformation $\begin{equation} y_n = 440 + 200 x_n \tag{46} \end{equation}$ will make a standard A reference tone out of $x_n=0$ , and for the maximum value of $x_n$ around 1 we get a tone of 640 Hz. Elements of the sequence generated by (44) lie between -1 and 1, so the corresponding frequencies lie between 240 Hz and 640 Hz. The task now is to make a program that can generate and play the sounds.

Solution

Tones can be generated by the note function from the scitools.sound module. We collect all tones corresponding to all the $y_n$ frequencies in a list tones. Letting N denote the number of sequence elements, the relevant code segment reads

from scitools.sound import *
freqs = 440 + x*200
tones = []
duration = 30.0/N     # 30 sec sound in total
for n in range(N+1):
    tones.append(max_amplitude*note(freqs[n], duration, 1))
data = concatenate(tones)
write(data, filename)
data = read(filename)
play(filename)

It is illustrating to plot the sequences too,

plot(range(N+1), freqs, 'ro')

To generate the sequences (44) and (45), we make two functions, oscillations and logistic, respectively. These functions take the number of sequence elements (N) as input and return the sequence stored in an array.

In another function make_sound we compute the sequence, transform the elements to frequencies, generate tones, write the tones to file, and play the sound file.

As always, we collect the functions in a module and include a test block where we can read the choice of sequence and the sequence length from the command line. The complete module file looks as follows:

from scitools.sound import *
from scitools.std import *

def oscillations(N):
    x = zeros(N+1)
    for n in range(N+1):
        x[n] = exp(-4*n/float(N))*sin(8*pi*n/float(N))
    return x

def logistic(N):
    x = zeros(N+1)
    x[0] = 0.01
    q = 2
    for n in range(1, N+1):
        x[n] = x[n-1] + q*x[n-1]*(1 - x[n-1])
    return x

def make_sound(N, seqtype):
    filename = 'tmp.wav'
    x = eval(seqtype)(N)
    # Convert x values to frequences around 440
    freqs = 440 + x*200
    plot(range(N+1), freqs, 'ro')
    # Generate tones
    tones = []
    duration = 30.0/N     # 30 sec sound in total
    for n in range(N+1):
        tones.append(max_amplitude*note(freqs[n], duration, 1))
    data = concatenate(tones)
    write(data, filename)
    data = read(filename)
    play(filename)

This code should be quite easy to read at the present stage in the document. However, there is one statement that deserves a comment:

    x = eval(seqtype)(N)

The seqtype argument reflects the type of sequence and is a string that the user provides on the command line. The values of the string equal the function names oscillations and logistic. With eval(seqtype) we turn the string into a function name. For example, if seqtype is 'logistic', performing an eval(seqtype)(N) is the same as if we had written logistic(N). This technique allows the user of the program to choose a function call inside the code. Without eval we would need to explicitly test on values:

if seqtype == 'logistic':
    x = logistic(N)
elif seqtype == 'oscillations':
    x = oscillations(N)

This is not much extra code to write in the present example, but if we have a large number of functions generating sequences, we can save a lot of boring if-else code by using the eval construction.

The next step, as a reader who have understood the problem and the implementation above, is to run the program for two cases: the oscillations sequence with $N=40$ and the logistic sequence with $N=100$ . By altering the $q$ parameter to lower values, you get other sounds, typically quite boring sounds for non-oscillating logistic growth ( $q < 1$ ). You can also experiment with other transformations of the form (46), e.g., increasing the frequency variation from 200 to 400.

« Previous

Next »