This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 5th edition, Springer, 2016.
Sound on a computer is nothing but a sequence of numbers. As an example, consider the famous A tone at 440 Hz. Physically, this is an oscillation of a tuning fork, loudspeaker, string or another mechanical medium that makes the surrounding air also oscillate and transport the sound as a compression wave. This wave may hit our ears and through complicated physiological processes be transformed to an electrical signal that the brain can recognize as sound. Mathematically, the oscillations are described by a sine function of time: $$ \begin{equation} s(t) = A\sin\left( 2\pi f t\right), \tag{41} \end{equation} $$ where \( A \) is the amplitude or strength of the sound and \( f \) is the frequency (440 Hz for the A in our example). In a computer, \( s(t) \) is represented at discrete points of time. CD quality means 44100 samples per second. Other sample rates are also possible, so we introduce \( r \) as the sample rate. An \( f \) Hz tone lasting for \( m \) seconds with sample rate \( r \) can then be computed as the sequence $$ \begin{equation} s_n = A\sin\left( 2\pi f {n\over r}\right), \quad n=0,1,\ldots, m\cdot r\tp \tag{42} \end{equation} $$ With Numerical Python this computation is straightforward and very efficient. Introducing some more explanatory variable names than \( r \), \( A \), and \( m \), we can write a function for generating a note:
import numpy as np
def note(frequency, length, amplitude=1, sample_rate=44100):
time_points = np.linspace(0, length, length*sample_rate)
data = np.sin(2*np.pi*frequency*time_points)
data = amplitude*data
return data
The note
function above generates an array of float
data
representing a note. The sound card in the computer cannot
play these data, because the card assumes that the information about
the oscillations appears as a sequence of
two-byte integers. With an array's astype
method we can easily convert
our data to two-byte integers instead of floats
:
data = data.astype(numpy.int16)
That is, the name of the two-byte integer data type in numpy
is
int16
(two bytes are 16 bits).
The maximum
value of a two-byte integer is \( 2^{15}-1 \), so this is also the
maximum amplitude.
Assuming that amplitude
in the note
function is a relative
measure of intensity, such that the value lies between 0 and 1,
we must adjust this amplitude to the scale of two-byte integers:
max_amplitude = 2**15 - 1
data = max_amplitude*data
The data
array of int16
numbers can be written to a file and played
as an ordinary file in CD quality. Such a file is known as a
wave file or simply a WAV file since the extension is .wav
.
Python has a module wave
for creating such files.
Given an array of sound, data
, we have in SciTools a
module sound
with a function write
for writing the data to a WAV file (using functionality from the
wave
module):
import scitools.sound
scitools.sound.write(data, 'Atone.wav')
You can now use your favorite music player to play the Atone.wav
file, or you can play it from within a Python program using
scitools.sound.play('Atone.wav')
The write
function can take more arguments and write, e.g.,
a stereo file with two channels, but we do not dive into these details here.
Given a sound signal in a WAV file, we can easily read this signal
into an array and mathematically manipulate the data in the array to change the
flavor of the sound, e.g., add echo, treble, or bass.
The recipe for reading a WAV file with name filename
is
data = scitools.sound.read(filename)
The data
array has elements of type int16
. Often we want to
compute with this array, and then we need
elements of float
type, obtained by the conversion
data = data.astype(float)
The write
function automatically transforms the element type back
to int16
if we have not done this explicitly.
One operation that we can easily do is adding an echo. Mathematically this means that we add a damped delayed sound, where the original sound has weight \( \beta \) and the delayed part has weight \( 1-\beta \), such that the overall amplitude is not altered. Let \( d \) be the delay in seconds. With a sampling rate \( r \) the number of indices in the delay becomes \( dr \), which we denote by \( b \). Given an original sound sequence \( s_n \), the sound with echo is the sequence $$ \begin{equation} e_n = \beta s_n + (1-\beta) s_{n-b}\tp \tag{43} \end{equation} $$ We cannot start \( n \) at 0 since \( e_0=s_{0-b}=s_{-b} \) which is a value outside the sound data. Therefore we define \( e_n=s_n \) for \( n=0,1,\ldots,b \), and add the echo thereafter. A simple loop can do this (again we use descriptive variable names instead of the mathematical symbols introduced):
def add_echo(data, beta=0.8, delay=0.002, sample_rate=44100):
newdata = data.copy()
shift = int(delay*sample_rate) # b (math symbol)
for i in range(shift, len(data)):
newdata[i] = beta*data[i] + (1-beta)*data[i-shift]
return newdata
The problem with this function is that it runs slowly, especially when
we have sound clips lasting several seconds (recall that
for CD quality we need 44100 numbers per second). It is therefore necessary
to vectorize the implementation of the difference equation for adding echo.
The update is then based on adding slices:
newdata[shift:] = beta*data[shift:] + \
(1-beta)*data[:len(data)-shift]
How do we generate a melody mathematically in a computer program?
With the note
function we can generate a note with a
certain amplitude, frequency, and duration. The note is represented
as an array. Putting sound arrays for different notes
after each other will make up a melody.
If we have
several sound arrays data1
, data2
, data3
, \( \ldots \),
we can make
a new array consisting of the elements in the first array followed
by the elements of the next array followed by the elements in the next
array and so forth:
data = numpy.concatenate((data1, data2, data3, ...))
The frequency of a note that is \( h \) half tones up from a base frequency \( f \) is given by \( f2^{h/12} \). With the tone A at 440 Hz, we can define notes and the corresponding frequencies as
base_freq = 440.0
notes = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E',
'F', 'F#', 'G', 'G#']
notes2freq = {notes[i]: base_freq*2**(i/12.0)
for i in range(len(notes))}
With the notes to frequency mapping a melody can
be made as a series of notes with specified duration:
l = .2 # basic duration unit
tones = [('E', 3*l), ('D', l), ('C#', 2*l), ('B', 2*l), ('A', 2*l),
('B', 2*l), ('C#', 2*l), ('D', 2*l), ('E', 3*l),
('F#', l), ('E', 2*l), ('D', 2*l), ('C#', 4*l)]
samples = []
for tone, duration in tones :
s = note(notes2freq[tone], duration)
samples.append(s)
data = np.concatenate(samples)
data *= 2**15-1
scitools.sound.write(data, "melody.wav")
Playing the resulting file melody.wav
reveals that this is the
opening of the most-played tune during
international cross country skiing competitions.
All the notes had the same amplitude in this example, but more
dynamics can easily be added by letting the elements in tones
be
triplets with tone, duration, and amplitude. The basic code above is
found in the file melody.py.
The purpose of this example is to listen to the sound generated by two mathematical sequences. The first one is given by an explicit formula, constructed to oscillate around 0 with decreasing amplitude: $$ \begin{equation} x_n = e^{-4n/N}\sin(8\pi n/N)\tp \tag{44} \end{equation} $$ The other sequence is generated by the difference equation (13) for logistic growth, repeated here for convenience: $$ \begin{equation} x_n = x_{n-1} + q x_{n-1}\left(1 - x_{n-1}\right),\quad x = x_0\tp \tag{45} \end{equation} $$ We let \( x_0=0.01 \) and \( q=2 \). This leads to fast initial growth toward the limit 1, and then oscillations around this limit (this problem is studied in Exercise 19: Demonstrate oscillatory solutions of the logistic equation).
The absolute value of the sequence elements \( x_n \) are of size between 0 and 1, approximately. We want to transform these sequence elements to tones, using the techniques of the section Programming with sound. First we convert \( x_n \) to a frequency the human ear can hear. The transformation $$ \begin{equation} y_n = 440 + 200 x_n \tag{46} \end{equation} $$ will make a standard A reference tone out of \( x_n=0 \), and for the maximum value of \( x_n \) around 1 we get a tone of 640 Hz. Elements of the sequence generated by (44) lie between -1 and 1, so the corresponding frequencies lie between 240 Hz and 640 Hz. The task now is to make a program that can generate and play the sounds.
Tones can be generated by the note
function from
the scitools.sound
module. We collect all tones corresponding to
all the \( y_n \) frequencies in a list tones
. Letting N
denote the number of sequence elements,
the relevant code segment
reads
from scitools.sound import *
freqs = 440 + x*200
tones = []
duration = 30.0/N # 30 sec sound in total
for n in range(N+1):
tones.append(max_amplitude*note(freqs[n], duration, 1))
data = concatenate(tones)
write(data, filename)
data = read(filename)
play(filename)
It is illustrating to plot the sequences too,
plot(range(N+1), freqs, 'ro')
To generate the sequences (44)
and (45), we make two functions,
oscillations
and logistic
, respectively.
These functions take the number of sequence elements (N
)
as input and return the sequence stored in an array.
In another function make_sound
we compute the sequence,
transform the elements to frequencies, generate tones, write
the tones to file, and play the sound file.
As always, we collect the functions in a module and include a test block where we can read the choice of sequence and the sequence length from the command line. The complete module file looks as follows:
from scitools.sound import *
from scitools.std import *
def oscillations(N):
x = zeros(N+1)
for n in range(N+1):
x[n] = exp(-4*n/float(N))*sin(8*pi*n/float(N))
return x
def logistic(N):
x = zeros(N+1)
x[0] = 0.01
q = 2
for n in range(1, N+1):
x[n] = x[n-1] + q*x[n-1]*(1 - x[n-1])
return x
def make_sound(N, seqtype):
filename = 'tmp.wav'
x = eval(seqtype)(N)
# Convert x values to frequences around 440
freqs = 440 + x*200
plot(range(N+1), freqs, 'ro')
# Generate tones
tones = []
duration = 30.0/N # 30 sec sound in total
for n in range(N+1):
tones.append(max_amplitude*note(freqs[n], duration, 1))
data = concatenate(tones)
write(data, filename)
data = read(filename)
play(filename)
This code should be quite easy to read at the present stage in the document.
However, there is one statement that deserves a comment:
x = eval(seqtype)(N)
The seqtype
argument reflects the type of sequence
and is a string that the user provides on
the command line. The values of the string equal the function
names oscillations
and logistic
. With
eval(seqtype)
we turn the string into a function name.
For example, if seqtype
is 'logistic'
,
performing an eval(seqtype)(N)
is the same as if we had written
logistic(N)
. This technique allows the user of the program
to choose a function call inside the code. Without eval
we
would need to explicitly test on values:
if seqtype == 'logistic':
x = logistic(N)
elif seqtype == 'oscillations':
x = oscillations(N)
This is not much extra code to write in the present example, but if we
have a large number of functions generating sequences, we can save
a lot of boring if-else code by using the eval
construction.
The next step, as a reader who have understood the problem and the
implementation above, is to run the program for two cases:
the oscillations
sequence with \( N=40 \) and the
logistic
sequence with \( N=100 \).
By altering the \( q \) parameter to lower values, you get other sounds,
typically quite boring sounds for non-oscillating logistic growth (\( q < 1 \)).
You can also experiment with other transformations of the form
(46), e.g., increasing the frequency variation
from 200 to 400.