We shall start with introducing two fundamental methods for determining the coefficients \( c_i \) in (1). These methods will be introduce for approximation of vectors. Using vectors in vector spaces to bring across the ideas is believed to appear more intuitive to the reader than starting directly with functions in function spaces. The extension from vectors to functions will be trivial as soon as the fundamental ideas are understood.
The first method of approximation is called the least squares method and consists in finding \( c_i \) such that the difference \( f-u \), measured in a certain norm, is minimized. That is, we aim at finding the best approximation \( u \) to \( f \), with the given norm as measure of "distance". The second method is not as intuitive: we find \( u \) such that the error \( f-u \) is orthogonal to the space where \( u \) lies. This is known as projection, or in the context of differential equations, the idea is also well known as Galerkin's method. When approximating vectors and functions, the two methods are equivalent, but this is no longer the case when applying the principles to differential equations.
Let \( \f = (3,5) \) be a vector in the \( xy \) plane and suppose we want to approximate this vector by a vector aligned in the direction of another vector that is restricted to be aligned with some vector \( (a,b) \). Figure 1 depicts the situation. This is the simplest approximation problem for vectors. Nevertheless, for many readers it will be wise to refresh some basic linear algebra by consulting a textbook. Problem 1: Linear algebra refresher suggests specific tasks to regain familiarity with fundamental operations on inner product vector spaces. Familiarity with such operations are assumed in the forthcoming text.
We introduce the vector space \( V \) spanned by the vector \( \psib_0=(a,b) \): $$ \begin{equation} V = \mbox{span}\,\{ \psib_0\}\tp \tag{2} \end{equation} $$ We say that \( \psib_0 \) is a basis vector in the space \( V \). Our aim is to find the vector \( \u = c_0\psib_0\in V \) which best approximates the given vector \( \f = (3,5) \). A reasonable criterion for a best approximation could be to minimize the length of the difference between the approximate \( \u \) and the given \( \f \). The difference, or error \( \e = \f -\u \), has its length given by the norm $$ \begin{equation*} ||\e|| = (\e,\e)^{\half},\end{equation*} $$ where \( (\e,\e) \) is the inner product of \( \e \) and itself. The inner product, also called scalar product or dot product, of two vectors \( \u=(u_0,u_1) \) and \( \v =(v_0,v_1) \) is defined as $$ \begin{equation} (\u, \v) = u_0v_0 + u_1v_1\tp \tag{3} \end{equation} $$
Remark. We should point out that we use the notation \( (\cdot,\cdot) \) for two different things: \( (a,b) \) for scalar quantities \( a \) and \( b \) means the vector starting in the origin and ending in the point \( (a,b) \), while \( (\u,\v) \) with vectors \( \u \) and \( \v \) means the inner product of these vectors. Since vectors are here written in boldface font there should be no confusion. We may add that the norm associated with this inner product is the usual Euclidean length of a vector.
We now want to find \( c_0 \) such that it minimizes \( ||\e|| \). The algebra is simplified if we minimize the square of the norm, \( ||\e||^2 = (\e, \e) \), instead of the norm itself. Define the function $$ \begin{equation} E(c_0) = (\e,\e) = (\f - c_0\psib_0, \f - c_0\psib_0) \tp \tag{4} \end{equation} $$ We can rewrite the expressions of the right-hand side in a more convenient form for further work: $$ \begin{equation} E(c_0) = (\f,\f) - 2c_0(\f,\psib_0) + c_0^2(\psib_0,\psib_0)\tp \tag{5} \end{equation} $$ This rewrite results from using the following fundamental rules for inner product spaces: $$ \begin{equation} (\alpha\u,\v)=\alpha(\u,\v),\quad \alpha\in\Real, \tag{6} \end{equation} $$ $$ \begin{equation} (\u +\v,\w) = (\u,\w) + (\v, \w), \tag{7} \end{equation} $$ $$ \begin{equation} (\u, \v) = (\v, \u)\tp \end{equation} \tag{8} $$
Minimizing \( E(c_0) \) implies finding \( c_0 \) such that $$ \begin{equation*} \frac{\partial E}{\partial c_0} = 0\tp \end{equation*} $$ It turns out that \( E \) has one unique minimum and no maximum point. Now, when differentiating (5) with respect to \( c_0 \), note that none of the inner product expressions depend on \( c_0 \), so we simply get $$ \begin{equation} \frac{\partial E}{\partial c_0} = -2(\f,\psib_0) + 2c_0 (\psib_0,\psib_0) \tp \tag{9} \end{equation} $$ Setting the above expression equal to zero and solving for \( c_0 \) gives $$ \begin{equation} c_0 = \frac{(\f,\psib_0)}{(\psib_0,\psib_0)}, \tag{10} \end{equation} $$ which in the present case, with \( \psib_0=(a,b) \), results in $$ \begin{equation} c_0 = \frac{3a + 5b}{a^2 + b^2}\tp \tag{11} \end{equation} $$
For later, it is worth mentioning that setting the key equation (9) to zero and ordering the terms lead to $$ (\f-c_0\psib_0,\psib_0) = 0, $$ or $$ \begin{equation} (\e, \psib_0) = 0 \tp \tag{12} \end{equation} $$ This implication of minimizing \( E \) is an important result that we shall make much use of.
We shall now show that minimizing \( ||\e||^2 \) implies that \( \e \) is orthogonal to any vector \( \v \) in the space \( V \). This result is visually quite clear from Figure 1 (think of other vectors along the line \( (a,b) \): all of them will lead to a larger distance between the approximation and \( \f \)). The see mathematically that \( \e \) is orthogonal to any vector \( \v \) in the space \( V \), we express any \( \v\in V \) as \( \v=s\psib_0 \) for any scalar parameter \( s \) (recall that two vectors are orthogonal when their inner product vanishes). Then we calculate the inner product $$ \begin{align*} (\e, s\psib_0) &= (\f - c_0\psib_0, s\psib_0)\\ &= (\f,s\psib_0) - (c_0\psib_0, s\psib_0)\\ &= s(\f,\psib_0) - sc_0(\psib_0, \psib_0)\\ &= s(\f,\psib_0) - s\frac{(\f,\psib_0)}{(\psib_0,\psib_0)}(\psib_0,\psib_0)\\ &= s\left( (\f,\psib_0) - (\f,\psib_0)\right)\\ &=0\tp \end{align*} $$ Therefore, instead of minimizing the square of the norm, we could demand that \( \e \) is orthogonal to any vector in \( V \), which in our present simple case amounts to a single vector only. This method is known as projection. (The approach can also be referred to as a Galerkin method as explained at the end of the section Approximation of general vectors.)
Mathematically, the projection method is stated by the equation $$ \begin{equation} (\e, \v) = 0,\quad\forall\v\in V\tp \tag{13} \end{equation} $$ An arbitrary \( \v\in V \) can be expressed as \( s\psib_0 \), \( s\in\Real \), and therefore (13) implies $$ \begin{equation*} (\e,s\psib_0) = s(\e, \psib_0) = 0,\end{equation*} $$ which means that the error must be orthogonal to the basis vector in the space \( V \): $$ \begin{equation*} (\e, \psib_0)=0\quad\hbox{or}\quad (\f - c_0\psib_0, \psib_0)=0, \end{equation*} $$ which is what we found in (12) from the least squares computations.
Let us generalize the vector approximation from the previous section to vectors in spaces with arbitrary dimension. Given some vector \( \f \), we want to find the best approximation to this vector in the space $$ \begin{equation*} V = \hbox{span}\,\{\psib_0,\ldots,\psib_N\} \tp \end{equation*} $$ We assume that the space has dimension \( N+1 \) and that basis vectors \( \psib_0,\ldots,\psib_N \) are linearly independent so that none of them are redundant. Any vector \( \u\in V \) can then be written as a linear combination of the basis vectors, i.e., $$ \begin{equation*} \u = \sum_{j=0}^N c_j\psib_j,\end{equation*} $$ where \( c_j\in\Real \) are scalar coefficients to be determined.
Now we want to find \( c_0,\ldots,c_N \), such that \( \u \) is the best approximation to \( \f \) in the sense that the distance (error) \( \e = \f - \u \) is minimized. Again, we define the squared distance as a function of the free parameters \( c_0,\ldots,c_N \), $$ \begin{align} E(c_0,\ldots,c_N) &= (\e,\e) = (\f -\sum_jc_j\psib_j,\f -\sum_jc_j\psib_j) \nonumber\\ &= (\f,\f) - 2\sum_{j=0}^N c_j(\f,\psib_j) + \sum_{p=0}^N\sum_{q=0}^N c_pc_q(\psib_p,\psib_q)\tp \tag{14} \end{align} $$ Minimizing this \( E \) with respect to the independent variables \( c_0,\ldots,c_N \) is obtained by requiring $$ \begin{equation*} \frac{\partial E}{\partial c_i} = 0,\quad i=0,\ldots,N \tp \end{equation*} $$ The first term in (14) is independent of \( c_i \), so its derivative vanishes. The second term in (14) is differentiated as follows: $$ \begin{equation} \frac{\partial}{\partial c_i} \left(2\sum_{j=0}^N c_j(\f,\psib_j)\right) = 2(\f,\psib_i), \tag{15} \end{equation} $$ since the expression to be differentiated is a sum and only one term, \( c_i(\f,\psib_i) \), contains \( c_i \) (this term is linear in \( c_i \)). To understand this differentiation in detail, write out the sum specifically for, e.g, \( N=3 \) and \( i=1 \).
The last term in (14) is more tedious to differentiate. It can be wise to write out the double sum for \( N=1 \) and perform differentiation with respect to \( c_0 \) and \( c_1 \) to see the structure of the expression. Thereafter, one can generalize to an arbitrary \( N \) and observe that $$ \begin{equation} \frac{\partial}{\partial c_i} c_pc_q = \left\lbrace\begin{array}{ll} 0, & \hbox{ if } p\neq i\hbox{ and } q\neq i,\\ c_q, & \hbox{ if } p=i\hbox{ and } q\neq i,\\ c_p, & \hbox{ if } p\neq i\hbox{ and } q=i,\\ 2c_i, & \hbox{ if } p=q= i\tp\\ \end{array}\right. \tag{16} \end{equation} $$ Then $$ \begin{equation*} \frac{\partial}{\partial c_i} \sum_{p=0}^N\sum_{q=0}^N c_pc_q(\psib_p,\psib_q) = \sum_{p=0, p\neq i}^N c_p(\psib_p,\psib_i) + \sum_{q=0, q\neq i}^N c_q(\psib_i,\psib_q) +2c_i(\psib_i,\psib_i)\tp \end{equation*} $$ Since each of the two sums is missing the term \( c_i(\psib_i,\psib_i) \), we may split the very last term in two, to get exactly that "missing" term for each sum. This idea allows us to write $$ \begin{equation} \frac{\partial}{\partial c_i} \sum_{p=0}^N\sum_{q=0}^N c_pc_q(\psib_p,\psib_q) = 2\sum_{j=0}^N c_i(\psib_j,\psib_i)\tp \tag{17} \end{equation} $$ It then follows that setting $$ \begin{equation*} \frac{\partial E}{\partial c_i} = 0,\quad i=0,\ldots,N,\end{equation*} $$ implies $$ - 2(\f,\psib_i) + 2\sum_{j=0}^N c_i(\psib_j,\psib_i) = 0,\quad i=0,\ldots,N\tp$$ Moving the first term to the right-hand side shows that the equation is actually a linear system for the unknown parameters \( c_0,\ldots,c_N \): $$ \begin{equation} \sum_{j=0}^N A_{i,j} c_j = b_i, \quad i=0,\ldots,N, \tag{18} \end{equation} $$ where $$ \begin{align} A_{i,j} &= (\psib_i,\psib_j), \tag{19}\\ b_i &= (\psib_i, \f)\tp \tag{20} \end{align} $$ We have changed the order of the two vectors in the inner product according to (8): $$ A_{i,j} = (\psib_j,\psib_i) = (\psib_i,\psib_j),$$ simply because the sequence \( i\,-\,j \) looks more aesthetic.
In analogy with the "one-dimensional" example in the section Approximation of planar vectors, it holds also here in the general case that minimizing the distance (error) \( \e \) is equivalent to demanding that \( \e \) is orthogonal to all \( \v\in V \): $$ \begin{equation} (\e,\v)=0,\quad \forall\v\in V\tp \tag{21} \end{equation} $$ Since any \( \v\in V \) can be written as \( \v =\sum_{i=0}^N s_i\psib_i \), the statement (21) is equivalent to saying that $$ \begin{equation*} (\e, \sum_{i=0}^N s_i\psib_i) = 0,\end{equation*} $$ for any choice of coefficients \( s_0,\ldots,s_N \). The latter equation can be rewritten as $$ \begin{equation*} \sum_{i=0}^N s_i (\e,\psib_i) =0\tp \end{equation*} $$ If this is to hold for arbitrary values of \( s_0,\ldots,s_N \) we must require that each term in the sum vanishes, which means that $$ \begin{equation} (\e,\psib_i)=0,\quad i=0,\ldots,N\tp \tag{22} \end{equation} $$ These \( N+1 \) equations result in the same linear system as (18): $$ (\f - \sum_{j=0}^N c_j\psib_j, \psib_i) = (\f, \psib_i) - \sum_{j=0}^N (\psib_i,\psib_j)c_j = 0,$$ and hence $$ \sum_{j=0}^N (\psib_i,\psib_j)c_j = (\f, \psib_i),\quad i=0,\ldots, N \tp $$ So, instead of differentiating the \( E(c_0,\ldots,c_N) \) function, we could simply use (21) as the principle for determining \( c_0,\ldots,c_N \), resulting in the \( N+1 \) equations (22).
The names least squares method or least squares approximation are natural since the calculations consists of minimizing \( ||\e||^2 \), and \( ||\e||^2 \) is a sum of squares of differences between the components in \( \f \) and \( \u \). We find \( \u \) such that this sum of squares is minimized.
The principle (21), or the equivalent form (22), is known as projection. Almost the same mathematical idea was used by the Russian mathematician Boris Galerkin to solve differential equations, resulting in what is widely known as Galerkin's method.