Matrix Inversion - Linear Algebra for Deep Learning (Part 3)

Inverting a matrix is one of the most useful numerical tools in applied science. It is an essential component in computational options pricing and deep learning. In this article we discuss the importance of matrix inversion and define it mathematically.

In the previous article on Matrix Algebra we considered basic matrix operations such as how to add and multiply matrices together. We noted that these operations were a necessary prerequisite towards the concept of matrix inversion.

In this article we are going to introduce maxtrix inversion and describe why it is important. We will motivate matrix inversion via the concept of solving simultaneous linear equations, which may be familiar from your high school mathematics classes.

The need to solve simultaneous linear equations arises in many fields of applied science. Within physics and engineering numerically simulating fluid flows on a computer requires the solution of simultaneous linear equations. In quantitative finance they are utilised to solve the Black-Scholes PDE, which is necessary for certain types of options pricing.

The importance of matrix inversion as a technique lies in the fact that it can help us solve these simultaneous linear equations. It is a crucial method in statistics and machine learning, famously arising in the solution to the Ordinary Least Squares estimate for linear regression.

## Simultaneous Linear Equations

Let's begin with a motivating example of a small set of simultaneous linear equations:

\begin{eqnarray} x + 2y + 4z &=& 10 \\ 3x + y - 5z &=& -8 \\ 4x - 3y + 7z &=& 4 \end{eqnarray}

The usual goal is to find the values of $x$, $y$ and $z$ that satisfy these equations. In the above case with only three equations some relatively simple algebraic manipulation will provide the answer.

Once the number of equations grows significantly larger however it can becomes notationally unwieldy to write the equations in this manner.

An alternative is to write such systems compactly using matrix notation. By setting $\boldsymbol{x} = [x, y, z]^T$, $\boldsymbol{b} = [10, -8, 4]^T$ and the matrix of coefficients $\boldsymbol{A}$ by the following:

$\boldsymbol{A} = \left[ \begin{array}{ccc} 1 & 2 & 4 \\ 3 & 1 & -5 \\ 4 & -3 & 7 \end{array} \right]$

It is possible to write the above system as:

$\left[ \begin{array}{ccc} 1 & 2 & 4 \\ 3 & 1 & -5 \\ 4 & -3 & 7 \end{array} \right] \left[ \begin{array}{c} x \\ y \\ z \end{array} \right] = \left[ \begin{array}{c} 10 \\ -8 \\ 4 \end{array} \right]$

Or, more compactly:

\begin{eqnarray} \boldsymbol{A} \boldsymbol{x} = \boldsymbol{b} \end{eqnarray}

Note the bold typeface emphasising that the terms in the equation are vector and matrix quantities rather than scalar quantities.

If this were an algebraic equation of the form $A x = b$, where $A, x, b \in \mathbb{R}$, that is, the terms are all scalar quantities, with $A \neq 0$ then this could be solved for $x$ simply by dividing through by $A$:

\begin{equation} x = \frac{b}{A} = \frac{1}{A} b = A^{-1} b \end{equation}

In the final equality of this line it is shown that $x = A^{-1} b$. One might now ask whether it is possible to solve the matrix equation $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$ via some form of analagous expression $\boldsymbol{x} = \boldsymbol{A}^{-1} \boldsymbol{b}$?

Since the operation of dividing through by a matrix is undefined the answer lies in defining the matrix $\boldsymbol{A}^{-1}$, which is known as the matrix inverse of $\boldsymbol{A}$. In order to do so however it is necessary to introduce some other mathematical tools.

## Kronecker Delta and Indentity Matrices

The Kronecker delta is a mathematical function of two variables $i,j$ with the following values:

\begin{equation} \delta_{ij} = \begin{cases} 1, & \text{if } i=j,\\ 0, & \text{if } i\neq j. \end{cases} \end{equation}

Essentially, the Kronecker delta equals one when $i$ and $j$ are equal and zero otherwise.

This function can be utilised to define a new square matrix where each element in the matrix at position $i, j$ is equal to the Kronecker delta of that value.

Mathematically this is written compactly as $\boldsymbol{I}_n=[\delta_{ij}]_{n \times n}$. The matrix itself will have the following form:

\begin{equation} \boldsymbol{I}_n=\begin{bmatrix} \kern4pt 1 & 0 & 0 & \dots & 0 \kern4pt \\ \kern4pt 0 & 1 & 0 & \dots & 0 \kern4pt \\ \kern4pt 0 & 0 & 1 & \dots & 0 \kern4pt \\ \kern4pt \vdots & \vdots & \vdots & \ddots & \vdots \kern4pt \\ \kern4pt 0 & 0 & 0 & \dots & 1 \kern4pt \\ \end{bmatrix} \end{equation}

In particular all elements of the matrix are zero except those on the diagonal of the matrix, which are equal to one. This is because the diagonal elements represent the case where $i=j$ and the Kronecker delta equals one at these values.

This type of matrix is known as the identity matrix with dimensionality $n$. The identity matrix possesses the unique property that when it left- or right-multiplies any square matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ it leaves $\boldsymbol{A}$ unchanged:

\begin{equation} \boldsymbol{I}_n \boldsymbol{A} = \boldsymbol{A} \boldsymbol{I}_n = \boldsymbol{A} \end{equation}

If you recall how matrix multiplication works from the previous article it should be possible to convince yourself that this is the case.

This now allows us to define the matrix inverse $\boldsymbol{A}^{-1}$. The matrix inverse is precisely the matrix that when left- or right-multiplied to $\boldsymbol{A}$ produces the identity matrix:

\begin{equation} \boldsymbol{A}^{-1} \boldsymbol{A} = \boldsymbol{I}_n = \boldsymbol{A} \boldsymbol{A}^{-1} \end{equation}

In order to gain some intuition as to why this is so consider the following familiar rules of multiplication for an equivalent scalar algebraic equation:

\begin{equation} \frac{a}{a} = \frac{1}{a} a = a^{-1} a = 1 = a a^{-1} = a \frac{1}{a} = \frac{a}{a} \end{equation}

In particular $a^{-1} a = 1 = a a^{-1}$. The value $1$ can be interpreted as a $1 \times 1$ matrix with a single value.

## Solving the Matrix Equation

With these definitions it is now possible to solve the original equation $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$:

\begin{eqnarray} \boldsymbol{A} \boldsymbol{x} &=& \boldsymbol{b} \\ \boldsymbol{A}^{-1} \boldsymbol{A} \boldsymbol{x} &=& \boldsymbol{A}^{-1} \boldsymbol{b} \\ \boldsymbol{I}_n \boldsymbol{x} &=& \boldsymbol{A}^{-1} \boldsymbol{b} \\ \boldsymbol{x} &=& \boldsymbol{A}^{-1} \boldsymbol{b}\label{eqn-matrix-inverse} \end{eqnarray}

Firstly the equation is left-multiplied by the inverse matrix $\boldsymbol{A}^{-1}$. The term $\boldsymbol{A}^{-1} \boldsymbol{A}$ can then be set to the identify matrix $\boldsymbol{I}_n$.

However we noted that any vector or matrix multiplied by the identify matrix is left unchanged. Hence we can remove the reference to $\boldsymbol{I}_n$ and leave an expression for $\boldsymbol{x}$ in terms of the matrix inverse $\boldsymbol{A}^{-1}$ and the vector $\boldsymbol{b}$.

It is now possible to solve the equation $\boldsymbol{x} = \boldsymbol{A}^{-1} \boldsymbol{b}$ for the unknown vector $\boldsymbol{x}$ assuming that an appropriate $\boldsymbol{A}^{-1}$ exists.

However we have yet to describe how to go about actually calculating $\boldsymbol{A}^{-1}$ or whether it is even possible to do so.

This will be the topic of future articles in the series, but we will briefly mention some of the main mechanisms for calculating the matrix inverse.

## Algorithms for Matrix Inversion

A common method to find the inverse of a matrix (if it exists) is to use a technique known as Gauss-Jordan elimination (or Gaussian Elimination). However for a large $n \times n$-dimensional matrix this is an expensive and inefficient mechanism for finding an inverse.

The cost of the operation scales cubically with the size of the matrix. That is it has an arithmetic complexity of $\mathcal{O}(n^3)$. This can be a significant problem if many such inverse operations need to be carried out.

For most applications in applied science, quantitative finance and deep learning it is sufficient to approximate the solution $\boldsymbol{x}$ to within a certain tolerance rather than directly seeking the exact value.

There are many less computationally intensive algorithms that are effective enough to provide the necessary accuracy.

These algorithms consist either of a well-optimised version of Gauss-Jordan elimination for particular structured matrices1 or make use of an iterative approach that approximates $\boldsymbol{x}$ to a certain precision in a finite number of iterations.

The need to effectively approximate $\boldsymbol{x}$ in the above matrix equation has lead to the development of an entire subfield of mathematics known as Numerical Linear Algebra.

Certain matrix factorisations can be extremely useful in more efficiently solving this equation. These factorisations will be discussed at length in subsequent articles.

## Next Steps

In the next article we will explore Gauss-Jordan elimination and write some Python code to carry out a matrix inversion and compare our implementation to one provided by the SciPy library.

## Footnotes

1. Such 'banded' matrices arise as the result of discretising Partial Differential Equations (PDE)—a common technique in computational fluid dynamics or options pricing. A well-known algorithm here is the Thomas Tridiagonal Matrix Algorithm (TDMA).