Linear Algebra in AI: Vectors, Matrices, Eigenvalues, and Singular Value Decomposition
Raj Shaikh
1: Linear Algebra – The Backbone of AI
Subtopics for Linear Algebra:
- Vectors: The Building Blocks of AI
- Matrices: Multi-Dimensional Magic
- Matrix Operations
- Special Matrices
- Determinants and Inverses
- Eigenvalues and Eigenvectors
- Singular Value Decomposition (SVD)
- Matrix Factorization and Decomposition
1. Vectors: The Building Blocks of AI
Imagine vectors as arrows in space. They have a direction and a magnitude (length). In AI, vectors represent things like:
- A row of pixel values in an image 🖼️
- A word embedding in NLP 📝
- A hidden layer in a neural network 🤖
Mathematical Form
A vector in \( n \)-dimensional space (\( \mathbb{R}^n \)) is written as:
\[ \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]
Operations on Vectors
- Addition: Add corresponding elements. \[ \mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix} \]
- Scalar Multiplication: Scale each element. \[ c \cdot \mathbf{v} = c \cdot \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \end{bmatrix} \]
Dot Product
The dot product of two vectors measures how strongly they point in the same direction, which makes it a natural similarity score:
\[ \mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \ldots + u_nv_n \]
- Example: Similarity between word embeddings in NLP, as in the cosine-similarity sketch below.
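To make that concrete, here is a minimal sketch that computes cosine similarity (the dot product normalized by the vector norms). The toy 3-dimensional “embeddings” are made-up values purely for illustration:
import numpy as np

# Toy "embeddings" (illustrative values only, not real word vectors)
king = np.array([0.8, 0.65, 0.1])
queen = np.array([0.75, 0.7, 0.15])
banana = np.array([0.1, 0.2, 0.9])

def cosine_similarity(u, v):
    # Dot product divided by the product of the norms
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print("king vs queen:", cosine_similarity(king, queen))    # close to 1
print("king vs banana:", cosine_similarity(king, banana))  # much smaller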
Norm (Length of a Vector)
The norm of a vector is its length:
\[ \|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2} \]
2. Matrices: Multi-Dimensional Magic
Now, matrices are like a spreadsheet of numbers—but cooler. They’re the backbone of linear transformations in AI.
Mathematical Form
A matrix is a rectangular array of numbers with dimensions \( m \times n \) (rows \( m \), columns \( n \)):
\[ A = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{bmatrix} \]
Matrix Operations
- Addition: Add corresponding elements.
- Scalar Multiplication: Scale every element by a number.
- Matrix Multiplication: The rulebook for combining matrices: \[ C = A \cdot B \] where \( C_{ij} = \sum_{k} A_{ik}B_{kj} \).
Why Vectors and Matrices Matter in AI
- Vectors are data points (e.g., images, text embeddings).
- Matrices encode relationships and transformations (e.g., weights in a neural network).
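As a rough sketch of that second point (the layer sizes and values here are arbitrary), a single dense neural-network layer is just a matrix-vector product plus a bias:
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)        # input vector (4 features)
W = rng.normal(size=(3, 4))   # weight matrix: maps 4 features -> 3 hidden units
b = np.zeros(3)               # bias vector

h = np.maximum(0, W @ x + b)  # linear transformation followed by a ReLU
print("Hidden activations:", h)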
Code Example: Vectors and Matrices in Python
Let’s visualize this with a Python example using NumPy:
import numpy as np
# Define vectors
u = np.array([1, 2])
v = np.array([3, 4])
# Vector operations
dot_product = np.dot(u, v)
norm_u = np.linalg.norm(u)
print("Dot Product:", dot_product)
print("Norm of Vector u:", norm_u)
# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix operations
C = np.dot(A, B)
print("Matrix Multiplication:\n", C)
2: Matrix Operations – Unlocking the Magic of Data Transformation
Subtopics for Matrix Operations:
- Matrix Addition and Subtraction
- Scalar Multiplication
- Matrix Multiplication (Dot Product)
- Transpose of a Matrix
- Identity Matrix: The “Neutral” Element
- Diagonal Matrices: The Cool Simplifications
- Inverse of a Matrix: The “Undo” Button
- Determinants: A Measure of Matrix Power
1. Matrix Addition and Subtraction
Matrix addition is like blending two sets of data together, element by element.
Mathematical Form
If \( A \) and \( B \) are matrices of the same dimensions (\( m \times n \)):
\[ C = A + B \quad \text{where} \quad C_{ij} = A_{ij} + B_{ij} \]
Example
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \]
\[ A + B = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix} \]
Humor break: Adding matrices is like putting peanut butter on toast—it just works, as long as the bread (dimensions) matches! 🍞
2. Scalar Multiplication
Scalar multiplication scales every element of the matrix by a constant \( c \).
Mathematical Form
\[ C = c \cdot A \quad \text{where} \quad C_{ij} = c \cdot A_{ij} \]
Example
\[ c = 2, \quad A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
\[ 2 \cdot A = \begin{bmatrix} 2 \cdot 1 & 2 \cdot 2 \\ 2 \cdot 3 & 2 \cdot 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} \]
Think of scalar multiplication as turning up the volume of your favorite song—everything gets louder (or scaled up). 🎵
3. Matrix Multiplication (Dot Product)
Matrix multiplication is like the ultimate power tool in AI. It combines two matrices into a new matrix that captures how they interact.
Mathematical Form
If \( A \) is \( m \times n \) and \( B \) is \( n \times p \):
\[ C = A \cdot B \quad \text{where} \quad C_{ij} = \sum_{k=1}^n A_{ik}B_{kj} \]
Example
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \]
\[ C = A \cdot B = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \]
Matrix multiplication is essential for AI tasks like combining weights and inputs in neural networks.
4. Transpose of a Matrix
The transpose flips the matrix over its diagonal. Rows become columns, and vice versa.
Mathematical Form
If \( A \) is \( m \times n \):
\[ A^T = \begin{bmatrix} a_{11} & a_{21} & \ldots \\ a_{12} & a_{22} & \ldots \\ \vdots & \vdots & \ddots \end{bmatrix} \]
Example
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \]
Transposes often show up in dot products and matrix inversions. Think of it as flipping your pancake for even cooking. 🥞
5. Identity Matrix: The “Neutral” Element
The identity matrix, \( I \), is the matrix equivalent of 1 in multiplication. It leaves other matrices unchanged.
Mathematical Form
\[ I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Property
\[ A \cdot I = I \cdot A = A \]
Think of \( I \) as the matrix equivalent of saying, “You do you!” to other matrices. ✌️
6. Code Example: Matrix Operations in Python
Here’s how we can perform these operations with Python’s NumPy:
import numpy as np
# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
I = np.eye(2) # Identity matrix
# Operations
add_result = A + B
scalar_result = 2 * A
dot_result = np.dot(A, B)
transpose_result = A.T
# Print results
print("Matrix Addition:\n", add_result)
print("Scalar Multiplication:\n", scalar_result)
print("Matrix Multiplication (Dot Product):\n", dot_result)
print("Transpose of A:\n", transpose_result)
print("Identity Matrix:\n", I)
3: Determinants, Inverses, and Eigen-Stars!
Subtopics for This Level:
- Determinants: The Matrix “Power Meter”
- Matrix Inverse: The “Undo” Button
- Eigenvalues and Eigenvectors: Matrix Superstars
- Diagonalization and Power of Matrices
1. Determinants: The Matrix “Power Meter”
The determinant tells us if a matrix is invertible (spoiler: a non-zero determinant means it is!). It’s also a geometric measure of how a matrix transforms space.
Mathematical Definition
For a \( 2 \times 2 \) matrix \( A \):
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \text{Determinant:} \quad \text{det}(A) = ad - bc \]
For larger matrices, determinants are calculated recursively using minors and cofactors. Fancy, huh? 🤓
Example
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
\[ \text{det}(A) = (1)(4) - (2)(3) = 4 - 6 = -2 \]
Key Property
- If \( \text{det}(A) = 0 \), the matrix is singular (non-invertible).
Imagine the determinant as the “stretch factor” of a transformation. If it’s 0, the matrix flattens everything into a lower dimension, like squishing a 3D object onto a flat plane. Oops! 💥
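As a sketch of the “minors and cofactors” idea mentioned above, here is a small recursive determinant function. It is purely illustrative; for real work you would call np.linalg.det:
import numpy as np

def det_recursive(M):
    """Determinant via cofactor expansion along the first row (illustrative only)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    total = 0.0
    for j in range(n):
        # Minor: remove row 0 and column j, then recurse
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        total += ((-1) ** j) * M[0, j] * det_recursive(minor)
    return total

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
print(det_recursive(A))     # -3.0
print(np.linalg.det(A))     # matches, up to floating-point error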
2. Matrix Inverse: The “Undo” Button
The inverse of a matrix \( A \) is the matrix \( A^{-1} \) such that:
\[ A \cdot A^{-1} = I \quad \text{(Identity Matrix)} \]
Formula for a \( 2 \times 2 \) Matrix
If \( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \):
\[ A^{-1} = \frac{1}{\text{det}(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]Example
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \text{det}(A) = -2 \]
\[ A^{-1} = \frac{1}{-2} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} \]
Key Points
- A matrix must have a non-zero determinant to be invertible.
- In AI, matrix inverses are often used in regression (e.g., calculating weights in closed-form linear regression).
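To illustrate the regression point, here is a minimal sketch of the closed-form (normal-equations) solution \( w = (X^T X)^{-1} X^T y \) on synthetic data. In practice np.linalg.solve or np.linalg.lstsq is preferred over an explicit inverse:
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 2*x + 1 + noise
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=50)

X = np.column_stack([x, np.ones_like(x)])   # add a bias column

# Normal equations: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y
print("Estimated slope and intercept:", w)  # roughly [2, 1]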
3. Eigenvalues and Eigenvectors: Matrix Superstars
Eigenvalues and eigenvectors are like the DNA of a matrix—they tell us its fundamental properties.
Definition
Given a matrix \( A \) and a vector \( \mathbf{v} \), if:
\[ A \mathbf{v} = \lambda \mathbf{v} \]
then \( \lambda \) is an eigenvalue and \( \mathbf{v} \) is its corresponding eigenvector.
Key Insight
- The eigenvector \( \mathbf{v} \) points in a direction that doesn’t change under the transformation \( A \)—it only gets scaled by \( \lambda \) (the eigenvalue).
Example
Let’s compute the eigenvalues of:
\[ A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix} \]
Solve \( \text{det}(A - \lambda I) = 0 \):
\[ \text{det} \left( \begin{bmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{bmatrix} \right) = 0 \]
\[ (4-\lambda)(3-\lambda) - (2)(1) = 0 \]
\[ \lambda^2 - 7\lambda + 10 = 0 \quad \Rightarrow \quad \lambda = 5, 2 \]
The eigenvalues are \( \lambda = 5 \) and \( \lambda = 2 \). Eigenvectors are found by plugging these values back into \( (A - \lambda I) \mathbf{v} = 0 \).
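We can sanity-check the worked example, and the defining relation \( A\mathbf{v} = \lambda\mathbf{v} \), with a few lines of NumPy:
import numpy as np

A = np.array([[4, 1], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)      # 5 and 2 (order may vary)

# Check A v = lambda v for each eigenpair (eigenvectors are the columns)
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))  # True, True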
4. Diagonalization and Power of Matrices
If a matrix \( A \) can be diagonalized, it means:
\[ A = PDP^{-1} \]
where \( P \) contains eigenvectors and \( D \) is a diagonal matrix of eigenvalues.
This makes computations like \( A^n \) super efficient:
\[ A^n = P D^n P^{-1} \]
In AI, this concept is a backbone of Principal Component Analysis (PCA) and other dimensionality reduction techniques.
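A quick sketch of why diagonalization speeds up matrix powers: compute \( A^3 \) both via \( P D^3 P^{-1} \) and directly, and confirm they match:
import numpy as np

A = np.array([[4, 1], [2, 3]], dtype=float)

# Eigendecomposition: A = P D P^{-1}
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)

# Cubing a diagonal matrix just cubes its diagonal entries
A_cubed_diag = P @ (D ** 3) @ np.linalg.inv(P)

print(np.allclose(A_cubed_diag, np.linalg.matrix_power(A, 3)))  # True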
Code Example: Determinants, Inverses, and Eigenvalues
Let’s compute these properties in Python:
import numpy as np
# Define matrix
A = np.array([[4, 1], [2, 3]])
# Determinant
det_A = np.linalg.det(A)
print("Determinant of A:", det_A)
# Inverse
if det_A != 0:
    inv_A = np.linalg.inv(A)
    print("Inverse of A:\n", inv_A)
else:
    print("Matrix is not invertible.")
# Eigenvalues and Eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
4: Singular Value Decomposition (SVD) – The Swiss Army Knife of Linear Algebra
Subtopics for SVD:
- What is SVD?
- The SVD Formula
- Understanding \( U \), \( \Sigma \), and \( V^T \)
- Computing SVD Step-by-Step (With a Numerical Example)
- Applications of SVD in AI
- Challenges and Tips for Implementing SVD
1. What is SVD?
Singular Value Decomposition is a technique to decompose any matrix \( A \) into three simpler matrices:
\[ A = U \Sigma V^T \]
Here’s the magic:
- \( U \): Orthogonal matrix representing the “left singular vectors.”
- \( \Sigma \): Diagonal matrix of singular values.
- \( V^T \): Transpose of an orthogonal matrix representing the “right singular vectors.”
Think of it like peeling an onion: SVD breaks down \( A \) into layers, making it easier to analyze and manipulate. 🧅
2. The SVD Formula
Let’s break it down:
- \( A \) is an \( m \times n \) matrix.
- \( U \) is an \( m \times m \) orthogonal matrix.
- \( \Sigma \) is an \( m \times n \) diagonal matrix with singular values on the diagonal.
- \( V^T \) is an \( n \times n \) orthogonal matrix.
The diagonal elements of \( \Sigma \) are the singular values \( \sigma_1, \sigma_2, \ldots \), which satisfy:
\[ \sigma_1 \geq \sigma_2 \geq \ldots \geq 0 \]
3. Understanding \( U \), \( \Sigma \), and \( V^T \)
Let’s put these components in perspective:
- \( U \): Its columns (the left singular vectors) represent directions in the column space of \( A \).
- \( \Sigma \): Contains the “importance” or strength of each direction (singular values).
- \( V^T \): Its rows (the right singular vectors) represent directions in the row space of \( A \).
Geometric View
Geometrically, applying \( A \) to a vector amounts to a rotation (or reflection) by \( V^T \), then an axis-aligned stretching or compressing by \( \Sigma \), then another rotation (or reflection) by \( U \) into the output coordinate system.
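A tiny sketch of this rotate-stretch-rotate view: apply \( V^T \), then \( \Sigma \), then \( U \) to a vector and confirm the result matches \( A\mathbf{x} \). The example matrix here is arbitrary:
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])   # arbitrary 3x2 example
x = np.array([1.0, -2.0])

U, s, VT = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)   # embed the singular values in an m x n matrix

step1 = VT @ x           # rotate/reflect in the input space
step2 = Sigma @ step1    # stretch along the singular directions
step3 = U @ step2        # rotate/reflect into the output space

print(np.allclose(step3, A @ x))       # True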
4. Computing SVD Step-by-Step
Let’s compute SVD for a simple matrix \( A \):
Matrix Example
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \]
1. Compute \( A^T A \) and \( AA^T \):
\[ A^T A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
\[ AA^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]
2. Compute eigenvalues and eigenvectors:
- Eigenvalues of \( A^T A \) are \( \lambda = 1, 1 \).
- Corresponding eigenvectors form \( V \).
3. Compute singular values \( \sigma \):
- Singular values are \( \sqrt{\lambda} = 1, 1 \).
4. Form matrices \( U \), \( \Sigma \), and \( V^T \):
\[ \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad V^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
Result:
\[ A = U \Sigma V^T \]
5. Applications of SVD in AI
- Dimensionality Reduction: SVD is used in PCA to reduce the dimensionality of datasets while preserving the most important information.
- Recommender Systems: SVD helps decompose user-item matrices in collaborative filtering.
- Image Compression: By keeping only the largest singular values, you can compress an image while retaining most of its quality (see the low-rank sketch after this list).
- Latent Semantic Analysis (LSA): In NLP, SVD is used to uncover latent relationships between terms and documents.
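As promised above, a sketch of the image-compression idea, using a small random matrix as a stand-in for a grayscale image: keep only the top \( k \) singular values to build a rank-\( k \) approximation.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 6))              # stand-in for a grayscale image

U, s, VT = np.linalg.svd(image, full_matrices=False)

k = 2                                   # keep only the 2 largest singular values
approx = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

error = np.linalg.norm(image - approx) / np.linalg.norm(image)
print(f"Rank-{k} approximation, relative error: {error:.3f}")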
6. Challenges and Tips for Implementing SVD
- Computational Complexity: SVD can be computationally expensive for large matrices. Use approximate methods like truncated SVD.
- Numerical Stability: Small singular values can cause numerical instability. Regularization techniques can mitigate this.
- Interpretability: Understanding singular vectors in the context of the original data can be tricky.
Code Example: SVD in Python
Here’s how to compute SVD in Python using NumPy:
import numpy as np
# Define a matrix
A = np.array([[1, 0], [0, 1], [0, 0]])
# Perform SVD
U, Sigma, VT = np.linalg.svd(A)
# Print results
print("U:\n", U)
print("Sigma:\n", Sigma)
print("V^T:\n", VT)
Mermaid.js Diagram
Here’s a diagram showing the SVD decomposition:
graph LR
    A[Matrix A] --> U[Matrix U]
    A --> Sigma[Diagonal Matrix Sigma]
    A --> VT[Matrix V Transpose]
    Sigma --> Compressed[Reduced Matrix Representation]
5: Matrix Factorization and PCA – Compressing the World with Math
Subtopics for Matrix Factorization and PCA:
- What is Matrix Factorization?
- Applications of Matrix Factorization in AI
- PCA: The Math Behind Dimensionality Reduction
- Deriving PCA Step-by-Step
- Connection Between PCA and SVD
- Challenges in Implementing PCA
1. What is Matrix Factorization?
Matrix Factorization is the process of decomposing a large matrix into smaller matrices that, when multiplied, approximate the original matrix. Mathematically:
\[ A \approx W \cdot H \]
Here:
- \( A \): Original matrix (\( m \times n \))
- \( W \): Basis matrix (\( m \times k \))
- \( H \): Coefficient matrix (\( k \times n \))
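A hedged sketch of one way to obtain such a factorization: take a truncated SVD and absorb the singular values into the two factors. Other methods, such as non-negative matrix factorization or alternating least squares, are common too; the matrix and rank here are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 5))          # original m x n matrix
k = 2                           # target rank

U, s, VT = np.linalg.svd(A, full_matrices=False)

# Split the singular values between the two factors
W = U[:, :k] * np.sqrt(s[:k])              # m x k basis matrix
H = np.sqrt(s[:k])[:, None] * VT[:k, :]    # k x n coefficient matrix

print("Approximation error:", np.linalg.norm(A - W @ H))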
2. Applications of Matrix Factorization in AI
Matrix Factorization is a rockstar in AI, with applications like:
- Recommendation Systems:
- Decomposing a user-item matrix to predict missing ratings.
- Topic Modeling:
- Extracting hidden topics from text data.
- Image Compression:
- Representing images with fewer components.
3. PCA: The Math Behind Dimensionality Reduction
Principal Component Analysis (PCA) transforms a dataset into a new coordinate system, where the axes (principal components) capture the maximum variance in the data.
Goal of PCA
Reduce the dimensionality of the dataset while retaining as much variance as possible. It achieves this by projecting the data onto the directions of maximum variance.
4. Deriving PCA Step-by-Step
Let’s break PCA into digestible steps.
Step 1: Data Standardization
Center the data by subtracting the mean:
\[ X_{\text{centered}} = X - \mu \]
Here, \( \mu \) is the mean vector.
Step 2: Compute Covariance Matrix
The covariance matrix measures how variables are correlated:
\[ C = \frac{1}{n-1} X_{\text{centered}}^T X_{\text{centered}} \]
Step 3: Find Eigenvalues and Eigenvectors
Compute the eigenvalues (\( \lambda \)) and eigenvectors (\( v \)) of the covariance matrix:
\[ Cv = \lambda v \]
Step 4: Select Principal Components
Sort eigenvalues in descending order and pick the top \( k \) eigenvectors. These eigenvectors become the principal components.
Step 5: Transform Data
Project the original data onto the principal components:
\[ X_{\text{reduced}} = X_{\text{centered}} \cdot V_k \]
where \( V_k \) is the matrix of the top \( k \) eigenvectors.
5. Connection Between PCA and SVD
Here’s the plot twist: PCA is essentially a special case of SVD!
When performing PCA:
\[ X_{\text{centered}} = U \Sigma V^T \]
- Columns of \( V \) (rows of \( V^T \)): The principal directions (principal components).
- Diagonal of \( \Sigma \): Singular values, related to the covariance eigenvalues by \( \lambda_i = \sigma_i^2 / (n-1) \).
- \( U \Sigma \): The data projected onto the principal components (the scores).
Think of SVD as the machinery and PCA as the elegant output.
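A small sketch to check this connection numerically on arbitrary toy data: the eigenvalues of the covariance matrix equal \( \sigma_i^2 / (n-1) \), where \( \sigma_i \) are the singular values of the centered data.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])  # correlated toy data
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(Xc.T)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: for symmetric matrices, ascending order

# Route 2: SVD of the centered data
U, s, VT = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(np.sort(eigvals), np.sort(s**2 / (len(X) - 1))))  # True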
6. Challenges in Implementing PCA
- Scaling Issues:
- PCA is sensitive to feature scaling. Always standardize your data (see the standardization sketch after this list).
- Interpretability:
- Principal components are often hard to interpret in real-world terms.
- Computational Complexity:
- For large datasets, computing eigenvalues can be slow. Use approximate methods.
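On the scaling point, here is a minimal sketch of standardizing features (zero mean, unit variance) before PCA; this matters when features are measured on very different scales. The data is a made-up toy example:
import numpy as np

X = np.array([[180.0, 75.0],     # toy data: height in cm, weight in kg
              [165.0, 60.0],
              [172.0, 68.0],
              [190.0, 90.0]])

# Standardize: subtract the mean and divide by the standard deviation, per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print("Column means (~0):", X_std.mean(axis=0))
print("Column stds  (~1):", X_std.std(axis=0))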
Numerical Example: PCA on a Simple Dataset
Let’s illustrate PCA with a small dataset:
Data Matrix
\[ X = \begin{bmatrix} 2.5 & 2.4 \\ 0.5 & 0.7 \\ 2.2 & 2.9 \\ 1.9 & 2.2 \\ 3.1 & 3.0 \\ 2.3 & 2.7 \\ 2.0 & 1.6 \\ 1.0 & 1.1 \\ 1.5 & 1.6 \\ 1.1 & 0.9 \end{bmatrix} \]
Step 1: Center the Data
Compute mean for each column, subtract from each element.
Step 2: Compute Covariance Matrix
\[ C = \frac{1}{n-1} X_{\text{centered}}^T X_{\text{centered}} \]
Step 3: Compute Eigenvalues and Eigenvectors
For this dataset the eigenvalues come out to \( \lambda_1 \approx 1.28 \) and \( \lambda_2 \approx 0.049 \). The eigenvector for \( \lambda_1 \) is:
\[ v_1 = \begin{bmatrix} 0.677 \\ 0.736 \end{bmatrix} \]
Step 4: Transform Data
Project onto \( v_1 \):
\[ X_{\text{reduced}} = X_{\text{centered}} \cdot v_1 \]
Code Example: PCA in Python
Let’s perform PCA using Python’s NumPy:
import numpy as np
# Data matrix
X = np.array([[2.5, 2.4],
[0.5, 0.7],
[2.2, 2.9],
[1.9, 2.2],
[3.1, 3.0],
[2.3, 2.7],
[2.0, 1.6],
[1.0, 1.1],
[1.5, 1.6],
[1.1, 0.9]])
# Center the data
X_centered = X - np.mean(X, axis=0)
# Compute covariance matrix
cov_matrix = np.cov(X_centered.T)
# Compute eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Sort by eigenvalue (descending) so index 0 is the top principal component
# (np.linalg.eig does not guarantee any particular order)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
# Project data onto the first principal component
X_reduced = np.dot(X_centered, eigenvectors[:, 0])
print("Reduced Data:\n", X_reduced)
Mermaid.js Diagram
graph TD
    Data[Original Data] --> Centered[Centered Data]
    Centered --> Covariance[Covariance Matrix]
    Covariance --> Eigen[Eigenvalues and Eigenvectors]
    Eigen --> Reduced["Reduced Data (Principal Components)"]