## Foundations for Machine Learning

LINEAR ALGEBRA | CALCULUS | STATISTICS | DATA STRUCTURES

Author of **Deep Learning Illustrated**

Spanning **14 on-demand training modules**, this course provides a comprehensive overview of the subjects -- across **mathematics**, **statistics**, and **computer science** -- that underlie contemporary **machine learning** approaches, including **deep learning** and other **artificial intelligence** techniques.

Jon Krohn is **Chief Data Scientist** at the machine learning company **untapt**. He authored the 2019 book **Deep Learning Illustrated**, an instant **#1 bestseller** that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in person at **Columbia University**, **New York University**, and the NYC Data Science Academy. Jon holds a **Ph.D. in Neuroscience** from **Oxford** and has been publishing on machine learning in leading **academic journals** since 2010; his papers have been cited over a thousand times.

**1. Linear Algebra Course (3 modules)**

- Intro to Linear Algebra

- Linear Algebra II: Matrix Operations

**2. Calculus Course (4 modules)**

- Calculus II: Partial Derivatives & Integrals

**3. Probability and Statistics Course (4 modules)**

- Intro to Statistics

**4. Computer Science (3 modules)**

- Algorithms and Data Structures

- Optimization

On-Demand Access

- What Linear Algebra Is
- A Brief History of Algebra
- Tensors
- Scalars
- Vectors and Vector Transposition
- Norms and Unit Vectors
- Basis, Orthogonal, and Orthonormal Vectors
- Arrays in NumPy
- Matrices
- Tensors in TensorFlow and PyTorch
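As a taste of the topics above, here is a minimal NumPy sketch of tensors of increasing rank (TensorFlow and PyTorch, covered in the module, offer analogous structures; the variable names are illustrative only):

```python
import numpy as np

# Scalars, vectors, and matrices are all tensors of increasing rank
scalar = np.array(25)             # rank-0 tensor
vector = np.array([1., 2., 3.])   # rank-1 tensor
matrix = np.array([[1., 2.],
                   [3., 4.]])     # rank-2 tensor

# Vector transposition and the L2 norm
row_vector = vector.reshape(1, -1)   # shape (1, 3)
l2_norm = np.linalg.norm(vector)     # sqrt(1 + 4 + 9)

# A unit vector has an L2 norm of exactly 1
unit = vector / l2_norm
```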

- Tensor Transposition
- Basic Tensor Arithmetic
- Reduction
- The Dot Product
- Solving Linear Systems
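The dot product and linear-system topics above can be sketched in a few lines of NumPy (the particular matrix and vectors are assumed for illustration):

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

# The dot product: elementwise multiplication, then reduction by summation
dot = np.dot(x, y)   # 1*4 + 2*5 + 3*6 = 32

# Solving the linear system A @ b = c for the unknown vector b
A = np.array([[2., 1.],
              [1., 3.]])
c = np.array([5., 10.])
b = np.linalg.solve(A, c)
```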

- The Frobenius Norm
- Matrix Multiplication
- Symmetric and Identity Matrices
- Matrix Inversion
- Diagonal Matrices
- Orthogonal Matrices

**4. Eigendecomposition**

- Eigenvectors
- Eigenvalues
- Matrix Determinants
- Matrix Decomposition
- Application of Eigendecomposition
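A compact sketch of the eigendecomposition topics above, using an assumed 2x2 matrix: each eigenpair satisfies A v = lambda v, and the determinant equals the product of the eigenvalues.

```python
import numpy as np

A = np.array([[4., 2.],
              [1., 3.]])

# Eigendecomposition: each column of `eigenvectors` pairs with one eigenvalue
eigenvalues, eigenvectors = np.linalg.eig(A)
v0 = eigenvectors[:, 0]
lam0 = eigenvalues[0]

# The matrix determinant is the product of the eigenvalues
det = np.linalg.det(A)
```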

**5. Matrix Operations for Machine Learning**

- Singular Value Decomposition (SVD)
- The Moore-Penrose Pseudoinverse
- The Trace Operator
- Principal Component Analysis (PCA): A Simple Machine Learning Algorithm
- Resources for Further Study of Linear Algebra
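Two of the topics above, SVD and the Moore-Penrose pseudoinverse, can be previewed with NumPy (the matrix is an arbitrary illustrative example):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

# Singular value decomposition: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A, full_matrices=False)
reconstructed = U @ np.diag(S) @ Vt

# The Moore-Penrose pseudoinverse generalizes inversion to non-square matrices;
# for a full-column-rank A, pinv(A) @ A recovers the identity
A_pinv = np.linalg.pinv(A)
```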

**1. Limits**

- What calculus is
- A Brief History of Calculus
- The Method of Exhaustion

**2. Computing Derivatives with Differentiation**

- The Delta Method
- Basic Derivative Properties
- The Power Rule
- The Sum Rule
- The Product Rule
- The Quotient Rule
- The Chain Rule
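The delta method above defines the derivative as the limit of difference quotients over ever-smaller intervals; a tiny numeric sketch (with an assumed example function f(x) = x^3) can confirm the power rule's prediction:

```python
def f(x):
    return x ** 3   # the power rule predicts f'(x) = 3 * x**2

def delta_method(f, x, delta=1e-6):
    # Slope of the secant line over a small interval of width delta
    return (f(x + delta) - f(x)) / delta

approx = delta_method(f, 2.0)   # should approach 3 * 2**2 = 12
```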

**3. Automatic Differentiation**

- AutoDiff with PyTorch
- AutoDiff with TensorFlow 2
- Relating Differentiation to Machine Learning
- Cost (or Loss) Functions
- The Future: Differentiable Programming
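The module demonstrates autodiff with PyTorch and TensorFlow 2; as a library-free sketch of the same underlying idea, forward-mode automatic differentiation can be implemented with dual numbers, which carry an exact derivative alongside every value (all names here are illustrative, not course code):

```python
class Dual:
    """Minimal forward-mode automatic differentiation via dual numbers."""

    def __init__(self, value, deriv):
        self.value = value   # f(x)
        self.deriv = deriv   # f'(x), propagated by differentiation rules

    def __add__(self, other):
        # Sum rule: (f + g)' = f' + g'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule: (f * g)' = f'g + fg'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

# Differentiate f(x) = x*x + x at x = 3; exact answer f'(3) = 2*3 + 1 = 7
x = Dual(3.0, 1.0)   # seed derivative dx/dx = 1
result = x * x + x
```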

**4. Gradients Applied to Machine Learning**

- Partial Derivatives of Multivariate Functions
- The Partial-Derivative Chain Rule
- Cost (or Loss) Functions
- Gradients
- Gradient Descent
- Backpropagation
- Higher-Order Partial Derivatives

**5. Integrals**

- Binary Classification
- The Confusion Matrix
- The Receiver-Operating Characteristic (ROC) Curve
- Calculating Integrals Manually
- Numeric Integration with Python
- Finding the Area Under the ROC Curve
- Resources for Further Study of Calculus
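Numeric integration and the area under an ROC curve, both listed above, reduce to the same computation; here is a dependency-free trapezoidal-rule sketch (the ROC points are assumed illustrative values):

```python
def trapezoid(y, x):
    """Numeric integration: sum the areas of trapezoids between points."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2
               for i in range(len(x) - 1))

# Sanity check: area under y = x**2 on [0, 1] is analytically 1/3
xs = [i / 1000 for i in range(1001)]
area = trapezoid([v * v for v in xs], xs)

# The same routine yields AUC from (false positive rate, true positive rate) points
fpr = [0.0, 0.2, 0.5, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]
auc = trapezoid(tpr, fpr)
```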

**1. Introduction to Probability**

- What Probability Theory Is
- A Brief History: Frequentists vs Bayesians
- Applications of Probability to Machine Learning
- Random Variables
- Discrete vs Continuous Variables
- Probability Mass and Probability Density Functions
- Expected Value
- Measures of Central Tendency: Mean, Median, and Mode
- Quantiles: Quartiles, Deciles, and Percentiles
- The Box-and-Whisker Plot
- Measures of Dispersion: Variance, Standard Deviation, and Standard Error
- Measures of Relatedness: Covariance and Correlation
- Marginal and Conditional Probabilities
- Independence and Conditional Independence
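Several of the descriptive measures above can be previewed with NumPy on a small assumed dataset:

```python
import numpy as np

x = np.array([2., 4., 4., 4., 5., 5., 7., 9.])

mean = x.mean()                      # 5.0
median = np.median(x)                # 4.5
variance = x.var()                   # population variance: 4.0
std_dev = x.std()                    # 2.0
std_err = std_dev / np.sqrt(len(x))  # standard error of the mean

# Correlation between two perfectly linearly related variables is exactly 1
y = 2 * x + 1
corr = np.corrcoef(x, y)[0, 1]
```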

**2. Distributions in Machine Learning**

- Uniform

- Gaussian: Normal and Standard Normal
- The Central Limit Theorem
- Log-Normal
- Binomial and Multinomial
- Poisson
- Mixture Distributions
- Preprocessing Data for Model Input

**3. Information Theory**

- What Information Theory Is
- Self-Information
- Nats, Bits, and Shannons
- Shannon and Differential Entropy
- Kullback-Leibler Divergence
- Cross-Entropy
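The information-theory quantities above relate simply: KL divergence is cross-entropy minus entropy. A short sketch in bits (function names are illustrative):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits (assumes all probabilities are nonzero)
    p = np.asarray(p)
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    p, q = np.asarray(p), np.asarray(q)
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    # Kullback-Leibler divergence = cross-entropy minus entropy
    return cross_entropy(p, q) - entropy(p)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
h_fair = entropy(fair_coin)   # exactly 1 bit
```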

**4. Frequentist Statistics**

- Frequentist vs Bayesian Statistics
- Review of Relevant Probability Theory
- z-scores and Outliers
- p-values
- Comparing Means with t-tests
- Confidence Intervals
- ANOVA: Analysis of Variance
- Pearson Correlation Coefficient
- R-Squared Coefficient of Determination
- Correlation vs Causation
- Correcting for Multiple Comparisons

**5. Regression**

- Features: Independent vs Dependent Variables
- Linear Regression to Predict Continuous Values
- Fitting a Line to Points on a Cartesian Plane
- Ordinary Least Squares
- Logistic Regression to Predict Categories
- (Deep) ML vs Frequentist Statistics
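Ordinary least squares, listed above, amounts to solving for the line's slope and intercept; a minimal NumPy sketch on assumed noise-free points:

```python
import numpy as np

# Fit y = m*x + b by ordinary least squares; data generated from m=2, b=1
x = np.array([0., 1., 2., 3., 4.])
y = 2. * x + 1.

# Design matrix: one column for x, one column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])
(m, b), *_ = np.linalg.lstsq(X, y, rcond=None)
```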

**6. Bayesian Statistics**

- When to use Bayesian Statistics
- Prior Probabilities
- Bayes’ Theorem
- PyMC3 Notebook
- Resources for Further Study of Probability and Statistics
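The Bayesian update above -- prior times likelihood, normalized -- takes only a few lines of arithmetic; the diagnostic-test numbers here are assumed for illustration:

```python
# Bayes' theorem: P(disease | positive test), with illustrative numbers
p_disease = 0.01             # prior probability of disease
p_pos_given_disease = 0.95   # test sensitivity
p_pos_given_healthy = 0.05   # false positive rate

# Total probability of observing a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: even a sensitive test yields a modest posterior when the prior is low
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```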

**1. Introduction to Data Structures and Algorithms**

- A Brief History of Data
- A Brief History of Algorithms
- “Big O” Notation for Time and Space Complexity

**2. Lists and Dictionaries**

- List-Based Data Structures: Arrays, Linked Lists, Stacks, Queues, and Deques
- Searching and Sorting: Binary, Bubble, Merge, and Quick
- Set-Based Data Structures: Maps and Dictionaries
- Hashing: Hash Tables, Load Factors, and Hash Maps
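As a preview of the searching topics above, binary search illustrates why sorted structures pay off: O(log n) lookups versus O(n) for a linear scan (a standard sketch, not course code):

```python
def binary_search(sorted_items, target):
    """O(log n) search over a sorted list; returns the index, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2   # halve the search interval each iteration
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
```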

**3. Trees and Graphs**

- Trees: Decision Trees, Random Forests, and Gradient-Boosting (XGBoost)
- Graphs: Terminology, Directed Acyclic Graphs (DAGs)
- Resources for Further Study of Data Structures & Algorithms

**4. The Machine Learning Approach to Optimization**

- The Statistical Approach to Regression: Ordinary Least Squares
- When Statistical Approaches to Optimization Break Down
- The Machine Learning Solution

**5. Gradient Descent**

- Objective Functions
- Cost / Loss / Error Functions
- Minimizing Cost with Gradient Descent
- Learning Rate
- Critical Points, incl. Saddle Points
- Gradient Descent from Scratch with PyTorch
- The Global Minimum and Local Minima
- Mini-Batches and Stochastic Gradient Descent (SGD)
- Learning Rate Scheduling
- Maximizing Reward with Gradient Ascent
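The core loop behind the topics above fits in a few lines: step repeatedly against the gradient, scaled by the learning rate (a dependency-free sketch on an assumed one-dimensional cost function, not the module's PyTorch version):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)   # move downhill
    return x

# Minimize the cost C(x) = (x - 3)**2, whose gradient is 2 * (x - 3);
# the global minimum lies at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```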

**6. Fancy Deep Learning Optimizers**

- A Layer of Artificial Neurons in PyTorch
- Jacobian Matrices
- Hessian Matrices and Second-Order Optimization
- Momentum
- Nesterov Momentum
- AdaGrad
- AdaDelta
- RMSProp
- Adam
- Nadam
- Training a Deep Neural Net
- Resources for Further Study

**Programming:** All code demos will be in Python, so experience with it or another object-oriented programming language would be helpful for following along with the code examples.

**Mathematics:** Familiarity with secondary-school-level mathematics will make the class easier to follow. If you are comfortable dealing with quantitative information -- such as understanding charts and rearranging simple equations -- then you should be well prepared to follow all of the mathematics.