This page lists the courses I’ve taken throughout my journey of continuous self-education, which began during my PhD years and continues today. The list is not intended to cover a full university-level program. Still, I hope it can serve as a useful guide to broaden or deepen your knowledge, or to fill gaps, on the way to proficiency in Machine Learning, Natural Language Processing, and Computer Science fundamentals (in Python).

While the number of subjects and courses might seem overwhelming, it is practical, and entirely doable, to start with 1-2 courses per subject that focus on different aspects (e.g., theory vs. practice) or areas.

# Table of Contents

- NLP
- ML and Learning Theory
- Neural Networks/NNs/DL
- Math for ML
- Probability Theory
- Statistics/Statistical Inference
- Python
- Data Science Practice
- Python for Data Science: Pandas/NumPy/IPython
- Computer Science/Algorithms and Data Structures
- Scientific Paper Writing
- Other Useful Subjects

## NLP

#### Traditional NLP Algorithms

**Book** Speech and Language Processing (2020, in progress) by Dan Jurafsky and James H. Martin

**Deeplearning.ai@Coursera** Natural Language Processing Specialization, Courses 1 & 2

**CMU** 11-711 Algorithms for NLP

- program, tasks, and book recommendations (no videos)

#### DL in NLP

**Book** A Primer on Neural Network Models for Natural Language Processing (2015) by Yoav Goldberg

**Stanford** CS224n: Natural Language Processing with Deep Learning

**CMU** CS 11-747 Neural Networks for NLP *–more recent theory*

**Deeplearning.ai@Coursera** Natural Language Processing Specialization, Courses 3 & 4

## ML and Learning Theory

**Book** A Course in Machine Learning (2017) by Hal Daumé III

**Caltech** Learning from Data by Prof. Yaser Abu-Mostafa *–fundamental/theoretical*

- program, tasks, and videos

**Cornell** CS4780 Machine Learning by Prof. Kilian Weinberger *–this course and Caltech’s complement each other well for the fundamentals of traditional ML and learning theory*

**Stanford@Coursera** Machine Learning by Andrew Ng *–very popular, less in-depth theory*

**Johns Hopkins University@Coursera** Data Science: Statistics and Machine Learning Specialization by Brian Caffo *–applied data science, in R*

## Neural Networks/NNs/DL

**Book** Deep Learning (2016) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

**Book** Dive Into Deep Learning (2020) by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola

**CMU 11-785** Intro to Deep Learning by Bhiksha Raj *–fundamental*

- program, tasks, videos, and books
- videos from Fall 2019 *–the 2020 run might actually be better, as classes were taught from home and the instructor used the time more freely*

**DeepLearning.ai@Coursera** Deep Learning Specialization

#### PyTorch

**Book** Deep Learning with PyTorch (2020) by Eli Stevens, Luca Antiga, and Thomas Viehmann

## Math for ML

*These courses are more advanced and aim at in-depth understanding.*

**Imperial College London@Coursera** Mathematics for Machine Learning Specialization

- program, tasks, and videos

**MIT OCW** Matrix Methods in Data Analysis, Signal Processing, and Machine Learning *–very fundamental (lots of math!)*

- program, tasks, and videos

## Probability Theory (Math for Statistics and Learning Theory)

**MIT OCW** Probabilistic Systems Analysis and Applied Probability by Prof. John Tsitsiklis

**MIT@EdX** Probability - The Science of Uncertainty and Data by Prof. John Tsitsiklis *–same content as above, possibly updated; runs only on scheduled dates and is not available between sessions*

- program, tasks, and videos

**METU** Probability and Random Variables by Prof. Elif Uysal *–faster paced than MIT’s course; does not cover HMMs, random processes, or the introduction to statistical inference*

- program and videos (no tasks)

## Statistics/Statistical Inference

**Book** NIST/SEMATECH e-Handbook of Statistical Methods

**Book** Handbook of Biological Statistics by John H. McDonald (selected chapters: hypothesis testing)

**Book** An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)

**Book** The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics) *–an all-time classic*

**MIT OCW** Statistics for Applications (renamed to Fundamentals of Statistics) by Philippe Rigollet *–fundamental; only after mastering Probability Theory*

- program, tasks, and videos

**MIT@EdX** Fundamentals of Statistics by Philippe Rigollet *–same content as above, possibly updated; runs only on scheduled dates and is not available between sessions*

- program, tasks, and videos

**Johns Hopkins University@Coursera** Advanced Statistics for Data Science Specialization by Brian Caffo *–in R*

- program, tasks, and videos

## Python

**Google@Coursera** Crash Course on Python *–the bare basics*

**MIT@EdX** Introduction to Computer Science and Programming Using Python *–more comprehensive; runs only on scheduled dates and is not available between sessions*

**MIT@EdX XSeries** Computational Thinking using Python *–paid course*

#### Python Practice

## Data Science Practice

#### Git

Git & GitHub Tutorial for Beginners by The Net Ninja *–or any similar tutorial; there are plenty online*

#### Kaggle

Introductory tasks:

- https://www.kaggle.com/c/titanic
- https://www.kaggle.com/c/house-prices-advanced-regression-techniques

**Educative.io** Grokking Data Science: Chapter 4. End-to-End Machine Learning Project *–a walk through a Kaggle competition*

## Python for Data Science: Pandas/NumPy/IPython

**Book** Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney

**Book** High Performance Python, 2nd Edition (2020) by Micha Gorelick and Ian Ozsvald

**Coursera** Applied Data Science with Python Specialization

**Coursera** Pandas Python Library for Beginners/Intermediate in Data Science

**Harvard@EdX** Using Python for Research *–covers NumPy and scikit-learn*

**Educative.io** From Python to Numpy *–first month free, then a paid subscription*

## Computer Science/Algorithms and Data Structures

*Also good preparation for typical programming interview questions.*

**Book** Algorithms, 4th Edition (2011) by Robert Sedgewick and Kevin Wayne *–in Java*

**Coursera** Algorithms I, II by Robert Sedgewick and Kevin Wayne (authors of the Algorithms book) *–in Java*

**MIPT** Algorithms and Data Structures in Python 3 (Алгоритмы и структуры данных на Python 3) by Timofei Khiryanov *–in Russian; my personal favorite*

- additional practice

**MIT OCW** 6.006 Introduction to Algorithms by Prof. Erik Demaine *–arguably one of the best courses on algorithms and data structures*

**Udacity** Intro to Theoretical Computer Science (CS313): https://www.udacity.com/course/intro-to-theoretical-computer-science--cs313

**Harvard@EdX** CS50’s Introduction to Computer Science

## Scientific Paper Writing

**École Polytechnique@Coursera** How to Write and Publish a Scientific Paper (Project-Centered Course)

**Tsinghua University@EdX** Writing, Presenting and Submitting Scientific Papers in English

## Other Useful Subjects

The following subjects range from fundamentals typically taught in a bachelor's program, which are worth refreshing or revisiting, to more in-depth graduate courses that are useful for applied NLP only to a certain extent, e.g., as selected chapters.

- Linear Algebra
- Partial Differential Equations
- Measure-Theoretic Probability
- Convex Optimization
- Statistical Inference *–largely overlaps with some of the fundamental ML theory readings suggested above*
- Discrete Mathematics
- Scientific Computing/Numerical Analysis
- Data Structures and Algorithms
- Software Design Paradigms in Python and C++
- Stochastic Calculus
- Stochastic Optimization
- Managing/Analyzing Large Data Sets
- Parallel/Distributed Computing
- Deep Learning (start with Intro to Deep Learning by Bhiksha Raj and then decide where to advance)
- Reinforcement Learning
- One domain/practical project course focused on modeling/algorithms
- One domain/practical project course focused on big data (e.g., Stanford’s CS246: Mining Massive Data Sets, which has lecture videos available)