Machine Learning Roadmap. From Zero to Advanced.
Mar 04, 2025
📌 Download the PDF Version of the Roadmap
📌 Before we start…
- If you like the roadmap, you will also like my weekly newsletter! SUBSCRIBE HERE!
- Every day, I also post Machine Learning Content on LinkedIn, follow me!
⭐ Intro
Getting into Machine Learning and Data Science without a clear Machine Learning Roadmap can feel overwhelming.
When I started my ML journey in 2017, there weren't many structured resources. Now, the problem is the opposite — there’s too much material, making it hard to know where to begin.
I’ve seen this struggle firsthand, mentoring 70+ aspiring Data Scientists who all asked the same question: “Where should I start?” This roadmap is my answer to that question.
With 7+ years in the ML field, after leading 4 teams and building 8 end-to-end Machine Learning solutions, I’ve learned what truly matters for a successful Data Science career.
This guide provides a structured, step-by-step path to go from beginner to Middle/Senior Data Scientist — with insights even useful for Lead roles.
🟢 Roadmap Alternative Views
- You can download this ML Roadmap as a PDF Document HERE.
- You can view this ML Roadmap as a GitHub Repo HERE (support us with a STAR!)
- This is a Medium Article
👉 This roadmap is for:
- Data Science beginners who are looking for a practical step-by-step guide.
- Data Scientists who aim to level up skills for a job change or promotion.
- Data Scientists who are looking to refresh their knowledge and prepare for interviews.
- Data Scientists who want to level up skills in a specific domain, e.g. Optimization.
📚 Roadmap Overview
Ok, enough talking. This is the roadmap overview. We will be breaking down each part in detail below.
Machine Learning Roadmap Overview
You will learn things step-by-step, slowly increasing the importance and the difficulty of the topics.
I recommend that you complete the first 2 modules (Python and ML intro) and slowly start building your own small project. Then grow this project as you go through the ML deep dive and MLOps modules.
Now, let’s start.
1. Python
Life is too short, learn Python. Forget R or S or T or whatever other programming language letters you see.
1.1 Introduction
Here, I propose 2 versions — free and paid. If you ask me, I would go for the paid version because I think those courses are a bit better structured and easier to start with.
Paid version:
Course 1: Basic Python from Codecademy
Course 2: Python Programming Skill Track from DataCamp
Free version:
Course 1: Course from futurecoder.io
This course is similar to the Codecademy course from the paid version and even has a similar online programming editor. This is a great start for a complete beginner.
Course 2: Course from Dave Gray
Out of tens of free video courses, I found this one to be well-explained with clear language.
Course 3: Mini-project course from freeCodeCamp
1.2 Data manipulation (Pandas & Numpy)
Now that you are familiar with Python, we can move on to what data scientists actually use it for.
- Step 1: Kaggle Pandas course with exercises
This is a great start with Pandas. Go cell by cell through the notebooks, and I HIGHLY RECOMMEND (in fact, I insist) that you complete the exercises. A tiny warm-up sketch follows after this list.
- Step 2: Data manipulation topic in awesome mlcourse.ai by Yury Kashnitsky and Co.
- Step 3: Data manipulation with Numpy
- Step 4 (optional): Pandas exercises repo by Guilherme Samora
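If you want a quick feel for what these libraries do before starting the courses, here is a minimal, self-contained warm-up. The column names and values are made up purely for illustration:

```python
import numpy as np
import pandas as pd

# A tiny made-up dataset: daily sales for two stores
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "day":   [1, 2, 1, 2, 3],
    "sales": [100, 120, 90, np.nan, 130],
})

# Typical first steps: inspect, fill missing values, aggregate
df.info()
df["sales"] = df["sales"].fillna(df["sales"].median())
per_store = df.groupby("store")["sales"].agg(["mean", "sum"])
print(per_store)

# NumPy under the hood: vectorized math instead of Python loops
sales = df["sales"].to_numpy()
print(sales.mean(), sales.std())
```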
1.3 Data visualization
Data visualization is a critical part of a data scientist's work, so you must know how to make good plots that tell a story.
Intro
Data visualization topic in mlcourse.ai
Deeper dive
Below is the list of resources for each library. You can go through the list of plots to understand what kind of plots you can make in general and try several of them. Then, when doing the projects, always go back there and check how to make what you want.
Matplotlib
Seaborn
Plotly
Plotly is good when you need interactive plots. For me, it was useful when I had to go through multivariate time series data.
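To make the resources above concrete, here is a minimal sketch of random, purely illustrative data plotted with Matplotlib and Seaborn (Plotly follows a similar pattern):

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(x, y, alpha=0.5)          # plain Matplotlib scatter
axes[0].set(title="Matplotlib", xlabel="x", ylabel="y")
sns.histplot(x, kde=True, ax=axes[1])     # Seaborn histogram + density estimate
axes[1].set(title="Seaborn")
plt.tight_layout()
plt.show()
```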
1.4 Selected Practical Topics
You now have some Python exposure. Next, let's see what else is important to learn as you go. You do not necessarily need to go through these topics at the beginning, but I recommend coming back to them when you start working on your portfolio or work projects.
Topic 1: Python environments and how to set them up with Conda
You need to know what an environment is and how to work with it. I recommend starting with Conda and changing later if required.
Topic 2: Demystifying methods in Python
I highly recommend this article: you will often see and use the different kinds of Python methods, and it gives an excellent explanation of them.
Topic 3: Python clean code tips and formatting.
- Clean code tips, part 1
- Clean code tips, part 2
- Formatting code with black
- Code linting with flake8 and pylint
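As a tiny illustration of what those clean-code resources push you towards, compare a hard-to-read function with a cleaner version (the names are made up for illustration); black and flake8 then keep the formatting consistent automatically:

```python
# Before: cryptic names, no types, no docstring
def f(a, b):
    return sum(a) / len(a) if b else sum(a)

# After: descriptive names, type hints, a short docstring
def aggregate_sales(daily_sales: list[float], use_average: bool = True) -> float:
    """Return the average (or total) of daily sales."""
    total = sum(daily_sales)
    return total / len(daily_sales) if use_average else total

print(aggregate_sales([100.0, 120.0, 90.0]))
```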
2. Data Science / ML introduction
Now we are getting to the “most interesting” part.
My opinion is that you need to know the basics well to be a solid data scientist. That does not mean becoming a theory nerd, but a good understanding of the main principles will help you both on the job and in interviews.
In this roadmap, I suggest you get to know only the most frequently used algorithms, but you have to know them very well. Building on this knowledge, you can then move on to other algorithms.
Now, let’s go.
2.1. Introduction
This is a perfect course to get an overview of what machine learning is and of the two most common problems solved by ML: regression and classification. Do not wade through the tons of other intro courses; take this one.
Note: by default, Coursera is not free, but you can ask for financial aid and they will grant it after consideration. I did that several times back in my student days.
2.2. Basic probability, statistics, and linear algebra
Linear algebra
Step 1: Videos of 3blue1brown about linear algebra
Step 2: Tutorial of Python Linear Algebra by Pablo Caceres
Figure from 3blue1brown YouTube Channel
Probability and Statistics
Step 1: Statistics Crash Course by Adriene Hill
Step 2: Learn Statistics with Python by Ethan Weed
2.3 Supervised learning
There is a massive number of algorithms out there, but you will barely use even 20% of them. I propose you learn the following list and then pick up the rest using the knowledge you gain.
There will be some intersections with Andrew Ng’s course, but it would not hurt to go a bit deeper and have different implementations and perspectives on the same material.
Linear regression
Intro theory: Nando de Freitas lectures at UBC
Python Implementation
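Once you have been through the theory, it helps to see how little code a working linear regression takes. Here is a minimal sketch on synthetic data, with both the closed-form normal equation and scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3x + 2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=1.0, size=100)

# Closed-form solution via the normal equation: w = (X'X)^-1 X'y
X_b = np.hstack([np.ones((100, 1)), X])       # add an intercept column
w = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
print("normal equation:", w)                  # approximately [2, 3]

# The same model with scikit-learn
model = LinearRegression().fit(X, y)
print("sklearn:", model.intercept_, model.coef_)
```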
Regularization in linear regression
LASSO Regularization and Feature Selection
Regularization is an essential concept to understand, and linear models are the easiest place to learn it. There will be a lot of interview questions about it, so make sure you know it well.
Step 1: Nando de Freitas lectures at UBC
Step 2: Visual explanation with code
Sklearn tutorial with Lasso model
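Here is a minimal sketch of what L1 regularization does in practice: as the regularization strength grows, Lasso drives the weights of irrelevant features to exactly zero. The dataset below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

# 5 features, but only the first two actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

for alpha in [0.01, 0.1, 1.0]:
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: {np.round(coefs, 2)}")
# Stronger alpha -> more coefficients pushed to exactly 0 (built-in feature selection)
```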
Logistic regression
Logistic regression is the baseline algorithm for classification tasks. As it is closely related to linear regression, you do not need to learn it from scratch, but it is important to understand a few key concepts about it.
Intro: Logistic regression topic of mlcourse.ai
Selected topic: odds ratio as weights interpretability
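That selected topic is easy to see in code: exponentiating a fitted logistic regression coefficient gives the multiplicative change in the odds per unit increase of that feature. A minimal sketch on a synthetic binary problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: feature 0 increases the odds of class 1, feature 1 decreases them
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.uniform(size=1000) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)
odds_ratios = np.exp(model.coef_[0])
print(odds_ratios)  # >1 means the feature raises the odds of class 1, <1 lowers them
```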
Gradient boosting
This one you have to know by heart, I’m sorry. Here are some good resources to start with.
Gradient Boosting Algorithm Structure
Step 1: Gradient Boosting topic of mlcourse.ai
Step 2: Gradient Boosting, deeper dive
I personally learned a lot from the original XGBoost paper, but Natekin’s paper is very detailed and always great to come back to when you forget things.
Step 3: Demo playground by Alex Rogozhnikov
Another genius made a great visualization for us normal people. By the way, check out his entire blog; it is simply amazing.
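To internalize the algorithm structure above, it also helps to write the core loop yourself: each new tree is fit to the residuals (the negative gradient, for squared loss) of the current ensemble. Here is a minimal from-scratch sketch for regression on synthetic data, next to scikit-learn's implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# Hand-rolled boosting for squared loss: fit shallow trees on residuals
learning_rate, n_trees = 0.1, 100
prediction = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - prediction                      # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
print("hand-rolled train MSE:", np.mean((y - prediction) ** 2))

# The same idea, production-ready
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2).fit(X, y)
print("sklearn train MSE:", np.mean((y - gbr.predict(X)) ** 2))
```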
Random Forest
Another crucial algorithm to know by heart. Please understand the difference between Random Forest and Gradient Boosting; I bet you will get this question in 30–40% of interviews.
Step 1: Lectures by Nando de Freitas
Step 2: Bagging topic on mlcourse.ai
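Here is a minimal sketch that highlights that interview-favorite difference: a Random Forest averages deep trees trained independently on bootstrap samples (variance reduction), while Gradient Boosting adds shallow trees sequentially, each correcting the previous ones (bias reduction). Synthetic data, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Independent deep trees on bootstrap samples, predictions averaged
rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Shallow trees added one after another, each fixing the ensemble's current errors
gb = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)

for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```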
k Nearest Neighbours (k-NN)
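The idea itself fits in a few lines: a prediction is just a majority vote among the k closest training points. A minimal scikit-learn sketch on a standard toy dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test point is classified by a majority vote of its 5 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```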
2.4 Unsupervised learning
You have grown up, my friend. You are ready to learn things from data without knowing the true label/value. Let's see how.
k-Means Clustering
Clustering
Dimensionality reduction
- PCA: Material from the one and only Sebastian Raschka
- t-SNE
- What is it and how to run it in Python
- How to use t-SNE effectively (with great visualizations)
- UMAP
PCA and t-SNE visualization (Source)
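Here is a minimal sketch of both techniques on a standard toy dataset; in practice, PCA is deterministic and fast, while t-SNE is stochastic and mainly useful for visualization:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 64-dimensional images of handwritten digits

# Linear projection onto the 2 directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear embedding that tries to preserve local neighbourhoods
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # both (1797, 2), ready for a scatter plot coloured by y
```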
3. Data Science / ML Deep dive
3.1 Selected Practical Topics
Feature selection
Feature selection is one of the most important topics when you really want to improve your model, make it more transparent, and understand the WHYs behind the predictions.
Feature importance
Linear methods: Chapter 5 of Interpretable Machine Learning book
- Link 1: https://christophm.github.io/interpretable-ml-book/limo.html
- Link 2: https://christophm.github.io/interpretable-ml-book/logistic.html
Tree-based methods: YouTube lecture by Raschka
Permutation feature importance: Chapter 8 of Interpretable Machine Learning book
SHAP: SHAP library documentation
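As a concrete starting point, here is a minimal sketch of permutation feature importance with scikit-learn (the SHAP documentation linked above has equally short examples for its own API); the dataset is synthetic and purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on the held-out set and measure how much the score drops
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature {i}: {importance:.3f}")
```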
Model metrics evaluation
Ok, you fit the model, but then what? More importantly, which metric do you choose for your problem? The following links provide a good overview of the pros and cons of the main regression and classification metrics. You will also often see questions about these metrics in interviews.
Regression metrics: H2O blog tutorial
Classification metrics: Evidently AI blog tutorial
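Here is a minimal sketch of computing a few of those metrics with scikit-learn; the key habit is to always look at more than one number. The arrays below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, recall_score)

# Regression: MAE is robust to outliers, MSE punishes large errors much harder
y_true_reg = np.array([3.0, 5.0, 2.5, 7.0])
y_pred_reg = np.array([2.8, 5.4, 2.0, 9.0])
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))

# Classification: accuracy alone is misleading on imbalanced data
y_true_clf = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred_clf = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
print("accuracy :", accuracy_score(y_true_clf, y_pred_clf))
print("precision:", precision_score(y_true_clf, y_pred_clf))
print("recall   :", recall_score(y_true_clf, y_pred_clf))
print("F1       :", f1_score(y_true_clf, y_pred_clf))
```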
Cross-validation
Cross-validation is essential to understand if you want to evaluate your models honestly and avoid overfitting.
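A minimal sketch: instead of a single train/test split, k-fold cross-validation fits the model k times on different folds and reports the spread of scores, which is what tells you whether you are overfitting to one lucky split:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold cross-validation: five scores instead of one; look at the mean AND the spread
scores = cross_val_score(model, X, y, cv=5)
print(scores.round(3), "mean:", scores.mean().round(3), "std:", scores.std().round(3))
```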
3.2 Neural networks introduction
There are tons of resources on neural networks. It is THE hottest topic, especially with all the buzz around LLMs. In my opinion, Andrew Ng's specialization is still a great introduction. He goes step by step, and I guarantee you will understand the concepts. From there, you can go deeper depending on the domain you are interested in.
It has 5 courses in it, so take a deep breath.
Convolutional Neural Network Block (Source)
3.3 Optimization with Python
Optimization is a relatively hard, math-heavy topic, but it is used in many practical applications. I highly advise you to learn it steadily, as it will open up great career opportunities.
Introduction to mathematical optimization with Python
This is an AWESOME resource on numerical optimization. Clear examples in Python with mathematical derivations of the basics.
Bayesian Optimization
Bayesian Optimization vs Grid Search vs Random Search
Bayesian optimization is a family of methods for optimizing black-box functions by sampling their inputs and outputs.
Source 1: Awesome playground with theory explanation by distill.pub
Source 2: Tutorial with deep theory dive by Nando de Freitas and Co.
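If you want to try it hands-on, one convenient option is the scikit-optimize library (my own suggestion, not one of the resources above). A minimal sketch of optimizing a noisy black-box function:

```python
import numpy as np
from skopt import gp_minimize

# A "black-box" function we can only sample (imagine it is a model's validation error)
def objective(params):
    x = params[0]
    return np.sin(3 * x) + 0.1 * x ** 2 + np.random.normal(scale=0.05)

# A Gaussian-process surrogate plus an acquisition function decide where to sample next
result = gp_minimize(objective, dimensions=[(-3.0, 3.0)], n_calls=30, random_state=0)
print("best x:", result.x, "best value:", result.fun)
```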
Optimization with SciPy
There are various Python libraries you can use for optimization, and SciPy is one of the most frequently used. If you need it, look at these resources:
Interactive playground of several optimization methods
Sometimes it is useful to play with parameters and see how the algorithm works. Here is a great playground with a couple of methods.
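Before the additional resources, here is a minimal sketch of what scipy.optimize looks like in code, minimizing the classic Rosenbrock test function:

```python
import numpy as np
from scipy.optimize import minimize

# The Rosenbrock "banana" function, a standard optimization test problem (minimum at [1, 1])
def rosenbrock(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

result = minimize(rosenbrock, x0=np.array([-1.0, 2.0]), method="BFGS")
print(result.x, result.fun)  # close to [1, 1] and 0
```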
Additional resources
3.4 Signal processing
Signal processing is often an essential part of an ML project because you have to be able to filter noise, outliers, and other dirty stuff out of your data.
Paid source:
I highly recommend a paid course by Mike Cohen. For this price and quality, I consider it essentially free. I have completed the course myself and like it a lot. Since then, I have applied several methods from the course in practice.
Time Series Filters (Image by Author)
Free resources:
If you want fully free alternatives, here are some links on filtering and Fourier transform.
Mean filter
Median filters
Exponential smoothing
Gaussian filter
Fourier transform
Low and high pass filters
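Here is a minimal sketch of a few of these filters applied to a noisy synthetic signal with NumPy and SciPy; which one to use depends on the kind of noise you are fighting:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import medfilt

# Noisy synthetic signal: a slow sine wave plus noise and a few spikes (outliers)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 3 * t) + rng.normal(scale=0.2, size=500)
signal[::50] += 3  # occasional spikes

mean_filtered = np.convolve(signal, np.ones(11) / 11, mode="same")  # mean (moving average) filter
median_filtered = medfilt(signal, kernel_size=11)                   # median filter, robust to spikes
gaussian_filtered = gaussian_filter1d(signal, sigma=3)              # Gaussian smoothing

# Fourier transform: inspect which frequencies dominate the signal
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=t[1] - t[0])
print("dominant frequency (Hz):", freqs[spectrum.argmax()])
```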
3.5 Anomaly detection
In addition to modeling and optimization, another large class of ML problems is anomaly detection. I have gathered a list of resources to get an overview of the problem and of the methods used to solve it.
Review of anomaly types and detection methods
Good overview with Python examples
List of libraries for time series anomaly detection
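To get a feel for the applied side before diving into the resources, here is a minimal sketch with scikit-learn's Isolation Forest on synthetic data (it is only one of the many methods covered in the overviews above):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly "normal" points plus a few obvious outliers
rng = np.random.default_rng(0)
normal = rng.normal(loc=0, scale=1, size=(300, 2))
outliers = rng.uniform(low=-6, high=6, size=(10, 2))
X = np.vstack([normal, outliers])

# Isolation Forest: anomalies are points that are easy to isolate with random splits
model = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = model.predict(X)          # +1 = normal, -1 = anomaly
print("flagged as anomalies:", (labels == -1).sum())
```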
Selected articles:
4. MLOps for data scientists
There will always be a debate about whether data scientists need to know MLOps / machine learning engineering.
MLOps is a huge field in itself, but knowing its basics makes you a much stronger professional: you can build things end-to-end at a decent level on your own, or at a solid level within a team of ML engineers.
In this roadmap, I provide you with basic resources to start, so with some digging, you can do a decent job of preparing your models for deployment.
4.1 Introduction
Alexey Grigorev and his team created a great intro course on MLOps, and I will be referring to it several times. I suggest you take the whole course, but here I mention specific topics.
Another amazing resource is the neptune.ai blog. They do a fantastic job! I will be mentioning their resources all the way.
I suggest you take a look at his intro videos that do not concern the course environment setup.
4.2 Model registry and experiment tracking
Model registry and experiment tracking are required to develop a consistent workflow of model creation and deployment, especially if you work in a team. Here are some resources to get started:
- Source 1
- Source 2
- A hands-on example of how to perform experiment tracking and create a model registry.
Model Registry Role in Machine Learning
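As a taste of what experiment tracking looks like in code, here is a minimal sketch using MLflow. MLflow is one common open-source choice; treat it as my suggestion rather than a tool prescribed by the resources above, and the experiment name and parameters are arbitrary:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("roadmap-demo")   # experiment name is arbitrary
with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)

    # Everything logged here shows up in the MLflow UI, so runs stay comparable
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")   # can later be promoted in the model registry
```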
4.3. ML Pipelines
In my opinion, a data scientist MUST know how to create good, clear machine learning pipelines. It is a misery to still see people building terrible hard-coded pipelines when there are so many pipeline-building tools available. I recommend you do the following:
Step 1: Get an intro to what ML pipelines are and what tools are available for building them:
Step 2: Complete a section of the MLOps zoomcamp with Mage / Prefect (previous years)
Step 3: Take a very simple dataset and make a new pipeline with any tool you like without guidance
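As a taste of step 3, here is a minimal sketch of what a pipeline with Prefect can look like. The task breakdown and dataset are purely illustrative; Mage, Airflow, and similar tools follow the same "small decorated steps" idea:

```python
from prefect import flow, task
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

@task
def load_data():
    X, y = load_breast_cancer(return_X_y=True)
    return train_test_split(X, y, random_state=0)

@task
def train_model(X_train, y_train):
    return make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

@task
def evaluate(model, X_test, y_test):
    return model.score(X_test, y_test)

@flow
def training_pipeline():
    X_train, X_test, y_train, y_test = load_data()
    model = train_model(X_train, y_train)
    accuracy = evaluate(model, X_test, y_test)
    print(f"test accuracy: {accuracy:.3f}")

if __name__ == "__main__":
    training_pipeline()   # each run and its tasks are tracked in the Prefect UI
```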
4.4 Model Monitoring
Model monitoring is another essential topic that comes after model development. This is the weakest point of all ML solutions I have seen and built so far.
For learning model monitoring, there is no better place than the blog of evidentlyai.com. They also host the model monitoring section of the MLOps zoomcamp.
Metrics for Data Drift Monitoring
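Evidently gives you drift reports out of the box; to understand what a data drift metric actually computes, here is a minimal hand-rolled sketch (not Evidently's API) that compares the reference and current distributions of each feature with a Kolmogorov–Smirnov test, on synthetic data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Reference data (what the model was trained on) vs. current production data
reference = {"age": rng.normal(40, 10, 1000), "income": rng.normal(50, 15, 1000)}
current = {"age": rng.normal(45, 10, 1000), "income": rng.normal(50, 15, 1000)}  # "age" has drifted

for feature in reference:
    statistic, p_value = ks_2samp(reference[feature], current[feature])
    drifted = p_value < 0.05
    print(f"{feature}: KS p-value={p_value:.4f} -> {'DRIFT' if drifted else 'ok'}")
```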
4.5 Docker basics
Docker is a tool/platform that lets you build, share, and run containerized applications that are reproducible in any new environment. Docker always looks scary to data scientists because, come on, what the hell is that? I am with you.
However, the truth is that Docker is not that difficult, at least at the level data scientists need to know it.
To start, I recommend a great crash course by Nana.
4.6 Additional resources
If you feel you want to go deeper into some of the topics, you may want to look at the MLOps roadmap.
However, a word of warning: be careful, because it is not possible to know everything. Make sure you understand the basics you need for your job or your portfolio projects, and then move on slowly.
📌 That’s it! But 2 more things…
- If you like the roadmap, you will also like my weekly newsletter! SUBSCRIBE HERE!
- Every day, I also post Machine Learning Content on LinkedIn, follow me!