Machine Learning Pipelines


Reading time - 4 mins

1. Technical ML Section

Learn 5 best practices for writing Machine Learning Pipelines.

The full article is here (reading time - 6 mins).

2. Career ML Section

Learn how to avoid 3 (unspoken) mistakes on ML interviews

1. Technical ML Section:

(For a more detailed discussion, read the full blog article!)

The ML Pipeline Challenge

Most Data Scientists don’t come from a Software Engineering background, so key coding best practices are often unfamiliar to them.

Yet, these days, Data Scientists are often expected to build end-to-end ML solutions, including pipelines that run in production.

Without solid software engineering principles, these pipelines often end up messy, inflexible, and fragile, especially without proper code reviews from experienced ML/MLOps engineers.

I’ve seen many poorly written ML pipelines, so I’ve selected the 5 most common bad coding practices in ML pipelines, along with tips on how to fix them.

→ Bad Practice 1: Hardcoding Parameters in Code

🟠 Hardcoding leads to:

Difficulty making changes when the codebase is big
Difficulty adapting parameters to similar modeled objects without resorting to lengthy "if-else" statements
The need to redeploy whenever a parameter changes

✅ Solution:

Use a configuration (config) file, and write the entire ML pipeline so that any parameter that might change is read from this file.
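To make the config idea concrete, here is a minimal sketch. It assumes a JSON config and uses only the standard library; the parameter names (`target_column`, `test_size`, `n_estimators`, `random_seed`) and the `config.json` file name are hypothetical and only there for illustration. A YAML or TOML file would work the same way.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Parameters that may change between runs (hypothetical names)."""

    target_column: str
    test_size: float
    n_estimators: int
    random_seed: int = 42  # default used when the config omits it


def load_config(text: str) -> PipelineConfig:
    """Parse a JSON config string into a typed, read-only config object."""
    raw = json.loads(text)
    # Unknown or missing keys raise a TypeError here, so typos surface early.
    return PipelineConfig(**raw)


# In a real pipeline this text would be read from a file such as
# config.json (assumed name); it is inlined to keep the sketch runnable.
CONFIG_TEXT = """
{
    "target_column": "energy_kwh",
    "test_size": 0.2,
    "n_estimators": 200
}
"""

config = load_config(CONFIG_TEXT)
print(config.target_column, config.test_size, config.n_estimators)
```

With this layout, changing a parameter means editing the config file, not the pipeline code.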

→ Bad Practice 2: Ignoring Modularization

🟠 Ignoring modularization leads to:

Poor code readability and difficulty in following data transformation
Difficult maintenance and testing
Code repetition and limited reusability

✅ Solution:

Split the code into functions (or class methods). Ideally, one function should perform exactly one task.
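As a rough illustration of "one function, one task", the sketch below splits a small preprocessing step into three single-purpose functions and one function that chains them. The data layout (a list of dicts) and the column names `load`, `capacity`, and `load_ratio` are assumptions made for this example, not part of any specific pipeline.

```python
from statistics import mean


def drop_missing(rows: list[dict]) -> list[dict]:
    """Remove rows in which any value is None."""
    return [row for row in rows if all(v is not None for v in row.values())]


def add_load_ratio(rows: list[dict]) -> list[dict]:
    """Add a 'load_ratio' feature; assumes 'capacity' is never zero."""
    return [{**row, "load_ratio": row["load"] / row["capacity"]} for row in rows]


def center_column(rows: list[dict], column: str) -> list[dict]:
    """Subtract the column mean from every value in that column."""
    average = mean(row[column] for row in rows)
    return [{**row, column: row[column] - average} for row in rows]


def preprocess(rows: list[dict]) -> list[dict]:
    """Chain the single-purpose steps so the data flow stays easy to follow."""
    rows = drop_missing(rows)
    rows = add_load_ratio(rows)
    return center_column(rows, "load_ratio")


raw_rows = [
    {"load": 50.0, "capacity": 100.0},
    {"load": None, "capacity": 100.0},
    {"load": 100.0, "capacity": 100.0},
]
print(preprocess(raw_rows))
```

Each step can now be tested, reused, or replaced on its own.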

→ Bad Practice 3: Skipping Type Annotations and Code Documentation

🟠 Ignoring proper code documentation leads to:

Poor code readability, especially for other developers
Linters and type checkers cannot catch type-related issues
Poor maintainability, especially in big codebases

✅ Solution:

Add type annotations, docstrings, and comments to your code.
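Here is one way an annotated and documented function might look. The function `clip_outliers` is a hypothetical helper written for this example, and the docstring follows the Google style, which is just one of several common conventions.

```python
from collections.abc import Sequence


def clip_outliers(values: Sequence[float], lower: float, upper: float) -> list[float]:
    """Clip each value to the closed interval [lower, upper].

    Args:
        values: Raw measurements, for example sensor readings.
        lower: Smallest allowed value.
        upper: Largest allowed value.

    Returns:
        A new list with the same length as `values`.

    Raises:
        ValueError: If `lower` is greater than `upper`.
    """
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    # max() enforces the lower bound, min() enforces the upper bound.
    return [min(max(value, lower), upper) for value in values]


print(clip_outliers([-5.0, 0.5, 9.0], lower=0.0, upper=1.0))
```

With the annotations in place, a static type checker can flag a call such as `clip_outliers("abc", 0, 1)` before the pipeline ever runs.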

→ Bad Practice 4: Avoiding Unit & Integration Tests

🟠 Avoiding unit and integration tests leads to:

Errors in data preprocessing or feature engineering go unnoticed
Pipeline failures in production & unexpected outputs due to untested edge cases

✅ Solution:

Use Unit Tests to validate individual components, such as data preprocessing functions or feature engineering steps.
Use Integration Tests to check how multiple components work together, such as ensuring the model correctly processes preprocessed data or handles edge cases.
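The difference between the two kinds of tests can be sketched as follows. The functions `fill_missing`, `scale_to_unit`, and `preprocess` are hypothetical stand-ins for real pipeline steps. The tests are plain `assert`-based functions named `test_*`, so a runner such as pytest could collect them; here they are simply called at the bottom.

```python
from typing import Optional


def fill_missing(values: list[Optional[float]], default: float) -> list[float]:
    """Replace None entries with a default value."""
    return [default if value is None else value for value in values]


def scale_to_unit(values: list[float]) -> list[float]:
    """Min-max scale to [0, 1]; a constant input maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def preprocess(values: list[Optional[float]], default: float = 0.0) -> list[float]:
    """Combine both steps, as the pipeline would."""
    return scale_to_unit(fill_missing(values, default))


# Unit tests: each one checks a single component in isolation.
def test_fill_missing_replaces_none() -> None:
    assert fill_missing([1.0, None, 3.0], default=0.0) == [1.0, 0.0, 3.0]


def test_scale_to_unit_handles_constant_input() -> None:
    # Edge case: without the guard this would divide by zero.
    assert scale_to_unit([5.0, 5.0]) == [0.0, 0.0]


# Integration test: checks that the components work together.
def test_preprocess_end_to_end() -> None:
    assert preprocess([None, 2.0, 4.0]) == [0.0, 0.5, 1.0]


for test in (
    test_fill_missing_replaces_none,
    test_scale_to_unit_handles_constant_input,
    test_preprocess_end_to_end,
):
    test()
print("all tests passed")
```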

→ Bad Practice 5: Avoiding Logging

🟠 Neglecting proper logging leads to:

Difficult debugging: no clear insights into where and why failures occur
Lack of visibility: hard to track pipeline execution and performance

✅ Solution:

Use tools like Loguru, Structlog, or even Python's built-in logging module to log each pipeline step and gain visibility into execution.
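A minimal logging sketch, using only Python's built-in `logging` module, could look like this. The logger name `ml_pipeline` and the step function `remove_negative` are made up for the example. Loguru and Structlog offer similar ideas with their own APIs.

```python
import logging

# One shared format for every message: timestamp, level, logger name, text.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("ml_pipeline")  # hypothetical logger name


def remove_negative(values: list[float]) -> list[float]:
    """Drop negative values and log what happened at each stage."""
    logger.info("Cleaning started: %d values", len(values))
    cleaned = [value for value in values if value >= 0]
    dropped = len(values) - len(cleaned)
    if dropped:
        # A warning stands out in the logs without stopping the pipeline.
        logger.warning("Dropped %d negative values", dropped)
    logger.info("Cleaning finished: %d values kept", len(cleaned))
    return cleaned


remove_negative([1.0, -2.0, 3.0])
```

When a run fails or produces odd results, messages like these show which step ran last and how much data each step let through.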

2. Career ML Section

3 Mistakes to Avoid on ML Interviews

🟠 Mistake 1: Lying about your experience.

Trust me, an experienced interviewer will notice this quickly.

This happens because an interviewer usually starts with simple questions and then digs deeper to understand your actual experience.

If you are lying, you can answer the simple questions, but usually not the deeper ones.

For example, if you lie and say that you have Data Science Leadership experience, questions like “Can you describe how you handled a situation in which a delegated task failed?” will expose you straight away.

🟠 Mistake 2: Trying to answer questions by guessing

If you don’t know the answer, just say so.

For example, if asked, “How does a median filter work for time series?” and you’ve never used one, don’t guess—it looks terrible.

Instead, say: “I haven’t worked with median filters, but I have experience with X and Y processing methods for time series.”

🟠 Mistake 3: Trying to answer the question straight away

If the question is open-ended, for instance a case study or an ML System Design question, don’t try to answer right away.

Instead, ask clarifying questions first. This will:

Show you’re focused on understanding the bigger picture
Help you avoid misinterpretations or missing key details
Give you time to think while the interviewer responds

That is it for this week!

If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!


#6: 5 Tips to Write ML Pipelines | 3 Mistakes to Avoid on ML Interviews
