5 Tips for Writing Machine Learning Pipelines

Mar 22, 2025

Machine Learning Pipelines and how to write them

Estimated reading time - 7 minutes

The ML Pipeline Challenge

Most Data Scientists don’t come from a Software Engineering background, so key coding best practices are often unfamiliar to them.

Yet, these days, Data Scientists are often expected to build end-to-end ML solutions, including pipelines that run in production.

Without solid software engineering principles, these pipelines often end up messy, inflexible, and fragile, especially when no experienced ML/MLOps engineer reviews the code.

I’ve seen so many poorly written ML pipelines that I’ve selected the 5 most common bad coding practices in writing them, along with tips on how to fix each one.

Bad Practice 1: Hardcoding Parameters in Code

Here is what parameter hardcoding might look like:
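As a minimal sketch, hardcoded parameters might look like the snippet below. The function name and the values (seed 42, an 80/20 split) are illustrative assumptions, not taken from a real project.

```python
import random


def split_rows(rows):
    """Split rows into train and test sets using hardcoded parameters."""
    # Hardcoded seed: changing it means editing and redeploying the code
    rng = random.Random(42)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    # Hardcoded 80/20 split buried inside the function body
    cutoff = int(len(shuffled) * 0.8)
    return shuffled[:cutoff], shuffled[cutoff:]


train, test = split_rows(list(range(10)))
print(len(train), len(test))  # 8 2
```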

🟠 Hardcoding leads to:

Difficulty making changes in a large codebase

Difficulty adapting parameters for similar modeled objects without lengthy "if-else" statements

Redeployment whenever a parameter needs to change

✅ Solution:

Use a configuration (config) file, and write the ML pipeline so that every parameter that might change is read from this file. Here is an example:
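One possible shape for this, sketched with a JSON config and Python's standard json module. The keys random_seed and train_fraction, the function names, and the config.json file name are hypothetical; the config text is inlined only so the sketch runs on its own.

```python
import json
import random

# In practice this text would live in its own file (for example a
# hypothetical config.json) and be read with json.load().
CONFIG_TEXT = """
{
  "random_seed": 42,
  "train_fraction": 0.8
}
"""


def load_config(text):
    """Parse the JSON config text into a dictionary."""
    return json.loads(text)


def split_rows(rows, config):
    """Split rows into train and test sets using values from the config."""
    rng = random.Random(config["random_seed"])
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * config["train_fraction"])
    return shuffled[:cutoff], shuffled[cutoff:]


config = load_config(CONFIG_TEXT)
train, test = split_rows(list(range(10)), config)
print(len(train), len(test))  # 8 2
```

JSON is used here only because it ships with Python; a YAML or TOML file would serve the same purpose.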

Config allows:

Finding and changing parameters easily, in one place
Avoiding redeployment of the ML application, since parameters can be changed on the fly
Selecting parameters flexibly in the code, e.g. for hyperparameter tuning

Bad Practice 2: Ignoring Modularization

Here is what ignoring modularization might look like:
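A hypothetical sketch of such a pipeline is shown below: cleaning, feature engineering, fitting, and prediction all live in one function. The housing-style fields (area, rooms, price) and the toy least-squares model are assumptions made for illustration.

```python
def run_pipeline(rows):
    """Clean, build features, fit, and predict, all in one place."""
    cleaned = []
    for row in rows:
        if row["price"] is not None and row["rooms"] > 0:
            cleaned.append(row)
    xs, ys = [], []
    for row in cleaned:
        xs.append(row["area"] / row["rooms"])
        ys.append(row["price"])
    # Toy model: least-squares slope of a line through the origin
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return [slope * x for x in xs]


rows = [
    {"area": 50, "rooms": 2, "price": 100},
    {"area": 90, "rooms": 3, "price": 120},
    {"area": 40, "rooms": 1, "price": None},
]
print(run_pipeline(rows))  # [100.0, 120.0]
```

None of the steps can be tested or reused without running the whole function.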

🟠 Ignoring modularization leads to:

Poor code readability and difficulty in following data transformation
Difficult maintenance and testing
Code repetition and limited reusability

✅ Solution:

Split the code into functions (class methods)
Ideally, one function should perform one task
A good practice is for the “run” method of the pipeline class to only call other methods
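These three points could be sketched as follows. The class name PricePipeline, its methods, the data fields, and the toy model are all hypothetical; the point is only the structure, where each method does one job and “run” just wires them together.

```python
class PricePipeline:
    """Hypothetical pipeline in which each method performs one task."""

    def clean(self, rows):
        """Keep only rows with a known price and a positive room count."""
        return [r for r in rows if r["price"] is not None and r["rooms"] > 0]

    def build_features(self, rows):
        """Compute area per room for every row."""
        return [r["area"] / r["rooms"] for r in rows]

    def extract_targets(self, rows):
        """Collect the prices to predict."""
        return [r["price"] for r in rows]

    def fit(self, xs, ys):
        """Fit the least-squares slope of a line through the origin."""
        self.slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

    def predict(self, xs):
        """Apply the fitted slope to each feature value."""
        return [self.slope * x for x in xs]

    def run(self, rows):
        """Only orchestrate: call the other methods in order."""
        cleaned = self.clean(rows)
        xs = self.build_features(cleaned)
        ys = self.extract_targets(cleaned)
        self.fit(xs, ys)
        return self.predict(xs)


rows = [
    {"area": 50, "rooms": 2, "price": 100},
    {"area": 90, "rooms": 3, "price": 120},
    {"area": 40, "rooms": 1, "price": None},
]
pipeline = PricePipeline()
print(pipeline.run(rows))  # [100.0, 120.0]
```

Each method can now be unit-tested and reused independently of the others.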

Bad Practice 3: Avoiding Type Annotations and Code Documentation

Here is an example:
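A hypothetical sketch of undocumented, unannotated code, assuming pandas is installed (the function name prep and the column names are made up for illustration):

```python
import pandas as pd


def prep(df, col):
    x = df.drop(columns=[col]).to_numpy()
    y = df[col].to_numpy()
    return x, y


df = pd.DataFrame({"area": [50, 90], "rooms": [2, 3], "price": [100, 120]})
x, y = prep(df, "price")
print(x.shape, y.shape)  # (2, 2) (2,)
```

Nothing in the signature tells a reader what df and col are, or what comes back.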

🟠 Ignoring proper code documentation leads to:

Poor code readability, especially for other developers
Linters and type checkers cannot catch type-related issues
Poor maintainability, especially in big codebases
Inability to produce high-quality code documentation

✅ Solution:

Add typing annotations, docstrings and comments to your code.

Below is an example of the fixed code. Even in this simple case, the annotations make it obvious that the data type changes from a DataFrame to NumPy arrays.
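A minimal sketch of what annotated and documented code could look like, assuming pandas and NumPy are installed and Python 3.9+ for the built-in tuple annotation. The function and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def split_features_and_target(
    data: pd.DataFrame, target_column: str
) -> tuple[np.ndarray, np.ndarray]:
    """Separate a dataset into a feature matrix and a target vector.

    Args:
        data: Input table with one row per sample.
        target_column: Name of the column to predict.

    Returns:
        A (features, target) pair of NumPy arrays with shapes
        (n_samples, n_features) and (n_samples,).
    """
    # Every column except the target becomes part of the feature matrix
    features = data.drop(columns=[target_column]).to_numpy()
    target = data[target_column].to_numpy()
    return features, target


df = pd.DataFrame({"area": [50, 90], "rooms": [2, 3], "price": [100, 120]})
features, target = split_features_and_target(df, "price")
print(features.shape, target.shape)  # (2, 2) (2,)
```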

Bad Practice 4: Avoiding Unit & Integration Tests

🟠 Avoiding unit and integration tests leads to:

Errors in data preprocessing or feature engineering go unnoticed
Pipeline failures in production
Unexpected outputs due to untested edge cases
Difficult debugging

✅ Solution:

Use unit tests to validate individual components, such as data preprocessing functions or feature engineering steps, ensuring they produce the expected outputs for given inputs.
Use integration tests to check how multiple components work together, such as ensuring the model correctly processes preprocessed data or handles edge cases.

Here is an example of the unit test:
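A minimal sketch of unit tests for one preprocessing step, written as plain pytest-style functions. The helper fill_missing_with_mean is hypothetical, and running the tests with pytest is an assumption; the functions can also be called directly.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute: all values are missing")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def test_fills_missing_with_mean():
    assert fill_missing_with_mean([1.0, None, 3.0]) == [1.0, 2.0, 3.0]


def test_leaves_complete_data_unchanged():
    assert fill_missing_with_mean([4.0, 5.0]) == [4.0, 5.0]


def test_rejects_all_missing_input():
    # Edge case: there is nothing to compute a mean from
    try:
        fill_missing_with_mean([None, None])
    except ValueError:
        return
    raise AssertionError("expected ValueError for all-missing input")
```

The third test covers exactly the kind of edge case that otherwise surfaces only in production.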

Bad Practice 5: Avoiding Logging

🟠 Neglecting proper logging leads to:

Difficult debugging: no clear insights into where and why failures occur
Lack of visibility: hard to track pipeline execution and performance
Inefficient troubleshooting: more time spent diagnosing errors

✅ Solution:

Use tools like Loguru, structlog, or even Python's built-in logging module to make each pipeline step visible.

Here is an example of logging:
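A minimal sketch using Python's built-in logging module. The logger name "pipeline", the functions, and the price field are hypothetical.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")


def clean(rows):
    """Drop rows with a missing price and log how many were kept."""
    cleaned = [r for r in rows if r["price"] is not None]
    logger.info("Cleaning kept %d of %d rows", len(cleaned), len(rows))
    return cleaned


def run(rows):
    """Run the pipeline, logging the start, the end, and any failure."""
    logger.info("Pipeline started")
    try:
        cleaned = clean(rows)
        if not cleaned:
            raise ValueError("no usable rows after cleaning")
    except Exception:
        # logger.exception records the message together with the traceback
        logger.exception("Pipeline failed")
        raise
    logger.info("Pipeline finished")
    return cleaned


result = run([{"price": 100}, {"price": None}])
```

With this in place, a failed run shows which step broke and how much data reached it.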

Conclusion

If you want your pipeline to actually work in production, stop hardcoding parameters, modularize your code, add proper documentation, write tests, and use logging.

These aren’t just “nice-to-haves”. They save you from wasting hours fixing broken code and wondering why your model suddenly performs terribly.

Want to read more about ML Pipelines? Check out more of my articles here:

Machine Learning Pipelines: Ad-hoc vs Frameworks


