Completed

Binary Classification ML Model for Laboratory Report Alert Detection

Published on May 05, 2025 in IT & Programming

About this project

We need to develop a binary classification machine learning model that predicts whether a laboratory report will contain a problem or not (target: Alert, True/False).
The data is stored in multiple Parquet files and includes information about samples, analyses, and specifications.
The solution must handle multi-table joins, preprocess the data, train a robust classifier, and explain why it predicts a report as problematic (for example, using SHAP or LIME).
It should also generate a technical report with the model performance and important features, and provide a script that allows us to run new predictions with explanations.
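
To illustrate the multi-table join step, here is a minimal sketch of how the tables might be combined into one row per sample. It assumes pandas is available. All table and column names (samples, analyses, specs, sample_id, analysis_code, result, lower_limit, upper_limit) are hypothetical, since the real schema is not described here, and the small in-memory frames stand in for data that would normally be loaded from the Parquet files.

```python
import pandas as pd


def build_report_features(samples, analyses, specs):
    """Join the three tables and aggregate to one feature row per sample."""
    # Attach the specification limits to every analysis result.
    merged = analyses.merge(specs, on="analysis_code", how="left")
    # Flag results that fall outside the specification limits.
    merged["out_of_spec"] = (merged["result"] < merged["lower_limit"]) | (
        merged["result"] > merged["upper_limit"]
    )
    # Collapse the analysis rows to one row per sample.
    per_sample = (
        merged.groupby("sample_id")
        .agg(n_analyses=("result", "size"), n_out_of_spec=("out_of_spec", "sum"))
        .reset_index()
    )
    # Keep every sample, including those without any analysis rows.
    features = samples.merge(per_sample, on="sample_id", how="left")
    count_cols = ["n_analyses", "n_out_of_spec"]
    features[count_cols] = features[count_cols].fillna(0).astype(int)
    return features


# Hypothetical stand-ins for the Parquet tables (in practice, e.g. pd.read_parquet).
samples = pd.DataFrame({"sample_id": [1, 2, 3], "Alert": [True, False, False]})
analyses = pd.DataFrame(
    {
        "sample_id": [1, 1, 2],
        "analysis_code": ["PH", "CL", "PH"],
        "result": [7.0, 9.0, 6.5],
    }
)
specs = pd.DataFrame(
    {
        "analysis_code": ["PH", "CL"],
        "lower_limit": [6.0, 0.0],
        "upper_limit": [8.0, 5.0],
    }
)

features = build_report_features(samples, analyses, specs)
print(features)
```

Left joins are used so that a sample without analyses or a result without a matching specification is kept rather than silently dropped; whether that is the right behaviour depends on the actual data.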

Project overview

We operate in a laboratory environment that processes and validates thousands of reports across 4 separate systems (environments). These environments share a similar data structure, and we want to build one model that can generalize across them. If that is not possible, separate models per environment are acceptable. The Alert column is our main concern. Misclassifying a problematic report as normal (a false negative) can have serious consequences, so recall must be prioritized. Our goal is to catch any issues before reports are officially released, automating this check with a high level of reliability and interpretability.
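
One possible way to prioritize recall is to tune the decision threshold on validation data instead of using the default 0.5. The sketch below uses only the Python standard library and picks the highest score threshold that still reaches a minimum recall. The function names, the example labels and scores, and the recall targets are all illustrative assumptions, not project requirements.

```python
def recall_at(y_true, scores, threshold):
    """Recall when every report with score >= threshold is flagged as an alert."""
    tp = sum(1 for y, s in zip(y_true, scores) if y and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y and s < threshold)
    return tp / (tp + fn)


def pick_threshold(y_true, scores, min_recall):
    """Return the highest threshold whose recall is at least min_recall."""
    if not any(y_true):
        raise ValueError("need at least one positive (Alert=True) example")
    # Try candidate thresholds from strictest to most lenient.
    for t in sorted(set(scores), reverse=True):
        if recall_at(y_true, scores, t) >= min_recall:
            return t
    raise ValueError("min_recall must not exceed 1.0")


# Hypothetical validation labels (1 = Alert) and model scores.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]

threshold_75 = pick_threshold(y_true, scores, min_recall=0.75)
threshold_100 = pick_threshold(y_true, scores, min_recall=1.0)
print(threshold_75, threshold_100)
```

Lowering the threshold raises recall at the cost of more false alarms, so the acceptable recall target would need to be agreed against the review workload it creates.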

Category IT & Programming
Subcategory Data Science
Project size Small
Is this a project or a position? Project
Required availability As needed

Delivery term: May 30, 2025

Skills needed