Anomaly Detection on Battery Health

Overview

This project focuses on detecting anomalies in battery health using data from NASA. The primary goal is to identify potential issues in battery performance to enhance maintenance strategies and ensure reliability.

Objectives

Analyze Battery Health Data: Understand patterns and detect anomalies in battery performance data.
Develop Anomaly Detection Models: Build models to identify deviations from normal battery behavior.
Improve Maintenance Strategies: Use insights to predict potential failures and optimize maintenance schedules.

Methodology

1. Data Preprocessing:

Cleaning and normalizing the battery health data.
Handling missing values and outliers.

2. Exploratory Data Analysis (EDA):

Visualizing data distributions and trends.
Identifying key features related to battery health.

3. Model Development:

Implementing machine learning algorithms for anomaly detection (e.g., Isolation Forest, One-Class SVM).
Evaluating model performance using appropriate metrics.

4. Results Visualization:

Visualizing detected anomalies and their impact on battery health.
Comparing model predictions with actual battery performance.
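A minimal sketch of the preprocessing step (handling missing values, then normalizing), assuming the data sits in a pandas DataFrame. The column names `capacity` and `temperature` are hypothetical stand-ins for the features discussed later. Median imputation is one reasonable choice among several. The small frame is synthetic, not NASA data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the project's real schema may differ.
FEATURES = ["capacity", "temperature"]


def preprocess(df: pd.DataFrame) -> np.ndarray:
    """Fill missing values with column medians, then standardize each feature."""
    clean = df[FEATURES].fillna(df[FEATURES].median())
    # Zero mean and unit variance, so distance-based detectors (DBSCAN, LOF)
    # do not let the larger-scale feature dominate.
    return StandardScaler().fit_transform(clean)


# Tiny synthetic frame standing in for the battery data (assumption).
demo = pd.DataFrame({
    "capacity": [1.8, 1.6, np.nan, 1.4, 1.6],
    "temperature": [24.0, 30.5, 28.0, np.nan, 30.5],
})
X = preprocess(demo)
```

Standardizing matters here because eps in DBSCAN and the neighbor distances in LOF are measured in the same units across all features.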

Outcomes

Early Detection: The models can detect anomalies early, preventing potential battery failures.
Optimized Maintenance: Improved maintenance scheduling based on predictive insights, leading to cost savings.
Enhanced Reliability: Ensuring the reliability of battery systems by continuously monitoring health and performance.

Visualisation:

[Figure: Local Outlier Factor (LOF) based anomaly detection]

Comparison of results from the following algorithms:

I have tried 5 methods for anomaly detection:

IQR based
DBSCAN (density based)
Isolation Forest
Local Outlier Factor (LOF)
Elliptical Envelope


1. IQR based

As we have seen, a total of 37,766 data points (20.13% of the data) are tagged as outliers by the IQR-based method.
This is a large share of the data and may contain useful information, so it is better to detect outliers with more robust, state-of-the-art anomaly detection methods. (Worst performer.)
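The IQR rule can be sketched with NumPy alone. This version uses Tukey's fences with the conventional multiplier k = 1.5 and flags a row if any feature falls outside them. Both the multiplier and the any-feature rule are assumptions, since the text does not state the exact rule used.

```python
import numpy as np


def iqr_outlier_mask(X: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag a row if any feature lies outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)  # per-feature quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return ((X < lower) | (X > upper)).any(axis=1)


# Toy data: only the last row has an extreme value (in the first feature).
X = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0], [100.0, 12.0]])
mask = iqr_outlier_mask(X)
print(f"{mask.sum()} of {len(X)} rows flagged ({mask.mean():.2%})")
```

Because one feature outside its fences is enough to flag a row, the share of flagged rows tends to grow as more features are checked.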

2. DBSCAN (density based)

Through trial and error to find the best hyperparameters (eps and minPts), I settled on eps = 0.5 and min_samples = 200. With these settings, 8,009 data points were detected as outliers. I validated this visually in lower dimensions with the above hyperparameters.
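A sketch of the DBSCAN step with scikit-learn, using the eps and min_samples values reported above. scikit-learn labels noise points -1, and this sketch treats those as outliers. The data is a synthetic stand-in (a dense Gaussian blob plus three far-away points), and eps = 0.5 is assumed to apply to standardized features.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic stand-in: a dense blob plus three isolated points.
inliers = rng.normal(0.0, 0.3, size=(500, 2))
far = np.array([[5.0, 5.0], [-5.0, 4.0], [6.0, -5.0]])
X = np.vstack([inliers, far])

# Hyperparameters as reported in the text; features assumed standardized.
labels = DBSCAN(eps=0.5, min_samples=200).fit_predict(X)
outlier_mask = labels == -1  # DBSCAN marks noise points with label -1
print(f"{outlier_mask.sum()} points labelled as noise")
```

A large min_samples such as 200 makes the core-point condition strict, so only regions with many close neighbors form clusters and sparse tail points fall out as noise.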

3. Isolation Forest

With contamination set to 1%, 2%, 3% and 4%, the numbers of detected anomalies are 1,858, 3,716, 5,573 and 7,431 respectively.
Even with the lowest contamination percentage, we get the exact number of outliers, which I validated visually in lower dimensions with several different sets of features.
However, a subject matter expert can still validate whether specific points are anomalies or not; so far we are getting a good match.
The outliers can easily be further classified into good, average and extreme anomalies.
As the contamination percentage increases, more data points are classified as extreme outliers.
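The contamination sweep can be reproduced along these lines with scikit-learn's `IsolationForest`, on synthetic data (a Gaussian cloud plus five planted points). `contamination` only sets the threshold on the anomaly score, so each run flags roughly that fraction of the data. The three-tier split at the end, by terciles of the anomaly score among flagged points, is a hypothetical scheme; the text does not say how the good, average and extreme classes were defined.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in: 1,000 normal points plus 5 planted, isolated points.
planted = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0], [0.0, 9.0]])
X = np.vstack([rng.normal(0.0, 1.0, size=(1000, 2)), planted])

counts = {}
for c in (0.01, 0.02, 0.03, 0.04):
    # contamination sets the score threshold; the trees are identical across runs.
    iso = IsolationForest(contamination=c, random_state=0).fit(X)
    counts[c] = int((iso.predict(X) == -1).sum())

# Hypothetical tiering of the outliers from the last (4%) model by score terciles.
flagged = iso.predict(X) == -1
severity = -iso.score_samples(X)[flagged]  # higher means more anomalous
low_cut, high_cut = np.quantile(severity, [1 / 3, 2 / 3])
tiers = np.where(severity >= high_cut, "extreme",
                 np.where(severity >= low_cut, "average", "good"))
print(counts)
```

With a fixed tercile split, a larger contamination flags more points in total and therefore puts more points in the "extreme" tier.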

4. Local Outlier Factor (LOF)

In LOF, the hyperparameters used are n_neighbors = [5, 20] and contamination = [0.01, 0.04].
About 7,429 data points are classified as outliers with the maximum number of nearest neighbors (20) and the highest contamination percentage (4%).
I validated this visually in lower dimensions with the above hyperparameters; LOF is less accurate than Isolation Forest but more accurate than the other methods.
With the fewest nearest neighbors and the lowest contamination percentage, fewer outliers are detected, specifically in the mid battery-capacity range (1.4-1.6) at the lowest and highest measured temperatures. These points can be reported to a subject matter expert and validated before removal.
In the bivariate analysis of temperature vs. capacity, increasing the contamination up to 4% flags not only the extreme endpoints but also mid-level inliers as outliers.
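A sketch of the LOF grid with scikit-learn's `LocalOutlierFactor`, looping over the n_neighbors and contamination values listed above. Reading [5, 20] and [0.01, 0.04] as the two endpoints tried is an assumption. In the default novelty=False mode, `fit_predict` labels the training points directly, with -1 marking outliers. The data is synthetic.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(7)
# Synthetic stand-in: a Gaussian cloud plus three far-away points.
X = np.vstack([rng.normal(0.0, 1.0, size=(1000, 2)),
               [[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0]]])

results = {}
for k in (5, 20):
    for c in (0.01, 0.04):
        lof = LocalOutlierFactor(n_neighbors=k, contamination=c)
        # novelty=False (default): fit_predict scores the training data itself.
        results[(k, c)] = lof.fit_predict(X) == -1

counts = {key: int(mask.sum()) for key, mask in results.items()}
for (k, c), n in sorted(counts.items()):
    print(f"n_neighbors={k:>2}, contamination={c:.2f}: {n} outliers")
```

n_neighbors controls how local the density comparison is: small values react to tiny gaps inside the cloud, while larger values compare each point against a wider neighborhood.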

5. Elliptical Envelope

With contamination provided as the hyperparameter, 18,387 data points are labelled -1, i.e. outliers.
In the bivariate analysis of temperature vs. capacity, even a low contamination of 1% flags not only the extreme endpoints but also mid-level inliers as outliers.
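A sketch of the elliptic envelope step with scikit-learn's `EllipticEnvelope`. It fits a robust (Minimum Covariance Determinant) Gaussian and flags points with large Mahalanobis distance. Because the decision boundary is always an ellipse, a curved or multi-modal temperature-vs-capacity cloud can leave normal-looking regions outside it. This is one plausible reading of the mid-level inliers being flagged, not a confirmed diagnosis. The correlated Gaussian data below is synthetic.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(1)
cov = [[1.0, 0.8], [0.8, 1.0]]
# Synthetic stand-in: correlated Gaussian data plus one point that breaks the correlation.
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=1000), [[4.0, -4.0]]])

env = EllipticEnvelope(contamination=0.01, random_state=0).fit(X)
labels = env.predict(X)        # -1 = outlier, 1 = inlier
sq_dist = env.mahalanobis(X)   # squared Mahalanobis distance under the robust fit
n_outliers = int((labels == -1).sum())
print(f"{n_outliers} outliers at 1% contamination")
```

The planted point (4, -4) is not extreme on either axis alone relative to the ellipse's spread, but it lies far outside the ellipse because it contradicts the positive correlation. This is the kind of case the envelope handles well, while non-elliptical shapes are where it struggles.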

Final performance conclusions:

Isolation Forest >> Local Outlier Factor (LOF) > DBSCAN (density based) > Elliptical Envelope >> IQR based