Data Science Toolbox

ELI5

tags: #explainibility #interpretability #python

ELI5 is a Python library for visualizing and debugging machine learning models through a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.

SHAP

tags: #explainibility #interpretability #python

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It includes great (interactive) dashboards. I have only used it on random forests so far.
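To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. It does not use the shap library; `toy_model`, the input `x`, and the all-zero `baseline` are hypothetical, and replacing "absent" features with baseline values is an assumption of this sketch.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features that have not yet joined the coalition keep their baseline
    value. The cost grows factorially with the number of features, so this
    is only meant for illustration.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]  # feature i joins the coalition
            output = model(current)
            phi[i] += output - previous_output  # marginal contribution
            previous_output = output
    return [total / len(orderings) for total in phi]


def toy_model(features):
    # Hypothetical model: two linear terms plus an a*c interaction.
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The "additive" part of the name shows up in the result: the values in `phi` sum to the difference between the model output at `x` and at `baseline`, and the interaction term is split evenly between the two features involved.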

Lime

tags: #explainibility #interpretability #python

lime (Local Interpretable Model-agnostic Explanations) explains what classifiers are doing.

Interpretable Machine Learning - A Guide for Making Black Box Models Explainable.

tags: #explainibility #interpretability #book online version

UMAP

tags: #dimensionreduction Code, Paper

Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.

t-SNE

Explanation of Random Forest Feature Importance

tags: #randomforrests #featureimportance Article

dtreeviz

tags: #interpretability #randomforrests Code Article

A Python library for decision tree visualization and model interpretation.

HDBSCAN

tags: #visualisation #unsupervised Documentation

The hdbscan library is a suite of tools to use unsupervised learning to find clusters, or dense regions, of a dataset.

Kalman Filters

Huber Loss

If MSE is too sensitive to outliers in your data and MAE is not sensitive enough, try Huber loss.
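A minimal sketch of why Huber loss sits between the two: it is quadratic (like MSE) for residuals up to a threshold and linear (like MAE) beyond it. The threshold name `delta`, its default of 1.0, and the sample values are assumptions chosen for illustration.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        residual = abs(target - prediction)
        if residual <= delta:
            total += 0.5 * residual ** 2
        else:
            total += delta * (residual - 0.5 * delta)
    return total / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions; the last one is a large outlier.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 14.0]

losses = {
    "mse": mse(y_true, y_pred),
    "mae": mae(y_true, y_pred),
    "huber": huber_loss(y_true, y_pred, delta=1.0),
}
```

With these values the single outlier dominates the MSE, while the Huber loss only grows linearly with it. Small residuals are still penalised quadratically, which keeps the loss smooth around zero.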

Active Learning

tags: #data #labels

If you have lots of unlabelled data but labelling is expensive, active learning can help find the most informative data to label.
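One common way to pick what to label is uncertainty sampling: ask for labels on the examples the current model is least sure about. The sketch below uses the "least confident" variant; the function name, `pool_probs`, and the batch size `k` are hypothetical, and other query strategies exist.

```python
def least_confident(probabilities, k):
    """Return the indices of the k samples whose top class probability is lowest."""
    scored = [(max(class_probs), index) for index, class_probs in enumerate(probabilities)]
    scored.sort()  # lowest confidence first; ties are broken by index
    return [index for _, index in scored[:k]]


# Hypothetical predicted class probabilities for four unlabelled samples.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

to_label = least_confident(pool_probs, k=2)
```

In a full loop, the selected samples would be labelled, added to the training set, and the model retrained before the next query round.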

Pandas Profiling

tags: #exploration #python #visualization

pandas’ DataFrame.describe() on steroids.

SweetViz

tags: #exploration #python #visualization

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code. An alternative to Pandas Profiling.

Great Expectations

tags: #data #etl #pipeline

Helps eliminate data pipeline debt through data testing, documentation, and profiling. In short: assertions for data.

PopMon

tags: #data #pipeline #drift

Monitors the stability of a pandas or Spark DataFrame.

Kats “One stop shop for time series analysis in Python”

tags: #timeseries #forecasting #detection

Includes 10+ forecasting models, backtesting, hyperparameter tuning, pattern detection, and time series feature extraction.

Darts

Forecasting: Principles and Practice

tags: #book #timeseries #rlang #forecasting Book

Mostly covers methods like ARIMA, and includes a chapter on #hierarchical time series.

Matrix Profile

tags: #timeseries #anomalydetection #motif Website Presentation Part 1 and Part 2 Python package #python

“The matrix profile is a data structure and associated algorithms that helps solve the dual problem of anomaly detection and motif discovery. It is robust, scalable and largely parameter-free.”
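As a rough illustration of the idea, the sketch below computes a matrix profile by brute force: for every window of length `m`, it stores the z-normalised Euclidean distance to the nearest other window. High values point at anomalies (discords), low values at repeated patterns (motifs). This is a naive version written for this note, not the API of any matrix profile package; the function names, the exclusion zone of `m // 2`, and the synthetic series are assumptions.

```python
import math


def znorm(window):
    """Z-normalise a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def matrix_profile(series, m):
    """Brute-force matrix profile: nearest-neighbour distance per window."""
    n_windows = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n_windows)]
    exclusion = max(1, m // 2)  # skip trivial matches with overlapping windows
    profile = []
    for i in range(n_windows):
        best = math.inf
        for j in range(n_windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])))
            best = min(best, dist)
        profile.append(best)
    return profile


# Synthetic periodic series with one injected spike at index 14.
series = [0.0, 1.0, 2.0, 1.0] * 8
series[14] = 5.0

profile = matrix_profile(series, m=4)
discord_start = max(range(len(profile)), key=profile.__getitem__)
```

The window length `m` is the one parameter that has to be chosen. The brute-force loop is quadratic in the number of windows, so it only suits short series.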

Time Series Classification Repository

tags: #timeseries #classification http://timeseriesclassification.com/

tsfresh

tags: #python #timeseries #features

tsfresh calculates a large number of time series characteristics (features).

Featuretools

tags: #python #features

Featuretools automatically creates features from temporal and relational datasets.

Reptile

tags: #fewshot #metalearning Article

Yellowbrick

tags: #vizualization Website

Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it uses Matplotlib.

SMOGN

tags: #imbalanced #machinelearning

SMOGN: Synthetic Minority Over-sampling for regression with Gaussian Noise.