The Mythos of Model Interpretability

Tags: Interpretable ML, Research

This paper examines what is meant by "interpretability," arguing that the term is underspecified: the motivations for wanting it and the techniques proposed to achieve it are diverse and sometimes at odds.

Introduction

Desiderata of Interpretability Research

The demand for interpretability arises when there is a mismatch between the objectives we set out to solve and what an algorithm actually optimizes. Real-world objectives are often hard to encode in simple metrics such as test-set predictive performance, e.g. ethics and legality.

Trust

Causality

Interpretability can potentially help identify causal relationships in supervised models. Supervised models are trained with the goal of prediction and learn only associations. Identifying causal relationships is not a requirement, but when possible it helps scientists pin down the important feature-output relationships.

Transferability

Informativeness

Fair and Ethical Decision Making

Properties of Interpretable Models

Transparency

Simulatability

Decomposability

Algorithmic Transparency

Post-hoc Interpretability

Extraction of information from learned models.

In the form of:

Text Explanations

Methods

Visualization

Local Explanations

Not global explanations, but explanations of individual predictions, e.g. with respect to a particular class or in the neighborhood of a particular input.
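One common local-explanation recipe (a LIME-style idea, not described in these notes) is to fit a simple surrogate model to the black box around a single input and read the surrogate's weights as local feature influences. A minimal NumPy-only sketch, with illustrative function names and settings:

```python
import numpy as np

def local_linear_explanation(predict, x, n_samples=500, scale=0.1, seed=0):
    """Fit a linear surrogate to a black-box model around one input x.

    The surrogate's coefficients approximate each feature's local
    influence on the prediction (a sketch of the idea, not the full
    LIME method with kernels and sparsity).
    """
    rng = np.random.default_rng(seed)
    # Perturb x with Gaussian noise to probe the model's local behavior.
    X = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = predict(X)
    # Least-squares fit of y ~ X @ w + b.
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1]  # per-feature local weights (drop the intercept)

# Toy black box: nonlinear in feature 0, linear in feature 1.
black_box = lambda X: np.sin(X[:, 0]) + 2.0 * X[:, 1]

# Near x = (0, 0), the local slopes are roughly (cos(0), 2) = (1, 2),
# so the surrogate weights should come out close to those values.
w = local_linear_explanation(black_box, np.array([0.0, 0.0]))
```

The explanation is deliberately local: the same black box would yield very different weights around a point where sin is flat.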

Discussion

Linear models are not strictly more interpretable than DNNs

Linear Models - Algorithmic Transparency ✅

Linear Models - Simulatability ❌ (High Dimensional)

Linear Models - Decomposability ❌ (Heavily Engineered features)

There's a tradeoff between algorithmic transparency and decomposability when choosing between DNNs and linear models. At least post-hoc interpretability applies to both.
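The decomposability point can be illustrated with a toy sketch (illustrative numbers, NumPy only): a low-dimensional linear model can be simulated by hand, but a linear model over heavily engineered features, while still algorithmically transparent, no longer has one weight per intelligible raw input.

```python
import numpy as np

# Three raw features: the model is simulatable, a human can
# step through the weighted sum by hand.
w_small = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
small_pred = w_small @ x  # 0.5*1 + (-1)*2 + 2*3 = 4.5

# Heavily engineered features (all pairwise products) break
# decomposability: each weight now attaches to a product of raw
# inputs rather than to any single intelligible input.
pairs = [(i, j) for i in range(3) for j in range(i, 3)]
x_eng = np.array([x[i] * x[j] for i, j in pairs])  # 6 engineered features
w_eng = np.ones(len(pairs))
eng_pred = w_eng @ x_eng  # still linear, but no weight maps to a raw feature
```

The same effect at scale (thousands of engineered features) also defeats simulatability, which is the sense in which linear models are not strictly more interpretable than DNNs.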