The Mythos of Model Interpretability
Tags: Interpretable ML, Research
This paper:
- Grounds some of the assumptions and definitions around interpretable machine learning.
- Surveys the motivations behind interpretability.
- Identifies techniques and properties that make a model interpretable.
- Assesses the feasibility and desirability of different notions of interpretability.
- Questions the assumption that "linear models are more interpretable and DNNs are not."
Introduction
- The literature suggests that interpretability is ill-defined, and that claims and technical descriptions are diverse and sometimes not on the same page.
- Papers propose Interpretability as a means to engender trust or identify causal connections in data.
- Models are trained on simplified objectives (metrics) which fail to capture the complexity of real-world goals.
- Divergence between real life and offline experimentation may also occur when there is a distributional shift between the data shown to a supervised learner and the data it sees in the deployed environment, e.g., recommendation systems, or environments where current actions alter future time steps.
- It is often suggested that human decision-making is interpretable. But is that really true? Humans can't actually explain the mechanism by which their brains work. This suggests that we often look for useful information rather than exact mechanisms (relevance information :)).
- When are models interpretable? → When we understand how they actually work, i.e., they are transparent. But what do we look at: the parameters? Properties of the algorithm itself? Model complexity (is it even possible for a human to understand the model)?
- Post-hoc interpretations are a class of methods that explain predictions without explaining how the underlying algorithm actually works, e.g., the way we explain our own thinking :)
- Irony: human brains are black boxes, yet post-hoc interpretability is useful nevertheless. We should still try to demystify models down to their core as much as possible.
Desiderata of Interpretability Research
The demand for interpretability arises when there is a mismatch between the goals we set out to solve and what an algorithm is actually optimizing. Real-world objectives are often hard to encode in simple metrics such as test-set predictive performance, e.g., ethics and legality.
Trust
- The argument is that interpretability is a prerequisite for trust. But how do we define trust?
- Trust could simply be confidence that the model will perform well with respect to the real-world objectives.
- Perhaps it could also be said that we trust a model when we are comfortable ceding decisions to it.
- We care not only about how often a model is right but also about which examples it is right on. E.g., if a model makes the same mistakes a human would, the human could still consider it trustworthy, because there is essentially no added cost to giving the model full control.
Causality
Researchers hope to identify causal relationships using supervised models, even though supervised models are trained only to predict via associations. Identifying causal relationships is not a requirement, but when possible it helps scientists pin down the important feature-output relationships.
Transferability
- The basic question is how the model performs under new conditions. Sometimes deployment itself changes the environment and invalidates the model's predictions, e.g., pneumonia patients with a history of asthma: a mortality-risk model may score them as low risk because historically they received more aggressive treatment, and acting on that score by treating them less aggressively would invalidate the prediction.
- Adversarial Environments? E.g. security.
- Shift from IID data to OOD :)
Informativeness
- Models often serve to provide useful information to human decision-makers, not just predictions; an interpretation can be valuable for the information it conveys even without revealing the model's inner workings.
Fair and Ethical Decision Making
- For consequential decisions, interpretations may be needed to check that decisions are fair and non-discriminatory, and to let people contest them.
Properties of Interpretable Models
Transparency
Simulatability
- A model is transparent if a person can contemplate the entire model at once.
- When a human can take the input data together with the parameters of the model and produce a prediction in reasonable time by stepping through every calculation.
- The definition of "reasonable" cannot be fixed, as it depends on:
- Model Size
- Computation required for inference. (Number of calculations)
- Finally, to say that "linear (or any other) models are intrinsically interpretable" is wrong. Transparency is a relative notion: it's more appropriate to say that a simpler model is more transparent than a high-dimensional one. The sketch below makes "stepping through every calculation" concrete.
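A minimal sketch of simulatability; the model, weights, and feature names are hypothetical. A human can reproduce this prediction by hand in reasonable time, which is exactly what stops being true once the model has thousands of features.

```python
# Hypothetical linear model, small enough for a human to simulate by hand.
weights = {"age": 0.03, "bmi": 0.12, "smoker": 0.85}  # made-up coefficients
bias = -2.0

def predict(x):
    # Step through every calculation: multiply each feature by its weight,
    # sum the results, then add the bias.
    score = bias
    for name, w in weights.items():
        score += w * x[name]
    return score

patient = {"age": 50, "bmi": 27.0, "smoker": 1}
print(predict(patient))  # 0.03*50 + 0.12*27.0 + 0.85*1 - 2.0 = 3.59
```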
Decomposability
- Each input, parameter, and calculation admits an intuitive explanation of why the model predicts what it does. E.g., the nodes of a decision tree, or the weights of a linear model, which can be read as the strength of the association between each feature and the label.
- This notion requires the input features themselves to be interpretable (see the sketch below).
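A minimal sketch of decomposability, assuming scikit-learn is available: each coefficient of a fitted linear model can be read on its own as the sign and strength of one feature-label association.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
model = LinearRegression().fit(data.data, data.target)

# Each (feature, coefficient) pair is individually meaningful, but only
# because the input features themselves are interpretable. With heavily
# engineered or anonymized features, the same printout would explain nothing.
for name, coef in zip(data.feature_names, model.coef_):
    print(f"{name:>6}: {coef:+.1f}")
```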
Algorithmic Transparency
- Do we understand the shape of the error surface?
- Can we prove that the algorithm converges to the optimal solution?
- Deep Learning methods lack algorithmic transparency.
- Humans do not exhibit these forms of transparency.
Post-hoc Interpretability
Extraction of information from learned models, in the form of:
- Natural language explanations
- Visualizations of learned representations or models
Text Explanations
Methods
- Train an RNN in parallel with the predictor to generate an explanation, maximizing the likelihood of the explanation text.
- Neural image captioning is a perfect example; a sketch of the joint setup follows.
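A minimal sketch of this setup, assuming PyTorch; the architecture, dimensions, and data are invented for illustration and do not follow any specific paper. A shared representation feeds both the classifier and a GRU decoder, and the joint loss adds the explanation text's negative log-likelihood to the task loss.

```python
import torch
import torch.nn as nn

class ExplainedClassifier(nn.Module):
    def __init__(self, in_dim, n_classes, vocab_size, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden)
        self.classifier = nn.Linear(hidden, n_classes)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.vocab_out = nn.Linear(hidden, vocab_size)

    def forward(self, x, expl_tokens):
        h = torch.tanh(self.encoder(x))          # shared representation
        logits = self.classifier(h)              # task prediction
        emb = self.embed(expl_tokens[:, :-1])    # teacher forcing on the text
        out, _ = self.rnn(emb, h.unsqueeze(0))   # condition the RNN on h
        return logits, self.vocab_out(out)

model = ExplainedClassifier(in_dim=16, n_classes=3, vocab_size=100)
x = torch.randn(4, 16)
expl = torch.randint(0, 100, (4, 12))            # toy explanation token ids
logits, word_logits = model(x, expl)

cls_loss = nn.functional.cross_entropy(logits, torch.randint(0, 3, (4,)))
lm_loss = nn.functional.cross_entropy(           # maximize likelihood of the text
    word_logits.reshape(-1, 100), expl[:, 1:].reshape(-1))
(cls_loss + lm_loss).backward()
```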
Visualization
- Use t-SNE to visualize high-dimensional embeddings (sketch below).
- Visualize filters in CNNs, etc.
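A minimal sketch, assuming scikit-learn and matplotlib; the digits dataset stands in for learned high-dimensional representations (e.g., activations of a hidden layer).

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()  # 64-d inputs standing in for learned embeddings
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

plt.scatter(coords[:, 0], coords[:, 1], c=digits.target, s=5, cmap="tab10")
plt.title("t-SNE projection of high-dimensional representations")
plt.show()
```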
Local Explanations
Not global explanations of the whole model, but explanations of a single prediction, e.g., with respect to a particular class. Saliency maps are a common example (sketch below).
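A minimal sketch of a gradient-based local explanation (a saliency map), assuming PyTorch; the toy model and dimensions are arbitrary. The gradient of one class's score with respect to one input explains that prediction only, not the model as a whole.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))  # toy model

x = torch.randn(1, 8, requires_grad=True)
class_score = model(x)[0, 2]   # score for one particular class
class_score.backward()         # gradient of that score w.r.t. this input

# Large-magnitude gradients mark the features this one prediction is most
# sensitive to: an explanation local to x, not global to the model.
saliency = x.grad.abs().squeeze()
print(saliency)
```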
Discussion
Linear models are not strictly more interpretable than DNNs
- Linear models: algorithmic transparency ✅
- Linear models: simulatability ❌ (when high-dimensional)
- Linear models: decomposability ❌ (when features are heavily engineered)
There's a tradeoff between algorithmic transparency and decomposability when choosing between DNNs and linear models. At the very least, post-hoc interpretation of DNNs makes sense.