Researchers at Duke University have recently introduced Concept-Whitening, a new type of neural network layer that makes a model's internal representations interpretable without hurting its predictive performance. The layer is designed to replace a batch normalization layer: besides normalizing, it also decorrelates (whitens) the latent space, that is, the space of intermediate activations in which the network encodes the features it has learned.
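To make the difference from batch normalization concrete, here is a minimal NumPy sketch. It is an illustration, not the researchers' code. `batch_norm` standardizes each latent feature on its own (the learned scale and shift are omitted), while `zca_whiten` applies ZCA whitening, one standard way to multiply centered data by the inverse square root of its covariance. The paper's exact whitening procedure may differ, and the synthetic two-feature data and all function names are assumptions chosen for illustration.

```python
import numpy as np

def covariance(a):
    # Empirical covariance of the rows of `a` (samples x features).
    ac = a - a.mean(axis=0)
    return ac.T @ ac / ac.shape[0]

def batch_norm(z, eps=1e-5):
    # Standardize each feature independently (learned scale/shift omitted).
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def zca_whiten(z, eps=1e-5):
    # Center the data, then multiply by Sigma^{-1/2} so the covariance becomes ~identity.
    zc = z - z.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(covariance(z))
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return zc @ inv_sqrt

# Synthetic latent features: the second is almost a copy of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 1))
z = np.hstack([x, 0.9 * x + 0.1 * rng.normal(size=(5000, 1))])

print(np.round(covariance(batch_norm(z)), 2))  # unit variances, but off-diagonal close to 1
print(np.round(covariance(zca_whiten(z)), 2))  # approximately the identity matrix
```

Batch normalization leaves the strong correlation in place, so one underlying factor is still spread across both axes; whitening removes it, which is what allows each axis to be given its own meaning.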
Since the early days of neural networks, the machine learning field has faced a trade-off between predictive accuracy and interpretability. The researchers experimented with ConvNets and compared their performance before and after adding a Concept-Whitening layer. In a ConvNet, the earlier layers detect edges and corners, and successive layers build on those features to detect far more complex attributes. The latent space of the model thus encodes concepts that discriminate between the classes it is meant to detect. Unfortunately, neural networks can cheat: they learn whatever features are most discriminative, even if those features are not relevant to the task at all. It is therefore vital to know what these models actually encode.
Therefore, many attempts have been made to look inside their hidden layers. In the recent past, researchers tried to interpret individual nodes of pre-trained neural networks. But nodes are not always ‘pure’: a single node often encodes a mixture of features, and information about any one concept can be scattered throughout the network. Similarly, concept vectors, i.e., vectors in the latent space chosen to align with predefined or automatically discovered concepts, have also been used. However, these methods assume that each vector encodes only one concept, which does not necessarily hold. In short, these post-hoc approaches assume the latent space has properties it may not have, and can therefore produce misleading or unusable interpretations. Concept-Whitening, featured in Nature Machine Intelligence, is a significant development because it does not rely on such assumptions: instead of probing a trained network after the fact, it shapes the latent space during training.
The concepts do not need to be the labels of the classification problem; they can be any predefined concepts that are easier to detect and interpret. The Concept-Whitening module constrains the latent space so that each target concept is aligned with its own axis. As a result, every point in the latent space can be interpreted in terms of the known concepts: its coordinate along an axis indicates how strongly the corresponding concept is present. To achieve this, the module combines two steps: whitening, which decorrelates the axes and normalizes each one, and a rotation matrix, which preserves the whitening (the rotated data stays decorrelated and normalized) while aligning each concept with an axis, so that the concepts are disentangled.
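The two steps can be illustrated with a small NumPy sketch. It is a deliberate simplification under stated assumptions: the original method learns the orthogonal rotation by optimization during training, whereas here the hypothetical helper `concept_rotation` builds it in closed form by orthonormalizing the mean whitened representation of each concept's examples with a QR decomposition. The synthetic data, in which two concepts shift the latent features along correlated directions, is also made up for illustration.

```python
import numpy as np

def covariance(a):
    # Empirical covariance of the rows of `a` (samples x features).
    ac = a - a.mean(axis=0)
    return ac.T @ ac / ac.shape[0]

def zca_whiten(z, eps=1e-5):
    # Step 1: decorrelate and normalize every axis (covariance becomes ~identity).
    zc = z - z.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(covariance(z))
    return zc @ eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

def concept_rotation(zw, concept_masks, seed=0):
    # Step 2 (hypothetical closed-form stand-in for the learned rotation):
    # an orthogonal matrix whose first columns point toward each concept's mean.
    d, k = zw.shape[1], len(concept_masks)
    directions = np.stack([zw[m].mean(axis=0) for m in concept_masks], axis=1)
    filler = np.random.default_rng(seed).normal(size=(d, d - k))
    q, r = np.linalg.qr(np.hstack([directions, filler]))
    return q * np.sign(np.diag(r))  # flip column signs so concept axes point toward concepts

rng = np.random.default_rng(1)
n, d = 3000, 4
labels = rng.integers(0, 3, size=n)  # 0 = no concept, 1 = concept A, 2 = concept B
shift = np.zeros((n, d))
shift[labels == 1, 0] = 3.0
shift[labels == 2, 1] = 3.0
mix = np.array([[1.0, 0.8, 0.0, 0.0],  # mixing matrix that correlates the features
                [0.0, 1.0, 0.5, 0.0],
                [0.0, 0.0, 1.0, 0.3],
                [0.0, 0.0, 0.0, 1.0]])
z = (rng.normal(size=(n, d)) + shift) @ mix

zw = zca_whiten(z)
q = concept_rotation(zw, [labels == 1, labels == 2])
z_cw = zw @ q  # an orthogonal rotation keeps the data whitened

print(np.round(covariance(z_cw), 2))                              # still ~identity
print(z_cw[labels == 1, 0].mean(), z_cw[labels != 1, 0].mean())  # concept A is high on axis 0
print(z_cw[labels == 2, 1].mean(), z_cw[labels != 2, 1].mean())  # concept B is high on axis 1
```

Because Q is orthogonal, its transpose times the identity times Q is again the identity, which is why the rotation cannot undo the whitening while it moves each concept onto its own axis.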
The researchers showed that a small modification, adding a Concept-Whitening module to a neural network architecture, makes it easy to visualize how the network learns each of the different concepts at any chosen layer. They could also examine how a given concept is represented at a particular layer of the network. The module provides all these benefits without hurting predictive performance.
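One simple way to inspect a concept at a given layer is to rank images by their value on that concept's axis and look at the top ones. The sketch below assumes each image has already been reduced to a single concept-whitened vector (for example by pooling a feature map); the helper `top_k_for_concept` and the toy matrix are hypothetical.

```python
import numpy as np

def top_k_for_concept(z_cw, axis, k=5):
    # Indices of the k samples with the largest activation on one concept axis.
    return np.argsort(z_cw[:, axis])[::-1][:k]

# Toy concept-whitened representations: 4 images x 2 concept axes.
z_cw = np.array([[0.1, 2.0],
                 [3.0, -1.0],
                 [1.5, 0.5],
                 [-2.0, 1.0]])
print(top_k_for_concept(z_cw, axis=0, k=2))  # [1 2]
print(top_k_for_concept(z_cw, axis=1, k=1))  # [0]
```

Repeating this ranking at different layers would show how the images that most strongly activate a concept change with depth.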
Their experiments with ConvNets revealed how complex concepts are filtered through the network: the lower layers represent them using lower-level, more abstract concepts. For instance, at an early layer, the concept "airplane" is represented by an abstract concept of white or grey objects on a blue background, while "bed" is represented by an abstract concept that seems to be characterized by warm colors (orange, yellow).
In this sense, the Concept-Whitening layer also discovers new, simpler concepts that could be formally defined and built on, if desired.