Open Access Perspective
Fraunhofer IOSB, Gutleuthausstraße 1, 76275 Ettlingen, Germany
* Author to whom correspondence should be addressed.
Academic Editor: Andreas Holzinger
Received: 7 October 2021 / Revised: 16 November 2021 / Accepted: 29 November 2021 / Published: 8 December 2021

Deep Learning is a state-of-the-art technique for making inference on extensive or complex data. Due to their multilayer nonlinear structure, Deep Neural Networks are black box models and are often criticized as being non-transparent, with predictions that are not traceable by humans. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. By basing decision-making algorithms on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to this lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of Machine Learning black boxes by analyzing the connection between the input and the output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks in Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out their drawbacks and gaps and summarize further research ideas.

1. Introduction

Artificial Intelligence (AI)-based technologies are increasingly being used to make inference on classification or regression problems: automated image and text interpretation in medicine, insurance, advertisement, public video
surveillance, job applications, or credit scoring save staff and time and are successful in practice. The severe drawback is that many of these technologies are black boxes whose results can hardly be understood by the user. Quality assurance and the measurability of the safety and reliability of explainers have not been sufficiently researched yet. Das and Rad [1] advised against blindly trusting the results of even a highly predictive classifier, by today's standards, due to the strong influence of data bias, trustability, and adversarial examples in Machine Learning (ML). These findings are being observed more and more, and consequently, explainable ML is now applied to some questions, for example to explore COVID-19
[2,3,4]. Recent models are more complex, and Deep-Learning (DL) architectures are becoming deeper and deeper, comprising millions of parameters. As a consequence, the classification process is barely comprehensible by humans. Metrics such as accuracy or the mean average precision depend on the quality of hand-annotated data; however, these metrics are often the only values used to evaluate the learning algorithm itself. In recent years, several weaknesses of Deep-Learning models have been found (even high-performing models are affected). For instance, object detectors can be fooled easily by applying small changes to input images
or by creating artificial adversarial examples [5,6]. Such attacks could reveal that supposedly good models are not robust or focus on less relevant features for classifying objects; see the examples in the following. The problem is that the neural network learns only from the training data, which should characterize the task.
However, suitable training data are tedious to create and annotate; hence, they are not always perfect. If there is a bias in the training data, the models will learn it as well. Some exemplary attacks on neural networks follow: A dog-or-wolf classifier turned out to be just a good snow detector [7] because of a bias in the background of the training images. The reason was that most of the photos for the training set of wolves were taken on days with snowy weather, while the dog images were shot in landscapes without snow. There are several cases that underline the negative characteristics of a DNN: changing only one pixel in an input image or one letter in a sentence, or even adding small-magnitude perturbations, could change the prediction
[8,9,10]; see also Figure 1 (we have obtained the necessary copyright permission to reprint this and all other figures in this work). Adversarial examples
with serious impact exist: stickers fixed on road signs [11] or extended adversarial patch attacks on optical flow networks [12] could lead to dangerous misinterpretations. Specially crafted glasses can confuse face detectors by imitating other persons
[13,14]. Additional examples are depicted in Figure 2, Figure 3 and
Figure 4. All these cases show how harmful it could be to rely on a black box with supposedly good performance results but without quality control. However, currently applied DNN methods and models are such vulnerable black boxes. Since the European Union's new General Data Protection Regulation (GDPR) came into force in May 2018, it has restricted the use of Machine Learning and automated individual decision-making and focuses on the protection of sensitive personal data such as age, sex, ancestry, name, or place of residence. The GDPR also imposes a data quality requirement in the AI area because the quality of the training datasets is decisive for the outcome. If a result affects users, they should be able to demand an explanation of the model's decision that was made about them
[15]. The explanation has to be transmitted in a precise, transparent, understandable, and easily accessible form and in clear and simple language. For example, if a doctor makes a mistake, the patient wants to know why. Was the mistake excusable, or did it result from negligence, intent, or other factors? Similarly, if a model fails and contributes to an adverse clinical event or malpractice, doctors need to be able to understand why it produced the result and how it reached a decision. The author of [16] proposed ten commandments as practical guidelines for ethical medical AI. Individual humans' decisions are not free from prejudice, but it should be ensured that a model for a selection procedure, e.g., for job proposals or suspended sentences, does not discriminate against people because of sex, origin, etc. Disadvantages can arise for individuals, groups of persons, or a whole society, and the prevailing conditions can thereby deteriorate more and more, for instance due to a gender word bias [17]. An interesting example is the word embeddings [18] that a Natural
Language Processing (NLP) algorithm creates from training datasets, which are just any available texts of a specific origin [19]. Discrimination against women, the disabled, black people, etc., seems to be deeply anchored in these texts because of the authors' general or specific prejudices, with the serious consequence that the model learns this to be real. This could reinforce discrimination and unfairness. Implicit Association Tests [20] have uncovered stereotype biases that people are not even aware of. If the model supposes, as studies [18,21,22] indicate, that doctors are male and nurses are female, and furthermore that women are sensitive and men are successful, it will sort out all women who apply for a chief physician position only because of their sex, without checking their qualifications. If, in the training data, foreigners predominantly have less income and higher unemployment, an automatic credit scoring model will suggest a higher interest rate or even refuse the request only because of the applicant's origin and without considering the individual financial situation. In summary,
this means that models trained on contaminated data may spread prejudice, and discrimination and unfairness are able to progress.

Figure 2. Reference [23] created artificial images that were unrecognizable by humans, but a state-of-the-art classifier was very confident that they were known objects. Images are either directly (top) or indirectly (bottom) encoded.

Figure 3. Texture–shape cue conflict [24]: texture (left) is classified as elephant; content (middle) is classified as cat; texture–shape (right) is classified as elephant because of a texture bias.

Figure 4. Reference [24] showed that Convolutional Neural Networks focus more on texture than on shape and that it is a good idea to improve the shape bias to obtain more reliable results, in the way humans would interpret the content of an image.

1.2. Contribution

Gaining understanding of and insight into the models should uncover the named problems. Properties of a model such as transparency and interpretability are fundamental to building provider trust and fairness. If this succeeds, the causes of discrimination and serious mistakes can additionally be prevented. There is the opportunity to improve society by making automated decisions free of prejudice. Our contribution on the way to achieving this goal is to give an overview of state-of-the-art explainers with regard to their taxonomy by differentiating their mechanisms and technical properties. We did not limit our work to explaining methods, but also looked at the meaning of understanding a machine in general.
To our knowledge, this is the first survey paper that focuses mainly on ML black box DNNs for Computer Vision tasks while also summarizing some other related ML areas.

2. Overview of Explaining Systems for DNNs

We give a short introduction to the early approaches to explaining the inner operations of Machine Learning. After that, we focus on understanding DNNs.

2.1. Early Machine Learning Explaining Systems

Early explaining systems for ML black boxes go back to 1986 with Generalized Additive Models (GAMs) [25]. GAMs are global statistical models that use smooth functions, which are estimated using a scatterplot smoother. The technique is applicable to any likelihood-based regression model, provides a flexible method for identifying nonlinear covariate effects in exponential family models and other likelihood-based regression models, and has the advantage of being completely automatic. In its most general form, the algorithm can be applied to any situation in which a criterion is optimized involving one or more smooth functions. Later, Decision Trees were shown to be successful classification tools that provide individual explanations [26,27]. A Decision Tree is a tree-like graph of decisions and their possible consequences, which visualizes an algorithm that only contains condition control statements. Classification begins with the application of the root node test, whose outcome determines the branch to a succeeding node; interior nodes are tests applied to instances during classification, and branches from an interior node correspond to the possible test outcomes. The process is applied recursively until a leaf node is reached. Finally, the instance is labeled with the class of the leaf node [28]. Another approach [29] shows the marginal effect of one or two features on the prediction of learning techniques using Partial Dependence Plots (PDPs).
The method gives a statement about the global relationship between a feature and the outcome, i.e., whether the relation is linear, monotonic, or more complex. The PDP is the average of the Individual Conditional Expectation (ICE) curves over all instances; ICE [30] shows how the prediction changes if a feature changes. For visualization, the PDP is limited to at most two features. Another example is the early use of Explainable AI (XAI) [31] in a simulator game for the commercial platform training aid Full Spectrum Command (FSC), developed by the USC Institute for Creative Technologies and Quicksilver Software, Inc., for the U.S. Army; it was designed not for entertainment, but as a training tool to achieve a targeted training objective. FSC includes an XAI system that allows the user to ask any subordinate soldier questions about the current behavior of the platoon during the after-action review phase of the game. In addition, the XAI system identifies key events to be highlighted in the after-action review. It was motivated by previous work such as [32], a rule-based expert system that uses early AI techniques and a model of the interaction between physicians and human consultants to attempt to satisfy the demands of a user community. The procedure proposed in [33] is based on a set of assumptions that allows one to explain why the model predicted a particular label for a single instance and which features were most influential for that instance, for several classification models using Decision Trees. The framework provides local explanation vectors as class probability gradients, which yield the relevant features at every point of interest in the data space. For models where such gradient information cannot be calculated explicitly, the authors employed a probabilistic approximate mimic of the model to be explained.
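The PDP/ICE idea described above can be sketched in a few lines of NumPy. The toy model, grid, and data below are illustrative assumptions, not part of [29,30]: ICE varies one feature over a grid per instance, and the PDP averages those curves.

```python
import numpy as np

def ice_curves(model, X, feature, grid):
    """Individual Conditional Expectation: for each instance, vary one
    feature over a grid and record the model's prediction."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v          # overwrite the feature of interest
        curves[:, j] = model(X_mod)    # prediction per instance
    return curves

def partial_dependence(model, X, feature, grid):
    """PDP = average of the ICE curves over all instances."""
    return ice_curves(model, X, feature, grid).mean(axis=0)

# Toy black box: nonlinear in feature 0, linear in feature 1.
model = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.linspace(-3, 3, 25)
pdp = partial_dependence(model, X, feature=0, grid=grid)
```

Plotting `pdp` against `grid` would reveal the sinusoidal global relationship of feature 0, exactly the kind of statement PDPs are meant to provide.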
We do not comment further on early studies of explaining systems for Random Forests [34], Naive Bayesian classifiers [35,36,37], Support Vector Machines (SVMs) [38,39], or other early Machine Learning prediction methods [40].

2.2. Methods and Properties of DNN Explainers

In the last few years, the importance of DNNs in inference tasks grew rapidly, and with the increasing complexity of these models, the need for better explanations grew as well. The most commonly used DNNs for image or video processing [41,42,43] are Convolutional Neural Networks (CNNs) [44]; for videos [43] or sequences of text, Recurrent Neural Networks (RNNs) [45]; and especially for language modeling [46], Long Short-Term Memory networks (LSTMs) [47]. There exist some general surveys of methods for explaining several Machine Learning black boxes, so-called XAI, that cover a wide spectrum of AI-based black boxes; for instance, see [48,49]. However, we focus only on the black box DNNs and deepen the insights, especially in Computer Vision. In the following, we describe the methods, algorithms, and tools of state-of-the-art explainers for these tasks. A systematic process for gaining knowledge is described as a method, and a utensil in this process is a tool. An algorithm is a finite sequence of well-defined instructions, typically used to solve a class of specific problems or to perform a computation in the method. The problem is that some explaining methods are in turn black boxes. White box explainers use methods that gain insights and show all causal effects, for instance linear regression or Decision Trees. Black box explainers do not require access to the internals and do not disclose all feature interactions. There are mainly two kinds of explaining methods: ante hoc and post hoc ones.
They can also be split into global and local methods, as well as into model-agnostic and model-specific ones.
In general, it can be said that an ante hoc, global, and model-agnostic system is superior to a post hoc, local, and model-specific one. Now, we investigate the employed algorithms of explainers:
Further, we considered important tools:
Reference [64] defined key terms such as “explanation”, “interpretability”, and “explainability” philosophically. We want to emphasize that interpretability and explainability are not the same, although they are often used interchangeably. Explainability arises from a set of connected inference rules and includes interpretability, but not vice versa. An explanation is one of several purposes of empirical research. It is a way to uncover new knowledge and to report relationships among different aspects of the studied phenomena. An explanation attempts to answer the “why” and “how” questions and has variable explanatory power [65]. The quality of an explanation is determined by the person receiving it. An explanation is much more concrete than an interpretation: facts can be described with words or formulas, while an interpretation is just a mental construct that arises in the head. We also describe the difficulties of balancing interpretability and completeness, between which a compromise is needed; for more details, see the following. We compiled the most important definitions of the properties of explainers and their obvious connections in a more technical way:
Other aspects of data-mining and Machine-Learning models mentioned in the literature, in our eyes less central, are fidelity, trust, monotonicity, usability, reliability, causality, scalability, and generality. We give a short overview of the definitions of [48,69]: Fidelity measures how exactly an interpretable model imitates the behavior of a black box. It is measured in terms of the accuracy score, but with respect to the outcome of the black box, similarly to the model accuracy. Another property is trust; its degree increases if the model is built respecting the constraints of monotonicity given by the user [70]. Usability influences the trust level of a model because users tend to trust models that provide information assisting them in accomplishing a task with awareness; that is why a queryable and interactive explanation is more usable than a fixed, textual one. Reliability means that a model should maintain certain levels of performance independently of small variations of the input data or the parameters. Causality is the ability to control how changes in the input due to a perturbation affect the model behavior. Further, Big Data require scalability, as it is opportune to have models able to scale to large input data with large input spaces. Finally, generality means that one can use the same model with different data in different application scenarios; it is preferable to have portable models that do not require special training regimes or restrictions. We give some examples of explaining approaches that can be placed in the last point: a global ante hoc method for tabular data is the Bayesian Rule List (BRL) [71,72].
The BRL is a generative model that yields a posterior distribution over possible decision lists, which consist of a series of if-then statements that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. The “if” statements define a partition of a set of features, and the “then” statements correspond to the predicted outcome of interest. According to the authors, their experiments showed that the BRL has predictive accuracy on par with the current top algorithms for prediction in Machine Learning. The BRL can be used to produce highly accurate and interpretable medical scoring systems. The work was developed from preliminary versions that used a different prior, called the Bayesian List Machine [73], a generative model for fitting decision lists, a type of interpretable classifier, to the data. Similar to DeepRED [74], the rule generation “KnowledgeTron” (KT) [75] applied an if-then rule for each neuron, layer by layer, to obtain a formalized relationship between a DNN and a rule-based system. The extraction of rules is important because it allows interpreting the network’s knowledge and can regularize the network and prevent it from overfitting the data. Another option in this field is to decompose a DNN into Decision Trees, e.g., TREPAN [76] or DeepRED. TREPAN is an algorithm for extracting comprehensible, symbolic representations from trained neural networks. It queries a given network to induce a Decision Tree that describes the concept represented by the network. The authors demonstrated that TREPAN is able to produce Decision Trees that are accurate and comprehensible and maintain a high level of fidelity to the networks from which they were extracted. According to the authors of DeepRED, their method was the first attempt to extract rules and make a DNN’s decision more transparent.
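To illustrate the kind of interpretable artifact a rule-based explainer such as the BRL produces (leaving aside its Bayesian posterior inference), a decision list is just an ordered sequence of if-then statements with a default fall-through. The rules and feature names below are invented for illustration, not learned from data:

```python
# A decision list: ordered if-then rules, falling through to a default.
# The rules here are hypothetical; BRL would infer them (and a posterior
# over candidate lists) from training data.
rules = [
    (lambda x: x["age"] < 40 and not x["smoker"], "low risk"),
    (lambda x: x["blood_pressure"] > 140,         "high risk"),
]
default = "medium risk"

def predict(x):
    for condition, outcome in rules:
        if condition(x):          # the first matching rule wins
            return outcome
    return default

patient = {"age": 35, "smoker": False, "blood_pressure": 120}
print(predict(patient))  # prints "low risk"
```

The ordering matters: each rule only explains the instances not already covered by an earlier rule, which is what makes such lists compact and readily interpretable.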
Decision Trees have been used since the 1990s to explain Machine-Learning tasks, but applied to DNNs, their generation is quite expensive, and the comprehensibility suffers from the necessarily increasing size and number of trees. Decision Trees are able to explain a model completely, but with DNNs, this conflicts with comprehensibility. The problem is that incomprehensible Decision Trees are presumably no more explanatory than the original DNN. Furthermore, the inability to encode loops makes it difficult to explain most algorithms, let alone heuristics encoded by recurrent DNNs.

2.3. Selected DNN Explainers Presented

Let us give a technical overview of the selected explanator models; we focus on the category of Computer Vision. The Counterfactual Impact Evaluation (CIE) method [77,78] is a local method of comparison for different predictions. Counterfactuals are contrastive: they explain why a decision was made instead of another. A counterfactual explanation of a prediction may be defined as the smallest change to the feature values that changes the prediction to a predefined output. Counterfactuals can be employed for DNNs with any data type. A famous work that is often associated with visualizing and understanding Convolutional Neural Networks is DeconvNet [51]: DeconvNet is a backward convolutional network that reuses the weights at each layer from the output layer back to the input image. The employed mechanisms are deconvolution and unpooling, which are especially designed for CNNs with convolutions, max-pooling, and Rectified Linear Units (ReLUs). The method makes it possible to create feature maps of an input image that activate certain hidden units most, linked to a particular prediction; see Figure 5. With their propagation technique, the authors identified the patterns most responsible for this output. The patterns are visualized in the input space. DeconvNet is limited to max-pooling layers, and the unpooling uses only an approximate inverse.
A particular theoretical criterion that could directly connect the prediction to the created input patterns is missing. To close that gap, Reference [55] proposed a new and efficient way following the initial attempt of Zeiler and Fergus. They replaced max-pooling by a convolutional layer with an increased stride and called the method the All Convolutional Net; the performance on image recognition benchmarks remained similar. With this approach, they were able to analyze the neural network by introducing a novel variant of DeconvNet to visualize the concepts learned by higher network layers of the CNN. The problem with max-pooling layers is that they are not invertible in general. That is the reason why Zeiler and Fergus computed the positions of the maxima within each pooling region and used these “switches” in DeconvNet for a discriminative reconstruction. Not using max-pooling, Springenberg et al. could directly display learned features without conditioning on an image. Furthermore, for higher layers, they produced sharper, more recognizable visualizations of descriptive image regions than previous methods. This is in agreement with the fact that higher layers learn more invariant representations. Nie et al. [79] criticized that DeconvNet and methods such as Guided Backpropagation (GBP) generate visualizations that are more human-interpretable, but less class-sensitive, than the saliency map. These methods carry out only a partial image reconstruction, neither emphasizing class-relevant pixels nor presenting the learned weights, and thus provide no added explanatory value. In their work, the authors began with a random three-layer CNN and later generalized the analysis to more realistic cases. They explained that both GBP and DeconvNet essentially perform partial image recovery, which verifies their class-insensitive properties, unlike the saliency map. In addition, DeconvNet relies on max-pooling to recover the input. Nie et al.
revealed that it is the backward ReLU, used by both GBP and DeconvNet, along with the local connections in CNNs, that is responsible for the human-interpretable visualizations. Finally, the authors concluded that, in principle, DeconvNet and GBP are unrelated to the decision-making of neural networks. Yosinski et al. [61] introduced two tools to aid the interpretation of DNNs in a global way. First, they displayed the neurons’ activations produced at each layer of a trained CNN processing an image or sequence of images. They found that looking at live activations that change in response to the input images helps build valuable intuitions about the inner mechanisms of these neural networks. The second tool was built on previous versions that calculated less recognizable images. Several novel regularization methods, used in combination, produced qualitatively clearer and more interpretable visualizations and enabled plotting the features at each layer via regularized optimization in the image space. A propagation-based method is Layerwise Relevance Propagation (LRP) [53]; see Figure 7. Unlike gradient-based approaches, which suffer from the shattered gradients problem, it relies on a conservation principle to propagate the outcome decision back without using gradients. The idea behind it is a decomposition of the prediction function as a sum of layerwise relevance values. When LRP is applied to deep ReLU networks, it can be understood as a deep Taylor decomposition of the prediction. The conservation principle ensures that the prediction activity is fully redistributed through all the layers onto the input variables. For more about how to explain nonlinear classification decisions with deep Taylor decomposition, see [54]. The authors decomposed the network classification decision into the contributions of its input elements and assessed the importance of single pixels in image classification tasks.
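The conservation idea behind LRP can be sketched for a tiny bias-free ReLU network using the epsilon-rule. This is a simplified illustration of the principle (network weights and sizes are arbitrary assumptions), not the authors' implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def lrp_epsilon(Ws, x, eps=1e-6):
    """LRP epsilon-rule for a bias-free ReLU MLP with a linear output:
    the output score is redistributed layer by layer, so that the total
    relevance is (approximately) conserved down to the input."""
    # Forward pass, keeping all activations.
    acts = [x]
    for W in Ws[:-1]:
        acts.append(relu(W @ acts[-1]))
    acts.append(Ws[-1] @ acts[-1])         # linear output layer
    # Backward pass: start with the output as the total relevance.
    R = acts[-1].copy()
    for W, a in zip(reversed(Ws), reversed(acts[:-1])):
        z = W @ a                           # pre-activations
        s = R / (z + eps * np.sign(z))      # stabilized relevance ratio
        R = a * (W.T @ s)                   # redistribute to the layer below
    return R                                # inputwise relevances

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(4, 6)), rng.normal(size=(1, 4))]
x = rng.normal(size=6)
R = lrp_epsilon(Ws, x)
```

Summing `R` recovers the network output up to the epsilon stabilizer, which is exactly the conservation property the text describes.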
Their method efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer and displaying the connections in heat maps. Reference [60] investigated different methods to compute heat maps in Computer Vision applications. They concluded that layerwise relevance propagation (e.g., LRP) is qualitatively and quantitatively superior in explaining what made a DNN arrive at a particular classification decision compared to sensitivity-based approaches or deconvolution methods [51]. The inferior methods were much noisier and less suitable for identifying the most important regions with respect to the classification task. Their work did not answer how to make a more detailed analysis of the prioritization of image regions or how to quantify the heat map quality. Reference [80] criticized that the explaining methods DeconvNet, Guided Backpropagation, and LRP do not produce the theoretically correct explanation even for a linear model and that their contributions to understanding are therefore scarce. Based on an analysis of linear models (see also [54,81]), they proposed a generalization that yielded the two neuronwise explanation techniques PatternNet (for signal visualization) and PatternAttribution (for decomposition methods) by taking the data distribution into account. They demonstrated that their methods are sound and constitute a theoretical, qualitative, and quantitative improvement towards explaining and understanding DNNs. Furthermore, a global and ante hoc model is the joint framework for description and prediction presented by [82]. The model creates Black box Explanations through Transparent Approximations (BETAs).
It learns a compact two-level decision set in which each rule explains parts of the model behavior unambiguously and uses a combined objective function to optimize the following aspects: high agreement between the explanation and the model; little overlap between the decision rules in the explanation; a lightweight and small explanation decision set. An interpretable end-to-end explainer for healthcare is the REverse Time AttentIoN mechanism RETAIN [83] for application to Electronic Health Record (EHR) data. The approach mimics physician practice by attending to the EHR data. Two RNNs are trained in reverse time order with the goal of efficiently generating the appropriate attention variables. It is based on a two-level neural attention generation process that detects influential past visits and significant clinical variables to improve accuracy and interpretability. Another technique was realized by [59]. To find prototype class members, they created input images that had the highest probability of being predicted as certain classes by a trained CNN (Figure 8). Their tools were Taylor series, based on partial derivatives, to display input sensitivities in images. A few years later, Reference [58] developed this idea further by synthesizing the preferred inputs for neurons in neural networks via deep generator networks for activation maximization. The first algorithm is the generator and creates synthetic prototype class members that look real. The second algorithm is the black box classifier of the artificial image, whose classification probability should be maximized. For the prototype images, see Figure 9. Another related derivative-based method is DeepLift [84]. It propagates activation differences instead of gradients through the network. Partial derivatives do not explain a single decision, but point to what change in the image could make a change in the prediction. Reference [85] showed that some convolutional layers behave as unsupervised object detectors.
They used global average pooling and created heat maps of a pre-softmax layer that pointed out the regions of an image that were responsible for a prediction. The method is called Class Activation Mapping (CAM). Built on this is Gradient-weighted Class Activation Mapping (Grad-CAM) [86] (see Figure 10), which is applicable to several CNN model families and tasks: classification, image captioning, visual question answering, reinforcement learning, or re-training. Grad-CAM explains an outcome decision by using the gradient information to understand the importance of each neuron in the last convolutional layer of the CNN. The Grad-CAM localizations are combined with existing high-resolution visualizations to obtain guided Grad-CAM visualizations that are both high-resolution and class discriminative. On the methods CAM and Grad-CAM was built Grad-CAM++ [87], which gives human-interpretable visual explanations of CNN-based predictions for multiple tasks such as classification, image captioning, or action recognition. Grad-CAM++ combines the positive partial derivatives of the feature maps of a late convolutional layer with a weighted class score. The method can provide better visual explanations of CNN predictions, in particular better object localization, and explains by considering the occurrence of multiple object instances in an image. Marking the pixels or pixel regions of an image most responsible for a particular class prediction is a promising way to increase human understanding. The approach of [88] focuses on single words of a caption generated by an RNN and highlights the region of the image that is most important for each word; see Figure 11. It displays a visualization of an attention map as a highlighted region for the word “dog” of the associated image caption “A dog is standing on a hardwood floor”.
The underlying mechanism relies on a combination of an RNN and cross-attention, trained for the task of generating a caption for the given image. Corresponding data are required for such a task, i.e., images paired with their associated captions. The explanation is right although the caption of the classifier is not totally correct, because the dog is lying on the floor and not standing. An explainer just explains a given prediction or a certain part of it, independently of whether it is wrong or not. A person can use it to check various criticisms, but it is not built to automatically verify or rate a model, to detect bias, to identify adversarial attacks, or to debug the model. Much more general is Local Interpretable Model-agnostic Explanations (LIME), presented by [7], which can explain the predictions of any black box classifier on any data; see Figure 6. It is a post hoc, local model that is interpretable and model-agnostic. LIME focuses on feature importance and gives outcome explanations: it highlights the superpixels of the regions of the input image, or the words from a text or table, that are relevant for the given prediction. While the model complexity is kept low, the model minimizes the distance of the explanation to the prediction. This ensures a trade-off between interpretability and local fidelity. A challenge for transparency is that LIME is itself a black box, and Reference [89] pointed to the poor performance of LIME regarding their proposed evaluation metrics of correctness, consistency, and confidence in comparison to the other considered explainers Grad-CAM, Smooth-Grad, and IG (described in the following). Furthermore, one can only explain images that can be split into superpixels. The authors did not describe how to explain video object detection or segmentation networks. An interesting approach is a prototype evaluator built on LIME: the Submodular Pick (SP-LIME) model judges whether one can trust the whole model or not.
It selects a diverse, representative set of instances explained with LIME via submodular optimization. The user then evaluates the black box by inspecting the feature words of the selected instances. It is conceivable that this also reveals bias or a systematic susceptibility to adversarial examples, and with this knowledge a bad model can be improved. SP-LIME was studied on text data, but the authors claimed that it transfers to models for any data type. Another approach that focuses on the most discriminative region in an image to explain an automatic decision is Deep Visual Explanation (DVE) [90]; see Figure 12. Inspired by CAM and Grad-CAM, the authors tested their explanator on randomly chosen images from the COCO dataset [91], applied to the pre-trained VGG-16 network, using the Kullback–Leibler (KL) divergence [92]. They captured the discriminative areas of the input image by considering the activation of high and low spatial scales in Fourier space. With their conditional multivariate model, Prediction Difference Analysis (PDA), Reference [93] concentrated on explaining visualizations of natural and medical images in classification tasks, with the goal of improving and interpreting DNNs. Their technique builds on the univariate approach of [94] and the idea that the relevance of an input feature with respect to a class can be estimated by measuring how the prediction changes when the feature is removed. Zintgraf et al. removed several features at a time, exploiting their knowledge about images by strategically choosing patches of connected pixels as the feature sets. Instead of going through all individual pixels, they considered all patches of a fixed size in a sliding-window fashion. They visualized the effects of different window sizes and of marginal versus conditional sampling, and displayed feature maps of different hidden layers and top-scoring classes.
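A minimal occlusion-style sketch of this prediction-difference idea follows: slide a patch over the image, blank out the covered pixels, and record how much the black-box score drops. A large drop marks a region the prediction depends on. The names and the constant fill value are illustrative simplifications; PDA proper marginalizes each patch with a conditional distribution instead of a fixed fill.

```python
import numpy as np

def occlusion_map(predict, image, patch=2, fill=0.0):
    """Sliding-window occlusion sketch of prediction-difference analysis.
    predict: black box returning a scalar score for a 2D image.
    Each pixel's heat is the average score drop over all patches covering it."""
    h, w = image.shape
    base = predict(image)
    heat = np.zeros_like(image, dtype=float)
    count = np.zeros_like(image, dtype=float)
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill  # remove the feature set
            drop = base - predict(occluded)            # relevance of the patch
            heat[i:i + patch, j:j + patch] += drop
            count[i:i + patch, j:j + patch] += 1
    return heat / count

# toy black box: the score is just the brightest pixel's value
predict = lambda img: float(img.max())
img = np.zeros((6, 6)); img[2, 3] = 1.0
heat = occlusion_map(predict, img)
assert np.unravel_index(np.argmax(heat), heat.shape) == (2, 3)
```

Varying `patch` reproduces the window-size effect the authors visualize: larger windows give smoother but coarser relevance maps.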
Reference [95] described Smooth-Grad, which reduces visual noise and, hence, improves visual explanations of how a DNN reaches a classification decision. In contrast to gradient-based sensitivity map methods such as LRP, DeepLift, and Integrated Gradients (IG) [96], which estimate the global importance of each pixel and create saliency maps, Smooth-Grad focuses on local sensitivity: it averages the sensitivity maps computed from several slightly perturbed copies of an input image, which has a smoothing effect. The effect is enhanced by further training with these noisy images, which ultimately sharpens the sensitivity maps. The work of [89] evaluated the explainers LIME, Grad-CAM, Smooth-Grad, and IG with regard to the properties correctness, consistency, and confidence and came to the result that Grad-CAM often performs best. Multimodal Explanation (ME) [97], a local, post hoc approach, gives visual and textual justifications of predictions, trained with the help of two novel explanation datasets collected through crowd-sourcing. The employed tasks were classification decisions for activity recognition and visual question answering; see Figure 13. The visual explanation is created by an attention mechanism that conveys which region of the image was important for the decision. This explanation guides the generation of the textual justification from an LSTM feature, posed as a classification problem over all possible justifications. A new and extensively visualized approach was created by [98] (see Figure 14), showing what features a Deep Learning model has learned and how those features interact to make predictions. Their model, called Summit, combines two scalable tools: (1) activation aggregation discovers important neurons; (2) neuron-influence aggregation identifies relationships among such neurons.
An attribution graph is created that reveals and summarizes crucial neuron associations and substructures contributing to a model's outcomes. Summit combines established methods such as computing synthetic prototypes of features and showing examples from the dataset that maximally activate particular neurons in different layers. Deeper in the graph, it examines how low-level features combine into high-level features. Novel as well is the exploration of neural networks with activation atlases [63]; see Figure 15. This method uses feature inversion to visualize millions of activations from an image classification network and builds an explorable activation atlas of the features the network has learned. The approach is able to reveal visual abstractions within a model and even high-level misunderstandings that could be exploited. Activation atlases are a novel way to peer into convolutional vision networks and represent a global, hierarchical, and human-interpretable overview of the concepts within the hidden layers. In the last few months, the importance of interpretability has been growing, which is why several individual studies have appeared that investigate the contribution of single aspects of a neural network, such as the impact of color [99] or texture [24], without explaining a whole model extensively; nevertheless, they all contribute to a deeper understanding. In Table 1, we give an overview of the presented explainers of DNNs, sorted by date and year. The main techniques and properties are mentioned for a brief comparison; the property model-agnostic is abbreviated as agn. To find a suitable model, one can first orient oneself by the type of data and the desired method. It makes sense to choose the further developed variant in each case, e.g., the All Convolutional Net instead of DeconvNet, or Grad-CAM instead of LIME, IG, or Smooth-Grad.
2.4. Analysis of Understanding and Explaining Methods
We also wanted to consider studies on the general analysis of explainability and machine understanding, and thus reviewed several survey papers. With an overview of the interpretability of Machine Learning, Reference [64] tried to explain explanations. They defined key terms and reviewed a number of approaches towards classical explainable AI systems, also focusing on Deep Learning tasks. Furthermore, they investigated the role of single layers, individual units, and representation vectors in the explanation of deep network representations. Finally, they presented a taxonomy that examines what is being explained by these explanations. They concluded that it is not obvious what the best purpose or type of explanation metric is or should be, and advised combining explaining ideas from different fields. Another approach towards understanding is to evaluate the human interpretability of explanations [110]. The authors investigated the consistency of the output of an ML system with its input and the supposed rationale. To this end, they carried out user studies to identify which kinds of increases in complexity have the most dominant effect on the time humans need to verify the rationale, and to which kinds humans are more insensitive. Their study quantified which kinds of explanations are the most understandable by humans. As a main result, they found that, in general, greater complexity results in higher response times and lower satisfaction. Even simple interpretable explainers mostly do not quantify whether the user can trust them. A study on trust in black box models and post hoc explanations [111] laid out the main problems in the literature and the kinds of black box systems.
They evaluated trust in three respects in a within-subject design study: (1) the users' initial trust, (2) the users' trust in the explanations provided by three different post hoc explanator approaches, and (3) the trust established in the black box. The participants were asked whether they trust that a particular model works well in the real world, whether they suppose it is able to distinguish between the classes, and why. The results led to the conclusion that, although the black box prediction model and the explanation are independent of each other, trusting the explanation approach is an important aspect of trusting the prediction model. A discussion of some existing approaches to perturbation analysis was given in the study of [112]; see Figure 16. Their work, based on [103], found an adversarial effect of perturbations on the network's output. Extremal perturbations are regions of an input image that maximally affect the activation of a certain neuron in a DNN. Measuring the effect of perturbations of the image is an important family of attribution methods. Attribution aims at characterizing the response of a DNN by determining which parts of the network's input are most responsible for its prediction, mostly via several kinds of backpropagation. Fong et al. investigated the effect of perturbations as a function of their area. In particular, they visualized the difference between perturbed and unperturbed activations using a representation inversion technique, and they introduced TorchRay [113], a PyTorch interpretability library. Fan et al. [114] reviewed the key ideas, implications, and limitations of existing interpretability studies and proposed a comprehensive taxonomy for interpretation methods with a focus on medicine.
To overcome the vulnerabilities of existing deep reconstruction networks, while at the same time transferring the interpretability of model-based methods to hybrid DNNs, they used their recently proposed ACID framework, which allows a synergistic integration of data-driven priors and Compressed Sensing (CS)-modeled priors. They visualize results from their own implementation and link to relevant open-source code on GitHub; see [115]. Finally, they concluded that a unified and accountable interpretation framework is critical to elevate interpretability research to a new level. In their work, Burkart and Huber [116] offered a formalization of different explanation approaches and a review of the corresponding classification and regression literature for the entire explanation chain. They gave reasons for explainability and its assessment and introduced example domains demanding XAI. Concepts and definitions were worked out. The main part describes surrogate models, where the explanation is inferred directly from the black box model. In addition, the authors specified approaches that can generate an explanation directly. Various aspects of the data, data quality, and ontologies were highlighted as well. The conclusion of this extensive study was that the most one can strive for when giving an explanation is a sort of human-graspable approximation of the decision process. This is why all sorts of environmental conditions play a role in an explanation, as the person telling the story seeks to build trust in and understanding of her/his decision. Another paper [117] presented an overview of interpretable approaches and defined new characteristics of interpretable ML, for example privacy, robustness, causality, or faithfulness. Related open-source software tools help to explore and understand the behavior of ML models and to describe enigmatic and unclear ones.
These tools assist in constructing interpretable models. They include a variety of interpretable ML methods that help people understand the connection between input and output variables through interpretation, and that validate the decisions of a predictive model, enabling lucidity, accountability, and fairness in algorithmic decision-making policies.
2.5. Open Problems in Understanding DNNs and Future Work
When summarizing the functionality of explainers, one notices that some aspects are hard to measure. First, the time needed to understand a decision is difficult to obtain. Locally working explainers can deliver a root cause for each prediction, but how many examples does one need to look at to be sure that all results, and thereby the black box, are faithful? In addition, the model complexity of the various explainers differs. Complexity is often calculated as a term opposed to interpretability; the complexity of a black box can be expressed, for instance, as the number of non-zero weights in a neural network or the depth of a decision tree. However, the complexity of the explanation may depend on the complexity of the black box. More work must also be performed on data exploration: the interpretable data in the mentioned papers are mainly images, texts, and tabular data, which are all easily interpretable for humans. Missing is more general data such as vectors, matrices, or complex spatiotemporal data; of course, these have to be transformed before analysis to be understandable for our brains. Sequences, networks, etc., can be the inputs to a black box, but until now, such models have not been explained. Furthermore, there is no agreement on how to quantify the quality and grade of an explanation. These are open problems. It is also a challenge to develop suitable vocabularies for the explainer and to align them with the semantics and vocabulary of the domain.
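The complexity proxies mentioned above can be made concrete. A minimal sketch, with illustrative names, of the "number of non-zero weights" measure for a neural network:

```python
import numpy as np

def nonzero_weight_count(weight_matrices, tol=1e-8):
    """One simple black-box complexity proxy from the text: count the
    weights of a network whose magnitude exceeds a small tolerance.
    weight_matrices: list of per-layer weight arrays (illustrative name)."""
    return int(sum(np.sum(np.abs(W) > tol) for W in weight_matrices))

# toy two-layer network: 3 non-zero weights in layer 1, 1 in layer 2
layers = [np.array([[0.5, 0.0], [1.2, -0.3]]), np.array([[0.0, 2.0]])]
assert nonzero_weight_count(layers) == 4
```

The tolerance matters in practice: weights are rarely exactly zero after training, so this proxy is usually applied after pruning or thresholding.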
Some metrics to evaluate explainers have been proposed, e.g., causability ([68] or [89]), but unfortunately, many of them suffer from major drawbacks such as computational cost [118] or a focus on a single desirable attribute of a good explainer [119]. However, a definition covering the properties of a DNN model such as reliability, soundness, completeness, compactness, and comprehensibility, together with knowledge of the breaking points of an algorithm, is still missing. This is uniquely difficult for DNNs because of their deep nonlinear structure and the huge number of parameters that have to be optimized and ultimately explained in a consistent, understandable, and transparent manner. To describe the necessary properties of a model, there is a need to focus on uniform definitions. It is important to consider robustness, which indicates how robust the system is to small changes of the architecture or the test data [67,120]; this could serve as a reference for reliability. To compare or even evaluate models, fairness, which requires that the model guarantees the protection of groups against discrimination [48], and reliability also need to be quantified. In addition, an evaluator that detects bias or identifies and fends off adversarial attacks would be beneficial. Our further work will be performed here.
3. Conclusions
In this paper, we summarized state-of-the-art explainers of and survey manuscripts on Deep Neural Networks. First, we gave reasons why it is necessary to comprehend black box DNNs: adversarial attacks that are not perceivable by humans pose a serious threat to the robustness of DNNs. Furthermore, we expounded on how models learn prejudices from contaminated training data; hence, through widespread application, they could be partly responsible for increased unfairness. On the other hand, novel laws demand the right of users to ask how intelligent decision-making systems arrive at their decisions.
One first solution was the development of explainers. We gave a taxonomic introduction to the main definitions, properties, tools, and methods of explaining systems. Unlike others, to the best of our knowledge, we focused on proven explaining models, mostly in the area of Computer Vision, and explained their development and their pros and cons. We presented different representations of explanations and differentiated the explainers with regard to application, data, and properties. Finally, we introduced surveys and studies that analyzed or evaluated explaining systems and tried to quantify machine understanding in general. This leads to the realization that uniform definitions and evaluation systems for explainers are still needed to solve the open problems. The outlook on further ideas, such as quantifying the grade of explanation or directly evaluating models, makes this work valuable.
Author Contributions: The idea, research, collection, and writing of the survey were done by the first author, V.B. The co-authors D.M. and M.A. improved the paper through their comments and corrections on layout and content. Supervision and project administration were done by D.M. and M.A. Funding acquisition was done by M.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data is contained within the article.
Conflicts of Interest: The authors declare no conflict of interest.
Figure 1. DeepFool [8] examines the robustness of neural networks. Very small noisy images were added on correctly classified images; humans cannot see the difference, but the model changes its prediction: x (left) is correctly classified as whale, but x+r as turtle; r (right) is very small.
Figure 5. DeconvNet [51]: three examples of the input image (a), strongest feature map (b), and feature map projections (c) of Layer 5 and the classifier with the probability of the correct class (d) and the most probable class (e), respectively.
Figure 6. Reference [7] presented Local Interpretable Model-agnostic Explanations (LIMEs), which can explain the predictions of any agnostic black box classifier and any data. Here, the superpixels, which are areas of an input image, are highlighted that are most responsible for the top three image classification predictions: (1) original image, (2) explaining electric guitar, (3) explaining acoustic guitar, and (4) explaining Labrador.
Figure 7. Layerwise Relevance Propagation (LRP) [53] is a gradient method suffering from the shattered gradients problem. The idea behind it is a decomposition of the prediction function as a sum of layerwise relevance values. When the LRP is applied to deep ReLU networks, the LRP can be understood as a deep Taylor decomposition of the prediction.
Figure 8. Deep inside convolutional networks [59]: created input images that have the highest probability of being predicted as certain classes of a trained CNN. Here, one can see the created prototypes of the classes goose, ostrich, and limousine (left to right).
Figure 9. To explain what a black box classifier network comprehends as a class member, [58] made synthetic prototype images that look real. They were created by a deep generator network and classified by the black box neural network.
Figure 10. Gradient-weighted Class Activation Mapping (Grad-CAM) [86] explains the outcome decision of cat or dog, respectively, of an input image using the gradient information to understand the importance of each neuron in the last convolutional layer of the CNN.
Figure 11. Deep Learning [88]: This method highlights the region of an image (dog) that is most important for the part "dog" of the not quite correct predicted output "A dog is standing on a hardwood floor" of a trained CNN. However, the dog is not standing.
Figure 12. Deep visual explanation [90] highlights the most discriminative region in an image of six examples (park bench, cockatoo, street sign, traffic light, racket, chihuahua) to explain the decision made by VGG-16.
Figure 13. Multimodal Explanation (ME) [97] explains by two types of justifications of a visual question answering task: The example shows two images with food, and the question is if they contain healthy meals or not. The explanations of the answers "yes" or "no" are given textually in justifying in a sentence and visually in pointing out the most responsible areas of the image.
Figure 14. Summit [98] visualizes what features a Deep-Learning model has learned and how those features are connected to make predictions. The Embedding View (A) shows which classes are related to each other; the Class Sidebar (B) is linked to the embedding view, listing all classes sorted in several ways; the Attribution Graph (C) summarizes crucial neuron associations and substructures that contribute to a model's prediction.
Figure 15. Activation atlases with 100,000 activations [63].
Figure 16. Extremal perturbations [112]: The example shows the regions of an image (boxed) that maximally affect the activation of a certain neuron in a DNN ("mousetrap" class score). For clarity, the masked regions are blacked out. In practice, the network sees blurred regions.
Table 1. Overview of some explainers for DNNs. Ordered by model resp. paper name, reference (author and year), data type, main method, and main properties.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).