SADL Experiment Result

This page accompanies a submission to the ICSE 2019 Technical Papers track, “Guiding Deep Learning System Testing Using Surprise Adequacy”. We list all figures, including those omitted from the submission due to space limits (titles of figures that are not in the paper are shown in red). The page also contains additional analysis undertaken as part of the author response.

RQ1: Is SADL capable of capturing the relative surprise of an input of a DL system?

Figure 2

All results were included in the paper.

Accuracy of test inputs in the MNIST and CIFAR-10 datasets, selected starting from the input with the lowest SA and progressively including inputs with higher SA, and vice versa (i.e., starting from the input with the highest SA and progressively including inputs with lower SA).
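The selection procedure described in the caption can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the function name `cumulative_accuracy`, the SA values, and the correctness flags are all hypothetical.

```python
from typing import List, Sequence


def cumulative_accuracy(sa: Sequence[float], correct: Sequence[bool],
                        ascending: bool = True) -> List[float]:
    """Accuracy of the first k inputs (k = 1..n) after sorting inputs by SA."""
    if len(sa) != len(correct):
        raise ValueError("sa and correct must have the same length")
    # Order input indices by SA: lowest first, or highest first if not ascending.
    order = sorted(range(len(sa)), key=lambda i: sa[i], reverse=not ascending)
    accuracies = []
    hits = 0
    for k, idx in enumerate(order, start=1):
        if correct[idx]:
            hits += 1
        accuracies.append(hits / k)  # accuracy over the k inputs selected so far
    return accuracies


# Hypothetical SA values and per-input correctness flags (not real results).
sa_values = [0.2, 1.5, 0.7, 3.1]
is_correct = [True, False, True, False]

low_to_high = cumulative_accuracy(sa_values, is_correct, ascending=True)
high_to_low = cumulative_accuracy(sa_values, is_correct, ascending=False)
```

Both curves end at the same value, the accuracy over the full test set; they differ only in the order in which inputs are included.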

Figure 4

We have included the DSA plots for MNIST and CIFAR-10 in the paper.

Sorted DSA values of adversarial examples for MNIST and CIFAR-10.

Figure 4’ (NOT IN THE PAPER)

In addition, here are the per-class plots that show sorted DSA values of each class in MNIST. Note that the number of adversarial examples differs between classes because each adversarial example generation algorithm has its own method of targeting a specific class.

Sorted DSA values of adversarial examples for MNIST per class.

Figure 4’’ (NOT IN THE PAPER)

The following are the per-class plots that show sorted DSA values of each class in CIFAR-10.

Sorted DSA values of adversarial examples for CIFAR-10 per class.

RQ2: Does the selection of layers of neurons used for SA computation have any impact on how accurately SA reflects the behaviour of DL systems?

Figure 5’ (NOT IN THE PAPER)

This figure contains sorted LSA values from all layers in the MNIST model. In the paper, pool1 was omitted.

Sorted LSA of 2,000 randomly selected adversarial examples for MNIST from different layers.

Figure 5’’ (NOT IN THE PAPER)

This figure contains sorted LSA values from all layers in the CIFAR-10 model. In the paper, only activation_1, activation_5, and activation_8 were presented.

Sorted LSA of 2,000 randomly selected adversarial examples for CIFAR-10 from different layers.

RQ3: Is SC correlated to existing coverage criteria for DL systems?

Figure 6’ (NOT IN THE PAPER)

This figure shows changes in various coverage criteria against increasing input diversity for each subject model. In the paper, only CIFAR-10 and Chauffeur were shown.

Changes in various coverage criteria against increasing input diversity. We add further inputs to the original test inputs and observe the changes in coverage values.
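To illustrate how a coverage value can change as inputs are added, here is a minimal sketch. It assumes that surprise coverage is computed by dividing the SA range from 0 up to an upper bound into equal-width buckets and reporting the fraction of buckets hit by at least one input. That definition, the function names, the bucket count, the upper bound, and the SA values are all assumptions made for illustration, not details taken from this page.

```python
from typing import Iterable, Set


def covered_buckets(sa_values: Iterable[float], upper: float,
                    n_buckets: int) -> Set[int]:
    """Indices of equal-width buckets over [0, upper) hit by at least one SA value."""
    covered = set()
    for sa in sa_values:
        # Assumption: values outside [0, upper) do not contribute to coverage.
        if 0.0 <= sa < upper:
            covered.add(int(sa / upper * n_buckets))
    return covered


def surprise_coverage(sa_values: Iterable[float], upper: float,
                      n_buckets: int) -> float:
    """Fraction of buckets covered by the given SA values."""
    return len(covered_buckets(sa_values, upper, n_buckets)) / n_buckets


# Hypothetical SA values: an original test set and some more surprising additions.
original = [0.1, 0.15, 0.3]
additional = [1.3, 1.9]

before = surprise_coverage(original, upper=2.0, n_buckets=10)
after = surprise_coverage(original + additional, upper=2.0, n_buckets=10)
```

Under these assumptions, inputs whose SA values fall into buckets that are already covered leave the coverage unchanged, while inputs with previously unseen SA ranges increase it.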

Correlation Analysis for Figure 6’ (NOT IN THE PAPER)

In response to a reviewer's question, we calculated Spearman’s rank correlation coefficient between LSC/DSC and the other coverage criteria. While the results show strong correlation, note that the sample sizes are very small (ranging from four to six) and some of the correlations are not statistically significant (e.g., \(p = 0.051\) for every criterion on Dave-2).

| DNN | Criteria | LSC: Spearman's \(\rho\) | LSC: \(p\)-value | DSC: Spearman's \(\rho\) | DSC: \(p\)-value |
|---|---|---|---|---|---|
| MNIST | NC | 0.926 | 0.008 | 0.926 | 0.008 |
| | KMNC | 1.000 | 0.000 | 1.000 | 0.000 |
| | NBC | 1.000 | 0.000 | 1.000 | 0.000 |
| | SNAC | 0.971 | 0.001 | 0.971 | 0.001 |
| CIFAR-10 | NC | 0.941 | 0.005 | 0.941 | 0.005 |
| | KMNC | 1.000 | 0.000 | 1.000 | 0.000 |
| | NBC | 1.000 | 0.000 | 1.000 | 0.000 |
| | SNAC | 1.000 | 0.000 | 1.000 | 0.000 |
| Dave-2 | NC | 0.949 | 0.051 | N/A | N/A |
| | KMNC | 0.949 | 0.051 | N/A | N/A |
| | NBC | 0.949 | 0.051 | N/A | N/A |
| | SNAC | 0.949 | 0.051 | N/A | N/A |
| Chauffeur | NC | 1.000 | 0.000 | N/A | N/A |
| | KMNC | 1.000 | 0.000 | N/A | N/A |
| | NBC | 1.000 | 0.000 | N/A | N/A |
| | SNAC | 1.000 | 0.000 | N/A | N/A |
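
For readers who want to reproduce this kind of analysis, the following is a minimal sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. It is not the script used to produce the table: the function names and the four data points are hypothetical, and the \(p\)-values would normally come from a statistics library such as `scipy.stats.spearmanr`.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical coverage values at four diversity steps (not the real data).
surprise_cov = [10.0, 20.0, 30.0, 40.0]
other_cov_strict = [5.0, 6.0, 7.0, 9.0]   # strictly increasing
other_cov_tied = [5.0, 7.0, 7.0, 9.0]     # one tie between two steps

rho_strict = spearman_rho(surprise_cov, other_cov_strict)
rho_tied = spearman_rho(surprise_cov, other_cov_tied)
```

In this hypothetical four-point sample, a single tie is enough to lower \(\rho\) from 1.0 to about 0.949, which illustrates how sensitive the coefficient is at such small sample sizes.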