
Adversarial Attacks on DNNs - Literature Review, Part 2

Introduction

In deep learning, adversarial attacks are a type of attack that aims to deceive a model by adding small perturbations to the input data. These perturbations are often imperceptible to humans, but can cause the model to make incorrect predictions. In the context of autonomous driving, generating adversarial test cases is vital to ensure the safety and reliability of self-driving cars. This literature review provides an overview of the state-of-the-art in adversarial attacks and test case generation for autonomous driving models, including DeepBillboard, SINVAD, and Input Validation Enhancement.

SINVAD

SINVAD, or Search-based Image Space Navigation for DNN Image Classifier Test Input Generation, aims to automate the process of generating test cases for DNN models.

Background

While many approaches to constructing adversarial examples have been proposed, existing methods often create perturbations in the pixel space that are neither practically nor semantically meaningful. To generate more realistic adversarial examples in the larger space of valid image representations, SINVAD uses a VAE (Variational Autoencoder) to learn a latent space representation of the input images and generates border images that lie close to the decision boundary of the classifier.

The semantic manifold problem refers to the fact that only a small subset of the entire input space is semantically meaningful, with the rest being noise. Most existing TIG (Test Input Generator) methods generate test inputs by adding noise in the input space and expect the output of the DNN model to change. SINVAD, on the other hand, uses a VAE to generate images within the space of semantically meaningful inputs.

A generative model is a model that learns the distribution of the input data and generates new samples from that distribution. A Variational Autoencoder (VAE) is a generative model that learns a latent space representation of the input data and generates new samples by sampling from the latent space. The latent space is a multi-dimensional vector space that captures the underlying structure of the input data. The whole process of a VAE can be summarized as: Input → Encoder → Latent Space → Decoder → Output.
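
To make this pipeline concrete, the following minimal PyTorch sketch shows a VAE for flattened 28×28 grayscale images such as MNIST. The architecture, layer sizes, and names are illustrative assumptions rather than the exact model used by SINVAD.

import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE sketch: Input -> Encoder -> Latent Space -> Decoder -> Output."""
    def __init__(self, input_dim=28 * 28, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 400), nn.ReLU())
        self.fc_mu = nn.Linear(400, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(400, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 400), nn.ReLU(),
            nn.Linear(400, input_dim), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z ~ N(mu, sigma^2) with the reparameterization trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

x = torch.rand(4, 28 * 28)          # a batch of four flattened images
recon, mu, logvar = VAE()(x)        # reconstruction and latent statistics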

Methodology

A common application of DNNs is image classification. A neural network acts as a function that maps the space of images $I = [0,1]^{c \times w \times h}$ to a set of labels, together with the probability of belonging to each label, $P = \{v \in [0, 1]^n : \sum_{i=1}^{n} v_i = 1\}$. Formally, a DNN is a mapping $N: I \rightarrow P$, and it also provides the most likely label $N_{\text{argmax}}: I \rightarrow \{1, \ldots, n\}$.

Take MNIST as an example: the input space is $D = [0,1]^{1 \times 28 \times 28}$, i.e., 28×28 grayscale images. Most of the input space is semantically meaningless, yet the model will nonetheless make predictions for any input, including completely random noise. The goal of SINVAD is to generate test inputs that are both semantically meaningful and adversarial.

SINVAD introduces a VAE to learn a latent space representation of the input space via an encoder $E: D \rightarrow \mathbb{R}^d$, performs search-based optimization in the latent space, and generates adversarial examples via a decoder $D: \mathbb{R}^d \rightarrow D'$. The search space is restricted to $D'$, which contains images that closely resemble those in the original input space $D$.

Challenges

Four research questions are proposed to evaluate the effectiveness of SINVAD:

  • Plausibility: Can SINVAD generate images closely related to $D$?
    • Evaluation: Human evaluation.
  • Indecisiveness: Can SINVAD generate images that are on the decision boundary of the classifier?
    • Evaluation: Dropout test.
    • If part of the neural network is randomly dropped out, the output of the network will be less confident, and the decision boundary will be more uncertain. If the DNN model makes different predictions for the same input under different dropout rates, the input is on the decision boundary.
  • Differential Testing: Can SINVAD perform differential testing between two DNN models?
    • Differential testing is a technique to compare the outputs of two models to find discrepancies.
    • Evaluation: Traverse the space and compare the outputs of two models, to identify semantic differences in two models.
  • Application: Can SINVAD be utilized to identify “weak spots” of DNN models?
    • Evaluation: Find potentially harmful pairs, which are pairs of labels that are often confused by the DNN model. (e.g., 3 and 8 in MNIST)

Results

Plausibility

Human evaluation and metric-based evaluation show that the generated images resemble the digits in MNIST more than random noise.

To control for confounding factors, the fitness function $f(i) = |LSA_N(i) - t|$ is introduced to measure the similarity between the generated image and the original image. Since images from VAE optimization (right) resemble a digit more closely than those from raw pixel optimization (left) at the same fitness value, the VAE-optimized images are more plausible.

PCA position is another metric used to infer the semantic meaning of an image, obtained by performing dimensionality reduction in the activation trace (AT) space. Digit images form clusters in the PCA space. Images from raw optimization (green) are usually outliers, while VAE-optimized images (red) fall inside the clusters.

Since each digit has a separate cluster in the PCA space, interpolation between two digits results in a smooth transition between the two clusters and can thus be used to generate adversarial examples. The following image shows the interpolation between 4 and 9 for both raw and VAE optimization.

Raw optimization: $i_t[i, j] = (1-t) \cdot i_0[i, j] + t \cdot i_1[i, j]$ for each pixel $(i, j)$ in the pixel space.

VAE optimization: $i_t = D(E(i_0) \cdot (1-t) + E(i_1) \cdot t)$ in the latent space.

This example shows that VAE optimization generates more plausible boundary images than raw optimization.
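
The two interpolation schemes can be sketched as follows. Here encode and decode are assumed callables standing in for the VAE encoder $E$ and decoder $D$, so only the pixel-space part is fully self-contained.

import numpy as np

def interpolate_pixels(i0, i1, t):
    # Raw optimization: blend each pixel directly in image space.
    return (1.0 - t) * i0 + t * i1

def interpolate_latent(i0, i1, t, encode, decode):
    # VAE optimization: blend the latent codes, then decode back to an image.
    z = (1.0 - t) * encode(i0) + t * encode(i1)
    return decode(z)

# Pixel-space example with two random stand-in "images".
i0, i1 = np.random.rand(28, 28), np.random.rand(28, 28)
midpoint = interpolate_pixels(i0, i1, 0.5)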

A genetic algorithm is used to generate even more ambiguous images, by changing the fitness function to

$$f(i) = \begin{cases} \infty & \text{if } N_c(i) = N_c(i_0) \\ |E(i) - E(i_0)| & \text{otherwise} \end{cases}$$

where $N_c(i) = \operatorname{argmax} N(i)$ is the most likely label of the image $i$. This fitness function rejects images that do not change the prediction of the classifier; once the image crosses the decision boundary, the fitness function keeps the generated image as close to the original image as possible.
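
A compact sketch of this fitness function, where classify and encode are assumed callables standing in for the trained classifier $N_c$ and the VAE encoder $E$.

import numpy as np

def boundary_fitness(i, i0, classify, encode):
    """Reject images whose prediction has not changed; otherwise prefer
    images whose latent code stays close to that of the original image i0."""
    if classify(i) == classify(i0):
        return np.inf
    return float(np.linalg.norm(encode(i) - encode(i0)))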

Indecisiveness

In this experiment, dropout is used in both the training and testing phases to keep the network uncertain. The prediction variability of the network is measured by the Bessel-corrected sample variance of the predictions, summed over classes, as in the following formula:

$$V(I) = \sum_{j=1}^{c} \frac{1}{N-1}\sum_{i=1}^{N} (p_{i,j} - \bar{p}_j)^2$$

where $p_{i,j}$ is the probability of the $j$-th class in the $i$-th sampled prediction, and $\bar{p}_j$ is the mean probability of the $j$-th class averaged over all $N$ samples.
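
One way to compute this variability with Monte-Carlo dropout is sketched below in PyTorch; the toy architecture and number of samples are assumptions for illustration.

import torch
import torch.nn as nn

def prediction_variability(model, x, n_samples=20):
    """V(I): Bessel-corrected variance of the predicted class probabilities
    across n_samples stochastic forward passes, summed over classes."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_samples)])
    return probs.var(dim=0, unbiased=True).sum().item()

# Toy classifier with dropout (illustrative architecture only).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(128, 10))
v = prediction_variability(model, torch.rand(1, 28, 28))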

The results show that VAE optimization generates more indecisive images than the original dataset: the prediction variability of the network is much higher for the VAE-optimized images, reaching a level of around 0.7.

Differential Testing

Similar to the genetic algorithm in the plausibility experiment, SINVAD can be used to generate images that produce different predictions in two models. The fitness function is defined as

$$f(i) = \begin{cases} \infty & \text{if } N(i) = N'(i) \\ m|E(i) - E(i_0)| & \text{otherwise} \end{cases}$$

where $N$ and $N'$ are two different models, and the constant $m = 2 + N_{i, c} - N'_{i,c}$ balances the difference in the prediction diversity of the two models. In this way, SINVAD can generate images that are semantically similar to the original ones but produce different predictions in the two models.

Humans can therefore summarize the weaknesses of each model by comparing the situations where the two models make different predictions. For example, since CustomNet fails to recognize the digit 5 in the second example, it can be assumed that CustomNet is not robust to grey backgrounds.

Application

The last experiment is to find potentially harmful pairs, which are pairs of labels that are often confused by the DNN model. The fitness function is defined as

$$f(i) = \begin{cases} \infty & \text{if } N_c(i) = t \\ |E(i) - E(i_0)| & \text{otherwise} \end{cases}$$

where $t$ is the target label, i.e., the class of the original image. This fitness function generates images that are semantically similar to the original image and still read as the target label, yet the DNN model predicts a different label.

These heatmaps show the frequency with which an image labeled as $x$ (horizontal) is predicted as $y$ (vertical). The diagonal represents correct predictions, and the off-diagonal cells represent potential harmful pairs. The off-diagonal cells with high frequency are the harmful pairs, such as 4 and 9, or 3 and 8, which are often confused by the DNN model.

The right heatmap (SINVAD) shows far more errors than the left heatmap (original dataset), demonstrating the usefulness of SINVAD in identifying common misclassifications.

Conclusion

SINVAD is a novel approach to generating adversarial examples for DNN models. It shifts the focus from the pixel space to the latent space, and generates images that are semantically meaningful and adversarial. Since VAE learns the latent space representation of the input data, SINVAD can also be extended to other data modalities, such as audio and text. The experiments show that SINVAD can generate plausible, indecisive, and differentially testable images, and identify harmful pairs in DNN models.

DeepBillboard

Similar to SINVAD, DeepBillboard aims to generate realistic adversarial examples, specifically for autonomous driving models. DeepBillboard can simulate changes in physical conditions, such as viewing angle, distance, and lighting, to generate adversarial examples that are robust to real-world conditions.

Background

Deep learning models for autonomous driving are often trained on images captured by cameras mounted on the vehicle. These images can be affected by various physical conditions, such as lighting, weather, and occlusions. Adversarial examples generated in the pixel space, such as those produced by DeepXplore, often fail to resemble real-world conditions. DeepBillboard focuses on a specific type of adversarial attack: it creates or modifies billboards in the scene to deceive the model.

Billboards are chosen as the target for the following reasons:

  • The content of billboards can be easily controlled and manipulated.
  • The presence of billboards is not related to road conditions, so the models are expected to make the same predictions regardless of the size, location, or content of the billboard.
  • Billboards are large enough to be clearly captured by the camera from all viewing angles, distances, and lighting conditions.
  • Changes to billboards as adversarial examples are unlikely to cause accidents for human drivers, but partial changes can cause the model to make incorrect predictions.

DeepBillboard introduces a joint optimization algorithm to generate a perturbed billboard image that misleads the model's steering angle prediction. To apply this to a video stream, DeepBillboard also introduces a temporal consistency constraint that minimizes the changes in the perturbation between frames, making the adversarial examples more robust to real-world conditions.

Convolutional Neural Networks (CNNs) are a type of DNN commonly used for image classification tasks. CNNs consist of multiple layers of convolutional filters that learn to extract features from the input image. They differ from another type of DNN, Recurrent Neural Networks (RNNs), which are designed to process sequential data: CNNs are more suitable for making predictions based on static images. NVIDIA DAVE-2 is a CNN-based model that predicts the steering angle of a self-driving car based on the input image.

Digital Adversarial examples. Prior work on adversarial examples has focused on generating digital perturbations to deceive the model. These perturbations are often imperceptible to humans but can cause the model to make incorrect predictions. DeepXplore is a white-box TIG (Test Input Generator) that generates adversarial examples by changing lighting conditions and adding noise to the input image. These methods contribute to understanding digital adversarial examples, but the generated test cases may never exist in the real world.

Physical Adversarial examples. Kurakin et al. (2016) proposed a method to extend adversarial examples to be robust to real-world conditions, such that capturing a photo with a smartphone camera would still deceive the model. Sharif et al. (2016) presented dodging and impersonation attacks for DNN-based face recognition systems. These methods contribute to understanding adversarial examples in physically stable environments, where the position, distance, viewing angle, and lighting conditions are controlled. However, autonomous driving models are exposed to more complex and dynamic environments, where the physical conditions are constantly changing.

Eykholt et al. (2017, 2018) proposed a method to generate adversarial examples for road sign classifiers, and extended the method to attack YOLO detectors, a type of real-time object detection model. Compared to existing methods, DeepBillboard provides more robust, widely applicable, and both digital and physical adversarial examples.

Methodology

The metric used to evaluate test effectiveness is the off-tracking distance, i.e., the lateral deviation of the vehicle caused by the difference between the predicted and ground-truth steering angles. Denoting the velocity of the vehicle as $v$, the interval between two decision points as $i$, the ground-truth steering angle as $\alpha$, and the predicted steering angle as $\alpha'$, the off-tracking distance is calculated as

$$H(\alpha, \alpha') = v \cdot i \cdot \sin(\alpha - \alpha')$$

Given that $v$ and $i$ are constant in the experiment, the steering angle error $\alpha - \alpha'$ is the only factor that affects the off-tracking distance.
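
A minimal sketch of this formula; the angles are assumed to be in radians.

import math

def off_tracking_distance(v, interval, alpha_true, alpha_pred):
    """H(alpha, alpha') = v * i * sin(alpha - alpha')."""
    return v * interval * math.sin(alpha_true - alpha_pred)

# e.g. 10 m/s, a decision every 0.05 s, and a 5-degree steering error
err = off_tracking_distance(10.0, 0.05, math.radians(5), 0.0)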

The test is conducted in an actual drive-by scenario, and therefore continuous scenes, rather than a single static image, are used as input. Denote $\hat{X}$ as the set of frames captured by the camera. The goal of DeepBillboard is to insert an adversarial billboard image into every frame in $\hat{X}$.

The attacking strength is measured by the mean angle error over all frames in $\hat{X}$, defined as

$$M_0 = \frac{1}{\text{card}(\hat{X})} \sum_{x_i \in \hat{X}} |f(x_i') - f(x_i)|$$

where $f(\cdot)$ is the steering angle prediction function, and $x_i$ and $x_i'$ are the original and adversarial frames, respectively.

The attacking possibility, or success rate, is defined as the percentage of frames in $\hat{X}$ for which the model makes an incorrect prediction, i.e., the steering angle error exceeds a threshold $\tau$. The formula is

$$M_1 = \frac{\text{card}(\{x_i \in \hat{X} : |f(x_i') - f(x_i)| > \tau\})}{\text{card}(\hat{X})}$$

$\tau$ can be computed using the off-tracking distance formula above.
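
Given per-frame steering predictions for the original and perturbed frames, the two metrics can be sketched as follows (array names and example values are assumptions).

import numpy as np

def attack_metrics(pred_orig, pred_adv, tau):
    """M0: mean absolute steering-angle error over all frames.
       M1: fraction of frames whose error exceeds the threshold tau."""
    errors = np.abs(np.asarray(pred_adv) - np.asarray(pred_orig))
    return errors.mean(), (errors > tau).mean()

m0, m1 = attack_metrics([0.0, 0.1, -0.2], [0.3, 0.4, 0.2], tau=0.25)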

The procedure of DeepBillboard is as follows:

  1. Locate the billboard and fill it with a uniform color. Paint its four corners with contrasting colors.
  2. Use joint gradient ascent to optimize the billboard image, iteratively updating it to maximize $M_0$.

The process is based on a single-frame adversarial attack, which finds the perturbation $\delta$ that maximizes the distance function $\max_{\delta} H(f(x + \delta), A_x)$, where $A_x$ is the ground-truth steering angle of the frame $x$. For further processing, it is rewritten as $\operatorname{argmin}_{\delta} -L(f(x + \delta), A_x)$, where $L$ is the loss function that measures the steering angle error.

Joint loss optimization extends the single-frame attack to a multi-frame attack. The goal is to find a fixed perturbation $\Delta$ that minimizes the joint loss across all frames in $\hat{X}$: $\operatorname{argmin}_{\Delta} \sum_{x \in \hat{X}} -L(f(x + \Delta), A_x)$.

Handling Overlapped Perturbations. In the single-frame attack, each frame may generate a set of perturbations that overlap with one another and partially contribute to the final adversarial billboard. To maximize the attacking strength $M_0$, the overlapping perturbations are merged into a single perturbation, with each frame contributing $k$ pixels such that $n \cdot k < m$, where $n$ is the number of frames and $m$ is the total number of pixels in the billboard. The overlapped pixels are updated with a greedy strategy.

Enhancing Perturbation Printability. To ensure the printability of the billboard, the distance between the perturbation and the printable color space is minimized. For each pixel $p'$, the non-printability score (NPS) is computed from its distances to the set of printable colors $P \subset [0, 1]^3$ as $NPS(p') = \prod_{p \in P} | p - p' |$. The NPS is then added to the loss function to penalize non-printable colors.
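
A minimal sketch of the NPS computation; the per-color distance $|p - p'|$ is taken here as the Euclidean norm, and the small palette is an assumed example.

import numpy as np

def non_printability_score(pixel, printable_colors):
    """NPS(p') = product over printable colors p of |p - p'|."""
    diffs = np.linalg.norm(printable_colors - pixel, axis=1)
    return float(np.prod(diffs))

palette = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.8, 0.1, 0.1]])
score = non_printability_score(np.array([0.5, 0.5, 0.5]), palette)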

Adjusting Color Difference. Under different environmental conditions, the color of the billboard may change. Assume the entire billboard is prefilled with a uniform color $p = \{r, g, b\}$, and the actual color captured by the camera is $p' = \{r', g', b'\}$. It is observed that all colors in the billboard are shifted by the same amount, so the color adjustment function $ADJ_i = d_i(p, p') = p_i - p_i'$ can transform the actual color to the displayed color.

The following Python code snippet summarizes the process of DeepBillboard:

import random
import numpy as np

def generate(IMGS, COORD, ITER, BSZ, ADJ, DIM, STEP=1.0, SA=0.05):
    '''
    IMGS:  set of frames captured by the camera
    COORD: coordinates of the billboard in each frame
    ITER:  number of iterations
    BSZ:   batch size
    ADJ:   color adjustment function
    DIM:   dimensions of the billboard
    STEP:  gradient step size (assumed parameter)
    SA:    probability of accepting a worse perturbation (assumed parameter)

    High-level sketch: the helpers color_init, gradient, domain_constraint,
    rev_proj, handle_overlap, nps_ctrl, apply_perturbation and loss, as well
    as the attack objective obj, are assumed to be defined elsewhere.
    '''
    # Initialize the billboard template with a uniform base color
    perturb = color_init(DIM)
    last_diff = -np.inf

    # Joint gradient ascent
    for _ in range(ITER):
        random.shuffle(IMGS)                   # shuffle frames to avoid early convergence
        for j in range(0, len(IMGS), BSZ):     # process frames in batches
            batch = IMGS[j:j + BSZ]
            pert_data = []
            for img in batch:
                grad = gradient(obj, img)              # gradient of the objective w.r.t. the frame
                grad = domain_constraint(grad)         # only update the billboard area
                pert_data.append(rev_proj(grad, ADJ))  # reverse projection into printable colors
            pert_data = handle_overlap(pert_data)      # merge overlapping perturbations

            atmpt_pert = pert_data * STEP + perturb    # add the update to the current billboard
            atmpt_pert = nps_ctrl(atmpt_pert, ADJ)     # control non-printability
            atmpt_imgs = apply_perturbation(IMGS, atmpt_pert, COORD)  # paste billboard into frames
            this_diff = loss(atmpt_imgs)               # compute attacking strength M0

            # Accept the perturbation if it improves the attack, or occasionally
            # accept a worse one to escape local optima.
            if this_diff > last_diff or random.random() < SA:
                perturb = atmpt_pert
                IMGS = atmpt_imgs
                last_diff = this_diff
    return perturb

The gradient ascent algorithm applies the perturbation over ITER iterations. In each iteration, the images are first randomly shuffled to avoid early convergence, and then processed in batches.

For each image in the batch, the gradient of the objective function is computed and limited to the billboard area. The gradient is then reverse-projected into the printable color space, and the overlapping perturbations are merged. There are three approaches to merging overlapping perturbations: (1) use the largest gradient, (2) use the sum of the gradients, or (3) use the gradient that causes the most significant change in the steering angle.

After batch processing, the perturbation goes through color adjustment and non-printability control, and is applied to the images. The loss function is computed to measure the attacking strength. If the current perturbation improves the attacking strength, it is accepted; otherwise, it may still be accepted with a certain probability to avoid local optima.

Evaluation

To measure the effectiveness of DeepBillboard, two parallel experiments are conducted: digital test and physical test. For both tests, average angle error and success rate are used as metrics.

For the digital test, DeepBillboard is applied to some of the driving scenes in the NVIDIA DAVE-2 and a few other datasets.

For the physical test, the researchers first record a set of driving scenes with a billboard, as the training videos, and then apply DeepBillboard to generate adversarial billboards. After that, the adversarial billboards are printed and placed in the real-world environment. The videos are recorded again as the testing videos. Multiple weather conditions and billboard colors are considered in the physical test.

The digital test reveals that DAVE-1 and DAVE-2 are significantly vulnerable to adversarial billboards, with an average angle error of more than 10 degrees. DAVE-3 and Epoch adopt dropout layers, which randomize the network and make the model more robust and generalizable, and thus resist adversarial attacks better. DAVE-3 outperforms Epoch, as it was trained on augmented data that was cropped to contain only the pavement, and is thus less likely to be affected by the billboard.

Further, since the videos record a vehicle driving past the billboard, the billboard appears larger and larger in successive frames, which makes the average angle error increase over time.

In the physical test, the researchers created two types of adversarial billboards: one that makes the model turn left and one that makes the model turn right. The billboard appears on the right side of the road, but the road is straight, so the model should not turn.

The results show that the adversarial billboards are effective in deceiving the model. Of the 900 frames in the testing videos, the model makes incorrect predictions in 268 frames (29.8%), with a 100% success rate for a left adversarial billboard in the dusk condition. Further, the left adversarial billboard results in a positive (left) angle error and the right adversarial billboard results in a negative (right) angle error, which is consistent with the design of the adversarial billboards. This trend becomes more obvious at later frames, since the billboard appears larger in each successive frame.

Conclusion

DeepBillboard is an innovative solution to generating printable and physically robust adversarial examples for autonomous driving models. By focusing on billboards, DeepBillboard can simulate real-world conditions and generate adversarial examples that are robust to changes in viewing angle, distance, and lighting. Joint loss optimization, overlap handling, and color adjustment enhance the effectiveness of the adversarial examples, and are also vital for future research in related fields.

Input Validation Enhancement

Zhang et al. (2024) proposed a comprehensive framework for input validation enhancement, which aims to improve the quality of DNN test cases by validating the input data.

Background

In the context of software engineering, it is important to validate input data to prevent unexpected behaviors and ensure the credibility of the software. In the context of DNN models, input validation is crucial to ensure the safety and reliability of the model. A trained DNN model is expected to respond to input data outside the training set in a reasonable and predictable manner. However, it is hard for DNNs to distinguish between valid and invalid inputs, and existing Test Input Generators (TIGs) often generate test cases that are semantically meaningless or irrelevant to the model, such as random noise.

Input Validators (IVs) are used to validate the input data before feeding it into the DNN model. For example, SINVAD introduces a VAE to learn a latent space representation of the input data and generates semantically meaningful test cases. It remains a challenge to extend IVs to other data modalities and improve the quality of the test cases.

On the other hand, many metrics, including Neuron Coverage (NC) and Surprise Adequacy (SA) are proposed to evaluate how the input data affects the output of the DNN model. However, whether these metrics can effectively measure and guide the test case generation process remains an open question.

The typical test input validation workflow consists of two phases, test input generation and test input validation, carried out by TIGs and IVs, respectively.

Many adversarial attack methods have been proposed to generate adversarial examples that deceive the DNN model. Compared to baseline methods, TIGs generate test cases that involve more complex and diverse input data, and are more likely to reveal vulnerabilities in the model.

IVs are used to validate the input data before feeding it into the DNN model. An input is deemed invalid if it is anomalous or an outlier with respect to the data distribution learned from training. IVs are often based on semi-supervised learning, which uses both labeled and unlabeled data to learn the data distribution. These approaches can be divided into three categories:

  • Statistical Density-based methods infer the data distribution from the training data and detect outliers based on statistical properties.
  • Reconstruction-based methods map the input data to a latent space and reconstruct the input data from the latent space. If the reconstruction error is high, the input is considered an outlier.
  • Distance-based methods measure the distance between the input data and the training data in the feature space. If the distance is above a threshold, the input is considered an outlier.

Three categories of metrics are proposed to evaluate the effectiveness of IVs:

  • NC-based metrics measure the percentage of neurons in the DNN model that are activated by the input data.
  • SA-based metrics measure the degree (both the number and the intensity) to which inputs are out of distribution with respect to the training data.
  • Uncertainty-based metrics measure the uncertainty of the DNN model in predicting the output for the input data.

Empirical Study

Studied TIGs

Three TIGs are evaluated in this study: DeepXplore, DLFuzz and SINVAD.

  • DeepXplore is a white-box TIG that aims to jointly maximize neuron coverage and the behavioral difference between two DNN models. It has three variants, each generating a different type of perturbation: changed lighting conditions, a single rectangular occlusion, and multiple rectangular occlusions.
  • DLFuzz is a white-box differential fuzzing testing framework that generates adversarial examples that maximize neuron coverage and difference between the output of the DNN model and the ground truth label.
  • SINVAD is a black-box TIG that searches for test inputs in the latent space of the input data. The translation from the pixel space to the latent space is learned by a VAE.

To assess test case quality, the generated test cases are first evaluated by human volunteers, who are asked to classify them into three categories: valid (relevant) with a clear category, valid (relevant) with an unclear category, and invalid (irrelevant). The results show that all of the above TIGs can generate test cases that are unrecognizable to humans, i.e., invalid test cases.

However, SINVAD generates test cases that cause 48% of human volunteers to misclassify on MNIST and 100% on UdaCity (the highest for both datasets), while barely deceiving the DNN model. On the other hand, TIGs that employ pixel-level perturbations yield better results in terms of human recognition. (Finding 1)

Studied IVs

Four IVs from three categories are evaluated in this study:

  • Statistical Density-based PixelCNN++
  • Reconstruction-based DAIV
  • Distance-based DeepKNN, DeepSVDD

For all IVs, the decision threshold is set to the value such that 95% of the training data is considered valid. In other words, the False Negative Rate (FNR) is 5%.
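
This shared thresholding rule can be sketched generically as below; the helper names and the higher_is_valid flag are assumptions used to cover both likelihood-style scores (higher means more normal) and distance-style scores (lower means more normal).

import numpy as np

def fit_threshold(train_scores, higher_is_valid=True, keep=0.95):
    """Pick the decision threshold so that `keep` (95%) of the training
    inputs are judged valid, i.e. a 5% false negative rate on training data."""
    if higher_is_valid:   # e.g. log-likelihood, reconstruction probability
        return np.percentile(train_scores, 100 * (1 - keep))
    return np.percentile(train_scores, 100 * keep)  # e.g. KNN or hypersphere distance

def is_valid(score, threshold, higher_is_valid=True):
    return score >= threshold if higher_is_valid else score <= threshold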

PixelCNN++ models the probability distribution of the training data. A discretized logistic mixture likelihood on the pixel space is used to measure the likelihood of the input data. The log-likelihood of an input $x$ is defined as

$$\log p = \sum_{i=1}^{n} \log p(x_i | x_{<i}) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \cdot \text{logistic}(x_i; \mu_{k,i}, s_{k,i})$$

The input $x$ is considered valid if $\log p \geq t$, and invalid otherwise, where the threshold $t$ is chosen so that the 95% of training samples with the highest log-likelihood are considered valid.

DAIV is a reconstruction-based IV. It trains a pair of encoder and decoder networks to map the input data to a latent space and reconstruct the input data from the latent space. The training goal is to minimize the reconstruction error together with the KL divergence between the posterior and prior distributions of the latent variable.

Specifically, the encoder $f_\phi$ maps $x$ to the parameters $(\mu_z, \sigma_z)$ of the latent variable $z$, such that $z \sim \mathcal{N}(\mu_z, \sigma_z)$.

For testing, $L$ samples $z_1, \ldots, z_L$ are drawn from the latent space and decoded by the decoder $g_\theta$ to obtain $\mu_{z_l}, \sigma_{z_l}$, which are then compared with the original input data $x$.

The reconstruction probability, which is the average reconstruction probability of LL samples, is defined as

$$p = \frac{1}{L}\sum_{l=1}^{L} p_\theta(x | \mu_{z_l}, \sigma_{z_l})$$

The input $x$ is considered valid if $p \geq t$, and invalid otherwise, where the threshold $t$ is chosen so that the 95% of training samples with the highest reconstruction probability are considered valid.

DeepKNN is a distance-based IV that measures the nearest-neighbor distance between the input data and the training data in the feature space. It computes the k-th nearest neighbor (KNN) distance between the input data and the training data, and considers the input invalid if the distance is above a threshold.

Specifically, denote $\phi(x)$ as the feature representation of the input data $x$, and $z = \frac{\phi(x)}{||\phi(x)||_2}$ as the normalized feature vector. The KNN distance is defined as

$$d_k = || z - z_\text{train}^k ||_2$$

where $z_\text{train}^k$ is the normalized feature vector of the k-th nearest neighbor of $z$ in the training data. The input $x$ is considered valid if $d_k \leq t$, and invalid otherwise, where the threshold $t$ is chosen so that the 95% of training samples with the lowest KNN distance are considered valid.

DeepSVDD is another distance-based IV that measures the distance within a hypersphere. The center $c$ of the hypersphere is the mean of the feature representations of the training data, and the radius $t$ is the average distance between the training data and the center. The input $x$ is considered valid if $d_c = ||\phi(x) - c||_2 \leq t$, and invalid otherwise.
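
The two distance-based checks can be sketched in plain NumPy as follows; the feature arrays, the value of k, and the toy data are assumptions, and in practice the features would come from a penultimate DNN layer.

import numpy as np

def knn_distance(feat_test, feat_train, k=10):
    """DeepKNN-style score: distance to the k-th nearest L2-normalized
    training feature; a larger distance suggests an invalid input."""
    z = feat_test / np.linalg.norm(feat_test)
    z_train = feat_train / np.linalg.norm(feat_train, axis=1, keepdims=True)
    dists = np.linalg.norm(z_train - z, axis=1)
    return float(np.sort(dists)[k - 1])

def svdd_distance(feat_test, center):
    """DeepSVDD-style score: distance to the hypersphere center."""
    return float(np.linalg.norm(feat_test - center))

feat_train = np.random.rand(1000, 64)
center = feat_train.mean(axis=0)
d_knn = knn_distance(np.random.rand(64), feat_train)
d_svdd = svdd_distance(np.random.rand(64), center)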

To compare IV decisions against human judgment, the following rates are defined:

  • True Positive Rate (TPR) is the percentage of inputs that both the human and the IV consider valid.
  • False Positive Rate (FPR) is the percentage of inputs that the human considers invalid but the IV considers valid.
  • False Negative Rate (FNR) is the percentage of inputs that the human considers valid but the IV considers invalid.
  • True Negative Rate (TNR) is the percentage of inputs that both the human and the IV consider invalid.
  • Accuracy $ACC = \frac{TP + TN}{TP + FP + FN + TN}$ is the percentage of inputs for which the IV makes the correct decision.

The aggregated experimental results are given in a normalized confusion matrix. The larger the diagonal values of the matrix (TP and TN), the better the IV performs.

It can be concluded that the distance-based DeepKNN and DeepSVDD outperform PixelCNN++ and DAIV. DeepKNN and DeepSVDD perform best on MNIST and UdaCity, respectively. (Finding 3)

The detailed results show the accuracy for each combination of TIG and IV.

Several observations can be made from the results:

  • DeepKNN and DeepSVDD outperform PixelCNN++ and DAIV on both the MNIST and UdaCity datasets, with an accuracy difference of 10% or more.
  • The majority of inaccurate decisions are FN, where the IV considers the input data invalid but human considers it valid.
  • Most of the IVs perform well on DL (DeepXplore-light) and DB (DeepXplore-blackout).
  • For MNIST, only DeepKNN performs well on DO (DeepXplore-occl), while for UdaCity, all IVs perform well on DO.
  • DeepKNN and DeepSVDD outperform other IVs in DF (DLFuzz) for both datasets.
  • SINVAD is associated with high FP rates for all IVs, indicating that the test cases generated by SINVAD are mostly invalid yet accepted by the IVs. For UdaCity, all IVs except PixelCNN++ reach nearly 0% accuracy.

The IVs would thus identify SINVAD as the most effective TIG, while in fact the FP rate is high and the accuracy is low: according to human evaluation, the test cases generated by SINVAD are frequently unrecognizable to humans, and thus invalid. (Finding 2)

Studied Metrics

Five metrics of three types are evaluated in this study.

Neuron Coverage (NC) measures the percentage of neurons in the DNN model that are activated by the input data. The NC of a test input set $T$ is defined as

$$NC(T) = \frac{\text{card}(\{n \in N : \exists x \in T, a(n, x) > t\})}{\text{card}(N)}$$

where $N$ is the set of neurons in the DNN model, $a(n, x)$ is the activation of neuron $n$ for input $x$, and $t$ is a threshold. $t = 0.25$ is used in this study, as DeepXplore and DLFuzz commonly use this threshold.

K-multisection Neuron Coverage (KMNC) is an extension of NC that divides the activation range of each neuron into $k$ sections. It discretizes the activation value of each neuron into $k$ sections and measures the fraction of sections, across all neurons, that are covered by the test inputs.

$$KMNC(T, K) = \frac{\sum_{n \in N} \text{card}(\{S_k^n : \exists x \in T, a(n, x) \in S_k^n\})}{\text{card}(N) \cdot K}$$

where $S_k^n$ is the $k$-th section of neuron $n$, and $K$ is the number of sections. $K = 100$ is used in this study.
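
Both coverage metrics can be sketched over a matrix of recorded activations; the array shapes and the training-time activation ranges are assumptions.

import numpy as np

def neuron_coverage(acts, t=0.25):
    """NC: fraction of neurons activated above t by at least one test input.
    acts has shape (num_inputs, num_neurons)."""
    return (acts > t).any(axis=0).mean()

def kmn_coverage(acts, train_low, train_high, K=100):
    """KMNC: fraction of the K per-neuron sections of the training-time
    activation range [train_low, train_high] hit by at least one test input."""
    num_inputs, num_neurons = acts.shape
    covered = 0
    for n in range(num_neurons):
        span = train_high[n] - train_low[n]
        if span <= 0:
            continue  # neuron had a constant activation during training
        idx = np.floor((acts[:, n] - train_low[n]) / span * K).astype(int)
        idx = idx[(idx >= 0) & (idx < K)]  # ignore out-of-range activations
        covered += len(np.unique(idx))
    return covered / (num_neurons * K)

acts = np.random.rand(200, 50)            # activations of 200 inputs over 50 neurons
low, high = np.zeros(50), np.ones(50)     # assumed training-time activation ranges
nc, kmnc = neuron_coverage(acts), kmn_coverage(acts, low, high)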

Likelihood-based Surprise Adequacy (LSA) uses Kernel Density Estimation (KDE) to estimate the probability density of each activation value of the neurons in the selected layer. The surprise reflects how “surprised” the DNN model is by the input data, that is, the difference in activation traces between the new input data and the training data.

Denote

$$\begin{aligned} f(x) &= \frac{1}{\text{card}(X)} \sum_{x_i \in X} K(a_L(x) - a_L(x_i))\\ LSA(x) &= -\log f(x) \end{aligned}$$

where $X$ is the training data, $a_L(x)$ is the activation trace of the input $x$ in the selected layer $L$, and $K$ is the kernel function. The surprise of the input $x$ is the negative logarithm of $f(x)$.

Distance-based Surprise Adequacy (DSA) measures the Euclidean distance between the activation trace of the input data and the training data.

The reference point $x_n$ is the nearest neighbor of the input $x$ in the training data, and the distance-based surprise is defined as

$$DSA(x) = \min_{x_n \in X} ||a_L(x) - a_L(x_n)||$$
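
Both surprise metrics can be sketched over activation traces with SciPy and NumPy; the trace dimensionality and toy data are assumptions, and DSA is simplified to the nearest-neighbor distance as described above.

import numpy as np
from scipy.stats import gaussian_kde

def lsa(at_test, at_train):
    """LSA: negative log of a KDE density fitted on training activation traces.
    at_train has shape (num_train, trace_dim); at_test is a single trace."""
    kde = gaussian_kde(at_train.T)  # scipy expects (dims, samples)
    return float(-np.log(kde(at_test)[0]))

def dsa(at_test, at_train):
    """Simplified DSA: distance to the nearest training activation trace."""
    return float(np.min(np.linalg.norm(at_train - at_test, axis=1)))

at_train = np.random.rand(1000, 8)  # toy activation traces
print(lsa(np.random.rand(8), at_train), dsa(np.random.rand(8), at_train))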

DeepGini estimates the uncertainty of the DNN classification based on the input of the last layer (softmax layer).

The likelihood of misclassification is $DG(x) = 1 - \sum_{c=1}^{C} p_c(x)^2$. The more concentrated the probability distribution is, the lower the uncertainty. When one of the classes has a probability of 1, the uncertainty is 0.

MC-Dropout is the standard deviation induced by Monte-Carlo Dropout. Take $N = 100$ sampled point predictions $[m(x)_1, m(x)_2, \ldots, m(x)_N]$, where each prediction is produced with a different random dropout mask. MC-Dropout is defined as

$$MC(x) = \sigma(m(x)) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (m(x)_i - \bar{m}(x))^2}$$

Both $DG(x)$ and $MC(x)$ standardize the uncertainty to the range $[0, 1]$: the higher the value, the more uncertain the DNN model is in predicting the output for the input data.
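
The two uncertainty metrics can be sketched as follows; the example probability vectors and sampled predictions are assumptions.

import numpy as np

def deep_gini(probs):
    """DeepGini: 1 - sum_c p_c(x)^2 over the softmax output for one input."""
    probs = np.asarray(probs)
    return 1.0 - float(np.sum(probs ** 2))

def mc_dropout_std(samples):
    """MC-Dropout uncertainty: standard deviation over N stochastic point
    predictions m(x)_1 ... m(x)_N (population form, matching the formula)."""
    return float(np.std(np.asarray(samples)))

print(deep_gini([0.7, 0.2, 0.1]))   # 0.46 -> moderately uncertain
print(deep_gini([1.0, 0.0, 0.0]))   # 0.0  -> fully confident
print(mc_dropout_std([0.31, 0.29, 0.35, 0.30]))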

The above table shows the metrics given by DNN models for both the valid and invalid test cases generated by TIGs. It is expected that if a metric differentiates between valid and invalid test cases, it is of higher credibility in guiding the test case generation process.

The observations are as follows: (Finding 4)

  • For NC-based metrics, valid test cases mostly produce higher values than invalid test cases.
  • For SA-based metrics, invalid test cases mostly produce higher values than valid test cases. DSA is a better differentiator than LSA.
  • For uncertainty-based metrics, invalid test cases produce higher values than valid test cases.

Testing Framework

The researchers propose a comprehensive testing framework for input validation enhancement, consisting of a TIG module and an IV module.

Specifically, each TIG is paired with the IV that achieves the highest accuracy in the evaluation of its test cases. Then a joint optimization is conducted to maximize both the validity of TIG test cases and the accuracy of IV. Further, human evaluation is re-conducted to validate the original objective of the TIG, that is, to produce more valid test cases.

The results show that the joint optimization can improve the validity of the test cases generated by TIGs, with an improvement of 2%-10% in the percentage of valid test cases, except for SINVAD. At the same time, the accuracy of IVs is also improved.

Conclusion

The proposed testing framework for input validation enhancement aims to improve the quality of DNN test cases by validating the input data. The framework consists of a TIG module and an IV module, which are paired and jointly optimized to maximize the validity of the test cases and the accuracy of the IV. The results show that the framework can improve the validity of the test cases generated by TIGs and the accuracy of IVs, thus enhancing the reliability and effectiveness of the testing process.

References

[1] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. ICLR 2015. https://ai.google/research/pubs/pub43405

[2] Kang, S., Feldt, R., & Yoo, S. (2020). SINVAD: Search-based image space navigation for DNN Image Classifier Test Input generation. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2005.09296

[3] Pei, K., Cao, Y., Yang, J., & Jana, S. (2019). DeepXplore. GetMobile, 22(3), 36–38. https://doi.org/10.1145/3308755.3308767

[4] Zhang, J., Keung, J., Ma, X., Li, Y., & Chan, W. K. (in press). Enhancing valid test input generation with distribution awareness for deep neural networks. The 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC 2024). https://scholars.cityu.edu.hk/en/publications/publication(44d4dce9-f84e-43d9-acad-0b42e602eca7).html

[5] Zhou, H., Li, W., Zhu, Y., Zhang, Y., Yu, B., Zhang, L., & Liu, C. (2018). DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1812.10812

[6] Zou, J., Pan, Z., Qiu, J., Liu, X., Rui, T., & Li, W. (2020). Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting. In Lecture notes in computer science (pp. 563–579). https://doi.org/10.1007/978-3-030-58542-6_34