standardizes the uncertainty to a fixed range; the higher the value, the more uncertain the DNN model is about its prediction for the given input.
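As a minimal sketch of such a standardized uncertainty score, one common choice is the normalized entropy of the model's softmax output, which maps uncertainty into [0, 1] (this is illustrative only; the exact metric and normalization used in the study are not specified here):

```python
import math

def normalized_entropy(probs):
    """Normalized predictive entropy in [0, 1]; higher = more uncertain.

    probs: softmax output of a DNN classifier (sums to 1).
    Dividing by log(K) for K classes rescales entropy to [0, 1].
    """
    probs = [max(p, 1e-12) for p in probs]  # avoid log(0)
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))

# A peaked softmax output is confident (score near 0)...
print(normalized_entropy([0.97, 0.01, 0.01, 0.01]))
# ...while a uniform one is maximally uncertain (score = 1.0).
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))
```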
The table above reports the metric values produced by the DNN models for both the valid and invalid test cases generated by the TIGs. The expectation is that a metric that clearly differentiates valid from invalid test cases is more credible for guiding the test case generation process.
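This differentiation criterion can be quantified with a simple ranking AUC (equivalent to the Mann–Whitney U statistic) over the two groups of metric scores; the sketch below, with hypothetical scores, is one way to operationalize it and is not taken from the paper:

```python
def ranking_auc(valid_scores, invalid_scores):
    """Probability that a randomly chosen invalid test case receives a
    higher metric score than a randomly chosen valid one.

    ~0.5 means the metric does not differentiate the two groups;
    values near 1.0 (or 0.0) indicate a strong separation.
    """
    pairs = [(v, i) for v in valid_scores for i in invalid_scores]
    wins = sum(1.0 if i > v else 0.5 if i == v else 0.0 for v, i in pairs)
    return wins / len(pairs)

# Hypothetical uncertainty scores: a well-separated metric vs. a useless one.
print(ranking_auc([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]))  # strong separation
print(ranking_auc([0.1, 0.9], [0.1, 0.9]))            # no separation
```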
The observations are as follows (Finding 4):
The researchers propose a comprehensive testing framework for input validation enhancement, consisting of a TIG module and an IV module.
Specifically, each TIG is paired with the IV that achieves the highest accuracy when evaluating that TIG's test cases. A joint optimization is then conducted to maximize both the validity of the TIG's test cases and the accuracy of the IV. Finally, the human evaluation is re-conducted to verify the TIG's original objective, namely producing more valid test cases.
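The pairing-and-optimization loop described above can be sketched schematically as follows. The interface (`tig_generate`, `iv_score`, `iv_update`) is hypothetical and stands in for the paper's actual TIG and IV components; this is a structural sketch, not the authors' algorithm:

```python
def joint_optimize(tig_generate, iv_score, iv_update, seeds,
                   rounds=5, threshold=0.5):
    """Sketch of a TIG-IV joint optimization loop (hypothetical interface).

    Each round, the TIG generates candidates from seed inputs, the paired
    IV filters out candidates it scores below `threshold`, and the IV is
    refined on the cases it accepted. Valid cases seed the next round, so
    generation and validation improve together.
    """
    accepted = []
    for _ in range(rounds):
        candidates = [tig_generate(s) for s in seeds]
        valid = [x for x in candidates if iv_score(x) >= threshold]
        iv_update(valid)        # refine the validator on accepted cases
        accepted.extend(valid)
        seeds = valid or seeds  # fall back to old seeds if nothing passed
    return accepted

# Toy stand-ins: "generation" perturbs a number; the "IV" accepts small ones.
out = joint_optimize(lambda s: s + 1,
                     lambda x: 1.0 if x < 10 else 0.0,
                     lambda batch: None,
                     seeds=[0], rounds=3)
print(out)  # each round's accepted candidates, accumulated
```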
The results show that the joint optimization improves the validity of the test cases generated by the TIGs, raising the percentage of valid test cases by 2%-10% for every TIG except SINVAD. The accuracy of the IVs also improves.
In summary, the proposed testing framework for input validation enhancement improves the quality of DNN test cases by validating the input data. It pairs a TIG module with an IV module and jointly optimizes them to maximize both the validity of the generated test cases and the accuracy of the IV. The results show that the framework improves both the validity of the TIG-generated test cases and the accuracy of the IVs, thereby enhancing the reliability and effectiveness of the testing process.
[1] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. ICLR 2015. https://ai.google/research/pubs/pub43405
[2] Kang, S., Feldt, R., & Yoo, S. (2020). SINVAD: Search-based Image Space Navigation for DNN Image Classifier Test Input Generation. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2005.09296
[3] Pei, K., Cao, Y., Yang, J., & Jana, S. (2019). DeepXplore: Automated Whitebox Testing of Deep Learning Systems. GetMobile, 22(3), 36–38. https://doi.org/10.1145/3308755.3308767
[4] Zhang, J., Keung, J., Ma, X., Li, Y., & Chan, W. K. (in press). Enhancing valid test input generation with distribution awareness for deep neural networks. The 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC 2024). https://scholars.cityu.edu.hk/en/publications/publication(44d4dce9-f84e-43d9-acad-0b42e602eca7).html
[5] Zhou, H., Li, W., Zhu, Y., Zhang, Y., Yu, B., Zhang, L., & Liu, C. (2018). DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1812.10812
[6] Zou, J., Pan, Z., Qiu, J., Liu, X., Rui, T., & Li, W. (2020). Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting. In Lecture notes in computer science (pp. 563–579). https://doi.org/10.1007/978-3-030-58542-6_34