CS454 AI Based Software Engineering (Autumn 2023)

Syllabus

This class broadly covers the use of machine intellgence for software engineering. There are two major topics in the couse:

We will learn a class of algorithms that are referred to metaheuristics. This includes evolutionary computation as well as other bio-inspired algorithms. While this course is not about these algorithms, currently they are not really part of other School of Computing courses, so we will study them.
We will learn how SE problems are fomulated as optimisation and/or learning problem, and will try to replicate some of them by ourselves. Originally, CS454 was focused more on metaheuristic algorithms, but we will try to also expand into various ML algorithms. However, since we are NOT directly covering ML algorithms, it is highly recommended that you have taken some introductory courses on the relevant topics (AI, ML, Deep Learning, etc).

Lectures

Time: 10:30-11:45, Mondays and Wednesdays
Location: E3-1 Room 1101

Communication

On the course Slack (invitation link has been distributed - contact the lecturer if you joined late and do not have it).

Lecturer

Shin Yoo shin.yoo@kaist.ac.kr
Office: E3-1 Room 2405

Required Skills

The course assumes that students have the following skills. Unless you satisfy the following, you may encounter significant difficulties during courseworks and projects:

Strong programming skills: you are required to develop a course project. There will be also a number of hands-on sessions where we will program together during the class.
Unix/Linux-savvy: you should be familiar with the usual build tools and Unix/Linux command line environments.
Git-aware: you will be required to submit a github repository as part of your project deliverables.
Background knowledg about basic AI/ML
(Optional but strongly recommended) LaTeX

Out of these, I would understand if you have to learn how to use git during this course. But programming proficiency and Unix/Linux literacy are essential from the beginning.

Evaluation

This may change.

Coursework: 60%
Project: 40%

Teaching Assistant

Juyeon Yoon, juyeon.yoon@kaist.ac.kr

Lectures

The schedule below is still tentative for now and may change, given that we do not know the exact class size yet.

08/28: Introduction
08/30: Metaheuristics
09/04: Fitness Landscape, Random Search, and Local Search Algorithms
09/06: Random/Local Search Hands-on
09/11: No Lecture - ASE 2023
09/13: No Lecture - ASE 2023
09/18: Evolutionary Computation 1/2
09/20: Evolutionary Computation 2/2
09/25: Genetic Algorithm Hands-on
09/27: SBST and Metaprogramming Tutorial
10/02: Public Holiday
10/04: Genetic Programming
10/09: Day off for Hangeul Proclamation Day
10/11: Symbolic Regression Hands-on
10/16: Midterm Exam Week
10/18: Midterm Exam Week
10/22: Project Proposal
10/25: Project Proposal
10/30: Multi-objective Optimisation
11/01: MOEA Evaluation Metrics, Normalisation
11/06: Bio-inspired Algorithms, SBSE Overview
11/08: Advanced Algorithms and Optimisation Tips
11/13: Elementary Landscape Analysis, No Free Lunch Theorem
11/15: Test Adequacy for DNN + Input Generation
11/20: Automated Program Repair
11/22: Naturalness of Code + N-gram Language Model Hands-on
11/27: Large Language Models
11/29: No Lecture - Undergraduate Admission Interview
12/04: No Lecture - Presentation Week
12/06: Project Presentation Submission Deadline
12/11: Final Exam Week
12/13: Final Exam Week

Details of each assignment and submission dates will be updated as we go along.

Assignment #1: Next Release Problem

You will implement solvers for Next Release Problem using basic heuristics (greedy) as well as local search algorithms (hill climbing and simulated annealing). You can access the assignment via GitHub Class: LINK. This assignment accounts for 10% of the course. Deadline is 15th September 2023.

Assignment #2: Search Based Test Data Generation

You will implement a search-based test data generation tool for a subset of Python, using genetic algorithm as the base optimizer. You can access the assignment via GitHub Class: LINK. Deadline is the end of 27th October 2023.

Assignment #3: Evolving Spectrum Based Fault Localisation Techniques

You will implement a basic GP system in order to evolve an SBFL formula, an automated debugging technique used for fault localization. You can access the assignment via GitHub Classroom: LINK. Deadline is the end of 22nd November 2023.

Assignment #4: Evolutionary Fuzzing of DNNs

You will implement a GA based DNN fuzzer, so that the generated inputs will trick an existing image classifier. You can access the assignment via GitHub Classroom: LINK. Deadline is the end of 13th December 2023

Group Project Outline

The course project requires you to identify and formulate an SBSE problem and solve it using a metaheuristic approach.

Project Ideas

There are a few ways to find ideas to work on:

Consider your own experience as a developer: if there is some aspect of software development that you think you can improve by automatically searching for a solution, you can deploy algorithms that we have learnt.
Replicating and/or Improving Literature: there are very good surveys [4, 2, 5, 3] and position papers [1] on SBSE, as well as individual research work. You can find a research topic/idea that looks attractive to you, and replicate the work in a small scale. Of course, if you can find a way to improve techniques in the literature, it is even more desirable! There are a few possible places to look:

– International Symposium on Search Based Software Engineering (SSBSE): http://dblp.uni-trier.de/db/conf/ssbse/index.html

– SBSE Track at ACM Genetic Algorithm and Evolutionary Computation Conference (GECCO): https://dblp.uni-trier.de/db/conf/gecco/ (Look for Search-based Software Engineering section under each year’s proceedings)

– International Workshop on Search Based Software Testing (SBST): https://dblp.org/db/conf/icse/. URL is for all workshops collocated with International Conference on Software Engineering - look for SBST, which has been running since 2014.

Many major software engineering conferences have also accepted many SBSE papers over the years. Our reading list is a great starting place. If you are unsure about the scope of a paper, please consult me.
Practical Challenges: The SSBSE conference, which I mentioned above, has also been running a Challenge Track since 2013. Each year, the organisers choose a few software systems in different languages, and participants are encouraged to do something related to those systems using SBSE. Applications that have been submitted so far include: fault prediction, energy consumption improvement, searching for GUI crash events, test case prioritisation, test data generation, deep parameter optimisation, etc.

I understand that the scope is very wide and initially choosing a subject can be a bit confusing. You are welcome to consult any of the T/As and/or me, either by email or face to face. If you want to see me, it is best to make an explicit appointment by email(shin.yoo@kaist.ac.kr).

Deliverables & Timeline

20th September: Deadline for registering your team

Try to build a team of four people. There is a link to a Google Sheet document on the course Slack workspace: build and register your team by no later than 20th September 2023.

6th October: Initial Expression of Interest

Submit an 1-page document that contains one or more project ideas your team is currently considering. I will read them and give feedback to you. Submission should be made on KLMS.

22 & 25 October: Project Sales Pitch

Your team will finalise your project theme, and give a presentation to the class. Try to include the following points:

What the target problem is and why it’s important
How you plan to attack it using optimization or other AI/ML techniques
How you plan to evaluate the difference you have made

6th December: Final Project Presentation Deadline

Each team will record the final presentation (max 15 min), upload it on YouTube as an unlisted video (therefore not searchable), and post the link to the #project channel on course Slack by the end of 6th.

18th December 2023: Final Report and Deliverable

Submit the following via KLMS: the dealine is the end of 18th December 2023. The person who is submitting the team report can zip it with his/her individual report; others can simply upload their individual report.

Each team should submit at least one group report. There is no page limit, or required font size, but please make sure you submit a PDF file to KLMS. The group report should describe the problem, your approach, and the evaluation results. It should also include a link to your code repository.
Each person should submit an individual report. Individual reports should include the following two parts:
- Detailed description of what you have contributed to the project
- A 10-point scale evaluation of all your teammates, wich short descriptive evaluation of their contribution and teamwork

Previous Projects

These are made available only as a reference. You cannot propose identical projects. Two of them are in Korean, because the policy was different then: apologies for those who do not read/speak Korean.

Automatically Testing Self-Driving Cars with Search-Based Procedural Content Generation (from 2020 - later published in SBST 2021
SWAY for Decision Space of Permuatations with Case Study on Test Case Prioritization (from 2020) - later published in SSBSE 2021
Searching for Belief Statements from Empirical Data of Cyber-Physical System-of-Systems (from 2021)
Direct Greybox Vuzzing using Monte-Carlo Decision Tree and Ant Colony Optimization (from 2021)

References

Mark Harman. The current state and future of search based software engineering. In FOSE ’07: 2007 Future of Software Engineering, pages 342–357, Washington, DC, USA, 2007. IEEE Computer Society.
Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search based software engineer- ing: A comprehensive analysis and review of trends techniques and applications. Technical Report TR-09-03, Department of Computer Science, King’s College London, April 2009.
Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search-based software engi- neering: Trends, techniques and applications. ACM Computing Surveys, 45(1):11:1–11:61, December 2012.
Philip McMinn. Search-based software test data generation: A survey. Software Testing, Verification and Reliability, 14(2):105–156, June 2004.
Outi Räihä. A survey on search-based software design. Technical Report D-2009-1, Department of Computer Science, University of Tampere, 2009.

Reading List

Here are some papers on various SBSE topics to pick from in order to prepare for the paper presentation of your group. You can also consult the SBSE Repository.

Project Management

F. Ferrucci, M. Harman, J. Ren, and F. Sarro. Not going to take this anymore: Multi-objective overtime planning for software engineering projects. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 462–471, Piscataway, NJ, USA, 2013. IEEE Press.
R. Degiovanni, F. Molina, G. Regis, N. Aguirre. A genetic algorithm for goal-conflict identification. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE ‘18.

Modularisation

B. S. Mitchell and S. Mancoridis. On the automatic modularization of software systems using the bunch tool. IEEE Transactions on Software Engineering, 32(3):193–208, 2006.
K. Praditwong, M. Harman, and X. Yao. Software module clustering as a multi-objective search problem. IEEE Transactions on Software Engineering, 37(2):264–282, March-April 2010.

Test Data Generation

G. Fraser and A. Arcuri. Whole test suite generation. Software Engineering, IEEE Transactions on, 39(2):276–291, Feb 2013.
J. M. Rojas, G. Fraser, and A. Arcuri. Seeding strategies in search-based unit test generation. Journal of Software Testing, Verification, and Reliability, 2016. Wiley.
R. B. Abdessalem, S. Nejati, L. C. Briand, and T. Stifter. Testing vision-based control systems using learnable evolutionary algorithms. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pages 1016–1026, New York, NY, USA, 2018. ACM.
A. Panichella, F. Kifetew, and P. Tonella. Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets. IEEE Transactions on Software Engineering, 2018. IEEE.
J. Castelein, M. Aniche, M. Soltani, A. Panichella, and A. van Deursen. Search-based test data generation for SQL queries. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pages 1230–1230, New York, NY, USA, 2018. ACM.
R. B. Abdessalem, A. Panichella, S. Nejati, L. C. Briand, and T. Stifer. Testing Autonomous Cars for Feature Interaction Failures using Many-Objective Search. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE ‘18.
A. Gambi, M. Mueller, and G. Fraser. Automatically testing self-driving cars with search-based procedural content generation. Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 318-328, 2019. ACM.
R. Jabbarvand, J. Lin, and S. Malek. Search-based energy testing of Android. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19.
F. Zhang, S. P. Chowdhury, and M. Christakis. DeepSearch: A Simple and Effective Blackbox Attack for Deep Neural Networks. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE ‘20.
X. Gao, R. K. Saha, M. R. Prasad, and A. Roychoudhury. Fuzz Testing based Data Augmentation to Improve Robustness of Deep Neural Networks. In Proceedings of the 42nd International Conference on Software Engineering, ICSE ‘20.
S. Reddy, C. Lemieux, R. Padhye, K. Sen. Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning. In Proceedings of the 42nd International Conference on Software Engineering, ICSE ‘20.

Automated Program Repair

W. Weimer, T. Nguyen, C. L. Goues, and S. Forrest. Automatically finding patches using genetic programming. In Proceedings of the 31st IEEE International Conference on Software Engineering (ICSE ’09), pages 364–374, Vancouver, Canada, 16-24 May 2009. IEEE.
S. H. Tan, H. Yoshida, M. R. Prasad, and A. Roychoudhury. Anti-patterns in search-based program repair, Proceedings of the 24th ACM SIGSOFT International Symposium on Fundations of Software Engineering, pages 727-737, 2016. ACM.
S. Mahajan, A. Alameer, P. McMinn, and W. Halfond. Automated repair of layout cross browser issues using search-based techniques. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 249–260, 2017. ACM.
Y. Yuan, and W. Banzhaf. ARJA: Automated Repair of Java Programs via Multi-Objective Genetic Programming. IEEE Transactions on Software Engineering, 2018.
M. Wen, J. Chen, R. Wu, D. Hao, and S.-C. Cheung. Context-aware patch generation for better au- tomated program repair. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pages 1–11, New York, NY, USA, 2018. ACM.
S. Saha, R. K. Saha, and M. R. Prasad. Harnessing evolution for multi-hunk program repair. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19.

Regression Testing

S. Yoo and M. Harman. Pareto efficient multi-objective test case selection. In Proceedings of International Symposium on Software Testing and Analysis, pages 140–150. ACM Press, July 2007.
M. G. Epitropakis, S. Yoo, M. Harman, and E. K. Burke. Empirical evaluation of pareto efficient multi- objective regression test case prioritisation. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 234–245, New York, NY, USA, 2015. ACM.

Refactoring

M. O’Cinnéide, L. Tratt, M. Harman, S. Counsell, and I. Hemati Moghadam. Experimental assessment of software metrics using automated refactoring. In Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement, ESEM ’12, pages 49–58, New York, NY, USA, 2012. ACM.
V. Alizadeh and M. Kessentini. Reducing Interactive Refactoring Effort via Clustering-Based Multi-objective Search. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE ‘18.

Genetic Improvement

D. White, A. Arcuri, and J. Clark. Evolutionary improvement of programs. IEEE Transactions on Evolutionary Computation, 15(4):515 –538, August 2011.
E. T. Barr, M. Harman, Y. Jia, A. Marginean, and J. Petke. Automated Software Transplantation. Proceedings of the 2015 International Symposium on Software Testing and Analysis, pages 257–269, 2015.
W. Langdon and M. Harman. Optimizing existing software with genetic programming. Transactions on Evolutionary Computation, 19(1):118–135, 2015.
B. R. Bruce, J. Petke, and M. Harman. Reducing Energy Consumption using Genetic Improvement. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pages 1327-1334, 2015. ACM.

Deep Parameter Optimisation

F. Wu, W. Weimer, M. Harman, Y. Jia, and J. Krinke. Deep parameter optimisation. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO 2015, pages 1375–1382, 2015.

Fault Localisation

S. Yoo. Evolving human competitive spectra-based fault localisation techniques. In G. Fraser and J. Teixeira de Souza, editors, Search Based Software Engineering, volume 7515 of Lecture Notes in Computer Science, pages 244–258. Springer Berlin Heidelberg, 2012.
J. Sohn and S. Yoo. FLUCCS: Using code and change metrics to improve fault localisation. In Proceedings of the International Symposium on Software Testing and Analysis, ISSTA 2017, pages 273–283, 2017.

Miscellaneous

M. Almasi, H. Hemmati, G. Fraser, P. McMinn, J. Benefelds. Search-based detection of deviation failures in the migration of legacy spreadsheet applications. Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. pages 266-275. 2018, ACM.
Y. Shen, Y. Jiang, C. Xu, P. Yu, X. Ma, and J. Lu. ReScue: Crafting Regular Expression DoS Attacks. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE ‘18.
J. Chen, V. Nair, R. Krishna, and T. Menzies. “Sampling” as a Baseline Optimizer for Search-Based Software Engineering. IEEE Transactions on Softare Engineering, 2018.
M. Pradel, G. Gousios, J. Liu, S. Chandra. TypeWriter: Neural Type Prediction with Search-based Validation. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE ‘20.