CS454 AI Based Software Engineering (Autumn 2024)
Syllabus
This class broadly covers the use of machine intelligence for software engineering. There are two major topics in the course:
- We will learn a class of algorithms referred to as metaheuristics, including evolutionary computation as well as other bio-inspired algorithms. While this course is not primarily about these algorithms themselves, they are not currently covered by other School of Computing courses, so we will study them here.
- We will learn how SE problems are formulated as optimisation and/or learning problems, and will try to replicate some of them ourselves. Originally, CS454 focused more on metaheuristic algorithms, but we will also try to expand into various ML algorithms. However, since we are NOT directly covering ML algorithms, it is highly recommended that you have taken some introductory courses on the relevant topics (AI, ML, Deep Learning, etc.).
Lectures
- Time: 14:30-16:00, Tuesdays and Thursdays
- Location: E3-1 Room 1220
Communication
On the course Slack (invitation link has been distributed - contact the lecturer if you joined late and do not have it).
Lecturer
- Shin Yoo shin.yoo@kaist.ac.kr
- Office: E3-1 Room 2405
Teaching Assistant
Required Skills
The course assumes that students have the following skills. Unless you satisfy these, you may encounter significant difficulties with the coursework and the project:
- Strong programming skills: you are required to develop a course project. There will also be a number of hands-on sessions where we will program together during the class.
- Unix/Linux-savvy: you should be familiar with the usual build tools and Unix/Linux command line environments.
- Git-aware: you will be required to submit a GitHub repository as part of your project deliverables.
- Background knowledge of basic AI/ML
- (Optional but strongly recommended) LaTeX
Out of these, I would understand if you have to learn how to use git during this course. However, programming proficiency and Unix/Linux literacy are essential from the beginning.
Evaluation
The course puts a strong emphasis on understanding the algorithms and approaches hands-on, rather than through written exams. Assignments may be hard, but I am confident that, by doing them yourself, you will learn something you can reuse after the course.
- Coursework: 70%
- Project: 30%
Lecture Schedule
The schedule below is still tentative and may change, given that we do not yet know the exact class size.
- 09/03: Introduction
- 09/05: Metaheuristics
- 09/10: Fitness Landscape, Random Search, and Local Search Algorithms
- 09/12: Random/Local Search Hands-on + SBST and Metaprogramming Tutorial
- 09/17: Public Holiday (Chuseok)
- 09/19: Evolutionary Computation 1/2
- 09/24: Evolutionary Computation 2/2
- 09/26: Genetic Algorithm Hands-on
- 10/01: Public Holiday
- 10/03: Public Holiday
- 10/08: Genetic Programming
- 10/10: Symbolic Regression Hands-on
- 10/15: Multi-objective Optimisation
- 10/17: Project Proposals
- 10/22: Midterm Exam Week
- 10/24: Midterm Exam Week
- 10/29: No Lecture - ASE 2024
- 10/31: No Lecture - ASE 2024
- 11/05: MOEA Evaluation Metrics, Normalisation
- 11/07: Bio-inspired Algorithms, SBSE Overview
- 11/12: Advanced Algorithms and Optimisation Tips
- 11/14: Elementary Landscape Analysis, No Free Lunch Theorem
- 11/19: Test Adequacy for DNN + Input Generation
- 11/21: Automated Program Repair
- 11/26: Guest Lecture: Prof. Jooyong Yi, UNIST
- 11/28: No Lecture - Undergraduate Admission Interview
- 12/03: Naturalness of Code + N-gram Language Model Hands-on
- 12/05: Large Language Models
- 12/10: Project Presentation
- 12/12: Project Presentation
- 12/17: Final Exam Week
- 12/19: Final Exam Week
Assignments will be made available as we go along.
Assignment #1: Search Based Test Data Generation
You will implement a search-based test data generation tool for a subset of Python, using local search algorithms as the base optimiser. You can access the assignment via GitHub Classroom: LINK. This assignment accounts for 20% of the course grade. The deadline is the end of 3rd of October 2024.
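The core idea behind search-based test data generation is to turn branch coverage into a numeric optimisation problem: a branch distance function tells the search how close an input is to taking the target branch. The sketch below is only an illustration of that idea, not part of the assignment; the predicate `x == 42` and all names are hypothetical.

```python
def branch_distance(x):
    # Distance to satisfying the hypothetical branch predicate `x == 42`:
    # zero when the branch would be taken, larger the further away we are.
    return abs(x - 42)

def hill_climb(start, max_steps=10_000):
    """Minimal local search: move to the better immediate neighbour
    until the branch is covered or a local optimum is reached."""
    current = start
    for _ in range(max_steps):
        if branch_distance(current) == 0:
            return current  # branch covered
        best = min([current - 1, current + 1], key=branch_distance)
        if branch_distance(best) >= branch_distance(current):
            return current  # local optimum
        current = best
    return current

print(hill_climb(0))  # -> 42
```

Because this landscape is a single smooth slope, plain hill climbing suffices; real programs produce plateaus and local optima, which is why the assignment asks for local search algorithms rather than a single greedy descent.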
Assignment #2: Next Release Problem
You will implement solvers for the Next Release Problem using a basic greedy heuristic as well as a genetic algorithm. You can access the assignment via GitHub Classroom: LINK. This assignment accounts for 15% of the course grade. The deadline is 17th of October 2024.
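As a hint of what the greedy baseline looks like: the Next Release Problem asks which requirements to implement under a budget so as to maximise customer value. One common greedy strategy picks requirements by value-to-cost ratio. The instance and function below are a toy illustration, not the assignment's actual data or API.

```python
def greedy_nrp(requirements, budget):
    """Greedy baseline for the Next Release Problem: select requirements
    in decreasing value/cost ratio until the budget is exhausted."""
    chosen, spent = [], 0
    for name, cost, value in sorted(
        requirements, key=lambda r: r[2] / r[1], reverse=True
    ):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

# Toy instance: (requirement, cost, aggregated customer value)
reqs = [("login", 4, 10), ("search", 3, 9), ("export", 5, 4), ("themes", 2, 2)]
print(greedy_nrp(reqs, budget=8))  # -> ['search', 'login']
```

The GA part of the assignment replaces this fixed selection rule with an evolving bit-string (one bit per requirement), which can escape the cases where the ratio heuristic is suboptimal.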
Assignment #3: Evolving Spectrum Based Fault Localisation Techniques
You will implement a basic GP system in order to evolve an SBFL formula. SBFL (Spectrum Based Fault Localisation) is an automated debugging technique that ranks program elements by how suspicious their test coverage pattern looks. You can access the assignment via GitHub Classroom: LINK. This assignment accounts for 15% of the course grade. The deadline is the end of 19th of November 2024.
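To see what the GP would be evolving: an SBFL formula maps each statement's spectrum counters (ef/ep = failing/passing tests that execute it, nf/np = those that do not) to a suspiciousness score. The sketch below evaluates one classic human-designed formula, Ochiai, on a made-up spectrum; GP instead evolves expression trees over the same four counters. The data here is purely illustrative.

```python
import math

def ochiai(ef, ep, nf, np):
    """Ochiai: a classic human-designed SBFL formula. A GP system would
    evolve expression trees over the same counters (ef, ep, nf, np)."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# Toy spectrum: (statement, ef, ep, nf, np)
spectrum = [("s1", 3, 1, 0, 6), ("s2", 1, 4, 2, 3), ("s3", 0, 5, 3, 2)]
ranked = sorted(spectrum, key=lambda s: ochiai(*s[1:]), reverse=True)
print([s[0] for s in ranked])  # -> ['s1', 's2', 's3'], most suspicious first
```

The fitness of an evolved formula is typically measured by how highly it ranks known faulty statements across a set of training faults.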
Assignment #4: Evolutionary Fuzzing of DNNs
You will implement a GA based DNN fuzzer, so that the generated inputs trick an existing image classifier. You can access the assignment via GitHub Classroom: LINK. This assignment accounts for 20% of the course grade. The deadline is the end of 10th of December 2024.
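The overall loop can be sketched as a GA that minimises the classifier's confidence in the original label. Everything below is a hypothetical stand-in: the "classifier" is a toy function of pixel values, not a real DNN, and the assignment's actual interface will differ.

```python
import random

def classifier_confidence(image):
    """Stand-in for a real DNN: confidence in the original label.
    In the assignment this would be a call to an actual image classifier."""
    return max(0.0, 1.0 - sum(abs(p) for p in image) / len(image))

def mutate(image, sigma=0.05):
    # Small Gaussian noise on each "pixel"; a real fuzzer would also
    # bound the perturbation to keep the input visually close to the seed.
    return [p + random.gauss(0.0, sigma) for p in image]

def ga_fuzz(seed_image, pop_size=20, generations=50):
    """Simple generational GA: lower confidence = fitter individual."""
    population = [mutate(seed_image) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=classifier_confidence)
        survivors = population[: pop_size // 2]
        offspring = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + offspring
    return min(population, key=classifier_confidence)

seed_image = [0.0] * 16
adversarial = ga_fuzz(seed_image)
print(classifier_confidence(adversarial) < classifier_confidence(seed_image))
```

A real fuzzer would add crossover and a distance constraint on the perturbation; this sketch only shows the select-mutate-replace skeleton.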
Group Project Outline
The course project requires you to identify and formulate an SBSE problem and solve it using a metaheuristic approach.
Project Ideas
There are a few ways to find ideas to work on:
- Consider your own experience as a developer: if there is some aspect of software development that you think you can improve by automatically searching for a solution, you can deploy the algorithms that we have learnt.
- Replicating and/or improving the literature: there are very good surveys [4, 2, 5, 3] and position papers [1] on SBSE, as well as individual research work. You can find a research topic/idea that looks attractive to you, and replicate the work on a small scale. Of course, if you can find a way to improve techniques in the literature, that is even more desirable! There are a few possible places to look:
– International Symposium on Search Based Software Engineering (SSBSE): http://dblp.uni-trier.de/db/conf/ssbse/index.html
– SBSE Track at ACM Genetic Algorithm and Evolutionary Computation Conference (GECCO): https://dblp.uni-trier.de/db/conf/gecco/ (Look for Search-based Software Engineering section under each year’s proceedings)
– International Workshop on Search Based Software Testing (SBST): https://dblp.org/db/conf/icse/. URL is for all workshops collocated with International Conference on Software Engineering - look for SBST, which has been running since 2014.
Many major software engineering conferences have also accepted many SBSE papers over the years. Our reading list is a great starting place. If you are unsure about the scope of a paper, please consult me.
- Practical challenges: the SSBSE conference mentioned above has also been running a Challenge Track since 2013. Each year, the organisers choose a few software systems in different languages, and participants are encouraged to do something related to those systems using SBSE. Applications submitted so far include fault prediction, energy consumption improvement, searching for GUI crash events, test case prioritisation, test data generation, deep parameter optimisation, etc.
I understand that the scope is very wide and that initially choosing a subject can be a bit confusing. You are welcome to consult any of the TAs and/or me, either by email or face to face. If you want to see me, it is best to make an explicit appointment by email (shin.yoo@kaist.ac.kr).
Deliverables & Timeline
30th September: Deadline for registering your team
Try to build a team of four people. There is a link to a Google Sheet document on the course Slack workspace: build and register your team no later than 30th September 2024.
10th October: Initial Expression of Interest
Submit a 1-page document that contains one or more project ideas your team is currently considering. I will read them and give you feedback. Submissions should be made on KLMS.
17th October: Project Sales Pitch
Your team will finalise your project theme, and give a presentation to the class. Try to include the following points:
- What the target problem is and why it’s important
- How you plan to attack it using optimisation or other AI/ML techniques
- How you plan to evaluate the difference you have made
10th, 12th December: Final Project Presentation
Each team will do the final presentation of the project. The presentation should report the status of the project, and clearly outline what remains to be done until the final submission.
20th December: Final Report and Deliverable
Submit the following via KLMS; the deadline is the end of 20th December 2024. The person submitting the team report can zip it together with their individual report; others can simply upload their individual reports.
- Each team should submit at least one group report. There is no page limit or required font size, but please make sure you submit a PDF file to KLMS. The group report should describe the problem, your approach, and the evaluation results. It should also include a link to your code repository.
- Each person should submit an individual report. Individual reports should include the following two parts:
  - A detailed description of what you have contributed to the project
  - A 10-point scale evaluation of all your teammates, with a short descriptive evaluation of their contribution and teamwork
Previous Projects
These are made available only as references. You cannot propose identical projects. Two of them are in Korean, because the policy was different then: apologies to those who do not read/speak Korean.
- Automatically Testing Self-Driving Cars with Search-Based Procedural Content Generation (from 2020) - later published in SBST 2021
- SWAY for Decision Space of Permutations with Case Study on Test Case Prioritization (from 2020) - later published in SSBSE 2021
- Searching for Belief Statements from Empirical Data of Cyber-Physical System-of-Systems (from 2021)
- Direct Greybox Fuzzing using Monte-Carlo Decision Tree and Ant Colony Optimization (from 2021)
- Grammatical Evolution based Test Case Generation for Microcontroller (from 2023)
References
1. Mark Harman. The current state and future of search based software engineering. In FOSE '07: 2007 Future of Software Engineering, pages 342–357, Washington, DC, USA, 2007. IEEE Computer Society.
2. Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search based software engineering: A comprehensive analysis and review of trends techniques and applications. Technical Report TR-09-03, Department of Computer Science, King's College London, April 2009.
3. Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys, 45(1):11:1–11:61, December 2012.
4. Philip McMinn. Search-based software test data generation: A survey. Software Testing, Verification and Reliability, 14(2):105–156, June 2004.
5. Outi Räihä. A survey on search-based software design. Technical Report D-2009-1, Department of Computer Science, University of Tampere, 2009.
Reading List
Here are some papers on various SBSE topics to pick from in order to prepare for the paper presentation of your group. You can also consult the SBSE Repository.
Project Management
- F. Ferrucci, M. Harman, J. Ren, and F. Sarro. Not going to take this anymore: Multi-objective overtime planning for software engineering projects. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 462–471, Piscataway, NJ, USA, 2013. IEEE Press.
- R. Degiovanni, F. Molina, G. Regis, and N. Aguirre. A genetic algorithm for goal-conflict identification. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE '18.
Modularisation
- B. S. Mitchell and S. Mancoridis. On the automatic modularization of software systems using the bunch tool. IEEE Transactions on Software Engineering, 32(3):193–208, 2006.
- K. Praditwong, M. Harman, and X. Yao. Software module clustering as a multi-objective search problem. IEEE Transactions on Software Engineering, 37(2):264–282, March-April 2010.
Test Data Generation
- G. Fraser and A. Arcuri. Whole test suite generation. IEEE Transactions on Software Engineering, 39(2):276–291, Feb 2013.
- J. M. Rojas, G. Fraser, and A. Arcuri. Seeding strategies in search-based unit test generation. Journal of Software Testing, Verification, and Reliability, 2016. Wiley.
- R. B. Abdessalem, S. Nejati, L. C. Briand, and T. Stifter. Testing vision-based control systems using learnable evolutionary algorithms. In Proceedings of the 40th International Conference on Software Engineering, ICSE '18, pages 1016–1026, New York, NY, USA, 2018. ACM.
- A. Panichella, F. Kifetew, and P. Tonella. Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets. IEEE Transactions on Software Engineering, 2018. IEEE.
- J. Castelein, M. Aniche, M. Soltani, A. Panichella, and A. van Deursen. Search-based test data generation for SQL queries. In Proceedings of the 40th International Conference on Software Engineering, ICSE '18, pages 1230–1230, New York, NY, USA, 2018. ACM.
- R. B. Abdessalem, A. Panichella, S. Nejati, L. C. Briand, and T. Stifter. Testing Autonomous Cars for Feature Interaction Failures using Many-Objective Search. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE '18.
- A. Gambi, M. Mueller, and G. Fraser. Automatically testing self-driving cars with search-based procedural content generation. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 318–328, 2019. ACM.
- R. Jabbarvand, J. Lin, and S. Malek. Search-based energy testing of Android. In Proceedings of the 41st International Conference on Software Engineering, ICSE '19.
- F. Zhang, S. P. Chowdhury, and M. Christakis. DeepSearch: A Simple and Effective Blackbox Attack for Deep Neural Networks. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE '20.
- X. Gao, R. K. Saha, M. R. Prasad, and A. Roychoudhury. Fuzz Testing based Data Augmentation to Improve Robustness of Deep Neural Networks. In Proceedings of the 42nd International Conference on Software Engineering, ICSE '20.
- S. Reddy, C. Lemieux, R. Padhye, and K. Sen. Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning. In Proceedings of the 42nd International Conference on Software Engineering, ICSE '20.
Automated Program Repair
- W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In Proceedings of the 31st IEEE International Conference on Software Engineering (ICSE '09), pages 364–374, Vancouver, Canada, 16-24 May 2009. IEEE.
- S. H. Tan, H. Yoshida, M. R. Prasad, and A. Roychoudhury. Anti-patterns in search-based program repair. In Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 727–737, 2016. ACM.
- S. Mahajan, A. Alameer, P. McMinn, and W. Halfond. Automated repair of layout cross browser issues using search-based techniques. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 249–260, 2017. ACM.
- Y. Yuan and W. Banzhaf. ARJA: Automated Repair of Java Programs via Multi-Objective Genetic Programming. IEEE Transactions on Software Engineering, 2018.
- M. Wen, J. Chen, R. Wu, D. Hao, and S.-C. Cheung. Context-aware patch generation for better automated program repair. In Proceedings of the 40th International Conference on Software Engineering, ICSE '18, pages 1–11, New York, NY, USA, 2018. ACM.
- S. Saha, R. K. Saha, and M. R. Prasad. Harnessing evolution for multi-hunk program repair. In Proceedings of the 41st International Conference on Software Engineering, ICSE '19.
Regression Testing
- S. Yoo and M. Harman. Pareto efficient multi-objective test case selection. In Proceedings of the International Symposium on Software Testing and Analysis, pages 140–150. ACM Press, July 2007.
- M. G. Epitropakis, S. Yoo, M. Harman, and E. K. Burke. Empirical evaluation of pareto efficient multi-objective regression test case prioritisation. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 234–245, New York, NY, USA, 2015. ACM.
Refactoring
- M. O'Cinnéide, L. Tratt, M. Harman, S. Counsell, and I. Hemati Moghadam. Experimental assessment of software metrics using automated refactoring. In Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '12, pages 49–58, New York, NY, USA, 2012. ACM.
- V. Alizadeh and M. Kessentini. Reducing Interactive Refactoring Effort via Clustering-Based Multi-objective Search. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE '18.
Genetic Improvement
- D. White, A. Arcuri, and J. Clark. Evolutionary improvement of programs. IEEE Transactions on Evolutionary Computation, 15(4):515–538, August 2011.
- E. T. Barr, M. Harman, Y. Jia, A. Marginean, and J. Petke. Automated Software Transplantation. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, pages 257–269, 2015.
- W. Langdon and M. Harman. Optimizing existing software with genetic programming. IEEE Transactions on Evolutionary Computation, 19(1):118–135, 2015.
- B. R. Bruce, J. Petke, and M. Harman. Reducing Energy Consumption using Genetic Improvement. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pages 1327–1334, 2015. ACM.
Deep Parameter Optimisation
- F. Wu, W. Weimer, M. Harman, Y. Jia, and J. Krinke. Deep parameter optimisation. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO 2015, pages 1375–1382, 2015.
Fault Localisation
- S. Yoo. Evolving human competitive spectra-based fault localisation techniques. In G. Fraser and J. Teixeira de Souza, editors, Search Based Software Engineering, volume 7515 of Lecture Notes in Computer Science, pages 244–258. Springer Berlin Heidelberg, 2012.
- J. Sohn and S. Yoo. FLUCCS: Using code and change metrics to improve fault localisation. In Proceedings of the International Symposium on Software Testing and Analysis, ISSTA 2017, pages 273–283, 2017.
Miscellaneous
- M. Almasi, H. Hemmati, G. Fraser, P. McMinn, and J. Benefelds. Search-based detection of deviation failures in the migration of legacy spreadsheet applications. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 266–275, 2018. ACM.
- Y. Shen, Y. Jiang, C. Xu, P. Yu, X. Ma, and J. Lu. ReScue: Crafting Regular Expression DoS Attacks. In Proceedings of the 33rd International Conference on Automated Software Engineering, ASE '18.
- J. Chen, V. Nair, R. Krishna, and T. Menzies. "Sampling" as a Baseline Optimizer for Search-Based Software Engineering. IEEE Transactions on Software Engineering, 2018.
- M. Pradel, G. Gousios, J. Liu, and S. Chandra. TypeWriter: Neural Type Prediction with Search-based Validation. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE '20.