Shuyao Jiang - Publications

Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference

Ruiying Zeng, Shuyao Jiang, Wenxuan Zhao, Yangfan Zhou

ICSE'26 The 48th IEEE/ACM International Conference on Software Engineering, Rio de Janeiro, Brazil, Apr. 12-18, 2026.

Performance debugging in WebAssembly (Wasm) runtimes is essential for ensuring the robustness of Wasm, especially since performance issues have frequently occurred in Wasm runtimes, which can significantly degrade the capabilities of hosted services. Many performance issues in Wasm runtimes result from suboptimal compilation of input Wasm programs, for which existing performance debugging methods primarily designed for application-level inefficiencies are not well-suited. In this paper, we present WarpL, a novel mutation-based approach that aims to identify the exact suboptimal instruction sequences responsible for the performance issues in Wasm runtimes, thereby narrowing down the root causes. Specifically, WarpL obtains a functionally similar mutant in which the performance issue does not manifest, and isolates the exact suboptimal instructions by comparing the machine code of the original and mutated programs. We implement WarpL as an open-source tool and evaluate it on 12 real-world performance issues across three widely used Wasm runtimes. WarpL identified the exact causes in 10 out of 12 issues. Notably, we have used WarpL to successfully diagnose six previously unknown performance issues in Wasmtime.

Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference

Ruiying Zeng, Shuyao Jiang, Wenxuan Zhao, Yangfan Zhou

ICSE'26 The 48th IEEE/ACM International Conference on Software Engineering, Rio de Janeiro, Brazil, Apr. 12-18, 2026.

Performance debugging in WebAssembly (Wasm) runtimes is essential for ensuring the robustness of Wasm, especially since performance issues have frequently occurred in Wasm runtimes, which can significantly degrade the capabilities of hosted services. Many performance issues in Wasm runtimes result from suboptimal compilation of input Wasm programs, for which existing performance debugging methods primarily designed for application-level inefficiencies are not well-suited. In this paper, we present WarpL, a novel mutation-based approach that aims to identify the exact suboptimal instruction sequences responsible for the performance issues in Wasm runtimes, thereby narrowing down the root causes. Specifically, WarpL obtains a functionally similar mutant in which the performance issue does not manifest, and isolates the exact suboptimal instructions by comparing the machine code of the original and mutated programs. We implement WarpL as an open-source tool and evaluate it on 12 real-world performance issues across three widely used Wasm runtimes. WarpL identified the exact causes in 10 out of 12 issues. Notably, we have used WarpL to successfully diagnose six previously unknown performance issues in Wasmtime.

Can User Feedback Help Issue Detection? An Empirical Study on a One-billion-user Online Service System

Shuyao Jiang, Jiazhen Gu, Wujie Zheng, Yangfan Zhou, Michael R. Lyu

ESEM'25 The 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Honolulu, HI, USA, Sep. 29 - Oct. 3, 2025.

Background: It has long been suggested that user feedback, typically written in natural language by end-users, can help issue detection. However, for large-scale online service systems that receive a tremendous amount of feedback, it remains a challenging task to identify severe issues from user feedback. Aims: To develop a better feedback-based issue detection approach, it is crucial first to gain a comprehensive understanding of the characteristics of user feedback in real production systems. Method: In this paper, we conduct an empirical study on 50,378,766 user feedback items from six real-world services in a one-billion-user online service system. We first study what users provide in their feedback. We then examine whether certain features of feedback items can be good indicators of severe issues. Finally, we investigate whether adopting machine learning techniques to analyze user feedback is reasonable. Results: Our results show that a large proportion of user feedback provides irrelevant information about system issues. As a result, it is crucial to filter out issue-irrelevant information when processing user feedback. Moreover, we find severe issues that cannot be easily detected based solely on user feedback characteristics. Finally, we find that the distributions of the feedback topics in different time intervals are similar. This confirms that designing machine learning-based approaches is a viable direction for better analyzing user feedback. Conclusions: We consider that our findings can serve as an empirical foundation for feedback-based issue detection in large-scale service systems, which sheds light on the design and implementation of practical issue detection approaches.

Can User Feedback Help Issue Detection? An Empirical Study on a One-billion-user Online Service System

Shuyao Jiang, Jiazhen Gu, Wujie Zheng, Yangfan Zhou, Michael R. Lyu

ESEM'25 The 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Honolulu, HI, USA, Sep. 29 - Oct. 3, 2025.

Background: It has long been suggested that user feedback, typically written in natural language by end-users, can help issue detection. However, for large-scale online service systems that receive a tremendous amount of feedback, it remains a challenging task to identify severe issues from user feedback. Aims: To develop a better feedback-based issue detection approach, it is crucial first to gain a comprehensive understanding of the characteristics of user feedback in real production systems. Method: In this paper, we conduct an empirical study on 50,378,766 user feedback items from six real-world services in a one-billion-user online service system. We first study what users provide in their feedback. We then examine whether certain features of feedback items can be good indicators of severe issues. Finally, we investigate whether adopting machine learning techniques to analyze user feedback is reasonable. Results: Our results show that a large proportion of user feedback provides irrelevant information about system issues. As a result, it is crucial to filter out issue-irrelevant information when processing user feedback. Moreover, we find severe issues that cannot be easily detected based solely on user feedback characteristics. Finally, we find that the distributions of the feedback topics in different time intervals are similar. This confirms that designing machine learning-based approaches is a viable direction for better analyzing user feedback. Conclusions: We consider that our findings can serve as an empirical foundation for feedback-based issue detection in large-scale service systems, which sheds light on the design and implementation of practical issue detection approaches.

Distinguishability-guided Test Program Generation for WebAssembly Runtime Performance Testing

Shuyao Jiang, Ruiying Zeng, Yangfan Zhou, Michael R. Lyu

SANER'25 The 32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, Montreal, QC, Canada, Mar. 4-7, 2025.

WebAssembly (Wasm) is a binary instruction format designed as a portable compilation target, which has been widely used on both the web and server sides in recent years. As high performance is a critical design goal of Wasm, it is essential to conduct performance testing for Wasm runtimes. However, existing research on Wasm runtime performance testing still suffers from insufficient high-quality test programs. To solve this problem, we propose a novel test program generation approach WarpGen. It first extracts code snippets from historical issue-triggering test programs as initial operators, then inserts an operator into a seed program to synthesize a new test program. To verify the quality of generated programs, we propose an indicator called distinguishability, which refers to the ability of a test program to distinguish abnormal performance of specific Wasm runtimes. We apply WarpGen for performance testing on four Wasm runtimes and verify its effectiveness compared with baseline approaches. In particular, WarpGen has identified seven new performance issues in three Wasm runtimes.

Distinguishability-guided Test Program Generation for WebAssembly Runtime Performance Testing

Shuyao Jiang, Ruiying Zeng, Yangfan Zhou, Michael R. Lyu

SANER'25 The 32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, Montreal, QC, Canada, Mar. 4-7, 2025.

WebAssembly (Wasm) is a binary instruction format designed as a portable compilation target, which has been widely used on both the web and server sides in recent years. As high performance is a critical design goal of Wasm, it is essential to conduct performance testing for Wasm runtimes. However, existing research on Wasm runtime performance testing still suffers from insufficient high-quality test programs. To solve this problem, we propose a novel test program generation approach WarpGen. It first extracts code snippets from historical issue-triggering test programs as initial operators, then inserts an operator into a seed program to synthesize a new test program. To verify the quality of generated programs, we propose an indicator called distinguishability, which refers to the ability of a test program to distinguish abnormal performance of specific Wasm runtimes. We apply WarpGen for performance testing on four Wasm runtimes and verify its effectiveness compared with baseline approaches. In particular, WarpGen has identified seven new performance issues in three Wasm runtimes.

Revealing Performance Issues in Server-side WebAssembly Runtimes via Differential Testing

Shuyao Jiang, Ruiying Zeng, Zihao Rao, Jiazhen Gu, Yangfan Zhou, Michael R. Lyu

ASE'23 The 38th IEEE/ACM International Conference on Automated Software Engineering, Luxembourg, Luxembourg, Sep. 11-15, 2023.

WebAssembly (Wasm) is a bytecode format originally serving as a compilation target for Web applications. It has recently been used increasingly on the server side, e.g., providing a safer, faster, and more portable alternative to Linux containers. With the popularity of server-side Wasm applications, it is essential to study performance issues (i.e., abnormal latency) in Wasm runtimes, as they may cause a significant impact on server-side applications. However, there is still a lack of attention to performance issues in server-side Wasm runtimes. In this paper, we design a novel differential testing approach WarpDiff to identify performance issues in server-side Wasm runtimes. The key insight is that in normal cases, the execution time of the same test case on different Wasm runtimes should follow an oracle ratio. We identify abnormal cases where the execution time ratio significantly deviates from the oracle ratio and subsequently locate the Wasm runtimes that cause the performance issues. We apply WarpDiff to test five popular server-side Wasm runtimes using 123 test cases from the LLVM test suite and demonstrate the top 10 abnormal cases we identified. We further conduct an in-depth analysis of these abnormal cases and summarize seven performance issues, all of which have been confirmed by the developers. We hope our work can inspire future investigation on improving Wasm runtime implementation and thus promoting the development of server-side Wasm applications.

Revealing Performance Issues in Server-side WebAssembly Runtimes via Differential Testing

Shuyao Jiang, Ruiying Zeng, Zihao Rao, Jiazhen Gu, Yangfan Zhou, Michael R. Lyu

ASE'23 The 38th IEEE/ACM International Conference on Automated Software Engineering, Luxembourg, Luxembourg, Sep. 11-15, 2023.

WebAssembly (Wasm) is a bytecode format originally serving as a compilation target for Web applications. It has recently been used increasingly on the server side, e.g., providing a safer, faster, and more portable alternative to Linux containers. With the popularity of server-side Wasm applications, it is essential to study performance issues (i.e., abnormal latency) in Wasm runtimes, as they may cause a significant impact on server-side applications. However, there is still a lack of attention to performance issues in server-side Wasm runtimes. In this paper, we design a novel differential testing approach WarpDiff to identify performance issues in server-side Wasm runtimes. The key insight is that in normal cases, the execution time of the same test case on different Wasm runtimes should follow an oracle ratio. We identify abnormal cases where the execution time ratio significantly deviates from the oracle ratio and subsequently locate the Wasm runtimes that cause the performance issues. We apply WarpDiff to test five popular server-side Wasm runtimes using 123 test cases from the LLVM test suite and demonstrate the top 10 abnormal cases we identified. We further conduct an in-depth analysis of these abnormal cases and summarize seven performance issues, all of which have been confirmed by the developers. We hope our work can inspire future investigation on improving Wasm runtime implementation and thus promoting the development of server-side Wasm applications.

Towards Usable Neural Comment Generation via Code-Comment Linkage Interpretation: Method and Empirical Study

Shuyao Jiang, Jiacheng Shen, Shengnan Wu, Yu Cai, Yue Yu, Yangfan Zhou

TSE'23 IEEE Transactions on Software Engineering, vol. 49, no. 4, pp. 2239-2254, Apr. 2023.

Code comment is important to facilitate code comprehension for developers. Recent studies suggest to generate comments automatically with deep learning, in particular, based on neural machine translation models. However, such a promising Neural Comment Generation (NCG) technique suffers from unsatisfactory performance, as well as poor usability, i.e., developers cannot easily understand and modify the auto-generated comments. This paper suggests that a proper interpretation of how the comments are generated can significantly improve the usability of NCG approaches. We propose a novel model-independent framework, namely CCLink, to interpret the auto-generated comments. CCLink generates a set of code mutants and obtains their corresponding comments. Based on these data, several contribution mining algorithms are designed to infer the key elements in code that contributes to the generation of the key phrases in the comments. The links between code and its auto-generated comment can thus be constructed. This in turn allows CCLink to visualize the links as the comment interpretations to developers. It greatly facilitates manual verification and correction of the comments. We examine the performance of CCLink with different contribution mining algorithms, NCG approaches, and real-world datasets. We also conduct an empirical study on 32 experienced Java programmers to evaluate the effectiveness of CCLink. The results show that CCLink is promising in making NCG more usable with a proper interpretation of the auto-generated comments.

Towards Usable Neural Comment Generation via Code-Comment Linkage Interpretation: Method and Empirical Study

Shuyao Jiang, Jiacheng Shen, Shengnan Wu, Yu Cai, Yue Yu, Yangfan Zhou

TSE'23 IEEE Transactions on Software Engineering, vol. 49, no. 4, pp. 2239-2254, Apr. 2023.

Code comment is important to facilitate code comprehension for developers. Recent studies suggest to generate comments automatically with deep learning, in particular, based on neural machine translation models. However, such a promising Neural Comment Generation (NCG) technique suffers from unsatisfactory performance, as well as poor usability, i.e., developers cannot easily understand and modify the auto-generated comments. This paper suggests that a proper interpretation of how the comments are generated can significantly improve the usability of NCG approaches. We propose a novel model-independent framework, namely CCLink, to interpret the auto-generated comments. CCLink generates a set of code mutants and obtains their corresponding comments. Based on these data, several contribution mining algorithms are designed to infer the key elements in code that contributes to the generation of the key phrases in the comments. The links between code and its auto-generated comment can thus be constructed. This in turn allows CCLink to visualize the links as the comment interpretations to developers. It greatly facilitates manual verification and correction of the comments. We examine the performance of CCLink with different contribution mining algorithms, NCG approaches, and real-world datasets. We also conduct an empirical study on 32 experienced Java programmers to evaluate the effectiveness of CCLink. The results show that CCLink is promising in making NCG more usable with a proper interpretation of the auto-generated comments.

Boosting Neural Commit Message Generation with Code Semantic Analysis

Shuyao Jiang

ASE'19 The 34th IEEE/ACM International Conference on Automated Software Engineering, San Diego, CA, USA, Nov. 11-15, 2019. 🏆 Second Place in ACM Student Research Competition

It has been long suggested that commit messages can greatly facilitate code comprehension. However, developers may not write good commit messages in practice. Neural machine translation (NMT) has been suggested to automatically generate commit messages. Despite the efforts in improving NMT algorithms, the quality of the generated commit messages is not yet satisfactory. This paper, instead of improving NMT algorithms, suggests that proper preprocessing of code changes into concise inputs is quite critical to train NMT. We approach it with semantic analysis of code changes. We collect a real-world dataset with 50k+ commits of popular Java projects, and verify our idea with comprehensive experiments. The results show that preprocessing inputs with code semantic analysis can improve NMT significantly. This work sheds light to how to apply existing DNNs designed by the machine learning community, e.g., NMT models, to complete software engineering tasks.

Boosting Neural Commit Message Generation with Code Semantic Analysis

Shuyao Jiang

ASE'19 The 34th IEEE/ACM International Conference on Automated Software Engineering, San Diego, CA, USA, Nov. 11-15, 2019. 🏆 Second Place in ACM Student Research Competition

It has been long suggested that commit messages can greatly facilitate code comprehension. However, developers may not write good commit messages in practice. Neural machine translation (NMT) has been suggested to automatically generate commit messages. Despite the efforts in improving NMT algorithms, the quality of the generated commit messages is not yet satisfactory. This paper, instead of improving NMT algorithms, suggests that proper preprocessing of code changes into concise inputs is quite critical to train NMT. We approach it with semantic analysis of code changes. We collect a real-world dataset with 50k+ commits of popular Java projects, and verify our idea with comprehensive experiments. The results show that preprocessing inputs with code semantic analysis can improve NMT significantly. This work sheds light to how to apply existing DNNs designed by the machine learning community, e.g., NMT models, to complete software engineering tasks.