Yongle Zhang is a tenure-track assistant professor in the Computer Science Department at Purdue University. He received his Ph.D. from the University of Toronto, where he worked with Dr. Ding Yuan. After receiving his Bachelor’s degree from Shandong University, he was recommended for admission to the Master’s program at the Institute of Computing Technologies.

His Chinese given name is 永乐 (Yong Le), pronounced 永 /juŋ/ 乐 /lə/.

I am looking for highly motivated students at all levels (research intern, master’s, Ph.D.) to work with me on software systems. If you are interested, please contact me with your resume and transcripts.


My research interest is in systems software, with a focus on improving the reliability and availability of complex, real-world systems. In particular, we are currently building tools that help developers detect and diagnose failures in production cloud systems, and we are working on the design and implementation of diagnosable software systems.


  • [Dec. 2021] We received an NSF Core grant! Many thanks to NSF and Pedro Fonseca.
  • [Dec. 2021] Min-Ju Li will intern at Cloudera in summer 2022!
  • [Nov. 2021] Our paper about concurrency bugs in persistent memory applications was accepted to ASPLOS 2022!
  • [Oct. 2021] Our paper about upgrade failures in distributed systems appeared at SOSP 2021!
  • [Jan. 2021] I will join Purdue CS as a tenure-track assistant professor.

Recent Publications

  • Efficiently Detecting Concurrency Bugs in Persistent Memory Programs [pdf][code]. Zhangyu Chen, Yu Hua, Yongle Zhang, Luochangqi Ding. The 2022 Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’22).
  • Understanding and Detecting Software Upgrade Failures in Distributed Systems [pdf][code]. Yongle Zhang, Junwen Yang, Zhuqi Jin, Utsav Sethi, Shan Lu, Ding Yuan. The 28th ACM Symposium on Operating Systems Principles (SOSP’21), October 2021.
  • The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure [pdf]. Yongle Zhang, Kirk Rodrigues, Yu Luo, Michael Stumm, Ding Yuan. The 27th ACM Symposium on Operating Systems Principles (SOSP’19), Oct 2019.
  • Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach [pdf]. Yongle Zhang, Serguei Makarov, Xiang Ren, David Lion, Ding Yuan. The 26th ACM Symposium on Operating Systems Principles (SOSP’17), Oct 2017.
  • lprof: A Non-intrusive Request Flow Profiler for Distributed Systems [pdf]. Xu Zhao*, Yongle Zhang*, David Lion, Muhammad Faizan Ullah, Yu Luo, Ding Yuan, and Michael Stumm. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14).
  • Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems [pdf]. Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Rodrigues, Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), Oct 2014.

Full publication list…