Meta Minesweeper: Scalable Statistical Root Cause Analysis on App Telemetry
This research paper introduces Minesweeper, a novel technique for automated root cause analysis (RCA) of software bugs at scale. Leveraging telemetry data, Minesweeper efficiently identifies statistically significant patterns in user app traces that correlate with bugs, even in the absence of detailed debugging information. The method uses sequential pattern mining, specifically the PrefixSpan algorithm, for pattern extraction and incorporates statistical measures of precision and recall to rank patterns by distinctiveness. Practical challenges like handling numeric data and mitigating redundant patterns are addressed, and the system's scalability and accuracy are demonstrated through real-world evaluations on Facebook's app data. The results show Minesweeper significantly improves the speed and accuracy of RCA, aiding engineers in quickly identifying and resolving bugs.
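The mine-then-rank idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the event names, the toy traces, the helper names (`prefixspan`, `contains`, `rank_patterns`), and the use of an F1 score to combine precision and recall are all assumptions made for this example. Here precision is taken to mean the share of traces containing a pattern that are buggy, and recall the share of buggy traces that contain the pattern.

```python
from collections import defaultdict


def prefixspan(sequences, min_support, max_len=3):
    """Simplified PrefixSpan for single-event sequences.

    Returns {pattern_tuple: number of sequences containing it as a subsequence}.
    """
    results = {}

    def mine(prefix, projected):
        # projected: one (sequence_index, start_offset) suffix per matching sequence
        if len(prefix) >= max_len:
            return
        # For each event, record its first position in every projected suffix
        first_pos = defaultdict(dict)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        for event, occurrences in first_pos.items():
            if len(occurrences) >= min_support:
                pattern = prefix + (event,)
                results[pattern] = len(occurrences)
                # Recurse on the suffixes that follow the first match
                mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


def contains(trace, pattern):
    """True if pattern occurs in trace as an ordered (not necessarily contiguous) subsequence."""
    remaining = iter(trace)
    return all(event in remaining for event in pattern)


def rank_patterns(bug_traces, ok_traces, min_support=2, max_len=3):
    """Mine patterns from buggy traces and rank them by F1 (an assumed scoring choice)."""
    ranked = []
    for pattern, bug_hits in prefixspan(bug_traces, min_support, max_len).items():
        ok_hits = sum(contains(trace, pattern) for trace in ok_traces)
        precision = bug_hits / (bug_hits + ok_hits)
        recall = bug_hits / len(bug_traces)
        f1 = 2 * precision * recall / (precision + recall)
        ranked.append((pattern, precision, recall, f1))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Hypothetical toy telemetry: each trace is an ordered list of app events.
bug_traces = [
    ["login", "open_feed", "tap_photo", "rotate"],
    ["open_feed", "tap_photo", "scroll", "rotate"],
    ["login", "tap_photo", "rotate"],
]
ok_traces = [
    ["login", "open_feed", "scroll"],
    ["open_feed", "tap_photo", "scroll"],
    ["login", "rotate", "tap_photo"],
]

ranked = rank_patterns(bug_traces, ok_traces)
for pattern, precision, recall, f1 in ranked[:3]:
    print(pattern, round(precision, 2), round(recall, 2), round(f1, 2))
```

On this toy data the top-ranked pattern is `tap_photo` followed by `rotate`: it appears in every buggy trace and in no healthy one, whereas either event alone also shows up in healthy traces and so scores lower on precision.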
https://arxiv.org/pdf/2010.09974