name: template-default
layout: true
◀
▶
---
name: template-title
layout: true
template: template-default
class: slide-title, center
---
name: template-section
layout: true
template: template-default
class: slide-section, center, middle
---
name: template-page
layout: true
template: template-default
class: slide-page
---
template: template-title
# Performance testing
## Andrey Akinshin, JetBrains
### Dotnetos, Warsaw, Poland, 10.10.2019
---
template: template-section
## Part 1
## Introduction
---
layout: true
template: template-page
.footer-note[(1) Introduction]
---
class: normal
### How do we typically solve performance problems?
.center[data:image/s3,"s3://crabby-images/acfca/acfca530d593d03b596340a26477b838b69938e5" alt=":scale 90%"]
.bottom-hint-huge[https://www.youtube.com/watch?v=z6_ZbG7Zu0c]
---
class: normal
### What do we want?
--
1. Prevent performance degradations
--
2. Detect degradations that were not prevented
--
3. Detect other kinds of performance anomalies
--
4. Reduce Type I (false positive) error rate
*(Detecting a fake problem)*
--
5. Reduce Type II (false negative) error rate
*(Missing a real problem)*
--
6. Automate everything
---
class: normal
### Agenda
* Performance Summary Reports
* Performance Comparing Reports
* Performance Anomalies
* Performance Alarms & Asserts
---
template: template-section
## Part 2
## Performance Summary Reports
---
layout: true
template: template-page
.footer-note[(2) Performance Summary Reports]
---
class: normal
### BenchmarkDotNet
.up[]
.center[data:image/s3,"s3://crabby-images/86edd/86eddcb22c452e300b5feb8fb31c6c84f1715676" alt=":scale 100%"]
---
class: normal
### Normal distribution
.up[]
.center[data:image/s3,"s3://crabby-images/0a2c1/0a2c1765e8465f5a60c6aa1c74dcc75f8f9255d8" alt=":scale 65%"]
--
.up1[]
.c[
> *Normality is a myth; there never was, and never will be, a normal distribution.*
> "Testing for normality", R.C. Geary, 1947
]
.bottom-hint-huge[https://aakinshin.net/posts/normality/]
---
class: normal
### An experiment
.up[]
.size-50[
.center[
```md
| Method |     Mean |    Error |   StdDev |   Median |
|------- |---------:|---------:|---------:|---------:|
|      A | `136.2 ms` | 19.30 ms | 56.92 ms | 107.0 ms |
|      B | `133.7 ms` |  4.14 ms | 12.20 ms | 130.2 ms |
```
]
]
--
.center[data:image/s3,"s3://crabby-images/35c54/35c544ce1344a82d7897c0ec02842244426f019b" alt=":scale 60%"]
---
class: normal
### What kind of metrics should we report?
.up3[]
.center[data:image/s3,"s3://crabby-images/e3456/e34568c7291c2012e949e75a8cd9761d3c46f59c" alt=":scale 80%"]
---
class: normal
### Mean and StdDev may be misleading
.up[]
.center[data:image/s3,"s3://crabby-images/0a5bd/0a5bd5fe6bec308dafe06f4df4ec361c79aee443" alt=":scale 80%"]
.bottom-hint[Justin Matejka, George Fitzmaurice (2017), "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing", CHI 2017 Conference proceedings: ACM SIGCHI Conference on Human Factors in Computing Systems]
---
class: normal
### Mean and StdDev may be misleading
.center[data:image/s3,"s3://crabby-images/a0523/a0523e0357f8a664d56a93b01e2959f2a4dae35e" alt=":scale 100%"]
.bottom-hint[Justin Matejka, George Fitzmaurice (2017), "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing", CHI 2017 Conference proceedings: ACM SIGCHI Conference on Human Factors in Computing Systems]
---
class: normal
### Ranges [User-friendly approach]
--
.size-150[
```md
| Method |        Range |
|------- |-------------:|
|      A | `100ms..200ms` |
|      B | `120ms..150ms` |
```
]
---
class: normal
### Ranges and Notes [User-friendly approach]
.size-150[
```md
| Method |        Range |         Notes |
|------- |-------------:|--------------:|
|      A | 100ms..200ms | `High outliers` |
|      B | 120ms..150ms |               |
```
]
---
class: normal
### Ranges and Notes [User-friendly approach]
.size-150[
```md
| Method |        Range |           Notes |
|------- |-------------:|----------------:|
|      A | 100ms..200ms | High outliers`^1` |
|      B | 120ms..150ms |                 |

`^1 High outliers: 327ms, 364ms, 396ms`
```
]
---
template: template-section
## Part 3
## Performance Comparing Reports
---
layout: true
template: template-page
.footer-note[(3) Performance Comparing Reports]
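---
class: normal
### Ranges and Notes: a sketch [User-friendly approach]

The "Range" and "Notes" columns from Part 2 could be produced roughly as follows. This is a minimal sketch, not the method used in the talk: the `RangeReport` class, its `Describe` method, and the choice of Tukey's fences (Q3 + 1.5 × IQR) to flag high outliers are all assumptions made for illustration.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeReport
{
    // Sample quantile with linear interpolation (one of several common definitions).
    // `sorted` must be ascending and non-empty.
    private static double Quantile(IReadOnlyList<double> sorted, double p)
    {
        double pos = (sorted.Count - 1) * p;
        int lo = (int)Math.Floor(pos), hi = (int)Math.Ceiling(pos);
        return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
    }

    // Builds a "Range | Notes" line from durations given in milliseconds.
    public static string Describe(IEnumerable<double> durationsMs)
    {
        var sorted = durationsMs.OrderBy(x => x).ToList();
        if (sorted.Count == 0)
            throw new ArgumentException("At least one measurement is required.");

        double q1 = Quantile(sorted, 0.25), q3 = Quantile(sorted, 0.75);
        double upperFence = q3 + 1.5 * (q3 - q1); // Tukey's rule (an assumption)

        var main = sorted.Where(x => x <= upperFence).ToList();
        var high = sorted.Where(x => x > upperFence).ToList();

        string range = $"{main.First():0}ms..{main.Last():0}ms";
        if (high.Count == 0)
            return range;
        return range + " | High outliers: " + string.Join(", ", high.Select(x => $"{x:0}ms"));
    }
}
```

For the measurements 100, 110, ..., 200 ms plus 327, 364, and 396 ms, `Describe` returns `100ms..200ms | High outliers: 327ms, 364ms, 396ms`.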
---
class: normal
### Distribution comparing: Case 1
.up[]
.center[data:image/s3,"s3://crabby-images/63c76/63c76c0603d2552f56188085023bfce7be95ffc6" alt=":scale 55%"]
---
class: normal
### Overview [Statistical tests]
.up[]
.center[data:image/s3,"s3://crabby-images/e18d2/e18d2588b0010da893f78be3abf973e175367245" alt=":scale 55%"]
---
class: normal
### Confusing naming [Statistical tests]
--
* We have:
--
  * Null hypothesis (we don't have a degradation)
--
  * Alternative hypothesis (we have a degradation)
--
* The result: p-value
--
  * p-value < α: we reject the null hypothesis
--
  * p-value > α: we fail to reject the null hypothesis
---
class: normal
### Limitations [Statistical tests]
--
.up[]
```cs
// Before changes (N = 10000; Range = [1.01min..1.02min])
Iteration 0000: 1.011 min
Iteration 0001: 1.014 min
Iteration 0002: 1.021 min
Iteration 0003: 1.017 min
...
Iteration 9999: 1.012 min
```
.up1[]
```cs
// After changes (N = 1; Range = [60.127min..60.127min])
Iteration 0000: 60.127 min
```
--
```md
👨‍💼Team Lead : Do we have a performance degradation?
```
--
.up1[]
```md
👩‍💻Performance Engineer : Yes, probably we have a degradation here.
```
--
.up1[]
```cs
👻Statistical Test : throw new DivideByZeroException("N should be > 1")
```
---
class: normal
### Incorrect interpretation [Statistical tests]
--
.up[]
```cs
// Before changes (N = 10000; Range = [1.01min..1.02min])
Iteration 0000: 1.011 min
Iteration 0001: 1.014 min
Iteration 0002: 1.021 min
Iteration 0003: 1.017 min
...
Iteration 9999: 1.012 min
```
.up1[]
```cs
// After changes (N = 3; Range = [60.1min..60.3min])
Iteration 0000: 60.127 min
Iteration 0001: 60.279 min
Iteration 0002: 60.241 min
```
--
```md
👨‍💼Team Lead : Do we have a performance degradation?
```
--
.up1[]
```md
👩‍💻Performance Engineer : Yes, most likely we have a degradation here.
```
--
.up1[]
```md
💩Statistical Test : We fail to reject the null hypothesis.
```
---
class: normal
### Wrong question [Statistical tests]
.size-80[
.c[
.cl[
```cs
// Before Changes
public void Foo(object x)
{
  // Some code
}
```
]
.cr[
```cs
// After Changes
public void Foo(object x)
{
  `if (x == null)`
    `throw new NullReferenceException("x");`
  // Some code
}
```
]
]
]
--
```md
👨‍💼Team Lead : Do we have a performance degradation?
```
--
.up1[]
```md
👩‍💻Performance Engineer : Yes! Because we added new code.
```
--
.up1[]
```md
💩Statistical Test : We fail to reject the null hypothesis.
```
---
class: normal
### Right question [Statistical tests]
.size-80[
.c[
.cl[
```cs
// Before Changes
public void Foo(object x)
{
  // Some code
}
```
]
.cr[
```cs
// After Changes
public void Foo(object x)
{
  `if (x == null)`
    `throw new NullReferenceException("x");`
  // Some code
}
```
]
]
]
--
```md
👨‍💼Team Lead : How big is the degradation that we have?
```
--
.up1[]
```md
👩‍💻Performance Engineer : Very small, we can ignore it.
```
--
.up1[]
```cs
👻Statistical Test : throw new InvalidOperationException(":(")
```
---
class: normal
### Distribution comparing: Case 2
.up[]
.center[data:image/s3,"s3://crabby-images/dcf3b/dcf3b70f01550c8280da44e7a7c2531ee56ef1b3" alt=":scale 55%"]
---
class: normal
### Distribution comparing: Case 3
.up[]
.center[data:image/s3,"s3://crabby-images/f01a7/f01a7d93ce420b50cf53d3512b46c63dc38a953d" alt=":scale 55%"]
---
class: normal
### Shift function
.up[]
.center[data:image/s3,"s3://crabby-images/696b7/696b79168836637733e2b48f0dedbc7b1246af47" alt=":scale 70%"]
---
class: normal
### Shift function
.up[]
.center[data:image/s3,"s3://crabby-images/99a58/99a58d4c81fd87dffdeaf1b31c09be601c0eb43a" alt=":scale 70%"]
---
class: normal
### Ratio function
.up[]
.center[data:image/s3,"s3://crabby-images/09373/09373b5db746d4e058b1195748093e9c073d397e" alt=":scale 70%"]
---
class: normal
### Distribution comparing: Case 4
.up[]
.center[data:image/s3,"s3://crabby-images/71222/71222cddf7e2fc032923b6a40facc6673b862052" alt=":scale 55%"]
---
class: normal
### Distribution comparing: Case 5
.up[]
.center[data:image/s3,"s3://crabby-images/dfe1c/dfe1c38dcea048b27fbf986f97b181258faaf744" alt=":scale 55%"]
---
class: normal
### Ranges [User-friendly approach]
.size-150[
```md
| Method |    Ratio |
|------- |---------:|
|      A | Baseline |
|      B |  2.0-3.0 |
|      C |  0.5-0.6 |
|      D |  0.3-8.5 |
```
]
--
```md
👨‍💼Team Lead : Are you sure that we have a degradation?
```
--
.up1[]
```md
👩‍💻Performance Engineer : Yes, I'm sure, we should investigate it!
```
--
.up1[]
```md
👩‍💻Performance Engineer : No, we need more data to be sure.
```
---
class: normal
### Ranges and Notes [User-friendly approach]
.size-150[
```md
| Method |    Ratio |      Is result reliable? |
|------- |---------:|-------------------------:|
|      A | Baseline |                          |
|      B |  2.0-3.0 |            `Most likely` |
|      C |  0.5-0.6 |            `Most likely` |
|      D |  0.3-8.5 | `Not sure; need more data` |
```
]
---
template: template-section
## Part 4
## Performance Anomalies
---
layout: true
template: template-page
.footer-note[(4) Performance Anomalies]
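---
class: normal
### Ratio function: a sketch

The ratio function and the "Ratio" column from Part 3 can be approximated by dividing matching quantiles of the two samples. The sketch below is only an illustration under stated assumptions: the `RatioFunction` class and its methods are hypothetical names, plain interpolated sample quantiles are used at the deciles, and the min-max spread of the decile ratios is a rough summary, not a confidence interval.

```cs
using System;
using System.Globalization;
using System.Linq;

public static class RatioFunction
{
    // Sample quantile with linear interpolation; `sorted` must be ascending and non-empty.
    private static double Quantile(double[] sorted, double p)
    {
        double pos = (sorted.Length - 1) * p;
        int lo = (int)Math.Floor(pos), hi = (int)Math.Ceiling(pos);
        return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
    }

    // Ratio after/before at the deciles 0.1, 0.2, ..., 0.9.
    // Durations are assumed to be strictly positive.
    public static double[] Deciles(double[] before, double[] after)
    {
        var b = before.OrderBy(x => x).ToArray();
        var a = after.OrderBy(x => x).ToArray();
        return Enumerable.Range(1, 9)
            .Select(i => Quantile(a, i / 10.0) / Quantile(b, i / 10.0))
            .ToArray();
    }

    // Formats the spread of the ratios in the style of the "Ratio" column, e.g. "2.0-3.0".
    public static string Range(double[] ratios) =>
        string.Format(CultureInfo.InvariantCulture, "{0:0.0}-{1:0.0}", ratios.Min(), ratios.Max());
}
```

A wide spread such as `0.3-8.5` suggests that the two distributions differ in shape, not only in location, so a single ratio would hide part of the picture.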
---
class: normal
### Performance anomalies
--
* Changes in performance distribution
(degradation, acceleration, something else)
--
* Huge duration
--
* Huge variance
--
* Huge outliers
--
* Multimodality
--
* Clusterization
--
* ...
---
class: normal
### Changes in performance distribution
.up[]
.center[data:image/s3,"s3://crabby-images/e48f1/e48f1d9732fedceb02e958eac26c57bba2cbbbc6" alt=":scale 75%"]
---
class: normal
### ED-PELT
.center[data:image/s3,"s3://crabby-images/d7fb3/d7fb3c87bea1e086f82b997a9a78c83487ee777e" alt=":scale 90%"]
.bottom-hint-huge[https://link.springer.com/article/10.1007/s11222-016-9687-5
https://aakinshin.net/posts/edpelt/]
---
class: normal
### Huge degradation
.center[data:image/s3,"s3://crabby-images/d57e1/d57e1341d818dfefbf39e994cecd6489dc3ed40b" alt=":scale 90%"]
---
class: normal
### Many small distributions
.center[data:image/s3,"s3://crabby-images/de1a7/de1a76538696f3902199229a630a3b614b0d7697" alt=":scale 90%"]
---
class: normal
### Huge duration
.center[data:image/s3,"s3://crabby-images/c427f/c427f64330491f13095b4f28d1041552bdc92134" alt=":scale 90%"]
---
class: normal
### Huge variance
.center[data:image/s3,"s3://crabby-images/79cfa/79cfaf2fc5493e5456246ffee1065217b0c08ae3" alt=":scale 90%"]
---
class: normal
### Huge outliers
.center[data:image/s3,"s3://crabby-images/1d26a/1d26af582e4a3d7122b128c9e6894fca1b5a456c" alt=":scale 90%"]
---
class: normal
### Multimodality
.center[data:image/s3,"s3://crabby-images/8d6be/8d6be8aab353f9ba49f921feb4ee1fc4c0bcfed1" alt=":scale 90%"]
---
class: normal
### Multimodality
.center[data:image/s3,"s3://crabby-images/94271/9427165df76ef588768ab2a58d9cd665b723efb3" alt=":scale 90%"]
.bottom-hint-huge["A story about slow NuGet package browsing", https://aakinshin.net/posts/nuget-package-browsing/]
---
class: normal
### Clusterization
.center[data:image/s3,"s3://crabby-images/4f513/4f513333cfa9d84e89b61e2a90ceb10a45f9c63e" alt=":scale 90%"]
---
class: normal
### Clusterization
.center[data:image/s3,"s3://crabby-images/0bf82/0bf82ad56c77886c4a603db0fd04ec01e4c61460" alt=":scale 90%"]
---
class: normal
### Clusterization
.center[data:image/s3,"s3://crabby-images/ae6c0/ae6c0245fc6c1ce4413a28ba49b9b10f0c527bb7" alt=":scale 90%"]
---
class: normal
### False anomalies
.center[data:image/s3,"s3://crabby-images/eeeca/eeeca5b458b08421ba68687e08b66fd5c75285a0" alt=":scale 37%"]
--
* Changes in tests
--
* Changes in the test order
--
* Changes in CI agent software/hardware
--
* Any other changes
---
template: template-section
## Part 5
## Performance Alarms & Asserts
---
layout: true
template: template-page
.footer-note[(5) Performance Alarms & Asserts]
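---
class: normal
### Changes in performance distribution: a sketch

Part 4 points to ED-PELT for finding changes in a performance history. The sketch below is not ED-PELT. It is a much simpler swapped-in technique, a least-squares search for a single change point in the mean, shown only to make the idea concrete. The `ChangePoint` class, the `FindSingle` method, and the `minSegment` parameter are hypothetical names.

```cs
using System;
using System.Collections.Generic;

public static class ChangePoint
{
    // Sum of squared deviations from the mean on the half-open segment [from, to).
    private static double Cost(IReadOnlyList<double> x, int from, int to)
    {
        double mean = 0;
        for (int i = from; i < to; i++) mean += x[i];
        mean /= to - from;

        double cost = 0;
        for (int i = from; i < to; i++) cost += (x[i] - mean) * (x[i] - mean);
        return cost;
    }

    // Returns the index of the first element of the second segment,
    // or -1 when the series is too short to be split.
    public static int FindSingle(IReadOnlyList<double> x, int minSegment = 3)
    {
        int best = -1;
        double bestCost = double.PositiveInfinity;
        for (int split = minSegment; split <= x.Count - minSegment; split++)
        {
            double cost = Cost(x, 0, split) + Cost(x, split, x.Count);
            if (cost < bestCost)
            {
                bestCost = cost;
                best = split;
            }
        }
        return best;
    }
}
```

This sketch always proposes a split when the series is long enough, even if nothing has changed. A practical detector also needs a penalty term to decide whether a change exists at all, and it has to handle several change points and non-normal data.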
---
class: normal
### Alarms vs. Asserts

|                  | Alarms              | Asserts        |
|:-----------------|:--------------------|:---------------|
| Action           | Send a notification | Fail a test    |
| False-positive   | Acceptable          | Not acceptable |
| Detection moment | After changes       | Before changes |
| Amount of data   | Doesn't matter      | A lot          |

---
class: normal
### Performance data
--
* Sources
  * Unit/Integration/Functional/End-to-end tests
  * Microbenchmarks
  * Stress testing
  * GUI tests
  * Fuzz testing
  * Monitoring
  * Telemetry
--
* Metrics
  * Wall-clock time of the whole test
  * Wall-clock time of the test stages
  * CPU/Memory/Disk/Network usage
  * Performance counters (e.g., GC.CollectionCount)
  * Hardware counters (e.g., CacheMisses, BranchMispredictions)
---
class: normal
### Environment
--
.up[]
.center[My workplace:]
.center[data:image/s3,"s3://crabby-images/d0bf3/d0bf3bb267e45ea2523becec8a427aa6b5c4e1bf" alt=":scale 70%"]
---
class: normal
### Physical environment
.pull-left-33[data:image/s3,"s3://crabby-images/dcb50/dcb508ace340bc0a9b9ed88b8f058d60a149af86" alt=":scale 100%"
In a freezer]
.pull-left-33[data:image/s3,"s3://crabby-images/645ef/645efa9906edd75452602f675ef3b8ee10d93872" alt=":scale 100%"
In a blanket]
.pull-left-33[data:image/s3,"s3://crabby-images/3cec3/3cec342c59368074d2bbb381641bde94e1423947" alt=":scale 100%"
In an oven]
---
class: normal
### Dashboard-oriented approach for alarms
.up[]
.center[**Worst degradations**]
.up1[]

| Test    | Ratio      |
|--------:|-----------:|
| Test472 | 35.0..48.0 |
| Test982 | 10.0..12.0 |
| Test872 | 1.1.. 1.2  |
| Test375 | 1.0.. 1.1  |
| Test184 | 0.9.. 1.0  |
| Test592 | 0.9.. 1.0  |
| Test824 | 0.9.. 1.0  |
| Test294 | 0.9.. 1.0  |
| Test235 | 0.9.. 1.0  |
| Test948 | 0.9.. 1.0  |

---
class: normal
### Absolute Threshold [Asserts]
.up[]
```cs
[Test(Timeout = 5000)]
public void Foo()
{
  // ...
}
```
---
class: normal
### Relative Threshold [Asserts]
.up[]
```cs
[Benchmark]
public void Foo()
{
  // ...
}

[Benchmark]
public void Baseline()
{
  // ...
}

[Test]
public void VerifyFooPerf()
{
  Assert.True(Mean(Foo) / Mean(Baseline) < 2);
}
```
---
class: normal
### Adaptive Threshold [Asserts]
.up[]
```cs
[Test]
public void VerifyPerf(history: List, current: List)
{
  var result = Compare(history, current);
  Assert.True(result.Ratio.Upper < 2);
}
```
---
class: normal
### Adaptive Threshold + Optional Stopping [Asserts]
.up[]
```cs
[Test]
public void VerifyPerf(history: List)
{
  var current = new List();
  while (true)
  {
    var duration = Measure();
    current.Add(duration);
    var result = Compare(history, current);
    if (result.Ratio.Upper > 2)
    {
      if (result.Notes == "We are sure")
        Fail(); // A degradation is detected!
      if (result.Notes == "We are not sure")
        continue; // We need more data
    }
    else
      return; // No major degradations are detected
  }
}
```
---
template: template-section
## Part 6
## Conclusion
---
layout: true
template: template-page
.footer-note[(6) Conclusion]
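---
class: normal
### Adaptive Threshold + Optional Stopping: a sketch

The optional-stopping assert from Part 5 relies on `Compare` and `Measure`, which the slides leave undefined. The sketch below fills them in with placeholders so that the control flow can be run. Everything here is an assumption for illustration: the `AdaptiveAssert` class, the median-based ratio, the "sure once we have `minSamples` measurements" rule, and the `maxIterations` cap are hypothetical and much cruder than a real statistical comparison.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdaptiveAssert
{
    private static double Median(IEnumerable<double> values)
    {
        var s = values.OrderBy(x => x).ToArray();
        return s.Length % 2 == 1 ? s[s.Length / 2] : (s[s.Length / 2 - 1] + s[s.Length / 2]) / 2;
    }

    // Hypothetical stand-in for Compare(history, current): a ratio of medians plus
    // a crude "are we sure?" flag that only looks at the number of new measurements.
    public static (double Ratio, bool Sure) Compare(
        IReadOnlyList<double> history, IReadOnlyList<double> current, int minSamples = 5)
        => (Median(current) / Median(history), current.Count >= minSamples);

    // Returns true when no major degradation is detected, false otherwise.
    public static bool Passes(IReadOnlyList<double> history, Func<double> measure,
        double threshold = 2, int maxIterations = 30)
    {
        var current = new List<double>();
        for (int i = 0; i < maxIterations; i++)
        {
            current.Add(measure());
            var (ratio, sure) = Compare(history, current);
            if (ratio <= threshold) return true; // No major degradation is detected
            if (sure) return false;              // A degradation is detected
            // Not sure yet: take one more measurement
        }
        return false; // Still undecided at the cap: report a failure (a policy choice)
    }
}
```

A fast run passes after a single measurement, while a suspicious first measurement only triggers more measurements. This keeps the common case cheap and spends extra time only when the result is unclear.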
---
class: normal
### Final tips
--
* Make performance reports user-friendly
--
* Learn mathematical statistics
--
* Check all kinds of performance anomalies
--
* Use all the data that you have
--
* Automate everything that can be automated
---
class: normal
### Performance Culture
.center[data:image/s3,"s3://crabby-images/67c5e/67c5e80d7145bb9ac7993ccc39aac9ca8fab80cd" alt=":scale 50%"]
---
class: normal
### My first performance monitoring attempt
.up[]
.center[data:image/s3,"s3://crabby-images/65203/652031342ea9bfba8001ac9d7e476ed1b1137ec8" alt=":scale 100%"]
---
class: normal
### Reference literature
.up[]
.center[data:image/s3,"s3://crabby-images/4d9aa/4d9aa175e259749d8858bae39cf6c74cd785b80c" alt=":scale 38%"]
.bottom-hint-huge[https://aakinshin.net/prodotnetbenchmarking/]
---
class: normal
### Thank you for your attention!
.center[data:image/s3,"s3://crabby-images/38761/38761c5cf7b4fffcf79262c72abb2a37ebb1dfbe" alt=":scale 30%"]
.center[
Andrey Akinshin
https://aakinshin.net
https://github.com/AndreyAkinshin
https://twitter.com/andrey_akinshin
andrey.akinshin@gmail.com
]