
"How NOT to Measure Latency" by Gil Tene 

Strange Loop Conference
104K views

Time is Money. Understanding application responsiveness and latency is critical, but good characterization of bad data is useless. Gil Tene discusses some common pitfalls encountered in measuring latency and response-time behavior. He introduces how simple, open-source tools can be used to improve and gain higher confidence in both latency measurement and reporting.
Gil Tene
AZUL SYSTEMS
@giltene
Gil Tene is CTO and co-founder of Azul Systems. He has been involved with virtual machine and runtime technologies for the past 25 years. His pet focus areas include system responsiveness and latency behavior. Gil is a frequent speaker at technology conferences worldwide, and an official JavaOne Rock Star. He pioneered the Continuously Concurrent Compacting Collector (C4) that powers Azul's continuously reactive Java platforms. In past lives, he also designed and built operating systems, network switches, firewalls, and laser-based mosquito interception systems.

Published: 29 Aug 2024

Comments: 16
@pranytt3485 • 2 years ago
Key takeaways for me:
1. Most tools that capture response times report a 99th-percentile latency over some short window (every 30 seconds, say; Prometheus metrics, for example, are scraped every minute). But the real thing to look at is the max response time.
2. Gatling fixed the coordinated omission problem. Most other tools, such as JMeter, still have it, so use Gatling for your load generation and reporting.
3. I didn't fully understand coordinated omission, but I'm now informed that it is bad and needs to be watched for (see the sketch after this comment).
4. A sudden spike in a graph is an indication of possible coordinated omission; a smoothly growing graph is an indication that the data isn't bad. There may be exceptions to this rule.
5. There is no point in looking at percentile graphs if you don't have performance goals set for your service. If you are comparing two systems and your target is 20 ms, you could plot graphs and see the maximum throughput each system supports while keeping latency at 20 ms.
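On points 2-4 above: coordinated omission happens when a load generator that waits for each response before sending the next one silently skips the requests it would have sent during a stall, so the stall barely shows up in the percentiles. Gil Tene's HdrHistogram, one of the open-source tools referenced in the talk, can back-fill those missing samples. A minimal sketch in Java, with the cadence and latency numbers below as illustrative assumptions rather than figures from the talk:

```java
// A minimal sketch (not from the talk itself) of coordinated-omission-aware
// recording with HdrHistogram, Gil Tene's open-source histogram library.
import org.HdrHistogram.Histogram;

public class LatencyRecorder {
    public static void main(String[] args) {
        // Track latencies from 1 ns up to 1 minute, with 3 significant digits.
        Histogram histogram = new Histogram(60_000_000_000L, 3);

        // Intended send rate: one request every 10 ms (100 requests/second).
        long expectedIntervalNanos = 10_000_000L;

        // Simulated measurements: mostly ~1 ms, plus one 2-second stall.
        long[] measuredNanos = {900_000L, 1_100_000L, 2_000_000_000L, 950_000L};

        for (long latency : measuredNanos) {
            // Back-fills the samples a coordinated load generator silently
            // skipped: a 2 s response at a 10 ms cadence implies ~199 more
            // requests that would have waited 1.99 s, 1.98 s, ... 10 ms.
            histogram.recordValueWithExpectedInterval(latency, expectedIntervalNanos);
        }

        System.out.printf("p99: %d ns, max: %d ns%n",
                histogram.getValueAtPercentile(99.0),
                histogram.getMaxValue());
    }
}
```

Recording the same values with plain recordValue() would leave the stall as a single sample and report a much rosier p99, which is exactly the distortion the talk warns about.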
@TheSuckerOfTheWorld • 9 years ago
10 minutes in and I already see the very obvious flaw that +Gil Tene pointed out in my day-to-day monitoring. Great talk!
@whitegelfling • 8 years ago
Coordinated omission: one issue here is one that is often encountered with metrics in business, and that is that the bosses want simple, easy, and reliable numbers to look at. To the guy behind the project, it is seen as a system that irons out a rare case, without understanding the maths behind it.
@timothydsears • 8 years ago
Terrific talk about load testing and lazy thinking. The early part probably applies to anyone thinking about metrics for a complex system.
@TestAutomationTV • 1 year ago
Nice talk, I've read good things about it. Now starting to listen, looking forward to finding some good stuff about performance testing.
@WilsonMar1 • 8 years ago
[6:52] I don't have the data. A common problem we have is we plot only what is convenient. We only plot what gives us nice colorful charts. We choose the noise to display.
@Turalcar • 1 year ago
I'm more used to graphs being split by request kind. To me, the first thing that jumped out was the large difference between the 50th and 75th percentiles.
@ericj1380 • 2 years ago
@12:04 — is this because 5 page loads at 40 resources per page increases the chance of hitting above p99? If that's the case, couldn't you just adjust each graph to be on a per-resource or per-page basis, which seems like it would directly reflect the percentile?
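For the arithmetic behind that timestamp: if each of the 40 resources on a page independently has a 1% chance of exceeding the 99th percentile, a page load avoids it with probability 0.99^40 ≈ 67%, so roughly a third of page loads see a worse-than-p99 response, and a 5-page session almost certainly does. A quick sketch, with independence between resource fetches as an assumption:

```java
// Back-of-the-envelope check: if each resource fetch independently has a
// 1% chance of landing above the 99th percentile, pages and sessions hit
// it far more often than "1%" suggests.
public class PercentileExposure {
    public static void main(String[] args) {
        int resourcesPerPage = 40;
        int pagesPerSession = 5;

        // P(at least one fetch exceeds p99) = 1 - 0.99^n
        double perPage = 1 - Math.pow(0.99, resourcesPerPage);
        double perSession = 1 - Math.pow(0.99, resourcesPerPage * pagesPerSession);

        System.out.printf("Page load sees a >p99 response: %.1f%%%n", perPage * 100);    // ~33.1%
        System.out.printf("5-page session sees one:        %.1f%%%n", perSession * 100); // ~86.6%
    }
}
```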
@minimaddu • 9 years ago
Great talk! I'm curious: we get most of our production response-time stats from AWS load balancer logs. Is that an accurate measure of response time?
@tirumaraiselvan1 • 7 months ago
19:34 — shouldn't that be 100 measurements of 100 s each? 100 requests will be sent that second, and each will be stalled for 100 s.
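A sketch of the accounting being debated, with the numbers reconstructed from the scenario this comment describes (100 requests per second intended, a 100 s stall) rather than quoted from the talk. The point either way: a load generator that waits for each response records the stall as one terrible sample instead of thousands:

```java
// A coordinated, wait-for-response load generator records a 100 s stall
// as ONE ~100 s sample; honest accounting charges one sample per intended
// send, with latency shrinking as the end of the stall approaches.
public class StallAccounting {
    public static void main(String[] args) {
        double ratePerSec = 100;    // intended request rate (assumption)
        double stallSeconds = 100;  // length of the freeze (assumption)

        long intendedDuringStall = (long) (ratePerSec * stallSeconds); // 10,000
        System.out.println("Coordinated generator records: 1 sample (~100 s)");
        System.out.println("Should have recorded: " + intendedDuringStall + " samples");

        // The i-th intended request waits until the stall ends, so its
        // latency is roughly the stall time remaining at its send instant.
        double sum = 0;
        for (long i = 0; i < intendedDuringStall; i++) {
            sum += stallSeconds - i / ratePerSec;
        }
        System.out.printf("Latencies ramp from %.0f s down to ~0 s (mean ~%.1f s)%n",
                stallSeconds, sum / intendedDuringStall);
    }
}
```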
@ruimeireles1695 • 4 years ago
Can anyone write out all the tool names mentioned in the presentation? I can't find some of them, probably because I'm not spelling the names correctly.
@whitegelfling • 8 years ago
OK, I'm only a few minutes in and my brain hurts. I can't believe that people seriously ignore the max in things like this. Scary.
@MikkoRantalainen • 4 years ago
I agree. Only the maximum (worst-case latency) and the median latency are worth watching. Everything else is just noise.
@MikkoRantalainen • 4 years ago
Note that the median is not the target; the difference between the worst-case latency and the median latency is the part of the picture that could get better if you fix the bad stuff. Getting the median latency down often requires LOTS of changes to the system.
@MikkoRantalainen • 4 years ago
All well-made latency graphs should have the number of requests per second on the horizontal axis and the maximum response time on the vertical axis. The number of requests per second at which the maximum response time gets too high is the limit.
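A minimal sketch of the plot described above: sweep the offered rate, record the worst-case response time at each rate, and take the highest rate whose max stays under the target (20 ms, echoing the takeaways comment). measureMaxLatencyMs is a hypothetical stand-in for a real load-test run:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThroughputSweep {
    // Hypothetical hook: run a fixed-rate test, return max latency in ms.
    // This placeholder model just makes latency blow up near saturation.
    static double measureMaxLatencyMs(int requestsPerSecond) {
        return 5 + 2000.0 / Math.max(1, 2000 - requestsPerSecond);
    }

    public static void main(String[] args) {
        double targetMaxMs = 20.0;
        int sustainableRate = 0;
        Map<Integer, Double> curve = new LinkedHashMap<>();

        for (int rate = 100; rate <= 2000; rate += 100) {
            double maxMs = measureMaxLatencyMs(rate);
            curve.put(rate, maxMs);                           // x: req/s, y: max ms
            if (maxMs <= targetMaxMs) sustainableRate = rate; // assumes a monotone curve
        }

        curve.forEach((r, m) -> System.out.printf("%5d req/s -> max %.1f ms%n", r, m));
        System.out.println("Sustainable throughput at <= 20 ms max: " + sustainableRate + " req/s");
    }
}
```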
@GeorgeTsiros • 1 year ago
That is why "how to measure", by itself, is an entire class in physics courses (at least).