Everything Wrong with Ump Scorecards

Baseball's Not Dead

55K subscribers

96K views

Twitter - / dead_baseball

Gaming

Published: Sep 11, 2023

Comments: 262

@tiger4thewin 9 months ago

I always skipped overall consistency since the EUZ never looked completely right to me. Thank you for being able to articulate what seemed missing and suggesting some interesting changes!

@joshknapp7455 9 months ago

I think having a 'total distance missed' number is a good idea but bad math. I think an average distance missed would show a better picture, because the total distance for one game versus another could vary wildly with the number of pitches thrown.

@nmappraiser9926 8 months ago

Except that total distance increases with the number of missed calls. Average could be skewed by one particularly bad call. I think for a full picture you'd want both numbers included.
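A minimal Python sketch of reporting both numbers side by side. `miss_summary` is a hypothetical helper that takes the distance, in inches, of each missed call; the two games are made-up illustrations, not real scorecard data.

```python
from statistics import mean


def miss_summary(miss_distances):
    """Return (total, average) inches missed for one game's missed calls."""
    total = sum(miss_distances)
    average = mean(miss_distances) if miss_distances else 0.0
    return total, average


# Hypothetical games, not real scorecard data.
games = {
    "twelve narrow misses": [0.5] * 12,
    "two egregious misses": [3.0, 2.5],
}

for label, misses in games.items():
    total, average = miss_summary(misses)
    print(f"{label}: total {total:.1f} in, average {average:.2f} in per miss")
```

Total distance alone ranks the first umpire as worse (6.0 vs 5.5 inches), while the average ranks the second as worse (2.75 vs 0.50 inches per miss), which is why showing both gives the fuller picture.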

@rick809 8 months ago

@@nmappraiser9926 yeah that's the issue. Total distance increases with the number of missed calls, so an ump who misses on a lot of really close calls might have a higher total distance missed than an ump who does a bad job with an easier game, even though the former probably had the better game overall. Normalizing it based on the average makes it a better metric to compare two umps directly without having to look at and understand the rest of the information on the card. Also, total distance missed could increase drastically because of 1 really bad call, so I'm not sure what your point is there. Regardless, to get rid of that issue, you could always just remove outliers. The problem I have with average distance missed is that it's just a number, and the viewer has to look at a lot of scorecards to get a feel for what the typical range is for that value and what "good" and "bad" look like. Everyone understands percentages, and having the average, the expected average, and the deviation from the expected average there makes it very clear for someone looking at an ump scorecard for the first time whether this ump did a good or bad job.

@user-se4rr3rs3m 8 months ago

Maybe some kind of weighted average depending on how close the missed call was. A 0.1-inch missed call would have a low weight compared to a 2-inch missed call. I think the main idea is to evaluate the ump's performance based on how close their missed calls were.

@SalvoSLR 8 months ago

@@nmappraiser9926 agreed. Average would be missing the intent of showing a distance missed

@letsmakeit110 8 months ago

@@user-se4rr3rs3m now that sounds like relative accuracy.

@hardyworld 9 months ago

Great video. I agree with all your suggestions except replacing relative accuracy. I agree with you that it is an important data point and it needs to remain as part of the scorecard. If you want to ADD the total distance missed, that's fine, but I'm not convinced it's a meaningful addition. Perhaps average distance missed on all missed balls and average distance missed on all missed strikes could be better than the total distance missed?

@finnb2318 9 months ago

I think the wording of the poll was unfortunate ("Which number do you look at first?"). Even I might have answered Total Accuracy, despite considering Relative Accuracy more important. If he had instead asked a number of people to rank them from most to least important or indicative of an umpire's performance (say, four points for first, one for fourth), he might have gotten a very different result. It's also a relatively new addition. Even with the internet moving as fast as it does, and him speaking to a digital audience, I would wait a while longer before, in effect, claiming that the measurement has failed because it isn't understood.

@mwal223 8 months ago

@@finnb2318 I absolutely agree with you there. Total accuracy is usually the first metric I look at (because the scorecard draws attention to it), but I consider expected/relative accuracy at least as important, if not more so.

@junct 8 months ago

In statistics, mean squared error is a common indicator of how close something is to the model. So rather than average distance missed, you square each distance before averaging it. The idea is that you punish big misses more heavily and let small misses off more lightly, so mean squared error would probably be a more meaningful stat to look at.
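A minimal Python sketch of the squared-error idea, assuming each missed call is recorded as a distance in inches. The two miss lists are hypothetical: both total 4 inches over 4 misses, so neither the total nor the plain average can tell them apart, but the mean squared error (and its square root, RMS, which is back in inches) penalizes the single 3-inch miss. The optional `called_pitches` argument is an assumption that lets correct calls count as zero-inch misses.

```python
import math


def mean_squared_error(miss_distances, called_pitches=None):
    """Average of squared miss distances, in square inches.

    By default the average is over missed calls only. If called_pitches is
    given, every correct call counts as a miss of 0 inches, so the average is
    taken over all called pitches instead.
    """
    n = len(miss_distances) if called_pitches is None else called_pitches
    if n == 0:
        return 0.0
    return sum(d * d for d in miss_distances) / n


def root_mean_squared_error(miss_distances, called_pitches=None):
    """Square root of the MSE, which puts the number back in inches."""
    return math.sqrt(mean_squared_error(miss_distances, called_pitches))


even_misses = [1.0, 1.0, 1.0, 1.0]        # hypothetical: 4 in total, 1 in average
lopsided_misses = [0.25, 0.25, 0.5, 3.0]  # hypothetical: also 4 in total, 1 in average

for label, misses in [("even", even_misses), ("lopsided", lopsided_misses)]:
    print(label, mean_squared_error(misses), round(root_mean_squared_error(misses), 2))
```

The even umpire scores an MSE of 1.0 and the lopsided one about 2.34, so the squared version separates two games that the total and the average rate identically.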

@dominicpancella3012 9 months ago

This is a great video! Umpire Scorecards is still a project in its relative infancy, so it's always good to get this kind of constructive feedback from people who regularly use the site. As a data person myself, I've got a couple of comments on the way the scorecards are presented. All your concerns are very well founded and well explained. 1) A kernel density map is definitely not the right tool for the job as far as the EUZ is concerned, for all the reasons you mentioned. What would be better is your distance idea, where the weight of how a bad call affects the shape of the zone is determined by the square of the distance from the edge of the zone. But that doesn't tell the whole story either: the reason KD is used is to potentially identify common areas of misses, which, as you say, is more useful for much larger samples that would let you clearly see, for example, that a particular ump tends to call balls just off the outside corner strikes more often than the average ump (or the True Zone) does. For single games that usually isn't feasible, though, and I don't have a terribly good answer as to the best solution for this problem other than "the algorithm is a work in progress." 2) I like your Total Distance Missed idea, and I think it could benefit from the inclusion of some other metric like average distance per miss or average distance missed per call. A more elegant solution could even use what sabermetricians have grown accustomed to, the + stat where 100 is average. I also agree with your sentiment that this is fundamentally the most important part of the scorecard: how does this umpire compare to other human umpires calling the same game? How does he compare to an electronic umpire? They ought to keep the "correct calls above expected" line in there, though; that's insight with a tremendous punch. 3) I have an idea brewing as far as favor is concerned, and it wouldn't even be that difficult a result to achieve.
Replace the net runs for with a slightly more complex calculation where you take all the missed calls that benefited each team, multiply the count by the median in each case, and subtract one expected run value from the other. This would strongly decrease the relative strength of any outliers in either direction. You could also step it up a notch and, instead of using the median, weight each call based on the distance missed such that more egregious misses are counted as such and close calls don't have as much of an impact. But in order not to skew the favor metric too much by introducing different units, it might be better to simply apply those weights to the impactful calls list, for which run expectancy changes are not shown anyway (but perhaps should be, in a simple way as above, like "+0.19 runs for TEX" or something). 4) How do we measure consistency? US measures it based on how many correct and incorrect calls fall within their estimation of what the umpire's zone is. And in theory this makes sense, but in practice it turns out kind of weird and janky. Another way to look at consistency would be to look at other games this umpire has called and determine how many more makes/misses he would have than usual and/or how much farther/closer his misses are on average to his typical missed calls. The primary reason to do this is that players and coaches would be looking primarily at an ump's overall tendencies rather than his in-game tendencies unless it was clear from the get-go his zone was wider than usual or that he was calling one corner differently from another or something. 5) Where's the distillation of all these numbers into an overall umpire rating? EA has been doing this kind of thing for decades, and it would ostensibly be pretty simple to add an overall grade to the top of a game scorecard to let you know how this umpire's performance actually was without you having to dig into or combine any of the other numbers yourself. 
Even make it a letter grade so it's simpler, e.g. for XYZ game Livensparger earns a B+ and for ABC other game Wendelstedt earns a C-. Or whatever. All of the other information is useful, but people with short attention spans want either a single number or scalable identifier that they can use to tell everyone their least favorite umpire is horrible and should be fired and incarcerated.
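Point 3's median idea, as a minimal Python sketch. It assumes each missed call already has a run-expectancy swing attached (a positive number credited to the team it helped); `robust_total`, `median_favor`, and the sample values are hypothetical, not Umpire Scorecards data or code.

```python
from statistics import median


def robust_total(run_swings):
    """Count of calls times their median swing, per the comment's suggestion."""
    return len(run_swings) * median(run_swings) if run_swings else 0.0


def median_favor(helped_a, helped_b):
    """Net favor toward team A; negative means team B came out ahead."""
    return robust_total(helped_a) - robust_total(helped_b)


def plain_favor(helped_a, helped_b):
    """Plain net runs, for comparison."""
    return sum(helped_a) - sum(helped_b)


# Hypothetical game: team A got one huge call, team B got three modest ones.
helped_a = [0.05, 0.08, 1.70]
helped_b = [0.10, 0.12, 0.15]

print(f"plain net runs: {plain_favor(helped_a, helped_b):+.2f}")
print(f"median-based:   {median_favor(helped_a, helped_b):+.2f}")
```

The single 1.70-run call dominates the plain sum (about +1.46 for A) but barely moves the median version (about -0.12), which is the outlier damping the comment is after. The trade-off is that a genuinely decisive blown call gets damped too.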

@dash4800 8 months ago

But you would think that the brilliant statisticians coming up with this would immediately see how flawed their system is. It really looks like a system that they put through no real-world testing before trotting it out there. Like many baseball stats, they came up with a formula that gave them a number and never stopped to think whether that number was helpful or actually reflective of what's happening.

@panner11 8 months ago

@@dash4800 Things like this are always a work in progress. Models that simulate human judgements are inherently flawed and are built up over time. In this case, I don't think the people who wrote the paper are the same people who run Ump Scorecard, so you shouldn't assume the people who came up with it didn't know the flaws. They are well aware, of course. It's possible Ump Scorecard just went with it; there's no reason to believe the creators had much to do with it being used in production.

@MrTheboffin 8 months ago

@@dash4800 I suspect that they probably did but couldn't find a better one. After all, the one proposed in the video has flaws as well, since it gives no weight to correct calls, which, based on the rest of the card, represent the vast majority of the data.

@laartwork 8 months ago

It's a project that will end soon. Second year of my AAA team here using the robo ump and no one notices. Just imagine 100% accuracy every game.

@jamesknapp64 16 days ago

As a mathematician this is a great comment.

@kyokyo718 8 months ago

Even if Ump scorecards never changes, you demonstrating how to read the data presented in its current form is extremely valuable information. Kudos to you for managing both sides of analysis.

@Uncle_Benny 9 months ago

I love your idea for total inches missed! I'd recommend also adding some sort of average, or "inches per pitch". Because again, if an ump has a game with 30 close calls and misses by 1 inch on just 10 of those, he'd still have a total of 10 inches missed, whereas if an ump didn't have many close calls but missed 2 or 3 by 3 inches, he's only at 6" or 9" total, even though he actually had a worse game.

@brp5121 8 months ago

This is one of the most helpful videos I've ever seen. Not only did you make great points, I also learned a ton about Umpire Scorecards.

@panner11 9 months ago

I'm very impressed with this video. EUZ sparks a lot of confusion and I was skeptical whether you'd present the algorithm correctly, but you nailed it. The message being that it's not that the EUZ is bad, it just isn't well suited to small sample sizes like a single game. Over the course of a career, you can get a good sense of an ump's tendencies and whether they are consistent or not. But when there's a lack of data in certain areas of the zone, the zone becomes a crapshoot. Thanks for presenting the information accurately and not hamming up the issues. As for a better model, your progressive deformation model is pretty good. They should probably just aggregate a bunch of models, including these, like most modern models do. But people also like transparency, especially in sports, so I understand them wanting to stick with it.

@rowlofobro2 9 months ago

Overall favor is one of the most misleading stats, since it's based only on expected runs and misses a lot of context. I remember a Mariners game earlier this year where there were two on and two out with Julio hitting in a 2/1 count. The ump missed a call, calling a ball a strike and making it 2/2. The run value went to the other team because a 3/1 count is much better for a hitter than a 2/2. On the next pitch the ump missed again, but this time he called a strike a ball, and the expected run value exploded in Seattle's favor because it thinks that should have been strike 3 and the inning over, so all of a sudden Seattle gets something like a full run in value. Had the pitches played out in the opposite order, with a strike called a ball to make it 3/1 and then a ball called a strike, the favor would have shifted heavily toward the other team for avoiding a walk, when in reality the end result is more or less the same and the ump messed up badly for both teams.

@IsYouAWizard 9 months ago

The Total Distance Missed is cool, definitely think if it were to be put into practice it would need to be divided by the number of pitches that the ump had to make a call on (no swing).

@1868JG 9 months ago

Or a per 100 pitches version.

@Liwet. 8 months ago

For Total Distance Missed, you should instead square each individual distance, find the average of all these values, and then take the square root of that average. You aren't going to have umps that exclusively have 'good' misses and umps that exclusively have 'bad' misses; you'll be comparing umps that have a mixture of the two. Squaring will make the bad misses harder to overcome in the average. Otherwise one bad miss will look the same as 3 good misses.

@karolrafalski3419 8 months ago

Was just about to comment that root mean square of distances would probably be a better metric. That, combined with the total number (and type) of missed calls, would paint a decent picture of consistency with the true zone.

@Mason.Becker 9 months ago

Amazing video. The only thing I sometimes wish was on the scorecards is 2 separate zones, one for right-handed hitters and one for left-handed. Just to see the differences, if any, for things like a pitch inside to a righty being called a strike while the same pitch outside to a lefty is called a ball.

@harrisonkarp7406 9 months ago

One of the smartest YouTube videos ever, fantastic job

@jameskingsbery3644 8 months ago

Some math thoughts: 1. When dealing with rare events, such as calls high in the strike zone in the KDE used for EUZ, one approach is to use a Bayesian method. The simplified-for-YouTube-comments version is: they should add some "fake" pitches that are correctly classified (or classified as the "average" ump would call them) just for the purposes of the KDE. Where there are more pitches in a part of the zone, the actual pitches overwhelm the fake data. If there aren't many pitches in a part of the zone, the KDE can anchor on the added data for what the ump probably would have done. 2. The total distance missed is an interesting idea. I don't know how it would be made simple to understand, but summing the squares of the errors (that is, the distance missed on each pitch times itself, all summed together) would highlight bad calls more. If you have 8 pitches that miss by a quarter of an inch and a pitch that misses by two inches, the total distance missed would be the same (2 inches), but the one bad call seems worse than a bunch of close ones. By summing the squares, the two umpires would have 0.5 (for the many small misses) vs. 4 (for the one big miss). In any case, great video!
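Point 1 could be sketched in Python roughly like this. It is Gaussian kernel smoothing of strike calls (a distance-weighted strike rate), not necessarily the exact KDE Umpire Scorecards uses. The "fake" pitches are labeled by a fixed rulebook-style box whose size (0.83 ft either side of center, 1.5 to 3.5 ft high) is an assumption, as are the 0.15 ft bandwidth and the 0.2 weight per fake pitch; all names are hypothetical and coordinates are in feet.

```python
import math


def rulebook_prior(half_width=0.83, bottom=1.5, top=3.5):
    """Hypothetical 'fake' pitches on a 3-inch (0.25 ft) grid, labeled by a fixed box."""
    prior = []
    for i in range(-6, 7):          # x from -1.5 to 1.5 ft
        for j in range(13):         # z from 1.0 to 4.0 ft
            x, z = i * 0.25, 1.0 + j * 0.25
            prior.append((x, z, abs(x) <= half_width and bottom <= z <= top))
    return prior


def strike_probability(x, z, pitches, prior, bandwidth=0.15, prior_weight=0.2):
    """Kernel-weighted share of strike calls near (x, z).

    pitches: (x, z, called_strike) tuples from the game, weight 1 each.
    prior:   (x, z, is_strike) fake pitches, weight prior_weight each, so real
             pitches dominate wherever the game actually has data.
    """
    weighted = [(p, 1.0) for p in pitches] + [(p, prior_weight) for p in prior]
    num = den = 0.0
    for (px, pz, strike), weight in weighted:
        dist_sq = (x - px) ** 2 + (z - pz) ** 2
        k = weight * math.exp(-dist_sq / (2 * bandwidth ** 2))
        num += k * strike
        den += k
    return num / den if den > 0 else 0.5


prior = rulebook_prior()
# Hypothetical game: 20 pitches at the heart of the zone, all called balls.
game = [(0.0, 2.5, False)] * 20
print(strike_probability(0.0, 2.5, game, prior))  # the game's own calls dominate here
print(strike_probability(0.0, 4.5, [], prior))    # no game data up high, so the prior decides
```

Lowering `prior_weight` lets sparse real data override the prior sooner; raising it keeps thin regions closer to the assumed zone.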

@mrmikejsteele 8 months ago

I’m not sure I’ve ever learned more about something I’m familiar with in such a short time. This was clear, fair, and interesting. I’ll never read an Ump Scorecard the same way again. Thank you!

@hardhatlunchpal 9 months ago

Wouldn't overall distance be a better stat if you made it per missed call? Like, you take the overall distance divided by the number of pitches used.

@fellow456 9 months ago

Yea, if you just have total distance missed, then an ump could get shafted just by virtue of having to call more pitches over the course of a game.

@smoceany9478 9 months ago

nah I think per pitch called is better. Per missed pitch would make an ump who misses 1 pitch by 3 inches, and that's it, look worse than an ump who misses 10 pitches by an average of 2.9 inches per miss

@anthonyregier9649 9 months ago

@@smoceany9478 you could have every correct strike or ball count as 0. It would be a low number, but you could multiply it by 100 inches or something for a grade ranking.

@smoceany9478 9 months ago

@@anthonyregier9649 yeah, that's what I'm saying: add up the inches missed on every incorrect pitch and divide by every pitch called
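The per-called-pitch version from this exchange, as a minimal Python sketch. `inches_per_called_pitch` is a hypothetical helper, and 140 called pitches per game is an assumed round number, not a real figure; the `per` argument gives the "per 100 pitches" variant suggested earlier in the thread.

```python
def inches_per_called_pitch(miss_distances, called_pitches, per=1):
    """Inches missed per called pitch; correct calls contribute 0 inches.

    per=100 gives inches missed per 100 called pitches.
    """
    if called_pitches <= 0:
        raise ValueError("called_pitches must be positive")
    return per * sum(miss_distances) / called_pitches


# Hypothetical umps, both assumed to call 140 pitches.
one_miss = [3.0]          # one pitch missed by 3 inches
ten_misses = [2.9] * 10   # ten pitches missed by 2.9 inches each

for label, misses in [("one miss", one_miss), ("ten misses", ten_misses)]:
    per_miss = sum(misses) / len(misses)
    per_100 = inches_per_called_pitch(misses, 140, per=100)
    print(f"{label}: {per_miss:.2f} in per miss, {per_100:.1f} in per 100 called pitches")
```

Per miss, the one-miss ump looks slightly worse (3.00 vs 2.90 inches); per 100 called pitches, the ten-miss ump is far worse (about 20.7 vs 2.1 inches).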

@donelec5955 9 months ago

Finally someone was able to explain the ump consistency to me

@jeffroitero4266 9 months ago

Dude, you're amazing. I've wanted to do a deep dive into this ump zone thing... but I haven't had the time, which is a slightly artful way of saying that I didn't actually want to do it that badly. And I'm rewarded for my laziness by the fact that you did the work for me. Only so much better than I would have. You rock. And you know all this, but comments are good for youtube, so... comment comment comment.

@8stormy5 8 months ago

Two criticisms. First, EUZ is a tool to determine whether misses are arbitrary or follow a pattern. It's obviously not going to work when there are no misses. Second, overcomplicating the model of fit to get "everything right" risks overfitting the model to the data. The model would simply collapse into a simple and literal description of reality, which means it can't meaningfully predict at all whether a missed call was missed arbitrarily or due to bias.

@panner11 8 months ago

His criticism of EUZ was pretty spot on though, it's extremely volatile when there is a lack of data. In a single game sample size that happens often. Just on inspection we can see how wonky the EUZs are for single games. You are right about the progressive deformation model he suggests, it is prone to overfitting and assumes the zone is just the real zone. But aggregate it with the other models and normalize it and I'm sure it would be fine.

@raschticky 9 months ago

I have never agreed more with a quote than I have with “Joe West is the Yuniesky Betancourt of umpires”

@kadensadich1311 9 months ago

Incredible video, well put together. Always get happy when I see another video put out by you. Your editing is so good and you present everything so well and use so many great sources and back up everything. You don't just give one side of the argument, you give both. So yeah, thanks for this video and keep up the great work :D

@jeffroitero4266 9 months ago

Angel Hernandez will be the first ump with a total distance missed that exceeds his own height.

@SvanMagic 8 months ago

CB Bucknor may beat that total.

@62Trevor2199 9 months ago

Like the refreshed intro! Great vid!

@paulframe85 8 months ago

I love the intro sequence on your videos. It's just so much fun!

@Busanjingu.popularrapper 8 months ago

Wow. This is one of the most impressive videos ever. Mentioning issues and coming up with clear solutions is such a rare thing on YouTube.

@taylorb5039 9 months ago

Particularly good video topic, and a very well executed video. I hope it's not condescending to say I'm seeing your growth as a creator! It will pay off. Thanks for making vids

@IKER0718 9 months ago

damn, the intro alone deserves a lot more subscribers!!! love that a lot, now I'm going to enjoy the video!

@andrewlauer4030 8 months ago

You don't suggest a fix for the Favor category, but I think it's actually fairly easy. Since they already have a metric for expected accuracy on certain calls, they could apply that as a weighting factor to the win probability added. That way a call that is essentially a toss-up can't swing the favor too much. Something like WPA*[2*(Expected Accuracy - 0.5)]. That way, if the expected accuracy on a call is only 55%, then only 10% of the win probability added by the call would count towards the umpire's bias for the game. It wouldn't be a perfect fix, but it would go towards addressing the problem you are talking about.
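The proposed weighting, as a minimal Python sketch. It assumes each missed call comes with a signed win probability added (positive helps the home team, an arbitrary convention chosen here) and the expected accuracy for that pitch. Clamping the factor at 0 for expected accuracies below 50% is an extra assumption not in the comment, added so a worse-than-coin-flip call can't flip the sign.

```python
def weighted_favor(missed_calls):
    """Sum of WPA * 2 * (expected_accuracy - 0.5) over missed calls.

    missed_calls: (wpa, expected_accuracy) pairs; wpa > 0 helps the home team
    (hypothetical convention), expected_accuracy is a fraction in [0, 1].
    """
    total = 0.0
    for wpa, expected_accuracy in missed_calls:
        factor = max(0.0, 2 * (expected_accuracy - 0.5))  # clamp: our assumption
        total += wpa * factor
    return total


# Hypothetical calls: a near coin-flip that swung a lot, and an obvious miss.
calls = [(0.20, 0.55), (-0.05, 0.98)]
print(f"{weighted_favor(calls):+.3f}")
```

The 20-point swing on the 55% call contributes only 0.02, so the smaller but obvious miss in the other direction (0.05 × 0.96 = 0.048) outweighs it, for a net of about -0.028.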

@inline885 8 months ago

Wow this video totally changed the way I look at this. Great work!

@EDF1919 9 months ago

Always a good day when BND uploads.

@cosmoid 8 months ago

I never really understood the EUZ in the first place, but I love getting to see the accuracy of ball/strike calls.

@a-a-ron3542 8 months ago

One thing I keep thinking about is that pitchers are consistently throwing harder than ever with greater movement than ever. It's literally harder to call balls and strikes than ever. Furthermore, part of that velo consistency is the increased use of bullpens, which means umps are going to see a greater range of pitchers with more diverse release points and styles, so it's harder to get into a groove with a single pitcher. The Phil Cuzzis and the Angel Hernandezes are always going to be terrible, but I don't know that umps are worse; I genuinely think it's a harder job than it used to be. Edit: we also didn't have the luxury of the little box 10 years ago. We would just say things like, "Oh, it was two strikes, he should have been protecting."

@aaronlee9784 9 months ago

Glad to see an analysis/scripted video! You're easily one of my favorite if not my #1 baseball creator

@sirgermaine 8 months ago

If we already have inches off, the simplest piece of context you could put on impactful missed calls is to put (amount of impact) and (distance from correct) along with the context for the call. There is already room, you just throw it right below as +1.25 SEA / .34 inches out or +1.2 NYY / 1.8 inches in

@Sean-uh6te 9 months ago

The strike zone box on TV is not the strike zone. It rarely gets the top of the zone right. It's just a TV graphic for us at home, like the first-down yellow line you see in football. It's not perfect. This is why MLB has asked teams to stop looking at the replay in the dugout on their iPads. It needlessly pisses them off, thinking it's the true strike zone. If the broadcasters reminded people of this on air, like the football broadcasters do, it would clear up a lot of confusion and negativity.

@1uckedout 8 months ago

The problem of viewers thinking the box is perfectly accurate is so bad even a baseball youtuber like this believes it's always right. He even called it the "true zone" in this video lol

@CharlesFreck 8 months ago

I hate that people think it's real. It's usually about a full ball short at the top, meaning high strikes look a mile out of the zone when they're easily completely within the actual zone. People also think the zone is based on where the batter is when they've swung/the ball passes the plate, but it's not. It's based on their stance when in the box and ready, i.e. the top of the zone is much, much higher than on TV. People need to remember the zone changes FOR EACH BATTER. If you're tall, you're going to have a bigger strike zone than a tiny guy, and the tiny guy's zone will be lower than the tall guy's. The zone is not fixed. There's no true zone. It's situational for each individual who steps up to the plate.

@Sean-uh6te 8 months ago

@@CharlesFreck you said it better than me. I’m not a conspiracy guy, but it feels like one to swing favor towards a robo ump. Probably not, but it feels like things are going in that direction, and it's made easier when the audience is shown a false strike zone.

@terri2rial 9 months ago

dont know much about baseball, i’m an *extremely* casual fan, but this video was so eloquent and well researched it makes me wanna become a baseball nerd. well done!!

@jcorn12 9 months ago

I'm struggling to see how total distance missed is any more intuitive than relative accuracy. Otherwise great video

@JGPRSNJ 9 months ago

Relative accuracy is much more straight to the point. It would also need to be a distance missed per missed call or something because it would be very difficult to compare games on just a flat distance number

@stevedomique9278 9 months ago

A thousand percent, agree with every other point in the video. Relative accuracy seems like a great stat, it's just misunderstood and underemphasized in the umpire scorecard.

@NickyQuesne13 9 months ago

Great vid. Gotta say, the dig at Cinemasins was the cherry on top...

@panner11 9 months ago

I never understood that channel. Only saw a few videos, but never saw a single sin that was legitimate criticism like a review would do. Seemed like just 100 random observations about the movie labeled as sins. I assume it's satire but it's not even funny so idk.

@coreygroh654 8 months ago

Love the addition of JoRam's knockout in the intro

@LeaminOwnsAll 8 months ago

Great breakdown on the umpire scorecard! Learned a lot watching this!

@JDawg12329 9 months ago

What if consistency took the average number of differing calls in an area? You could split the zone into four quadrants, then have a 3-inch box in each corner just outside the zone, and finally 4 more boxes well outside in the corners, so there are 12 zones in which they can measure whether the calls were all the same. If you have to make 8 calls in the bottom right quadrant of the zone and you call 7 strikes and 1 ball, you would have a consistency rate of 0.88 for that particular section. Then you average out the zones to get an overall consistency.

@hb-robo 9 months ago

This would be amazing, sort of like an accuracy heatmap. I would probably suggest the more common 3x3 grid inside the zone, which would lead to 16 cells outside the zone (3 per side + 4 corners) for a total of 25. Just to get more granular, since we already know the precise XY coordinates of every pitch
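One way to sketch the grid version in Python, assuming pitch locations in feet and a fixed zone (0.83 ft either side of center, 1.5 to 3.5 ft high; both assumed, since the real zone varies by batter). The zone is split into a 3x3 grid and every pitch outside it is clamped into a surrounding ring, giving the 25 cells (9 inside, 16 outside) from the reply. Each cell scores the share of its calls that went the majority way, so 7 strikes and 1 ball gives 7/8 = 0.875; the overall figure is a plain average across occupied cells, which is itself an assumption (weighting by pitch count is the obvious alternative). `band` and `grid_consistency` are hypothetical names.

```python
from collections import defaultdict


def band(value, low, high):
    """0 = below the zone, 1-3 = thirds of the zone, 4 = above the zone."""
    if value < low:
        return 0
    if value > high:
        return 4
    third = (high - low) / 3
    return 1 + min(int((value - low) / third), 2)


def grid_consistency(pitches, half_width=0.83, bottom=1.5, top=3.5):
    """Average per-cell majority share over a 5x5 grid around the zone.

    pitches: (x, z, called_strike) tuples, locations in feet.
    """
    cells = defaultdict(lambda: [0, 0])  # (column, row) -> [strikes, balls]
    for x, z, called_strike in pitches:
        cell = (band(x, -half_width, half_width), band(z, bottom, top))
        cells[cell][0 if called_strike else 1] += 1
    if not cells:
        raise ValueError("no called pitches")
    shares = [max(strikes, balls) / (strikes + balls) for strikes, balls in cells.values()]
    return sum(shares) / len(shares)


# Hypothetical game: one low-zone cell with 7 strikes and 1 ball,
# plus four balls well up and away in a corner cell.
game = [(0.5, 1.8, True)] * 7 + [(0.5, 1.8, False)] + [(1.5, 4.0, False)] * 4
print(grid_consistency(game))  # (0.875 + 1.0) / 2 = 0.9375
```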

@komiteunofficialaccount9224 9 months ago

Great explanation of KDE, and how it went wrong. I miss *some* math classes.

@nickbuchholz6841 8 months ago

This was really good, great presentation and points. 10/10

@tonychen3628 8 months ago

😂Intro now includes JRam knocking out TA. That should be worth 1 million likes.

@itsokthen 8 months ago

A change to help show the favor better is to have the impactful calls section show how much each individual call was worth. If an ump has a +2 favor but I see one call was +1.7, I would be able to understand it better.

@charlie-wf3bn 8 months ago

it's so interesting, because the formula they use for overall consistency makes sense and is good statistics, even though it doesn't map onto the tendencies of baseball players well. Such a niche bit of statistics for this game.

@panner11 8 months ago

It's just too volatile for a small sample size like a single game. It maps the tendencies of baseball players well with larger samples.

@CYMotorsport 9 months ago

Maybe a pipe dream, but can you explain to me why baseball has yet to test sensors and accelerometers? The NFL uses them for the pylon. Formula 1 uses them, but our cars go twice as fast as baseballs. It’s a static plate; I do not understand why you wouldn’t rig up some type of real-time, hyper-accurate true zone monitoring. They already do it with cameras, and the sensors are reliable and hidden. They are exponentially more accurate than an ump, who can still manage the game with their workload on batted-ball plays. They can also be there as a backup in case of tech failure early on while they implement it.

@saccharide 8 months ago

There's something called RMS (root mean square) that can be considered as well. It works like an average in which each error is squared first, so something way off counts for more. Mathematically, you take the average of the squared errors and then take the square root of that average to bring it back to the right units.

@Jay-gb9pi 8 months ago

I always wondered if there was a way to change the bottom and top of the true zone depending on the height of the batter... Because the true zone doesn't move up or down, I tend to look at the inside/outside calls first and with more scrutiny than missed calls at the top/bottom of the zone, since you don't know if the batter was 5'6" or 6'6"... and knowing how different batters are called would be really interesting... getting to see the different zone styles Altuve and Judge are working with would be cool.

@simonthegreat527 9 months ago

Is it just me, or have the batters purposefully made it harder and harder on these umpires as they have (for the most part) gotten better and better? When I was young, a hitter would never, ever, never, ever take a pitch right on the corner with two strikes while expecting the ump to call a ball. It was called protecting with 2 strikes. Today, hitters rarely seem to be able to protect with 2 strikes and instead use eagle eyes to either milk a walk or hit a mistake. In my short time playing baseball, basketball, football, or any form of competition the coaches always said "Do not let the referees or umpires decide the game." You swing the bat if you have two strikes and the pitch is potentially a strike, don't stand there and get mad that a borderline call goes one way or the other.

@1uckedout 8 months ago

That's a product of the three-true-outcomes way of playing. They're looking to hit it over the fence or not hit it at all. They'll take borderline pitches with 2 strikes and hope for a call to go their way, because they don't want to offer at it if they don't have the potential of extra bases. I think it's a great approach with less than two strikes, but I hate watching hitters take close pitches with two strikes.

@CharlesFreck 8 months ago

@@1uckedout Nailed it. Strikeout, walk or homer (realistically, extra-base hit). The idea is that, on average, it forces pitchers to throw more and thus make more mistakes, opening up more chances later. The Japanese play more like Simon is talking about, training to foul off anything they don't want to hit. But the problem is, Major League pitchers are the best in the world, and it's significantly harder to foul a ball off. You're risking a miss, ground ball or fly out every time you swing; you just can't expect to beat a pitcher on every pitch. So anything that isn't exactly what you're looking for, you leave, and hope they threw a ball.

@darthjaxrevan 8 months ago

In reference to the three biggest favor calls, could just add how much each favored a team.

@davrosthecreator1660 8 months ago

9:15 I’ve been saying this for ages. This is the best way to hold terrible umps accountable. Decide what percentage of umps would make the right call based on certain pitches. If the ump makes the right call, award them points based on that percentage. Easy calls get hardly any points, tough calls get more points. And then vice versa for blown calls. If it’s a strike right down the middle called a ball, they are deducted a bunch. But if it dots the corner and it’s called a ball, they won’t be punished as much.
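The points scheme in this comment, as a minimal Python sketch. It assumes each called pitch comes with `p_correct`, the expected share of umpires who would get it right; the function name and sample values are made up. A correct call earns `1 - p_correct` and a miss costs `p_correct`.

```python
def difficulty_weighted_score(calls):
    """calls: (was_correct, p_correct) pairs, p_correct in [0, 1].

    Easy calls (p near 1) earn almost nothing when right and cost almost a
    full point when missed; tough calls (p near 0.5) earn or cost about half
    a point either way.
    """
    score = 0.0
    for was_correct, p_correct in calls:
        score += (1 - p_correct) if was_correct else -p_correct
    return score


# Hypothetical calls: a routine strike, a blown pitch down the middle,
# and a correct call on a pitch most umps would miss.
calls = [(True, 0.99), (False, 0.95), (True, 0.40)]
print(round(difficulty_weighted_score(calls), 2))
```

Worked through, each call contributes (1 if correct else 0) minus `p_correct`, so the total is correct calls minus expected correct calls (here 2 - 2.34 = -0.34). In other words, this scheme lands on the same quantity as a "correct calls above expected" figure.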

@Dockie27 9 months ago

Great video, but I got completely distracted on an hour long rabbit hole (hour deep?) looking at the Galle Crater, Mars, and asteroid impacts. Thanks for the new space stuff to learn about!

@CMCFLYYY 8 months ago

One thing to keep in mind. You rightfully brought up how their "Kernel Density Estimation" can create wonky estimated zones, because the algorithm they're using to do the estimation has issues with small sample sizes. So I would keep that in mind when leaning on Relative Accuracy so hard: if the algorithm for KDE estimates can produce such wonky results, how do we know the algorithm they use for Expected Accuracy isn't similarly flawed and doesn't similarly produce wonky results? Honestly, I think the best metric to use would be the Total Distance Missed you mentioned in the video. What we want to know is how often this ump missed calls and by how much he missed them. That's it. Just looking at Accuracy can be misleading because it doesn't factor in how much he missed by on those misses. And Relative Accuracy (based on whatever algorithm they use) is flawed in the same way if it's all you look at, because it too ignores how small or egregious the misses are. Both treat all misses the same. So IMO, Accuracy is good, but I think Total Distance Missed is just as important, if not more so. And break it down by balls and strikes for each team. So you could say the ump missed 4 balls for this team by 2 inches and 12 strikes by 18 inches, but for the other team he only missed 1 ball by an inch and 2 strikes by 3 inches, etc. And then look specifically at the worst calls by distance to see if those came in key situations where he could've been favoring one team over the other, instead of using expected runs. Meaning, if an ump missed 18 calls by 12 inches but 10 of those inches came on 2 calls in key 2-out situations where he called obvious balls strike 3 to end innings, that could possibly point to a situation where he was favoring one team over the other. Great stuff though.

@RipleySawzen 8 months ago

As a programmer, I think this would be downright easy to fix: 1. No correct call should count against the ump unless that correct call is locally surrounded by incorrect calls. 2. If you can draw a box around all of the strikes and there are no balls inside that box, it's an automatic 100 for consistency.
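A sketch of rule 2 (the bounding-box check) in Python, assuming pitches arrive as hypothetical `(x, z, called_strike)` tuples in feet. Rule 1, the local-neighborhood exception, is not implemented here.

```python
def box_consistent(pitches):
    """Return True if no called ball falls inside the bounding box of all called strikes.

    pitches: list of (x_ft, z_ft, called_strike) tuples (hypothetical format).
    """
    strikes = [(x, z) for x, z, is_strike in pitches if is_strike]
    if not strikes:
        return True  # no strikes called, so there is no box to violate
    xs = [x for x, _ in strikes]
    zs = [z for _, z in strikes]
    x_lo, x_hi, z_lo, z_hi = min(xs), max(xs), min(zs), max(zs)
    return not any(
        x_lo <= x <= x_hi and z_lo <= z <= z_hi
        for x, z, is_strike in pitches
        if not is_strike
    )


game = [(0.0, 2.5, True), (0.5, 3.0, True), (-0.5, 2.0, True),
        (1.2, 2.5, False), (0.0, 4.0, False)]
print(box_consistent(game))                        # True -> automatic 100
print(box_consistent(game + [(0.2, 2.6, False)]))  # False: a ball inside the strike box
```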

@matrixphijr 9 months ago

Maybe changing the name of ‘Favor’ to something else would help, because that word definitely implies a purposeful act, which is probably why people associate it with rigging the game.

@hb-robo 9 months ago

agreed, something that indicates less intention would be better, maybe "Net Benefit [LAD +04]"

@josecarrera6519 8 months ago

Actually very informative, and I will no longer read ump scorecards like a noob, except if the favor goes against my team!

@josephtaylor5077 8 months ago

Never heard of Umpire Strike Zones. I'll have to look them up. Great analysis of the numbers.

@jacobs7424 8 months ago

"Average distance missed" plotted against "catcher framing score" would be the best judge of consistency vs bias.

@540058 8 months ago

Total distance missed -> distance missed per ball. Amazing video.

@Kirk00077 4 months ago

If we assume that umpires aren’t actually biased toward or against a particular team (which I think is a bit silly), then one interesting interpretation of favor is that it reflects the difficulty of calling one team’s pitch mix correctly compared to the other: if Aaron Nola starts against Dustin May, I might expect the umpire to “favor” the Phillies.

@grife3000 9 months ago

Honestly, I have no use for "relative accuracy" or "overall consistency". Just move ball and strike accuracy to the top section, and that leaves room for another couple of "impactful calls", maybe a top 5? And if they just added the "runs favored" stat to each "impactful call", it would show so much more about what you were worrying about: the one really super impactful call compared to the others. Imagine if it went "1. +1.71 Runs for SEA 2. 0.23 Runs for OAK 3. 0.21"; you'd have a much better idea whether there was a massive bias or not. And then it would be up to you to interpret that as you will. And I second your request to have a season-long strike zone map shown for each umpire. While it would still be subject to the same biases you mention (missing high on four-seamers intentionally, trying to hit the lower pitches more), at least the sample size could show an ump's general consistency. Great video, I love this information age. Can't wait for robo umps to be consistent enough to use on an every-pitch basis. I'm dreading the stupid challenge system that will come first, though.
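A small Python sketch of the "runs favored next to each impactful call" idea. It assumes each missed call already carries a hypothetical `(team_favored, runs_favored)` pair; computing the run values themselves is out of scope here, and the numbers below are illustrative.

```python
def top_impactful(calls, k=5):
    """calls: list of (team_favored, runs_favored) for each missed call.

    Returns up to k formatted lines, largest run impact first.
    """
    ranked = sorted(calls, key=lambda c: c[1], reverse=True)[:k]
    return [
        f"{rank}. +{runs:.2f} runs for {team}"
        for rank, (team, runs) in enumerate(ranked, start=1)
    ]


# Hypothetical run values in the spirit of the comment's example.
calls = [("OAK", 0.23), ("SEA", 1.71), ("SEA", 0.21), ("OAK", 0.05)]
for line in top_impactful(calls, k=3):
    print(line)
```

Showing the values side by side makes it obvious when one call (here +1.71) dwarfs the rest.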

@darkbreaker9767 8 months ago

I have an idea to fix the favor metric. Multiply the favor swing by the distance from the zone, or set blocks of distance as different favor scores. Especially for high-swing situations like bases loaded full count.
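One way to read the "blocks of distance" version in Python. The block thresholds and weights below are hypothetical (the comment doesn't specify any), and `run_swing` is assumed to be the favor value already computed for each missed call.

```python
def distance_weight(distance_in):
    """Hypothetical distance blocks; the comment does not specify thresholds."""
    if distance_in < 1.0:
        return 0.25   # within an inch: counts for a quarter
    if distance_in < 3.0:
        return 0.5    # one to three inches: counts for half
    return 1.0        # three inches or more: counts in full


def block_weighted_favor(missed):
    """missed: list of (team_favored, run_swing, distance_in).

    Returns {team: distance-weighted runs favored}.
    """
    totals = {}
    for team, swing, dist in missed:
        totals[team] = totals.get(team, 0.0) + swing * distance_weight(dist)
    return totals


missed = [("SEA", 1.0, 0.5), ("OAK", 0.8, 2.0), ("SEA", 0.4, 4.0)]
print(block_weighted_favor(missed))  # SEA about 0.65, OAK about 0.4
```

Swapping `distance_weight` for plain `distance_in` gives the continuous "multiply by distance" variant instead.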

@sawmill035 8 months ago

Excellent video; however, I must say that total distance missed is a very bad idea. Let's take an example. Game 1: Home team wins 15-13. 400 pitches were thrown in the game, and the umpire had to make 200 calls. He missed 10 calls by an average of 1 inch each, for a total distance missed of 10 inches. Game 2: Home team wins 1-0. 200 pitches were thrown in the game, with 100 calls made by the ump. That umpire also missed 10 calls by an average of 1 inch each, for a total distance missed of 10 inches. You see the problem here? The solution is "average distance missed per call". In game 1, it would be 0.05 inches. In game 2, it would be 0.1 inches. So the umpire in game 1 was actually better, as we expected. However, imo this is more confusing than relative accuracy, which very clearly indicates the accuracy a normal umpire would be expected to have given the pitches' distance from the zone. I think a percentage from 0-100% is much easier to read than something like 0.00894 inches/call.
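The arithmetic above in a few lines of Python: total inches missed divided by every called pitch (not just the misses), using the comment's two hypothetical games.

```python
def distance_per_call(missed_distances_in, total_calls):
    """Total inches missed divided by every called pitch, not just the misses."""
    return sum(missed_distances_in) / total_calls


game1 = distance_per_call([1.0] * 10, 200)  # 10 misses of 1 inch over 200 calls
game2 = distance_per_call([1.0] * 10, 100)  # 10 misses of 1 inch over 100 calls
print(game1, game2)  # 0.05 0.1
```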

@sealeo5772 9 months ago

When I heard kernel density estimation my ears perked up and I got a bit excited that finally something I know about from using GIS software and learning about mapping statistics in school is relevant to nerdy baseball stats.

@joepiazza3756 8 months ago

EUZ is meant to show what a zone usually is called for the ump over a career. It's like an ump scouting report for teams, so they know where they can get away with pitching or laying off a swing. So in that first game shown, the calls may all have been correct, but one close call went against what the players expected based on his history, and thus was inconsistent with how he normally calls it.

@BaseballsNotDead 8 months ago

That is not how EUZ works. I explain it fully in the video. If what you're saying was the case, every individual ump would have the same EUZ for each game, which they don't.

@andrewszaflarski5379 8 months ago

I guess a question I also had re: Ump Score Cards is this: Is one side of the shown strike zone considered "inside" and the other "outside" regardless of the batter's handedness, or is one side inside for right-handed batters and outside for left-handed batters, and vice versa? I'm pretty sure it's the former, but I don't know for certain and haven't found the explanation.

@Falllll 9 months ago

Love this video. I've always been a bit suspicious of certain things on the scorecards, but have never taken the time to dive into how exactly some of those things are calculated, so I appreciate this being explained here.

@josephalvarez5315 8 months ago

Commenting to boost algorithm. This is a great video

@jamesberry3230 9 months ago

The so-called true strike zone is not true, because the strike zone is a box, not a plane at the front edge of home plate, and the whole ball must pass completely through the strike zone to be called a strike. Create a box the correct size of the strike zone and have all pitchers throw 24 mixed pitches (e.g., fastball, curveball, slider, etc.) while umps call balls and strikes; that way you obtain an accurate assessment of ball and strike calls.

@capraagricola 8 months ago

It's actually fairly easy to implement your idea for EUZ: you can initialize the state of the EUZ to the exact strike zone, and instead of solving for the EUZ comprehensively at the end of the game, you can solve it iteratively with each pitch as input.
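A minimal sketch of that iterative approach in Python. The technique here is swapped in for illustration (a per-cell Beta-Bernoulli update on a grid), not Umpire Scorecards' actual method. The zone dimensions, cell size, prior strength, and 0.9/0.1 prior probabilities are all hypothetical placeholders.

```python
import math

# Hypothetical rulebook-zone approximation in feet (plate center at x = 0).
# Real zone tops and bottoms vary by batter; these are placeholders.
HALF_WIDTH = 0.83
BOTTOM, TOP = 1.5, 3.5


class IterativeEUZ:
    """Grid-based zone that starts as the rulebook box and deforms pitch by pitch.

    Each cell keeps pseudo-counts (a Beta prior) seeded from the rulebook zone;
    every called pitch adds one observation to its cell.
    """

    def __init__(self, cell_ft=0.25, prior_weight=4.0):
        self.cell = cell_ft
        self.prior_weight = prior_weight
        self.strikes = {}  # cell -> pseudo-count of called strikes
        self.calls = {}    # cell -> pseudo-count of all called pitches

    def _key(self, x, z):
        return (math.floor(x / self.cell), math.floor(z / self.cell))

    def _prior(self, key):
        cx = (key[0] + 0.5) * self.cell  # cell center
        cz = (key[1] + 0.5) * self.cell
        inside = abs(cx) <= HALF_WIDTH and BOTTOM <= cz <= TOP
        p = 0.9 if inside else 0.1  # hypothetical prior strike probability
        return p * self.prior_weight, self.prior_weight

    def update(self, x, z, called_strike):
        key = self._key(x, z)
        if key not in self.calls:
            self.strikes[key], self.calls[key] = self._prior(key)
        self.strikes[key] += 1.0 if called_strike else 0.0
        self.calls[key] += 1.0

    def strike_prob(self, x, z):
        key = self._key(x, z)
        if key in self.calls:
            return self.strikes[key] / self.calls[key]
        s, n = self._prior(key)
        return s / n

    def in_zone(self, x, z):
        return self.strike_prob(x, z) >= 0.5


euz = IterativeEUZ()
print(euz.in_zone(0.0, 3.6))  # False: just above the assumed rulebook top
for _ in range(5):
    euz.update(0.0, 3.6, called_strike=True)
print(euz.in_zone(0.0, 3.6))  # True: the zone has stretched upward
```

Cells with no called pitches simply stay at the rulebook prior, which sidesteps the sparse-corner problem the video points out.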

@G.Aaron.Fisher 8 months ago

Honestly, you could combine your "inches missed" idea with the Overall Favor metric to create a new stat measured in inch-runs.

@freedbygsus 8 months ago

This is a really great video, but I think you have a significant blind spot: the Strike Zone does not have a static size. The top and bottom of the strike zone are defined by the height of 3 points on the batter's body relative to the ground *when the pitch is delivered*. Even before you consider how batters adjust their stance for the pitch delivery, the Strike Zone still varies a good amount from one batter to another. A lot of computerized systems set the zone boundaries based on some percentage of the batter's total height, but two 6' 1" batters can have two differently sized strike zones based on their body proportions. A 6' 1" batter with a taller torso will have a larger strike zone than a 6' 1" batter with a shorter torso, and the batter with the shorter torso will have a strike zone that is higher off the ground than the other batter's zone. Now factor in that batters have different stances at pitch delivery, and you see even more dramatic variation. All of that is to say that umpire consistency should be a measure of the consistency of their *zone accuracy* from one batter to another. An umpire calling an accurate zone for Altuve and Judge in the same game is demonstrating good consistency (with the rules) and should be appreciated more than an umpire who basically gives up on adjusting to certain batters. That kind of consistency matters because its absence is what makes batters question whether they can rely on their own sense of their strike zone at the plate, which significantly affects their approach for a PA.
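The rulebook definition makes the proportions point concrete: the top of the zone is the midpoint between the top of the shoulders and the top of the uniform pants, and the bottom is the hollow beneath the kneecap, judged from the batter's stance. A small Python sketch; the landmark heights for the two 6' 1" batters are made up for illustration.

```python
def rulebook_zone(shoulder_top_in, pants_top_in, knee_hollow_in):
    """Return (bottom, top) of the strike zone in inches above the ground.

    Top: midpoint between the top of the shoulders and the top of the pants.
    Bottom: the hollow beneath the kneecap. All heights measured in the stance.
    """
    top = (shoulder_top_in + pants_top_in) / 2
    return knee_hollow_in, top


# Two hypothetical 6' 1" batters with different proportions (made-up heights).
long_torso = rulebook_zone(shoulder_top_in=61, pants_top_in=41, knee_hollow_in=19)
short_torso = rulebook_zone(shoulder_top_in=59, pants_top_in=43, knee_hollow_in=21)
for name, (bottom, top) in [("long torso", long_torso), ("short torso", short_torso)]:
    print(f"{name}: {bottom}-{top} in, {top - bottom} in tall")
```

With these numbers the long-torso batter gets a 32-inch zone starting at 19 inches, while the short-torso batter gets a 30-inch zone starting higher, at 21 inches.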

@DarthAnimal 8 months ago

That being said, consistency is the only thing that's important, even if it's not calculated correctly. If an ump has a strike zone in his brain and he's calling pitches 100% accurately within that strike zone, then that's the fairest possible game for each team, and nothing else should really matter.

@olivialambert4124 8 months ago

The metric of distance missed should be squared imo. Not only is squared distance the default norm in statistics, but to me at least it also makes sense: if he's missing by half an inch, that's quite significantly better than missing by a full inch, and squaring the distance missed accounts for that. I'd also drop the favour entirely and just use a simple ratio of how many calls were correct for side A vs side B. Anyone who wants to know how much it impacted the game can look in depth at the specific calls. Anyone who won't spend the time looking likely won't be using that metric correctly and wants to see a 2% bias for team x as an easier representation of the data.
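A quick Python illustration of why squaring changes the picture: two half-inch misses and one full-inch miss tie on plain total distance, but the squared version charges the single bigger miss twice as much. The miss distances are hypothetical.

```python
def total_missed(distances_in):
    return sum(distances_in)


def total_squared_missed(distances_in):
    """Sum of squared misses: a 1-inch miss counts 4x a half-inch miss, not 2x."""
    return sum(d * d for d in distances_in)


two_small = [0.5, 0.5]  # two half-inch misses
one_big = [1.0]         # one full-inch miss
print(total_missed(two_small), total_missed(one_big))                  # 1.0 1.0
print(total_squared_missed(two_small), total_squared_missed(one_big))  # 0.5 1.0
```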

@jayball820 8 months ago

My biggest issue with the ump scorecard is that it is fighting against framing. Especially since the only reason I believe we should keep a human ump behind the plate is that removing them would remove the benefit a catcher can provide by framing a pitch well. Correct me if I'm wrong, but there is nothing on the ump scorecard that credits umps for working with a great catcher who has amazing framing skills. Those "missed calls" shouldn't be counted against the ump while the catcher is celebrated. Besides that problem I personally have with ump scorecards, you brought up a lot of good points I'd never been able to articulate. I always knew when looking at it that there was something off, but I could never understand why I had that feeling. As always, great video, keep it up!

@panner11 8 months ago

Doing that would raise an issue: a kind of feedback loop. It's like adjusting pitcher ERA based on how good the batters are, but then adjusting the batters' stats based on how good the pitchers are. If you don't count the missed call when the catcher fools the ump, then you're not rewarding the correct call when the catcher wouldn't have fooled a better ump. Things keep feeding back until everything normalizes to the mean. This type of feedback loop is why we generally don't adjust for these kinds of things. Just take the flat stats.

@BeefPapa 9 months ago

What I do is look for the names Hernandez, Diaz or Bucknor and just laugh my ass off.

@user-dg9ki6vo6r 7 months ago

I really like your suggestions. If we need to fill out the space with additional relevant things, could we add a "top 5 misses by distance" (or "top 3...")? It would be nice to know how much of the total distance missed came from the individual worst calls.
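A short Python sketch of the "how much came from the worst calls" number: take the k largest misses and divide their inches by the total. The miss distances are hypothetical.

```python
def worst_misses_share(distances_in, k=3):
    """Return the k largest misses and the fraction of total inches they account for."""
    worst = sorted(distances_in, reverse=True)[:k]
    total = sum(distances_in)
    share = sum(worst) / total if total else 0.0
    return worst, share


misses = [4.0, 0.5, 0.5, 1.0, 2.0]  # hypothetical inches missed
worst, share = worst_misses_share(misses, k=2)
print(worst, f"{share:.0%}")  # [4.0, 2.0] 75%
```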

@user-se4rr3rs3m 8 months ago

Great video! Well explained. I enjoy that you do criticize in a respectful and constructive manner.

@ColumbiaSCRealEstate 8 months ago

Total baseball geekdom... I love it!

@EMETRL 8 months ago

The xAcc stat is really, really important, and it's nice to see sports statistics progressing away from raw, misleading data toward more useful data. But I wonder if xAcc should take into account who the pitcher/batter is, and not just look at pitches as if they're thrown by a robot against a bat held by another robot. I'm not an ump, so I don't know what it's like, but I suspect there are some intangible things about, say, the way a pitcher throws the ball that can make certain pitches easier or harder to call. You could correct for this by looking at whether some pitchers have a statistically significant effect on ump accuracy. I'm sure the better pitchers throw the most difficult pitches to call, but even among pitches that look identical on paper, maybe some pitchers have a throwing form that just, for whatever reason, confuses people. Or maybe there are batters who have a tendency to distract/obstruct the umpire in some way that shows up in the average accuracy of umpires with said batter at the plate. We're fortunate in baseball that in just one season, you get hundreds of data points for basically every batter who's worth a damn, and regular starting pitchers throw the ball over 10,000 times. That's just one season, and the average career length of ALL pitchers is 11 seasons. We should really be weaponizing the large sample size of baseball to better the sport.

@DarthAnimal 8 months ago

Favour should be "Adjusted favour per pitch". If a call is missed by a foot, it should be adjusted higher, and it should be per pitch, because you might see 300 pitches in a game of baseball, since there's no time limit, and one of those pitches might end up being a grand slam. So if 4 runs score after a missed strike is called a ball, allowing the batter to stay alive, then it's a favour per pitch of .013. A low score basically indicates that any difference in runs came down to luck. And if it's only missed by a millimeter, then it's adjusted to be nearly zero.
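One possible reading of "adjusted favour per pitch" in Python. The linear distance weight capped at a hypothetical 3-inch `full_weight_in` is an assumption, since the comment doesn't give a formula; with the cap, a foot-long miss counts in full but no more. The 4 runs over 300 pitches replay the comment's example.

```python
def favor_per_pitch(missed, total_pitches, full_weight_in=3.0):
    """Distance-adjusted runs favored, divided by every pitch in the game.

    missed: list of (runs_swung, distance_in). A miss counts in full once it is
    full_weight_in inches off (hypothetical cutoff) and shrinks linearly toward
    zero for closer misses.
    """
    weighted = sum(
        runs * min(dist / full_weight_in, 1.0) for runs, dist in missed
    )
    return weighted / total_pitches


# The comment's example: 4 runs after one missed call, 300 pitches in the game.
print(round(favor_per_pitch([(4.0, 12.0)], 300), 3))  # 0.013, missed by a foot
print(favor_per_pitch([(4.0, 0.04)], 300) < 0.001)    # True, missed by about a millimeter
```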

@MajicMiranda 9 months ago

I think a way to improve the favor metric would be to incorporate the expected accuracy of the call. For example, on a very close pitch, perhaps the expected accuracy of the call would be 55%, whereas a very bad missed call might have an expected accuracy of 98%. Then take that expected accuracy and multiply it by the favor for the missed call. This way an umpire is only penalized about half as much for very close missed calls, while still being penalized essentially in full for atrocious missed calls. True favor is still useful, but I'd call this "weighted favor"
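The "weighted favor" proposal is a one-line formula: each missed call's favor times the call's expected accuracy, summed. A Python sketch; the 0.55 and 0.98 expected accuracies come from the comment, while the 0.5-run favor values are hypothetical.

```python
def expectation_weighted_favor(missed):
    """missed: list of (runs_favored, expected_accuracy) per missed call.

    Each missed call's favor is scaled by how likely an average ump was to get it right.
    """
    return sum(runs * x_acc for runs, x_acc in missed)


close_miss = (0.5, 0.55)  # very close pitch: expected accuracy 55%
bad_miss = (0.5, 0.98)    # obvious pitch: expected accuracy 98%
print(expectation_weighted_favor([close_miss]))  # about 0.275
print(expectation_weighted_favor([bad_miss]))    # about 0.49
```

The same 0.5-run miss counts for roughly half as much when it was a coin-flip call, which is the penalty split the comment describes.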

@prestonk6271 8 months ago

Maybe in the “Impactful Calls” section there’s something in parentheses stating how many runs that play accounted for. Ex: (MIA +2.1 R)

@matthiasm4299 8 months ago

I think support vector machines (SVMs) might be used to construct a better strike zone estimate. They should not have the problem of high balls / low strikes biasing the zone, since only the points close to or over the boundary are used to construct it. Therefore, consistency should work fine. However, it would still have to be somehow combined with the theoretical strike zone for a visual representation that is fair to the umpire.

@robertrogers7938 8 months ago

The EUZ is created from pitches that are taken and then called a strike or a ball. The problem might be sample size. In Thomas's (?) game against the Angels and Tigers, there might have been too few pitches in the upper right quadrant called a strike/ball, and that influenced the EUZ. Ohtani and Lorenzen are good enough to pitch low in the zone for strikes and high out of the zone with fastballs for swings and misses. The pitchers in the game might not have left many pitches high over the plate. AND, most importantly, if they did, they were probably batted balls, which do not count toward the EUZ. Once the EUZ is statistically established at the 50% line, Umpire Scorecard simply gives consistency based on that EUZ. If the pitch is in the upper right corner (out of the EUZ in this case) and called a strike, then it is considered Not Consistent. So the umpire's accuracy is good, but it lowers the consistency. The two main things to address here are (1) sample size and (2) establishing the EUZ at 50%. There is no specific number the zone must be established at, but 50% is a good number. Sample size can be a problem, especially when hitters can get the bat on the ball when it's pitched in a specific location.
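The sample-size point can be seen with a small kernel-smoothing sketch in Python. This swaps in a Nadaraya-Watson style estimate (a Gaussian-weighted share of called strikes) for illustration; it is not necessarily how Umpire Scorecards builds its EUZ, and the bandwidth and pitch coordinates are hypothetical.

```python
import math


def strike_probability(x, z, pitches, bandwidth_ft=0.3):
    """Kernel-weighted share of called strikes near (x, z).

    pitches: list of (x_ft, z_ft, called_strike). Every pitch gets a Gaussian
    weight based on its distance from (x, z); the result is the weighted
    fraction of strikes. The 50% contour of this surface is the estimated zone.
    """
    num = den = 0.0
    for px, pz, is_strike in pitches:
        w = math.exp(-((x - px) ** 2 + (z - pz) ** 2) / (2 * bandwidth_ft ** 2))
        num += w if is_strike else 0.0
        den += w
    return num / den if den > 0 else float("nan")


core = [(0.0, 2.5, True), (0.3, 2.8, True), (-0.3, 2.2, True), (0.0, 4.0, False)]
# A single taken pitch in the sparse upper corner decides that region by itself.
print(strike_probability(0.9, 3.4, core + [(0.9, 3.4, True)]) > 0.5)   # True
print(strike_probability(0.9, 3.4, core + [(0.9, 3.4, False)]) > 0.5)  # False
```

Flipping one call flips which side of the 50% line that whole corner lands on, which is exactly the fragility the comment attributes to sparse quadrants.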

@perrytilton5221 8 months ago

I'm not a fan of the scorecard zone because the zone by definition is 3D. These mean very little to me.

@hippokrampus2838 9 months ago

I'm pretty sure the creator of umpire scorecard has admitted that he doesn't like umpires and knows that their information is misleading but doesn't care. CloseCallSports did a video that included that fact via interview. EDIT: it wasn't umpire score card, it was umpire auditor

@sherman4114 9 months ago

That was umpire auditor. I don't recall hearing about any negative biases from the umpire scorecards people.

@hippokrampus2838 9 months ago

@@sherman4114 you're right, I'll edit my comment

@mclew1234 8 months ago

I agree that as data builds, having an EUZ for each umpire based on their previous calls would be great. I've always said, as a ballplayer, that while having the true zone would be great, humans are always going to be slightly off, and I'd much prefer a zone that's consistently wrong in a certain way to a zone that's all over the shop. If I know as a player that an ump will call a ball outside a strike but give a pitch on the inside corner a ball, I can adjust my approach appropriately, and in the pros players can do this pregame: we have ump X today, so we're gonna have to swing at a pitch just outside, but we don't have to go after pitches on the inside corner, etc.

@blakestoudt2131 8 months ago

Excellent breakdown. The EUZ always confused me as well. I can tell how much you care about drawing the right conclusions from this data that is relatively new to the fans.

@Dave__AC 9 months ago

Total distance missed is a cool idea, but I wonder how much it is affected by the number of calls. E.g., if you make 100 calls that all miss by 0.1, that's the same total as 25 calls that miss by 0.4, even though the game with the bigger misses really should be significantly "worse" imo. I guess it would depend on how consistent the number of calls per game is; if it's relatively stable, then that's fine, but if not, it might make sense to use average distance missed and just add a denominator of the total number of calls.

@rmp5s 8 months ago

This is one reason why these proposed "robo-umps" are bad. These "scorecards" have a box and if it touched the box it was a strike, if it didn't, it was a ball. There's a bit more to it than that. Calling balls and strikes isn't a perfect black or white thing...there is a bit of interpretation to it and I think this, what I call "the human element", is an important part of the game. LOVE your proposed changes, by the way!!

@ryanzmuda3167 8 months ago

glad they have it. You are correct about the types. Now umpires must be held accountable

@blue17echo 8 months ago

You actually probably want something like the sum of the squares of the missed distances, then maybe normalized to a percentage scale for readability, kind of like in linear regression, where you seek to minimize the sum of squared distances.

@AggieRinse 8 months ago

Maybe one could use your distance metric as a scaling factor for the favor resulting from missed calls. I can see the reasoning for releasing the measurements as is, but that could be one idea to refine the overall favor concept.

@MRConvex8 9 months ago

Distance Missed is a nice way to present the information contained in Relative Accuracy, but it requires normalization. In your example you conveniently use two games with the same number of pitches. The value of distance missed is lost if we're using it to compare games with vastly different pitch counts.

@ligomi 8 months ago

I thoroughly enjoyed the Naked Gun clip

@stephenkasper6081 9 months ago

It sounds like you have the same problem with scorecards that I have with OPS+, giving an exact measurement to an approximate value.

@madaman6556 8 months ago

Total distance missed would be great. If umpscorecards adds it, I think they should complement it with average distance missed (total distance missed / # calls missed). Plus something like xDistance Missed, where they calculate the average size of the zone for MLB players (because of different heights and such) and determine what the missed call distance would be for the average batter. Great video!

@hb-robo 9 months ago

The EUZ progressive deformation suggestion is a fantastic idea.