Predict Baseball Stats using Machine Learning and Python 

Dataquest
59K subscribers
17K views

We'll predict future season stats for baseball players using machine learning. The stat we'll predict is the wins above replacement (WAR) a player will generate next season.
We'll first download and clean baseball season data using Python and pybaseball. We'll then use a sequential feature selector to identify the most promising predictors, train a ridge regression model to predict next-season WAR, measure its error, and improve the model.
In the end, you'll have a model that can predict future season WAR and the next steps to improve the model.
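
The feature selection and ridge regression steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the video's actual code: the column names, the made-up target, and the parameter choices (`alpha=1.0`, `n_features_to_select=2`, a 3-fold `TimeSeriesSplit`) are all assumptions. A forward sequential feature selector starts with no features and greedily adds whichever feature most improves the cross-validated score of the model.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for cleaned, already-scaled season data (illustrative columns).
X = pd.DataFrame(
    rng.random((n, 6)),
    columns=["Age", "HR", "BB", "SO", "AVG", "WAR"],
)
# Hypothetical target: next-season WAR driven by current WAR and HR, plus noise.
y = 3.0 * X["WAR"] + 1.0 * X["HR"] + rng.normal(0, 0.05, n)

model = Ridge(alpha=1.0)

# Forward selection: repeatedly add the feature that gives the best
# cross-validated score until two features have been chosen.
selector = SequentialFeatureSelector(
    model,
    n_features_to_select=2,
    direction="forward",
    cv=TimeSeriesSplit(n_splits=3),
)
selector.fit(X, y)
selected = list(X.columns[selector.get_support()])

# Train the ridge model on the selected features only.
model.fit(X[selected], y)
predictions = model.predict(X[selected])
print(selected)
```

On real data you would select more features and evaluate on seasons the model has not seen, rather than predicting on the training rows as this sketch does.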
You can find the full code here - [project-walkthroughs/baseball_games at master · dataquestio/project-walkthroughs · GitHub](github.com/dataquestio/projec...)
Chapters
00:00 - Introduction
02:00 - Download the data
05:52 - Creating an ML target
09:15 - Cleaning the data
16:19 - Selecting useful features
27:13 - Making predictions with ML
38:15 - Improving accuracy
49:26 - Diagnosing issues with the model
52:28 - Wrap-up and next steps with the model
-----------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Published: 10 Jul 2024

Comments: 40
@imfrshlikeuhh 1 year ago
the fact that this type of content is FREE is mind blowing
@anishapostate4221 1 year ago
the fact that people don't know about this is another mind blowing thing
@imfrshlikeuhh 1 year ago
@anishapostate4221 i wldnt say that, there are plenty more ppl who dont know this than do
@DanielGarcia-uq8yz 1 year ago
Great project...love the concept of dataquest's guided project walkthroughs. Thanks Vik
@kingofhavila9850 1 year ago
That day I joined the webinar slightly late so I was excited about watching this video.
@jscott21 1 year ago
Incredible video - thank you so much
@SuperNunera 1 year ago
Ty for sharing. Amazing content.
@pushkarratnaparkhi2205 1 year ago
Great video. Thanks 💯💯
@evanmaurer1968 1 year ago
I appreciate this content sir. Thank you so much!
@reena3571 1 year ago
Thank you immensely for sharing
@leassis91 1 year ago
thank you for this content!
@tomkmb4120 1 year ago
Hey Vik, coming here from your more recent video with NBA stats analysis. In this instance, is pybaseball replacing the more manual work being done by playwright and having to parse the specific html in order to scrape the data you need? Is there an equivalent for the NBA to pybaseball? I think there may be one for the NFL that I've seen in places but this is all new to me so I can't be sure. Just struggling a bit with adapting that previous video to be a regular python file instead of following along directly with your Jupyter tutorial is all.
@hakeemyatim5363 1 year ago
Hello! This is an awesome project and walkthrough that you've done! I actually wanted to try predicting HR's instead of WAR's in this model, but when I tried it with scaling the data for ridge regression, I would get HR numbers between 0 to 1 with the minmax scaler. But if I skip that part, I'd get the whole number of the predicted HR for the next year. Would it still be accurate if we are just looking at HR's when I skip the scaling? Again, Great Video!
@Dataquestio 1 year ago
You don't want to scale your target column. So if you're predicting HRs, you want to scale all of the columns except the HR column.
@cloudcomputingbd 1 year ago
nice
@henryryan5194 1 year ago
I might be missing something, but... Once you have trained and tested the model, what is the process to apply the model to predict the following year? In this video you trained the model to predict the "Next_WAR", which in this case would be the player's 2022 WAR, and then evaluated the model based on the real result vs. your predicted result. But, if you wanted to predict 2023 WAR, how would the code need to be adjusted? Essentially, how do you use the trained model to predict 2023 player WAR?
@willcarroll9762 1 year ago
You ever figure it out? I’m struggling there too
@Chris-rl6rw 1 year ago
@willcarroll9762 This model can only predict one year out into the future. To predict 2023, you would need 2022 data. It's not necessarily a full time series analysis, but a linear regression model used to predict the following year's stats. Predicting Next WAR is predicting next year's stat. You could attempt to create a column for 2 years out into the future by shifting the 'WAR' column again and testing how the model predicts two years into the future and so on. My guess is it may start performing poorly at that stage.
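
One way to picture the idea in this reply, as a sketch on made-up numbers rather than the video's code: the target is each player's WAR shifted back by one season, rows from the most recent season have no target yet, and those are exactly the rows you feed to the trained model to get a forecast for the upcoming year. The column names `IDfg`, `Season`, and `WAR` follow the thread; `WAR_in_2_years`, `Predicted_2023_WAR`, and the single-feature ridge model are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up seasons for two players (IDfg is the player ID).
batting = pd.DataFrame({
    "IDfg":   [1, 1, 1, 2, 2, 2],
    "Season": [2020, 2021, 2022, 2020, 2021, 2022],
    "WAR":    [2.0, 3.0, 2.5, 0.5, 1.0, 1.5],
}).sort_values(["IDfg", "Season"])

# Shift WAR within each player: -1 gives next season, -2 gives two seasons out.
# This assumes each player's rows are consecutive seasons with no gaps.
by_player = batting.groupby("IDfg")["WAR"]
batting["Next_WAR"] = by_player.shift(-1)
batting["WAR_in_2_years"] = by_player.shift(-2)

# Train only on rows where the next season is already known.
train = batting.dropna(subset=["Next_WAR"])
model = Ridge(alpha=1.0).fit(train[["WAR"]], train["Next_WAR"])

# The latest season (2022) has no Next_WAR yet; predicting on it forecasts 2023.
latest = batting[batting["Season"] == batting["Season"].max()]
forecast = latest.assign(Predicted_2023_WAR=model.predict(latest[["WAR"]]))

print(forecast[["IDfg", "Season", "WAR", "Predicted_2023_WAR"]])
```

Training against `WAR_in_2_years` instead of `Next_WAR` would give the two-year-out variant mentioned above, with fewer usable training rows.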
@LouieWinehouse 1 year ago
you could train it based on the first 3 months of data to predict the next 6 months of the season or however u want. For my mlb ML model i train it on March-July to predict August-October
@tomkmb4120 11 months ago
A little confused on the Sequential Feature Selector, you mention that after normalising the data - it picks the features that it thinks will help with accuracy the most, how is it determining that? Sorry if that's a stupid question.
@arundey3971 1 year ago
any idea on why pybaseball package no longer loads. I tried pip install pybaseball, and I get an error.
@chealol4233 23 days ago
How would you be able to do this for "predicting" a player to record a hit in a given game? Is that possible?
@gianpierrealvarado993 4 months ago
Does anyone know why I wouldn’t be able to import pybaseball on JupyterLab anymore? I’m trying to follow along on my own notebook and for some reason I’m getting an error code that the module doesn’t exist. Thanks for any help in advance!
@paperk1d 1 year ago
Is it possible to do this in R? I have just started to learn about programming so I don’t have much knowledge about this
@fudgenuggets405 9 months ago
I don't think pybaseball is working any more. I get a blank .csv at the beginning after supposedly downloading the Fangraphs data.
@vitonash 9 months ago
a bit confused on what the purpose of making the full copy and then dropna() was. it doesn't seem like the full copy was used at all throughout the rest of the code?
@wanjohisamuel8547 1 year ago
Your videos are amazing. I'm starting to love ML. What advice will you give to someone who is starting Data Science...
@Dataquestio 1 year ago
That's great to hear, Wanjohi! I actually started a site called Dataquest where you can learn data science from scratch - the data scientist path will teach you all the main data science skills - www.dataquest.io/path/data-scientist/ .
@AlyssaFord-xs3ht 1 year ago
I am having trouble finding the batting csv file
@tjans1979 1 year ago
What editor are you using for this?
@turtle1897 1 year ago
It’s Jupyter Notebook
@zachbroussard8734 1 year ago
I’m not getting the CSV when I run this. Can anyone help?
@el_goomba 1 year ago
how would you adjust the code to predict 2023 WAR?
@kellybjames 4 months ago
did you solve for this?
@peter93263 1 year ago
Can you do something similar for English Premier league soccer?
@AbrarMuhtasim 1 year ago
'Customer segmentation and clustering in retail using machine learning' with real data set. Please make a project tutorial in this project😭😭😭😭
@emmamutegi5919 1 year ago
I have a problem running this... help!
removed_columns = ['NEXT_WAR', 'Name', 'Team', 'IDfg', 'Season']
selected_columns = dataset.columns[~dataset.columns.isin(removed_columns)]
AttributeError: 'function' object has no attribute 'columns'
@Dataquestio 1 year ago
It looks like 'dataset' is a function for some reason. It should be a pandas DataFrame. Make sure you didn't accidentally assign to the `dataset` variable.
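
A small sketch of one way this error can arise, under the assumption that `dataset` was assigned a function instead of the function's result; `load_batting` and the sample columns are hypothetical stand-ins, not code from the video. Once `dataset` is an actual DataFrame, the column selection from the question works as written.

```python
import pandas as pd


def load_batting():
    """Hypothetical loader standing in for however the batting data is read."""
    return pd.DataFrame({
        "Name": ["A", "B"],
        "Team": ["X", "Y"],
        "IDfg": [1, 2],
        "Season": [2021, 2021],
        "HR": [10, 20],
        "WAR": [1.0, 2.0],
        "NEXT_WAR": [1.5, 2.5],
    })


# Bug (one possible cause): missing parentheses, so `dataset` is the function
# itself and `dataset.columns` would raise the AttributeError from the question.
dataset = load_batting
is_dataframe_before = isinstance(dataset, pd.DataFrame)

# Fix: call the function so `dataset` holds the DataFrame it returns.
dataset = load_batting()

removed_columns = ['NEXT_WAR', 'Name', 'Team', 'IDfg', 'Season']
selected_columns = dataset.columns[~dataset.columns.isin(removed_columns)]
print(list(selected_columns))
```

Checking `type(dataset)` in a notebook cell just before the failing line is a quick way to confirm which case you are in.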
@turtle1897 1 year ago
@Dataquestio I have that same issue. I have just started Dataquest and was using this as a follow-along project while I wasn’t studying. I have some knowledge but am not at this stage yet, just working towards familiarity