Predict Football Match Winners With Machine Learning And Python

Dataquest

Published: 2 Oct 2024

Comments: 214

@vikasparuchuri 1 year ago

Hi everyone! You can find the data and code for this tutorial here - github.com/dataquestio/project-walkthroughs/tree/master/football_matches .

@knotty2348 1 year ago

You are a hero. Had this project in mind for years. You saved me some hundreds of hours of research and learning :) Thanks a lot!

@emmanuelteitelbaum 2 years ago

I like that as the founder of Dataquest, you yourself are providing the tutorial (as opposed to hiring someone). Also, thanks for offering the free access to educators and students.

@Dataquestio 2 years ago

Thanks, Emmanuel! -Vik

@ifkica1822 2 years ago

@Dataquestio Sorry, I just joined Dataquest. Can you please tell me if the free option for students is still available?

@ukaszhangiel7610 1 year ago

Does this model completely ignore who the opponent is?! From what I see, the features used are: a) general match features (time of the game, home/away) and b) rolling averages for one team. As a result, the program tries to predict the outcome of the game while ignoring the opponent's features. It will come up with a prediction based purely on general match factors and the past performance of one team. I.e., for an Arsenal game it will give me the same result retrospectively whether Arsenal plays the first or the last team in the table. Do I get it right? If so, how can it make sense?

@SmallDoggo37 2 days ago

I also see it this way. I believe it would be predicting based on the form that both teams are in, looking at their last 3 games. So if a team like Bournemouth had a three-game stretch against Burnley, Luton, and Sheffield and won all three, and City played against Arsenal, Liverpool, and Villa and won 2 of 3, I think the model would predict Bournemouth to beat City. I could be wrong, though, and the random forest model might accumulate the strengths of teams across the multiple branches in the decision trees. I am not the most familiar with this model. I would love for someone to correct me if I'm wrong.

@ctrl-shift-run8681 2 years ago

This is a very cool project! I ran it across 7 leagues and it is interesting how the same set of predictors get very different results. In England and France, it does pretty well but in Brazil and Japan, not so much.

@Dataquestio 2 years ago

That is interesting! I wonder if there is more variance there due to transfers, less data, etc.

@andrempunkt 22 days ago

I have a question. In the predictors you have opp_code for the opponent but no code for the actual team (it could be called team_code, for example). Why is this not necessary?

@ibukunalade4286 9 months ago

I really love this work. I will try with 10 seasons and use 70% of the dataset for training and 30% for testing. But I want to ask: after all is done, how do I predict specific upcoming matches? I plan on adding the upcoming games I want to predict to the test set and then predicting from there.

@DarkCode 6 months ago

I'm trying to predict who will win the NHL championship, their divisions, and the rest of the regular season. I need help with this project; I will be using machine learning in Colab. Any takers? Any and all help would be appreciated!

@Qubitmyst 2 years ago

Inspiring, well done! Can you use the gf and ga columns directly in your predictors, without the rolling_averages function? Now imagine you get a very good prediction algorithm and save the model: how do you use it to predict games in the next season? Can you give me a clue? For example, how would you predict one game in the 2022-2023 season? Thank you.

@tomi4tv126 7 months ago

You have to use rolling averages because when you try to predict the outcome of a match (before it has started) you won't know gf and ga yet. But we do know the average gf and ga of the last 3 games the team has played. The model can be used for new seasons, but the problem is data. You will have to gather data for the games played after this video. That is the tricky part, but he also made a video before this one about web scraping (getting new data directly from the web). Or maybe you can find an updated data set online (maybe on Kaggle). From my experience, the data sets you find online won't have the more detailed match statistics, so it would be best to scrape the data yourself.

@sakariyaqaase6773 2 years ago

Thanks Vik, I tried to run the rolling average function but it gives me this error: ValueError: closed only implemented for datetimelike and offset based windows

@martincal7115 1 year ago

I'm having the same issue. Did you find a way to fix it? Thanks
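A possible cause of the `closed` error, offered as an assumption rather than a confirmed diagnosis: older pandas releases (before 1.2, as far as I know) accepted `closed` only for time-based windows, so `rolling(3, closed='left')` failed on a plain fixed window. Upgrading pandas is one option. The sketch below is another: it shifts each team's column down by one row before averaging, which excludes the current match without using `closed` at all. The helper name `add_rolling_averages` and the toy data are hypothetical; only `team`, `date` and `gf` mirror column names from the tutorial.

```python
import pandas as pd

def add_rolling_averages(matches, cols, window=3):
    """Add <col>_rolling columns holding each team's mean over its previous `window` matches."""
    matches = matches.sort_values(["team", "date"]).copy()
    for col in cols:
        # shift(1) drops the current match from the window, so no `closed` argument is needed
        matches[f"{col}_rolling"] = (
            matches.groupby("team")[col]
            .transform(lambda s: s.shift(1).rolling(window).mean())
        )
    return matches

# Toy data (made up for illustration): two teams, five matches each
dates = pd.to_datetime(
    ["2022-01-01", "2022-01-08", "2022-01-15", "2022-01-22", "2022-01-29"]
)
matches = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5,
    "date": list(dates) * 2,
    "gf": [1, 2, 3, 4, 5, 0, 0, 3, 3, 3],
})

rolled = add_rolling_averages(matches, ["gf"])
# The first three matches of each team have no full history and stay NaN
print(rolled.dropna(subset=["gf_rolling"]))
```

To stop averages from carrying over between seasons, the same idea should work with `groupby(["team", "season"])`, assuming the data has a `season` column.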

@avikpal6508 2 years ago

I'm generally opposed to the idea of using AI/ML models for the EPL or any sport, but the concept can definitely be reused in multiple business cases. Great job, mate!

@harryhaz4629 1 year ago

Great video thanks. But I was wondering how do you get the model to predict the upcoming football matches. Let's say Manchester United vs Liverpool etc.

@Captain_Roy16 6 months ago

Can we implement something like Fixture difficulty code and predict more accurately?

@jamespapworth1477 8 months ago

Why do you use RandomForestClassifier for this? Is it superior in some way for this application compared to other machine learning models, e.g. KNN, ANN, etc.?

@madebymate4870 1 year ago

This is a great video, but I don't understand exactly how to predict individual matches. What parameters should I pass to rf.predict(), and how, if I want the outcome of a single match?

@royalzikhali5295 1 year ago

did you ever find the answer

@aravindgpandey 2 years ago

Very nice explanation. This is what I was looking for so long. Thanks much

@avibm948 1 year ago

Nice video Vik, I've learned a lot from your videos recently. My only criticism is that some viewers may feel they can generate positive returns from a model whose win predictions are right more than 50 or 60 percent of the time. It would be better to predict the probability of winning, because the betting reward is based on probability. Say we predict that a team wins with 70 percent probability: if the odds pay less than the break-even price (a net 3/7 per unit staked, i.e. decimal odds of about 1.43), we are going to lose on average even though our model was right. The reason the model is able to predict with a probability higher than 50 percent is that some teams are better than others, and the betting odds reflect that. One can scrape the odds as well and do the analysis, but I believe the betting companies already use AI to set the initial odds. There will be opportunities when the odds differ substantially from a good predictive model.
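The point about odds can be checked with a little arithmetic. The sketch below assumes decimal odds (total return per unit staked, stake included) and a model whose win probability is correct. Both function names are made up for this illustration.

```python
def expected_profit(p_win, decimal_odds, stake=1.0):
    """Average profit per bet: win (odds - 1) * stake with probability p_win, else lose the stake."""
    return p_win * (decimal_odds - 1.0) * stake - (1.0 - p_win) * stake

def break_even_decimal_odds(p_win):
    """Decimal odds at which the expected profit is exactly zero."""
    return 1.0 / p_win

p = 0.70
print(break_even_decimal_odds(p))   # about 1.43, i.e. a net payout of 3/7 per unit
print(expected_profit(p, 1.30))     # negative: the bet loses on average despite a 70% hit rate
print(expected_profit(p, 1.60))     # positive: the odds pay more than the break-even price
```

So a model that is right 70 percent of the time still loses money whenever the bookmaker's price for those matches is shorter than roughly 1.43.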

@goober-ll1wx 11 months ago

yeah its basically a massive nothing burger, you'll still lose money and if by some miracle you can model it well, then your bookie will back you off before you make any money!

@madhuacharyya6963 1 year ago

Hi, I have enjoyed watching your demonstration of predicting the EPL game results. However, the predicted results don't reflect the actual results. So my question is, how can I predict more accurate results, and how can I train the dataset. Looking forward to hearing your reply.

@ILikeNoisyGoat 1 year ago

Hi! Can I make the predicted value into probability? or logistic regression? Thank you!
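On getting probabilities instead of a hard 0/1 label: scikit-learn's `RandomForestClassifier` has a `predict_proba` method, so switching to logistic regression is not required for that. A minimal sketch follows, using random synthetic features as a stand-in for the tutorial's predictors (the data and the 0.6 threshold are assumptions for illustration only).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 4 numeric features, binary target (1 = win)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
rf.fit(X[:150], y[:150])

# One row per match, one column per class in rf.classes_ (here [0, 1])
proba = rf.predict_proba(X[150:])
win_prob = proba[:, 1]

# A stricter threshold than the default 0.5 trades fewer "win" calls for more confident ones
confident_wins = win_prob >= 0.6
print(win_prob[:5], confident_wins[:5])
```

For a random forest these values are the averaged votes of the trees, so they are not guaranteed to be well calibrated. Treat them as scores unless you check calibration on held-out matches.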

@sushik.8043 8 months ago

Where can I find a whole spreadsheet like this but for the NFL or NBA?

@KabirKohli-rm7xm 1 year ago

Hi, thanks for the awesome video. I had one doubt (it might be stupid). The aim of the model is to predict the winner of a match between two teams (say, team A vs team B). But when training the model on a single match result, we only give it the stats for the home team (A). Wouldn't it make more sense to add the stats for team B in the same row as well, and then ask it to make the prediction?

@cevikyi 2 years ago

Hi, thanks for the great video. Why didn't you involve "team" as a predictor in each model as you've used opponent team information? Doesn't this miss the relationship between team A vs team B and so on?

@Dataquestio 2 years ago

Hi Yigit - great question. You are welcome to try it with team and measure error. The reason I didn't use it is because using a column like that can have a tendency to overfit. Some teams have performed really well in the last few seasons, but that doesn't necessarily mean they'll perform well in the future.

@cevikyi 2 years ago

@Dataquestio Thanks for the guidance!

@samdowns4786 2 years ago

Hi, great video. I am just wondering how to implement this onto matches in the future, predicting who would win the game this weekend for example

@stephenwood6139 1 year ago

This is by far the best and most practical video on football predictions I've seen online, very well explained and actually leaves you with something useful afterward. Great work!

@stephenwood6139 1 year ago

I managed to resolve this :)

@mirror1023 1 year ago

When creating the new columns using rolling_averages, we lost the first few games of the season when we dropped na rows. We also carried rolling averages into other seasons. How do we fix this?

@UzmaLatif-n1b 11 months ago

I just started learning Python and machine learning. I started learning from your tutorials, and they are making me better at data science day by day. Keep it up. You are the best online teacher.

@uncaged3076 5 months ago

Is there any way I can reference your work? I am trying to use the idea of rolling averages in a project.

@sureshmakwana8709 2 years ago

You saved my Machine Learning mini project this semester ❤️❤️

@欧阳小匪 1 year ago

The video content shared by this author is very good, and it provides a lot of reference directions for predicting stocks. Thank you so much.

@andreeadumitrescu1717 2 months ago

Hi! What if I have all the data in a .txt file, in one column with separate rows? How do I turn that into a dataframe? Example: FT Greece 3 - 0 Italy Sunday 12/04/2008 FT France 1 - 2 England ....and so on

@pratiek8s 2 years ago

Very informative. Thank you sir.

@francescoscalia3541 2 years ago

Hey @Dataquest, amazing content. I built the algorithm to predict games using your tutorial. Now I'm asking what I have to do to make it predict future games, since I noticed that it of course predicted past games. Could you tell me? Thanks!

@titrecords2294 1 year ago

Been learning ML on provided data ever since, thank you sir for teaching me in the last tutorial how to curate my own data. 🙏

@paulohss2 1 year ago

Great content! May I just ask why you did the division at the end of the tutorial? It was 27 / 40. Where did the '40' figure come from?

@rishavmishra5786 1 year ago

It's 27 for 1 and 13 for 0, totaling 27 + 13 = 40. Of the 40 matches where a win was predicted, 27 were actual wins, so the precision is 27/40.
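The 27/40 figure can be reproduced with a few lines. The sketch assumes, as the reply describes, that 40 matches were predicted as wins and that 27 of those were actual wins. The series below is rebuilt from those counts, not taken from the real data.

```python
import pandas as pd

# Actual outcomes for the 40 matches the model predicted as wins (1 = win, 0 = not a win)
actual_when_predicted_win = pd.Series([1] * 27 + [0] * 13)

counts = actual_when_predicted_win.value_counts()
true_positives = counts.loc[1]    # predicted win, actually a win
false_positives = counts.loc[0]   # predicted win, actually a loss or draw

# Precision = TP / (TP + FP)
precision = true_positives / (true_positives + false_positives)
print(precision)  # 0.675
```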

@pranavps1342 1 month ago

Where can I find historical data of football matches (La Liga, EPL)?

@gabriel.o.michael9549 2 years ago

I have to say, you're a natural educator. If you haven't, please consider teaching a younger audience. I bet you'll be good at it.

@Dataquestio 2 years ago

Thank you, Gabriel! I really appreciate that. -Vik

@jacobdebrone 10 months ago

interesting stuff bro You just got yourself a subscriber

@nonsobismark1846 1 year ago

Great work... But is there any prediction site where you update the predictions?

@Dataquestio 1 year ago

Thanks! There is no live site yet, but someone can make one with this code :)

@Rip_Ta4 2 months ago

I was looking to extend this, but there would be a problem with extending the data. The one problem with these types of predictive models is that there are financial takeovers, financial problems, key players coming in and leaving, player injuries, etc. For example, the massive spending on the Chelsea squad, after which they actually did worse, is something that an AI most likely would not be able to predict.

@jamshidnoori1496 2 years ago

Why do I get this error: "TypeError: list indices must be integers or slices, not list" after I write this code: rf.fit(train[predictors], train['target'])? Thanks.

@xsquirrel7091 2 years ago

Because you are using a list as a list index. In this case you have probably forgotten to put the quotation marks in train['predictors'].

@Dataquestio 2 years ago

Hi Jamshid - `train` should be a DataFrame, but it looks like you might have it stored as a list. The full code is here if you want to compare - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/prediction.ipynb .

@jamshidnoori1496 2 years ago

@xsquirrel7091 Hi, thank you very much. I have already defined "predictors" as a variable holding the column names, like this: predictors = ['venue_code', 'opp_code', 'hour', 'day_code'].

@jamshidnoori1496 2 years ago

@Dataquestio Great work, thanks.

@jamshidnoori1496 2 years ago

Yes, you are right. I passed 'train' and 'test' as lists, not DataFrames: train = [matches[matches["date"] < '2022-01-01']] and test = [matches[matches["date"] > '2022-01-01']]. It should be like this: train = matches[matches["date"] < '2022-01-01'] and test = matches[matches["date"] > '2022-01-01'].

@alexjamarco 2 years ago

Hi Vikas. Very nice tutorial. I was able to code along, and it was my first ML project. It seems awesome how the computer predicts stuff like this. I have a question: we have our training and testing datasets, right? How can we ask the algorithm to predict an event that is not in the training data? For example, let's say I have a CSV of next weekend's matches. How can I ask the algorithm to try to predict the winner? Sorry if it seems a silly question, but I couldn't find a clearer way to ask. Thanks and well done once again!

@Dataquestio 2 years ago

Hi Alexandre - you'd basically put the information for next weekend's matches (opponent code, venue code, rolling averages, etc) into a new testing set, and then make predictions on that set.
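One way to read this reply, sketched under stated assumptions: train on played matches, then build one row per upcoming fixture whose rolling columns are the team's averages over its most recent played matches, and pass those rows to `predict`. Everything below (the synthetic history, the `fixtures` table, the reduced predictor list) is made up for illustration. Only the column naming follows the tutorial.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
teams = ["Arsenal", "Chelsea", "Everton", "Liverpool"]
opp_codes = {name: code for code, name in enumerate(teams)}

# Synthetic history (an assumption): one row per team per match, 15 matches each
rows = []
for week in range(15):
    for team in teams:
        rows.append({
            "date": pd.Timestamp("2022-01-01") + pd.Timedelta(weeks=week),
            "team": team,
            "opponent": str(rng.choice([t for t in teams if t != team])),
            "venue_code": int(rng.integers(0, 2)),
            "gf": int(rng.poisson(1.5)),
            "ga": int(rng.poisson(1.2)),
        })
history = pd.DataFrame(rows).sort_values(["team", "date"]).reset_index(drop=True)
history["opp_code"] = history["opponent"].map(opp_codes)
history["target"] = (history["gf"] > history["ga"]).astype(int)

# Training features: averages over the 3 matches played BEFORE each match
cols = ["gf", "ga"]
rolling_cols = [f"{c}_rolling" for c in cols]
for col in cols:
    history[f"{col}_rolling"] = history.groupby("team")[col].transform(
        lambda s: s.shift(1).rolling(3).mean()
    )

predictors = ["venue_code", "opp_code"] + rolling_cols
train = history.dropna(subset=rolling_cols)
rf = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
rf.fit(train[predictors], train["target"])

# Upcoming fixtures: nothing about the match itself is known except who plays where
fixtures = pd.DataFrame({
    "team": ["Arsenal", "Liverpool"],
    "opponent": ["Chelsea", "Everton"],
    "venue_code": [1, 0],
})

# For a future match, the "previous 3 matches" are simply each team's last 3 played matches
latest_form = (
    history.groupby("team").tail(3)
    .groupby("team")[cols].mean()
    .rename(columns=dict(zip(cols, rolling_cols)))
)
upcoming = fixtures.merge(latest_form, left_on="team", right_index=True)
upcoming["opp_code"] = upcoming["opponent"].map(opp_codes)
upcoming["predicted"] = rf.predict(upcoming[predictors])
print(upcoming[["team", "opponent", "predicted"]])
```

The design point is that the future rows contain the same predictor columns as the training rows and nothing that depends on the match outcome. The encodings (`opp_code` and so on) have to come from the same mapping that was used in training.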

@kennedyogutu4099 2 years ago

Feed your data into your trained model.

@amragl 2 years ago

@Dataquestio Hi Vikas, would it be possible to explain it in a different way? I still don't understand it. Many thanks for your videos!!

@nicolasm31 2 months ago

Excellent video with very good information. Just one question: how would you predict the result for a given team in the next round, matchday, or match? Thanks!

@pstryq224 2 years ago

Great tutorial! Do you have any advice for future matches - what values should I add to the data in my CSV file in a situation when I want to predict the results of future matches? I mean the values that we do not know yet, such as distance, shots on target, etc. All test data in the video have these data supplemented, so I wonder what to put in these "empty" columns. Thank you.

@Dataquestio 2 years ago

Hi there - distance, shots on target, etc, are only looked at for prior matches. If you're trying to predict future matches, you would use the rolling average of those columns from previous matches (this is what the video shows).

@siraatmedia8348 7 months ago

What you did with the rolling averages was impressive. Is there such a thing as an ML algorithm that creates such features for you? I.e., one that randomly multiplies or divides one column by another, or computes rolling averages or other random combinations, to create new features?

@alessandrocerri5668 8 months ago

Hi, I have a question. Everything was built without taking into consideration the matches that still have to be played, so there is no real prediction of future matches, only of those already played. Correct?

@Makako_Loko 6 months ago

First of all, thank you for this video. I have a doubt, how do I apply this to future matches that will happen? How do I put it in the ML?

@alexCh-ln2gw 1 year ago

And then behind the scenes corruption happens that causes players to matchfix/throw/lie on the ground for excessive amounts of time and all your betting money is gone.

@acegameboy6232 1 year ago

I just finished writing this out, and for the most part it works, except for this line: combined, error = make_predictions(matches_rolling, predictors + new_cols). The error is: ValueError: Found array with 0 sample(s) (shape=(0, 12)) while a minimum of 1 is required. This line is giving me trouble both in the version I typed myself and when I copy and paste your program. I've looked through the code and some forums, but nothing seems to be wrong. I think it could be a year issue, in that the way to write this has changed over time and this form is now outdated. I'm not sure what the issue is, so if someone could help me out that would be great. I'm planning to use this as an American football predictor to see if the program can predict which team will win. I'm doing it primarily because of my cousin and his fondness for fantasy football. It got me a little interested in the sport, and I figured I'd create a model to make things a little fun for me.

@Kiirby1x 7 months ago

Hello, could someone explain to me how I could input future games for it to make a prediction?

@danielgonzalez5052 1 year ago

Hi Vikas! When doing the rolling part I'm facing an issue that says: "closed only implemented for datetimelike and offset based windows" You know what can be the problem? Thank you!

@angstrom1058 3 months ago

An application about as old as the hills...

@stephenbube965 1 year ago

I'm new to this. I was asking how one can get the predictions from the machine learning model. I'm stuck at the combined precision stage and can't find a way of extracting future predictions. Any help will be highly appreciated.

@torezo9028 7 months ago

Is there a recently updated data set?

@zuzekavova4651 1 year ago

I hope you don't stop making these videos.

@matilda_aaaaa 1 year ago

Hi, excellent video, and thanks for this. I want to know how I can calculate the rolling averages in SQL, as I'm not proficient in Python.

@danielgonzalez5052 2 years ago

Hi Vika, amazing tutorial! I have one question, how should we treat the ties in this model? Thank you!

@Dataquestio 2 years ago

It's up to you. You could make this a 3-class classification problem, and code loss as 0, tie as 1, win as 2. You can also do what's done in the video, and code a tie as a loss.
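Both encodings from this reply can be written in a couple of lines. The sketch assumes a `result` column holding "W", "D" and "L", and uses a made-up five-row frame. The target column names are hypothetical.

```python
import pandas as pd

matches = pd.DataFrame({"result": ["W", "D", "L", "W", "D"]})

# Binary target, as in the video: a win is 1, and both draws and losses are 0
matches["target_binary"] = (matches["result"] == "W").astype(int)

# 3-class target, as suggested in the reply: loss = 0, tie = 1, win = 2
matches["target_3class"] = matches["result"].map({"L": 0, "D": 1, "W": 2})

print(matches)
```

A scikit-learn classifier such as `RandomForestClassifier` accepts the 3-class column as the target without further changes. Note that `precision_score` then needs an `average` argument (or `average=None` for per-class values).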

@doll0101 1 year ago

Please somebody help me to plot a graph for output!(source code) pls pls

@AlisaMusicFM 1 year ago

I know how to achieve 80-85% forecast accuracy, but I can't do it alone, because I need a mathematician

@mosa5x198 11 months ago

Why would you need a mathematician ?

@obaidulmostafa3384 1 year ago

Which algorithm did you use to complete this project, Brother?

@tomphillips5513 6 months ago

I have seen a lot of other people ask this in the comments, but there hasn't really been a solid reply... how can you apply this to predict the results of matches that haven't occurred yet? Because this is all well and good to split the data into parts that the ML algorithm sees and does not see, but it is pretty useless when applying it to life because we already know the result of that game that occurred, even if the ML doesn't. Could someone either explain to me what I am missing, or suggest the next steps for predicting matches of which there is limited data recorded already?

@chottomtaki 2 years ago

Thanks for the very interesting training. Can you please provide one on credit scoring modeling for

@Dataquestio 2 years ago

Thanks for the suggestion - I'll consider it for a future video.

@bonifaceboban368 2 years ago

I got the error below after writing this code: preds = rf.predict(test[predictors]). Can you please explain how to resolve it? NotFittedError: This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

@tomkmb4120 1 year ago

What's a good way to split data for training, test if it doesn't contain something like a DateTime component?

@kevwhiteford5167 1 year ago

Is there a quick way to add and predict upcoming matches?

@chrissherman6591 7 months ago

Love the video, once I finish the model how do I feed in data from new games

@anlgoy9386 1 year ago

My English is weak, so I'm using ChatGPT for translation. Can we combine data from different websites to create a CSV file and analyze it to increase our chances of winning? For example, we could gather match data and odds from Flashscore, voting results for each match from Oddsportal, and win/loss probabilities from Tablesleague. Then we could use artificial intelligence to create a prediction program. Would you be interested in this?

@manasseholowoyeye3236 1 year ago

did you later discover any means or do you use any app currently?

@НиколайУваров-у5н 1 year ago

Big thanks for this video! Helped me a lot! Tried this method on my project with soccer data analysis and everything went fine until this function: "def make_predictions(data, predictors):". Got KeyError: "["rolling_cols"] not in index". Any advice on solving this issue? Thanks in advance!

@bigtomDW 1 year ago

"predictors + new_cols" seems to be my issue. Passing predictors by itself doesn't throw the error.

@alemassa6632 1 year ago

Wonderful, I literally have understood nothing but.... wonderful!

@mrcaljoe1 1 year ago

37:50 what does the ** before map_values do?
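On the `**` question: in a call, `**some_dict` unpacks the dictionary into keyword arguments, so `SomeDict(**map_values)` builds a new dict-like object with the same key/value pairs. The sketch below reconstructs the pattern from memory of the tutorial, so treat the class body and the two example mappings as assumptions. The `__missing__` hook makes lookups of unknown names return the name itself.

```python
class MissingDict(dict):
    """A dict that returns the key itself when the key is not present."""
    def __missing__(self, key):
        return key

# Example name fixes (illustrative values, not the tutorial's full list)
map_values = {
    "Brighton and Hove Albion": "Brighton",
    "Manchester United": "Manchester Utd",
}

# ** unpacks map_values into keyword arguments: MissingDict(key1=value1, key2=value2, ...)
mapping = MissingDict(**map_values)

print(mapping["Manchester United"])  # a key that is present: mapped to the short name
print(mapping["Arsenal"])            # a key that is absent: returned unchanged
```

Here `MissingDict(map_values)` without the `**` should give the same result. The unpacking is just one way to copy the pairs into the subclass.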

@johanBe75 1 year ago

So many great Reviews, but yet just youtube!

@FlisB 1 year ago

Interesting. I was running a similar model on football matches, except that I had rolling attributes of both teams as the predictors and the class was home_win, draw, away_win. A match is included only once. However I think your approach might be better.

@kiss-my-axe8810 7 months ago

what was your win%??

@InvestorLondon 1 year ago

Amazing video! You're really helping me through my ML journey!

@berrauniverse 10 months ago

Did this using logistic regression with binary classification and achieved a 70% precision. Used different parameters for training the model though. Also had to put the sleep time to 10 seconds when scraping to avoid 429 HTTP response.

@cgruita 5 months ago

Wow, 70% precision is very impressive! What did you use? XGBoost, LightGBM?

@kavinpandian 3 months ago

great tutorial!

@johnowusukonduah2305 1 year ago

Is it positive to add the concept of time series to model the performance behavior of teams in the epl?

@benjaminmwangi6872 2 years ago

Hi, 1. Kindly suggest a roadmap for me to adequately comprehend this project. I have no experience in the field and no programming background. 2. How do I run this project in the meantime, while I build up my skills? Awesome tutorial. Got yourself a believer.

@Dataquestio 2 years ago

I would recommend following the data scientist path at Dataquest - www.dataquest.io/path/data-scientist/ . This will help you learn all of the skills (including programming) to build this model.

@NguyenNamDuong-kx4gu 1 year ago

can you do it for the future :( i really need it

@mhch77 1 year ago

Hey Vick, Great Video! Wanted to ask how would I go about making predictions for a single match?

@kenneth_wu 1 year ago

Great video. Thanks for sharing. I think I am going to have a try.

@adrianfong4347 2 years ago

Hi Vik! I am learning so much through this video and decided to try adapting it to NBA data too :). I am running into an issue when merging the combined dataframe with left_on = game_date, team and right_on = game_date, opponent: my new merged table is blank. My theory is that, despite my data having the same 3-letter abbreviations for the teams (LAL, WAS, CHI, etc.) in both the team and opponent columns, Python is saying they aren't the same and not joining the tables. They are both 'object' data types (if that matters...). Any recommendations on how I can make them identical? Thank you!

@Dataquestio 2 years ago

Hi Adrian - do you actually have data from both sides of the match? For example, if LAL played WAS, you would need a row where WAS is the team and LAL is the opponent, and a row where LAL is the team and WAS is the opponent for the same game day. If you don't have this, you would need to create those rows (by duplicating the dataframe then swapping team and opponent) before merging.

@Dataquestio 2 years ago

You would also need to swap points for/against, etc.
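The "duplicate, then swap" step from these two replies could look like the sketch below. The column names `pts_for` and `pts_against` and the two sample games are assumptions. The `str.strip().str.upper()` line is an extra guard against stray whitespace or case differences, which is one possible reason identical-looking abbreviations fail to match.

```python
import pandas as pd

# One row per game, seen from one side only (made-up sample data)
games = pd.DataFrame({
    "game_date": ["2022-01-01", "2022-01-02"],
    "team": ["LAL", "CHI"],
    "opponent": ["WAS", "BOS"],
    "pts_for": [110, 98],
    "pts_against": [102, 105],
})

# Mirror every row: swap team/opponent and points for/against
mirrored = games.rename(columns={
    "team": "opponent", "opponent": "team",
    "pts_for": "pts_against", "pts_against": "pts_for",
})
both_sides = pd.concat([games, mirrored[games.columns]], ignore_index=True)

# Normalise the join keys so "LAL " and "lal" cannot silently fail to match
for col in ["team", "opponent"]:
    both_sides[col] = both_sides[col].str.strip().str.upper()

# Each row now finds its counterpart: same date, and my team is the other row's opponent
merged = both_sides.merge(
    both_sides,
    left_on=["game_date", "team"],
    right_on=["game_date", "opponent"],
    suffixes=("", "_opp"),
)
print(merged[["game_date", "team", "team_opp", "pts_for", "pts_for_opp"]])
```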

@FlisB 1 year ago

Did you scrape the data from basketball-reference?

@laus-thecurious4120 2 years ago

Where can I get this dataset other than your GitHub? I want a dataset for the Indian Super League.

@Dataquestio 2 years ago

This video shows how to scrape the data - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Nt7WJa2iu0s.html . You can modify this for other leagues.

@velsiu 8 months ago

how to use it to predict future matches from like today or tomorrow ?

@meetupadhyay9687 1 year ago

Hey what is train test percentage?

@rodi21 1 year ago

Amazing, Vic! I'm following you! Great job and explanation!

@thiagotms1 1 year ago

This is some quality video! Thanks!

@miggroup5557 2 years ago

amazing. I have selfish desires to learn ML. I want to create something for the housing market. Can someone help??

@Dataquestio 2 years ago

Hi there - I'll probably do a video on predicting house prices with ML soon. Getting the data is the hardest part...

@hristolakov3563 2 years ago

Why are we only looking at matches that have been played? I mean, i understand it for the learning part and the back testing, but the machine hasn't actually predicted a match, that hasn't been played, from the date of the video going forward. That would have been useful. Is it like we just have to add these upcoming matches to the matches.csv? It is what i am trying to do, but it is pretty tough for a beginner, like me. Will push harder, hopefully find a solution. Thank you for the video and the great explanations.

@hristolakov3563 2 years ago

When we merge the 'matches' with 'shooting', we basically get rid of all the future matches. I should probably keep the not-played matches in the list somehow with NaN values under shooting?

@Dataquestio 2 years ago

If you want to predict future matches, you can just feed them into the prediction methods. The reason we remove the rows where matches haven't been played is because we can only use data for training if we know the outcome. But once we train a model, you can feed that data in to get future predictions (the same way we feed in the test set).

@janeklebor2851 5 months ago

will smith teaching ml

@ಅಮಾಯಕ 2 years ago

Bro. Literally learnt to play with data in just 2 videos. Thanks.

@chasingwildlife6584 1 year ago

Great Video Vik. Love the work. Thanks for giving us this great resource. Now time to find the rest of the data.

@StartupPickMeUps 2 years ago

This is so good! It would be good to see a video on exactly how to feed in future fixtures as I'm unclear on how this is achievable :D

@Dataquestio 2 years ago

Hi Liam - thanks for the suggestion. What you need to do is pass in future data to the predict methods, the same way we're passing in the test set now. I can look into making a video.

@StartupPickMeUps 2 years ago

@Dataquestio After asking this question, I actually gave it a go myself, but unless I add future data to my test data I'm unsure how to do it, and the accuracy is way off for me :D

@pain-nw5lo 2 years ago

@Dataquestio Yes please! I'm also stuck on passing future data :c

@robnotaro8584 1 year ago

How do you use this to predict the upcoming weeks matches??

@Skeeyeee613 1 year ago

Thank you very much for such wonderful content. When I try running your line 65 I'm getting an error saying mapping is not defined. Any suggestion?

@johanBe75 1 year ago

It is a fake tutorial with clickbait. Just look at the reviews: so many of them, and all so great, aren't they?

@agdaltarek 2 years ago

hello, my question is how would you deal with predicting newly promoted teams results ? especially teams that maybe are promoted for the first time in a very long time.

@Dataquestio 2 years ago

This is a tricky one. You could build a separate model to predict how well a team will do in the first season after promotion based on lower league results.

@agdaltarek 2 years ago

@Dataquestio Yep, maybe based on previously promoted teams. I thought about that.

@ericmckee8007 2 years ago

Thank you greatly, this has been extremely helpful. I ran into a KeyError issue when running make_predictions telling me that all of the rolling columns were not in index (gf_rolling,..). Do you have an idea as to why this is happening? I followed the code exactly, so I'm not sure what is causing this... If I remove "+ new_cols" when calling the function it works fine. Thanks again

@Dataquestio 2 years ago

Hi Eric- this would happen if the new columns aren't in the matches_rolling dataframe. This is the code that adds the columns - "matches_rolling = matches.groupby("team").apply(lambda x: rolling_averages(x, cols, new_cols))"

@PeterKrusz91 2 years ago

At line 30, on the 17:49 mark, when we run, preds = rf.predict(test[predictors]) , I get a ValueError, "ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required." Is anyone running into a similar issue?

@Dataquestio 2 years ago

Hi Peter - I'm guessing your test set is empty. You might want to check your code that splits the train and test set up. -Vik
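A quick way to check for an empty split is sketched below, with a four-row made-up frame. The 2022-01-01 cutoff matches the one quoted elsewhere in this thread. If your own data ends before that date (or starts after it), one side of the split will have zero rows and scikit-learn raises the "0 sample(s)" error.

```python
import pandas as pd

matches = pd.DataFrame({
    "date": ["2021-08-14", "2021-12-28", "2022-01-02", "2022-03-05"],
    "target": [1, 0, 1, 0],
})
# Make sure the column is a real datetime, not a string
matches["date"] = pd.to_datetime(matches["date"])

cutoff = pd.Timestamp("2022-01-01")
train = matches[matches["date"] < cutoff]
test = matches[matches["date"] >= cutoff]

# Inspect the date range and the size of each split before fitting
print(matches["date"].min(), matches["date"].max())
print(len(train), len(test))

if train.empty or test.empty:
    raise ValueError("Empty split: choose a cutoff inside the date range of your data")
```

The same check applies after adding rolling averages, since dropping rows with missing values can shrink a small test set further.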

@acegameboy6232 1 year ago

@Dataquestio What about line 58? I get this error: ValueError: Found array with 0 sample(s) (shape=(0, 12)) while a minimum of 1 is required. What can I do to fix this? I typed everything in correctly, and I even did it 5 times, and it gives the same result.

@Test-zw3ht 2 years ago

Hi, I used your scraping code to collect the data and encoded "result" as the target (astype("category").cat.codes). As I expected, the accuracy dropped, so I used RandomizedSearchCV to see if there would be an improvement. Then I added "gf" and "ga" as predictors. With the same parameters except criterion="entropy", sklearn's classification_report gave me an accuracy of 0.98, with f1_score >= 0.95 and precision >= 0.93 for each of the target values (0, 1, 2). However, I don't know much about football, so maybe I used features that are only observable after the game. Anyway, I wanted to say thank you.

@Dataquestio 2 years ago

You don't want to use `gf` and `ga` as predictors. Because you won't know these until the match is over and you already know the winner. That's why your accuracy is so high - because the model is being fed the answer.

@amragl 2 years ago

Hi! I don't think I understand how you can use the rolling average columns in the prediction dataset. You wouldn't have that information until after the match is finished, right? So how can those columns be used in the prediction dataset? Many thanks for your great videos and content! Well explained and very educational.

@Dataquestio 2 years ago

Hi there - the rolling average is computed on matches prior to the current one. We don't use any knowledge of the current match. -Vik

@amragl 2 years ago

@Dataquestio Many thanks for taking the time to respond!! You and your learning platform are awesome 😎!!!

@chigstardan7285 2 years ago

This video came at the right time. I was trying to figure out how to get rolling averages for a dataframe, especially the part with the 'left' argument. Thanks so much.

@Dataquestio 2 years ago

Glad it helped! -Vik