
Web Scraping Football Matches From The EPL With Python [part 1 of 2] 

Dataquest
Subscribe 60K
101K views

In this video, we'll learn how to scrape football match data from the English Premier League.
We'll download all of the matches for several seasons using Python and the requests library. We'll then parse and clean the data using BeautifulSoup and pandas. By the end, we'll have a single pandas DataFrame with all of the EPL matches for multiple seasons.
In the next part of this series, we'll use the data we scraped to predict which side will win each match.
You can find the code we write here - github.com/dataquestio/projec... .
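The core pattern used throughout the video - download a page and let pandas pull out the table whose text matches a title - can be sketched as follows. This is a minimal, offline sketch: in the video the HTML comes from `requests.get()` on an fbref.com standings page, while here a tiny inline table stands in for the downloaded page.

```python
import io
import pandas as pd

# In the video, the HTML is downloaded roughly like this:
#   data = requests.get("https://fbref.com/en/comps/9/Premier-League-Stats")
#   matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
# A small inline sample stands in for the page so this sketch runs offline.
html = """
<table>
  <caption>Scores &amp; Fixtures</caption>
  <thead><tr><th>Date</th><th>Result</th><th>GF</th><th>GA</th></tr></thead>
  <tbody><tr><td>2021-08-14</td><td>W</td><td>2</td><td>0</td></tr></tbody>
</table>
"""

# pd.read_html returns a LIST of DataFrames, one per HTML table; the match=
# argument keeps only tables containing that text, and [0] takes the first.
matches = pd.read_html(io.StringIO(html), match="Scores & Fixtures")[0]
print(matches.columns.tolist())
```

The `[0]` index matters: even when only one table matches, `pd.read_html` still returns a list.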
Chapters
00:00 Introduction
01:21 Scraping our first page with requests
05:07 Parsing html links with BeautifulSoup
10:40 Extract match stats using pandas and requests
14:21 Get match shooting stats with requests and pandas
18:09 Cleaning and merging scraped data with pandas
22:07 Scraping data for multiple seasons and teams with a loop
35:42 Final match results DataFrame and next steps
---------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Published: 1 Aug 2024

Comments: 220
@imfrshlikeuhh 2 years ago
Really really enjoy your content. Love the examples. Love the teaching style. Love the explanations.
@benjaminhorn8420 2 years ago
This was very useful! Thank you. I also had issues with the Premier League data, so I scraped La Liga instead, which worked fine. Will now attempt to follow the second part!
@nikolavladimirov8838 8 months ago
Outstanding tutorial with concise explanations for each line of code! Great for both beginners and advanced pandas users
@JoaoSantos-jb7ul 2 years ago
Nice explanations, Vikas! The combination requests + Beautiful Soup + pandas is fantastic! Thanks! Greetings from São Paulo, Brazil!
@Dataquestio 2 years ago
Thanks, Joao! -Vik
@jonathanchagolla9217 2 years ago
Love your teaching style. Thanks for this content!
@Dataquestio 2 years ago
Thanks, Jonathan! -Vik
@everflores9484 1 year ago
Something I did that may be useful for other people: I added a comment before every line/block to tell future me what I was doing. Great video!
@rodneymawero9063 2 years ago
Adding l = links at 8:22 saved the day! Thanks for the video!
@mementomori8856 2 years ago
I've always wanted to work on a project on football since it's my favorite sport; this is a good starting point. Love your pace as well 🙏🏽.
@35162me 5 months ago
Hello, a year later. How is the project coming along? Just an interested party.
@DamilolaAyodele-wq1su 1 year ago
Thank you so much! I've been putting off scraping data online forever. Finally did it, thanks to you
@4tifk 23 days ago
Thank you, Vikas Paruchuri... this video saved me... greetings from Pakistan... your teaching style is very good!!!
@migi7787 2 years ago
Wonderful teaching, wonderful project, so easy to access the knowledge, THANK YOU!!!😊
@Dataquestio 2 years ago
Glad you liked it :) - Vik
@kevinr662 10 months ago
You are a good teacher, clear and precise, and I wish you all the success in the world. Thanks for the info
@principeabel 1 year ago
All your videos have helped me a lot. Thank you very much for your videos, I learn a lot. Thank you for this content that you upload 😊
@bencole8301 2 months ago
Really enjoyed this walkthrough! Thank you for sharing!
@williehogan1822 2 years ago
Excellent content and super teaching style. Thank you for sharing. Keep it going, it's very much appreciated.
@Dataquestio 2 years ago
Thanks, Willie!
@AndrewGuimeres 10 months ago
Thanks for the tutorial. It was really easy to follow. Keep up the good work. Cheers!
@andynos 2 years ago
Thanks man!!! You are doing great. Very interesting to watch your videos
@ameybikram5781 1 year ago
Wow, thank you so much, you made web scraping look so easy.
@kurkdebraine8139 1 year ago
Perfectly explained. TY a lot! :)
@tamzen7945 1 year ago
Thanks for the motivation. I wasn't sure if I could do it, but I might try it eventually.
@Dataquestio 1 year ago
I hope you try it!
@simmiesanya6003 1 year ago
This is really awesome. I learnt a lot. I'm having issues scraping multiple years though. Something about the remote host cutting off the connection
@josephchoi7362 2 years ago
You have THE most soothing voice
@asdfgh6906 2 years ago
I love your explanations
@roopeshpyneni5496 7 months ago
Nice explanation! Really helped me a lot!
@04mdsimps 2 years ago
Well, that's my day sorted. Kudos, sir
@kennyquango76 1 year ago
This is an excellent tutorial. Thank you very much!
@jaikermontoya8891 2 years ago
Thank you very much. I have learned sooo much with this video.
@Dataquestio 2 years ago
Glad it was helpful! -Vik
@thewebscrapingclub 1 year ago
Great tutorial, thanks.
@Chariotzable 2 years ago
You are a great teacher. Thank you so much for sharing. When should we expect part 2?
@Dataquestio 2 years ago
Thanks a lot! Part 2 is actually live here - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0irmDBWLrco.html .
@SuperSumittanwar 2 years ago
Awesome content and nice new mic that you have now👌
@Dataquestio 2 years ago
Thanks! -Vik
@Yahik00 1 year ago
Hey Vik, I recently came across this video. I found it very helpful, and I'm trying to extend it to include the other tables as well. However, I've encountered some difficulties in retrieving the other tables using the approaches you mentioned in the code. I've tried searching for specific URLs or identifiers, but I haven't been successful so far. I was wondering if you could kindly provide an example code snippet that demonstrates how to add the passing table or any other table from the website.
@Qhorin 1 year ago
Super cool, thank you!!
@prawson81 1 year ago
Brilliant. I too am getting value errors, just trying the time adjustments now.
@samcrowson167 1 year ago
This is a great tutorial. I tried following along, but instead of team stats I tried extracting player stats for the season. Fell over on the last hurdle of the loop. But going to give it another go this evening. Great content, thank you
@thomasyusuf1366 10 months ago
Did you ever figure it out?
@sebbyclarke2304 6 months ago
YEAH PLEASE LMK
@samcrowson167 4 months ago
@@sebbyclarke2304 Hi, I used the code below to complete the loop at the end of the script. You should be able to follow the video, swap the team links for player links, then apply something similar as the final step:

combined_df = pd.DataFrame()
combined_df["Player"] = ""
for squad_url in squad_urls:
    player_name = squad_url.split("/")[-1].replace("-Match-Logs", "").replace("-", " ")
    data = requests.get(squad_url)
    individual_matches = pd.read_html(data.text, match="2005-2006 Match Logs")[0]
    individual_matches.columns = individual_matches.columns.droplevel()
    individual_matches = individual_matches[individual_matches["Comp"] == "Premier League"]
    individual_matches["Player"] = player_name
    combined_df = pd.concat([combined_df, individual_matches])
    time.sleep(1)
@chadrickclarke1730 2 years ago
Hey, thanks for the video. Would you be able to give some guidance on how to pull the info from the match report?
@MrWopper7 2 years ago
Awesome vid, man, thanks!! When is part 2 coming out? :D
@Dataquestio 2 years ago
It came out today! You can find it here - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0irmDBWLrco.html .
@danielcharles1086 1 year ago
One can notice your mastery of the subject throughout. Thank you, I will be following other tutorials
@satyajitpaul339 2 years ago
Informative... thanks
@Dataquestio 2 years ago
Glad you liked it! -Vik
@alessoclass3929 2 years ago
Waiting for part 2 :)
@Dataquestio 2 years ago
Part 2 is live! It's at ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0irmDBWLrco.html .
@judechi2652 1 year ago
Hi Vik and everyone else, I have an issue which I'm hoping anyone can help me fix. On trying to concatenate all_matches with the code match_df = pd.concat(all_matches), the error message is that there's nothing to concatenate
@waves3188 2 years ago
Tip: When web scraping, assign the HTML code to a variable or copy it to a notepad as a text file before the site you're working with kicks you out for exceeding max requests. Learned this the hard way lol 🥴
@Dataquestio 2 years ago
Great tip! I personally like to cache files where I can (just save them as html files) and then load from disk if I need to. -Vik
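The caching pattern described in this reply can be sketched as a small helper. This is a hedged sketch, not the code from the video: the `cache` directory and the filename scheme (last URL segment plus `.html`) are assumptions for illustration.

```python
import os
import time

import requests


def get_html(url, cache_dir="cache"):
    """Fetch a page, reusing a copy saved on disk if one already exists.

    Caching avoids re-hitting the site (and its rate limits) on every run.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hypothetical naming scheme: one file per URL, named after its last segment.
    path = os.path.join(cache_dir, url.rstrip("/").split("/")[-1] + ".html")
    if os.path.exists(path):
        # Cache hit: load from disk, no network request at all.
        with open(path, encoding="utf-8") as f:
            return f.read()
    # Cache miss: download, save for next time, then pause to stay polite.
    html = requests.get(url).text
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    time.sleep(5)
    return html
```

On a second run the function reads from disk, so repeated experiments on the parsing code never touch the site.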
@Yahik00 1 year ago
How long does it block?
@sigfigronath 2 months ago
We need another EPL video :)
@RubensBarrichello. 4 months ago
I went at it with a different approach. I started with the year I wanted and followed the 'next season' links instead, so the DataFrame comes out in chronological order. Otherwise it reads August 2022 to May 2023, and then the previous season is scraped, so August 2021 to May 2022 follows.
@user-mq3js2cb3h 6 months ago
Sir, this is a great video. It is helping me get started in web scraping. You didn't close the parentheses in your last long code block with the try and except part 31:30
@lyonoconner448 11 months ago
Excellent. Part 2?
@mainacyrus74 10 months ago
I really love your video. I have a question: I tried scraping two football sites to compare the data, but it's becoming tricky as both websites use different names for the same team. How can I resolve that issue?
@matthewmoore8445 1 year ago
Is anyone able to explain to me why the code used in the project does not extract future matches? Been banging my head off the wall on how to get these future matches in and I cannot figure out why.
@cuttell2000 1 year ago
Great video. Is there a part 2?
@enochtolulope15 2 years ago
Thank you for this tutorial. However, I ran into errors that I couldn't solve. I tried concatenating the dataframes using "pd.concat(all_matches)" but I keep getting "ValueError: No objects to concatenate". What could be the issue?
@Dataquestio 2 years ago
Hi Enoch - this will happen if the `all_matches` list is empty. Are you sure you're appending the match data to the list? The code is here if you want to check - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb
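The failure mode described in this reply is easy to guard against: `pd.concat` raises exactly this `ValueError` when given an empty list. A minimal sketch of the check (the empty `all_matches` here stands in for a list that the scraping loop failed to populate):

```python
import pandas as pd

all_matches = []  # normally filled with one DataFrame per team inside the loop

# pd.concat([]) raises "ValueError: No objects to concatenate", so check first.
if all_matches:
    match_df = pd.concat(all_matches)
else:
    match_df = pd.DataFrame()
    print("all_matches is empty - check that team_data is appended inside the loop")
```

If the list is empty, the real fix is upstream: the loop body never ran, or the appends were skipped (often because the site returned empty pages).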
@MaartenRobaeys 1 year ago
Hi, extremely valuable. Where can I find part 2, please? Thanks
@joeguerby 1 year ago
Thanks for this amazing video. I got an error on the all_matches.append(team_data) line: all_matches is not defined. How can I fix it? Please
@2ncielkrommalzeme210 7 months ago
Can we apply these requests to horse racing, for each horse, to investigate their performance and predict their future tendencies?
@davidwisemantel5041 2 years ago
Love it, thanks so much! OOI, how would you have got the table using other means such as id? (rather than matching the string)
@Dataquestio 2 years ago
Thanks, David! You can get the table by position (when pandas parses html, the first table on the page is element 0 in the list, and so on). You can also do it by id by first extracting only the table html with beautifulsoup, then parsing it with pandas.
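The by-id approach from this reply can be sketched as follows: select just the one `<table>` with BeautifulSoup, then hand that fragment to pandas. The table ids and contents here are made up for illustration; they are not taken from fbref.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Two tables on one page; we only want the one with a specific id.
html = """
<table id="stats_squads_shooting">
  <tr><th>Squad</th><th>Sh</th></tr>
  <tr><td>Arsenal</td><td>15</td></tr>
</table>
<table id="other_table">
  <tr><th>X</th></tr><tr><td>1</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Select exactly one table by its id, ignoring every other table on the page.
table = soup.select_one("table#stats_squads_shooting")
# Parse only that fragment with pandas; the result is a one-element list.
df = pd.read_html(io.StringIO(str(table)))[0]
print(df)
```

This sidesteps both position (which breaks when the site adds a table) and string matching (which breaks when two tables share a caption word).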
@davidwisemantel5041 2 years ago
@@Dataquestio Makes sense. Sorry, one more question. How would you deal with a situation where each key value is its own table? For example, if you were scraping horse racing data, where each horse had its own table of information. Using concat would join the data, but how would you reference the key? TIA!!!!!
@miiyyke 1 year ago
Please 🙏🏾 I'm getting an error once I reach import pandas as pd; matches = pd.read_html(data.text, match="Scores & Fixtures")
@DozieOhiri 1 year ago
Please can anyone explain why at 6:50 he only calls the first index of the standings table?
@majidmenouar2444 2 years ago
Great rhythm. When is the next video coming out, please?
@Dataquestio 2 years ago
We'll be releasing the next video on Monday. -Vik
@Nunexx97 2 years ago
What changes do I have to make to the script to collect only match data without the shooting stats? The shooting stats section is currently empty on FBref... Thanks a lot for the great video!
@Dataquestio 2 years ago
Hi Nuno - you should just be able to remove the code to scrape the shooting stats, and everything else should work fine!
@abdulmalikbello538 2 years ago
Amazing content! Thanks a lot. I noticed that the shooting data has been summarized as of today (10/05/22); it is no longer a detailed match-by-match table.
@Dataquestio 2 years ago
Thanks, Abdulmalik! That's too bad about the shooting data on fbref. Hopefully it is a temporary bug, and will be fixed.
@joeguerby 1 year ago
@@Dataquestio In fact this is not a bug; the code actually extracts the total shots in the match rather than the shots per team. I revised the code so I get both the team's stats and the opponents' stats. This is the code for scraping shots by team and by opponent:

teamshooting = pd.read_html(data4.text, match="Shooting")[0]
oppshooting = pd.read_html(data4.text, match="Shooting")[1]
teamshooting.head()
oppshooting.head()
@ditirorampate502 1 year ago
When trying to scrape seasons from 2016, there is KeyError: "['FK'] not in index". Don't know what causes it. What might be the problem?
@temitopeayoade5924 2 years ago
Hello, when running the for loop I am getting "No tables found". I have checked the code on GitHub, everything is the same. Please help...
@rezarafieirad 2 years ago
Thanks a lot. Very nice explanation. Where is part 2?
@Dataquestio 2 years ago
You can find part 2 at ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0irmDBWLrco.html .
@user-wy4yu8rv3v 1 year ago
I got an issue with the match DataFrame match_df: it says it is not defined. How can I fix it? Thanks
@Hkillelea0924 7 months ago
At the end of the code it doesn't return anything for me for len(all_matches). Also, the tables didn't print out at the end when I typed in match_df
@bhargavpandit2300 6 months ago
FBref just doesn't allow me to scrape data anymore?? I always get a 403 status code back. Anyone else facing the same problem?? What can I do to fix it?
@adelekefikayomi8351 1 year ago
I need help, anybody! I tried to web scrape other sections like passing, goal and shot creation, etc., but it's saying list index out of range. Any ideas, anyone?
@saladin2020 1 year ago
May I know why there is quite often an error on the class name of table.stats_table while using the CSS selector? standings_table = soup.select('table.stats_table')[0]
@pratyushshankar9613 1 year ago
Same error. Did you find a fix?
@suhaas1709 6 months ago
I get an 'html' is not defined error @27:30. Would really appreciate any help with this issue
@shahabasmuhammed7523 1 year ago
links gives me a null list. Can anyone help me with this?
@kmind71 1 year ago
I'm still trying to understand, at around 16:20, when you do a list comprehension as links = [l for l in links if l and 'all_comps/shooting/' in l], why you have to add the "if l" portion of the condition. I know you mentioned that you add it because some of the list items don't have an 'href', but it's still not clicking for me. Any chance you or someone could please go into detail a tad more? Thanks so much!
@Dataquestio 1 year ago
This filters out any cases when l is None. So if there is no href, then None will be assigned to l, and we can filter it out with this list comprehension.
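The filtering described in this reply can be seen with a toy list (the link strings below are made up, shaped like the video's shooting URLs):

```python
# l.get("href") returns None for <a> tags without an href attribute, so the
# scraped list can contain None entries mixed in with real link strings.
links = ["/squads/x/all_comps/shooting/", None, "/squads/y/passing/"]

# `if l` drops the None entries FIRST; only then is the substring check safe,
# because `"..." in None` would raise a TypeError.
links = [l for l in links if l and "all_comps/shooting/" in l]
print(links)
```

Python evaluates `l and ... in l` left to right, so the `in` test never runs on a `None` value.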
@kmind71 1 year ago
@@Dataquestio Thank you!
@sheetcreate9016 1 year ago
team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date") got an error: AttributeError: 'list' object has no attribute 'merge'. How do I fix this error?
@hamzaelmi5584 5 months ago
Is it possible to use PyCharm / VS Code for this project? Not that familiar with Jupyter / Google Colab
@carlgreener1728 2 years ago
Hi Vik, thanks for this. I get an error in the for loop stating 'list index out of range' for the standings_table = soup.select('table.stats_table')[0] line. I've reviewed against the code on GitHub and there aren't any differences. Can you help please?
@cregv 2 years ago
I think it is site security or popups
@Dataquestio 2 years ago
This would happen when there is no table in the html you downloaded. You might want to try rendering the html (save it to a file and open it in a browser) to see what the issue is. There could be an issue with rate limiting or another site issue causing problems with the html. -Vik
@lordrahl372 2 years ago
I ran into the same issue when attempting more than 2 years of seasons, and it seems to work if you import the time module and place the following code: "time.sleep(5)" under "soup = BeautifulSoup(data.text)". I think what is happening is the website is blocking us from doing too many requests. time.sleep(5) delays the scraping process, thus limiting too many requests at once.
@user-cg6os8qt6u 1 year ago
@@lordrahl372 Thank you so much, bro, that code helped to solve this issue.
@adrianbusuttil3012 5 months ago
@@lordrahl372 I did this and it worked like a charm - thanks
@stuck3315 1 year ago
Great video and content. All of these have been very helpful for someone new to Python. I did run into an issue with this example and I'm not sure where I went wrong. Trying to use match_df = pd.concat(all_matches) gives me a TypeError: cannot concatenate object of type. Tried using pd.DataFrame instead and got output to my csv, but there are just headers (date, pk, etc.) and no data. If I use print(all_matches) prior to the pd.concat or pd.DataFrame command, I can see the actual data correctly
@Dataquestio 1 year ago
Hi there - I'm guessing the data didn't scrape properly in this case (if it did scrape properly, you'd have data in all_matches). I'd try increasing the value in time.sleep, because the website you're getting data from can return empty tables if you scrape too quickly.
@luvlifereal4023 5 months ago
@dataquest In your web scraping for the Premier League, after the request and data.text nothing happened. I followed your video. Or is it because I'm using Visual Studio Code and you use Jupyter?
@pablosilva10127 2 years ago
Vik, is there any chance you guys could make a path with the Julia language?
@Dataquestio 2 years ago
Hi Pablo - it's something we've thought about. Have you seen job postings that require Julia, or do you use it at work?
@marzm7050 1 year ago
There is no table named Scores & Fixtures. What am I supposed to do now?
@tomaszd1875 6 months ago
Hi, great tutorial. Just wondering why table = soup.select("table.stats_table") is returning an empty list? When I use index 0 it tells me that the list index is out of range. It worked ok until I wanted to scale and had finished all the code in the tutorial
@tomaszd1875 6 months ago
@lordrahl372 Thanks for your comment to another user. I am sorted
@AndrewPutraHartanto 3 months ago
@@tomaszd1875 Do you have a solution?
@saifulanwar4394 11 months ago
Why use "/" in team_name? I did that and the result is southampton. Please explain that
@hades7167 4 months ago
Hi, can you do it for the 2024 table?
@rudraparikh4115 9 months ago
standings_table = soup.select('table.stats_table')[0] is giving a list index out of range error. Please help me
@joeguerby 5 months ago
I got the same issue; it seems that the HTML structure has changed @Dataquestio
@AndrewPutraHartanto 3 months ago
@@joeguerby Do you have a solution for the new HTML?
@ryosato7558 2 years ago
Hi guys, I was having the "no tables found" error too, and analyzing the code I noticed that the error was in data.text, where the page was blocking the request. So I just increased the time.sleep to 5 and put another time.sleep where we request the shooting DataFrame. The code will be very slow, but it works. Hope it helps!!
@vivekaugustine9583 2 years ago
Thank you for that, it has helped me heaps since I found the same problem. How long did the code take to respond?
@Dataquestio 2 years ago
Thanks for the solution, Ryo!
@g18ytstar34 1 year ago
Sorry, I have this issue too but I don't understand how to get through it. Can you help?
@dhruvmanojpujaristudent590 11 months ago
Hey Vik, I'm getting an IndexError: list index out of range on standing_table=soup.select('table.stats_table')[0] in the for loop. I'm not able to execute it; I have tried various things and used the solutions provided in the comments section as well. Can you help me out here? Please.
@alperengul9331 3 months ago
Did you solve it?
@faisalali-yp7tw 2 years ago
Can you make a tutorial on how to dockerize Scrapy + PostgreSQL?
@Dataquestio 2 years ago
Hi Faisal - thanks for the suggestion! I'll keep this in mind. - Vik
@mrmason13 2 years ago
I want this but for streaming football - like creating a framework that scrapes all the links to stream a single match
@Dataquestio 2 years ago
Stream as in recreate a match from text match logs, or stream as in watch a video of the match? You would need a different site if you want to get video.
@olimics9639 3 months ago
My app says there is something wrong with the url
@hichamelkaissi7786 2 years ago
Thank you for your tutorial. Unfortunately, I increased the number of years to 10 and got blocked by the website after scraping just the first year.
@Dataquestio 2 years ago
Hi Hicham - that's too bad - upping the delay in between requests with `time.sleep(10)` could help. I may also post a tutorial later about how you can do this with a headless browser framework like playwright.
@hichamelkaissi7786 2 years ago
@@Dataquestio Hello Dataquest! Thank you for taking the time to reply. I think everyone will appreciate a tutorial on a headless browser framework. I tried to use Scraper API. It works for a few iterations but then breaks. I will try to up the sleep time as you mentioned. Thanks again for your time.
@lordrahl372 2 years ago
I came back to this tutorial hoping to continue this web scraping project. I started from scratch in a new notebook so I could understand it better; however, I am getting this error on matches = pd.read_html(data.text, match="Scores & Fixtures")[0]: ValueError: No tables found. At first I thought it was a typo of my own, but I went back to my old notebook file, where I remember I was able to execute the code and create a new file with the merged data, and it now had the same "no tables found" errors. I even went onto the Dataquest GitHub repo and cloned the notebook files for this tutorial. I ran the code and got the same ValueErrors. Not sure what to do at this point; I have been trying to figure it out all day.
@robertooliveira8736 2 years ago
Did you manage to solve the problem?
@lordrahl372 2 years ago
@campos I haven't had too much time lately. I ran the code again on Sunday, but it returned the same error. I've been trying to think of a solution while I am doing other things, but unfortunately I can't think of anything except trying a different scraping method other than requests.
@lordrahl372 2 years ago
@@robertooliveira8736 Haven't yet; they may have updated the website or something, because it worked a month ago. Strangely, retrieving the table by itself outside of the loop works. Unfortunately I am still learning web scraping, but I thought about trying out Scrapy (another web scraper).
@Dataquestio 2 years ago
Hey, sorry to hear about the issue. This will happen if the full page content wasn't scraped. This could happen for a few reasons - the site is down, the site has blocked you, or the content has changed. I think the site may be blocking people. I'll look into this soon, and will try to post a solution. One way around this in the meantime is to use a headless browser instead of downloading the html with requests. There is a video on how to use a headless browser (playwright) here - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SJ7xnhSLwi0.html . -Vik
@robertooliveira8736 2 years ago
@campos Hello. Thanks for the feedback. I verified that with 'sleep(1)' it gets blocked, generating the error, so I put 'sleep(15)'; now it runs normally.
@razinchoudhury1368 1 month ago
When I ran the code on JupyterLab it was working the first couple of tries, but now I keep getting an error early in the code: for some reason I get an index out of bounds error for the soup.select('table.stats_table') part. It was working perfectly before and showed all the links and everything, and out of nowhere it stopped and I keep getting this error. Can anyone explain why, please? Thanks
@razinchoudhury1368 29 days ago
For those with the same problem: change your time.sleep to more seconds
@senerustunerciyas2918 1 year ago
How can I get 3 or multiple seasons?
@robertooliveira8736 2 years ago
Did anyone manage to solve the problem that shows 'ValueError: No tables found' @Dataquestio?
@Dataquestio 2 years ago
Hi Roberto - you would see this error if no tables are showing on the original page. It may be because the page isn't working, or you've been blocked. I would check the html you downloaded to ensure that it has tables in it. -Vik
@robertooliveira8736 2 years ago
@@Dataquestio Hello. Thanks for the feedback. I verified that with 'sleep(1)' it gets blocked, generating the error, so I put 'sleep(15)'; now it runs normally.
@samdowns4786 1 year ago
Hi, great video and very easy to follow. I have followed the code very closely but get the following error when trying to run the for loop. It seems to not like this line: matches = pd.read_html(data.text, match="Scores & Fixtures")[0], and the error reads: ImportError: html5lib not found, please install it. I have tried installing html5lib and then importing it, but with no success. I think it is quite a simple thing to fix but I just cannot see it. Any help? Thanks
@principeabel 1 year ago
The project code is in the description of the video. If it fails in the loop part, put this: time.sleep(10). It takes a long, long, long time, so let it run
@rafaelg8238 2 years ago
Thanks for the video. Could you do a video with the POST method and exporting files like .csv and .xlsx? There are few videos on that on YouTube. Please.
@Dataquestio 2 years ago
Thanks for the suggestion! I'll look into doing that. -Vik
@manohartanna7423 2 years ago
While typing links to find squads it's showing an empty list. Could you please tell me why?
@Dataquestio 2 years ago
Hi Manohar - I can't be sure without the full code. But you can look at the example code here to compare - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb .
@manohartanna7423 2 years ago
Thank you
@emanuelviola2609 2 years ago
Really nice teaching. Sad they changed the shooting stats presentation. I'm thinking of focusing only on the Premier League fixtures and shooting stats so I can go through the whole video.
@Dataquestio 2 years ago
I just checked fbref.com/en/squads/b8fd03ef/2020-2021/matchlogs/all_comps/shooting/Manchester-City-Match-Logs-All-Competitions , and it looks like the shooting stats are working again!
@werty7099 2 years ago
@@Dataquestio Those are for the 2020-2021 season; the 2021-2022 ones are still not there :( Thanks for a great video though
@svenwitte2503 1 year ago
Can you help me? I got stuck on errors
@sayantanighosh1493 8 months ago
Hello sir, I am getting an error in the line standings_table = soup.select('table.stats_table')[0]. The error states that the list index is out of range. Please help me out
@xyz-gn6jy 5 months ago
Did you find any solution?
@AndrewPutraHartanto 3 months ago
@@xyz-gn6jy Do you have a solution?
@emilioariaschavez6348 1 year ago
What can I do when I have an IndexError: list index out of range even though I have the same code?
@pratyushshankar9613 1 year ago
Same error. Did you find a fix?
@mikiemax1 8 months ago
Same error here too
@AndrewPutraHartanto 3 months ago
@@pratyushshankar9613 Do you have a solution?
@AndrewPutraHartanto 3 months ago
@@mikiemax1 Do you have a solution?
@dcr7417 2 years ago
Hi, thanks for this video! As others have mentioned - great teaching style. I'm getting an error with the final for loop. It's something to do with matches = pd.read_html(data.text, match="Scores & Fixtures")[0] or shooting = pd.read_html(data.text, match="Shooting")[0]. I get this error: ValueError: No tables found. Anyone got any ideas?
@principeabel 2 years ago
No, I also get that error
@Dataquestio 2 years ago
This would happen when you don't get any data back from the server. I've heard about some issues people have when the time.sleep() is too short. If there are too many requests too quickly, the server will stop returning results. Try changing it to time.sleep(10) to pause longer between requests. That might fix it. -Vik
@SkyyJames7 2 years ago
@@principeabel Yeah, I'm getting the same errors too. I tried to add another time.sleep under shooting but it isn't working. The website may have changed something
@viniccciuz 1 year ago
I was having the exact same error; increasing time.sleep() to 10 worked for me
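The fix repeated throughout this thread - pause and retry when the server returns an empty, table-less page - can be sketched as a small wrapper. This is a hedged sketch, not the video's code: `fetch_table` takes any zero-argument download callable, so it works with `requests`, a cached reader, or a headless browser.

```python
import io
import time

import pandas as pd


def fetch_table(get_html, title, retries=3, pause=10):
    """Parse the table matching `title` from get_html(), re-downloading after a
    pause when no table is found (a common symptom of being rate-limited)."""
    for attempt in range(retries):
        try:
            # pandas raises ValueError ("No tables found") on an empty response.
            return pd.read_html(io.StringIO(get_html()), match=title)[0]
        except ValueError:
            time.sleep(pause)  # back off before asking the server again
    raise ValueError(f"no table matching {title!r} after {retries} tries")
```

In the video's loop this would replace the bare `pd.read_html(data.text, match="Scores & Fixtures")[0]` call, with `get_html` wrapping the `requests.get` on the squad URL.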
@syedmohammadabudarda7598 2 years ago
Where is part 2?
@Dataquestio 2 years ago
See this video for part 2: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0irmDBWLrco.html
@olimics9639 3 months ago
So it doesn't proceed