Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation) 

Keith Galli
222K subscribers
283K views

Data is everywhere! Enhance your career and acquire new skills by taking a course on DataCamp! Click here to take the first chapter of any course for FREE: bit.ly/36lKg44 (you’ll be supporting my channel too!)
In this video we scrape Wikipedia pages to create a dataset on Disney movies.
The video is formatted with tasks for you to try to solve on your own throughout. For the best learning experience, at each task you should pause the video, try the task on your own, and then resume when you want to see how I would solve it.
We cover a wide range of Python & data science topics in this video. They include:
- Web scraping with BeautifulSoup
- Cleaning data
- Testing code with Pytest
- Pattern matching with regular expressions (Re library)
- Working with dates (datetime library)
- Saving & loading data with Pickle library
- Accessing data from an API using Requests library
Link to code & datasets: github.com/KeithGalli/disney-...
Previous tutorial on Beautiful Soup: • Comprehensive Python B...
If you enjoyed this video, make sure to like & subscribe :)
This video was sponsored by DataCamp
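As a rough illustration of the cleaning and regular-expression topics listed above, here is a minimal Python sketch. It is not the code from the video: the function names, the sample strings, and the `SCALE` table are assumptions made for this example, and it only handles simple values (ranges such as "$15-20 million" are not covered).

```python
import re

# Hypothetical helper names; the video's own code may be organized differently.

def strip_references(text):
    """Remove citation markers such as [1] or [23] from a string."""
    return re.sub(r"\[\d+\]", "", text).strip()

def running_time_to_int(value):
    """Return the first number of minutes found in e.g. '103 minutes', else None."""
    match = re.search(r"(\d+)\s*min", value)
    return int(match.group(1)) if match else None

# Multipliers for the scale words that may follow a dollar amount (assumed set).
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

def money_to_float(value):
    """Convert strings like '$200 million' or '$3,500,000' to a float in dollars."""
    pattern = r"\$\s*([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?"
    match = re.search(pattern, value, re.IGNORECASE)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return number * SCALE[word.lower()] if word else number

print(strip_references("Lee Unkrich[1]"))
print(running_time_to_int("103 minutes"))
print(money_to_float("$200 million"))
```

Returning `None` for values that do not match keeps the failures visible, so they can be examined separately instead of silently turning into zeros.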
---------------------
Video timeline!
0:00 - Video overview
1:58 - Check out DataCamp! (sponsored)
3:12 - Setup
Task #1: Scrape the infobox from Toy Story 3 wiki page (save in python dictionary) (4:24)
Link: en.wikipedia.org/wiki/Toy_Sto...
Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries) (28:52)
Link: en.wikipedia.org/wiki/List_of...
30:30 - Robots.txt (Are you allowed to scrape a site?)
32:52 - Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
57:27 - Save & Load dataset checkpoint (JSON file)
Task #3: Clean our data! (1:02:04)
1:09:28 - Task #3.1: Strip out all references ([1],[2],etc) from HTML
1:16:39 - Task #3.2: Split up the long strings
1:25:02 - Task #3.3: Examine errors we are getting
1:30:27 - Task #3.4: Convert “Running time” field to an integer
1:44:57 - Task #3.5: Convert “Budget” & “Box office” fields to floats
2:33:53 - Task #3.6: Convert dates into datetime objects
2:47:36 - Saving our data again (using Pickle)
Task #4: Attach IMDB, Metascore, and Rotten Tomatoes scores to dataset (working with APIs) (2:53:18)
Task #5: Save final dataset as a JSON file and as a CSV file (3:13:48)
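The date-conversion and saving steps in the timeline above can be sketched roughly as follows. This is an assumption-laden illustration, not the video's code: `parse_release_date`, the `DATE_FORMATS` tuple, and the sample movie record are made up for the example. It also shows one likely reason to reach for Pickle once dates become `datetime` objects: `json.dumps` cannot serialize them unless they are converted back to strings first.

```python
import json
import pickle
from datetime import datetime

# Date layouts assumed for this sketch; real infobox text is often messier.
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y")

def parse_release_date(text):
    """Try each known format in turn; return a datetime or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

# Sample record for illustration only.
movie = {
    "title": "Toy Story 3",
    "Release date": parse_release_date("June 18, 2010"),
}

# Pickle round-trips the datetime object unchanged.
restored = pickle.loads(pickle.dumps(movie))

# JSON needs the date turned back into a string before dumping.
json_ready = dict(movie)
json_ready["Release date"] = movie["Release date"].strftime("%B %d, %Y")
as_json = json.dumps(json_ready)

print(restored["Release date"])
print(as_json)
```

In a notebook, the same idea applies with `pickle.dump` and `json.dump` writing to files instead of in-memory bytes and strings.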
---------------------
Extra resources!
Setup Jupyter notebook: jupyter.readthedocs.io/en/lat...
Google Colab (cloud-based notebook): colab.research.google.com/
Learn regular expressions: • Python Tutorial: re Mo...
Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith
Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli
---------------------
Follow me on social media!
Instagram | / keithgalli
Twitter | / keithgalli
If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

Published: 4 Jul 2024

Comments: 312
@KeithGalli 3 years ago
Hey everyone! Been a while, but happy to be back 😊. Spent a while putting this video together so hope you enjoy it! For more great learning resources (and to support my channel in the process), be sure to check out DataCamp. Click here to take the first chapter of any course for FREE: bit.ly/36lKg44 As always if you have any questions or suggestions for future videos, feel free to let me know here in the comments!
@spiritedaway99 3 years ago
Welcome back,I enjoyed your video and learned a lot,thank youuu 💛keep up the good work💛
@KeithGalli 3 years ago
@@spiritedaway99 glad to hear it! Thanks for the kind words :)
@sodiqkayode5857 3 years ago
Hi Keith, I've been on this bs4 project but navigating through "load more" button has been a great challenge. Checked stack overflow and web, more solutions ain't working while others suggested selenium which I don't wanna use (if that's the only solution, I will). How have you been able to get more data embedded in load more button?
@rishabhnegi6941 3 years ago
could anyone help me in writing description for this project with all the library mentioned, so that I can put it on my resume.
@luciekovarikova4481 3 years ago
Hi! :) Just want to say a huge thank you! I have found your videos as a total beginner, so thanks to this comment, I found DataCamp. Courses there are just great, and students have 3 months trial for free with GitHub!
@thesocialcow1631 2 years ago
I am 24 and unemployed and desperately applying for jobs while daily watching Keith's videos and working on projects to shape up my CV. I don't know when will I get the breakthrough but I will definitely appreciate the honest work by Keith. I am falling in love with Data Science with each passing day. I hope someday I will be able to enter a job in Data Science. Much Love and Peace.
@KeithGalli 2 years ago
I wish you the best of luck! Keep at it, you'll get the breakthrough soon enough.
@eugenemadu3932 2 years ago
Same here. I'm 24 and I can't get a job with the course I studied in the university. Then I stumbled into data science and analytics. Now I know how to use several softwares. But I'm yet to apply them to real life jobs.
@tuandino6990 2 years ago
Keep that attitude, you'll get there. I'm a marketer which means i need to work with data everyday and keith's videos teach me a lot.
@ukasz8631 2 years ago
and how it's going so far bruh?
@parzynamea4701 2 years ago
@@ukasz8631 buddy's busy, he got a job
@aaronschwartzman5486 3 years ago
Taking a data science bootcamp. I learned so much more from you than my program. They only explain concepts. You explain your thought process which is much more valuable. You’re a true blessing! Thank you so much!
@malikmudassarawan 2 years ago
Indeed
@rockboyjazz 3 years ago
Probably one of the best data cleaning and analysis tutorials on youtube. Clean and concise, straight to the point.
@MALAYAPH24 3 years ago
Awesome Keith! I can't thank you enough for sharing your knowledge and skills to us at no cost. God bless you and more power!
@adityatile1168 3 years ago
Heyy, I really love the way you also show the behind the scenes process, it really teaches a lot and separates your tutorials from typical tutorials :)
@dritslem4711 2 years ago
You have seriously pushed me to a whole new level with Python. Thank you for your great videos and resources, man. Writing my thesis this spring, and your help has made me efficient enough to actually rely on Python for my data crunching.
@Schoolboyfrm5th 3 years ago
for months i have been struggling in how to structure my learning journey through projects and ive finally found you, THANKS MAN
@investandcyclecheap4890 2 years ago
Thanks for taking the time to make a scraping step by step video. Many other python videos exist but I love how you take the time to stop and explain things step by step. That has helped my learning journey so much! Really appreciate your hard work. Keep it up !
@shaharrefaelshoshany9442 3 years ago
Best instructor ever. Dude your lecturing skills are priceless. Amazing content ❤
@kennethodhiambo1803 2 years ago
Very long, very educative video here Keith. I have watched and followed along over a period of 2 weeks and its been worth my while. As a newbie in to the Data Science and Engineering world, I truly appreciate the work you put into these videos. Thanks a lot Keith
@harryfeng4199 2 years ago
i just want to say all ur vids has been sooo helpful. honestly. thank YOU
@bobbrendel758 2 years ago
So much value in each video. You definitely found your niche.
@azrmuradl6420 2 years ago
Can't think of any video better than this! Keep it up, man, we need these videos!
@ebohp1859 3 years ago
Man you are gold! Thank you!
@sagarmgandhi 2 years ago
Well structured tutorial with live bug solving !! This is actually what anybody should refer to !!
@nehachouhan3398 1 year ago
Thanks alot Mr Keith Galli.......Your videos are extremely good.
@rangavembar 3 years ago
Phew!! That was one incredible tutorial!! A BIG THUMBS UP!!
@rakeshk.9855 3 years ago
Hi Keith, I am a final semester MS Data Science student and I really loved how well you explained complex tasks. You searching for code help in google to solve the problem made the videos more realistic and relatable. Thank you for sharing. Subscribed!
@nawid1687 3 years ago
Dude your channel is a mine of gold
@prasadcaher 3 years ago
Hi Keith, Thanks for making this video, made it much easy to understand the flow and practical use of bs4 for gathering data. Great work.
@mechwarrior83 2 years ago
Amazing! thank you for putting this out there, cannot wait to follow along.
@monagulapa3022 3 years ago
Thank you so much. This is a very big help in understanding the step-by-step data creation.
@fabianrestrepo82 3 years ago
Dude your tutorials are awesome, keep it up!!!
@maliksaraanasim5404 2 years ago
Bro , Love the way you Talk to yourself when figuring out a problem because that gives me an insight on how a professional thinks! Secondly the thing that you dont crop out the parts when you run into problems is veryyy helpful . Thanks man!
@maloof2826 2 years ago
First time got to know Regular expression is such a powerful tool !! Thanks a ton!!
@MashiroRedo 3 years ago
A/B Testing! This would be a great next project, please do this
@KeithGalli 3 years ago
Will look into it!
@MashiroRedo 3 years ago
Keith Galli yo thanks a ton! Been learning through you for a couple months now and I’ve improved so much.
@KeithGalli 3 years ago
@@MashiroRedo You're very welcome! Happy to hear the videos have been helpful :)
@akshaykumarsingh9770 3 years ago
@@KeithGalli please do A/B testing project bro
@andyn6053 3 years ago
@@akshaykumarsingh9770 yes I would love to see a real world A/B test project. Also the Seaborn library would be a nice tutorial.
@susanwowe7810 2 years ago
Love to watch this video and have learned a lot from you. Thank you so much for your kind work. You shine! 🙌
@russellmcbride5435 3 years ago
Brilliant content and presentation style, Keith. I got everything working except extracting the API key from the Environment variable (ended up hard coding which worked). Thanks again!
@ProBloggerWorld 2 years ago
Watching him feels like pair programming with a good friend. 👍🏻👍🏻👍🏻
@vivekambastha2273 3 years ago
This is amazing magic Keith. Your tutorials helping me to become pro in Python and keeping me ahead of many of my friends. Thanks a ton SIR...
@BboyKeny 2 years ago
1. Get a head start 2. Flex on your friends by sharing this video and your results 3. Increase Keith's ad revenue 4. Becoming elite data scientists with your friends
@muhammadrehan3030 3 years ago
Oh great. Thank you very much for another great video. I was just stoping learning Data Science then I watched your videos on Pandas and Numpy. Thank you for bringing me back to Data Science.
@sudarshangupta9066 3 years ago
Thank you Keith. I am a beginner and I really enjoyed this. Keep up the good work.
@paulntalo1425 3 years ago
Thank for such an easy to follow along video. Its my first lesson in beautiful soup
@ryunosukefuriya3748 3 years ago
I don't deserve this. Thank you, Keith. I'll work on a little each day and finish by end of October lol.
@KeithGalli 3 years ago
Haha that seems like a good plan to me! The video isn't going anywhere :)
@Amir-gt7xn 3 years ago
Thanks man! Learned a lot! Keep it up!
@olajiireolajide 2 years ago
Keith, you are the GOAT
@Trazynn 3 years ago
Exactly what I was looking for.
@gisleberge4363 3 years ago
Thanks. Following you through the whole process is great learning experience, especially when you stop solving the little bugs underway which always will be there in webscraping. Well explained, structured, realistic and a gift for people who want to learn the topic - keep up the good work!
@KeithGalli 3 years ago
Thank you for the kind words! Glad you enjoyed :)
@jaysonnolastname1260 1 year ago
Wow, great video. Thank you.
@BTB25647 1 year ago
Hey Keith.....Thanks for putting in this effort. It is truly appreciated.
@jonpounds1922 3 years ago
So informative, just absolutely love to see it
@KeithGalli 3 years ago
Thank bro!! 🙏
@Ndofi 3 years ago
Thanks for your sharing, which indeed has been adding value to my data science career...
@harishkumarp5712 3 years ago
kudos to your efforts!
@rahulvijay6611 2 years ago
I really learn so much from you, please keep making all this amazing video
@vikranttyagiRN 3 years ago
An Absolutely amazing video. Followed it till the end and learned alot. Keep them coming please
@MewGulf13 1 year ago
Thank you so much for this. I took my time on this and have learned so much from this project. I really appreciate it!
@ferrofaza3417 3 years ago
Hi Keith, I am a student currently in college studying geophysics. I just want to say thank you very much for your videos it helps me a lot. Big love from Indonesia. Again, thank you. (also I know this is really specific to my range of study, but I would love to learn 3D data modelling with python)
@NTC 3 years ago
oh, man! you made my day - though I've just completed Task 1!
@Glaszg 3 years ago
Thanks for this!👏🏻
@JMatthews 1 year ago
Such a great job Keith, really appreciate how you explain things in such a cool manner and in the most practical manner you do the very regular Google search, so that novices like myself can understand and relate very well. Your command in Python is commendable, Good job, keep it up and god bless.
@guitarparamount8575 3 years ago
Thank you for these videos Keith. Just finished my masters in physics and wanted to brush up on some python, your video style, coding expertise and enthusiastic approach is top-notch. SUBSCRIBED! **
@KeithGalli 3 years ago
Thanks for subscribing! I appreciate the kind words :)
@varunv8420 3 years ago
Thanks for such a wonderful video. This is very helpful
@101touchapps 1 month ago
a couple of years ago (5 years) i did a pokemon pandas tutorial from you and it totally got me into the world of data science. i came back to say thanks for that tutorial. It really helped me. now am a python instructor.
@KeithGalli 1 month ago
Love it!! Thanks for the message. Glad I could play a small part in your journey. Congrats on being an instructor now.
@riptorforever2 2 years ago
Thanks for this fantastic lesson. It's so useful to have a long video focused on solving a real problem, besides the 'tutorial libs' videos. thanks again!
@souravsanyal6183 3 years ago
Good project...Keith. Loved it very much. More expected.
@nawid1687 3 years ago
You are amazing man keep up the great work
@aflah7572 3 years ago
Awesome tutorial thank you!
@shoaibsh2872 3 years ago
What perfect timing. I was thinking of the same thing, but instead of movies I wanted to scrape game information. I was looking on YouTube to find something, and you uploaded it at the perfect time. Thanks dude 😀
@KeithGalli 3 years ago
Haha that's awesome! Hope you enjoy it :)
@julianrestrepo1270 3 years ago
Hi! Thanks for this great video, i've been looking at your python tutorials and they are great!, thank You so much. Regards from Colombia Latam.
@SoBasicCat 3 years ago
I love learning from please keep it up your videos are very very fruitful thank you a lot !!
@qwerty-6573 3 years ago
Thank you so much! Now I have the confidence to do projects on my own, you changed my life. It would be great if you could do videos on Tableau! :)
@keivanipchihagh2115 3 years ago
Awesome video!! Thanks for the effort.
@jobanchauhan7596 1 year ago
I am happy for you that back, I watched all new videoes , I am waiting specially for this dataset analysis.....
@emmavpgh 7 months ago
you a blessing bro, thank you
@KeithGalli 6 months ago
Thank you for the kind words!
@12_vermouth_12 3 years ago
A very big thumbs up for you! Totally enjoyed it! Thank you so much for your effort!
@KeithGalli 3 years ago
You're very welcome! Glad you enjoyed :)
@swarupsarangi734 3 years ago
Awesome
@python360 3 years ago
Great tutorial man, love how you find content_key and and use as the key in the dict.
@michaelmolter8828 2 years ago
Really like this messy, real world, non-toy example. All my scrapping project get messy quick with dirty looking edge case handling. Glad to know I’m not actually doing it wrong, it’s apparently just part of the process.
@samuelmolano2140 3 years ago
yessssssss!!! Glad to see your videos man, they have really helped me a lot. Keep it going!
@KeithGalli 3 years ago
Happy to hear that :). More to come!
@milesmena4994 2 years ago
This is great! I was testing my scraping skills on some wikipedia pages, but I couldn't find anything as rich as this. Thanks, I will enjoy watching this.
@KeithGalli 2 years ago
Glad it was helpful!
@vgodoyd 3 years ago
Very good video! Thanks a lot. Greetings from Chile!
@nsnilesh604 3 years ago
Thanks for sharing something good which help me to improve skills 🙂
@ahmedelsheikh9912 2 years ago
Extremely 🥇 wonderful, Go ahead…
@maulanabhara4448 3 years ago
Data automation & Scheduling ! This would be a great next project, please do this keith
@KeithGalli 3 years ago
Yeah I agree something to do with automation/scheduling is a great idea. I'll look into it!
@mathguy3801 3 years ago
Excellent video. I enjoyed every minute of it and look forward to the next one. Thanks for your hard work!
@KeithGalli 3 years ago
Glad you enjoyed and you are very welcome!!
@wernergomindes2570 3 years ago
Thanks man
@moayadelrashed1656 2 years ago
Thanks so much
@amanthapa3066 3 years ago
Was waiting for your video so long.
@KeithGalli 3 years ago
Well hope you enjoy it now that I finally posted it!! :)
@gisleberge4363 3 years ago
Very clever what you did with the currency conversion. Had to twist my head around a few time before i got the hang of it 🙂
@phanphancharee4838 1 year ago
Why I didn’t find this channel earlier.❤️❤️❤️❤️❤️
@shahzan525 3 years ago
I don't know , but I learned a lot from your videos ..... Thanks keith
@KeithGalli 3 years ago
Happy to hear you learned a lot from the videos!
@alenjose3903 3 years ago
🙌🏻 looking forward to more of these data science works
@KeithGalli 3 years ago
Glad to hear it! :)
@alenjose3903 3 years ago
Keith Galli your sales analysis video is one of my favs. 💯❤️
@Glaszg 3 years ago
Hey Keith, please do SQL tutorial!
@ritwaj6791 3 years ago
Hey Keith, learning from your project tutorials for a month than any other online course. They are really really helpful. Can you make an intro tutorial on SQL and project tutorials too...?
@user-qy9ys7ux6v 3 years ago
That is great thanks!
@amoghht5655 3 years ago
3 hours well spent, thank you @Keith Galli
@digigoliath 3 years ago
Awesome!!! TQVM!!
@KeithGalli 3 years ago
You're very welcome!!!
@abdoulayeleye5399 3 years ago
good it's a very real python web scraping. nice job
@KeithGalli 3 years ago
Thank you!
@hilmikilickaya3528 3 years ago
Thank you for sharing! You are amazing. It would be great if you make videos about docker, and spark.
@shiv9475 3 years ago
Nice , Your work is really great ,i want more videos like this!!!!!!!
@KeithGalli 3 years ago
Thank you! Hope to make more videos like this fairly soon
@josephvijay1199 3 years ago
Bro Really Superb, I really learned pandas because of you your realtime problem solving helped me a lot , put lot more videos , great job continue it ❤️❤️❤️❤️
@omghadge9309 1 year ago
awesome content keith , learned a lot ! Keep such videos coming !
@aditisrivastava7079 3 years ago
Hey Brother I just love you for the amazing work....keep it up
@KeithGalli 3 years ago
I appreciate the support 😊
@yacineyoussoufrachidmohame5764 3 years ago
We are infinitely indebted to you. Thanks for sharing this wonderful content 🙏🏽🔥.
@sebastianalvarez1537 3 years ago
this guy is a beast
@KeithGalli 3 years ago
🤠
@raoulgz4578 3 years ago
You are my hero
@nghiepcrypto7034 3 years ago
it gonna very exciting if you do an analysis report of Marketing Analytics. Thank you for making this video
@mmarva3597 3 years ago
Wonderful, thank you very much!!! Good work
@KeithGalli 3 years ago
You're welcome! Glad you enjoyed :)
@wiz8058 3 years ago
you always doing great bro.
@KeithGalli 3 years ago
Thank you! Always appreciate the support
@wiz8058 3 years ago
Keith Galli you welcome bro and always appreciate your valuable videos you putting out here for free. Thanks very much.