
Introduction to Web Scraping with Python and Beautiful Soup 

Data Science Dojo
Subscribers: 106K · Views: 1.5M
Web scraping is a very powerful tool for any data professional to learn. With web scraping, the entire internet becomes your database. In this tutorial, we show you how to parse a web page into a data file (CSV) using a Python package called Beautiful Soup.
In this example, we scrape graphics card listings from NewEgg.com.
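As a rough illustration of the pipeline the video builds (parse the listing containers, pull out brand/title/shipping, write a CSV), here is a minimal sketch. It runs against an inline HTML snippet rather than the live NewEgg page; the class names follow the ones shown in the video, but the HTML fragment itself is made up:

```python
from bs4 import BeautifulSoup
import csv, io

# Hypothetical stand-in for the NewEgg listings page; the video
# downloads the real HTML with urllib instead.
html = """
<div class="item-container">
  <div class="item-info">
    <a href="#"><img title="MSI" src="msi.png"></a>
    <a class="item-title" href="#">MSI GeForce GTX 1070</a>
  </div>
  <ul class="price"><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <div class="item-info">
    <a href="#"><img title="EVGA" src="evga.png"></a>
    <a class="item-title" href="#">EVGA GeForce GTX 1080</a>
  </div>
  <ul class="price"><li class="price-ship">$4.99 Shipping</li></ul>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
containers = page_soup.findAll("div", {"class": "item-container"})

rows = []
for container in containers:
    brand = container.find("div", "item-info").a.img["title"]   # brand from the logo's title attribute
    title = container.find("a", "item-title").text              # product name
    shipping = container.find("li", "price-ship").text.strip()  # shipping cost
    rows.append([brand, title, shipping])

# Write the rows out as CSV (in memory here; the video writes products.csv).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["brand", "product_name", "shipping"])
writer.writerows(rows)
```

Swapping the inline string for the `urllib` download shown in the video gives the full scraper.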
Find the updated version of this tutorial here: • Web Scraping Tutorial ...
Python Code:
code.datasciencedojo.com/data...
Sublime:
www.sublimetext.com/3
Anaconda:
www.anaconda.com/distribution...
JavaScript beautifier:
beautifier.io/
If you are not seeing the command line, follow this tutorial:
www.tenforums.com/tutorials/7...
Table of Contents:
0:00 - Introduction
1:28 - Setting up Anaconda
3:00 - Installing Beautiful Soup
3:43 - Setting up urllib
6:07 - Retrieving the Web Page
10:47 - Evaluating Web Page
11:27 - Converting Listings into Line Items
16:13 - Using jsbeautiful
16:31 - Reading Raw HTML for Items to Scrape
18:34 - Building the Scraper
22:11 - Using the "findAll" Function
27:26 - Testing the Scraper
29:07 - Creating the .csv File
32:18 - End Result
--
At Data Science Dojo, we believe data science is for everyone. Our data science trainings have been attended by more than 10,000 employees from over 2,500 companies globally, including many leaders in tech like Microsoft, Google, and Facebook. For more information please visit: hubs.la/Q01Z-13k0
💼 Learn to build LLM-powered apps in just 40 hours with our Large Language Models bootcamp: hubs.la/Q01ZZGL-0
💼 Get started in the world of data with our top-rated data science bootcamp: hubs.la/Q01ZZDpt0
💼 Master Python for data science, analytics, machine learning, and data engineering: hubs.la/Q01ZZD-s0
💼 Explore, analyze, and visualize your data with Power BI desktop: hubs.la/Q01ZZF8B0
--
Unleash your data science potential for FREE! Dive into our tutorials, events & courses today!
📚 Learn the essentials of data science and analytics with our data science tutorials: hubs.la/Q01ZZJJK0
📚 Stay ahead of the curve with the latest data science content, subscribe to our newsletter now: hubs.la/Q01ZZBy10
📚 Connect with other data scientists and AI professionals at our community events: hubs.la/Q01ZZLd80
📚 Check out our free data science courses: hubs.la/Q01ZZMcm0
📚 Get your daily dose of data science with our trending blogs: hubs.la/Q01ZZMWl0
--
📱 Social media links
Connect with us: / data-science-dojo
Follow us: / datasciencedojo
Keep up with us: / data_science_dojo
Like us: / datasciencedojo
Find us: www.threads.net/@data_science...
--
Also, join our communities:
LinkedIn: / 13601597
Twitter: / 1677363761399865344
Facebook: / aiandmachinelearningfo...
Vimeo: vimeo.com/datasciencedojo
Discord: / discord
_
Want to share your data science knowledge? Boost your profile and share your knowledge with our community: hubs.la/Q01ZZNCn0
#webscraping #python #beautifulsoup

Published: 1 Aug 2024

Comments: 1.8K
@muhammadisrarulhaq9052 · 5 years ago
I was able to make a program for my client I never thought was possible. I got paid real money for this. Blessings, so much learned; this is like magic.
@GamingTechSnips · 4 years ago
Can you tell me how much time it took? And is it recommended for a uni student to make it as a semester project?
@brandonhirdler · 4 years ago
@@GamingTechSnips Depends on your skill as a programmer
@utkarsh1874 · 4 years ago
@@GamingTechSnips Less than a week, even when you have zero background knowledge.
@maniafranzio3023 · 4 years ago
Inbox me if you can!!! I need some tips from you..
@burinome · 4 years ago
Damn you're lucky, my client paid me fake money. smh
@harsh3305 · 5 years ago
MINOR SUGGESTION: As of 10/03/2019, if you are following along with this tutorial, "container.div" won't give you the div with the "item-info" class. Instead it will give you the div with the "item-badges" class, because the latter occurs before the former. When you access any tag with the dot (.) operator, it just returns the first instance of that tag. I had a problem following along until I figured this out. To solve this, use the "find()" method to find exactly the div which contains the information that you want, e.g.:
divWithInfo = containers[0].find("div", "item-info")
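The behaviour this comment describes can be reproduced with a toy fragment (the class names mirror the comment; the HTML itself is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: an "item-badges" div comes before "item-info".
html = """
<div class="item-container">
  <div class="item-badges">Best Seller</div>
  <div class="item-info"><a href="#"><img title="MSI" src="msi.png"></a></div>
</div>
"""
containers = BeautifulSoup(html, "html.parser").findAll(
    "div", {"class": "item-container"})

# Dot notation returns the FIRST matching descendant tag...
first_div = containers[0].div                     # the "item-badges" div

# ...while find() lets you target the class you actually want.
divWithInfo = containers[0].find("div", "item-info")
brand = divWithInfo.a.img["title"]
```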
@vincentn2059 · 5 years ago
Thank you. Can't express how helpful this was and unlocked everything for me. Only part I wasn't understanding. Thank you
@johntimo8570 · 5 years ago
Thanks for the tip!
@DR-fn3yv · 5 years ago
Okay, but how do we navigate further into more embedded items? I'm trying to pull 'title' out of the a tag within item-branding, but it doesn't work.
@vincentn2059 · 5 years ago
D R, have you used the find and/or findAll method? Doing a couple of searches on Google and Stack Overflow helped me get further into the methods. Also, do you know basic HTML?
@DR-fn3yv · 5 years ago
@@vincentn2059 Yeah, I'm using the find method. I've looked in quite a few places but can't find the information I need. I'm at the part in the video where he uses container.div.div.a.img. I've used containers[0].find('div', 'item-info'), which works correctly, but now I'm stuck at the part where I have to navigate further to pull out the information I need.
@Tocy777isback0414 · 4 years ago
It's weird to think about it like that, but this video started my whole Python learning back in 2017 and I am SO SO SO much thankful for it.
@chinzzz388 · 4 years ago
How good are you at python now? Just wondering how much progress one can make in 3 years
@DragonRazor9283 · 3 years ago
yes, please update us now!
@Tocy777isback0414 · 3 years ago
@@DragonRazor9283 So the projects I have done by now: web scraping sec.gov XML files, converting them to Excel, and inserting them into a SQL database; I have built a dynamic website around this in Flask (a Python library). I have expanded my web scraping to sites that provide data in JSON, which usually contains more data than is available on the website directly, and this way it's more speed-efficient. I have moved all this to PythonAnywhere, where I have an FTP server as well and automated tasks which run every hour/day. My main field is still web scraping, but now I can run SQL queries with Python and display them as well. That is to say, I have learned all this in my free time after work.
@Tocy777isback0414 · 3 years ago
@@chinzzz388 Sorry, I didn't see your comment somehow. So the projects I have done by now: web scraping sec.gov XML files, converting them to Excel, and inserting them into a SQL database; I have built a dynamic website around this in Flask (a Python library). I have expanded my web scraping to sites that provide data in JSON, which usually contains more data than is available on the website directly, and this way it's more speed-efficient. I have moved all this to PythonAnywhere, where I have an FTP server as well and automated tasks which run every hour/day. My main field is still web scraping, but now I can run SQL queries with Python and display them as well. That is to say, I have learned all this in my free time after work. This earned me a new position at my company which doubled my pay.
@chinzzz388 · 3 years ago
@@Tocy777isback0414 that is amazing my man!! Congrats and keep grinding :)
@Datasciencedojo · 5 years ago
Table of Contents: 0:00 - Introduction 1:28 - Setting up Anaconda 3:00 - Installing Beautiful Soup 3:43 - Setting up urllib 6:07 - Retrieving the Web Page 10:47 - Evaluating Web Page 11:27 - Converting Listings into Line Items 16:13 - Using jsbeautiful 16:31 - Reading Raw HTML for Items to Scrape 18:34 - Building the Scraper 22:11 - Using the "findAll" Function 27:26 - Testing the Scraper 29:07 - Creating the .csv File 32:18 - End Result
@prem.sagar.m · 4 years ago
Hi, how can we scrape if the web page is a single-page app?
@alexzhang5816 · 4 years ago
Thank you for the tutorial; however, I am not able to get the whole list. It only prints one result, so it's not looping over all the containers. Can you please help me out?
containers = page_soup.findAll("div", {"class": "item-container"})
for container in containers:
    brand_description = container.a.img["title"]
    price_box = container.findAll("li", {"class": "price-current"})
    price = price_box[0].strong.text
    print("brand_description: " + brand_description)
    print("price: " + price)
@ninananou7603 · 4 years ago
@Data Science Dojo Please, a PDF document or website?
@greysonnewton6284 · 4 years ago
I am trying to scrape the prices off of Newegg's website. The price is nested within price = container.findAll('ul', {'class': 'price'}). When I call price[0].li.span I don't get any output; when I call price[0].li.span.text I get an attribute-non-existent error. How would I scrape the price in this Newegg example? Also, the current price is wrapped within a 'strong' tag that is inside a span class. How would we scrape this?
@syaifullaffandi · 2 years ago
Thx
@arjoon · 6 years ago
This was really good content, definitely the best intro to web scraping I've seen. You don't go through it as though you're reading from the documentation, there's more of a flow.
@delt19 · 5 years ago
Coming from R, this is a very well done introductory tutorial on web scraping in Python. I like the real-world example with Newegg and the troubleshooting along the way.
@YasarHabib · 5 years ago
This was by far the best introduction to web scraping I've found online. Clear, concise, and easy to digest. Thank YOU!
@pdubocho · 6 years ago
The man, the myth, the legend. You have no idea how much stress and lost time you have prevented. THANK YOU!
@evanzhao3887 · 5 years ago
If you have some prior experience with web crawling, this video can take your crawling skills to a whole new level. It lets you crawl a website containing complicated info about multiple items into a very organized dataset. The various tools introduced in the video are fantastically helpful as well. A BIG THANK YOU
@EustaceKirstein · 5 years ago
32:30, I started cheesing at how awesome the end result of this whole project was. Definitely inspiring - thank you for the excellent guide!
@saadiyafourie · 5 years ago
Absolute champion, quite possibly the best code tutorial I've ever watched. Oh the possibilities! Thank you :)
@andreabtahi9519 · 5 years ago
I am just starting web scraping and I can honestly say that this video clearly explained everything. I watched this at 1.5x speed and it made sense. I would love more videos like this. I loved how you made it generic so it can apply to more than one website!
@christophedamour6919 · 6 years ago
A BIG BIG THANK YOU: the most understandable tutorial I've ever seen on how to scrape a web page (and I have watched like 100 of them).
@Jack258jfisodjfjc · 5 years ago
You look like a god when you're writing multiple lines at the same time.
@devendravijay1303 · 5 years ago
One of the best teachers I have come across on YouTube. Web scraping explained so well that even a layman can follow and understand the basic concepts. I wish in life I had a teacher/mentor/friend like the one teaching in this video.
@brendanp9415 · 4 years ago
This is the best web scraping tutorial that I’ve found. I’ve been frustrated for hours trying to use other resources. Thank you for making this, your explanations are thorough and great!
@frozy3155 · 4 years ago
Wow, even almost 3 years later this video helped me so much. It helped me make a program that picks a random Steam game. This was so hard, but I figured it out. Big props to you and this video.
@Datasciencedojo · 2 years ago
Hello everyone, find the updated version of this tutorial here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rlR0f4zZKvc.html
@adamhemeon734 · 2 years ago
Two years into a web program and a year working in the field and never bothered to learn how to do this. Great video, I followed along 5 years later in 2022 with Python 3.7.8 and it still works.
@VIK2000GEV · 4 years ago
Very high-quality tutorial. Showing how to set everything up before running any code is very nice to include, and timestamping it so people who already know it can quickly skip is much appreciated. Keeping the tutorial example simple and diverse is very welcome. Writing it from scratch just makes it sooo useful for remembering what was where. I wish other people made tutorials like this... Timestamping is so useful when you just want to look up that one thing and don't really remember when it appeared.
@viveksuman9600 · 4 years ago
I saw this video and then successfully wrote the entire code without looking at the video, not even once, because I understood every line of it. Thank you, man. Your explanation is very beginner friendly.
@wendikinglopez8842 · 4 years ago
Yes. It helped me UNDERSTAND finally, I think because he taught it with respect for the viewer.
@ThatGuyDownInThe · 4 years ago
This is actually the coolest thing I've seen in my entire life. Wow. Thank you so much I love you man.
@bernardtumanjong4856 · 5 years ago
Truly enjoyed your simple step by step explanation on why each command or function is needed, and what it does. Your Python knowledge and skills are evident, as you are able to provide immediate solutions to errors and or challenges to the problem you are attempting to solve. Followed along with the tools and enjoyed the session. Thank you.
@Winterbear009 · 2 years ago
I am from a commerce background. I have zero knowledge of any programming language. I found your video and explanation so good that at least now I can start my journey into scraping and coding. I am so thankful at the moment. Love your channel. Thank you so much.
@Datasciencedojo · 2 years ago
Hello Ella, glad to help you. Stay tuned with us for more tutorials!
@Winterbear009 · 2 years ago
@@Datasciencedojo Yes Chief👍 Have subscribed already. 🤗
@sarvagyaan1097 · 6 years ago
Enjoyed it, Data Science Dojo! Need more like this one.
@bokabosiljcic8694 · 5 years ago
This was fast, precise and beautiful! By saying beautiful I didn't mean to state the obvious :) Thanks
@SnehilSinghsl · 6 years ago
I can't believe I actually sat through 33 minutes learning web scraping, something completely new to me. I was looking for a shortcut but your tutorial was just perfect! :D Thanks for this!
@theworkflow19 · 4 years ago
You are a blessing seriously! The first tutorial that actually made sense from start to finish. I was able to understand so much from this! Please Please Please Please upload more videos on Python Web Scraping with BeautifulSoup. Thank you again for this blessing!
@LePnen · 7 years ago
Thank you very much for this video! I hope you do a second one on this subject. I'd like to know how to scrape several pages, as you mentioned at the end of the video. This was just what I was hoping for. Thanks!
@funny_buddy_official2712 · 6 years ago
Hey, please help me. When I tried scraping another site, I got a 403 Forbidden error. How do I fix that? Is it possible to scrape a secure site?
@edenhoward2053 · 3 years ago
UPDATE/SUGGESTION: The findAll function has been renamed to find_all in bs4 version 4.9.3.
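For what it's worth, in current Beautiful Soup 4 releases both spellings still work: find_all is the documented name and findAll is kept as a backward-compatible alias, so either spelling from the video should run:

```python
from bs4 import BeautifulSoup

html = '<ul><li class="price-ship">Free Shipping</li><li class="price-ship">$4.99 Shipping</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# find_all is the documented spelling; findAll survives as an alias.
new_style = soup.find_all("li", {"class": "price-ship"})
old_style = soup.findAll("li", {"class": "price-ship"})
```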
@paulprice5860 · 4 years ago
Thanks. I have a basic understanding of python and html and I found this tutorial very easy to follow. You do a great job of clearly explaining things in the code which is what I need at my current skill level. Much appreciated.
@annatinaschnegg5936 · 3 years ago
I really liked the tone, rhythm and clarity of this tutorial! I'm not a total beginner with Python anymore, and so was able to listen and (mostly?) understand while preparing lunch for my kids. (I'll rewatch to try and do it later.)
@lydialim2964 · 6 years ago
THIS IS AMAZING!!! Everything was very well explained and instructed; I managed to get my first web scrape off an e-commerce site! Thanks so much, you have a loyal subscriber in me! Perhaps you could cover using time sleeps to avoid getting blacklisted by the websites we are scraping? And also how to scrape multiple pages in one go?
@adrianramos2989 · 5 years ago
This material is just amazing. Thank you! Have you considered making an intro to Web Scraping using R?
@Billsethtoalson · 5 years ago
DUDE! High Quality Content!! You are very good at walking through the logical steps for breaking down a page! Other tutorials are great but are always geared toward the specific task at hand. With this it felt like I also learned how to tackle a page! This helped a bunch!
@denisshmelev4990 · 5 years ago
I thought web scraping was hard until I found your video. Huge thanks man, you saved so much time for me!
@ahmedalthagafi4492 · 7 years ago
Great video, very easy to follow. Hope you do more of that kind. Thanks.
@Datasciencedojo · 7 years ago
Glad you enjoyed it! Did you mean more videos about web scraping, programming, data science, or data acquisition?
@siddhartha8886 · 7 years ago
Yes, I need more videos on web scraping. Thank you :)
@maxiewawa · 7 years ago
Number 1: I realise this was 5 months ago, but still thought I'd make a suggestion. If you get good at data scraping you end up with enormous CSV files... how do you manipulate them? Like if I was looking for a certain price at a certain date in the past, putting all the data in a Python list and iterating through it usually crashes my computer...
@anonyme103 · 5 years ago
This is very well explained and I enjoyed every second of it! Please do more ^^
@felixkimutai8478 · 4 years ago
I have watched all the web scraping videos on YouTube but this one is the top; I learned a lot. Thank you.
@souravmahanty7025 · 6 years ago
This is the first tutorial on this that actually makes sense. THANK YOU. You earned a subscriber.
@haxxorlord7327 · 4 years ago
this soup is very beautiful, goddamn
@schlongmasterlol2724 · 4 years ago
16:04 The command for it on Windows is CTRL + SHIFT + P :)
@patrickjane276 · 6 years ago
That was awesome, man; so much appreciation for things like this! You could throw in adding the CSV into a database, and then throw in a query for the best card!
@AniketRajAniketkno · 4 years ago
I love how passionate you are about this and how clear it is to understand your explanation. I WANTS IT ALL! GIMME! Lol Awesome video!
@gauravsharma-mi2er · 7 years ago
Wow, great video. Can you make a video on scraping data from multiple pages?
@gangstasteve5753 · 5 years ago
would that involve threading?
@learnmandarinwithkaili1102 · 4 years ago
When I watched this tutorial, scraping seemed easy, until I got stuck a thousand times while actually scraping a webpage. Happy coding, for dummies lollll
@redfeather22sa · 3 years ago
It must have been a magic day when I saw this for the first time 1.5 years ago!!! It's where it all started!!! Thanks! Best video & intro to web scraping for absolute beginners!! (Notable mention to Corey Schafer, who I was watching a few weeks earlier, and who gave me a taste of it & how easy it could be to use/do.) Thank you, friends!! An amazing tool!!
@rupertrussell1 · 5 years ago
Fantastic tutorial! Gave me 95% of what I needed for my first screen-scraping project.
@linuxit5869 · 7 years ago
Awesome tutorial. Please add how to scrape multiple pages :)
@johannbauer2863 · 5 years ago
Linux IT: make a list and a for loop?
@petersilie9504 · 5 years ago
Use multithreading for this
@diegugawa · 5 years ago
@@petersilie9504 Can you do this in Python 3? I don't think it's possible (apparently the multithreading module is not recommended). Sounds like a job for a compiled language.
@BackwardshturT · 4 years ago
@@johannbauer2863 can you please explain? Thanks
@dragoxofield · 7 years ago
Nice! I was wondering if you could do a page monitor that tells you exactly where the website has changed?
@iloveanime9226 · 6 years ago
Yeah, that would be interesting. Basically you would save all the variables, then check and save them into new variables, compare with the old ones, and see if there is a difference?
@Niccolatorres · 6 years ago
An easy way to do this is to download the HTML from the desired page and store its md5. Fetch the same HTML periodically and compare the stored and current md5. This is an easy and less CPU-consuming way to check whether the website has changed.
@iloveanime9226 · 6 years ago
Yeah, that seems like a better way to do it, but you would need to clean up the ad containers, since they always change even when the page content does not.
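The md5 idea sketched in this thread needs only the standard library; `page_fingerprint` and the sample HTML strings below are made-up names for illustration:

```python
import hashlib

def page_fingerprint(html: str) -> str:
    # Hash the raw HTML; compare fingerprints between runs to detect a change.
    return hashlib.md5(html.encode("utf-8")).hexdigest()

old = page_fingerprint("<html><body>price: $399</body></html>")
new = page_fingerprint("<html><body>price: $379</body></html>")
changed = old != new  # True: the page content differs
```

As the reply notes, hashing the whole page also flags rotating ads, so in practice you would hash only the fragment you care about (e.g. the listings container extracted with Beautiful Soup).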
@sebastianpeters2296 · 4 years ago
Hey there! This guide really helped me create a tailored scraper for a pilot project. Even though I am at the very beginning stage of learning Python, I managed to create the entire script, and even learned during the process. Amazing, really appreciate this!
@npithia · 4 years ago
This is gold for someone learning python and seeing its application.
@freediugh416 · 7 years ago
Wow, this was great! I am completely new to this and could still follow perfectly fine, and loved the explanations of everything. Would love to know how to run this script every day automatically and send results to a phone, or create alerts for changes and send those to a phone. Again, awesome job!
@96hugoS · 7 years ago
This is what I'm looking for as well, but I'm not getting any further, unfortunately.
@iloveanime9226 · 6 years ago
You would need to host a server online to run it continuously; you can also link it to an app that checks the changes and alerts you. Just some ideas; you can search more on Stack Overflow :)
@DonGass · 6 years ago
Twilio is a good service for sending text messages via API... you could combine it with the scraping functionality and some sort of compare logic to text you the changes...
@jackjackattack4384 · 6 years ago
The only solution is to be constantly updating your code. There's not really a good way outside of intelligently analyzing the picture, description, & brand.
@ScremoSam1 · 5 years ago
This has been so useful, thanks so much. What I need to know now is how I can get the scraper to continue working when there's a 'Load More' button which doesn't take you to another page. If anyone knows anything about this, please let me know.
@brandonhirdler · 4 years ago
This is a really good question. Maybe click the load-more button and then copy the URL? Or define how many results you want for that page, then copy the URL. I'm pretty sure when you hit load more it's actually altering the HTML path?
@BrianGlaze · 4 years ago
Maybe you can program in a click to load more function into your code.
@barodrinksbeer7484 · 2 years ago
Late answer, but the solution is coding a click on the load-more button, similar to how you can code a click on the next-page button for your script to continue onwards.
@sacroultima · 3 years ago
You are sooooo comfortable to listen to. Not because you have perfect pronunciation and a seamless script you are gliding through; you are just talking, but not constantly jumping back and forth. Accurate tempo and personality in your voice. New subscription.
@Datasciencedojo · 3 years ago
This makes us feel really motivated, Law! Thanks a lot :)
@benzavaleta92 · 5 years ago
Two days finding answers, and you gave me all that I needed in 30 minutes!!! Thanks so much!!!
@jdsr4c · 5 years ago
I'm getting this error when I try to run it:
File "", line 2, in
NameError: name 'page' is not defined
@bmxng33 · 5 years ago
He set it as page_soup, not page.
@adammarsono8908 · 5 years ago
Hello, at 20:14 the tag (in my case) jumps to a tag inside a tag. How do I choose which tag to grab if there is more than one tag with the same name?
@hieudao428 · 5 years ago
I ran into a similar problem. You can use the find() method to find a specific tag, in either of the following ways:
a) container.find("a", "item-brand")
b) container.find("div", "item-branding")
Once you are in a specific tag, you can just use dot notation to get to the next sub-tag. For example, I had:
container.find("div", "item-branding").a.img["title"]
You can also skip ahead directly by searching for the "a" tag instead of the "div" tag, or maybe even the "img" tag.
@DhirajShah · 5 years ago
I was also stuck there, but found the solution. Just find the div with class "item-branding" directly, and from there you can get the image, which will give you the title.
@felipeabarcaguzman1057 · 4 years ago
@@hieudao428 Thankss!!
@mgmartin51 · 5 years ago
Boy, is this ever clear. Very straightforward presentation!
@alexrobert4614 · 5 years ago
This is one of the only clear, fun Python tutorials out there. Congrats.
@chriswashingtonbeats · 4 years ago
The first div that it showed was item-badges. How do I navigate to different divs?
@zionrogue4593 · 4 years ago
I am having the same problem??
@syomantakchaudhuri9935 · 5 years ago
Looks like they added another div at the very beginning of each item-container. The brand name can now be extracted with a little more effort:
brand_container = x.findAll("div", {"class": "item-info"})
print(brand_container[0].div.a.img["title"])
@rramey5597 · 5 years ago
Try a simpler one-liner:
print(container.a.img["title"].split(" ")[0])
@mikez9898 · 5 years ago
Great, thank you, it worked!
brand_container = container.findAll("div", {"class": "item-info"})
brand = brand_container[0].div.a.img["title"]
@matthias1312 · 5 years ago
Thank you! Took me forever to figure it out before I read this comment!
@arujbudhraja · 4 years ago
Awestruck! It's amazingly simple to follow along. Thank you, sir, for adding to the community of self-learners!
@PanamaSoftwash · 5 years ago
I don't know much about coding, but the way you explained this made perfect sense. I hope to learn a lot from your channel.
@arshdeepsinghahuja · 7 years ago
shipping_container = container.findAll("li",{"class":"price-ship"})
GETTING THIS ERROR:
Traceback (most recent call last):
File "", line 1, in
TypeError: 'tuple' object is not callable
@cihansariyildiz1748 · 4 years ago
Try find instead of findAll.
@neilaybhalerao8373 · 3 years ago
Same!!! I didn't understand when he said "oh, I need to close this function"... Can anyone explain?
@cameroncrawley2217 · 3 years ago
@@neilaybhalerao8373 shipping_container = container.findAll("li", {"class":"price-ship"} is what he typed originally. He forgot to add the closing ) of the function call, so he should have typed shipping_container = container.findAll("li", {"class":"price-ship"})
@sk_4142 · 4 years ago
brand = make_rating_sp[0].img["title"].title()
TypeError: 'NoneType' object is not subscriptable
[Finished in 3.074s]
Anyone know why this is happening, or how to fix it?
@SourPickle-bv9gd · 3 years ago
Did you get an answer? I'm having this problem as well.
@Pulits · 4 years ago
I made a web scraper not so long ago with another set of tools. This video has motivated me to create one with these, too!
@mehmetyersel449 · 4 years ago
Great video! Thank you for sparing the time and explaining with such straightforward delivery.
@wadephz · 6 years ago
Hi, thanks for the video! How do you get to the second div tag in "container"?
@SpenderBara · 6 years ago
I have the same question. I've tried different notations, i.e. div[2], div{2}, div(2) and others, but still don't get the second or third div.
@shankargs7685 · 7 years ago
Hi Dojo, really nice video. I have one doubt: recent e-commerce sites don't keep class names constant; they have alphanumeric values like class="_3Hjcsab". How do you scrape when the site keeps changing?
@ahmedramadan8153 · 7 years ago
Try the XPath way!! I don't think they will change all the attributes and the path of the element periodically.
@Datasciencedojo · 7 years ago
Then it gets harder! It's an adversarial problem. The development time greatly increases because you have to build functions to check whether the tag has all the features you are looking for before grabbing it. It's not as straightforward as grabbing by the div or id. In this case it might not be practical to scrape these sites, because they clearly do not want to be scraped. Even if you scraped them successfully, they would become aware and change their code again accordingly.
@vidhishah5484 · 7 years ago
Yeah, ran into the same problem; tried a lot to get around it but couldn't :/
@das250250 · 7 years ago
Yes, scraping may be a limited toolset as websites adopt more sophisticated formats.
@jasonrobinette4486 · 7 years ago
Thanks, great vid. Easy to follow for a rookie.
@mohitnagarkoti4086 · 4 years ago
Awesome, good, excellent, nice, best. Hope YouTube's algorithm recommends this to every scraper enthusiast.
@prayerbabies · 6 years ago
The first video of this type that really made sense to me... thank you very much.
@alibee6232 · 6 years ago
When I type uClient = uReq(my_url) it gives me a 403 Forbidden error and a bunch of timeouts. Does this mean that it works but crashed, or that it will crash if it runs?
@DDay_8 · 6 years ago
4K Bahrami same here
@changleo4417 · 6 years ago
DeeganCraft, how does that work?
@lSh0x · 6 years ago
you guys are using pages instead of pages ... :)
@franciszekszombara8881 · 6 years ago
This helped me: stackoverflow.com/questions/41214965/python-3-5-urllib-request-403-forbidden-error
@yuriipidlisnyi2248 · 6 years ago
Maybe it's better to use find() instead of findAll() to get the product's name? The code will be less complex, like this:
title = container.find("a", {"class": "item-title"}).text
@SohCahToa30 · 5 years ago
How would you loop with this configuration?
@armeljoelirie3797 · 4 years ago
Thanks so much.
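One way the find()-based one-liner from this thread can be looped over all containers (a hypothetical inline HTML fragment stands in for the live page):

```python
from bs4 import BeautifulSoup

# Made-up listing fragment using the tutorial's class names.
html = """
<div class="item-container"><a class="item-title" href="#">MSI GeForce GTX 1070</a></div>
<div class="item-container"><a class="item-title" href="#">EVGA GeForce GTX 1080</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

titles = []
for container in soup.find_all("div", {"class": "item-container"}):
    # find() returns the first matching tag inside this container only
    titles.append(container.find("a", {"class": "item-title"}).text)
```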
@cwhizkid420 · 4 years ago
This is one of the most useful web scraping videos I have ever come across. I could learn it from scratch. Thanks.
@jonpotter5776 · 6 years ago
As someone self learning Python (my first programming language) with a web scraping script in mind, this was great!
@user-ik9th1nk9n · 7 years ago
Can anyone say what the syntax is when you are using two attributes inside findAll? If the HTML is
@bikashacharya6051 · 7 years ago
If you solved this, please let me know too!!!
@mariahakinbi
@mariahakinbi 7 лет назад
if you just want that one element with that id, then you can just use the id: container.findAll("div", {"id":"yyyyyyy"}) ids are supposed to be unique to one element, whereas classes are not unique and can be used with multiple elements for more info: stackoverflow.com/questions/12889362/difference-between-id-and-class-in-css-and-when-to-use-it css-tricks.com/the-difference-between-id-and-class/ so basically, you should use one or the other, otherwise there is redundant and/or conflicting html code in the webpage you're trying to parse
@meunomejaestavaemuso
@meunomejaestavaemuso 7 лет назад
maybe container.findAll("div", {"id":"yyyyyyy" , 'class':'xxxxxx'}) ??
@mariahakinbi
@mariahakinbi 7 лет назад
you shouldn't have to use both the id and the class, going with the id like I showed above should give you what you want
@mattshort181
@mattshort181 6 лет назад
You shouldn't need to find something with a class and id, since ID is supposed to be a single item on a page. A page with the multiple IDs that are the same should be using classes.
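To illustrate the point about ids being unique, a tiny sketch with made-up markup: filtering on the id alone is enough, and find() returns the single match directly:

```python
from bs4 import BeautifulSoup

# Made-up markup with both an id and a class on the same element
html = '<div id="yyyyyyy" class="xxxxxx">the one element</div>'
soup = BeautifulSoup(html, "html.parser")

# The id alone identifies the element; adding the class to the
# filter would be redundant
elem = soup.find("div", {"id": "yyyyyyy"})
print(elem.text)  # the one element
```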
@danieldimitrov7067
@danieldimitrov7067 3 years ago
The best tutorial on web scraping I've ever seen! Great work!
@clivestephenson2793
@clivestephenson2793 4 years ago
You are the most concise teacher of Python I have come across. Thanks, I will definitely give your other videos a view.
@Moccar
@Moccar 6 years ago
I get an error when trying to call uClient & page_html. It says: File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open" and then about 40 more lines like that, but from different files like client.py and ssl.py and so forth. I don't really know how to fix it. Anyone here have the same issue?
@jshen4037
@jshen4037 6 years ago
you have to flip the narn and as you are doing it you say NANANANANA NAARN MAN like batman
@makedredd299
@makedredd299 7 years ago
Hi, I'm getting stuck at 28:50 when running the script. How do I solve this problem? $ python Dojo.py Traceback (most recent call last): File "Dojo.py", line 18, in <module> brand = container.div.img["title"] TypeError: 'NoneType' object is not subscriptable Best Regards
@jayadrathas169
@jayadrathas169 7 years ago
That is a corner-case error... your best bet is to apply a try or if-else statement.
@pyhna-lol2625
@pyhna-lol2625 7 years ago
Hey, I got it too. It seems to happen when they don't have the "3VGA" or whatever. I fixed it by taking the first word out of the output "title_container[0].text". So I tossed the original second part of "brand = xxx" and replaced it with "brand = title_container[0].text.split(' ', 1)[0]". Hope it helps.
@makedredd299
@makedredd299 7 years ago
Thanks, then I'm not going crazy; it's the website changing that causes these kinds of errors ☺
@DavidDreesYT
@DavidDreesYT 6 years ago
Looks like you need to add another "div" tag. --> brand = container.div.div.a.img["title"]
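The try/except suggestion above can be sketched like this; the markup is invented so that the second listing reproduces the missing-image case behind that TypeError:

```python
from bs4 import BeautifulSoup

# Two invented listings: the second has no <img>, which is the case
# that makes container.div.img evaluate to None on the real page
page_html = """
<div class="item-container"><div><img title="EVGA"></div></div>
<div class="item-container"><div>text-only listing</div></div>
"""
page_soup = BeautifulSoup(page_html, "html.parser")

for container in page_soup.findAll("div", {"class": "item-container"}):
    try:
        brand = container.div.img["title"]
    except TypeError:  # None["title"] -> 'NoneType' object is not subscriptable
        brand = "unknown"
    print(brand)
```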
@funny_buddy_official2712
@funny_buddy_official2712 6 years ago
Hey, please help me: when I tried scraping another site, I got a 403 Forbidden error. How do I fix that? Is it possible to scrape a secure site?
@alancoates
@alancoates 6 years ago
Your presentation and explanation are awesome! You have opened my eyes to the uses of Python and Beautiful Soup.
@DincerHoca
@DincerHoca 5 years ago
Thanks for the video. This was the best web scraping tutorial I have seen on YouTube.
@MilanTheAngel
@MilanTheAngel 7 years ago
I'm on Ubuntu, is that ok?
@phucduong7071
@phucduong7071 6 years ago
It works better on Ubuntu...
@WisdomSeller
@WisdomSeller 7 years ago
Could you upload the script?
@Jack258jfisodjfjc
@Jack258jfisodjfjc 5 years ago
You're such a great teacher! Just because you can code doesn't mean you can teach. Awesome!
@brendensong8000
@brendensong8000 3 years ago
As of Nov 2020, I went through the whole thing without any issue! I used a different product name, but everything worked so perfectly! I learned so much from this video! This is awesome!!!! Thank you!!!!
@blackalk9420
@blackalk9420 4 years ago
def data_science_dojo():
    actions = ("like", "share", "sub")
    good_job = input("Thank you very much! ")
    if good_job in actions:
        print("love and respect from Kuwait")
    else:
        print("sorry, maybe next time")
data_science_dojo()
------- Output :- peace out and happy basic coding :D
@jaromtollefson3127
@jaromtollefson3127 5 years ago
I keep getting 0 when I call len(containers)
@saarakylmanen9345
@saarakylmanen9345 5 years ago
Me too. Did you figure it out? I didn't..
@tzviyudin1687
@tzviyudin1687 5 years ago
@@saarakylmanen9345 did you figure this out yet?
@tzviyudin1687
@tzviyudin1687 5 years ago
did you figure this out yet?
@Datasciencedojo
@Datasciencedojo 2 years ago
Watch the livestream of Web Scraping with Python and BeautifulSoup by Arham Noman now on ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rlR0f4zZKvc.html
@tntcaptain9
@tntcaptain9 4 years ago
I saw many videos on web scraping, but yours was probably the best one.
@edwardadams3727
@edwardadams3727 4 years ago
brand = container.find("a", {"class":"item-brand"}).img.get('title') you're welcome
@hanzenpeter3917
@hanzenpeter3917 4 years ago
'NoneType' object has no attribute 'img' :D Could you please send me your code?
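That AttributeError appears when find() returns None for a listing without the brand anchor; a guarded version of the one-liner above looks like this (markup invented for illustration):

```python
from bs4 import BeautifulSoup

# Invented markup: the second listing has no item-brand anchor at all
page_html = """
<div class="item-container"><a class="item-brand"><img title="MSI"></a></div>
<div class="item-container"><a class="item-title">No brand here</a></div>
"""
page_soup = BeautifulSoup(page_html, "html.parser")

for container in page_soup.findAll("div", {"class": "item-container"}):
    link = container.find("a", {"class": "item-brand"})
    # find() gives None when the anchor is missing, so check before .img
    brand = link.img.get("title") if link and link.img else "unknown"
    print(brand)
```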
@Orokusaki1986
@Orokusaki1986 5 years ago
Just use PyCharm, man :-P
@travisw5076
@travisw5076 5 years ago
Just use vim and then go native Linux; you're set. Now you can throw the desktop away and get a tiling WM.
@joeyzalman8254
@joeyzalman8254 4 years ago
The explanation was super. Also thank you for showing all those handy tools. Keep it up!
@paulsalmon3056
@paulsalmon3056 4 years ago
Loved this video; clear and straight to the point. I was able to make my first web scraping program.
@AlqGo
@AlqGo 7 years ago
You don't want Anaconda. There's so much junk that comes with it. Just install regular Python, open your terminal or PowerShell, and install the packages you need. For example, pip install beautifulsoup4 if you want to install Beautiful Soup. It's MUCH cleaner than installing the Anaconda rubbish.
@souravmahanty7025
@souravmahanty7025 6 years ago
There are a few useful things that come with Anaconda, like Jupyter Notebook. It is a great program for learning Python as a beginner, as a lot of online courses use it.
@robertmielewczyk9804
@robertmielewczyk9804 6 years ago
I think the more important reason is that you install only the things you need for the project you are doing, and you are working in a virtual environment, so you can easily jump between different versions of Python etc. It also makes life simpler if you're using Windows for programming, in my opinion. However, if you're just doing easy personal projects it probably won't matter.
@sranstankovic233
@sranstankovic233 5 years ago
pip install virtualenv
@robertgresham3489
@robertgresham3489 5 years ago
pip install jupyter. It's not that hard. In fact, you can use this video to scrape the list of modules that come with Anaconda, then use that to build a list of pip commands that install all of the modules in the list...
@PurchaseAreaMusic
@PurchaseAreaMusic 5 years ago
@@robertgresham3489 Code Inception. lol
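The virtual-environment route suggested above can be sketched with nothing but the standard library's venv module; the directory here is a throwaway temp path, and afterwards you would pip install beautifulsoup4 inside the activated environment:

```python
import tempfile
import venv
from pathlib import Path

# Create an isolated environment in a throwaway location;
# with_pip=False just keeps this sketch fast to run
env_dir = Path(tempfile.mkdtemp()) / "scraper-env"
venv.create(env_dir, with_pip=False)

# The environment gets its own marker file and interpreter layout
print((env_dir / "pyvenv.cfg").exists())  # True
```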
@insigpilot
@insigpilot 5 years ago
This was very good. I'm a beginner to Python and this web scraping tutorial left me with very few questions.
@pradhyumsharma1710
@pradhyumsharma1710 5 years ago
You are a great teacher. I was not much into data science, but your video made it simple and easy. Thank you, I took 100% from it. :) Requesting more videos from you.
@antisocialfox7268
@antisocialfox7268 5 years ago
I love this so much. I was just checking out what web scraping is, but HOLY MOLY THIS IS SOME BROKEN ASS ABILITY
@PunitSoni00
@PunitSoni00 5 years ago
Great introduction to web scraping. Thanks for posting this.