Currently planning my computer science A-level project and wanted to learn what this web scraping thingamajig was all about. This video was an amazing introduction! Simple, clear, but not overly professional, so it didn't leave me feeling overwhelmed, and I'm going to watch more of your tuts now, cheers mate!
Wooooow, it's been years since I've seen a Tinkernut video. I think about 10 years ago I learned SQL and PHP with your tutorial about making a webpage with users, passwords, etc. Man, so nice to see a video of yours.
Cool tutorial :D For more complicated data I use XPath, although its syntax is a bit weird at first. Furthermore: validate, validate, and validate your data. You do not want a program which crashes randomly just because a value is missing, empty, or malformed :)
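That validation advice can be as simple as a small cleaning function that never raises. A minimal sketch, with a hypothetical clean_price helper and the "$3.50"-style input format as assumptions:

```python
def clean_price(raw):
    """Turn a scraped price string into a float, or None if it's unusable."""
    if raw is None:                 # value missing entirely
        return None
    raw = raw.strip().lstrip("$")
    if not raw:                     # value present but empty
        return None
    try:
        return float(raw)
    except ValueError:              # value malformed, e.g. "N/A"
        return None
```

The calling code then checks for None instead of crashing halfway through a scrape because one row on the page was odd.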
Man... I've seen other web scraping tutorials and they take you ten miles down the road and throw all types of advanced garbage at you. Granted, I know what you have shown here is the quick and easy way, but that's all I have wanted: to get an understanding of what it is and how it basically works. Thank you.
So glad to see you posting again! I missed your videos so much. I believe my first video of yours was either How to Setup a Webserver or How to Make an Operating System. Both excellent videos!
This channel used to get like 100k views. Now it's down to less than 10k. Idk why. When I was around 13, I wanted to make an FPS game and found his video very interesting. I've followed this channel since then. Tinkernut was the reason I started learning programming, after watching his HTML tutorial (create a website from scratch). Even though I neither have a comp-sci degree nor work as a programmer, I'm still learning Python in my free time. Thank you Daniel.
A savvy businessman could use web scraping to scrape a competitor's website for product pricing (including product numbers, photos, and prices), then use this to monitor their price changes and/or adjust his own prices on his website to stay just a slight bit more competitive.
Web scraping is to copying and pasting manually as copying and pasting manually is to using your eyeballs, memorising, then typing it into a file. There is no difference between surfing the web and web scraping; one is just faster. Like how copy/pasting something from Wikipedia is faster than reading and re-writing it.
Funny how it's titled "Beginner's Guide to Scraping" and once he's done with the introduction he starts typing a bunch of code that "beginners" have absolutely no clue how to write... Thanks, man, great help!
Great video. With the phrase "web scraper", I can't help but picture a function that returns a digital box chevy with candy paint, 26" chrome rims, tinted windows, and triple 15" subs in the trunk with some Too $hort going. I hope someone else from Northern California is thinking the same thing, and cracks up seeing this. But thank you for your fantastic educational video! cheers.
Overall, I highly recommend this video to anyone who is interested in learning Python. It is a comprehensive and informative resource that will teach you everything you need to know to get started with this powerful programming language.
Thanks for sharing the expertise! However, I get the following error when running the code: writer.writerow([quote.text, author.text]) raises UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 0: ordinal not in range(256)
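A likely fix, assuming the file was opened without an explicit encoding: '\u201c' is a curly quotation mark, which latin-1 (the default on some Windows setups) cannot represent, so pass encoding="utf-8" when opening the CSV file:

```python
import csv

rows = [("\u201cScraped quote\u201d", "Some Author")]  # curly quotes break latin-1

# newline="" is what the csv docs recommend; encoding="utf-8" avoids the UnicodeEncodeError
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```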
Ok, so this is amazing, thank you! How would you generalize a scraper, though? Say I want to scrape all the news sites in the world and extract the main articles.
Where can we find out if we are allowed to scrape data from a specific website, so that we don't end up in trouble? Does the scraping code/process work the same way for scraping product prices, e.g. trying to replicate camel for Amazon, or does that take additional authorization from Amazon?
Excellent question! All popular websites have a scraping/crawling text file called "robots.txt". This tells you what can and can't be scraped from the website. Here is Amazon's robots.txt file as an example (spoiler: you can't scrape much): www.amazon.com/robots.txt
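Python's standard library can even read robots.txt rules for you. A small sketch, parsing a made-up robots.txt from a string for illustration (for a real site you'd call set_url() on the live robots.txt URL and then read()):

```python
from urllib.robotparser import RobotFileParser

# A made-up two-line robots.txt, parsed locally for illustration
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/private/page"))  # blocked by the Disallow rule
print(rp.can_fetch("*", "https://example.com/quotes"))        # not matched, so allowed
```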
@@jimavictor6022 As long as you don't scrape things like other people's documents from governmental sites, or usernames plus passwords, you should be fine with the rest. What website owners are really worried about is their website's availability (whether it is online or offline) and bandwidth usage, as they pay for each gigabyte they send to and receive from users. So as long as you don't consciously/unconsciously take down their site, you're fine.
@@jimavictor6022 On top of that, they have automated ways to detect bots. The worst that can happen is getting your IP "banned" or simply restricted from viewing their webpages, and that will happen way, way, way... before you get sued by them.
Thanks for the vid! After a VERY VERY long time I'm getting back into casual coding, and I'm looking to casually make some scraping-info programs for games, with the option to select which info the person wants to see. So if the site allows scraping, would it be better to have my app-in-progress be independent, with checks done once a minute or every five minutes? Or have the info scraped, processed, and posted on a site I create, then retrieved from there by people using the app? That is, if I start sharing the app. My concern is annoying the site owners by checking too often. Forgive me if it's a silly question; I'm not experienced with scraping.
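One common pattern for not annoying the site owners, whichever architecture you pick: cache the scraped result and only re-fetch after a minimum interval has passed. A minimal sketch, where the TTL value and the fetch() callback are placeholders:

```python
import time

CACHE_TTL = 300  # seconds between real fetches; pick what the site's owners would tolerate

_cache = {"data": None, "fetched_at": None}

def get_data(fetch):
    """Return cached data, calling fetch() at most once per CACHE_TTL seconds."""
    now = time.monotonic()
    if _cache["fetched_at"] is None or now - _cache["fetched_at"] >= CACHE_TTL:
        _cache["data"] = fetch()     # fetch() would do the actual scraping request
        _cache["fetched_at"] = now
    return _cache["data"]
```

Whether this cache lives inside the app itself or on an intermediate server you run, the idea is the same: only one party ever polls the original site, no matter how many users hit refresh.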
Thanks a lot for this clear video! How would I retrieve more information associated with each quote? For instance, I would like to retrieve and print both the author and the associated tags.
Thanks, this was very good. Can you share a link where you've done the same for a website which requires a username and password? Thanks a ton.
I just checked a website I want to scrape in the future, but this will be significantly more difficult. I want to get live train schedules, but the live data is inside a JavaScript pop-up window.
This is not so easy on Windows. I'm a beginner at this, but it keeps giving me "ModuleNotFoundError: No module named bs4". I have spent hours online trying to figure this out.
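Two common causes of that error: the package is named beautifulsoup4 on PyPI but imported as bs4, and on Windows pip sometimes installs into a different Python interpreter than the one running your script. A small diagnostic sketch covering both:

```python
import importlib.util
import sys

def check_bs4():
    """Return the pip command to run if bs4 is missing, else None."""
    # The library is installed as "beautifulsoup4" on PyPI but imported as "bs4"
    if importlib.util.find_spec("bs4") is None:
        # "python -m pip" guarantees the install targets THIS interpreter
        return f"{sys.executable} -m pip install beautifulsoup4"
    return None

print(check_bs4() or "bs4 is available")
```

Running the printed command (instead of a bare "pip install") rules out the installed-into-the-wrong-Python problem.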
I had no clue it was this easy, but how do I find out which websites I'm not allowed to scrape? All I get from Google is ways to prevent scraping on my own website (which I don't have, but that's beside the point).
Love your videos. I don't understand much of the content, but what's the difference between taking these quotes via code and just copy-pasting into an Excel sheet? I'm a noob, sorry.
You can do it automatically every X amount of time, and you can use a "bot" to do something with the data you scraped. I don't use Excel, but if you're talking about what I think you are, Excel is doing exactly what was talked about in this video: web scraping. The thing is that Excel does it for you without you needing to program it first, but the web scraping it does is very, very limited compared to what dedicated scraping tools can do.
In practice? Nothing is different; you get the same result. However, let's say you have a website with 2000 quotes and you need to keep a sheet up to date. That's where a scraper would be useful, as it's time you really only need to spend once. Plus, at that kind of scale it would be faster to write the code than do it manually.
Yes and no. You can check for things like the user agent string, or try to run JavaScript, or something like that. However, it's actually a really hard problem to solve, because a scraping script can look indistinguishable from a browser.
Great explanation, simple and to the point. I had to look up what the zip function did, but I guess it's even better that I had to find it out on my own. However, the quotation marks are not saved right in the CSV file; instead, they show as 3 weird characters. They do display correctly in Thonny, though. Also, the authors are not put into a separate column, but into the same one as the quote. Also, the quote with a semicolon in it got broken at the semicolon into two parts, and the second part was placed into a separate column. Also, in the CSV file open I had to put encoding="utf-8" after the "w", because I was getting an encoding error. Could this somehow be causing the above problems?
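For what it's worth: the "3 weird characters" are a UTF-8 curly quote being displayed as latin-1 (open the file as UTF-8 and they render correctly), and the column/semicolon symptoms usually mean the rows were joined by hand instead of going through csv.writer, which quotes any field containing delimiters or quote marks. A sketch of zip plus csv.writer together, writing to an in-memory buffer that stands in for a UTF-8 file:

```python
import csv
import io

quotes = ["\u201cFirst quote\u201d", "\u201cA quote; with a semicolon\u201d"]
authors = ["Alice", "Bob"]

buf = io.StringIO()  # stands in for open("quotes.csv", "w", newline="", encoding="utf-8")
writer = csv.writer(buf)
for quote, author in zip(quotes, authors):  # zip pairs each quote with its author
    writer.writerow([quote, author])        # csv.writer quotes fields as needed,
                                            # so the quote and author stay in two clean columns

print(buf.getvalue())
```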
Yeah, I thought it was very nice too. I use Visual Studio Code and found it very helpful, since I was able to install the pip packages for Python via the command prompt and then work in VS Code. My primary application, though, would be finding different sites from a website; it would be interesting for finding src's and href's. Nice name btw, I like the commonality of it.
Holy cow! I never once thought web scraping would be so much fun! I just started learning Python. Originally, I learned the fundamentals with Python (2 months), but eventually went from data science into web development, learning JavaScript and eventually some React. Basically, I did 1 year of JavaScript, and now I am focused on Python again because I am taking the Meta backend cert with Python. Is it me, or is Python much easier for web scraping compared to JavaScript? I am too new to give an opinion, but I once tried web scraping with JavaScript, and it was much more complicated in comparison to this tutorial, so thank you for the excellent tutorial. It was well articulated and easy to follow. Would you have any recommendations for a newbie such as myself on what I should focus on for backend development? 🤔