Always Check for the Hidden API when Web Scraping

John Watson Rooney
86K subscribers · 631K views
Published: 26 Sep 2024

Comments: 545
@hectord.7107 2 years ago
I've been doing this for years as a self-taught programmer; there are some little tricks you did here that I didn't know. Thank you for the video.
@JohnWatsonRooney 2 years ago
Glad it was helpful!
@sworatex1683 2 years ago
It's my first year in programming and there was nothing new, actually. I don't even think the pain was worth it; I'd just make the scraper in JS and have it return a JSON string.
@sworatex1683 2 years ago
But I guess that would be useless for bigger projects. I'd just do it in JS if I wanted an actual product list like this.
@gesuchter 2 years ago
@D R lol
@mattshu 2 years ago
@D R how do you “block” scraping??
@abc.2924 3 years ago
I've been using this trick for a while now, and I learned it from you, so thanks. Amazing work, man.
@JohnWatsonRooney 3 years ago
That’s great 👍
@transatlant1c 2 years ago
Nice video. It’s worth noting as well that many APIs will paginate, so rather than checking how many total results exist and manually iterating over them, you just check whether the ‘next page url’ or equivalent key exists in the results and, if so, get that too until it doesn’t exist anymore, merging/appending each time until the dataset is complete 👍
@JohnWatsonRooney 2 years ago
Yes you’re right, thank you!
@ianroberts6531 2 years ago
In fact you can see at 05:33 that this particular API does just that - there's "nextPageURL" and "totalPages" at the end of the response JSON.
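The follow-the-next-page loop described above can be sketched as follows. The `"nextPageURL"` and `"products"` key names match the response shown in the video, but every API names these differently, so treat them as assumptions to adapt:

```python
def fetch_all_pages(first_url, get_json):
    """Follow the API's next-page links until they run out.

    get_json is any callable that takes a URL and returns the decoded
    JSON body (e.g. lambda u: requests.get(u, headers=headers).json()).
    """
    results = []
    url = first_url
    while url:
        data = get_json(url)
        results.extend(data.get("products", []))  # merge this page's items
        url = data.get("nextPageURL")  # falsy once there are no more pages
    return results
```

No need to read `totalPages` at all: the loop stops by itself when the next-page key is missing or null.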
@freekdl6648 2 years ago
I rarely praise anything, but this tutorial was SO good! Well explained, no filler. In 7 or 8 minutes you guided me through finding the hidden information I needed, which tools I need to use and how to automate it. This tutorial gave me enough confidence to try to write my first Python script! Within hours I built a scraper that can pull all metadata for a full NFT collection from a marketplace. Without this video it would have taken days/weeks to discover all of this.
@JohnWatsonRooney 2 years ago
That's awesome! Thank you, very kind!
@channul4887 2 years ago
"In 7 or 8 minutes" More like 11
@freekdl6648 2 years ago
@@channul4887 Nope! I had different goals, so no need to follow the full tutorial
@MelonPython 2 years ago
I even added it to my playlist. Great video. Definitely starting to love APIs more and more.
@danielcardin9241 1 year ago
Because of this video, I was able to start my own rockets and satellites company. In only four hours, I started the company, launched thousands of rockets, and now I have my own interplanetary wireless intranet from which I can control the entire galaxy! Thanks again!
@zenon1903 2 years ago
Please ignore my first comment. I checked out your first video in this series and learned about using scrapy shell to test each line of code. With that I found the bug in my code. The code worked PERFECTLY as advertised. You're the man! Much thanks!
@drkskwlkr 2 years ago
Loved everything about this video! Great delivery style, production quality and interesting topic for me. First-time visitor to this channel and not a Python user (thanks, YouTube, for your weird but helpful predictive algorithms).
@JohnWatsonRooney 2 years ago
Thank you! I’m glad you enjoyed it
@Jacob-gy9bg 2 years ago
Wow, thanks for this excellent tutorial! I just spent all this time writing cumbersome Selenium code, when it turns out all the data I was looking for was already right there!
@JohnWatsonRooney 2 years ago
Great! That’s exactly what I was hoping to achieve with this video
@brockobama257 1 year ago
Bro, you're a game changer and I love you. If I ever see you in person I'll offer to buy you a beer, or lunch, coffee, whatever.
@gleysonoliveira802 3 years ago
This video was the answer to my prayers! The next best option was to watch a one-hour video and hope they would teach what you taught... in 10 minutes!!! 👏👏👏
@JohnWatsonRooney 3 years ago
Thank you, glad it helped!!
@joshuakb2 2 years ago
This video came into my feed just a couple of days after I used exactly this method to collect some data from a website. Very good info! This is much easier than web scraping. Unfortunately, in my case, the data I could get out of the API was incomplete, and each item in the response contained a URL to a page with the rest of the info I needed, so I had to write some code to fetch each of those pages and scrape the info I needed. But much easier than having to scrape the initial list as well.
@JohnWatsonRooney 2 years ago
Thanks! I’m glad it helped in some way. I often find that a combination of many different methods is needed to get the end result
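The hybrid approach from this thread - the API for the list, scraping for the per-item details - can be sketched like this. `fetch_page` and `parse_extra` are placeholders for whatever HTTP client and HTML parser you use, and the `"url"` field name is an assumption about the API response:

```python
def enrich_items(api_items, fetch_page, parse_extra):
    """Merge per-item detail-page data into the records from the API.

    api_items: dicts that each carry a 'url' field pointing at a detail
    page; parse_extra turns that page's body into a dict of extra fields.
    """
    enriched = []
    for item in api_items:
        extra = parse_extra(fetch_page(item["url"]))
        enriched.append({**item, **extra})  # API fields + scraped fields
    return enriched
```

Keeping the fetching and parsing behind callables also makes it easy to add rate limiting or caching in one place later.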
@Kralnor 2 years ago
This is a true gold nugget. Thanks for demonstrating how to easily view the request in Insomnia and auto-generate code!
@ERol-du3rd 2 years ago
Awesome advice; a lot of people skip checking the requests when building scrapers, but it can save a lot of time when it works.
@tikendraw 3 years ago
I just want you to never stop creating such informative videos. For God's sake.
@shivan2418 2 years ago
This is, no joke, the most useful video I ever saw on YouTube!
@judgewooden 1 year ago
I like how you regularly start sentences with 'you might think', assuming we are all idiots. I approve; glad smart people like you make time to explain to us plebs how the world works. Appreciated.
@JohnWatsonRooney 1 year ago
Hey, thanks. I do my best to explain things how I would have wanted to be taught
@marlinhicks 3 months ago
Been using Python for a couple of years now as a picked-up language, and I really appreciate getting to see how someone experienced approaches these problems
@fuad471 1 year ago
Really nice and helpful tips on a relevant topic, with pleasant recording quality. Thank you for your time and effort.
@wp4297 2 years ago
HUGE! I've been looking for this info for 2 days. 12 mins of your video better than anything else, by far. Thumbs up and thank you so much
@JohnWatsonRooney 2 years ago
Thank you !!
@wp4297 2 years ago
@@JohnWatsonRooney You saved me a lot of time. I'm new to the topic; in the next few days I'll take a look at your channel
@sajayagopi 2 years ago
I was struggling with Selenium to extract a table from a JavaScript website. This video saved so much time. Thank you
@JeanOfmArc 2 years ago
You have shown me the light. Thank you for stopping me from making more web scripts that load up web pages in browsers to click buttons.
@JeanOfmArc 2 years ago
I have tried this method, but sadly the site I am trying to scrape from returns "error": "invalid_client", "error_description": "The client credentials provided were invalid, the request is unauthorized." Am I out of luck?
@ThijmenCodes 1 year ago
Nice video! Used a similar method to collect European Court of Human Rights case documents since there is no official API. Glad to see such methods gaining popularity online; it’s so useful!
@mattimhiyasmith 2 years ago
I have used the inspect-with-Network-tab method but wasn't aware of the copy as cURL method; thanks for that tip, it will save me a lot of time!
@ЦветанГергинов-с7ю 6 months ago
There is always something new to learn. I’ve been spending hours grinding out this kind of information by hand-writing the whole program to get my result ;D Thanks!
@huisopperman 9 months ago
Thanks for sharing! This has helped me a lot. After struggling for weeks with Selenium, I was able to apply this technique fairly quickly, and am now using it as a source to scrape ETF-composition data to feed directly into a PowerBI dataset. Much appreciated!
@sheikhakbar2067 3 years ago
I always come to your channel for these excellent time-saving tips and tricks! Thank you!
@JohnWatsonRooney 3 years ago
Glad you like them!
@glitchinLife 2 years ago
Nice tutorial on scraping; some tricks I have been using myself, and some others I'd never heard of until now. Thanks for sharing!!! Small adjustments if I may (please don't take this as criticism): I think you don't need to loop over each product to copy it into your res - you can use extend instead. Also, I think the header didn't change, so you can take it out of the loop over pages.
@vinny723 2 years ago
Great tutorial. My screen-scraping job went from 4.5 hours to 8 minutes!!!!!
@GLo-qc8rz 7 months ago
OMG man, I was searching for 3 hrs for how to extract JavaScript data w/o complicated rendering, and your vid gave a 3-second solution. Thank you so much man
@lucasmoratoaraujo8433 1 year ago
Greetings from Brazil! Thank you! I just had to adjust some of the quote marks in the header (there were some 'chained' double quotes (like ""windows"")), making some of the header's strings be interpreted by Python as code, not text. I just had to change the inner double quotes to single quotes (e.g. "'windows'") and it worked perfectly! Can't wait to try your other tutorials! Once more, thank you very much!
@JohnWatsonRooney 1 year ago
Hey! Thank you! I’m glad you managed to get it to work
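For anyone hitting the same quoting problem: header values copied from the browser (the `sec-ch-ua` family in particular) contain literal double quotes, so in Python the simplest fix is usually to wrap the whole value in single quotes and leave the inner double quotes alone. A minimal sketch; the header values here are made-up examples, not the ones from the video:

```python
# Pasted as-is, this would be a syntax error:
#   "sec-ch-ua-platform": ""Windows""
# Wrapping the value in single quotes keeps the inner double quotes intact:
headers = {
    "sec-ch-ua": '"Chromium";v="110", "Not A(Brand";v="24"',
    "sec-ch-ua-platform": '"Windows"',
    "user-agent": "Mozilla/5.0",  # plain values need no special treatment
}
assert headers["sec-ch-ua-platform"] == '"Windows"'
```

The servers expect the double quotes to be part of the value, so swapping them for single quotes inside the string (rather than around it) can change what actually gets sent.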
@eakerz5642 1 year ago
Tnx :) Went from 1 hour of scraping with Selenium to 1 minute just getting the JSONs.
@michelramon5786 8 months ago
I went from "hm, okay, yeah" to "HOLY SHIT, THAT'S THE DOPEST SHIT I'VE EVER SEEN". I'm starting to get into this niche and I intend to learn more Python and SQL (you know, Data Analysis stuff/jobs), and I'm doing a project to scrape NBA statistics, but there are always some errors and it ends up taking a long time. BUT THIS IS GOLD CONTENT, KEEP IT UP
@Oiympus 2 years ago
Nice tips; it's always fun to poke around and look at what data webpages are using
@BIO5Error 3 years ago
Very, very interesting - I'm going to give this a go myself. Cheers for another great video John.
@krahkyn 9 months ago
This is such useful content that shows how much value experience gives - thank you for the straightforward and realistic tutorial!
@isaacmartinez442 13 days ago
Wow, I love this! I was able to do it!! I did have to adapt it to my own situation, but still. Thank you so much
@unknownuser993 2 years ago
Wow, that ‘generate code’ feature is super useful. Thanks!
@Josh-kw7zk 8 months ago
Thank you so much for this tutorial. It helped me a lot on my project. And I learned a lot of new things that I didn't know. Thank you!
@aidanomalley8607 1 year ago
Thank you, your videos have automated my job. All I need now is an AI cam of myself
@mujeebishaque 2 years ago
I love you, John. You're awesome! Thanks for being unique and producing quality content.
@milosZcr 2 months ago
Great content, very useful now that I am learning about this subject. You earned a new sub here
@BOSS-AI-20 1 year ago
This video is really amazing. I learned web scraping from your videos, thanks
@klabauter-ev4ix 1 year ago
That was incredibly helpful and exactly what I needed today. Your presentation is very clear. Thank you!
@rameshks5281 3 years ago
Easy to understand and very neat & clean narration. Keep it up 🙂
@JohnWatsonRooney 3 years ago
Thanks a lot 😊
@phoenixflower1225 1 year ago
Thank you so much - this is so insightful and educational. Really helped me understand so many things in so little time.
@ScottCov 2 years ago
John, great video... Thanks for taking the time to do this!!!
@krims15 2 years ago
Nice tutorial. But one important thing you haven't mentioned is that most such APIs usually have some sort of authorization (based on headers, referrer, token, key, whitelist, etc.).
@walteredmond7904 2 years ago
Yeah, for sure - I work for MS and all APIs are whitelisted. Wish I could access them as a public user lol
@maskettaman1488 2 years ago
Some do. Most 'public' ones (e.g. no account needed) will not. Even then, figuring out what they do for auth is often trivial
@maskettaman1488 2 years ago
@@csharpminorflat5 They're not going to IP block someone for shopping on their site too much lmao
@Al3xdude19 2 years ago
@@maskettaman1488 I’ve been IP blocked for web scraping before. Then again, I didn’t purchase anything. I was taking the photos lol
@maskettaman1488 2 years ago
@@Al3xdude19 You have to REALLY deviate from normal behaviour to catch blocks like that lol. If you fire off requests as fast as possible then yeah, you'll probably get caught
@muhammadrehan3030 3 years ago
Thank you for such wonderful videos. I learned a lot from you. BTW your video quality and background are always very beautiful.
@JohnWatsonRooney 3 years ago
Thanks! It's nice of you to mention the video quality and background, I do my best!
@davida99 2 years ago
Wow! I just found a gem of a channel! Love your content!
@JohnWatsonRooney 2 years ago
Thanks, appreciated!!
@MikeD-qx1kr 11 months ago
John, a specific video about how to scrape a React website would be nice. It uses a mix of HTML and JSON data on pages... just an idea. Keep up the good work, loving it.
@codydabest 2 years ago
This was almost exactly my job back in 2014/2015 for a giant e-com shoe company. It was always nice when you'd come across a brand that included their inventory count in their API. But yes, selenium/watir all day lol
@JohnWatsonRooney 2 years ago
I’m often quite surprised how much info you can easily get!
@davidl3383 2 years ago
Brilliant, I've started doing that and it's very effective. Good channel and good job. Thank you John
@phoenixflower1225 1 year ago
This is seriously high-level content right here
@Moiludi 10 months ago
Thank you! It provided a new way of thinking about the problem of collecting data. 🙏
@mrklopp1029 3 years ago
Thank you for these videos. They're extremely helpful. Keep up the good work! 🙂
@JohnWatsonRooney 3 years ago
Glad you like them!
@alexcalish9774 2 years ago
Wow, I think this one tutorial will do more to level up my scraping than I could have ever imagined. Bye bye Selenium.
@JSaretin 2 years ago
Instead of looping over the results again, you can use res.extend(products)
@JohnWatsonRooney 2 years ago
Yes absolutely, thank you for sharing
@ninjahkz4078 2 years ago
Lol, I hadn't thought of a possibility to get an API like that until now haha. Thanks a lot!
@Davidca12 2 years ago
This single-handedly cut the running time of my program from literal hours to a couple of minutes; cannot thank you enough!
@JohnWatsonRooney 2 years ago
Brilliant, thanks!
@sakibullah3577 2 days ago
Really nice, juicy piece of knowledge. This XHR tab changes the game. However, there's still the issue of how to tackle cookie expiry, and what to do if the API needs a JWT token or private key
@pascal831 1 year ago
Thanks John! You are a lifesaver, sir!
@Rob-ky1ob 2 years ago
Instead of looping over the list and appending each individual item, you can do list().extend(list()), which extends the list with the new list. The result is one list of dictionaries (basically an identical result to how you did it) but with less and cleaner code.
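A quick sketch of the suggestion from this thread; the `products` variable stands in for the list parsed out of one page of the API response:

```python
products = [{"id": 1}, {"id": 2}]

# Appending each item in a loop:
res = []
for product in products:
    res.append(product)

# list.extend does the same in one call:
res2 = []
res2.extend(products)

assert res == res2
```

Note that `extend` adds the items themselves, whereas `res.append(products)` would nest the whole list as a single element.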
@bigstupidtree3771 2 years ago
This has saved me hours today. Genuinely, thank you. 🙇‍♂️
@JohnWatsonRooney 2 years ago
That’s great, thank you for watching!
@Jiloh5 2 years ago
It worked like a charm! I really needed this. Thanks
@voinywolnyprod3046 2 years ago
Quite interesting! Thank you so much for showing such nice tricks; gonna get familiar with Insomnia.
@inspecteurbane5666 2 years ago
Thanks a lot, very interesting video; I learned so many things that I didn't know. I will come back for sure!
@RS-Amsterdam 3 years ago
You make stepping into scraping and developing easy and fun. Thanks for sharing!!
@JohnWatsonRooney 3 years ago
Thank you!
@irfanshaikh262 2 years ago
John, you make scraping interesting and motivating simultaneously. Good that I found your channel.
P.S. I lost it at 0:10 😂
@techkhid4836 18 days ago
Thank you soooo much..... I needed THIS!
@RatoCanguru_Lucas 2 months ago
Man, this is gold. Thanks for sharing!
@agsantiago22 2 years ago
Great video! Thanks so much for sharing! I think you should consider some academic research program (if you haven't already). I am sure you would do amazing work. Congrats and thanks again!
@JohnWatsonRooney 2 years ago
Thanks for watching, I’m glad you enjoyed it!
@TheEkkas 2 years ago
Such a nice vid; if there was a VPN ad, I didn't even notice it!
@lagu1ful 2 years ago
Thank you for the information that you have explained; it is very helpful for the research I am doing
@lagu1ful 2 years ago
Thank you so much
@jawadch8723 2 years ago
This feels illegal. And I love it!
@JohnWatsonRooney 2 years ago
Haha
@amirahmed5905 1 year ago
Hi there, I found your channel, where each and every video is delicately made for web scraping and automation, which helps me a lot as I work with web scraping and web automation. I have a request: if possible, please make a video on Python POST requests against a stateful API and how to mimic cookies and sessions to get the job done. Thank you.
@im4485 3 years ago
Hi John. Amazing content as always. Do you think I can skip learning Scrapy for now? Can I do most of the scraping tasks just by using BS and requests-html?
@JohnWatsonRooney 3 years ago
Sure you can. If it works for you then carry on!
@edgarasben 2 years ago
This is amazing! So many things I didn’t know.
@ChrisS-oo6fl 2 years ago
Complete rookie here. I’m trying to understand scraping to help access my lap times from the MyLaps API using my own interface. This is intimidating for a novice like me.
@lokeshchowdary7487 2 years ago
Thank you for making this awesome tutorial.
@vintprox 2 years ago
DevTools is a Swiss Army knife for this kind of reverse engineering! It never gets old for me.
@tobias2688 2 years ago
Hi John, I loved the video so much that I had to join Patreon to subscribe to you there. Thanks!
@JohnWatsonRooney 2 years ago
Hey! Thank you very much!
@ericbwertz 2 years ago
Nice video -- perfect level of detail.
@JohnWatsonRooney 2 years ago
Thanks
@bronxandbrenx 3 years ago
My master in data extraction.
@ernestomacias5192 2 years ago
Well, it sure is a lot better than the API pulling I had to do at my last job; that was nothing short of a nightmare.
@giovannimilana6428 1 year ago
Huge thanks, this video was a game changer for me!
@Adam-xr6fj 2 years ago
I used to do this in PHP without knowing better. Nice to see how it's done in Python.
@elliotnyberg9332 2 years ago
This is amazing and will help me a lot. Thank you!!
@nadyamoscow2461 3 years ago
Clear and helpful as usual. Thanks a lot!!
@tubelessHuma 3 years ago
This is a very useful trick, John. 💖
@xXEnigmaXx001 1 year ago
This works in a lot of cases where the API is open. However, in cases like social media platforms, where you have to have an account to access the API, or WordPress websites where the API is turned off, it won't work. The best approach in those situations is really just to use Selenium or anything close and crawl the pages with a delay.
@JohnWatsonRooney 1 year ago
Yeah, as you say, anything that needs a login is much more tricky; in some cases you can pass the cookie and headers around and maintain the session, but sometimes Selenium/Playwright is the best option
@shaikusman536 1 year ago
John, brother, your content is amazing.......... Please improve the audio quality... Respect from INDIA.
@randyallen8610 1 year ago
Great content. Thanks for this video
@PatoToledo23 2 years ago
I was using this too, and Burp Suite to have more control at times
@neekolad 8 months ago
I've experienced being blocked when scraping a website, but can I be blocked when I scrape through the API? Should I use sleep like when scraping an actual website? Keep up the good work; there's not much wholesome web scraping content out there
@JohnWatsonRooney 8 months ago
You can still be blocked, yes, but this way is the best for getting the data - fewer requests and more data per request
@neekolad 8 months ago
@@JohnWatsonRooney Okay, so if I can, should I get information for 10 different products in one API call instead of making 10 calls and getting information about one product in each? Even if it's the same amount of data, is it easier for the server to handle?
@joseluisdiaztorres825 1 year ago
Thank you so much for the tutorial. I have a question: how do I get the authentication value that goes in the header? Can I do it automatically, without Selenium? At the moment I get it manually in the Network tab; also, the authentication value expires after a while.
@JohnWatsonRooney 1 year ago
I don't think you can, no - what I do is load up the page once with Selenium, grab all the headers and cookies, and use them in subsequent requests using this method.
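The workflow described here - capture cookies in a real browser session, then reuse them with plain HTTP requests - can be sketched as below. The helper only formats the cookies; `driver.get_cookies()` is the actual Selenium call that would produce its input, and the commented usage is a hypothetical example:

```python
def cookie_header(browser_cookies):
    """Collapse cookies captured from a browser (the list-of-dicts shape
    returned by Selenium's driver.get_cookies()) into a single Cookie
    header value usable with requests, urllib, or any other HTTP client."""
    return "; ".join(f"{c['name']}={c['value']}" for c in browser_cookies)

# Sketch of the full flow (not run here):
#   driver.get("https://example.com")          # real browser sets the cookies
#   headers["Cookie"] = cookie_header(driver.get_cookies())
#   data = requests.get(api_url, headers=headers).json()
```

When the cookies expire, repeating the one-off browser load to refresh them is usually cheaper than driving the whole scrape through the browser.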
@swaidaslam8266 2 years ago
Wow, I just did not know something like that existed. Thanks :)
@JohnWatsonRooney 2 years ago
That’s great, thanks for watching
@samsloot 2 years ago
Do you have any tips to get, for example, your own Instagram followers? When I copy the cURL request and paste it into Insomnia, I don't get the same JSON as the browser; instead I get some HTML which previews the Instagram logo. I assume it has to do with some authentication, but I have no idea how to fix it.
@tnssajivasudevan1601 3 years ago
Great information, Sir, really helpful.
@chadgray1745 2 years ago
Thanks for this - and other - videos, John. Super helpful! Regarding the cookie expiring, can you suggest a way to use Playwright to programmatically generate the cookie used in the API request? I am assuming that cookie isn’t the same as the cookie used for the request of the HTML, but maybe that’s wrong?
@TypicallyThomas 2 years ago
Thanks so much. This makes things a lot easier
@JohnWatsonRooney 2 years ago
Great to hear!
@SkySesshomaru 3 years ago
This is gold man, thank you! Just WOW.
@JohnWatsonRooney 3 years ago
Thanks!