
Learn Python - Web scraping a private API - Questions from the comments episode 2 

Make Data Useful
Subscribers: 32K
Views: 8K

It's questions from the comments time! In this episode, we explore how to scrape a website by using its private API. You will learn about the requests library, functions, for loops and a little bit of pandas. If you ever wonder "Why should I learn programming" I hope this video helps!
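To set the scene, here is a minimal sketch of the technique the video covers: calling a site's private JSON API with requests, paging through results with a for loop, and loading them into pandas. The endpoint URL, parameter names, and the "results" key are placeholders, not the real API from the video.

```python
import requests
import pandas as pd

# Hypothetical private JSON endpoint, discovered via the browser's dev tools
# (Network tab). URL and parameter names are placeholders.
API_URL = "https://example.com/api/search"

def fetch_page(page, per_page=30):
    """Request one page of results from the private API."""
    params = {"page": page, "per_page": per_page}
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Collect a few pages with a for loop, then load into pandas.
records = []
for page in range(1, 4):
    payload = fetch_page(page)
    records.extend(payload.get("results", []))  # key name is an assumption

df = pd.DataFrame(records)
print(df.head())
```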

Published: 16 Sep 2024

Comments: 51
@danielderma • 4 years ago
I'm taking two of the most highly rated web scraping courses on Udemy, and they don't have half of your production quality; your teaching is great. Success for the future! Éxito.
@MakeDataUseful • 4 years ago
Thank you!! Plenty more to come :)
@rafidrahman8654 • 4 years ago
Helloooo, thank you so much for literally making a video about my comment. I learned so much about Python and API requests. You are one of the best teachers on YouTube, period. This gave me a head start on my project and I can't wait to complete it! However, there is one issue I am facing. By my calculation, 1,000,000 job listings at 30 per page should mean 1,000,000 / 30 = 33,333 pages. But whenever I cross the 332-page mark and go to pages like 400 or 500, I get the following message: "{"message":"[query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000] but was [1000020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.","error":{"status":500}}". This is a major problem. Given the genius you are, I am sure you will come up with an idea to fix this lol. Thanks in advance!
@MakeDataUseful • 4 years ago
Unfortunately, it looks like you are at the mercy of the data provider. I did a bit of googling and only found these two Stack Overflow questions: stackoverflow.com/questions/63086701/web-scraping-a-job-platform-with-1-million-listings and stackoverflow.com/questions/63097845/web-scraping-api-see-the-scroll-api-for-a-more-efficient-way-to-request-large The answers confirm that the website caps any single query at 10,000 results. But that 10k cap applies per filter, and one of the filters is longitude/latitude based. What this means is that, in theory, we could script a clever bit of Python to walk the earth, collecting job listings along the way. Open up your dev tools and double-click on the map to see the API calls flowing in. If walking the earth sounds a little painful and time-consuming, you could also find a list online of all the major capital cities around the world with their longitude and latitude as a good start. Best of luck! - Adam
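A minimal sketch of that "walk the earth" idea: loop over known city coordinates and issue one capped query per location. The endpoint, parameter names, and response keys are assumptions, and the city list is a tiny placeholder for a real coordinates dataset.

```python
import requests

API_URL = "https://example.com/api/jobs"  # placeholder endpoint

# A tiny starter list; in practice, load capital-city coordinates from a CSV.
cities = [
    {"name": "London", "lat": 51.5074, "lng": -0.1278},
    {"name": "Tokyo", "lat": 35.6762, "lng": 139.6503},
]

all_listings = []
for city in cities:
    # Parameter names are invented; copy the real ones from dev tools.
    params = {"lat": city["lat"], "lng": city["lng"], "radius_km": 50}
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    listings = resp.json().get("results", [])  # key name is an assumption
    print(f"{city['name']}: {len(listings)} listings")
    all_listings.extend(listings)
```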
@rafidrahman8654 • 4 years ago
@@MakeDataUseful So, unfortunately, I am only able to scrape 10,000 listings? :( Is it possible to use Selenium WebDriver to scroll through all the pages (like you did in the YouTube scraping video) and then collect all the data from the HTML? Just a thought haha!
@MakeDataUseful • 4 years ago
@@rafidrahman8654 Sadly not; Selenium will simply be triggering the same API. I would suggest applying a combination of different filters - that way you should be able to get hundreds of thousands of listings. The 10k limit only applies to the current query.
@rafidrahman8654 • 4 years ago
@@MakeDataUseful Okay, understood! Can I apply the filters through my Python code?
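Yes - if the filters appear as query-string parameters in dev tools, they can be set from Python. A hedged sketch of looping over filter combinations to get around the per-query cap; the endpoint, filter names, values, and the "id"/"results" keys are all invented for illustration.

```python
import itertools
import requests

API_URL = "https://example.com/api/jobs"  # placeholder endpoint

# Invented filter values; copy the real names from the request in dev tools.
categories = ["engineering", "marketing", "sales"]
job_types = ["full-time", "part-time"]

results = []
for category, job_type in itertools.product(categories, job_types):
    params = {"category": category, "type": job_type, "page": 1}
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    results.extend(resp.json().get("results", []))

# Listings can match several filters, so de-duplicate on a unique id field.
unique = {item.get("id"): item for item in results}
print(f"{len(unique)} unique listings")
```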
@tadashi_hamada • 3 years ago
Loved the enthusiasm when you were checking the website for data! This was a great course, just what I needed. You've got a new subscriber :)
@MakeDataUseful • 3 years ago
Thank you! Appreciate the feedback.
@trentemollerules • 4 years ago
Great production and knowledge - you made it look really tight. I also like the way you explained it all. Subscribed!
@MakeDataUseful • 4 years ago
Hey thanks so much! I am still working through my audio levels but really appreciate the positive feedback. Thanks for subscribing!
@trentemollerules • 4 years ago
@@MakeDataUseful I have been struggling to scrape www.snakeriverfarms.com/american-kobe-beef.html - I was unsuccessful in finding a private API to call, and had limited success scraping a single weight and price from one of their pages. The issue is that with a drop-down menu for selecting each weight of steak, it seems hard to use bs4 to extract any information. Is this a use case for Selenium? For reference, I was able to scrape holygrailsteak.com/collections/japanese-wagyu with much success.
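Before reaching for Selenium, it's worth checking whether the drop-down's weights are already in the page HTML as plain select/option elements, which bs4 can read directly. A hedged sketch - the tag structure is an assumption about that page; if nothing prints, the menu is built by JavaScript and Selenium is the fallback.

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.snakeriverfarms.com/american-kobe-beef.html"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
soup = BeautifulSoup(resp.text, "html.parser")

# If the weights are plain <option> elements, this lists their values and labels;
# if they are rendered by JavaScript, the loop prints nothing.
for option in soup.select("select option"):
    print(option.get("value"), option.get_text(strip=True))
```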
@jcinthailand • 4 years ago
Hey!! Yes, exporting to SQL would be a very nice thing to know.
@mrhide5690 • a year ago
Amazing value. I wish you'd make videos like this (make money with Python) all the time! Thank you!
@christoph231090 • 4 years ago
Really good video on web scraping. Please do more videos of this kind. If you could do a video on how to feed the data into a SQL database, that would be awesome - thx.
@MakeDataUseful • 4 years ago
Can do, Christoph! Thanks for the feedback - keep an eye out for an upcoming video about saving to a database :)
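For anyone who can't wait for that video, a minimal sketch of saving scraped rows to SQLite with pandas; the DataFrame contents, database file, and table name are placeholders.

```python
import sqlite3
import pandas as pd

# Stand-in for the DataFrame built from the scraped API results.
df = pd.DataFrame([{"title": "Example job", "location": "Remote"}])

conn = sqlite3.connect("scraped_data.db")
df.to_sql("listings", conn, if_exists="append", index=False)

# Read it back to confirm the rows landed.
print(pd.read_sql("SELECT * FROM listings", conn))
conn.close()
```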
@christoph231090 • 4 years ago
@@MakeDataUseful Thank you for the very quick reply. Perfect! I am looking forward to it. I subscribed to your channel and rang the bell, so I'll not miss it. Keep going - with such well-presented content, 1000 subscribers should be no problem before 2020 ends ;)
@nomoremeetings1183 • a year ago
This was really good content! I was able to follow along on my system and got the same results.
@justin3594 • 4 years ago
Awesome.
@MakeDataUseful • 4 years ago
Thanks!
@DeepDiveinUniverse • 4 years ago
Great, man!! I will be using this in the near future 😀😀
@bozok1903 • 4 years ago
Great tutorial! Thanks a lot.
@royteicher • a year ago
Thank you bro!
@FullSimDriving • 4 years ago
Great video, thank you
@MakeDataUseful • 4 years ago
You are welcome!
@rverm1000 • 2 years ago
Nice.
@KoldbyTheEye • 3 years ago
Awesome video! Subscribed! Quick question for you though: on the scraping project I'm working on, when I go to copy the cURL bash into the converter as you did, mine has a cookies section as well as the headers, params, data, and Python request code. What do you think that means about the site I'm scraping? Should I delete the cookies section of the conversion? Cheers, Joe
@MakeDataUseful • 3 years ago
Test it without and see how you go. If it doesn't want to play fair, you may need to look at using a requests Session to collect those cookies and use them in your request. If all else fails, I have a couple of tutorials on using Selenium browser automation that may help. Best of luck, and let me know if you get stuck!
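A sketch of that Session approach: hit a normal page first so the server sets its cookies on the session, then reuse the same session for the API call. Both URLs are placeholders.

```python
import requests

session = requests.Session()

# Visiting the regular page first lets the server set its cookies on the session.
session.get("https://example.com/", timeout=30)  # placeholder URL

# The same session now sends those cookies automatically - no cookies= argument needed.
resp = session.get("https://example.com/api/data", timeout=30)  # placeholder URL
print(resp.status_code)
```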
@KoldbyTheEye • 3 years ago
@@MakeDataUseful Thanks! I tried removing the cookies section and got: NameError: name 'cookies' is not defined. I guess the cURL has it there for a reason! I will just keep using that one. I guess it just means that the site I'm scraping will be able to identify that it is me every time I scrape it, right?
@MakeDataUseful • 3 years ago
@@KoldbyTheEye Interesting error - double-check your requests.get() and make sure there is no cookies=cookies in there.
@KoldbyTheEye • 3 years ago
@@MakeDataUseful I just checked, and indeed there is cookies=cookies in my requests.post(). Should I delete that if I'm trying to remove the cookies line altogether? Thanks again, man!
@KoldbyTheEye • 3 years ago
Okay, after typing this I went and tried it, and voilà, it worked! So I guess that means the cookies are just optional? I tried googling what cookies really mean in this situation but couldn't find a clear answer. What would be the benefit of keeping the cookies line the cURL gave me vs just deleting it all?
@dextm8783 • a year ago
I'm high af and that intro made me laugh
@golamrahman732 • 3 years ago
Great lesson!!! It is really good for slow learners. Can someone tell me why the data df has only one row and one column?
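That usually happens when the whole JSON response is handed to pandas as a single nested object. Extracting the inner list of records (or using pd.json_normalize) gives one row per listing. A sketch with a simulated response; the "results" key name is an assumption.

```python
import pandas as pd

# Simulated nested API response; a real one would come from response.json().
data = {"total": 2, "results": [{"title": "Job A", "city": "Sydney"},
                                {"title": "Job B", "city": "Perth"}]}

# pd.DataFrame([data]) yields one row and one "results" column;
# normalizing the inner list instead yields one row per listing.
df = pd.json_normalize(data["results"])
print(df)  # two rows, one column per field
```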
@richardfitzgerald7896 • 3 years ago
Thank you for the great content! I'm wondering if there is any way to get this approach not to fail when there is JavaScript, or at least to be accepted as a real, current browser. I'm aware that copying out the cURL provides all the headers, user agents, etc., but some websites still seem able to tell that it is not a real browser. Perhaps JavaScript not rendering properly gives it away? Any thoughts would be much appreciated!
@MakeDataUseful • 3 years ago
Hey, yeah, an alternative route is to use Selenium and automate the browser. I have a couple of videos on my channel showing logging in and scraping with Selenium.
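A minimal Selenium sketch for JavaScript-heavy pages (Selenium 4 API, which manages the Chrome driver itself; the URL and tag choice are placeholders):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # requires Chrome installed
try:
    driver.get("https://example.com")  # placeholder URL
    # The page's JavaScript runs in a real browser, so rendered content is available.
    for heading in driver.find_elements(By.TAG_NAME, "h2"):
        print(heading.text)
finally:
    driver.quit()
```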
@alenjose3903 • 4 years ago
Shouldn't lat come first and then lng?
@velvetcasuat • 4 years ago
Hey man... what is the tool you use to work with the data? Edit: It's Jupyter Notebook... I figured it out watching the first episode of "Making Money with Python".
@MakeDataUseful • 4 years ago
Hi Velvet, I use a mixture of NumPy and pandas to clean, transform and analyse data.
@barrsido7070 • 4 years ago
Hi, it's Barrsido from Reddit. I'm having another problem with my code. Most of the game is done, but for some reason the 'bal' variable does not update, so during the betting, results, and scoring phases it messes up on the second run. I put it back into the codeshare. Please message me on Reddit if you see this.
@MakeDataUseful • 4 years ago
Hey Barry, will do!
@barrsido7070 • 4 years ago
@@MakeDataUseful Hi, I've actually figured out how to fix it for the majority of the program. The only spot where it's still wrong is when you get a blackjack: the 'bal' doesn't update.
@thewilltejeda • 4 years ago
Is there any way to send a request in order to discover potentially acceptable parameters? (Once you've already found a useful API cURL.)
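One low-tech approach is to re-send the captured request while toggling candidate parameters and watching how the status code and result count change. A hedged sketch - the endpoint and every parameter name here are invented; real APIs often silently ignore unknown parameters, so compare result counts rather than relying on errors.

```python
import requests

API_URL = "https://example.com/api/search"  # placeholder endpoint
base_params = {"q": "python"}               # parameters copied from the real cURL

# Candidate parameters to probe, all invented for illustration.
candidates = {"sort": "date", "per_page": 100, "remote": "true"}

for name, value in candidates.items():
    params = dict(base_params, **{name: value})
    resp = requests.get(API_URL, params=params, timeout=30)
    body = resp.json() if resp.ok else {}
    print(name, resp.status_code, len(body.get("results", [])))
```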
@DeepDiveinUniverse • 4 years ago
One question, it may be related... I have a user ID and password for a website, and I want to scrape data from behind the login. How do I use this technique (the one you showed in the video) to scrape that data? If you could give some hints or direction that would be great - or, if you get time, maybe make a video 😀😀. Thanks 👍
@MakeDataUseful • 4 years ago
Hi Banshidhar, great question. I made an auto-login and web scraping video, available here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-BZMVoYhA7KU.html All the best, Adam
@arfankhan546 • 10 months ago
Can we use a while loop to scrape the pages automatically?
@MakeDataUseful • 10 months ago
I get nervous with while loops... so many infinite loops 🤣
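A while loop is fine if it has a hard stop. A sketch that pages until the API returns an empty page, with a safety cap so it can never loop forever; the endpoint and the "results" key are placeholders.

```python
import requests

API_URL = "https://example.com/api/jobs"  # placeholder endpoint
page, max_pages = 1, 500                  # safety cap against infinite loops
records = []

while page <= max_pages:
    resp = requests.get(API_URL, params={"page": page}, timeout=30)
    resp.raise_for_status()
    batch = resp.json().get("results", [])  # key name is an assumption
    if not batch:  # an empty page means we've run out of results
        break
    records.extend(batch)
    page += 1

print(f"Collected {len(records)} records across {page - 1} pages")
```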
@gamingsociety5370 • 4 years ago
Please share the code.py file with us.
@nevinurey2997 • 3 years ago
Vote up for SQLite.
@pupukdspadb7385 • 3 years ago
May I have your email address, sir? I need your guidance on how to scrape a website that requires a username, password, and a captcha to log in.