I signed up for IPRoyal after watching your videos but couldn't get the static proxies to work. They returned a timeout error. I tried changing the network settings but couldn't solve the problem. Do you have any idea what could cause issues like this? Thanks.
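When a proxy times out, it helps to test it on its own, outside the scraper, with an explicit timeout. Possible causes include a wrong host or port, credentials that were mistyped, or your IP not being whitelisted for IP-authenticated proxies. Below is a minimal sketch using only the standard library. The proxy URL format and the `httpbin.org/ip` test endpoint are assumptions; check your provider's dashboard for the real host, port and auth method.

```python
import socket
import urllib.error
import urllib.request


def check_proxy(proxy_url, test_url="http://httpbin.org/ip", timeout=10.0):
    """Send one request through an HTTP proxy and return (ok, detail).

    proxy_url is assumed to look like "http://user:pass@host:port" (or
    "http://host:port" for IP-whitelisted proxies); test_url is only an
    example endpoint, use any page you trust.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.URLError as exc:
        # Connect-phase failures and HTTP errors (e.g. 407) land here.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return False, "timed out while connecting"
        return False, f"failed: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # Connected, but the proxy never sent a complete response in time.
        return False, "timed out waiting for a response"
    except OSError as exc:
        # Anything else at the socket level (resets, etc.).
        return False, f"socket error: {exc}"
```

If this reports a connect timeout, the problem is most likely between you and the proxy (address, port, firewall, whitelist) rather than in the scraping code.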
Hey, the video is really, really helpful. Thank you very much for it! You are the go-to channel for me whenever I want to research any topic related to web scraping. You're doing a great job, man! Also, at the end of the video you said that this is not your preferred method for scraping infinite-scroll dynamic websites. So what is your preferred method, ideally one that's also scalable?
Thanks, I’m glad you’re enjoying the content! The way I mentioned is reverse engineering the site's backend API and making requests to it directly. I have a few videos on my channel that explain the basics of this idea!
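The "reverse engineer the backend API" approach usually means finding the XHR/fetch request the page fires while you scroll (in the browser's Network tab) and then calling that endpoint yourself, page by page, until it returns nothing. A minimal sketch, assuming a hypothetical endpoint `https://example.com/api/items` that takes `page`/`per_page` query parameters and returns `{"items": [...]}`; real sites use their own parameter names (offset, cursor, etc.) and may need the same headers the browser sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the XHR/fetch URL you find in the
# browser's Network tab while scrolling the page.
API_URL = "https://example.com/api/items"


def fetch_json(url, params, headers=None, timeout=10.0):
    """GET url?params and decode the JSON body."""
    full_url = f"{url}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(full_url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def scrape_all(fetch=fetch_json, page_size=50, max_pages=100):
    """Request page after page until the API returns an empty batch.

    Assumes "page"/"per_page" parameters and an {"items": [...]} response;
    both are placeholders for whatever the real API uses.
    """
    results = []
    for page in range(1, max_pages + 1):
        data = fetch(API_URL, {"page": page, "per_page": page_size})
        items = data.get("items", [])
        if not items:
            break  # no more data: the point where infinite scroll stops loading
        results.extend(items)
    return results
```

Passing `fetch` in as a parameter keeps the pagination logic separate from the HTTP call, so you can swap in a session with cookies and headers, or a fake for testing, without touching the loop. It also scales better than a browser, since each page is one small JSON request.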
I'm literally just getting started with Python and need a quick study done for my thesis, so I decided to study word usage on Reddit. Should I go through with it? Idk if I need any special stuff :/ I don't even have Python installed. Cheers
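For the analysis side of a word-usage study, plain Python (once installed) is enough; no extra packages are needed to count words. A minimal sketch, assuming you already have the comment texts as a list of strings; collecting them from Reddit is a separate step not shown here. The tiny stop-word list is illustrative only.

```python
import re
from collections import Counter

# A tiny stop-word list for illustration; for a thesis you would likely use a
# fuller list (e.g. from NLTK) and document the choice.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "it", "to", "of", "i", "in", "that"}


def word_counts(texts, stop_words=STOP_WORDS):
    """Count lowercase word tokens across a list of comment strings."""
    counts = Counter()
    for text in texts:
        # Letters and apostrophes only, so "don't" stays one token.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts
```

For example, `word_counts(["The cat sat.", "A cat and a dog!", "Dog days"]).most_common(2)` gives `[('cat', 2), ('dog', 2)]`.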
This is so timely for me, @John, as I was literally building a scraper yesterday for a website that used XHR. Top content! Additionally, would it be possible for you to share the JavaScript code that was used in the PageMethod function?
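The exact JavaScript from the video isn't reproduced in this thread, so the following is only a common infinite-scroll pattern, not necessarily what the video used: scroll to the bottom, wait for new items to load, and repeat. The helper name `scroll_steps`, the number of scrolls and the wait time are assumptions to tune per site. `evaluate` and `wait_for_timeout` are Playwright page methods.

```python
# JavaScript run in the page on each step: scroll to the current bottom so the
# site's infinite-scroll handler requests the next batch of items.
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight)"


def scroll_steps(times=5, wait_ms=1000):
    """Build (method_name, *args) tuples for a page-scrolling sequence.

    Hypothetical helper: each tuple is meant to be turned into
    scrapy_playwright.page.PageMethod(name, *args).
    """
    steps = []
    for _ in range(times):
        steps.append(("evaluate", SCROLL_JS))
        steps.append(("wait_for_timeout", wait_ms))
    return steps
```

In a spider, this would be used as `meta={"playwright": True, "playwright_page_methods": [PageMethod(name, *args) for name, *args in scroll_steps()]}`, with `PageMethod` imported from `scrapy_playwright.page`.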
Thanks for another great video! This method seems so easy and I wanted to try it myself, but unfortunately it seems that scrapy-playwright doesn't work on Windows. Some sort of Linux environment (WSL) is required. Also, thanks for the IPRoyal discount. I was looking for a service like that, and your discount comes at just the right time. I'll use it after the NYE party :) PS: Happy New Year, everyone!
You absolutely can. It’s not as easy to set up and use well, in my opinion, but it fits a specific use case well. However, I don’t think it’s been updated for a while, and some people have told me it hasn’t been working for them recently. Give it a go, and if it works for you, great.
Brother, I'm in big trouble. For the last 20 days I've been trying to scrape one site, but I've failed every time. I've watched hundreds of videos, but I still failed. Can you scrape a site for me? If possible, please reply to my comments. This is my final-year project; just scrape some data for me. My final-year defense is knocking at my door. Please, brother, reply to my comments if you can.
As a newbie... Does anyone have experience with pup, a command-line tool for processing HTML? Is there any way to import it into the Playwright project the same way as the HTMLParser? Thanks.
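As far as I know, pup is a standalone command-line program rather than a Python package, so it can't be imported like a Python parser; the usual workaround is to pipe the page's HTML into it as a subprocess. A minimal sketch, assuming the `pup` binary is on your PATH; the helper name `run_pup` is hypothetical, and `text{}` is pup's display function for printing element text.

```python
import shutil
import subprocess


def run_pup(html, selector, pup_path="pup"):
    """Pipe an HTML string through the pup CLI and return its stdout.

    Returns None if the pup binary can't be found, so callers can fall
    back to a Python parser instead.
    """
    exe = shutil.which(pup_path)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, selector],
        input=html,           # HTML goes to pup's stdin
        capture_output=True,  # collect pup's output instead of printing it
        text=True,
        check=True,           # raise if pup exits with an error
    )
    return result.stdout
```

With Playwright's sync API, that would look like `run_pup(page.content(), "h1 text{}")`. Calling a separate process per page is slower than an in-process parser, so this mainly makes sense if you already rely on pup's selector syntax.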
This is nice, but my problem with using Playwright is that the Twisted reactor always causes issues when I want to run my spiders from Python scripts.
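One likely cause (an assumption, since the exact error isn't given) is that a standalone script doesn't pick up the reactor setting scrapy-playwright needs, because it lives only in the project's `settings.py`, or that something imports `twisted.internet.reactor` before Scrapy installs the asyncio reactor. A sketch of passing the settings explicitly, based on the handler and reactor paths documented for scrapy-playwright:

```python
# Settings scrapy-playwright needs. Passing them straight to CrawlerProcess
# means a plain Python script uses the asyncio reactor even without the
# project's settings.py.
PLAYWRIGHT_SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
}
```

In the script, create `process = CrawlerProcess(settings=PLAYWRIGHT_SETTINGS)` (from `scrapy.crawler`), then call `process.crawl(YourSpider)` and `process.start()`, and avoid importing `twisted.internet.reactor` yourself before that point.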
You could try using Splash? It hasn't been updated in a few years but may still work. Or you could create your own separate scraping/rendering service and use that?
Not something I've done before, but I know the app will have a backend server/API that it makes its requests to. You'd need to find it and reverse engineer it. Or it might be possible to emulate the app on a PC or run it through a browser?
@JohnWatsonRooney Most apps use SSL pinning, so we can't intercept their traffic. To bypass this, we can run the app in Nox Player and use a man-in-the-middle proxy to intercept the traffic from Nox.
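Once the app's traffic actually flows through a man-in-the-middle proxy such as mitmproxy, a small addon script can log which API calls return JSON, which is usually where the data lives. This sketch does not handle SSL pinning itself; it assumes the emulator and certificate setup described above already works. `api.example.com` is a hypothetical host, and `is_target_json` is a hypothetical helper. The `response(flow)` hook and `flow.request`/`flow.response` attributes are mitmproxy's addon interface.

```python
# Hypothetical host filter: set this to the API domain the app talks to.
API_HOST = "api.example.com"


def is_target_json(request_host, content_type, target_host=API_HOST):
    """True if a response comes from the target host and looks like JSON."""
    return target_host in request_host and "json" in content_type.lower()


class ApiLogger:
    """mitmproxy addon that records JSON responses from one API host."""

    def __init__(self, host=API_HOST):
        self.host = host
        self.captured = []

    def response(self, flow):
        # mitmproxy calls this hook once per completed HTTP response.
        content_type = flow.response.headers.get("content-type", "")
        if is_target_json(flow.request.pretty_host, content_type, self.host):
            entry = (flow.request.method, flow.request.pretty_url, flow.response.status_code)
            self.captured.append(entry)
            print(*entry)


# mitmproxy looks for a module-level "addons" list when loading a script.
addons = [ApiLogger()]
```

Saved as, say, `api_logger.py`, it would be loaded with `mitmdump -s api_logger.py`. The logged URLs are the endpoints you would then try calling directly.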
You are a great tutor! I suggest a video discussing and comparing all of these tools: why and when we should use them, and which combo is best. Great work, keep making tutorials!
I watched the section between 4:30 and 5:00 (roughly) so many times. The off-by-one space there was extremely distracting as well as satisfying when fixed. Cheers
@kanwaradnan4849 For API usage, it’s important to look at the payload that is sent. Is it form data or a JSON payload? Also look carefully at the headers; this will fix 9 out of 10 of your issues. Still doesn’t work? You probably forgot to fake some cookies :)
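The form-data vs JSON distinction matters because the server parses the body according to its `Content-Type`; sending JSON where the browser sent form data (or the reverse) often gets an error or an empty result. A minimal sketch that builds, but does not send, a POST request either way, plus headers and cookies copied from the browser. The helper name `build_post`, the URL and the field names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_post(url, payload, as_json, headers=None, cookies=None):
    """Build (but don't send) a POST request matching what the browser sent.

    as_json=True  -> JSON body, Content-Type: application/json
    as_json=False -> form body, Content-Type: application/x-www-form-urlencoded
    """
    if as_json:
        body = json.dumps(payload).encode("utf-8")
        content_type = "application/json"
    else:
        body = urllib.parse.urlencode(payload).encode("utf-8")
        content_type = "application/x-www-form-urlencoded"
    all_headers = {"Content-Type": content_type, **(headers or {})}
    if cookies:
        # Cookies copied from the browser, sent as a single Cookie header.
        all_headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")
```

Compare the built request against the one in DevTools (body format, headers, cookies) before sending it with `urllib.request.urlopen`. Note that urllib normalizes header names, so the stored key is `Content-type`.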
@JohnWatsonRooney I use Selenium. It's powerful, but sometimes some modules don't work properly and that makes me angry 😅 I think I have to move to new solutions, but then I remind myself that I'd have to use proxies. I don't like using proxies; I don't know why, but I'm scared of using them. Are there any free proxies?
I'm wondering if there's a way to use Playwright with Scrapy's shell. For me, scrapy shell just seems to open the browser at the URL and then blocks the shell from opening.
@techlogger So after some digging with Fiddler I found the API and was also able to get the embedded video URL, but I couldn't get it to stream. Since I have little JavaScript knowledge and can't use DevTools (they blocked it), this is as far as I can go for now. I'll try again later 😌. And as you mentioned, this site uses obfuscated JavaScript.
@skshaheen7506 You can unblock DevTools. Stream links (m3u8 links) are mostly restricted; you have to pass the proper headers, payload, or in some cases a decryption key. And yes, that website is heavily obfuscated, but it's doable.
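An m3u8 link is an HLS playlist: a text file listing media segment URLs and, for encrypted streams, an `#EXT-X-KEY` tag pointing at the decryption key. After fetching the playlist with the same headers the browser used (Referer, User-Agent, cookies), the next step is usually parsing it. A minimal sketch of that parsing step only; fetching, the key request and decryption are not shown, and the helper name `parse_m3u8` is hypothetical.

```python
import urllib.parse


def parse_m3u8(playlist_text, playlist_url):
    """Return (segment_urls, key_uri) from an HLS media playlist.

    Segment lines are any non-comment, non-empty lines; relative URIs are
    resolved against playlist_url. For a master playlist the same lines are
    variant playlist URLs instead. key_uri comes from the first #EXT-X-KEY
    tag, or is None if the stream isn't encrypted.
    """
    segments, key_uri = [], None
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-KEY") and key_uri is None:
            # Attributes look like METHOD=AES-128,URI="key.bin",IV=0x...
            # Naive comma split: breaks if a quoted URI contains a comma.
            for attr in line.split(":", 1)[1].split(","):
                name, _, value = attr.partition("=")
                if name == "URI":
                    key_uri = urllib.parse.urljoin(playlist_url, value.strip('"'))
        elif not line.startswith("#"):
            segments.append(urllib.parse.urljoin(playlist_url, line))
    return segments, key_uri
```

The segment and key requests usually need the same headers as the playlist request; a 403 on a segment often means one of them is missing.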