
Python Web Scraping Tutorial: scraping dynamic JavaScript/AJAX websites with BeautifulSoup 

Red Eyed Coder Club
16K subscribers · 31K views

This Python web scraping tutorial is about scraping dynamic websites, where the content is rendered by JavaScript.
For this tutorial I used the Steam Store as an example, because the Steam website is a heavily JavaScript/AJAX-driven site with dynamic content.
To scrape the Steam Store with Python I used only the Requests and BeautifulSoup (bs4) libraries, and then exported the scraped data to a CSV file.
This tutorial is a detailed explanation, aimed at absolute beginners, of how to scrape JavaScript-driven pages and websites with Python and the BeautifulSoup library.
To install BeautifulSoup, Requests and Lxml:
pip install bs4 requests lxml
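As a rough illustration of the request helper that the timecodes below refer to as get_html(), here is a minimal sketch; the Steam endpoint and parameters shown are assumptions for illustration only (the real XHR URL is found in the browser's Network tab during the video):

import requests

def get_html(url, params=None):
    # Send a plain GET request with a browser-like User-Agent
    # and return the response body as text.
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers, params=params, timeout=10)
    response.raise_for_status()
    return response.text

# Hypothetical example call: the XHR endpoint the Steam search page requests
# in the background (check DevTools > Network for the exact URL and params).
html = get_html('https://store.steampowered.com/search/results/', params={'page': 1})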
Follow me @:
Telegram: t.me/red_eyed_coder_club
Twitter: twitter.com/CoderEyed
Facebook: redeyedcoderclub
======================================
📎️ The SOURCE CODE is available via Patreon:
/ steam-store-with-35670113
======================================
Timecodes:
00:00 - Beginning
01:09 - Preliminary research (what to scrape)
03:15 - Creating a function that performs GET requests to the Steam Store
06:01 - Server response research: which URL should be passed to the get_html() function
09:24 - The scraping plan
09:43 - Getting all Steam Store games with Python Requests and BeautifulSoup; scraping pagination
12:40 - The algorithm for scraping all pages using pagination GET requests (see the sketch after these timecodes)
16:35 - Scraping data from a single page of games
25:30 - Scraping data for all games on each page, including the data from the hover window
38:40 - Writing the scraped data to a CSV file
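And a rough sketch of the pagination-and-CSV part of that plan, reusing the get_html() helper sketched above; the endpoint, parameters and CSS selectors are illustrative assumptions, not necessarily the exact ones worked out in the video:

import csv
from bs4 import BeautifulSoup

def get_page_data(html):
    # Parse one page of search results into a list of dicts.
    # The selectors are placeholders; take the real ones from the page markup.
    soup = BeautifulSoup(html, 'lxml')
    games = []
    for row in soup.select('a.search_result_row'):
        title = row.select_one('span.title')
        games.append({'title': title.get_text(strip=True) if title else '',
                      'url': row.get('href', '')})
    return games

def main():
    all_games = []
    page = 1
    while True:
        # get_html() is the request helper sketched after the install command above.
        html = get_html('https://store.steampowered.com/search/results/',
                        params={'page': page})
        games = get_page_data(html)
        if not games:      # an empty page means we ran out of results
            break
        all_games.extend(games)
        page += 1

    # Export everything that was collected to a CSV file.
    with open('steam_games.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['title', 'url'])
        writer.writeheader()
        writer.writerows(all_games)

if __name__ == '__main__':
    main()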
✴️✴️✴️ Also useful ✴️✴️✴️
Python tutorial: Namespaces and Scopes - • Python tutorial #7: Py...
Python Regular Expressions tutorial - • Regex Python Tutorial:...
Python tutorial: handling exceptions - • Python tutorial #14: P...
How to read and write CSV - • Python CSV tutorial: H...
✴️✴️✴️ Web Scraping course ✴️✴️✴️
is available via Patreon here:
/ red_eyed_coder_club
or on its landing page:
red-eyed-coder-club.github.io...
✴️✴️✴️ PLAYLISTS ✴️✴️✴️
🔹Django 3 Tutorial: Blog Engine
• Python Django Tutorial...
🔹Kivy Tutorial: Coppa Project
• Python Kivy tutorial #...
🔹Telegram Bot with Python (CoinMarketCap)
• Python Telegram Bot Tu...
🔹Python Web Scraping
• Python Ebay Scraping T...
➥➥➥ SUBSCRIBE FOR MORE VIDEOS ➥➥➥
Red Eyed Coder Club is the best place to learn Python programming and Django:
Subscribe ⇢ / @redeyedcoderclub
Python Web Scraping Tutorial: scraping dynamic JavaScript/AJAX websites with BeautifulSoup
• Python Web Scraping Tu...
#python #pythonwebscraping #beautifulsoup #bs4 #redeyedcoderclub #webscrapingpython #beautifulsouptutorial

Published: 30 Jul 2024

Comments: 84
@EnglishRain 4 years ago
Another FANTASTIC topic, amazing! I absolutely love the niche topics you select, thank you so much for sharing your good knowledge my friend.
@RedEyedCoderClub 4 years ago
Thank you very much!
@georgekingsley3972 2 years ago
sorry to be so off topic but does any of you know a trick to get back into an Instagram account..? I was stupid forgot my password. I would love any assistance you can give me.
@robertoclay5729 2 years ago
@George Kingsley instablaster =)
@georgekingsley3972 2 years ago
@Roberto Clay thanks so much for your reply. I got to the site thru google and im waiting for the hacking stuff atm. Takes quite some time so I will reply here later when my account password hopefully is recovered.
@georgekingsley3972 2 years ago
@Roberto Clay it worked and I actually got access to my account again. Im so happy:D Thank you so much you saved my account !
@rustamakhmullaev5697 4 years ago
very useful lesson, thank's for your job!
@ticTHEhero 3 years ago
that was exactly what i was looking for, thanks man
@RedEyedCoderClub 2 years ago
Thanks for watching
@bingchenliu1854 3 years ago
That is what exactly I'm searching for! Thank you, man!
@RedEyedCoderClub 3 years ago
Thanks for watching!
@abrammarba 6 months ago
This is great! Thank you! 😃
@igorbetkier856 2 years ago
Such a great tutorial! Thank you for that!
@RedEyedCoderClub 2 years ago
Thanks for watching, and for the comment!
@user-nt1uf4gl1i 3 years ago
finally, i have found you! thx for videos.
@RedEyedCoderClub 3 years ago
:)
@youngjordan5619 3 years ago
awesome. Always had problem with infinity scroll and used Selenium. Now I know how to do it with bs4 thanks to you, cheers :)
@RedEyedCoderClub 3 years ago
Glad you like my video! Thanks for watching!
@KekikAkademi 4 years ago
this trick is awesome !
@KekikAkademi 4 years ago
please more crawling and scraping trick, without scrapy,selenium etc. for pyqt5 gui projects and telegram bot projects :)
@JoJoSoGood 3 years ago
Best video ever ...I will follow your channel from now on
@RedEyedCoderClub 3 years ago
Thank you!
@Shajirr_ 1 year ago
Tried to use this method with Reddit comment search and it doesn't work - the requests it sends are POST requests. So no conveniently available URL on them which you can use. The requests themselves are JSON objects.
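For that situation a GET-style URL indeed isn't available, but the XHR can often still be replayed with requests by copying the JSON body from the Network tab; a minimal sketch, with a made-up endpoint and payload:

import requests

url = 'https://example.com/api/search'      # hypothetical endpoint copied from DevTools
payload = {'query': 'python', 'page': 1}    # hypothetical JSON body copied from DevTools

response = requests.post(
    url,
    json=payload,   # requests serialises the dict and sets Content-Type: application/json
    headers={'User-Agent': 'Mozilla/5.0'},
    timeout=10,
)
response.raise_for_status()
print(response.json())   # such endpoints usually answer with JSON as well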
@amrhamza9831 3 years ago
thank you a lot this was really helpful to me thanks again
@RedEyedCoderClub 3 years ago
Thanks for watching!
@tazimrahbar7882 3 years ago
Great explanation sir
@RedEyedCoderClub 3 years ago
Thank you!
@user-qz9dk1uj2k 4 years ago
Good job. Thanks for video. I'm click like
@RedEyedCoderClub 4 years ago
Thank you!
@noelcovarrubias7490 3 years ago
I need to scrape data from walmart, which is all in JavaScript . I'm going to watch and try this tomorrow, hopefully it works!
@RedEyedCoderClub 3 years ago
Thanks for watching!
@MrYoklmn 4 years ago
Thank you very much! :) Are you planning a series of lessons on Scrapy? And a second question: could you make a lesson on building a self-populating aggregator (news/products, etc.) with Django, so that the site scrapes and fills itself? I'm trying to implement this with Django and Scrapy, but the problem is launching the parser from Django without blocking the process. In the end I bolted Celery on, but that brings its own difficulties (it throws a reactor error). Or should I not write in Russian on this channel?
@RedEyedCoderClub 2 years ago
Thanks for comment!
@JackWQ 4 years ago
Hi, thanks for this, but I am encountering the website using "Post" method instead of "Get" in the Request Method, thus not able to replicate what you are doing by scraping the IDs first and copy into urls. The page is just constantly loading and then eventually said page not found. Is there a way to bypass this?
@RedEyedCoderClub 4 years ago
Did you try to scrape Steam?
@shortcuts9005 2 years ago
brilliance
@RedEyedCoderClub 2 years ago
Thank you very much!
@duckthishandle 3 years ago
Very, very good video on this topic. The way you are explaining the things helps understanding the whole process behind getting the data! I am trying to access the data on various sites, but sometimes I get an error message that I "do not have the auth token" or "access denied!".. How can I bypass those?
@RedEyedCoderClub 3 years ago
Thank you. Access can be denied for many reasons, and it's hard to say anything definite blindly.
@akram42 4 years ago
awesome
@RedEyedCoderClub 2 years ago
Thanks for comment
@sassydesi7913 3 years ago
This is great! How would you scrape something like teamblind.com? Looks like they have infinite scroll & their payload is encrypted for every call. How would I go about getting historical posts data from this website?
@RedEyedCoderClub 3 years ago
I'll look at it. Thanks for your comment!
@joeking9859 2 years ago
Excellent - best video on XHR (GETs) that I have seen.. great work. Could you do a video on XHR (POSTs) please?
@RedEyedCoderClub 2 years ago
Ok, thanks for your suggestion. POST requests require CSRF tokens, and it can be quite tricky or even barely possible to bypass this protection.
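When the token is just a hidden form field (which is not always the case), a common pattern looks roughly like this; the URLs and field names are made up for illustration:

import requests
from bs4 import BeautifulSoup

session = requests.Session()   # keeps cookies between the GET and the POST

# 1. Load the page with the form and pull the hidden CSRF field out of it.
page = session.get('https://example.com/search', timeout=10)    # hypothetical URL
soup = BeautifulSoup(page.text, 'lxml')
token_input = soup.find('input', {'name': 'csrf_token'})        # hypothetical field name
token = token_input['value'] if token_input else ''

# 2. Send the POST with the token included in the form data.
response = session.post(
    'https://example.com/search/results',                       # hypothetical URL
    data={'query': 'python', 'csrf_token': token},
    timeout=10,
)
print(response.status_code)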
@joeking9859 2 years ago
@@RedEyedCoderClub thank you for your response. OK I will not try to go down that rabbit whole.
@joeking9859 2 years ago
do you see most sites going to this method to protect their sites from being scraped?
@RedEyedCoderClub 2 years ago
Most sites? Not sure. We can always use Selenium or Pyppeteer, for example.
@joeking9859 2 years ago
@@RedEyedCoderClub why would selenium or pyppeteer be better?
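Selenium and Pyppeteer drive a real browser, so the page's JavaScript actually runs before the HTML is read, which helps when no replayable XHR endpoint exists. A minimal Selenium sketch (recent Selenium versions can fetch a matching ChromeDriver themselves), handing the rendered page to BeautifulSoup:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()    # a real Chrome instance controlled by Selenium
try:
    driver.get('https://store.steampowered.com/search/')   # page rendered by JavaScript
    html = driver.page_source  # HTML after the browser has executed the scripts
finally:
    driver.quit()

soup = BeautifulSoup(html, 'lxml')
print(len(soup.select('a')))   # from here on it is ordinary BeautifulSoup parsing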
@akram42 4 years ago
can you host this script online and make it run 24/7 and sent the data to MySQL database? that would be amazing
@RedEyedCoderClub 2 years ago
You can use cron to do it
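A rough sketch of that setup, assuming the mysql-connector-python driver and an already-created table (both are assumptions, not something shown in the video); the commented crontab line schedules the script hourly:

# Crontab entry (edit with `crontab -e`) to run the scraper at the top of every hour:
#   0 * * * * /usr/bin/python3 /path/to/steam_scraper.py
import mysql.connector   # pip install mysql-connector-python

def save_rows(rows):
    # rows: a list of (title, price) tuples produced by the scraper.
    conn = mysql.connector.connect(
        host='localhost', user='scraper', password='secret', database='steam',  # placeholder credentials
    )
    try:
        cur = conn.cursor()
        cur.executemany('INSERT INTO games (title, price) VALUES (%s, %s)', rows)
        conn.commit()
    finally:
        conn.close()

save_rows([('Half-Life', '9.99')])   # example call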
@ThEwAvEsHaPa 2 years ago
great video really well explained. please can you make video showing login/sign in to website with Request sessions and OAUTH
@RedEyedCoderClub 2 years ago
Thank you. I'll think about your suggestion. Do you have any site as an example?
@ThEwAvEsHaPa 2 years ago
@@RedEyedCoderClub Thanks. i dont really have a specfic site in mind, i have just noticed on a few sites i tried to scrape are using oauth and im not sure how to get around it with just requests.
@RedEyedCoderClub 2 years ago
Ok, I'll think about it
@ThEwAvEsHaPa 2 years ago
@@RedEyedCoderClub Thanks bro, keep up the great work
@silvermir84 4 years ago
The While loop doesnt stop @800... what did i wrong? the else: Break doesnt work @ 15:47
@RedEyedCoderClub 4 years ago
How can I know what you did wrong? Check the conditions for breaking the loop.
@user-bj7rl8zd4o 4 years ago
He interrupted the loop by himself
@avinashmahendran6067 2 years ago
Did you get the solution for this error...
@RedEyedCoderClub 2 years ago
What video should I make next? Any suggestions? *Write me in comments!*
Follow me @:
Telegram: t.me/red_eyed_coder_club
Twitter: twitter.com/CoderEyed
Facebook: fb.me/redeyedcoderclub
Help the channel grow! Please Like the video, Comment, SHARE & Subscribe!
@EnglishRain 4 years ago
I have a challenge for you: 😜 Can you login to WhatsApp Web using Requests library without manually scanning the QR code & without using Selenium? I achieved it using Saved Profile in Selenium but just curious if you can do it using Requests library. Thanks!
@RedEyedCoderClub 4 years ago
Interesting idea. But I'm afraid of WhatsApp they can ban my phone number. They really don't like our "style". I'll think about your suggestion, it's interesting.
@EnglishRain 4 years ago
@@RedEyedCoderClub haha yes, i understand. No worries, let it be, i was just thinking aloud. :)
@mrpontmercy8906 4 years ago
hmm. At the very first step, it finds only 28 links, and then returns an empty list
@RedEyedCoderClub 2 years ago
Thanks for comment
@adrianka9405 4 years ago
def main():
    all_pages = []
    start = 1
    url = f'www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'
    while True:
        page = get_index_data(get_page(url))
        if page:
            all_pages.extend(page)
            start += 1
            url = f'www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'
        else:
            break
    for url in page:
        data_set = get_detail_data(get_page(url))
    print(all_pages)
This is part of my code where I tried to get detailed info from many pages on the website, but it doesn't work. Do you have any idea why?
@RedEyedCoderClub 2 years ago
Thanks for comment!
@egormakhlaev4866 4 years ago
Molchanov, is that you?
@user-wv9vk8io1y 3 years ago
The very same
@RedEyedCoderClub 2 years ago
Yep, it's him
@Shajirr_ 1 year ago
This search returned 779 results when the video was released. Now, it returns 4927 results. Just to put into perspective how much garbage is being shovelled onto the platform.
@user-yq4dn3gj5p 3 years ago
Hi, is this Oleg Molchanov?
@RedEyedCoderClub 2 years ago
yep, it's him
@sriramkasu5286 4 years ago
sir need help
@sriramkasu5286 4 years ago
this video is good but what if I want to scrap data from website after logging in and getting details present in that logged account since the html wont work because logged in page cannot be requested
@RedEyedCoderClub 4 years ago
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wMf7LJn0k4U.html
@sriramkasu5286 4 years ago
@@RedEyedCoderClub thanks
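For the login question above, the usual requests-only approach is a Session that posts the login form and then reuses the resulting cookies; the URL and field names below are made up, and many real sites add CSRF tokens or JavaScript checks on top of this:

import requests

session = requests.Session()   # the Session object keeps the login cookies

# Post the credentials to the login form's action URL (hypothetical names).
session.post(
    'https://example.com/login',
    data={'username': 'me', 'password': 'secret'},
    timeout=10,
)

# Requests made through the same session afterwards are authenticated.
profile = session.get('https://example.com/account', timeout=10)
print(profile.status_code)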
@postyvlogs 3 years ago
Please provide source code without Patreon
@RedEyedCoderClub 2 years ago
Thanks for the comment. The project is very simple; there is no need for the source code at all.
@anikahmed7456 3 years ago
please make a video on these website abc.austintexas.gov/web/permit/public-search-other?reset=true Search by Property Select- Sub Type : any Date : any Submit inthis website data where url doesn't changes i try so many time but couldn't success. also it's has JavaScript pagination link : javascript:reloadperm[pagination number] which is changes randomly Please make a video 🙏🙏🙏
@RedEyedCoderClub 2 years ago
Thanks for comment!