Nodejs Puppeteer Tutorial #11 - Scrape Websites with Infinite Scrolling

Views: 8K

🌐 NodeMaven Proxy Provider: go.nodemaven.com/scrape
💥 Special Bonus: Use "Michael" at checkout for an extra +2GB of bandwidth.
🤖 2captcha Captcha Solving Service: bit.ly/2captchapromo
This Puppeteer tutorial is designed for beginners who want to learn how to use the Node.js Puppeteer library for web scraping, testing, and building website bots. Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default but can be configured to run full (non-headless) Chrome or Chromium.
Playlist: • Nodejs Puppeteer Tutorial
Code: github.com/michaelkitas/Nodej...
Join our Discord: / discord
Puppeteer API: www.npmjs.com/package/puppeteer
Donate
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
PayPal: support@websidev.com
Bitcoin Wallet: bc1q05j8gcnq4mzvgj603cxdc8xxck4jgnu2ljsrt4
Ethereum Wallet: 0x5e7BD4f473f153d400b39D593A55D68Ce80F8a2e
Social
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Website: websidev.com
Linkedin: / michael-kitas-638aa4209
Instagram: / michael_kitas
Github: github.com/michaelkitas
Business Email: support@websidev.com
Tags:
- Nodejs Tutorials
- Puppeteer Nodejs
- Nodejs Puppeteer Tutorial
- Puppeteer Tutorial for Beginners
#nodejs #puppeteer #webscraping

Published: Apr 20, 2022

Comments: 22

@talkohavy 1 year ago

Once again, a great video :) Love your content! I have a suggestion to slightly improve performance for this one. Instead of re-assigning the array to itself with the same elements it already had plus the new ones, I did this:

let curCount = 0; // first change
while (curCount < targetCount) {
  curCount = await page.$$eval(someCssSelector, (arr) => arr.length); // second change
  const prevHeight = await page.evaluate('document.body.scrollHeight');
  await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
  await page.waitForFunction(`document.body.scrollHeight > ${prevHeight}`);
  await new Promise((resolve) => setTimeout(resolve, 1000));
}
// third change:
const items = await page.$$eval(someCssSelector, (arr) => arr.map((item) => item));

So, basically, I'm only returning the length of the array inside the loop to speed things up. When I reach the limit I want, I break out of the loop and do one last search, in which I return the actual elements. Let me know what you think.
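The count-based loop suggested above could be packaged as a reusable function. This is a minimal sketch, not the video's code: `scrollUntilCount`, `pauseMs`, and the decision to stop when the page stops growing are assumptions added for illustration, and `page` is assumed to be a Puppeteer `Page`. It returns each element's `innerText` because DOM nodes themselves cannot be serialized back to Node.js.

```javascript
// Sketch: scroll an infinite-scroll page until `targetCount` elements match
// `selector`, counting elements inside the loop and reading data only once.
// `page` is assumed to be a Puppeteer Page; all other names are hypothetical.
async function scrollUntilCount(page, selector, targetCount, pauseMs = 1000) {
  let curCount = await page.$$eval(selector, (els) => els.length);

  while (curCount < targetCount) {
    const prevHeight = await page.evaluate(() => document.body.scrollHeight);
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

    try {
      // Wait for the page to grow; this rejects if nothing loads in time.
      await page.waitForFunction(
        (h) => document.body.scrollHeight > h,
        {},
        prevHeight
      );
    } catch (err) {
      break; // Assumption: treat a timeout as the end of the feed.
    }

    await new Promise((resolve) => setTimeout(resolve, pauseMs));
    curCount = await page.$$eval(selector, (els) => els.length);
  }

  // Read the data once, as plain strings that can cross back to Node.js.
  const items = await page.$$eval(selector, (els) =>
    els.map((el) => el.innerText)
  );
  return items.slice(0, targetCount);
}
```

A call might look like `const items = await scrollUntilCount(page, '.item', 100);`, where `.item` is a placeholder selector.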

@MichaelKitas 1 year ago

I haven’t tested the script but the idea is great 👍 It’s a great improvement for performance

@MatheusSilva-qm3ph 2 years ago

Awesome...very cool!

@kirillbaryba746 2 years ago

Awesome 😎👍

@stevefinley8602 2 years ago

Solid.

@shishirkumar9014 1 year ago

That's dope

@gappuma7883 1 year ago

I was having a hard time setting things up as you describe at 6:45. After hours, I finally got it to work like this: `return items.map((item) => { let name = item.querySelector("h3").innerText; let desc = item.querySelector("p").innerText;` It was missing the `let` part.
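For reference, the mapping described in this comment might be written out in full as below. This is only a sketch: the `h3` and `p` selectors come from the comment, while the function name `extractItems` and the empty-string fallback for missing nodes are assumptions added here.

```javascript
// Sketch: turn a list of item containers into plain { name, desc } objects.
// Each variable is declared (here with const) before use, which is the part
// the comment above reports as missing.
function extractItems(items) {
  return items.map((item) => {
    // Optional chaining guards against containers without an <h3> or <p>.
    const name = item.querySelector("h3")?.innerText ?? "";
    const desc = item.querySelector("p")?.innerText ?? "";
    return { name, desc };
  });
}
```

Because it only returns plain objects, a function like this could be passed as the callback of `page.$$eval`, for example `await page.$$eval('.item', extractItems)` with `.item` as a placeholder selector.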

@kuldeepsharma7499 1 year ago

I watched both your Selenium and Puppeteer playlists, and I must say they contain a hell of a lot of information. Great stuff. Just a suggestion: keep the videos slower so viewers can let all the information sink in. The copying, pasting, and writing of lines is getting too fast for a newbie.

@slavivna 3 months ago

I'm interested in how to take a screenshot of individual elements on a page, can you help me, please?

@MichaelKitas 1 month ago

Here’s how you can take a screenshot of an element: const element = await page.$('#unique-element-id'); await element.screenshot({ path: 'element.png' });
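A slightly more defensive variant of the snippet above is sketched here, assuming `page` is a Puppeteer `Page`. `page.$` resolves to `null` when nothing matches, so the sketch checks for that before calling `screenshot`. The function name `screenshotElement` is hypothetical.

```javascript
// Sketch: screenshot a single element, returning false if it is not found.
// `page` is assumed to be a Puppeteer Page; `selector` and `path` are
// supplied by the caller.
async function screenshotElement(page, selector, path) {
  const element = await page.$(selector); // null when nothing matches
  if (!element) return false;
  await element.screenshot({ path });
  return true;
}
```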

@southredmondtoxik1885 1 year ago

Please do cheerio scraping tutorials

@greendsnow 1 year ago

Thank you. Can you please show us how to keep scraping until the browser sees a certain element in the DOM, for example scrolling (through Today's elements) until //title[@class='old' and text()='Yesterday'] appears?

@MichaelKitas 1 year ago

Every time you scroll down, call the waitForSelector function and check whether it finds the element; if it does not, keep scrolling. This check should be the first thing you do inside the loop.
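One way this "check first, then scroll" loop might look is sketched below. Note the swap: it uses `page.$` rather than `waitForSelector`, because `waitForSelector` rejects with a timeout error when the element never appears, whereas `page.$` simply resolves to `null`. The function name, `maxScrolls`, and `pauseMs` are assumptions, and the sketch takes a CSS selector rather than the XPath from the question.

```javascript
// Sketch: scroll one viewport at a time until `stopSelector` is in the DOM.
// `page` is assumed to be a Puppeteer Page; the other names are hypothetical.
async function scrollUntilSelector(page, stopSelector, maxScrolls = 50, pauseMs = 1000) {
  for (let i = 0; i < maxScrolls; i++) {
    // First thing in every pass: is the stop element present yet?
    const found = await page.$(stopSelector);
    if (found) return true;

    await page.evaluate(() => window.scrollBy(0, window.innerHeight));
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  // One final check after the last scroll.
  return (await page.$(stopSelector)) !== null;
}
```

The `maxScrolls` cap is a design choice to keep the loop from running forever on a feed that never shows the stop element.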

@greendsnow 1 year ago

@MichaelKitas Yeaaaay :D ! Thank you, I've managed to do it. By the way, sometimes it's better to scroll a specific element instead of window, because some site owners block scrolling with strange tricks: `document.querySelector('#SELECTOR').scrollBy(0,1000)` Changing the overflow style of the element being scrolled to auto or visible might also be necessary, like: `document.body.style.setProperty('overflow', 'visible', 'important')` OR `document.body.style.setProperty('overflow', 'auto', 'important')`
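The two tricks in this comment could be combined into one helper meant to run inside the page. This is a sketch under assumptions: `scrollContainer` and its parameters are hypothetical names, and forcing `overflow: auto` on the scrolled element is optional and may not suit every site. It uses `style.setProperty(..., 'important')` because a value containing `!important` assigned directly to `style.overflow` is ignored by the browser.

```javascript
// Sketch: scroll a specific container by `deltaY` pixels instead of window.
// `doc` defaults to the page's `document`; passing it explicitly is only
// needed outside a browser.
function scrollContainer(selector, deltaY, doc = document) {
  const el = doc.querySelector(selector);
  if (!el) return false;
  // Force the container to be scrollable, with !important priority.
  el.style.setProperty("overflow", "auto", "important");
  el.scrollBy(0, deltaY);
  return true;
}
```

With Puppeteer it could be invoked as `await page.evaluate(scrollContainer, '#SELECTOR', 1000)`, which runs the function in the page where `document` is available.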

@aadargupta 1 year ago

I’m trying to do this for a much larger amount of content. For example, I want to scrape 12,000 products on Vivino, and after some point it starts giving me timeout errors.

@MichaelKitas 8 months ago

It sounds like you're hitting a timeout due to the large volume of data. Try implementing pagination to process products in smaller batches, increasing the timeout limits in Puppeteer, or using a headless browser to reduce load times. Also, ensure your system has adequate resources (like memory) for large-scale scraping. Remember to respect Vivino's terms of service and use proper scraping etiquette, such as rate limiting your requests to avoid overloading their servers. Good luck!
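As one possible way to act on the timeout advice, a long scrape can wrap each slow step in a small retry helper with a growing delay between attempts. This is a generic sketch, not something shown in the video: `withRetries`, `attempts`, and `delayMs` are hypothetical names.

```javascript
// Sketch: run an async `task`, retrying on failure with a growing delay.
// Rethrows the last error if every attempt fails.
async function withRetries(task, attempts = 3, delayMs = 2000) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (i < attempts) {
        // Wait longer after each failure; this also rate-limits requests.
        await new Promise((resolve) => setTimeout(resolve, delayMs * i));
      }
    }
  }
  throw lastError;
}
```

It could be paired with a higher default timeout, for example `page.setDefaultTimeout(60000)` followed by `await withRetries(() => page.waitForFunction(...))`, with the 60-second value being an arbitrary example.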