How I Use Data Pipelines in my Web Scrapers 

John Watson Rooney
89K subscribers
2.5K views

Published: 25 Oct 2024

Comments: 12
@piercenorton1544 3 months ago
What if we want to take a full page so we can give it to an LLM to parse? For example, what if we were parsing financial filings or contracts? We want chunks or pages to pass to an LLM so it can produce structured output. I think splitting the text on a tag and then joining the items together would be best, but maybe there is a better way.
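The "split on a tag, then join" idea from this comment could be sketched roughly as below. This is a minimal illustration using only the standard library, not a method shown in the video: the class name `BlockTextExtractor`, the function `chunk_html`, the set of tags used as split points, and the `max_chars` limit are all assumptions chosen for the example.

```python
from html.parser import HTMLParser


class BlockTextExtractor(HTMLParser):
    """Collect the text found between block-level tags as separate strings."""

    # Assumed split points; adjust to the structure of the pages being scraped.
    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "tr"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []

    def _flush(self):
        # Join buffered text and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def chunk_html(html, max_chars=2000):
    """Split HTML on block tags, then greedily join blocks into chunks.

    A single block longer than max_chars is kept whole rather than cut.
    """
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()

    chunks, current = [], ""
    for block in parser.blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "<p>aaaa</p><p>bbbb</p><p>cccc</p>"
chunks = chunk_html(sample, max_chars=10)
```

Each chunk can then be passed to the LLM on its own. Splitting on block tags keeps paragraphs intact, which tends to matter for contracts and filings where a clause cut in half loses its meaning. Note that this sketch does not skip `<script>` or `<style>` contents.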
@personofnote1571 3 months ago
Great point about separation of concerns. As you stated, the scraper should only be concerned with getting data and saving data. I am curious what other use cases would be compatible with Scrapy's pipelines. Would pipelines be a good place for things like "save to this OTHER database", "upload to S3", or "ping this API"? I will be diving into this myself soon, but I'm curious about your thoughts here.
@JohnWatsonRooney 3 months ago
Yes, absolutely. You could use an item field to decide whether to upload to X DB or Y DB, and uploading to S3 would certainly go here too. By pinging an API, do you mean notifying another system? I think that would be a great use case for pipelines (I had not thought of that before).
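The idea of using an item field to pick a destination might look something like the sketch below. It follows the shape Scrapy expects of an item pipeline (a plain class with a `process_item(self, item, spider)` method that returns the item), but it does not import Scrapy so that it runs on its own. The `destination` field, the `RoutingPipeline` name, and the sink callables are hypothetical; in a real project each sink would wrap a database insert, an S3 upload, or an HTTP call to the system being notified.

```python
class RoutingPipeline:
    """Scrapy-style item pipeline that picks a destination per item.

    The "destination" item field and the sink callables are assumptions
    made for this example, not part of Scrapy itself.
    """

    def __init__(self, sinks, default="primary"):
        self.sinks = sinks      # maps a destination name to a callable
        self.default = default  # used when the item has no destination

    def process_item(self, item, spider):
        destination = item.get("destination", self.default)
        if destination not in self.sinks:
            raise ValueError(f"unknown destination: {destination!r}")
        self.sinks[destination](item)
        return item  # returning the item hands it on to the next pipeline


# Lists stand in for the two databases so the example is self-contained.
primary_db, archive_db = [], []
pipeline = RoutingPipeline(
    {"primary": primary_db.append, "archive": archive_db.append}
)

pipeline.process_item({"name": "a", "destination": "archive"}, spider=None)
pipeline.process_item({"name": "b"}, spider=None)
```

In an actual Scrapy project the pipeline would be enabled through the `ITEM_PIPELINES` setting and the sinks would be built from project settings rather than passed in by hand. Keeping the routing decision in one pipeline means the spider itself never needs to know where the data ends up.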
@alexdin1565 3 months ago
Hi John, I have a question: can we use Scrapy with Django? I mean, to make the web scraper available as an online tool.
@RicardoPorteladaSilva 3 months ago
I think you could create a script that scrapes separately and loads the results into Django's database. The scraping and the loading happen at separate times. I hope you understand my English; I'm from Brazil and still learning. If you need more specifics, please feel free to get in touch. It's a great pleasure to help you.
@JohnWatsonRooney 3 months ago
This is pretty much it!
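The two-step approach agreed on in this thread (scrape first, load into the web app's database later) could be sketched as follows. This is only an illustration under stated assumptions: `sqlite3` stands in for the Django database, the `products` table and its columns are invented for the example, and the scraper output is assumed to be JSON Lines. In a real Django project the load step would more likely go through the project's models, for instance from a management command.

```python
import json
import sqlite3


def load_items(jsonl_lines, conn):
    """Insert scraped items (one JSON object per line) into a products table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(url TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    rows = [json.loads(line) for line in jsonl_lines if line.strip()]
    # INSERT OR REPLACE keeps the load step safe to re-run after each scrape.
    conn.executemany(
        "INSERT OR REPLACE INTO products (url, name, price) "
        "VALUES (:url, :name, :price)",
        rows,
    )
    conn.commit()
    return len(rows)


# Step 1: the scraper, run on its own schedule, would write lines like these
# to a file. The URLs and values are made-up sample data.
scraped = [
    '{"url": "https://example.com/a", "name": "Widget", "price": 9.99}',
    '{"url": "https://example.com/b", "name": "Gadget", "price": 19.5}',
]

# Step 2: the loader, run later, reads that output and fills the database.
conn = sqlite3.connect(":memory:")
loaded = load_items(scraped, conn)
```

Because the two steps only share a file, the scraper can fail or be rescheduled without affecting the web application, which simply reads whatever was loaded last.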
@HitAndMissLab 3 months ago
@RicardoPorteladaSilva What is the advantage of using the Django DB?
@HitAndMissLab 3 months ago
Do you have any videos on how to use proxies in Python?
@JohnWatsonRooney 3 months ago
I don't specifically, but that's a good idea. I will create a video on proxies, including how to use them.
@jjeffery129 3 months ago
What's wrong with scraping them as strings and changing them at the end in your output file?
@elmzlan 3 months ago
I hope you have a course
@CeratiGilmour 3 months ago
Would it work together with Selenium?