What I'd Add FIRST To a new Scrapy Project

Views: 34K

In my last Scrapy video we created a basic project from scratch but found some limitations. In this episode we go through how to use the Item and ItemLoader classes in Scrapy to improve our project. An Item class lets us define the fields for our data in items.py, and an ItemLoader uses those definitions to clean the data as it is loaded, ready for use.
Scrapy p1: • Scrapy for Beginners -...
-------------------------------------
Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
-------------------------------------
Digital Ocean (Cloud Servers, Affiliate Link) - m.do.co/c/c7c9...
Sound like me:
microphone amzn.to/36TbaAW
mic arm amzn.to/33NJI5v
audio interface amzn.to/2FlnfU0
-------------------------------------
Video like me:
webcam amzn.to/2SJHopS
camera amzn.to/3iVIJol
studio lights amzn.to/3aBpKik
small lights amzn.to/2GN7INg
-------------------------------------
PC Stuff:
case: amzn.to/3dEz6Jw
psu: amzn.to/3kc7SfB
cpu: amzn.to/2ILxGSh
mobo: amzn.to/3lWmxw4
ram: amzn.to/31muxPc
gfx card amzn.to/2SKYraW
27" monitor amzn.to/2GAH4r9
24" monitor (vertical) amzn.to/3jIFamt
dual monitor arm amzn.to/3lyFS6s
mouse amzn.to/2SH1ssK
keyboard amzn.to/2SKrjQA

Published:

30 Sep 2024

Comments: 65

@MohAmuza 3 years ago

I scraped a product and some items are missing some data, so the result is a NoneType, i.e. None. In items.py I created a function to check for None and return a placeholder instead: def check_gift(value): if value is None: return "No gift" else: return value. But it doesn't work. Where is the problem?
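One likely explanation, offered as an assumption and not a confirmed diagnosis: a field processor only runs on values the selector actually extracted, so when nothing is found there is no value for check_gift to receive and the field is left out of the item. A minimal sketch of one workaround is to fill in defaults afterwards in an item pipeline. The class name DefaultValuesPipeline and its defaults mapping are hypothetical.

```python
class DefaultValuesPipeline:
    """Fill in placeholder values for fields the spider could not extract.

    Hypothetical pipeline: the field names and defaults are illustrative.
    """

    # Assumed defaults; adjust to the fields defined in items.py.
    defaults = {"gift": "No gift"}

    def process_item(self, item, spider):
        for field, default in self.defaults.items():
            # Treat a missing key and an explicit None the same way.
            if item.get(field) is None:
                item[field] = default
        return item


# Quick check with plain dicts standing in for scraped items.
pipeline = DefaultValuesPipeline()
filled = pipeline.process_item({"name": "Example Malt"}, spider=None)
kept = pipeline.process_item({"name": "Other Malt", "gift": "Glass"}, spider=None)
```

To use something like this in a project, the class would be listed in the ITEM_PIPELINES setting under its import path, which depends on the project name.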

@davyroger3773 3 years ago

Thanks! The documentation did not go into enough depth, and I'm glad someone made a comprehensive video on it.

@janekstern 1 year ago

Your videos helped me understand Scrapy more than any other resource, ty!

@victormaia4192 3 years ago

Great tutorial! Very easy to follow, and I had no problems. About the typos: I'm the worst typer ever, but Tabnine always saves my life.

@carinafelnecan7802 2 years ago

Thank you, I learned a lot from this video:)

@linuxinstalled 3 years ago

I wish this video had more exposure. I greatly appreciate that you took the time to put this series together. Being able to see these examples of the various mechanics behind scrapy has been hugely helpful. Thank you again.

@JohnWatsonRooney 3 years ago

Glad you enjoyed it! Thank you

@hendrikfeddersen6768 3 years ago

Thanks a lot. The videos are very clear. Would you mind explaining, in one of your next videos, the correct folder structure of a Scrapy project: which file goes where, and why?

@mandela_byron 2 years ago

Hello John, could you do a video on how to host Scrapy scripts?

@JohnWatsonRooney 2 years ago

Hi! Yes, I've been wanting to cover this for a while. Unfortunately ScrapyD doesn't work with the latest version of Scrapy, so the best alternative I could come up with was hosting the spider on a Linux server and using a cron job to run it every X hours. Would that be of interest?
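The cron-based approach mentioned here could look roughly like the sketch below: a small runner script that cron calls on a schedule. The spider name, output file and paths are assumptions for illustration. The only Scrapy feature relied on is the scrapy crawl command with its -o output option.

```python
import subprocess
from pathlib import Path
from typing import List


def build_crawl_command(spider_name: str, output_file: str) -> List[str]:
    """Return the argument list for `scrapy crawl <spider> -o <file>`."""
    return ["scrapy", "crawl", spider_name, "-o", output_file]


def run_spider(project_dir: Path, spider_name: str, output_file: str) -> int:
    """Run the crawl from the project directory and return the exit code."""
    completed = subprocess.run(
        build_crawl_command(spider_name, output_file),
        cwd=project_dir,  # Scrapy locates the project via scrapy.cfg here
        check=False,
    )
    return completed.returncode


# A runner script would call run_spider(...) at the bottom, and a crontab
# entry (assumed paths) could start it every 6 hours:
#   0 */6 * * * /usr/bin/python3 /home/user/whiskyspider/run_spider.py
command = build_crawl_command("whisky", "whisky.json")
```

Returning the exit code instead of raising keeps the choice of logging or alerting on failure with whatever wraps the script.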

@mandela_byron 2 years ago

@JohnWatsonRooney Sounds great. Looking forward to that. I've been having challenges working out how best to host my scraping scripts, and I know some others among us face the same challenge. Thanks, your efforts are much appreciated.

@woldemarkiev 2 years ago

Great tutorial!! It really helps to understand

@cosmicblack 2 years ago

Great video. Thanks!!!

@codewithnacho 3 years ago

Awesome vid! It answered my questions with Item Loaders. Docs were confusing me haha

@JohnWatsonRooney 3 years ago

I know! The docs are good but also, not so good haha

@yangvictor5349 1 year ago

thank you for sharing

@abukaium2106 3 years ago

Great video. I'd love to see a video from you on using proxies with Scrapy.

@shihlun5291 2 years ago

Thanks for the tutorial. After watching it, I have a better understanding of the Scrapy ItemLoader documentation.

@Daviuliano 1 year ago

Super nice. However, I am struggling to understand how this would work with a dynamic website where I am making a GET request that returns data in JSON format. I do a bit of working around and convert it to a dictionary, but I can't seem to get it to return an item. Any ideas that could help me?

@JohnWatsonRooney 1 year ago

I think you'd still need to parse through the JSON and then load it into the item loader and item, it's been a while since I've done that though so not 100% sure sorry
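A minimal sketch of that suggestion, assuming a hypothetical payload with a top-level "products" list whose entries carry "title" and "price" keys: the JSON text is parsed and one record is yielded per product.

```python
import json


def parse_products(body: str):
    """Yield one dict per product from a JSON response body.

    Assumes a hypothetical payload shaped like
    {"products": [{"title": ..., "price": ...}, ...]}.
    """
    data = json.loads(body)
    for product in data.get("products", []):
        yield {
            "name": product.get("title", "").strip(),
            "price": product.get("price"),
        }


# Hypothetical API response body.
BODY = (
    '{"products": ['
    '{"title": " Example Malt ", "price": 42.5}, '
    '{"title": "Other Malt"}'
    "]}"
)
items = list(parse_products(BODY))
```

Inside a spider callback the body would come from response.text, and each record could be passed to an item loader with add_value instead of being yielded as a bare dict.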

@Daviuliano 1 year ago

@JohnWatsonRooney Thank you. I managed to do it now. I had to yield them all individually, but it's working 👍🏼

@fatihkarakus6189 1 year ago

@JohnWatsonRooney When I import items I get an error like this: "attempted relative import with no known parent package". How can I solve this error?

@karthikkarthik100 1 year ago

Thanks for the informative video. Can't we just write if next_page: instead of if next_page is not None?
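The two checks agree whenever the selector returns either a URL or None. They differ only for values that are falsy without being None, such as an empty string from an empty href attribute. A small sketch of the difference (the function names are made up for this example):

```python
def should_follow_strict(next_page):
    """Follow unless the selector found nothing at all (None)."""
    return next_page is not None


def should_follow_truthy(next_page):
    """Follow only when the value is both present and non-empty."""
    return bool(next_page)


# Possible results of a selector's .get(): a URL, no match, or an empty attribute.
candidates = ["/page/2", None, ""]
strict = [should_follow_strict(c) for c in candidates]
truthy = [should_follow_truthy(c) for c in candidates]
```

Here strict is [True, False, True] and truthy is [True, False, False], so the shorter form is the stricter filter on the empty-string case.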

@Abdul_Rafay_Pal 1 year ago

What would you recommend: Splash or Playwright?

@JohnWatsonRooney 1 year ago

Playwright is my go-to now.

@Abdul_Rafay_Pal 1 year ago

@JohnWatsonRooney Thank you very much 🥰

@vitalij09 3 years ago

Thanks man!

@maheshsharma-zq2uc 2 years ago

Can you make a project with Scrapy that extracts stock information along with historical data?

@Scuurpro 2 years ago

How would I handle a stock field in the item loader? The page only returns "In Stock", or " " when things are out of stock. Would I create a function that takes a value and uses an if/else statement?
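A function along those lines might look like the sketch below. The name parse_stock and the "Out of Stock" label are assumptions, and the approach only works if the selector really does return the blank string for out-of-stock products, so that there is a value for the function to receive. It could then be passed to MapCompose as an input processor for the stock field.

```python
def parse_stock(value: str) -> str:
    """Map the raw stock text to an explicit label.

    Non-blank text is passed through; blank text becomes "Out of Stock".
    """
    cleaned = value.strip()
    return cleaned if cleaned else "Out of Stock"


# The raw values described in the comment, plus an empty string.
labels = [parse_stock(v) for v in ["In Stock", " ", ""]]
```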

@alexportugal3986 1 year ago

Hi, I just don't quite get why you use the ItemLoader part and all of that when you can do it within the parse function. It seems to me that it gets more complicated to get the same result. Surely there is something I am missing.

@JnWayn 2 years ago

Nice to know what the competition is. I got a wisdom tooth. Is it possible with Scrapy to mark a checkbox, then click a button to get to the next page?

@gwulfwud 2 years ago

Thank you! I watched the previous video and then this, and it felt like I know so much about scrapy already. Really really good videos. Keep it up!

@amineboutaghou4714 3 years ago

Another great video! Very well done, John 👏🏼

@fatihkarakus6189 1 year ago

When I import items I get an error like this: "attempted relative import with no known parent package". How can I solve this error?

@justinames5439 2 years ago

As the others have said, thanks for your time and effort, a great help. The links connecting to Amazon (e.g. the lighting link) are dead, and you might want to update them. On another front, have you added a video on caching? All in all, really well done, and, again, thanks. jA

@JohnWatsonRooney 2 years ago

Thanks. One of the issues with a lot of the scrapers I wrote is that they don't always age well! I haven't actually done anything on caching yet, no. I'll add it to my list.

@kevin_daang 2 years ago

If I wanted to record when a whisky bottle was sold out, how would I do it with the item loader?

@milank9857 2 years ago

Great explanation as always, really helpful tutorial

@vidproli4231 3 years ago

Great tutorial. It explains the exact thing I was looking for, thank you.

@nadyamoscow2461 3 years ago

Thanks a lot, what you do is amazing.

@sheikhakbar2067 3 years ago

Thanks a lot, that was very helpful.

@dokanplugincustomization1587 3 years ago

Awesome playlist, but I have one question: products that are sold out don't give us any data in the price field. I tried to set an alternative value, as you did in the previous video using a try/except block, but I failed to do so. Please guide me.

@dokanplugincustomization1587 3 years ago

Sold-out products only output the name and link.

@KhalilYasser 3 years ago

Amazing tutorial. Thank you very much. Can you share the code as usual?

@JohnWatsonRooney 3 years ago

Yes, sure I've updated my repo here: github.com/jhnwr/whiskyspider

@ferilukmansyah3037 3 years ago

Thanks for the best tutorial.

@ShahidulsPerspective 2 years ago

How do I save the URL of the extracted page when using ItemLoader?

@ShahidulsPerspective 2 years ago

I got it. It's: l.add_value("url", response.url)

@dcevansuk 3 years ago

Another Excellent Video!!! I have one question; This is working with the parent URL data, is there a way to also use ItemLoader() with the associated child URL scraped data to end up with one combined yield l.load_item()? It could be an interesting video.

@user8ZAKC1X6KC 2 years ago

I am having an issue where fetch(req) seems to be going a bit too fast, so it's only catching part of the page. Is there a way to slow it down? I can find how to do it when the crawler is running, but not when scraping in the shell. Thoughts?

@thewheeldeal8439 2 years ago

This is a great video thanks! Question: Can scrapy save item objects to pickle binary files? If so, how? I just find it really convenient to save my scraped data into pickled objects that can be used quickly in other files, but I can't find any doc on that for scrapy...
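One way this could be done is with a small item pipeline that collects items and pickles them when the spider closes. This is a sketch under assumptions: the class name PicklePipeline and the output path are hypothetical, and each item is converted with dict() so that the pickle does not depend on the project's Item class.

```python
import pickle
import tempfile
from pathlib import Path


class PicklePipeline:
    """Collect items during the crawl and pickle them when the spider closes.

    Hypothetical pipeline; the default output path is an assumption.
    """

    def __init__(self, path="items.pkl"):
        self.path = Path(path)
        self.items = []

    def process_item(self, item, spider):
        # Store a plain dict so unpickling does not need the Item class.
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        with self.path.open("wb") as fh:
            pickle.dump(self.items, fh)


# Simulate a tiny crawl with a dict standing in for a scraped item.
out_path = Path(tempfile.mkdtemp()) / "items.pkl"
pipeline = PicklePipeline(out_path)
pipeline.process_item({"name": "Example Malt", "price": "42.50"}, spider=None)
pipeline.close_spider(spider=None)

with out_path.open("rb") as fh:
    restored = pickle.load(fh)
```

Scrapy also ships a PickleItemExporter in scrapy.exporters, which may be enough on its own if the goal is simply a pickled feed file.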

@alfakih7247 1 year ago

More Scrapy blog please

@abdulcute 3 years ago

Best video for Scrapy and best explanation, @John Watson Rooney. I have one question about the item loader: how do we extract data if an element holds more than one piece of information? For example, if an element has two cell numbers, the item loader picks only the first number, not the second. As I learned from your previous video, we used getall() for this.

@TheWhoIsTom 3 years ago

Nice tutorial!! It would be nice if you showed how to store the data from THIS code (item loader) in MongoDB. :)

@JohnWatsonRooney 3 years ago

Thanks! Sure, I’m going to extend this project to cover more of Scrapy’s features, including pipelines and databases

@TheWhoIsTom 3 years ago

@JohnWatsonRooney Awesome. Thanks a lot :)

@alessandr2 3 years ago

Thanks for the tutorial!! One question: what part of the new code prevents the error from appearing if there is no price info? Thanks in advance!

@NatureLover02005 3 years ago

Excellent!!!

@GordonShamway1984 3 years ago

Super

@salimbo4577 3 years ago

Thank you so much. Is there a way I can scrape audio data, like sound data?

@leleemagnu6831 3 years ago

John, another great video. In the title, the first word should read "Scrapy" or the video won't come up in a search. Let me wish you a well-deserved, fantastic Christmas! e

@JohnWatsonRooney 3 years ago

Oh wow I didn’t notice! Thank you for pointing that out, I’ve changed it. Happy Christmas to you too!