Thanks for the tutorial. It might be worthwhile to show intermediate results earlier in the video, so viewers can see exactly what certain code snippets do.
In the custom Dataset class, one should not add too many augmentations or processing steps; it makes training very, very slow. Do you know any hack to fix this? Here you open the image file and numericalize the text inside __getitem__, which makes the data loading process very slow.
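One common hack for the slowdown described above is to move any one-time work (reading the caption file, numericalizing the text) out of `__getitem__` and into `__init__`, so it is paid once instead of every epoch. A minimal sketch, assuming a simple whitespace tokenizer and a string-to-index dict; the class and variable names here are illustrative, not from the video's code:

```python
import torch
from torch.utils.data import Dataset

class CachedCaptionDataset(Dataset):
    def __init__(self, captions, stoi):
        # Pre-numericalize every caption once, up front.
        self.encoded = [
            torch.tensor([stoi.get(tok, stoi["<UNK>"]) for tok in cap.split()])
            for cap in captions
        ]

    def __len__(self):
        return len(self.encoded)

    def __getitem__(self, idx):
        # __getitem__ is now just an index lookup -- no parsing per sample.
        return self.encoded[idx]

stoi = {"<UNK>": 0, "a": 1, "dog": 2, "runs": 3}
ds = CachedCaptionDataset(["a dog runs", "a cat runs"], stoi)
print(ds[1].tolist())  # "cat" is out of vocabulary -> <UNK> id 0
```

For the image side, the usual approach is to keep `Image.open` in `__getitem__` (images rarely fit in RAM) but hide its cost with `num_workers > 0` in the DataLoader, so decoding happens in parallel worker processes.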
Amazing tutorial, thanks for it! I am missing why we need .unsqueeze(0) for each item in the batch when assigning it to imgs. Any input on that would be much appreciated. Thanks!
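The reason for the .unsqueeze(0) asked about above: each image tensor coming out of the dataset is [C, H, W], but torch.cat joins tensors along an existing axis, so each image first needs a batch axis of size 1. A small sketch (dummy tensors, not the video's data):

```python
import torch

# Each image from the dataset is [C, H, W]; cat needs a batch axis to join on.
imgs = [torch.zeros(3, 4, 4), torch.zeros(3, 4, 4)]

# unsqueeze(0) turns each [3, 4, 4] into [1, 3, 4, 4], then cat along dim=0.
batch = torch.cat([img.unsqueeze(0) for img in imgs], dim=0)
print(batch.shape)  # torch.Size([2, 3, 4, 4])

# torch.stack does the unsqueeze + cat in one call.
assert torch.stack(imgs, dim=0).shape == batch.shape
```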
Great video, Aladdin. Thanks. I have one question: at the end of the video, the sequence lengths seem different. Why don't they equal [26, 32]? Isn't that a mistake?
Link to the loader_customtext.py file in Aladdin's repo: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/Basics/custom_dataset_txt/loader_customtext.py
I think you can do a very similar thing to what we did in the video, except that we would have to do it for the two languages used in the translation. To make it easier, you could also check out torchtext, which can make loading datasets a lot simpler.
The video is great, really. Just one thing that (personally) would make everything perfect: could you explain literally everything? For example, when you mention transform at 5:50 and say that you set it to None, explain why, and so on for the rest. Basically what you did at 6:40 with the CSV explanation. Again, this is only my personal opinion, but it would help me so much. Keep up the great work!
Amazing content, I've learnt so much from your channel. However, your coding style is a bit unusual (I am no one to judge you!): you code in an inverted fashion, writing the function calls first and then going on to define the classes and methods, which makes the structure of the code unclear initially and harder to follow. Everything does come together by the end, though. Anyway, thanks for all the effort you have put in, and all the best!
Thanks a lot for your videos, they are helping me a lot in learning PyTorch. I am trying to build an image captioning model on a custom dataset, so your videos on image captioning will be very useful :) Thanks a lot again!
What if I have two text files, meaning the input to the model is an image as well as some text, and the output is also text ((image + text_data) -> text_data)? Do I have to create two vocabularies?
If we are doing machine translation, for example, then we would need two different vocabularies, one for each language. However, if both the input text and the output text are in the same language, then you can reuse the same vocabulary.
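To make the one-vocabulary-vs-two distinction concrete, here is an illustrative sketch; `Vocabulary` is a stand-in for the class built in the video, reduced to the bare minimum:

```python
# Minimal stand-in for the video's Vocabulary class (illustrative only).
class Vocabulary:
    def __init__(self):
        self.stoi = {"<PAD>": 0, "<SOS>": 1, "<EOS>": 2, "<UNK>": 3}

    def build(self, sentences):
        for s in sentences:
            for tok in s.lower().split():
                if tok not in self.stoi:
                    self.stoi[tok] = len(self.stoi)

# Machine translation: two languages, so two separate vocabularies.
vocab_de, vocab_en = Vocabulary(), Vocabulary()
vocab_de.build(["ein hund rennt"])
vocab_en.build(["a dog runs"])

# Same-language input and output (e.g. captioning with text context):
# build one vocabulary over all the text and share it for both sides.
shared = Vocabulary()
shared.build(["a dog runs", "the dog is running"])
```

Sharing the vocabulary also means a single embedding table can serve both encoder input and decoder output, which saves parameters.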
Hello Sir, can you explain what 'pin memory' and 'collate function' actually mean? I did go to the documentation, but I didn't fully understand it. Can you explain it in an easier way? That would be helpful. The rest I understood very well. Thanks.
pin_memory=True should arguably be the default, as it speeds things up by copying batches into page-locked (pinned) host memory so they can be transferred to the GPU faster, but as for the internals of how that works, I'm as clueless as you. The collate function is for additional processing you want to do on the batch you've collected: in this case we set up how to load each of these captions, but once we actually have the batch we need to make sure they are all padded to an equal number of time steps, and that is done in the collate function.
Hey Aladdin, is there any advantage of doing this over using, let's say, the Vocab that torchtext provides? I'm currently exploring PyTorch for a project.
I prefer to build things myself whenever possible so there are no gaps in my understanding, but torchtext is great too. This approach is perhaps more low-level and not what you would use for larger projects, but it can be useful for understanding.
@@AladdinPersson That's a great mindset to have! I'm actually pretty early in my deep learning journey. Currently just started a small project using BERT. Thanks for the video, will be trying to take away what I got here for my own project :D