Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files

Aladdin Persson

Subscribers: 80K

Views: 29K

Published: 28 Sep 2024

Comments: 49

@buithanhlam3726 3 years ago

I found it very difficult to get used to torchtext docs, but then I found your video :) Many thanks!

@salihbalci19 1 year ago

Could you make a video for the new version of torchtext?

@dhawalsalvi8079 2 years ago

Yo, the new version of torchtext (0.12) does not have Field

@subhasish661411 3 years ago

I somehow found the potato quote very inspiring 🤔.

@AladdinPersson 3 years ago

Definitely mislabeled samples in that dataset

@FEARESSERES 1 year ago

10:20 I felt that pain :D, it happened to me recently too

@stephennfernandes 3 years ago

Very nice tutorial, but I get warnings saying BucketIterator, Field and TabularDataset are being deprecated... Also, I can't scale BucketIterator to TPUs and multi-GPU setups. Any better alternatives?
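One framework-independent option is to do the bucketing by hand and feed the result to whatever data loader the training setup uses. The sketch below assumes examples are plain lists of tokens; `bucket_batches` is a hypothetical helper, not a torchtext function, and it only illustrates the idea of grouping sequences of similar length so that little padding is needed.

```python
import random


def bucket_batches(examples, batch_size, seed=0):
    """Group token lists of similar length into batches (hypothetical helper)."""
    # Sort indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(examples)), key=lambda i: len(examples[i]))
    # Slice the sorted order into consecutive batches.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle the order of the batches (not their contents) so training
    # does not always see the shortest sequences first.
    random.Random(seed).shuffle(batches)
    return [[examples[i] for i in batch] for batch in batches]


examples = [
    ["a"],
    ["a", "b", "c", "d"],
    ["a", "b"],
    ["a", "b", "c"],
    ["a", "b", "c", "d", "e"],
    ["a", "b"],
]
batches = bucket_batches(examples, batch_size=2)
```

Each batch can then be padded to its own longest sequence instead of the longest sequence in the whole dataset.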

@sagsriv 3 years ago

Your examples are left padded, but when I use the same bucket iterator on the IMDB dataset, they are right padded. This is a bit confusing.
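The padding side is a preprocessing choice rather than a property of the iterator. As far as I know, the legacy `Field` class has a `pad_first` flag that controls it, but that is worth checking against the installed version. The sketch below is plain Python with a hypothetical `pad` helper and only shows the difference between the two conventions.

```python
def pad(tokens, length, pad_token="<pad>", pad_first=False):
    """Pad a token list to `length`, on the left if pad_first else on the right."""
    fill = [pad_token] * max(0, length - len(tokens))
    return fill + tokens if pad_first else tokens + fill


batch = [["the", "cat"], ["a", "very", "long", "one"]]
width = max(len(tokens) for tokens in batch)

right = [pad(tokens, width) for tokens in batch]                  # pads at the end
left = [pad(tokens, width, pad_first=True) for tokens in batch]   # pads at the start
```

Note also that a batch printed with shape (sequence length, batch size) puts each example in a column, which can make the padding side hard to read at a glance.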

@jeremiahjohnson6052 3 years ago

Can you do a video on the updated torchtext 0.9.0? I think they revamped much of this, and the new features look pretty awesome with subword tokenization implemented (i.e. 'sub', '_word').

@AhmedIqbal 4 years ago

Please create a video on semantic segmentation using a PyTorch CNN. The dataset should contain cancer images plus ground-truth images, and the trained model should report the best IoU and accuracy of the proposed model.

@AladdinPersson 4 years ago

I will try to keep that in mind, thank you for the comment!

@salihbalci19 1 year ago

I always get this error, can you help me please? AttributeError: module 'torchtext.data' has no attribute 'Field'

@idobooks909 3 years ago

Thanks! Which version of torchtext?

@ugestacoolie5998 3 months ago

Hi, it seems that torchtext got quite an overhaul and this tutorial's contents are outdated. Any chance you might want to update it?

@MasterMan2015 2 years ago

Thanks for doing that. How do we save it once we have created it?

@fabianboro4686 1 year ago

What a fancy math intro

@lingfengshen8559 3 years ago

Great video, I really learned a lot, thanks. When I run BucketIterator, it comes up with the error "'int' object is not subscriptable". I checked my code but still have no idea where the fault is.

@pl4117 3 years ago

Make sure you are using batch_size as the argument name instead of batch_sizes when calling the bucket iterator. I had this issue too.

@shoebjoarder 4 years ago

Great video! I wanted to ask, how to use TabularDataset to split train, validation and test? Should I use something like this below?

    train_data, valid_data = TabularDataset.splits(
        path="data", train="train.json", test="valid.json", format="json", fields=fields
    )
    test_data = TabularDataset.splits(
        path="data", test="test.json", format="json", fields=fields
    )

@AladdinPersson 4 years ago

Yes, that looks fine to me. From my understanding, and I believe this is how I showed it in the video, you need separate JSON/CSV files, and then in TabularDataset you specify those files using train, test and possibly also validation.

@shoebjoarder 4 years ago

@AladdinPersson Hi Aladdin, I probably missed the part where you talk about the validation split. Thank you for replying, the code is working. :)
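Since the exchange above relies on having separate files per split, here is a minimal sketch of producing them from one list of records using only the standard library. The field names `quote` and `score`, the 80/10/10 fractions and the helper name `split_jsonl` are illustrative assumptions. To my knowledge the legacy `TabularDataset.splits` accepts `train`, `validation` and `test` file names in a single call and returns the datasets in that order, but check this against the installed version.

```python
import json
import random
import tempfile
from pathlib import Path


def split_jsonl(records, out_dir, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle records and write train/valid/test files, one JSON object per line."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n_train = int(len(records) * fractions[0])
    n_valid = int(len(records) * fractions[1])
    parts = {
        "train.json": records[:n_train],
        "valid.json": records[n_train:n_train + n_valid],
        "test.json": records[n_train + n_valid:],  # whatever is left over
    }
    out_dir = Path(out_dir)
    for name, rows in parts.items():
        with open(out_dir / name, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")
    return {name: len(rows) for name, rows in parts.items()}


# Illustrative data; real records would come from your own dataset.
records = [{"quote": f"sample {i}", "score": i % 2} for i in range(10)]
out_dir = tempfile.mkdtemp()
counts = split_jsonl(records, out_dir)
```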

@rishadkt837 2 years ago

Field has become legacy

@user-or7ji5hv8y 4 years ago

Is BucketIterator being phased out in future releases?

@AladdinPersson 4 years ago

Yes, I'm going to wait until that release and then update the torchtext tutorials. Hopefully the code from the Seq2Seq tutorials will still run, though.

@lazypunk794 4 years ago

Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict (10:27) doesn't give proper representations of the objects anymore. They probably broke something while updating the code.

@AladdinPersson 4 years ago

Will look into it more but I think you might be right about this unfortunately... I just really hope all Seq2Seq tutorials that build on this will still be able to run :/

@lazypunk794 4 years ago

@AladdinPersson Sorry, false alarm. The mistake in my case was that I used TabularDataset.splits() when I had just a train.csv and no test file. There is no need for the splits method in that case. I was just mindlessly copying from the video. My bad. The deprecation is just a warning as of now. The deprecated classes will move into torchtext.legacy, so I guess your code will still work in 0.8 with some import changes.

@user-or7ji5hv8y 4 years ago

Is the GitHub page for this available?

@AladdinPersson 4 years ago

Yes: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/torchtext/torchtext_tutorial1.py

@doyourealise 4 years ago

Hey, please create a video on the legacy torchtext.data.Field. Also, after the upcoming update, how can we load Field in torchtext 0.8?

@AladdinPersson 4 years ago

Yeah, I saw that torchtext is going to be updated. I will wait a little bit to make sure they aren't going to make any additional changes and then update my previous videos to the new API.

@henricbohm8455 3 years ago

Very helpful tutorial. Could you make a tutorial on how to load data that is stored in a SQL database?

@finix7419 5 months ago

It seems torchtext has had some changes; I can't import these modules with a recent version.

@mummyskitchen5311 4 years ago

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Got this error at:

    JSONDecodeError Traceback (most recent call last)
    in ()
          4 test ='test.json',
          5 format = 'json',
    ----> 6 fields=fields
          7 )
          8

    6 frames

    /usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
        355 obj, end = self.scan_once(s, idx)
        356 except StopIteration as err:
    --> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
        358 return obj, end

my dataframe:

       quote                                               score
    0  good product pretty satisfied quality diaper g...   0.9
    1  leak extremely leaky doesnt soak                    -0.7
    2  version good leak nee version leak night gud s...   -0.6
    3  good kid nice product daughter loving hair shi...   0.9
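"Expecting value: line 1 column 1 (char 0)" is what `json.loads` reports when the text it receives does not begin with a JSON value, for example an empty string or a CSV row. Assuming the JSON loader parses the file one line at a time and expects one object per line (my understanding of the legacy `TabularDataset`, not something confirmed in this thread), a quick check like the following can locate the offending lines. `find_bad_json_lines` is a hypothetical helper.

```python
import json


def find_bad_json_lines(lines):
    """Return (line_number, message) for every line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            value = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, err.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Illustrative input: one valid record, one empty line, one CSV-style row.
sample = [
    '{"quote": "leak extremely leaky doesnt soak", "score": -0.7}',
    '',
    'quote,score',
]
problems = find_bad_json_lines(sample)
```

For a real file, pass the open file handle instead of `sample`. If the data starts out as a pandas DataFrame, `df.to_json("train.json", orient="records", lines=True)` writes one object per line.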

@mirambikasikdar5655 4 years ago

thank you!

@ximingdong503 3 years ago

I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time I load the data it takes 15 minutes. Thanks bro
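One way to avoid repeating the slow load is to write the tokenized examples out once and read them back afterwards. The sketch below assumes each example can be reduced to a `(tokens, label)` pair (for the legacy IMDB dataset that would presumably mean reading `example.text` and `example.label`, which is an assumption here) and uses only the standard `csv` module. `write_examples` and `read_examples` are hypothetical helpers.

```python
import csv
import io


def write_examples(rows, handle):
    """Write (tokens, label) pairs as CSV with space-joined tokens."""
    writer = csv.writer(handle)
    writer.writerow(["text", "label"])
    for tokens, label in rows:
        writer.writerow([" ".join(tokens), label])


def read_examples(handle):
    """Read the CSV back into (tokens, label) pairs."""
    return [(row["text"].split(" "), row["label"]) for row in csv.DictReader(handle)]


# Illustrative examples; an in-memory buffer stands in for a file on disk.
rows = [(["great", "movie"], "pos"), (["too", "long", ",", "boring"], "neg")]
buffer = io.StringIO()
write_examples(rows, buffer)
buffer.seek(0)
restored = read_examples(buffer)
```

With a real file, open it with `newline=""` as the `csv` documentation recommends. Joining on spaces only round-trips if no token itself contains a space.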

@vijaypalmanit 3 years ago

Field is deprecated, so the tutorial is no longer relevant...

@sabarishwarang6212 11 months ago

Field and TabularDataset are deprecated now. Are there any alternatives?
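The core of what `Field` did (tokenize, build a vocabulary, map tokens to integer ids) is small enough to write by hand. The sketch below is plain Python with hypothetical helpers `build_vocab` and `numericalize`; it is not the torchtext API. Newer torchtext releases offer `get_tokenizer` and `build_vocab_from_iterator` for the same job, and the `csv` or `json` modules can stand in for `TabularDataset` when reading files.

```python
from collections import Counter


def build_vocab(token_lists, min_freq=1, specials=("<unk>", "<pad>")):
    """Map each token to an integer id; special tokens get the first ids."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    itos = list(specials)
    # Most frequent tokens first; ties are broken alphabetically so the
    # mapping is reproducible.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            itos.append(tok)
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Convert tokens to ids, sending unknown tokens to <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokens]


# Whitespace tokenization on lowercased text, purely for illustration.
corpus = [s.lower().split() for s in ["The cat sat", "The dog sat down"]]
stoi = build_vocab(corpus)
ids = numericalize("the bird sat".split(), stoi)
```

The resulting id lists can be padded and wrapped in tensors, then batched with a standard `torch.utils.data.DataLoader` and a custom collate function.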

@m.j.8527 3 years ago

How can I pass a pandas DataFrame into this process (instead of loading the file)?

@le-0ne 3 years ago

Very nice tutorial! While I was looking at torchtext, I came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them are. Have you worked with them?

@mehuljain4920 4 years ago

Hi, I am trying to iterate over the training data and it's working fine, but for the test data it is showing me an error. Could you please help me resolve it? I would really appreciate it. Thanks