Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files 

Aladdin Persson
80K subscribers
29K views

Published: 28 Sep 2024

Comments: 49
@buithanhlam3726 3 years ago
I found it very difficult to get used to torchtext docs, but then I found your video :) Many thanks!
@salihbalci19 1 year ago
Could you make a video for the new version of torchtext?
@dhawalsalvi8079 2 years ago
Yo, the new version of torchtext (0.12) does not have Fields.
@subhasish661411 3 years ago
I somehow found the potato quote very inspiring 🤔.
@AladdinPersson 3 years ago
Definitely mislabeled samples in that dataset
@FEARESSERES 1 year ago
10:20 I felt that pain :D, it happened to me recently, too.
@stephennfernandes 3 years ago
Very nice tutorial, but I get warnings saying BucketIterator, Field and TabularDataset are being deprecated. Also, I can't scale BucketIterator to TPUs and multiple GPUs. Any better alternatives?
@sagsriv 3 years ago
Your examples are left padded but when I use the same bucket iterator on IMDB dataset, they are right padded. This is a bit confusing
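The left/right difference comes down to which side the pad token is added on. Below is a minimal pure-Python sketch of both conventions; it is not torchtext code, and `pad_batch` and the sample sentences are hypothetical names and data used only for illustration. In legacy torchtext the `Field` constructor exposes this choice through its `pad_first` argument, so one possible cause of the mismatch is that the two pipelines built their fields with different settings.

```python
def pad_batch(batch, pad_token="<pad>", pad_first=False):
    """Pad every token list in `batch` to the length of the longest one."""
    max_len = max(len(tokens) for tokens in batch)
    padded = []
    for tokens in batch:
        padding = [pad_token] * (max_len - len(tokens))
        # pad_first=True puts the padding on the left, otherwise on the right.
        padded.append(padding + tokens if pad_first else tokens + padding)
    return padded


# Made-up example batch with sequences of different lengths.
batch = [["good", "movie"], ["not", "my", "favourite", "film"]]

right_padded = pad_batch(batch)
left_padded = pad_batch(batch, pad_first=True)

print(right_padded[0])
print(left_padded[0])
```

Printing a batch from each iterator and checking on which side the pad index appears is a quick way to tell which convention a given field is using.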
@jeremiahjohnson6052 3 years ago
Can you do a video on the updated torchtext 0.9? I think they revamped much of this, and the new features look pretty awesome with subword tokenization implemented (i.e. 'sub', '_word').
@AhmedIqbal 4 years ago
Please create a video on semantic segmentation using a CNN in PyTorch. The dataset should contain cancer images plus ground-truth images, and the trained model should report the best IoU and accuracy of the proposed model.
@AladdinPersson 4 years ago
I will try to keep that in mind, thank you for the comment!
@salihbalci19 1 year ago
I always get this error, can you help me please? AttributeError: module 'torchtext.data' has no attribute 'Field'
@idobooks909 3 years ago
Thanks! Which version of torchtext?
@ugestacoolie5998 3 months ago
Hi, it seems that torchtext went through quite a changeover and this tutorial's contents are outdated. Any chance you might want to update it?
@MasterMan2015 2 years ago
Thanks for doing that. How do we save it once we have created it?
@fabianboro4686 1 year ago
What a fancy math intro
@lingfengshen8559 3 years ago
Great video, I really learned a lot, thanks. When I run BucketIterator, it comes up with the error 'int' object is not subscriptable. I checked my code but still have no idea where the fault is.
@pl4117 3 years ago
Make sure you are using batch_size as the argument name instead of batch_sizes when calling the bucket iterator. I had this issue too.
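The error message is consistent with how the two arguments are treated when iterators are built for several datasets at once: `batch_size` is a single integer shared by all datasets, while `batch_sizes` is indexed once per dataset and so has to be a sequence. The snippet below is a simplified stand-in written for illustration, not the library's source; `splits_like` is a hypothetical name and the dataset placeholders are plain strings.

```python
def splits_like(datasets, batch_sizes=None, **kwargs):
    """Simplified stand-in: return the batch size chosen for each dataset."""
    if batch_sizes is None:
        # A single `batch_size` is shared by every dataset.
        batch_sizes = [kwargs.pop("batch_size")] * len(datasets)
    # `batch_sizes` is indexed per dataset, so it must be a sequence.
    return [batch_sizes[i] for i in range(len(datasets))]


datasets = ("train", "test")  # placeholders for real dataset objects

shared = splits_like(datasets, batch_size=32)
per_dataset = splits_like(datasets, batch_sizes=(32, 64))

try:
    # Passing an int where a sequence is expected reproduces the error.
    splits_like(datasets, batch_sizes=32)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(shared, per_dataset, error_message)
```

Under this reading, `batch_sizes=32` fails because an integer cannot be indexed, whereas `batch_size=32` or `batch_sizes=(32, 64)` both work.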
@shoebjoarder 4 years ago
Great video! I wanted to ask how to use TabularDataset to split train, validation and test. Should I use something like this?

    train_data, valid_data = TabularDataset.splits(
        path="data", train="train.json", test="valid.json", format="json", fields=fields
    )
    test_data = TabularDataset.splits(
        path="data", test="test.json", format="json", fields=fields
    )
@AladdinPersson 4 years ago
Yes, that looks fine to me. From my understanding, and I believe this is how I showed it in the video, you need separate JSON/CSV files, and in TabularDataset you then specify those files using train, test and possibly also validation.
@shoebjoarder 4 years ago
@AladdinPersson Hi Aladdin, I probably missed the part where you talked about the validation split. Thank you for replying, the code is working. :)
@rishadkt837 2 years ago
Field has become legacy
@user-or7ji5hv8y 4 years ago
Is BucketIterator being phased out in future releases?
@AladdinPersson 4 years ago
Yes, I'm going to wait until that release and then update the torchtext tutorials. Hopefully the code from the Seq2Seq tutorials will still run, though.
@lazypunk794 4 years ago
Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict (10:27) doesn't give proper representations of the objects anymore. They probably broke something while updating the code.
@AladdinPersson 4 years ago
Will look into it more but I think you might be right about this unfortunately... I just really hope all Seq2Seq tutorials that build on this will still be able to run :/
@lazypunk794 4 years ago
@AladdinPersson Sorry, false alarm. The mistake in my case was that I used TabularDataset.splits() when I had just a train.csv and no test file. There is no need for the splits method in that case. I was just mindlessly copying from the video. My bad. The deprecation thing is just a warning as of now. The deprecated stuff will go into torchtext.legacy, so I guess your code will still work in 0.8 with some import changes.
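The import change mentioned here can be handled defensively. The following is a minimal sketch, assuming the `Field`-based API lives in `torchtext.data` in older releases and in `torchtext.legacy.data` in the releases that shipped a legacy namespace; later releases dropped it entirely, as other comments in this thread report. `find_field_api` is a hypothetical helper name, not part of torchtext.

```python
import importlib


def find_field_api():
    """Return the first module that exposes Field, or None if none is found."""
    for name in ("torchtext.legacy.data", "torchtext.data"):
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # The module is missing, or torchtext is not installed or not loadable.
            continue
        if hasattr(module, "Field"):
            return module
    return None


data = find_field_api()
if data is None:
    print("No Field-based API found in this environment.")
else:
    print("Using", data.__name__)
```

If a module is found, `data.Field`, `data.TabularDataset` and `data.BucketIterator` can then be used as in the video; if not, the installed version no longer ships the API and an older torchtext would have to be pinned.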
@user-or7ji5hv8y 4 years ago
Is the GitHub page for this available?
@AladdinPersson 4 years ago
Yes: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/torchtext/torchtext_tutorial1.py
@doyourealise 4 years ago
Hey, please create a video on the legacy torchtext.data.Field. Also, with the upcoming update, how can we load Field in torchtext 0.8?
@AladdinPersson 4 years ago
Yeah, I saw that torchtext is going to update. I will wait a little bit to make sure they aren't going to make any additional changes, and then update my previous videos to the new API.
@henricbohm8455 3 years ago
Very helpful tutorial. Could you make a tutorial on how to load data that is stored in a SQL database?
@finix7419 5 months ago
It seems torchtext has had some changes; I can't import these modules with a recent version.
@mummyskitchen5311 4 years ago
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
______________
Got this error at:

    JSONDecodeError Traceback (most recent call last)
     in ()
          4 test ='test.json',
          5 format = 'json',
    ----> 6 fields=fields
          7 )
          8

    6 frames
    /usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
        355 obj, end = self.scan_once(s, idx)
        356 except StopIteration as err:
    --> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
        358 return obj, end
____________
my dataframe:

       quote                                               score
    0  good product pretty satisfied quality diaper g...    0.9
    1  leak extremely leaky doesnt soak                    -0.7
    2  version good leak nee version leak night gud s...   -0.6
    3  good kid nice product daughter loving hair shi...    0.9
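One possible cause, offered as an assumption since the file itself is not shown: with `format='json'` the legacy `TabularDataset` parses the file line by line, so it expects one JSON object per line (JSON Lines) rather than a CSV dump or a pretty-printed array. Below is a minimal standard-library sketch that writes records in that layout. The first row is taken from the dataframe in the comment, the second is a made-up placeholder, and the file location is illustrative.

```python
import json
import os
import tempfile

records = [
    {"quote": "leak extremely leaky doesnt soak", "score": -0.7},  # row from the comment
    {"quote": "placeholder review text", "score": 0.9},  # made-up placeholder row
]

path = os.path.join(tempfile.mkdtemp(), "train.json")

# Write one JSON object per line (JSON Lines).
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Check that every line parses on its own, which is what a per-line reader needs.
with open(path, encoding="utf-8") as f:
    parsed = [json.loads(line) for line in f]

print(parsed)
```

If the first line of the real file fails `json.loads` on its own, that would match the "line 1 column 1" position in the error above.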
@mirambikasikdar5655 4 years ago
thank you!
@ximingdong503 3 years ago
I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time, I need 15 minutes to load the data. Thanks bro.
@vijaypalmanit 3 years ago
Field is deprecated, so the tutorial is no longer relevant...
@sabarishwarang6212 11 months ago
Field and TabularDataset are deprecated now. Are there any alternatives?
@m.j.8527 3 years ago
How can I pass a pandas DataFrame into this process (instead of loading the file)?
@le-0ne 3 years ago
Very nice tutorial! While I was looking at torchtext, I came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them were. Have you worked with them?
@mehuljain4920 4 years ago
Hi, I am trying to iterate over the training data and it's working fine, but the test data is showing me an error. Could you please help me resolve that error? I would really appreciate it. Thanks.
@beach2550 3 years ago
How? Imagine if you could do everything except for reading someone's mind, how would you answer that question?
@user-or7ji5hv8y 4 years ago
Why can't we go directly from the word to the word embedding, without the need to indexify?
@AladdinPersson 4 years ago
Hm, not sure exactly what you're suggesting. Could you share some code (or state which part of the video you think is unnecessary)?
@user-or7ji5hv8y 4 years ago
@AladdinPersson Sorry, I think I understand now, after watching your Seq2Seq video. Thanks!
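For readers with the same question: an embedding layer is a lookup table whose rows are selected by integer position, which is why tokens are mapped to indices first. Below is a minimal pure-Python sketch under that framing. The vocabulary, the `<unk>` fallback and the vector values are all made up for illustration; a real model would use a learned table such as `torch.nn.Embedding`.

```python
# Hypothetical vocabulary: token -> row index in the embedding table.
vocab = {"<unk>": 0, "the": 1, "potato": 2, "quote": 3}

# Hypothetical embedding table: one 3-dimensional vector per vocabulary entry.
embedding_table = [
    [0.0, 0.0, 0.0],   # <unk>
    [0.1, 0.3, -0.2],  # the
    [0.7, -0.5, 0.4],  # potato
    [-0.3, 0.8, 0.1],  # quote
]


def numericalize(tokens):
    """Map tokens to indices, falling back to <unk> for unseen words."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


def embed(indices):
    """Select one row of the table per index."""
    return [embedding_table[i] for i in indices]


indices = numericalize(["the", "potato", "speech"])
vectors = embed(indices)

print(indices)
print(vectors)
```

The index step also gives unseen words a well-defined row (here `<unk>`), which a direct word-to-vector mapping would have to handle separately.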
@ZobeirRaisi 4 years ago
Thanks
@AladdinPersson 4 years ago
Happy you found it useful :)