NLP Text Cleaning and Preprocessing | Tokenization | Lemmatization | Sententizer | Paragraphizer 

Spencer Pao
11K subscribers
1.1K views

Published: 21 Aug 2024

Comments: 3
@jasonpenick7498 1 year ago
I'm just starting the video, so I'm speaking early, but I question the wisdom of removing punctuation and stop words. I understand it is typically done, but I think important information is lost by doing so. Obviously this isn't a 'problem' given how well current systems function, but I wonder how much better they would be if they were trained without removing these things.
@SpencerPaoHere 1 year ago
You are absolutely correct! State-of-the-art approaches typically leave the information more or less intact. Thanks to modern compute and absolutely massive NLP models, the architectures can handle and interpret that type of information. In many cases, huge NLP models have the power to interpret further. However, you need a ton of data. At the end of the day, it depends on your downstream task. If you were building QA models, for instance, or doing text summarization, then cutting down on unnecessary words might be the best way to go (better results).
@fangya9350 2 years ago
super useful!!!
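
The trade-off raised in the comments above can be seen in a few lines of Python. This is only a minimal sketch, not the pipeline from the video: the `clean` function, its parameters, and the tiny `STOP_WORDS` set are hypothetical names made up for illustration, and the tokenizer is a plain whitespace split. Negations such as "not" are put in the stop-word set on purpose, since some common stop-word lists include them.

```python
import string

# Tiny illustrative stop-word set (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "i", "do", "not"}


def clean(text, remove_punctuation=True, remove_stop_words=True):
    """Lowercase and whitespace-tokenize text, optionally dropping
    punctuation characters and stop words."""
    if remove_punctuation:
        # Delete every ASCII punctuation character from the string.
        text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


sentence = "I do not like it, and the ending is bad!"

# Aggressive cleaning: punctuation and stop words are both removed.
aggressive = clean(sentence)

# Minimal cleaning: only lowercasing and whitespace splitting.
intact = clean(sentence, remove_punctuation=False, remove_stop_words=False)

print(aggressive)
print(intact)
```

With this particular stop-word set, the aggressive version drops "not", so the negation in the sentence is lost, which is the kind of information loss the first comment worries about. Whether that matters depends on the downstream task, as the reply notes.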