
Text Representation | NLP Lecture 4 | Bag of Words | Tf-Idf | N-grams, Bi-grams and Uni-grams 

CampusX
227K subscribers
84K views

In natural language processing, text representation plays a vital role in capturing the meaning and structure of textual data. This video explores three fundamental text representation techniques: Bag of Words, Tf-Idf (Term Frequency-Inverse Document Frequency), and N-grams (Uni-grams and Bi-grams). Each method has its unique approach to encoding and extracting information from text, making it essential for data scientists and NLP enthusiasts to grasp these concepts.
Assignment - colab.research...
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.camp...
============================
📱 Grow with us:
CampusX' LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
✨ Hashtags✨
#TextRepresentation #BagOfWords #TfIdf #NGrams #NLP #DataScience #machinelearning
⌚Time Stamps⌚
00:00 - Intro
01:10 - Plan of Attack
02:56 - Introduction
03:25 - What is feature extraction from text?
04:49 - Why do we need feature extraction?
07:30 - Why is this difficult to do?
11:00 - What is the core idea behind this?
12:12 - What are the Techniques?
14:24 - Common Terms
18:00 - One Hot Encoding
33:25 - Bag of Words
57:45 - N-grams/Bi-grams/Tri-grams
01:13:45 - Benefits of N Grams
01:14:25 - Disadvantages of N Grams
01:16:34 - Tf-Idf
01:38:46 - Custom Features
01:41:45 - Assignment

Published: 17 Aug 2024

Comments: 135
@sauravagarwal8928 2 months ago
This is one of the legendary videos I have seen. I’m into SEO and trying to wrap my head around semantic SEO. Some experts in the semantic SEO industry use technical jargons and fail to explain how semantics engines like Google work. But your series helped me understand every single bit of it. I don’t know python coding, but now I understand how Google algorithms work to rank any document. I understand the type of computation they do behind the screen. The video is pure gold! I mean it! This helps me as a search engine optimiser and makes me better understand machine and human interaction. Thank you so much 🙏 ☺️
@raj4624 2 years ago
Oh brother... unbelievable... 2 hrs of content... genuinely, heartfelt thanks to you, sir.
@art4eigen93 2 years ago
This playlist is necessary for basic to advanced NLP engineers. Please do upload the complete series Sir. Your contribution is life saving.
@ashwanibhardwaj4930 2 years ago
Please carry on with this series. Once basic NLP is done, we would like to learn advanced NLP using deep learning, language models, and SOTA techniques.
@kindaeasy9797 3 days ago
31:41 wow loved the explanation
@bishowlamsal7319 2 years ago
Huge respect sir, You deserve more than million followers. Love from Nepal ❤️❤️❤️❤️
@abhinavkr5131 1 year ago
I have watched a lot of tutorials, but you are the best, sir.
@piyushpathak7311 2 years ago
I am following your ML playlist, sir; your explanations are great. Please cover XGBoost and the DBSCAN algorithm in this playlist, and please start a series on deep learning.
@AkashBhandwalkar 2 years ago
I'm following it as well
@campusx-official 2 years ago
Will do it in January
@AkashBhandwalkar 2 years ago
@campusx-official woaahhh! Thank you sooooo much! This made my day! 🥳🥳🥳
@749srobin 2 years ago
@campusx-official January of which year, sir?
@debojitmandal8670 1 year ago
@campusx-official Hi sir, in your example the vocabulary decreases to 5 when using tri-grams, so I don't follow the part where you said the vocabulary increases as n increases.
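Whether the vocabulary grows or shrinks depends on what is counted. A minimal sketch in plain Python, assuming a made-up four-sentence corpus and two helper functions (`ngrams`, `ngram_vocabulary`) that are introduced here only for illustration: counting tri-grams alone can give fewer entries than counting uni-grams on very short documents, while counting uni-grams, bi-grams and tri-grams together (which is what a range such as scikit-learn's `ngram_range=(1, 3)` asks for) can only add entries.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vocabulary(corpus, n_min, n_max):
    """Collect the distinct n-grams for every n in [n_min, n_max]."""
    vocab = set()
    for doc in corpus:
        tokens = doc.lower().split()
        for n in range(n_min, n_max + 1):
            vocab.update(ngrams(tokens, n))
    return vocab


# Assumed toy corpus (hypothetical, chosen only for illustration).
corpus = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]

# Vocabulary when only n-grams of exactly length n are kept.
exact_sizes = {n: len(ngram_vocabulary(corpus, n, n)) for n in (1, 2, 3)}
# Vocabulary when all lengths from 1 up to n are kept together.
cumulative_sizes = {n: len(ngram_vocabulary(corpus, 1, n)) for n in (1, 2, 3)}

print("exact n only:", exact_sizes)
print("1 up to n:   ", cumulative_sizes)
```

On three-word documents each sentence contributes a single tri-gram, so the exact tri-gram count is small; on a realistic corpus with long documents the number of distinct n-grams typically rises with n.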
@naveedkaimkhami2695 3 months ago
I was unsure which word embedding technique to select for my final-year project, and this video was a lifesaver. Thank you so much!!!
@shubhamgattani5357 4 months ago
I cannot find any reason not to like this video. It's amazing!
@rachitsingh4913 2 years ago
For me You are the best data science teacher. ❤❤❤❤❤
@HarshVardhan-jj9xh 6 months ago
Thanks a lot, sir. My PhD is on NLP, and your videos help me a lot in understanding the overall concepts. Your efforts are very sincere and dedicated 💯
@forgotabhi 6 months ago
I am getting started with NLP :) I am still doing my UG. Can you tell me about your experience in the field?
@HarshVardhan-jj9xh 6 months ago
@forgotabhi It's an amazing field, and day by day you will come to know new models and architectures.
@hritikroshanmishra3630 4 months ago
@forgotabhi which college?
@MuhammadAfzal-xl7wd 4 months ago
thank you so much. you explain the concept in a very very simple way. once again thank u so much 🙂🙂🙂🙂
@ronylpatil 2 years ago
Many Many Congratulations to you Sir for 10k Subs🥳🥳🥳
@hritikroshanmishra3630 4 months ago
😁😁😁😁 182 k
@PankajSingh-sf7nc 28 days ago
Happy Guru Purnima Sir 🙏🙏
@yashgondaliya2831 18 days ago
In your assignment, when I apply one-hot encoding, it shows an error along the lines of "you can use the one-hot encoder for limited categories". Did anyone get an output?
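For reference, one-hot encoding of text can be written by hand in a few lines, which also shows why it becomes impractical on a large vocabulary: every word turns into a vector as long as the whole vocabulary. This is only a sketch under assumptions; the function name `one_hot_encode` and the two sample sentences are hypothetical and are not taken from the assignment.

```python
def one_hot_encode(corpus):
    """One-hot encode every word of every document.

    Returns the sorted vocabulary and, per document, one vector per word.
    Each vector has len(vocabulary) entries with a single 1.
    """
    vocab = sorted({word for doc in corpus for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    encoded = []
    for doc in corpus:
        rows = []
        for word in doc.lower().split():
            row = [0] * len(vocab)
            row[index[word]] = 1
            rows.append(row)
        encoded.append(rows)
    return vocab, encoded


# Hypothetical sample sentences.
vocab, encoded = one_hot_encode(["people watch campusx", "campusx write comment"])
print(vocab)
for row in encoded[0]:
    print(row)
```

With a vocabulary of V words and a document of n words, the document costs n x V cells, which is why the technique is usually limited to small category sets.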
@harisumanth 2 years ago
Almost 2 hours...Respect!
@siyays1868 2 years ago
Thank you so much, sir, for a wonderful explanation. Hats off to you always!
@takeshrao733 23 days ago
I am doing log analytics. Can you suggest which text representation technique is better for this?
@shivamgarg3890 2 years ago
This channel is highly underrated...
@BTStechnicalchannel 2 years ago
Your explanation is so great!! And in Hindi, too. Thanks a lot!!💙
@mehulsuthar7554 3 months ago
I have one doubt: can we normalize the vectors of the engineered features? I think a normalized vector will still contain the information that was there before, but on a smaller scale, which reduces computation. Let me know if this is the correct approach.
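A small sketch of what L2 normalization does to a count vector, in plain Python with hypothetical helper names (`l2_normalize`, `cosine`) and made-up vectors. It rescales a vector to unit length and keeps the relative weights, so cosine similarity between documents is unchanged. Note that the number of dimensions stays the same, so normalization by itself does not shrink the amount of computation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up bag-of-words count vectors for two documents.
doc_a = [3, 0, 4, 0]
doc_b = [1, 2, 2, 0]

unit_a = l2_normalize(doc_a)
unit_b = l2_normalize(doc_b)

print(unit_a)                        # same direction as doc_a, length 1
print(cosine(doc_a, doc_b))          # similarity before normalization
print(cosine(unit_a, unit_b))        # same similarity after normalization
```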
@technicalhouse9820 2 months ago
I really enjoyed it, sir, I swear.
@mohaiminrahat4974 2 years ago
Congratulations sir for 10K Subscribers.
@amitkumar2005 4 months ago
Superb explanations !
@hari8568 6 months ago
The example you gave, of bi-grams doing better than uni-grams at telling the two sentences apart in vector space, doesn't really make sense to me. Suppose that instead of "not" I used a synonym of "very", such as "extremely". Those two sentences should then be similar in vector space, but the bi-gram model will say they are different. So it is not actually handling the word "not"; it is just treating any unknown word differently.
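The point can be checked numerically. The sketch below is plain Python and rests on assumptions: the three sentences are made up (they are not the video's example), and `ngram_counts` and `cosine` are hypothetical helpers. It compares the "very" sentence with a "not" variant and with an "extremely" variant under uni-gram and bi-gram counts.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams (as tuples) of a whitespace-tokenized sentence."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sentences used only for illustration.
base = "this movie is very good"
negated = "this movie is not good"
synonym = "this movie is extremely good"

for n in (1, 2):
    sim_negated = cosine(ngram_counts(base, n), ngram_counts(negated, n))
    sim_synonym = cosine(ngram_counts(base, n), ngram_counts(synonym, n))
    print(f"n={n}: base vs negated = {sim_negated:.2f}, base vs synonym = {sim_synonym:.2f}")
```

Under both representations the negated and the synonym sentence sit at the same distance from the base sentence: bi-grams push any one-word change further away, but count-based vectors carry no notion of what the changed word means.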
@asifpervezpolok2243 2 years ago
the best tutorial i found from you.
@gauravlochab9614 1 year ago
Can you add RNNs, LSTMs, and modern NLP using transformers!? Loved the content. Huge respect. P.S. The lamp from Banjara Market! XD
@deeptisingh93 2 years ago
Thank you, sir... really, for explaining it in such an easy way.
@hvjmlops 2 years ago
Respect for your hardwork
@basit-qx7ys 2 months ago
I love the way sir explains. I am able to grasp the fundamental concepts, but I cannot imagine myself coding for NLP without any guidance. Any suggestions on what other materials and sources I should follow?
@learnfromIITguy 1 year ago
wow , after watching this video, I am confident on feature engineering
@daljeetsinghranawat6359 7 months ago
KUDOS TO YOU SIR ..............loving this series
@shahu6015 1 year ago
Congratulations on 100K subscribers in advance.
@takeshrao733 4 months ago
Very nice, and a very good starting point. Can you please suggest which text representation algorithm is suited for log analysis?
@somyarathee 2 years ago
Best series on NLP
@sidindian1982 1 year ago
1:23:40 - Sir, the word "campusx" is repeated 4 times in the IDF calculation... log_e(4/4) = 0
@Tusharchitrakar 4 months ago
It's repeated only 3 times bro
@machinelearningspace6977 2 years ago
Teaching style awesome... Go ahead.
@whothefisyash 1 month ago
For real, I thoroughly enjoyed it.
@749srobin 2 years ago
Sir, removing stopwords took 3 hours 26 minutes; I am getting nervous about doing tokenization.
@ranjithkumar947 5 months ago
For TF-IDF, the term "campusx" occurs 4 times, but sir, you counted it only three times. Any reason for that? Anyway, we get the +1 there in practice. Could you please reply?
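The 3-versus-4 question comes down to what IDF counts: the number of documents that contain the term, not the total number of occurrences. A minimal sketch in plain Python, assuming a made-up four-sentence corpus in which "campusx" occurs 4 times but in only 3 documents; the helper names are hypothetical. The last two formulas are the variants scikit-learn documents for `smooth_idf=False` and `smooth_idf=True`, which is where the "+1" comes from.

```python
import math

# Assumed toy corpus (hypothetical): "campusx" appears twice in the second sentence.
corpus = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]


def total_occurrences(term, corpus):
    """How many times the term appears across all documents."""
    return sum(doc.lower().split().count(term) for doc in corpus)


def document_frequency(term, corpus):
    """How many documents contain the term at least once."""
    return sum(1 for doc in corpus if term in doc.lower().split())


n_docs = len(corpus)
df = document_frequency("campusx", corpus)

idf_textbook = math.log(n_docs / df)                  # ln(N / df)
idf_plus_one = math.log(n_docs / df) + 1              # ln(N / df) + 1
idf_smoothed = math.log((1 + n_docs) / (1 + df)) + 1  # ln((1 + N) / (1 + df)) + 1

print("occurrences:", total_occurrences("campusx", corpus))
print("document frequency:", df)
print(round(idf_textbook, 4), round(idf_plus_one, 4), round(idf_smoothed, 4))
```

A term that appears in every document gets ln(N/N) = 0 under the textbook formula; the added 1 keeps such terms from being zeroed out entirely.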
@tusarmundhra5560 9 months ago
awesome
@bhanu0925 2 years ago
Thank you for another great session
@uditsaurabh 6 months ago
awesome video
@gauravverma4433 2 years ago
It was awesome .. love you sir... thanx for your efforts
@gautampatadiya6096 4 months ago
well done buddy #nlp #nlptuts #nlpeasytuts
@maukaladka4100 2 years ago
Hello sir, I had doubts about how the conversion takes place. I watched lots of videos and read lots of blogs, but no one could make me understand like you did. Hats off to you, keep up the great work.
@shahmuhammadraditrahaman9904 2 years ago
Incredible ❤️
@mihirnaik3383 2 years ago
Hi Buddy, Great content! This video cleared all my doubts regarding BoW and TF IDF🙌 Are you going to take any NLP projects in future based on Machine Learning models?
@campusx-official 2 years ago
Yes
@mihirnaik3383 2 years ago
@campusx-official Thank you!
@sidindian1982 1 year ago
@campusx-official Sir, the code is missing from the list... BoW, TF-IDF... please share.
@Sara-fp1zw 2 years ago
Congratulations on 36K subs, soon we gonna cross 100K IA :)
@balrajprajesh6473 1 year ago
2 hours of pure diamond mine.
@user-dd3te4rh8j 1 year ago
Feature extraction from text / text representation / text vectorization - changing text to numbers so that the model can understand it. Bag of words -
@avinashbhardwaz5717 8 months ago
Sir, I don't understand the idea of TF-IDF at 1:20:09. You said it favors a word that is frequent in the document but rare in the corpus. I am confused about how that is possible, since the word's count in the corpus will always be greater than or equal to its count in the document. Kindly clarify, sir.
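One way to read "frequent in the document but rare in the corpus" is that "rare" refers to how few documents contain the word, not to its raw count. The sketch below is plain Python under stated assumptions: the three documents are made up, `tf_idf_scores` is a hypothetical helper, and the weighting used is tf = count / document length and idf = ln(N / df) + 1, which is only one of several common variants.

```python
import math
from collections import Counter

# Hypothetical corpus: "nlp" is concentrated in one document, "is" and "fun" are everywhere.
corpus = [
    "nlp nlp nlp is fun",
    "cricket is fun",
    "cooking is fun",
]


def tf_idf_scores(doc_index, corpus):
    """TF-IDF score of every term in one document of the corpus."""
    tokenized = [doc.lower().split() for doc in corpus]
    tokens = tokenized[doc_index]
    counts = Counter(tokens)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)                       # share of the document
        df = sum(1 for doc in tokenized if term in doc)  # documents containing the term
        idf = math.log(len(corpus) / df) + 1
        scores[term] = tf * idf
    return scores


scores = tf_idf_scores(0, corpus)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term:5s} {score:.3f}")
```

"nlp" ends up with the highest weight in the first document because it fills most of that document and appears in no other one, while "is" and "fun" are discounted for appearing in every document.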
@jai40403 6 months ago
Where can I get these notes ?
@nikhiljagtap1669 2 years ago
At 55:24: BoW doesn't consider the order of words in a sentence, but since we perform tokenization before this, we are going to lose some words, which would mess up the sequence anyway. Isn't that right?
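Splitting a sentence into tokens keeps the words in their original order; it is the counting step of bag of words that throws the order away. A short sketch in plain Python with two made-up sentences and a hypothetical helper `bag_of_words`:

```python
from collections import Counter


def bag_of_words(text):
    """Tokenize on whitespace and count each word; positions are discarded."""
    return Counter(text.lower().split())


first = "dog bites man"
second = "man bites dog"

# Tokenization alone preserves order, so the token lists differ.
print(first.split(), second.split())

# The bags of words are identical even though the meanings are not.
print(bag_of_words(first) == bag_of_words(second))
```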
@muhammaduseram9405 6 months ago
Did anyone complete the assignment? In df['review'][1] it is hard to remove the HTML pattern... df['review'] = 'a wonderful little production.
@user-qq7qi5kk5u 3 months ago
import re; pattern1 = re.compile('<.*?>'); df['review'] = df['review'].apply(lambda text: pattern1.sub('', text))
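A self-contained version of the tag-stripping idea, as a sketch only: it uses the standard `re` module on a plain list of strings, and both the sample review and the helper name `strip_html` are made up for illustration. A compiled pattern's `sub` works on one string at a time, so with a pandas column the function would be applied row by row (for example through `Series.apply`).

```python
import re

# Non-greedy match of anything between "<" and ">", e.g. "<br />".
TAG_PATTERN = re.compile(r"<.*?>")


def strip_html(text):
    """Replace HTML tags with a space, then collapse repeated whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())


# Hypothetical sample review containing line-break tags.
reviews = [
    "A wonderful little production.<br /><br />The filming technique is very unassuming.",
]

cleaned = [strip_html(review) for review in reviews]
print(cleaned[0])
```

Replacing tags with a space rather than an empty string keeps the words on either side of a tag from being glued together.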
@richaaggarwal07 2 years ago
Please make more videos on NLP !!!
@saumyakumari3441 2 years ago
Many many congratulations for 10k sub. 🎊🎊🎊
@campusx-official 2 years ago
Thanks
@IRFANSAMS 2 years ago
Sir..thank you for the wonderful video
@hitinyadav3321 2 years ago
Amazing video
@rajeevranjan5007 2 years ago
Great Video Sir.
@ayushroy6208 2 years ago
Sir, suppose the sentences are of unequal length. Is there then no option other than padding in the case of TF-IDF, n-grams, etc.?
@vivekathilkar6555 2 years ago
Appreciate your efforts
@diwakargupta0 1 year ago
Awesome content and explanation sir 👐
@Howto-ty4ru 1 year ago
cv.fit_transform(df['eng']) How can we apply fit_transform on text? I think I do not understand this part
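Conceptually, `fit` learns a vocabulary from the text and `transform` turns each document into a row of counts over that vocabulary; `fit_transform` does both in one call. The class below is a deliberately tiny stand-in written in plain Python to show that idea. `TinyCountVectorizer` is a hypothetical name, the corpus is made up, and this is not scikit-learn's implementation (which also handles punctuation, token patterns and sparse output).

```python
class TinyCountVectorizer:
    """Illustrative stand-in for a count vectorizer (not the scikit-learn class)."""

    def fit(self, docs):
        # Learn the vocabulary: every distinct word gets a column index.
        terms = sorted({term for doc in docs for term in doc.lower().split()})
        self.vocabulary_ = {term: index for index, term in enumerate(terms)}
        return self

    def transform(self, docs):
        # Turn each document into a fixed-length row of word counts.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for term in doc.lower().split():
                index = self.vocabulary_.get(term)
                if index is not None:  # words unseen during fit are ignored
                    row[index] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


# Assumed toy corpus (hypothetical).
corpus = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]

cv = TinyCountVectorizer()
bow = cv.fit_transform(corpus)
print(cv.vocabulary_)
for row in bow:
    print(row)
print(cv.transform(["watch and write campusx"]))
```

Every row has as many entries as the vocabulary, whatever the length of the original sentence, and a word that was not seen during `fit` ("and" in the last call) simply contributes nothing.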
@GhostRider.... 1 year ago
very nice explanation sir
@vaibhavmoharkar2349 6 months ago
THANKYOU SIR
@gajanankhapre2425 2 years ago
Very good flow sir . Kindly upload next in NLP series
@yashjain6372 1 year ago
best
@ronylpatil 2 years ago
Sir, the NLP series is really amazing. Please recommend the best book for NLP, because in a few days I have an interview that will be entirely on NLP.
@avishinde2929 1 year ago
thank you sir ji
@gautamkushwaha8724 9 months ago
why don't you keep the resource in the description, like the code link..
@Sara-fp1zw 2 years ago
Hi Nitish sir, I'm facing a problem with the spell checker function def spell_correct(text): return TextBlob(text).correct().string. It is taking very long on the assignment dataframe. Is there a faster approach to check and correct spelling, in log(n) time?
@sidindian1982 1 year ago
Run the file in Google Colab... because of the GPU... it runs faster...
@SatyaIITI 1 year ago
Hi Nitish sir, where can we get these notes in PDF format? They would be helpful for revision.
@forgotabhi 6 months ago
When I perform the bag-of-words method from the video in a Kaggle notebook on the IMDB data, it says memory exceeded and just restarts the notebook :( What to do?
@user-qq7qi5kk5u 3 months ago
Same issue. I tried on my machine, but it said memory exceeded: it needs 18.1 GiB after applying OHE.
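Memory errors of this kind usually come from materializing a dense documents-by-vocabulary matrix. Two common remedies are to cap the vocabulary (scikit-learn's `CountVectorizer` exposes a `max_features` parameter for this) and to keep the result sparse instead of converting it to a dense array. The sketch below shows both ideas in plain Python; the helper names and the toy corpus are hypothetical, and the sizes in the last line are illustrative rather than measured.

```python
from collections import Counter


def top_k_vocabulary(corpus, k):
    """Keep only the k most frequent words and give each a column index."""
    counts = Counter(term for doc in corpus for term in doc.lower().split())
    kept = sorted(term for term, _ in counts.most_common(k))
    return {term: index for index, term in enumerate(kept)}


def sparse_bow(doc, vocabulary):
    """Store only the non-zero counts of a document as {column: count}."""
    counts = Counter(term for term in doc.lower().split() if term in vocabulary)
    return {vocabulary[term]: count for term, count in counts.items()}


def dense_size_gib(n_docs, vocab_size, bytes_per_cell=8):
    """Memory a dense matrix of this shape would need, in GiB."""
    return n_docs * vocab_size * bytes_per_cell / 2**30


# Assumed toy corpus (hypothetical).
corpus = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]

vocabulary = top_k_vocabulary(corpus, k=1)
print(vocabulary)
print(sparse_bow("campusx watch campusx", vocabulary))
# Illustrative sizes: 50,000 documents by 100,000 distinct words, 8 bytes per cell.
print(round(dense_size_gib(50_000, 100_000), 2), "GiB if stored densely")
```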
@forgotabhi 3 months ago
@user-qq7qi5kk5u guess we're poor lol
@bananamaker4877 11 months ago
Liked and shared your video. Subscribed your channel. What else can I do for you. You are doing a great job.
@abdullahilawal3220 11 months ago
Your teaching method is good, but you are making it accessible only to Indian students, not to international viewers. Please make a new version of all your NLP videos in English so everyone can learn from them 🙏
@joyeetamallik5063 2 years ago
Thank you so much for such a wonderful video. Sir, do you take any online classes as well?
@campusx-official 2 years ago
No, not right now
@sachi-4750 2 years ago
Thank you so much sir😊🙏
@230489shraddha 2 years ago
Thanks a lot sir .... Can you also upload a video on RNN & LSTM.
@HimanshuSharma-we5li 2 years ago
You are a 💎.
@manavahuja4418 2 years ago
Sir will you make a video for nlp project....something good for resume..?
@vijayraghuwanshi4486 1 year ago
I have tried the assignment on Kaggle. If anyone else has tried it and wants to discuss, please let me know.
@shaiksalavuddin5976 2 years ago
Thank you🌹
@solvinglife6658 1 year ago
Sir please continue the playlist!!!!!
@sunny739 4 months ago
cv.vocabulary_ returns each word and its position (index) in the BoW
@nikeshmali8506 6 months ago
how can i get OneNote notes
@ritwiksingh4937 1 month ago
by writing in ur notebook
@rafibasha4145 2 years ago
Please complete NLP,Interview series and ML series
@ananyakumari6807 2 years ago
Sir, can you please share your code notebook?
@ridoychandraray2413 1 year ago
Thank you sir?
@furry2fun 1 year ago
Share the link for the Colab notebook
@chauhanabhishek9593 2 years ago
Thank u sir .
@backclover9651 2 years ago
Bag of words minuets?
@yashgaming827 1 year ago
sir please share the one note link
@rushikeshmalpe3715 2 years ago
Please start deep learning, sir 👍👍👍❤️
@kislaykrishna8918 2 years ago
Sir, my question is: I have a list of entities and a text, like this: List=["Data Scientist", "Bihar", "Krishna"] Text=" I am Krishna. I am from Bihar . I want to be a Data Scientist" I want a result like: "I am [Entity]Krishna[Entity]. I am from [Entity]Bihar[Entity] . I want to be a [Entity]Data Scientist[Entity]" Please help me with code to get this result. Thanks 🙏
@priyaravind18 2 years ago
Did you get the code?
@kislaykrishna8918 2 years ago
@priyaravind18
List = ["Data Scientist", "Bihar", "Krishna"]
text = ' I am Krishna. I am from Bihar. I want to be a Data Scientist'
for entity in List:
    if entity in text:
        text = text.replace(entity, '[Entity]' + entity + '[Entity]')
print(text)
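The same tagging can be done in one pass with a regular expression, which avoids re-tagging text that an earlier replacement already wrapped. This is a sketch under assumptions: `tag_entities` is a hypothetical helper, matching is exact and case-sensitive, and longer entities are tried first so that an entity contained in another one does not win.

```python
import re


def tag_entities(text, entities, tag="[Entity]"):
    """Wrap every occurrence of each entity in the given tag, in a single pass."""
    if not entities:
        return text
    # Longest first, so "Data Scientist" is preferred over a shorter overlap.
    ordered = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(entity) for entity in ordered))
    return pattern.sub(lambda match: f"{tag}{match.group(0)}{tag}", text)


entities = ["Data Scientist", "Bihar", "Krishna"]
text = "I am Krishna. I am from Bihar. I want to be a Data Scientist"

tagged = tag_entities(text, entities)
print(tagged)
```

`re.escape` keeps entity names containing characters such as "." or "+" from being interpreted as regex syntax.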
@datagyan5489 2 years ago
How to join Mentorship program
@MrKB_SSJ2 1 year ago
23:00
@nabinadhikari5426 1 year ago
Please share this notebook source file to us !
@MrKB_SSJ2 1 year ago
1:38:48