CodeEmporium
Mission: In the next 2 years, we aim to create comprehensive AI educational content to turn enthusiasts into professionals.
genAI - EXPLAINED!
13:31
14 days ago
Informer: Training and Inference
45:22
1 month ago
Informer: Complete Code from scratch!
1:27:01
1 month ago
Informer Encoder architecture - EXPLAINED!
26:11
2 months ago
Informer embeddings - EXPLAINED!
24:59
2 months ago
Informer distillation - EXPLAINED!
25:05
2 months ago
Informer attention code - FROM SCRATCH!
36:06
3 months ago
Hyper parameters - EXPLAINED!
16:32
5 months ago
NLP with Neural Networks | ngram to LLMs
13:21
6 months ago
Transfer Learning - EXPLAINED!
16:22
7 months ago
Embeddings - EXPLAINED!
12:58
7 months ago
Loss functions in Neural Networks - EXPLAINED!
8:14
7 months ago
Optimizers in Neural Networks - EXPLAINED!
10:19
8 months ago
Activation functions in neural networks
12:32
8 months ago
Backpropagation in Neural Networks - EXPLAINED!
10:18
8 months ago
Building your first Neural Network
15:16
8 months ago
Proximal Policy Optimization | ChatGPT uses this
13:26
9 months ago
Deep Q-Networks Explained!
10:51
10 months ago
Monte Carlo in Reinforcement Learning
11:49
10 months ago
Q-learning - Explained!
11:54
10 months ago
Comments
@zeeshankhanyousafzai5229 21 hours ago
@phaZZi6461 1 day ago
The masked attention is, as I understand it, really more of a trick for training: when putting in a training example sentence, we let the decoder predict the "next word" for all sub-sentences simultaneously and then add the losses for all the predictions (and also still add losses along the batch dimension). However, that requires that the attention heads are never allowed to attend to the next token, so that they are all put into the same situation of not knowing what comes after. Now that your model has trained with masking, you'll have to keep the masks during inference as well, because that's now the behaviour that the subsequent layers in the network expect!
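For illustration, a minimal sketch of the causal (look-ahead) mask described above, assuming PyTorch and a toy sequence length; all names are illustrative:

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores (query x key)

# True above the diagonal marks "future" positions; setting them to -inf
# makes softmax assign them zero attention weight.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

# Each row now attends only to itself and earlier tokens -- during both
# training and inference, as the comment points out.
weights = torch.softmax(scores, dim=-1)
```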
@hussainmotiwala808 1 day ago
0.5 sq units as half of the balls means half the area of the square for the diamond
@darshantank554 2 days ago
Nice summary of the survey paper. Can you also make a video on how to reduce LLM response for complex RAG architectures, and on prompt techniques that use reasoning?
@shivamsahil3660 2 days ago
Umm, I have a doubt: if we already have the target network, then why do we need to evaluate the Q-network? Can't we directly use the target network?
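For context on this question, a hedged sketch of the standard DQN update (assuming PyTorch; all names are illustrative): the online Q-network is the one being trained and evaluated at every step, while the target network only supplies frozen bootstrap targets and is synced to the online network periodically.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) from the online network -- gradients flow through this term.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max_a' Q_target(s', a') from the frozen target network.
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_sa, target)

# Periodically: target_net.load_state_dict(q_net.state_dict())
```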
@Patapom3 3 days ago
How come RAG even works with such poor chunking strategies???
@vtrandal 4 days ago
Awesome video. Excellent content. You nail it every time. Is there anything you cannot do?
@CodeEmporium 4 days ago
Haha. You're making me blush! Thanks for the years of support!
@anandsharma16 4 days ago
the dog bark messed me up man
@sinahdsh 4 days ago
Thank you for the complete explanation. In quiz 3, I didn't get why bypassing some modules may be helpful. Do you mean that sometimes the initial prompt is so straightforward that there is no need to use some modules?
@CodeEmporium 4 days ago
Yep. I believe that is the case. I tried explaining this more in pass 3
@kalcavaleiro6993 5 days ago
Finallyyyy..... I've been waiting so loong for u to explain this stuff. Thank youuu jay🎉🎉
@CodeEmporium 4 days ago
Glad!
@zeeshankhanyousafzai5229 5 days ago
Super cool ❤
@-casperliu2227 5 days ago
super nice content! appreciate your efforts
@vishnusaitejanagabandi9009 5 days ago
Simply amazing, very well explained.
@haimantidas843 5 days ago
Is this video for ghosts only?
@haimantidas843 5 days ago
😄
@davidflorez5647 5 days ago
Excellent video
@vtrandal 6 days ago
RLHF? ROTFL
@МаксимМакаров-р7ь 6 days ago
Clear explanation, thank you!
@legendary2192 8 days ago
at 37:42, shouldn't we permute the values to (1, 4, 8, 64) first and then reshape to (1,4,8*64) to accurately concatenate multiple heads for corresponding sentences?
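For anyone following along, a small sketch of the distinction this comment raises, assuming PyTorch and the shapes mentioned (batch=1, heads=8, seq_len=4, head_dim=64):

```python
import torch

values = torch.randn(1, 8, 4, 64)  # (batch, heads, seq_len, head_dim)

# Permute so head and head_dim become adjacent per token, THEN flatten:
out = values.permute(0, 2, 1, 3).reshape(1, 4, 8 * 64)  # (batch, seq_len, d_model)

# Reshaping (1, 8, 4, 64) straight to (1, 4, 512) would instead interleave
# tokens across heads rather than concatenating the heads per token.
print(out.shape)  # torch.Size([1, 4, 512])
```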
@drew1998 8 days ago
You shouldn't really be comparing SA to MHA; it should be SA to cross-attention. MHA is just a concat of many attention "heads".
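To make the "concat of heads" point concrete, a minimal sketch (assuming PyTorch; function names are illustrative, and real MHA also applies learned projections before and after the heads):

```python
import torch

def attention(q, k, v):
    # Scaled dot-product attention for a single head.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def multi_head(q, k, v, num_heads):
    # Split d_model into num_heads slices, attend per slice, concat results.
    heads = [attention(qi, ki, vi)
             for qi, ki, vi in zip(q.chunk(num_heads, -1),
                                   k.chunk(num_heads, -1),
                                   v.chunk(num_heads, -1))]
    return torch.cat(heads, dim=-1)
```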
@bofloa 9 days ago
Can a neural net have only a single weight per node, instead of weights per input?
@Radhika-wv5qe 9 days ago
It's so easy to understand; even with zero knowledge, I understood it. Thank you!
@suraj1311 9 days ago
Calculus was given by India.
@theinvincible8557 1 day ago
The development of calculus is largely attributed to European mathematicians Sir Isaac Newton and Gottfried Wilhelm Leibniz in the late 17th century. However, Indian mathematicians made significant contributions to mathematical concepts that are foundational to calculus.
@srikanthramakrishna1073 10 days ago
Hi Ajay. Great work!! Quick question, in the code the default for mask is set to None. Is there an instance during training/inference where we won't add a mask? Is masking needed only for inference?
@srikanthramakrishna1073 10 days ago
Oh, later in the video it is set to None for cross-attention!
@НухкадиевБагаутдин
Great explanation!
@SukritiSemwal 10 days ago
Is this the centroid-based text summariser?? If not, plz let me know🙏
@MuhammadMujahidHaruna 11 days ago
😮 I'm speechless at how you explain everything in detail
@Codingchartpattern-o5i 11 days ago
00:15 Listening to music and singing along while coding and sipping a cup of coffee
@nexyboye5111 14 days ago
thanks, this is the only video I found useful on attention.
@sudarshanseshadri2144 14 days ago
C, B, A
@ziangxu8371 15 days ago
Funny!
@smithanair787 15 days ago
Great video! How do we use Informer for classification of timeseries data?
@44eee44 15 days ago
Hidden Markov Models and Gaussian Mixture Models are under Generative AI
@khalidelejla7113 16 days ago
It's normal for me to feel sympathy for those models; I just want them to reach the bottom of the hill and live happily ever after.
@Inzurrekto1 16 days ago
Thank you for this video. Very well explained.
@borneoland-hk2il 16 days ago
Is the PPO you explained PPO-Penalty or PPO-Clip, and what is the difference?
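For reference, a hedged sketch of the PPO-Clip surrogate loss (assuming PyTorch; names are illustrative). PPO-Penalty instead adds a KL-divergence penalty between the new and old policies to the objective, rather than clipping the probability ratio.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # keep the ratio near 1
    # Pessimistic (minimum) objective, negated for gradient descent.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```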
@yinyangai_app 17 days ago
How does one determine if a lead passed into the z function results in 1? Where does that training data for the model come from?
@modakad 17 days ago
Thanks for the detailed explanation. The first part, on Euclidean max-min distance vs. #dimensions, was revealing! One point I am thinking over: even though the max-min distance is shrinking, the ranking of distances will (or might) still hold true, irrespective of #dimensions. If that's the case, the algorithms should not lose any discriminative power in theory. In practice, yes, the strain this might bring on compute requirements can make it impractical, hence the need to reduce dimensions. Would love to hear your thoughts @CodeEmporium
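The shrinking contrast the comment mentions is easy to reproduce. A quick NumPy sketch (setup and numbers are illustrative): as dimensionality grows, the gap between the nearest and farthest neighbor shrinks relative to the nearest distance itself.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((1000, d))  # 1000 random points in [0, 1]^d
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast between the farthest and nearest neighbor.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative max-min contrast = {contrast:.3f}")
```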
@JohnBerry-q1h 18 days ago
Rubber •🐥• ducky • you • are • the • one. You • make • 🛀 • 🧼 • bath • time • lots • of • fun. [BERT _learning_ ERNIE]
@sudlow3860 18 days ago
Another very informative video (unintended pun there!)
@thehealthofthematter1034 19 days ago
GenAI today is like the internet circa the end of the '90s: overrated in the short term, underrated in the long term.
@BhargavaVidiyala 19 days ago
Great video, loved the content
@alfinal5787 19 days ago
The hype is misplaced. It would be better to create models that grok instead of approximate. Perhaps have hybrid systems.
@samson6707 19 days ago
3:15 could K-Nearest Neighbor Density Estimation be seen/used as a Generative model? 🤔
@samson6707 19 days ago
thanks my man. awesome video.
@aymenchibouti1527 19 days ago
I have one question, teacher: can you use a transformer on KDD_cup2015? Thanks!