AI Bites
Welcome to AI Bites! I am here to help you understand AI concepts and research papers by providing clear and concise explanations. I will explain the most impactful papers and ideas in Computer Vision, Machine (Deep) Learning, Natural Language Processing, Reinforcement Learning and Generative Adversarial Networks (GANs).

I am a research engineer by profession. As a former member of the Visual Geometry Group (VGG) at the University of Oxford, I have worked with world-leading scientists from top research labs. While I have been privileged to work in a top research lab, I want to share what I have learned and help you be at your best.

During my MSc in Computer Vision, I noticed that some students with fantastic programming skills struggled to understand the mathematical terms and equations in papers. I found my strength in understanding these papers and explaining them in simple terms. So here I am, leveraging those skills to help you on your journey.

Please subscribe & let the learning begin!
GGUF quantization of LLMs with llama cpp
12:10
7 months ago
Simple quantization of LLMs - a hands-on
14:57
7 months ago
fine tuning LLMs - the 6 stages
9:21
8 months ago
Retrieval Augmented Generation (RAG) explained
9:26
8 months ago
Learn ML in 2024 - YouTube roadmap (100% free)
14:03
9 months ago
Comments
@jaggyjut
@jaggyjut 6 days ago
Thank you for this tutorial.
@AIBites
@AIBites 6 days ago
Glad you like it 😊
@bharatbhusansau2996
@bharatbhusansau2996 14 days ago
Bro, your statement from 05:22 is completely wrong and misleading. LoRA is used for fine-tuning LLMs when full fine-tuning is not possible. It does so by freezing all model weights and incorporating and training low-rank matrices (A×B) in the attention modules. LoRA speeds up training and reduces memory requirements but does not provide a speedup during inference. If the LLM is too large to be handled by LoRA due to GPU memory limitations, Quantized LoRA (QLoRA) is used to fine-tune the model. Overall, QLoRA is the more advanced solution when LoRA alone cannot handle large models for fine-tuning.
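For reference, a minimal sketch of the distinction described in this comment, assuming the Hugging Face transformers/peft/bitsandbytes stack; the model name and hyperparameters are placeholders, not values from the video:

```python
# Minimal sketch of a LoRA vs QLoRA fine-tuning setup (illustrative only;
# the model name and hyperparameters are placeholders).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # low-rank A*B injected into attention
    task_type="CAUSAL_LM",
)

# Plain LoRA: base weights stay frozen in 16-bit; only the adapters A and B train.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
lora_model = get_peft_model(base, lora_cfg)

# QLoRA: the frozen base is additionally loaded in 4-bit (NF4) to cut GPU memory.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
qbase = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_cfg
)
qbase = prepare_model_for_kbit_training(qbase)
qlora_model = get_peft_model(qbase, lora_cfg)

qlora_model.print_trainable_parameters()   # only the adapter weights are trainable
```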
@IsmailIfakir
@IsmailIfakir 16 days ago
Is there a multimodal LLM that can be fine-tuned for sentiment analysis from text, image, video and audio?
@Ram-oj4gn
@Ram-oj4gn 17 days ago
Does the quantisation (the change of number format) apply only to the result of the activation function, or also to the individual weights? Where in the NN do we apply this quantisation?
@AaronBlox-h2t
@AaronBlox-h2t 19 days ago
Cool video... Some tests would be appreciated. Also, maybe you can include the Qwen vision models, especially.
@AIBites
@AIBites 6 days ago
Sure, will try to test models that come up in the future 👍
@IsmailIfakir
@IsmailIfakir 23 days ago
You can fine-tune this multimodal LLM for sentiment analysis.
@yoavtamir7707
@yoavtamir7707 26 days ago
Thanks! This is an awesome explanation.
@AIBites
@AIBites 26 days ago
Glad you like it 😊
@yabezD
@yabezD 1 month ago
Where do you draw these kinds of charts? Could you tell me? It'll be helpful.
@Kurshu101
@Kurshu101 1 month ago
Dude change the battery in your smoke detector
@AIBites
@AIBites 1 month ago
Hah.. Already did 🤣
@saadowain3511
@saadowain3511 1 month ago
Thank you. I have a question: do we use DSPy for development or production?
@xspydazx
@xspydazx 1 month ago
Nice, I made this before! A model which picks the correct model! But then I decided that a 1B agent can be the router model, and then that models themselves can be TOOLS! So once you create Anthropic as a tool, it will select Anthropic instead. I think it's all about understanding the power of tools, and even graphs and nodes: if we create some graphs, then their start point is the tool. So the docstring methodology is the best version of the tool-calling method, perhaps with a ReAct-type framework (especially when using tools). By creating detailed docstrings with examples in the docstring, each tool added will be woven into the prompt. So the aim is to create a model (or tune one) to use the ReAct framework as well as to select tools. I think the Hugging Face agents approach is the correct methodology, because we can host models on Hugging Face and hit those Spaces... Spaces as TOOLS! So again we see tools taking a front role, as the main prompt is to select the correct tool for the intent. Also train for slot filling and intent detection (HF dataset). The routing method was a very good learning exercise, but it also needs Pydantic to send back the correct route to select, when it could be done via a tool which is already preprogrammed into the library (stopping reason)...
@dfdiasbr
@dfdiasbr 1 month ago
Thank you for that video. I've been studying this model and it helped me a lot.
@AIBites
@AIBites 1 month ago
Glad it helped 👍
@bitminers1379
@bitminers1379 1 month ago
How did you push your own custom dataset to Hugging Face?
@AIBites
@AIBites 1 month ago
Check out the commands available in the HF command-line tools. It's quite easy actually.
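As a rough illustration (not the exact steps from the video), pushing a custom dataset with the `datasets` library looks roughly like this; the repo name is a placeholder and you need to be logged in via `huggingface-cli login` or an access token first:

```python
# Minimal sketch of pushing a custom dataset to the Hugging Face Hub.
# The repo name is a placeholder.
from datasets import Dataset

data = {"text": ["example one", "example two"], "label": [0, 1]}
ds = Dataset.from_dict(data)
ds.push_to_hub("your-username/my-custom-dataset")   # creates/updates the dataset repo
```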
@orangtimur6812
@orangtimur6812 1 month ago
I always get this message: ImportError: cannot import name 'load_flow_from_json' from 'langflow' (unknown location). I already cloned it from GitHub, using Windows.
@first-thoughtgiver-of-will2456
@first-thoughtgiver-of-will2456 1 month ago
fp16 also has a bigger mantissa than bfloat16, which benefits normalized or bounded activation functions (e.g. sigmoid).
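The trade-off mentioned here can be checked directly, for example with PyTorch's `torch.finfo` (a quick illustrative check, not something from the video): fp16 keeps 10 mantissa bits but a narrow exponent range, while bf16 keeps only 7 mantissa bits with the same exponent range as fp32.

```python
# Quick check of the trade-off: fp16 has finer precision (smaller eps) but a
# narrow range; bf16 has coarser precision but the same range as fp32.
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    info = torch.finfo(dtype)
    print(dtype, "eps:", info.eps, "max:", float(info.max))
# float16  -> eps ~ 9.77e-04, max ~ 6.55e+04
# bfloat16 -> eps ~ 7.81e-03, max ~ 3.39e+38
# float32  -> eps ~ 1.19e-07, max ~ 3.40e+38
```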
@newbie8051
@newbie8051 2 months ago
Well, the graphs at 2:18 are incorrect; sigmoid and tanh have different ranges, so the output gate should have range -1 to 1 (tanh).
@AIBites
@AIBites 1 month ago
That's a great spot. A copy-pasting oversight, I guess 🙂 Will pay more attention while making the videos on attention. Thank you 😀
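For anyone following the thread: in a standard LSTM the output gate itself is a sigmoid (range 0 to 1), and the emitted hidden state multiplies it by tanh of the cell state, which is what gives the -1 to 1 range the comment refers to. A tiny NumPy sketch (illustrative only):

```python
# Illustrative sketch of the LSTM output-gate step: the gate o_t is a sigmoid
# (range 0..1), but the hidden state is o_t * tanh(c_t), which lies in -1..1.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

gate_preact = np.linspace(-5.0, 5.0, 5)   # pre-activation of the output gate
c_t = np.linspace(-3.0, 3.0, 5)           # cell state

o_t = sigmoid(gate_preact)                # in (0, 1)
h_t = o_t * np.tanh(c_t)                  # in (-1, 1)
print(o_t.min(), o_t.max(), h_t.min(), h_t.max())
```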
@yayasy1362
@yayasy1362 2 months ago
I don't understand why you say that LoRA is fast for inference… in any case you need to forward through the full-rank pretrained weights + the low-rank finetuned weights.
@AIBites
@AIBites 1 month ago
Ah yes. If only we could quantize the weights, we could do better than the pre-trained weights. You are making a fair point here. Awesome, and thank you! :)
@yayasy1362
@yayasy1362 1 month ago
@AIBites Yeah, if only we could replace the pretrained full-rank weights with the low-rank weights... Really nice video and illustrations! Thanks a lot!
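One common way to avoid the extra adapter pass at inference is to merge the low-rank update back into the base weights, i.e. W' = W + (alpha/r)·BA, so the forward pass goes through a single matrix again. A minimal sketch assuming the peft library; the model and adapter paths are placeholders:

```python
# Sketch: fold the LoRA update back into the base weights, W' = W + (alpha/r) * B @ A,
# so inference runs through a single matrix (peft assumed; paths are placeholders).
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-name")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()          # no separate adapter pass at inference time
merged.save_pretrained("merged-model")
```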
@IgorAherne
@IgorAherne 2 months ago
Thank you, that's a beautiful explanation! One thing I struggle to understand is the term "quantization blocks" at 4:30 - why do we need several of them? In my understanding from the video, we ponder using 3 blocks of 16 bits to describe a number, which is 48 bits and is more expensive than a 32-bit float. But couldn't we just use 16×3 = 48 bits per number instead? Using 48 bits (without splitting it) would give us very high precision within the [0,1] range, due to powers of two. I did ask GPT, and it responded that there exists a 'Scale Factor' and a 'Zero-Point', which are constants that shift and stretch the distribution at 6:02. Although I do understand these might be those quantization constants, I am not entirely sure what the 64 blocks described in the video at 6:52 are. Is this because the rank of the matrix decompositions is 1, with 64 entries in both vectors?
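For intuition on the block/scale/zero-point question, here is a rough NumPy sketch of block-wise 8-bit quantization with one scale and one zero-point per block of 64 weights. This is a simplified affine scheme for illustration only, not the exact NF4 procedure used in QLoRA:

```python
# Rough sketch of block-wise 8-bit affine quantization: one scale and one
# zero-point per block of 64 weights (for intuition only, not the NF4 scheme).
import numpy as np

def quantize_blockwise(w, block_size=64):
    blocks = w.reshape(-1, block_size)                   # split weights into blocks
    w_min = blocks.min(axis=1, keepdims=True)            # per-block zero-point
    w_max = blocks.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0                      # per-block scale
    q = np.round((blocks - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_blockwise(q, scale, zero_point, shape):
    return (q.astype(np.float32) * scale + zero_point).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)            # toy weight matrix
q, s, z = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, z, w.shape)
print("max abs error:", np.abs(w - w_hat).max())         # small reconstruction error
```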
@SudarakaYasindu
@SudarakaYasindu 2 months ago
Awesome explanation! ❤
@AIBites
@AIBites 1 month ago
glad you think so and thank you indeed :)
@wilfredomartel7781
@wilfredomartel7781 2 months ago
🎉❤
@AIBites
@AIBites 1 month ago
🙂🙂🙂 Thank you! :)
@ravindarmadishetty736
@ravindarmadishetty736 2 months ago
Hi, I have 100k PDF documents and I store all the embeddings into a vector store without following any chunking. Now, if I want to retrieve using a prompt, how can we augment relevant information from such a huge un-chunked vector store? Please suggest the best way to handle this problem, and please share some references as well along with your inputs.
@AIBites
@AIBites 1 month ago
Is there any particular reason you skipped the chunking process? As the pre-processing and chunking operation is kind of a one-time operation, I can think of re-doing the entire vector store with chunking. It may then be much easier to retrieve several times, for multiple queries, as and when needed. What are your thoughts?
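For anyone redoing the vector store, a minimal sketch of fixed-size chunking with overlap (plain Python; the chunk size and overlap are illustrative choices, not recommendations from the video):

```python
# Minimal fixed-size chunking with overlap before embedding (plain Python;
# chunk_size and overlap are illustrative values).
def chunk_text(text, chunk_size=1000, overlap=200):
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap              # overlap keeps context across boundaries
    return chunks

document = "some extracted PDF text " * 200   # stand-in for one document's text
for chunk in chunk_text(document):
    pass                                   # embed each chunk and add it to the store
```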
@ccidral
@ccidral 2 months ago
I wish I was good at math to understand this stuff.
@AIBites
@AIBites 1 month ago
We all can get good at it by putting in the effort. It's just another language spoken by scientists :)
@BB8_CA
@BB8_CA 2 months ago
Great video! How good is it compared to I-JEPA?
@AIBites
@AIBites 1 month ago
I-JEPA seems like pretty recent work, so I haven't had a chance to compare. Have you had a chance to do so? Please share your thoughts.
@vishalchovatiya1361
@vishalchovatiya1361 2 months ago
very well explained.
@AIBites
@AIBites 1 month ago
thank you Vishal! :)
@davidcmoffatt
@davidcmoffatt 2 months ago
Just started watching but... a signed byte is -128..127, not -127..127. Google two's complement to see why.
@AIBites
@AIBites 1 month ago
Sorry, that's an embarrassing erratum! Good spot. Thanks a lot! Will keep it in mind for next time.
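The corrected range is easy to confirm, for example with NumPy (a quick illustrative check):

```python
# Quick confirmation of the signed 8-bit (two's complement) range.
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)   # -128 127
```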
@sqlsql4304
@sqlsql4304 3 months ago
Hi, what is the reason you first convert it to an FP16 GGUF and not directly to 8-bit?
@AIBites
@AIBites 1 month ago
The conversion doesn't go through unless we first convert to GGUF. At least that was the case for me when I did the work. Maybe some recent commits to the library have eased the process and skipped the step?
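The two-step flow discussed here looks roughly like the sketch below, driving llama.cpp from Python. The converter script and quantize binary have been renamed across llama.cpp versions (e.g. convert.py vs convert_hf_to_gguf.py, quantize vs llama-quantize), so treat the exact names and flags as assumptions and check your checkout; all paths are placeholders.

```python
# Sketch of the two-step GGUF flow (script/binary names vary by llama.cpp version;
# paths are placeholders).
import subprocess

# Step 1: convert the Hugging Face checkpoint to an FP16 GGUF file.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", "path/to/hf-model",
     "--outfile", "model-f16.gguf", "--outtype", "f16"],
    check=True,
)

# Step 2: quantize the FP16 GGUF down to 8-bit.
subprocess.run(
    ["./llama-quantize", "model-f16.gguf", "model-q8_0.gguf", "Q8_0"],
    check=True,
)
```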
@LOKESHSRINIVAS-x1z
@LOKESHSRINIVAS-x1z 3 months ago
I'm getting the error: RuntimeError: No GPU found. A GPU is needed for quantization.
@AIBites
@AIBites 1 month ago
Yes, could you confirm whether the machine you are running on has a GPU? Are you running locally or on the cloud? Please clarify.
@M_Nagy_
@M_Nagy_ 3 months ago
I have a question. After embedding we still have the same number of features x1→x4, and say each has dimension 1×10, i.e. 10 features each. W* is 4×4, right? My question is: X is 4×10 and X^T is 10×4. How do we do the dot product W · X^T when the dimensions are (4×4) · (10×4)? Or am I missing something?
@AIBites
@AIBites 1 month ago
We have to choose the dimensions accordingly, so the weights will be 10×10. The choice of these parameters is paramount in designing deep architectures that work end-to-end.
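A toy NumPy check of the shapes discussed in this thread, with 4 tokens of 10 features each and 10×10 projection weights (illustrative only):

```python
# Toy check of the shapes from this thread: 4 tokens, 10 features each,
# 10x10 projection weights.
import numpy as np

X = np.random.randn(4, 10)       # token embeddings, shape (4, 10)
W_q = np.random.randn(10, 10)    # query projection
W_k = np.random.randn(10, 10)    # key projection

Q = X @ W_q                      # (4, 10)
K = X @ W_k                      # (4, 10)
scores = Q @ K.T / np.sqrt(10)   # (4, 4) -- the 4x4 attention matrix
print(scores.shape)
```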
@Ishant875
@Ishant875 3 months ago
Thank you. Can you share the notebook please?
@AIBites
@AIBites 1 month ago
Ah sorry, it's here: github.com/ai-bites/generative-ai-course/blob/main/dspy_demo.ipynb. I will update the video description with the link too. Thanks.
@Erik_Fenety
@Erik_Fenety 3 months ago
The training of artificial neural networks is computationally expensive for several key reasons:
- Large number of parameters: Deep neural networks often have millions or even billions of parameters (weights and biases) that need to be optimized during training. Updating such a large number of parameters requires significant computational resources.
- Iterative process: Training typically involves many iterations over the entire dataset (epochs) to gradually adjust the parameters. This iterative nature makes it time-consuming, especially for large datasets.
- Backpropagation: The backpropagation algorithm, used to calculate gradients and update weights, requires forward and backward passes through the network for each training example. This process is computationally intensive, especially for deep networks.
- Matrix operations: Neural network computations involve many matrix multiplications and other mathematical operations, which are computationally expensive, especially as network size increases.
- Large datasets: Training on large datasets, which is often necessary for good performance, requires processing massive amounts of data repeatedly.
- Hyperparameter tuning: Finding optimal hyperparameters (e.g., learning rate, network architecture) often involves training multiple models with different configurations, multiplying the computational cost.
- Complex architectures: Advanced architectures like convolutional neural networks or recurrent neural networks involve specialized operations that add to the computational complexity.
- Gradient descent optimization: The optimization process itself, typically using variants of gradient descent, requires many small updates to the parameters, each requiring computation of gradients.
These factors combined make the training of artificial neural networks a computationally intensive task, often requiring specialized hardware like GPUs to accelerate the process.
@Erik_Fenety
@Erik_Fenety 3 months ago
Hyperparameter tuning: Finding optimal hyperparameters (e.g., learning rate, network architecture) often involves training multiple models with different configurations, multiplying the computational cost. Complex architectures: Advanced architectures like convolutional neural networks or recurrent neural networks involve specialized operations that add to the computational complexity.
@AIBites
@AIBites 1 month ago
yup, complex architectures are in a way hyperparameter tuning, don't you think?
@Erik_Fenety
@Erik_Fenety 3 months ago
What makes the training of artificial neural networks computationally expensive? The training of artificial neural networks is computationally expensive for several key reasons:
1. Large number of parameters: Deep neural networks often have millions or even billions of parameters (weights and biases) that need to be optimized during training. Updating such a large number of parameters requires significant computational resources.
2. Iterative process: Training typically involves many iterations over the entire dataset (epochs) to gradually adjust the parameters. This iterative nature makes it time-consuming, especially for large datasets.
3. Backpropagation: The backpropagation algorithm, used to calculate gradients and update weights, requires forward and backward passes through the network for each training example. This process is computationally intensive, especially for deep networks.
4. Matrix operations: Neural network computations involve many matrix multiplications and other mathematical operations, which are computationally expensive, especially as network size increases.
5. Large datasets: Training on large datasets, which is often necessary for good performance, requires processing massive amounts of data repeatedly.
6. Hyperparameter tuning: Finding optimal hyperparameters (e.g., learning rate, network architecture) often involves training multiple models with different configurations, multiplying the computational cost.
7. Complex architectures: Advanced architectures like convolutional neural networks or recurrent neural networks involve specialized operations that add to the computational complexity.
8. Gradient descent optimization: The optimization process itself, typically using variants of gradient descent, requires many small updates to the parameters, each requiring computation of gradients.
These factors combined make the training of artificial neural networks a computationally intensive task, often requiring specialized hardware like GPUs to accelerate the process.
Citations:
[1] Training Neural Networks | Machine Learning - Google for Developers: developers.google.com/machine-learning/crash-course/training-neural-networks/video-lecture
[2] Various Optimization Algorithms For Training Neural Network: towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6?gi=ea8e0c3dd721
[3] 5 Algorithms to Train a Neural Network - DataScienceCentral.com: www.datasciencecentral.com/5-algorithms-to-train-a-neural-network/
[4] The differences between Artificial and Biological Neural Networks: towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7
[5] [PDF] Introduction to Neural Network Algorithm: einsteinmed.edu/uploadedFiles/labs/Yaohao-Wu/Lecture%209.pdf
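As a small worked example of the "large number of parameters" point above, here is a quick count of the weights and biases of a toy fully connected network; the layer widths are illustrative, not tied to any model from the videos:

```python
# Worked example: counting the weights and biases of a toy fully connected network
# (layer widths are illustrative).
layers = [784, 512, 256, 10]

params = sum(n_in * n_out + n_out            # weights + biases per layer
             for n_in, n_out in zip(layers[:-1], layers[1:]))
print(params)                                 # 535818 parameters for this small MLP

# Every training step touches all of them in both the forward and backward pass,
# for every example and every epoch, which is why the cost grows so quickly.
```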
@rahul.vpoojari6553
@rahul.vpoojari6553 3 months ago
Thank you sire
@AIBites
@AIBites 1 month ago
my pleasure Rahul! :-)
@yudao7151
@yudao7151 3 months ago
Very detailed and informative video!
@AIBites
@AIBites 1 month ago
thanks. Glad it was useful.
@pplw1183
@pplw1183 3 months ago
Is there any type of mathematical proof, or at least reasoning, that the weights of pretrained neural networks are normally distributed? This in essence is the foundational data point they are using. And yes, very well done - I just found you via good old Google Search looking for QLoRA. Thanks for investing your time to bring these concepts closer to the community and people.
@vlogsofanundergrad2034
@vlogsofanundergrad2034 3 months ago
Great job mate, keep going. Could you please give an update on current research on proof-of-concept CLIP/text-based prompting for SAM? One kind suggestion: don't make the text animation bounce. It's very distracting when trying to read the text. Maybe you can try another kind of animation or just keep it simple. Even other kinds of objects in the flowchart bounce when they appear. Please avoid this bounce animation.
@AIBites
@AIBites 1 month ago
Thanks for the great feedback. I never thought the bouncing would turn out annoying. I was thinking it was cool to animate in different ways, so I will keep it simple going forward. I have done a video on SAM 2, which is the updated version of SAM for videos. Would you still like text-based prompting for SAM? If so, can you give more details as to what you wish to learn?
@benkim2498
@benkim2498 3 months ago
Super in depth and specific, thank you!!!
@AIBites
@AIBites 1 month ago
my pleasure! :)
@allanng78
@allanng78 3 months ago
Nice video. Where can I find the model downloaded in Colab? Can you help me?
@AIBites
@AIBites 1 month ago
I am not sure if I shared the model. The training code is, however, available on the GitHub page. Hope you found that.
@sanchitbokade201
@sanchitbokade201 3 months ago
But won't it create the problem of overfitting?
@AIBites
@AIBites 1 month ago
Yes, that's why they keep the KAN network much smaller than traditional MLP networks. Also, I think it's just the beginning of a new type of network. Let's wait and watch the developments that address overfitting and other shortcomings of KANs.
@dileepvijayakumar2998
@dileepvijayakumar2998 4 months ago
This is better explained than what the inventor of LoRA himself explained in his video.
@AIBites
@AIBites 1 month ago
Oh! Thank you so much. Such words really keep me going :-)
@B_knows_A_R_D-xh5lo
@B_knows_A_R_D-xh5lo 4 months ago
😊😊😊😊
@vaishaligoel3000
@vaishaligoel3000 4 months ago
Very informative videos. I couldn't find videos 4 and 5. Please share the links with me.
@AIBites
@AIBites 1 month ago
Sorry, I got side-tracked with other recent developments and never made 4 and 5! My bad :)
@HcDaN
@HcDaN 4 months ago
This video gives a broad explanation of the important points of DSPy. Once you have played with it a bit and have some experience but are still full of questions, you can come back to this video and it will resolve many of those doubts.
@phanindraparashar8930
@phanindraparashar8930 4 months ago
The audio and video are out of sync.
@AIBites
@AIBites 1 month ago
Sorry to hear you feel this way. I have had some happy comments and feedback on LinkedIn too :) but will keep it in mind for next time. YouTube doesn't allow us to edit already-uploaded videos, unfortunately!
@jeffreyliang2345
@jeffreyliang2345 4 months ago
Amazing video! I'm wondering, for the model architecture images you provided, are the spatial and temporal layers only describing the architecture of the U-Net block? Would the rest of the Video LDM model be the same?
@AIBites
@AIBites 1 month ago
Yup, I think it's just the U-Net blocks.
@yuanyuan4985
@yuanyuan4985 4 months ago
Thank you so much for providing this video!!!!!
@AIBites
@AIBites 1 month ago
my pleasure Yuan! 🙂