Data Science Basics
Hi, I am a data enthusiast like you! On this channel, I teach data science as well as recent AI trends (LLMs) in the simplest manner possible.

Currently, video is one of the most important go-to content types online. I aim to make Data Science Basics a go-to RU-vid channel for practical videos on data science.

If you find the content helpful, consider subscribing.

For business inquiries email at: basicsdatascience@gmail.com
💼 Consulting: topmate.io/sudarshan_koirala
Agents For Amazon Bedrock | NO CODE
21:53
12 hours ago
Tools Available Now In HuggingChat 🔥
9:58
2 months ago
Best Tool For Getting Your Data Ready For RAG
16:43
3 months ago
How TO CREATE Robust RAG
21:20
4 months ago
Groq with VSCode as a Copilot 🚀
8:09
4 months ago
Nomic's New Embedding Model | nomic-embed-text
10:51
5 months ago
Running Gemma on Your Machine With Ollama
8:21
5 months ago
Comments
@awakenwithoutcoffee 2 hours ago
It seems beside the point to use sparse/hybrid search on the metadata if we already know exactly what to filter out:
- Isn't there a way to attach metadata on top of the embedding instead of embedding the metadata itself? Until we do, I doubt its efficiency.
- I would love to see more metadata creation techniques (maybe using cheaper or more specialized models). This subject is not talked about nearly enough because it is difficult and relatively expensive.
@zuowang5185 1 day ago
Is the open-source library as slow as they claim compared to their SaaS?
@zuowang5185 2 days ago
How difficult is it to bypass the paywall and build your own instance that serves the PDF extraction instead of using their API?
@rajatsharma6791 2 days ago
@author Suppose this notebook creates some dataframes when it runs. If I then run this notebook from my current notebook using the query above, will the dataframes be available in the current notebook?
@dq-music 2 days ago
Extracting Japanese tables produces garbled characters. unstructured can get the characters, so why does OCR have to re-read them incorrectly?
@hackedbyBLAGH 4 days ago
Thank you
@datasciencebasics 4 days ago
You are welcome!!
@vijaychandra5637 4 days ago
Amazing video and content.
@datasciencebasics 4 days ago
Thanks. Glad that it was helpful!!
@adanpalma4026 5 days ago
Thanks. One question: do I have to start over every time I want to ask something? Do you do everything in memory?
@datasciencebasics 5 days ago
You are welcome. You don't need to rerun the notebook; just ask your question. But yes, once you leave the Colab notebook and the runtime is no longer active, you need to run the notebook again.
@ahmedsomir 6 days ago
Thanks so much for this tutorial. I am going through this 30-day tutorial, and it pointed me in the right direction with Databricks.
@datasciencebasics 6 days ago
You are welcome. It's really good to hear that the videos helped pave a good path for you in Databricks!!
@wcwong22000 7 days ago
Thank you, Sudarshan. Could you please consider making a video guide on running GraphRAG with a local LLM setup such as Ollama to ingest any type of document, not just a single document?
@phungpham6487 8 days ago
Thank you so much! This is so helpful. Exactly what I need.
@datasciencebasics 8 days ago
You are welcome. Glad that it was helpful!!
@hiroshinishida2712 9 days ago
Japanese support is not perfect yet... Hope it will soon. But it's fast anyway.
@datasciencebasics 9 days ago
Let's hope it will be improved in the future!!
@Vir-se2kb 9 days ago
You should explain the code in more detail, for example why you wrote a given line of code. That would help us understand and help you get more likes and subscriptions.
@datasciencebasics 9 days ago
Thanks for the feedback. Will take that into account in future videos.
@VenkatesanVenkat-fd4hg 10 days ago
Happy you are back after a break...
@MahavirChandaliya-y6j 10 days ago
Very informative and inspiring! Thank you for making this video!
@datasciencebasics 10 days ago
You are welcome. Glad that it was helpful!!
@TooyAshy-100 10 days ago
Hello Sudarshan, Welcome back, and thank you for your efforts. Could you consider making a video covering the following topic: Groq using Llama 3.1: Agents & LangChain with ReAct (Reasoning and Acting) for Question Answering OR Summarization.
@Sundar_Tenkasi 12 days ago
Watching from the day 1 video... it's getting interesting.
@aakaashskale9228 13 days ago
Will we be billed for the EC2 instance?
@datasciencebasics 13 days ago
If you use the Community Edition, no. If you use the free trial, yes: we have to cover the AWS bill ourselves. Only the Databricks part is free.
@tomgeorge3900 13 days ago
Can you please share a zip of the whole old Quivr GitHub code? The code has since been updated, and I cannot try Ollama.
@datasciencebasics 13 days ago
Unfortunately, I don't have that, as I just create the videos based on the latest changes in the repo.
@kanishkayohan 14 days ago
Hi,
1. How can I use the embeddings = OllamaEmbeddings(model="nomic-embed-text") model directly from Hugging Face rather than from a locally installed instance?
2. If I am using a locally installed instance, how can I publish it to a Hugging Face Space?
@pratikpratik8495 14 days ago
RAG: I am working on RAG over Form 10-K HTML documents, where an AI agent acts as a financial analyst: it reads a company's Form 10-K and creates graphs and charts, does risk analysis, calculates the P/E ratio, and so on, instead of just extracting written text from the document. The goal is to replace a financial analyst with LLM + RAG, so the RAG should be robust and able to do everything an expert can do. I tried LangChain and LlamaIndex, but no luck.
@torridtunez976 14 days ago
Thank you! What tool are you using to show all of this? Also, what hardware are you running everything on?
@datasciencebasics 11 days ago
You are welcome. I am running on a MacBook M3 Pro with 36 GB RAM.
@mihiranand8218 15 days ago
from unstructured.partition.pdf import partition_pdf

# Extract tables and chunk text without images
raw_pdf_elements = partition_pdf(
    filename=path + "wildfire_stats.pdf",
    extract_images_in_pdf=False,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path=path,
)

With the code above I am getting the error below. Can you or anyone please help resolve it?

UnidentifiedImageError                    Traceback (most recent call last)
<ipython-input-27-c496d445a669> in <cell line: 4>()
      2
      3 # Extract tables and chunk text without images
----> 4 raw_pdf_elements = partition_pdf(
      5     filename=path + "wildfire_stats.pdf",
      6     extract_images_in_pdf=False,  # Disable image extraction

10 frames
/usr/local/lib/python3.10/dist-packages/PIL/Image.py in open(fp, mode, formats)
   3281         raise TypeError(msg) from e
   3282     else:
-> 3283         rawmode = mode
   3284     if mode in ["1", "L", "I", "P", "F"]:
   3285         ndmax = 2

UnidentifiedImageError: cannot identify image file '/tmp/tmphjsyuf14/d1ce8326-30e6-4582-be90-70f8abc570c5-1.ppm'
@teja3925 16 days ago
Hello, how do I generate data when there are two tables with a PK/FK relationship? Is the model capable of generating such related data?
@albuquerqueroger 16 days ago
Congratulations, excellent content! Thank you for sharing.
@jeyptmanyt6599 17 days ago
jai nepal Edit: not political btw
@awakenwithoutcoffee 19 days ago
Hi sir, have you tried Azure AI Document Intelligence? We are figuring out which data parser is the most suitable for production RAG apps. Cheers
@anumoy37 20 days ago
Thanks for sharing these videos. They are really helpful. I have one question, though: how can I install Poppler on Windows? I am facing some challenges there and get the following error: "Unable to get page count. Is poppler installed and in PATH?"
@preetjain5934 6 days ago
Did you find any solution to this problem? I am facing the same issue.
@anumoy37 6 days ago
@preetjain5934 Not so far :(. I have asked a friend who uses a MacBook to try it.
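The error quoted in this thread typically means the Poppler command-line tools cannot be found on PATH. A minimal check is sketched below, assuming the tools in question are `pdfinfo` and `pdftoppm` (the binaries the pdf2image package calls). The function names are hypothetical helpers for illustration, not part of any library.

```python
import shutil
from typing import Dict, List, Optional, Sequence

# Assumption: these are the Poppler binaries that PDF-to-image conversion relies on.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def locate_tools(tools: Sequence[str] = POPPLER_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools: Sequence[str] = POPPLER_TOOLS) -> List[str]:
    """Return the names of the tools that could not be found on PATH."""
    return [tool for tool, path in locate_tools(tools).items() if path is None]
```

If `missing_tools()` returns a non-empty list on Windows, the usual remedy is to install a Poppler build, add its `bin` folder to the PATH environment variable, and restart the terminal or notebook kernel so the change is picked up.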
@NehaH-u6y 21 days ago
The file keeps processing forever. How can I solve this? Please help.
@datascienceandaiconcepts5435 21 days ago
Nice video. Commenting for better reach.
@hikpras89 22 days ago
Is it possible to "train" the private GPT by connecting it to my own data source or the database of an existing app, or to documents on my own NAS? How?
@kalyanadepu5666 22 days ago
Great explanation. I have been searching for this kind of Databricks tutorial. I am very excited to complete this playlist. 🎉🎉
@datasciencebasics 22 days ago
Glad that the tutorials are helpful!!
@sunshine8477 23 days ago
Is the LangSmith part important? It is not mentioned in the Quivr GitHub documentation. However, I'm having installation issues, so I was wondering whether LangSmith would ease the process.
@mpesakapoeta 24 days ago
How can I host a RAG model on Hugging Face where teachers can upload PDF content with text and images, as well as audio and video content, and students can then interact with the model through text, speech-to-text (Groq Whisper v3), and text-to-speech (ElevenLabs)?
@jayanthik4507 28 days ago
Thanks for the detailed information. Can I convert this dashboard to an HTML file to embed in a web app?
@KS-mi3gv 28 days ago
The GitHub repo is missing all the code, which makes for a really bad learning experience. I would recommend looking for something else.
@MrCastoloman 1 month ago
Sorry, but your solution is not working. I get a len of 0 on node_mappings.
@SubinKrishnaKT 1 month ago
When I open the web UI along with my local LLM, Llama 2, I can see GPT-4 and GPT-4o in the "choose LLM" dropdown. Do you know what is causing this?
@inyeobkim 1 month ago
Best video!!
@zamanganji1262 1 month ago
I want to fine-tune Llama 3, but I need to create the special_tokens_map.json as follows:
{
  "bos_token": { "content": "<|begin_of_text|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false },
  "eos_token": { "content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false },
  "pad_token": { "content": "<|end_of_text|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false }
}
How can I do this? Moreover, I want to run the model with Ollama so I can chat with it.
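One way to produce the file described in this comment is to build the mapping as a Python dictionary and serialize it with the standard `json` module. The sketch below copies the token strings and flags from the comment as-is; the helper names are hypothetical, and the code does not check that each token actually exists in the tokenizer's vocabulary.

```python
import json
from pathlib import Path


def token_entry(content: str) -> dict:
    """Build one special-token entry with the flags used in the comment above."""
    return {
        "content": content,
        "lstrip": False,
        "normalized": False,
        "rstrip": False,
        "single_word": False,
    }


def build_special_tokens_map() -> dict:
    """Return the mapping exactly as listed in the comment (values are not validated)."""
    return {
        "bos_token": token_entry("<|begin_of_text|>"),
        "eos_token": token_entry("<|im_end|>"),
        "pad_token": token_entry("<|end_of_text|>"),
    }


def write_special_tokens_map(directory: str) -> Path:
    """Write special_tokens_map.json into the given directory and return its path."""
    path = Path(directory) / "special_tokens_map.json"
    path.write_text(json.dumps(build_special_tokens_map(), indent=2), encoding="utf-8")
    return path
```

Calling `write_special_tokens_map("path/to/model_dir")` would place the file next to the other tokenizer files; the directory name here is only a placeholder.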
@FawziBreidi 1 month ago
Which local embedding model is the best on the market, regardless of size?
@sivaprasadatla 1 month ago
It looks like we cannot run the code below with Azure OpenAI the way we can with OpenAI:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
#from langchain.chat_models import AzureOpenAI
from langchain.pydantic_v1 import BaseModel
from langchain_experimental.tabular_synthetic_data.base import SyntheticDataGenerator
from langchain_experimental.tabular_synthetic_data.openai import create_openai_data_generator, OPENAI_TEMPLATE
from langchain_experimental.tabular_synthetic_data.prompts import SYNTHETIC_FEW_SHOT_SUFFIX, SYNTHETIC_FEW_SHOT_PREFIX
@tom19_06 1 month ago
What I'm really wondering: aren't you using all the metadata? You are saving only the text, not the text element, in the memory store.
@aalamansari8643 1 month ago
Sir, using partition_pdf I am not able to get the bullet points from a PDF (like "- Any bullet point"). How can I get the bullet points? I need help, sir!
@ahmedsomir 1 month ago
Great tutorials, I like them so much. The notebooks are missing, though!!!
@datasciencebasics 1 month ago
Glad that you find them helpful. I provide notebooks where necessary, not for every video.
@NO-me9bf 1 month ago
Hi, I have Docker, Ollama, and the web UI installed. Now I'm trying to update Open WebUI (it shows I'm a few versions behind, and the "Let's go" button doesn't work), and I'm trying to add the web search feature. Some also say that duckadoo has a free API, but I am not a tech person and find the docs confusing; I only managed by a miracle to get my setup working thanks to this step-by-step video. Can you please make a video on how to update the web UI and how to get the web search feature installed and working? Thanks.
@ketanpurohit9086 1 month ago
Is this possible in the Community Edition? According to my googling it should be, but I can't seem to make it work. BTW, good job with these videos. Greatly appreciated.
@datasciencebasics 1 month ago
IMO it's not possible in the Community Edition.
@sivaprasadatla 1 month ago
Please share an approach for synthetic data generation using Azure OpenAI, as I have an Azure OpenAI key.
@datasciencebasics 1 month ago
Hello, you can quickly use Azure OpenAI by importing Azure OpenAI from LangChain. For reference, here is the link -> python.langchain.com/v0.2/docs/integrations/llms/azure_openai/
@sivaprasadatla 1 month ago
@datasciencebasics Thanks a lot! I will check.
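As a rough illustration of the suggestion in this thread, the sketch below builds an Azure-hosted chat model through LangChain. It assumes the separate `langchain-openai` package is installed and uses its `AzureChatOpenAI` class rather than the `AzureOpenAI` completion class behind the linked page. The helper names, the deployment name, and the API version are placeholders, and whether `create_openai_data_generator` accepts this model unchanged is not confirmed here.

```python
import os
from typing import List, Mapping

# Environment variables that langchain-openai reads for Azure credentials.
REQUIRED_SETTINGS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_azure_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def build_azure_chat_model(deployment: str, api_version: str):
    """Create a chat model for the given Azure deployment (hypothetical helper)."""
    missing = missing_azure_settings(os.environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Imported lazily so the check above works even without the package installed.
    from langchain_openai import AzureChatOpenAI

    return AzureChatOpenAI(azure_deployment=deployment, api_version=api_version)
```

The resulting object could then be tried wherever the synthetic data example expects an OpenAI chat model, for instance `llm = build_azure_chat_model("my-deployment", "<api-version>")`, with both arguments replaced by the values from your own Azure resource.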
@harshsavasil3062 1 month ago
Can you explain how we can persist these indexes into a vector database like Milvus?
@ketanpurohit9086 1 month ago
A nice tour of the Databricks UI. Good job.
@datasciencebasics 1 month ago
Thanks. Glad that it was helpful!!