Data Science Basics

Hi, I am a data enthusiast like you! On this channel, I teach data science as well as recent AI trends (LLMs) in the simplest way possible.

Currently, video is one of the most important and go-to content types online. I aim to make Data Science Basics a go-to RU-vid channel for practical videos on data science.

If you find the content helpful, consider subscribing.

For business inquiries, email: basicsdatascience@gmail.com
💼 Consulting: topmate.io/sudarshan_koirala

Agents For Amazon Bedrock | NO CODE

21:53

12 hours ago

Llama 3.1 | The Best LLM is now Open Source | TRY Locally & Online

13:13

1 day ago

Tools Available Now In HuggingChat 🔥

9:58

2 months ago

Chat With Documents | Fully Managed RAG on Amazon Bedrock | NO-CODE

23:22

2 months ago

Getting Started With Amazon Bedrock | Simple ChatUI with Chainlit and LangChain

16:52

2 months ago

Build Your Own RAG Using Unstructured, Llama3 via Groq, Qdrant & LangChain

25:55

2 months ago

Metadata Extraction & Chunking Using Unstructured | ChromaDB

17:31

2 months ago

Extract Image & Image Info From PDF & Use LlaVa via Ollama To Explain Image | LangChain

21:09

2 months ago

Extract Table Info From PDF & Summarise It Using Llama3 via Ollama | LangChain

18:28

3 months ago

Best Tool For Getting Your Data Ready For RAG

16:43

3 months ago

Llama3 via Groq API | Super Fast Inference | LangChain | Chainlit

9:48

3 months ago

LLAMA3 🦙 is OUT | Quickly TRY Via Ollama 🔥

11:11

3 months ago

But, How is Chunking Done ? Splitting Basics Using LangChain

29:12

3 months ago

HuggingChat Assistants | Alternative to OpenAI's GPTs | FREE

14:37

4 months ago

Advance RAG: LlamaParse + Reranker = Better RAG

17:23

4 months ago

Website To Quickly Try New TOP Open Source LLM

2:01

4 months ago

Langchain + Qdrant Local | Server (Docker) | Cloud | Groq | Tutorial

22:55

4 months ago

How TO CREATE Robust RAG

21:20

4 months ago

RAG With LlamaParse from LlamaIndex & LangChain 🚀

17:04

4 months ago

RAG with LlamaParse, Qdrant and Groq | Step By Step

27:15

4 months ago

Groq with VSCode as a Copilot 🚀

8:09

4 months ago

Super Easy Way To Parse PDF | LlamaParse From LlamaIndex | LlamaCloud

20:32

4 months ago

Crazy FAST RAG | Ollama | Nomic Embedding Model | Groq API

18:23

5 months ago

Nomic's New Embedding Model | nomic-embed-text

10:51

5 months ago

How To Create Medical Chatbot Using BioMistral, Ollama, LangChain & Chainlit

10:51

5 months ago

Groq: Insanely Fast Inference 🚀 | World's First Language Processing Unit (LPU)

14:54

5 months ago

Ollama: How To Create Custom Models From HuggingFace ( GGUF )

10:54

5 months ago

Create Chat UI Using ChainLit, LangChain, Ollama & Gemma 🧠

10:43

5 months ago

Running Gemma on Your Machine With Ollama

8:21

5 months ago

Comments

@awakenwithoutcoffee 2 hours ago

It seems a bit pointless to use sparse/hybrid search on the metadata if we already know exactly what to filter out:
- Isn't there a way to attach metadata on top of the embedding instead of embedding the metadata itself? Until we do, I doubt its efficiency.
- I would love to see more metadata-creation techniques (maybe using cheaper or task-specific models). This subject is not discussed nearly enough because it is difficult and relatively expensive.
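A minimal sketch of the idea raised in this comment: keep metadata as a payload stored next to the vector (not embedded), apply an exact metadata filter first, then rank only the survivors by vector similarity. The `Record` class, `search` helper and sample data are hypothetical; vector stores such as Qdrant (payload filters) and ChromaDB (`where` filters) offer a comparable filter-then-rank capability through their own APIs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored chunk: its embedding plus metadata kept alongside it (not embedded)."""
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vec, filters, k=2):
    # 1) Exact metadata filter: keep records whose metadata matches every key/value.
    candidates = [r for r in records
                  if all(r.metadata.get(key) == value for key, value in filters.items())]
    # 2) Rank only the surviving records by vector similarity.
    candidates.sort(key=lambda r: cosine(r.vector, query_vec), reverse=True)
    return candidates[:k]


records = [
    Record("2023 wildfire acres", [0.9, 0.1, 0.0], {"year": 2023, "type": "table"}),
    Record("2022 wildfire acres", [0.9, 0.1, 0.0], {"year": 2022, "type": "table"}),
    Record("fire safety tips", [0.1, 0.9, 0.0], {"year": 2023, "type": "text"}),
]

hits = search(records, [1.0, 0.0, 0.0], {"year": 2023})
print([h.text for h in hits])  # ['2023 wildfire acres', 'fire safety tips']
```

Because the filter is an exact match on stored fields, the 2022 record is excluded even though its vector is identical to the best hit.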

@zuowang5185 1 day ago

Is the open-source library as slow as they claim compared to their SaaS?

@zuowang5185 2 days ago

How difficult is it to bypass the paywall and build your own instance that serves PDF extraction instead of using their API?

@rajatsharma6791 2 days ago

@author Suppose I run this notebook and it creates some DataFrames. If I then run this notebook from my current notebook using the query above, will those DataFrames be available in the current notebook?

@dq-music 2 days ago

Extracting Japanese tables produces garbled characters. Unstructured can already get the characters, so why does OCR have to re-read them and get them wrong?

@hackedbyBLAGH 4 days ago

Thank you

@datasciencebasics 4 days ago

You are welcome!!

@vijaychandra5637 4 days ago

Amazing video and content!

@datasciencebasics 4 days ago

Thanks. Glad that it was helpful!!

@adanpalma4026 5 days ago

Thanks. One question: every time I want to ask something, do I have to start all over again? Do you do everything in memory?

@datasciencebasics 5 days ago

You are welcome. You don’t need to rerun the notebook; just ask your question. But yes, once you leave the Colab notebook and the runtime is no longer active, you need to run the notebook again.
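In-memory state is lost when the runtime stops. A minimal sketch, assuming you want to skip the expensive rebuild in a later session: save the built index to a file and reload it if the file already exists. In Colab, the file would need to live somewhere that outlives the runtime, such as a mounted Google Drive folder. `build_index` and `load_or_build` are hypothetical placeholders for whatever the notebook actually builds; real vector stores have their own persistence options.

```python
import json
import os
import tempfile


def build_index(docs):
    # Stand-in for the expensive step (parsing, chunking, embedding).
    # Here it just maps each doc id to a lowercase word list.
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def load_or_build(path, docs):
    """Reuse a saved index if the file exists; otherwise build it and save it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f), "loaded"
    index = build_index(docs)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, "built"


docs = {"a": "Wildfire statistics 2023", "b": "Chunking basics"}
path = os.path.join(tempfile.mkdtemp(), "index.json")

index1, status1 = load_or_build(path, docs)  # first session: builds and saves
index2, status2 = load_or_build(path, docs)  # later session: reads from disk
print(status1, status2)  # built loaded
```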

@ahmedsomir 6 days ago

Thanks so much for this tutorial. I am going through this 30-day series, and it pointed me in the right direction in Databricks.

@datasciencebasics 6 days ago

You are welcome. It’s really good to hear that the videos helped you find a good path in Databricks!!

@wcwong22000 7 days ago

Thank you, Sudarshan. Could you please consider making a video guide on running GraphRAG with a local LLM such as Ollama, to ingest any type of document rather than just one document?

@phungpham6487 8 days ago

Thank you so much! This is so helpful, exactly what I need.

@datasciencebasics 8 days ago

You are welcome. Glad that it was helpful!!

@hiroshinishida2712 9 days ago

Japanese support is not perfect yet... Hopefully it will be soon. But it's fast anyway.

@datasciencebasics 9 days ago

Let’s hope it will be improved in the future!!

@Vir-se2kb 9 days ago

You should explain the code in more detail, for example, why you wrote a particular line. That would help us understand it and help you get more likes and subscriptions.

@datasciencebasics 9 days ago

Thanks for the feedback. Will take that into account in future videos.

@VenkatesanVenkat-fd4hg 10 days ago

Happy you are back after a break...

@MahavirChandaliya-y6j 10 days ago

Very informative and inspiring! Thank you for making this video!

@datasciencebasics 10 days ago

You are welcome. Glad that it was helpful!!

@TooyAshy-100 10 days ago

Hello Sudarshan, welcome back, and thank you for your efforts. Could you consider making a video on the following topic: Groq with Llama 3.1, agents and LangChain with ReAct (Reasoning and Acting) for question answering or summarization?
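ReAct interleaves model reasoning ("Thought"), tool calls ("Action") and tool results ("Observation") until the model produces a final answer. A minimal, framework-free sketch of that loop: `scripted_llm` is a hypothetical stand-in for a real model such as Llama 3.1 served by Groq, and the `word_count` tool and the text format are assumptions for illustration, not LangChain's API.

```python
import re

# Hypothetical tool registry: the agent may call these by name.
TOOLS = {
    "word_count": lambda text: str(len(text.split())),
}

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: first emits a Thought/Action step,
    then a final answer once an Observation appears in the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I should count the words.\nAction: word_count[the quick brown fox]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have what I need.\nFinal Answer: {observation} words"


def react(question: str, llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # Reason: the model emits a Thought (+ Action)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)  # Act: parse the tool call
        if match is None:
            raise ValueError("model produced neither an action nor a final answer")
        name, arg = match.groups()
        observation = TOOLS[name](arg)  # run the tool
        transcript += f"Observation: {observation}\n"  # feed the result back
    raise RuntimeError("no final answer within max_steps")


print(react("How many words are in 'the quick brown fox'?", scripted_llm))  # 4 words
```

Swapping `scripted_llm` for a real model call is where an agent framework would normally take over; `max_steps` guards against a model that never stops calling tools.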

@Sundar_Tenkasi 12 days ago

Watching from the day 1 video... it's getting interesting.

@aakaashskale9228 13 days ago

Will the EC2 instance add charges to our bill?

@datasciencebasics 13 days ago

If you use the Community Edition, no. If you use the free trial, yes: you have to cover the AWS bill yourself. Only the Databricks part is free.

@tomgeorge3900 13 days ago

Can you please share a zip of the old Quivr GitHub code? The code has since been updated, and I cannot try it with Ollama.

@datasciencebasics 13 days ago

Unfortunately, I don’t have that, as I create the videos based on the latest changes in the repo.

@kanishkayohan 14 days ago

Hi, 1. How can I use the model in embeddings = OllamaEmbeddings(model="nomic-embed-text") directly from Hugging Face, not from a locally installed instance? 2. If I am using a locally installed instance, how can I publish it to a Hugging Face Space?

@pratikpratik8495 14 days ago

RAG: I am working on RAG over Form 10-K HTML documents, where an AI agent acts as a financial analyst: it reads a company's Form 10-K and creates graphs and charts, does risk analysis, calculates the P/E ratio, etc., instead of just extracting the written text from the document. The goal is to replace a financial analyst with LLM + RAG, so the RAG has to be robust and able to do everything an expert can do. I tried LangChain and LlamaIndex, but with no luck.

@torridtunez976 14 days ago

Thank you! What tool are you using to show all of this? Also, what hardware are you running everything on?

@datasciencebasics 11 days ago

You are welcome. I am running on a MacBook M3 Pro with 36 GB RAM.

@mihiranand8218 15 days ago

    from unstructured.partition.pdf import partition_pdf

    # Extract tables and chunk text without images
    raw_pdf_elements = partition_pdf(
        filename=path + "wildfire_stats.pdf",
        extract_images_in_pdf=False,  # Disable image extraction
        infer_table_structure=True,
        chunking_strategy="by_title",
        max_characters=4000,
        new_after_n_chars=3800,
        combine_text_under_n_chars=2000,
        image_output_dir_path=path,
    )

With the code above, I am getting the error below. Can you or anyone please help me fix it?

    UnidentifiedImageError                    Traceback (most recent call last)
    <ipython-input-27-c496d445a669> in <cell line: 4>()
    ----> 4 raw_pdf_elements = partition_pdf(
    ... (10 frames) ...
    /usr/local/lib/python3.10/dist-packages/PIL/Image.py in open(fp, mode, formats)
    UnidentifiedImageError: cannot identify image file '/tmp/tmphjsyuf14/d1ce8326-30e6-4582-be90-70f8abc570c5-1.ppm'
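The `max_characters`, `new_after_n_chars` and `combine_text_under_n_chars` arguments in this snippet are chunking thresholds. A simplified sketch of how a hard limit, a soft limit and title boundaries can interact in a greedy chunker; this is an illustration under stated assumptions, not unstructured's actual implementation, and it leaves out the `combine_text_under_n_chars` merging of small sections. `Element` and `chunk_by_title` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # e.g. "Title" or "NarrativeText"
    text: str


def chunk_by_title(elements, max_characters=4000, new_after_n_chars=3800):
    """Greedy chunker illustrating hard/soft limits (not unstructured's algorithm).

    - A "Title" element starts a new chunk (section boundary).
    - Once a chunk reaches new_after_n_chars (soft limit), the next piece starts a new chunk.
    - A piece is never appended if the chunk would exceed max_characters (hard limit).
    - Elements longer than max_characters are split into max_characters-sized pieces.
    """
    chunks, current = [], ""
    for el in elements:
        if not el.text:
            continue
        pieces = [el.text[i:i + max_characters]
                  for i in range(0, len(el.text), max_characters)]
        for j, piece in enumerate(pieces):
            starts_section = el.kind == "Title" and j == 0
            sep = "\n\n" if current else ""
            too_big = len(current) + len(sep) + len(piece) > max_characters
            if current and (starts_section or too_big or len(current) >= new_after_n_chars):
                chunks.append(current)  # close the current chunk
                current, sep = "", ""
            current += sep + piece
    if current:
        chunks.append(current)
    return chunks


elements = [
    Element("Title", "Intro"),
    Element("NarrativeText", "a" * 30),
    Element("NarrativeText", "b" * 30),
    Element("Title", "Stats"),
    Element("NarrativeText", "c" * 40),
    Element("NarrativeText", "d" * 60),
]
chunks = chunk_by_title(elements, max_characters=50, new_after_n_chars=40)
print([len(c) for c in chunks])  # [37, 30, 47, 50, 10]
```

With small limits the effect is easy to see: no chunk exceeds 50 characters, the "Stats" title opens a fresh chunk, and the 60-character element is split.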

@teja3925 16 days ago

Hello, how can I generate data when there are two tables with a primary key/foreign key (PK/FK) relationship? Is the model capable of generating such related data?
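Whether an LLM keeps keys consistent on its own is not guaranteed. A common workaround (an assumption here, not something shown in the video) is to generate the parent table first and then draw each child row's foreign key from the parent's existing primary keys; an LLM, if used, would only fill in the non-key columns. A minimal pure-Python sketch with hypothetical `customers` and `orders` tables:

```python
import random


def generate_related_tables(n_customers=3, n_orders=6, seed=42):
    """Generate a parent table and a child table whose FK always points at a real PK.

    Parent rows are created first; child foreign keys are sampled only from
    the parent primary keys that already exist.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    customers = [
        {"customer_id": i, "name": f"customer_{i}"}
        for i in range(1, n_customers + 1)
    ]
    customer_ids = [c["customer_id"] for c in customers]
    orders = [
        {
            "order_id": 100 + i,
            "customer_id": rng.choice(customer_ids),  # FK drawn from existing PKs
            "amount": round(rng.uniform(10, 500), 2),
        }
        for i in range(1, n_orders + 1)
    ]
    return customers, orders


customers, orders = generate_related_tables()
valid_ids = {c["customer_id"] for c in customers}
print(all(o["customer_id"] in valid_ids for o in orders))  # True
```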

@albuquerqueroger 16 days ago

Congratulations, excellent content! Thanks for sharing.

@jeyptmanyt6599 17 days ago

Jai Nepal! Edit: not political, btw.

@awakenwithoutcoffee 19 days ago

Hi sir, have you tried Azure AI Document Intelligence? We are figuring out which data parser is most suitable for production RAG apps. Cheers

@anumoy37 20 days ago

Thanks for sharing these videos; they are really helpful. I have one question, though: how can I install Poppler on a Windows system? I am facing some challenges there. On Windows I get the following error: "Unable to get page count. Is poppler installed and in PATH?"

@preetjain5934 6 days ago

Did you find any solution to this problem? I am facing the same issue.

@anumoy37 6 days ago

@preetjain5934 So far, no. :( I have asked a friend who uses a MacBook to try it.
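The "Unable to get page count. Is poppler installed and in PATH?" message comes from pdf2image, which calls Poppler's `pdfinfo` executable to count pages. A minimal sketch, assuming a Windows build of Poppler has already been downloaded and extracted: check whether `pdfinfo` can be found and, if not, prepend Poppler's `bin` folder to PATH for the current Python process before running the PDF code. The folder `C:\poppler\Library\bin` is a hypothetical example; point it at wherever you extracted Poppler. Adding the same folder to the system PATH in Windows settings (then restarting the terminal or Jupyter) makes the fix permanent.

```python
import os
import shutil


def poppler_on_path() -> bool:
    """pdf2image shells out to Poppler's `pdfinfo`, so it must be findable on PATH."""
    return shutil.which("pdfinfo") is not None


def add_to_path(directory: str) -> None:
    """Prepend a directory (e.g. Poppler's bin folder) to PATH for this process only."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")


# Hypothetical location: wherever you extracted the Windows Poppler build.
POPPLER_BIN = r"C:\poppler\Library\bin"

if not poppler_on_path():
    add_to_path(POPPLER_BIN)
print("pdfinfo found:", poppler_on_path())
```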

@NehaH-u6y 21 days ago

The file keeps processing forever. How can I solve this? Please help.

@datascienceandaiconcepts5435 21 days ago

Nice video; commenting for better reach.

@hikpras89 22 days ago

Is it possible to "train" the private GPT by connecting it to my own data source or database from an existing app, or to documents on my own NAS? How?

@kalyanadepu5666 22 days ago

Great explanation. I have been searching for this kind of Databricks tutorial. I am very excited to complete this playlist. 🎉🎉

@datasciencebasics 22 days ago

Glad that the tutorials are helpful !!

@sunshine8477 23 days ago

Is the LangSmith part important? It is not mentioned in the Quivr GitHub documentation. However, I'm having installation issues, so I was wondering whether LangSmith would ease the process.

@mpesakapoeta 24 days ago

How can I host a RAG model on Hugging Face where teachers can upload PDF content with text and images, as well as audio and video content, and students can then interact with the model via text, speech-to-text (Groq Whisper v3) and text-to-speech (ElevenLabs)?

@jayanthik4507 28 days ago

Thanks for the detailed information. Can I convert this dashboard into an HTML file to embed in a web app?

@KS-mi3gv 28 days ago

The GitHub repo is missing all the code, which makes for a really bad learning experience. I would recommend looking for something else.

@MrCastoloman 1 month ago

Sorry, but your solution is not working. I get a length of 0 for node_mappings.

@SubinKrishnaKT 1 month ago

When I open the web UI with my local LLM, Llama 2, I can see GPT-4 and GPT-4o in the "choose LLM" dropdown. Do you know what is causing this?

@inyeobkim 1 month ago

Best Video!!

@zamanganji1262 1 month ago

I want to fine-tune Llama 3, but I need to create the special_tokens_map.json as follows:

    {
      "bos_token": {"content": "<|begin_of_text|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
      "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
      "pad_token": {"content": "<|end_of_text|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
    }

How can I do that? I would also like to run the model with Ollama so I can chat with it.
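A minimal sketch that writes exactly the map from this comment using Python's standard `json` module; the temporary output directory is a placeholder for your tokenizer folder. The file only declares which tokens play the BOS/EOS/PAD roles. `<|im_end|>` is a ChatML-style token that, to my knowledge, is not one of Llama 3's built-in special tokens, so it would also have to exist in the tokenizer's vocabulary. If you use Hugging Face `transformers`, the usual route is to set the tokens on the tokenizer (for example with `tokenizer.add_special_tokens(...)`) and let `tokenizer.save_pretrained(...)` write this file for you.

```python
import json
import os
import tempfile

# Flags shared by every entry in the map requested in the comment.
FLAGS = {"lstrip": False, "normalized": False, "rstrip": False, "single_word": False}


def special_token(content: str) -> dict:
    return {"content": content, **FLAGS}


special_tokens_map = {
    "bos_token": special_token("<|begin_of_text|>"),
    "eos_token": special_token("<|im_end|>"),
    "pad_token": special_token("<|end_of_text|>"),
}

# Write it next to the tokenizer files (a temp dir here, for illustration).
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "special_tokens_map.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(special_tokens_map, f, indent=2)

with open(path, encoding="utf-8") as f:
    print(json.load(f)["eos_token"]["content"])  # <|im_end|>
```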

@FawziBreidi 1 month ago

Which local embedding model is the best on the market, regardless of size?

@sivaprasadatla 1 month ago

Looks like we cannot run the code below with Azure OpenAI the way we can with OpenAI:

    from langchain.prompts import FewShotPromptTemplate, PromptTemplate
    # from langchain.chat_models import AzureOpenAI
    from langchain.pydantic_v1 import BaseModel
    from langchain_experimental.tabular_synthetic_data.base import SyntheticDataGenerator
    from langchain_experimental.tabular_synthetic_data.openai import create_openai_data_generator, OPENAI_TEMPLATE
    from langchain_experimental.tabular_synthetic_data.prompts import SYNTHETIC_FEW_SHOT_SUFFIX, SYNTHETIC_FEW_SHOT_PREFIX

@tom19_06 1 month ago

What I'm really wondering: aren't you using all the metadata? You are saving only the text, not the text element, in the memory store.

@aalamansari8643 1 month ago

Sir, using partition_pdf I am not able to get the bullet points from the PDF (like "- Any bullet point"). How can I get the bullet points? I need help, sir!
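If the bullet characters survive in the extracted text (check `element.text` for each element; unstructured may also label such lines as `ListItem` elements and strip the marker itself), a small post-processing pass can recover them. A hedged sketch using a regular expression; the `BULLET_RE` pattern and `extract_bullets` helper are hypothetical and only cover common markers. If the bullets are drawn as graphics rather than text characters, no text-based pass will find them.

```python
import re

# Common bullet markers: hyphen, asterisk, bullet dot, small square, or "1." / "1)".
BULLET_RE = re.compile(r"^\s*(?:[-*\u2022\u25aa]|\d+[.)])\s+(.*\S)\s*$")


def extract_bullets(text: str) -> list[str]:
    """Return the bullet items found in a block of extracted text, markers removed."""
    items = []
    for line in text.splitlines():
        match = BULLET_RE.match(line)
        if match:
            items.append(match.group(1))
    return items


sample = "Key points:\n- First bullet point\n\u2022 Second one\n2) Third\nPlain sentence."
print(extract_bullets(sample))  # ['First bullet point', 'Second one', 'Third']
```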

@ahmedsomir 1 month ago

Great tutorials, I like them so much. Missing notebooks, though!!!

@datasciencebasics 1 month ago

Glad that you find it helpful. I provide notebooks where they are necessary, not for every video.

@NO-me9bf 1 month ago

Hi, I have Docker, Ollama and the web UI installed. Now I'm trying to update Open WebUI (it shows I'm a few versions behind, and the "Let's go" button doesn't work), and I'm trying to add the web search feature. Some say DuckDuckGo has a free API, but I am not a tech person and reading the docs is confusing for me; I only managed, by a miracle, to get my setup working thanks to this step-by-step video. Can you please make a video on how to update the web UI and how to get the web search feature installed and working? Thanks.

@ketanpurohit9086 1 month ago

Is this possible in the Community Edition? According to my googling it should be, but I can't seem to make it work. BTW, good job with these videos. Greatly appreciated.

@datasciencebasics 1 month ago

IMO it's not possible in the Community Edition.

@sivaprasadatla 1 month ago

Please explain the approach for synthetic data generation using Azure OpenAI, as I have an Azure OpenAI key.

@datasciencebasics 1 month ago

Hello, you can quickly use Azure OpenAI by importing AzureOpenAI from LangChain. For reference, here is the link: python.langchain.com/v0.2/docs/integrations/llms/azure_openai/

@sivaprasadatla 1 month ago

@datasciencebasics Thanks a lot! I will check.

@harshsavasil3062 1 month ago

Can you explain how we can persist these indexes to a vector database like Milvus?

@ketanpurohit9086 1 month ago

A nice tour of the Databricks UI. Good job.

@datasciencebasics 1 month ago

Thanks. Glad that it was helpful !!