
Solving Real-World Data Science Problems with LLMs! (Historical Document Analysis) 

Keith Galli
222K subscribers
14K views

In this video we walk through the process of analyzing historical documents using Python and Large Language Models (LLMs). We start by setting up LLMs using both closed-source (OpenAI API) and open-source (Llama 2 via Ollama) options. Next, we walk through how we can leverage the LLMs to parse entities out of text. After this, we start working with our data: we load a specific subcategory of documents from Kaggle and see how we can connect pages from the same document. Once this is done, we repeat the entity-parsing process on our actual data to extract pieces of information such as names, ages, and locations from our documents. Finally, we analyze these entities to draw insights from our document database.
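As a rough illustration of the entity-parsing step, here is a minimal sketch of building a prompt and converting the model's string reply into a Python object. The field names, the prompt wording, and the helper names `build_prompt` and `parse_response` are assumptions made for this example, not the code used in the video, and the call to the LLM itself is left out.

```python
import json
import re

# Hypothetical field names, chosen for illustration only.
FIELDS = ("name", "age", "location")


def build_prompt(text: str) -> str:
    """Ask the model to return the entities as a single JSON object."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the text and reply with one "
        f"JSON object using exactly these keys: {keys}. "
        "Use null for anything that is not stated.\n\n"
        f"Text: {text}"
    )


def parse_response(raw: str) -> dict:
    """Convert the model's string reply into a Python dict.

    Models often wrap the JSON in extra prose, so we first grab the
    span from the first '{' to the last '}' and parse only that.
    Unparseable replies yield a dict with every field set to None.
    """
    empty = {field: None for field in FIELDS}
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    # Keep only the expected keys; missing keys become None.
    return {field: data.get(field) for field in FIELDS}


# A made-up reply, standing in for what an LLM might send back.
reply = 'Here is the result:\n{"name": "Jane Doe", "age": 12, "location": "Richmond"}'
print(parse_response(reply))
```

Returning a dict with `None` values instead of raising keeps a long batch run going when a single reply is malformed; whether that trade-off is right depends on how strictly the results need to be checked.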
Kaggle Dataset: www.kaggle.com/datasets/keith...
GitHub Repo: github.com/keithgalli/histori...
Project Website: freedmensbureau.info
Contributors:
Abdessalem Boukil (NLP Research & Analysis): / abdessalem-boukil-3792...
Trent Self (Kaggle Dataset Setup): / trentonself
If you enjoyed this project video, make sure to throw it a thumbs up & subscribe! Let me know in the comments if you have any questions. It would also be helpful for people to upvote the Kaggle dataset for visibility!
---------------------------
Video timeline!
0:00 - Video Overview & Reference Material
3:05 - Data & Code Setup
5:04 - Task #0: Configure LLM to use with Python (OpenAI API)
20:10 - Task #0 (continued): LLM Configuration with Open-Source Model (Llama 2 via Ollama)
27:39 - Task #1: Use LLM to Parse Simple Sentence Examples
41:22 - Sub-task #1: Convert string to Python Object
44:29 - Task #1 (continued): Use Open-Source LLM to Parse Sentence Examples w/ LangChain
56:24 - Quick note on a benefit of using LangChain (easily switching between models)
58:06 - Task #2 (warmup): Grab Apprenticeship Agreement rows from Dataframe
1:06:22 - Task #2: Connect Pages that Belong to the Same Documents
1:56:36 - Task #3: Parse out values from merged documents
2:12:44 - Task #4 (setup): Analyze Results
2:17:52 - Fixing up our results from task #3 quickly
2:20:41 - Task #4: Find the average age of apprentices in our merged contract documents
2:30:59 - Other analysis: who had the most apprentices?
-------------------------
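The analysis questions in the last timeline entries (average apprentice age, who had the most apprentices) can be sketched with the standard library alone. The records below are made up, and the keys `apprentice`, `age`, and `employer` are hypothetical; the video works with a dataframe, so treat this as an illustration of the idea, not a reproduction of its code.

```python
from collections import Counter
from statistics import mean

# Made-up records standing in for entities parsed from merged contracts.
records = [
    {"apprentice": "A. Example", "age": 12, "employer": "Employer One"},
    {"apprentice": "B. Example", "age": "14", "employer": "Employer Two"},
    {"apprentice": "C. Example", "age": None, "employer": "Employer One"},
]


def to_age(value):
    """Return the age as an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def average_age(rows):
    """Average the ages that could be parsed; None if there are none."""
    ages = [a for a in (to_age(r.get("age")) for r in rows) if a is not None]
    return mean(ages) if ages else None


def apprentices_per_employer(rows):
    """Count how many apprentice records name each employer."""
    return Counter(r["employer"] for r in rows if r.get("employer"))


print(average_age(records))
print(apprentices_per_employer(records).most_common(1))
```

Parsed ages may come back as strings or be missing entirely, which is why `to_age` normalizes each value before averaging.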
If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith
Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

Published: Jul 3, 2024

Comments: 40
@abhishekpatil8599 3 months ago
The best part about Keith's videos is that they are completely RAW: nothing is staged. He'll make mistakes, as we all do, but also show you how to troubleshoot them. He'll even go on Google mid-video to read documentation, blogs, etc. to help solve them, which really helps the next time you get the same error and further strengthens the concept you are learning.
@evogelpohl 4 days ago
You have the heart of a teacher. You found your calling brother.
@KeithGalli 2 days ago
I appreciate the kind words 🙏
@dabunnisher29 3 months ago
I'm in the aviation industry and your Pandas tutorial taught me how to cut up and arrange Excel worksheets. I use that knowledge almost every day to make my life easier. Thank you so much!!!!
@edsonwinnerify 3 months ago
Glad you are coming back! Love your videos
@lucaskhoo3712 3 months ago
You're my inspiration. I am glad you are back.
@nelsonnjikelani4844 3 months ago
First-time subscriber. I am ALL IN!!❤
@muhammadabdulsalam602 3 months ago
My big bro, I am so happy that you came back just like before.
@KeithGalli 3 months ago
Always happy to be here :)
@chineduezeofor2481 3 months ago
Excellent tutorial. Thank you for sharing this.
@KeithGalli 3 months ago
Glad you enjoyed!
@sushibooshi 1 month ago
More machine learning content! This is awesome stuff Keith!
@bajangsekacang 3 months ago
Hi brother, I've never forgotten your amazing videos... Now you're creating even more valuable ones... Amazing...
@gulamgauskhan6933 3 months ago
Keith is back!!!!!
@KeithGalli 3 months ago
😎😎
@anthonypriest214 3 months ago
Dude, thank you. You are awesome
@KeithGalli 3 months ago
You are very welcome! Thanks for the kind words.
@cocoarecords 3 months ago
Quality information
@utkarshkapil 3 months ago
He's back guys!!
@KeithGalli 3 months ago
You know it
@wiz8058 3 months ago
🎉 amazing content
@KeithGalli 3 months ago
Thanks bro!!
@KeithGalli 3 months ago
first
@Master_of_Chess_Shorts 3 months ago
You're the best
@JonR4m 3 months ago
You know, I saw this movie like 2 weeks ago called Dumb Money and for a few minutes I thought the main character was you; but no, the guy's name was Keith Gill. Anyway, thank you for your service. You're a real human being.😁
@Lnd2345 3 months ago
Except he doesn’t look like him at all :)
@JonR4m 3 months ago
@Lnd2345 Yeah, I know. It just took me a couple of minutes to figure it out.
@KeithGalli 3 months ago
Haha yeah that's not me, but I did get a lot of people thinking we were the same for a short time period when that was all going on xD
@aryehpaulwalter7520 3 months ago
You're the GOAT. Curious what kind of computer/laptop you use and also what keyboard you use for the computer.
@KeithGalli 3 months ago
I'm currently using a MacBook Pro M2 w/ 16GB RAM and 512GB SSD. The keyboard I'm using currently is a Logitech K850.
@aryehpaulwalter7520 3 months ago
@KeithGalli thanks! So your setup is a laptop with an external keyboard? That’s how you do these videos/work?
@atharvasawai8309 3 months ago
Hi Keith, I am getting an error while saving the OpenAI key to the Secrets in a Kaggle Notebook. ERROR: Permission 'kernelSessions.enableInternet' was denied. Can you help me with this?
@MaxwellSmi41483 3 months ago
Fantastic real-world problem, like a lot of your other videos. I've got to say that all models on Ollama absolutely stink in comparison to OpenAI. However, I have been using a text preprocessing function that I created with spaCy for a news article project I'm working on. With some minor tweaking, I have been able to pass the transcription_text values through my function and recreate what the LLMs are doing purely in code, using the doc.ents functionality. I'm only 1:27:00 into the video at the moment and perhaps you use something similar later on, but spaCy has been a bit of a godsend if you don't/can't pay for OpenAI.
@KeithGalli 3 months ago
Yeah, spaCy is great for a non-LLM approach to so many NLP tasks. I didn't use it at all in this video because it was focused on LLMs, but I have used it a bunch for personal work in the past. Glad you've been enjoying the video!
@venugopal-nc3nz 3 months ago
Why is the frequency of your videos so low? @keith Galli
@ucphattruong4341 3 months ago
Hi, which operating system do you prefer, Windows or macOS?
@sebastianalvarez1537 2 months ago
@Intellectualmind4 3 months ago
🎉🎉🎉🎉🎉🎉🎉