8. OpenAI Financial Advisor Q&A Embeddings - Python Tutorial

Like the video? Support my content by checking out Interactive Brokers using the link below:
www.interactiv...
In this video, we transcribe a financial podcast using Whisper and use OpenAI Word Embeddings on the transcript to create a question answering system. If you like this type of content, I am starting a spinoff channel this year focused on AI in music, gaming, and design at @parttimeai, so please subscribe. Content is coming there soon.
Notebook: colab.research...
Question Sheet (Raw): docs.google.co...
Question Sheet (Transformed): docs.google.co...

Published: Sep 13, 2024

Comments: 105

@parttimelarry 1 year ago

Like the video? Support my content by checking out Interactive Brokers using the link below:
www.interactivebrokers.com/mkt/?src=ptlPY1&url=%2Fen%2Findex.php%3Ff%3D1338
Notebook: colab.research.google.com/drive/1cVQNg2-zGQb7qZXFECG6kyq5yVIHyf5o?usp=sharing
Question Sheet (Raw): docs.google.com/spreadsheets/d/1z6DVJPU1DS4J0OhsPkauRPu_2-HfiPjHix1hpaKTBzE/edit?usp=sharing
Question Sheet (Transformed): docs.google.com/spreadsheets/d/13hTC5wV84-M7_nw_yC7LayRdLbwHAkm96moY6ea4qWU/edit?usp=sharing

@thebicycleman8062 1 year ago

Hey Larry, I have a question: wouldn't it be much faster and less GPU-intensive (and maybe more accurate) to get the text transcription directly instead of passing the whole video through Whisper? Most videos already have transcriptions, or have them auto-generated by Google's speech-to-text, which is pretty accurate. Any reason you prefer the Whisper route over just downloading the video's transcription directly? Thanks!!

@erikstillman7336 1 year ago

Thank you so much for this content. I was wondering how to structure WAV files: is there a module I can use with OpenAI Whisper that would give me some sort of data frame rather than just a string of words?

@bencarlson2587 1 year ago

Holy crap! Larry this is amazing! Nicely done

@parttimelarry 1 year ago

Oh shit, you're here! How would you feel about having your voice cloned for science?

@bencarlson2587 1 year ago

@parttimelarry lol if this is how I find immortality let's do it ;) I definitely want to learn more about what you've created here

@jerrywang3225 1 year ago

This is by far the best OpenAI tutorial on YouTube, period. This code gives us endless potential. Thanks again.

@krissn8111 1 year ago

I am falling in love with this OpenAI series of yours. Kudos.

@kamalswami8374 3 months ago

Hey Larry, I just found your YT channel today. I am building something like this, bro. You are my inspiration now. Respect ++

@supriyadevidutta 1 year ago

This is one of the best videos so far, Larry. Thank you.

@charlieevert7666 1 year ago

Funny enough I was thinking of doing this yesterday night... woke up to see this video pop up in my notifications lol. Thank you!!

@AndrewMagee01 1 year ago

8:02 100% on the mark Larry. Great project as always.

@TexasNation897 1 year ago

Thanks a bunch, Larry! Bro, you rock, my man. You're a true gift from God, please don't stop 🙏

@frankgiardina205 1 year ago

Wow, I need to watch this a couple more times before it sinks in, but I can see some great uses for it. Thanks Larry, the way you are able to use all these different technologies and piece them together is exceptional. Thanks again!

@yony2k2 1 year ago

I was looking everywhere for a video like this! I'm about to use GPT-3 on private information to help with search across different documents. Thank you!!

@prabhu_patil 1 year ago

Very impressive, you have opened the gates to new ideas. Proud of how you have transformed yourself, and us as well, from pandas-based indicators to the future of AI-based decision matrices.

@TheRealHassan789 1 year ago

PTL.. you provide real value! Thanks

@ChrisWi88 1 year ago

Incredible. Thanks for the awesome educational content

@adityakadam2256 1 year ago

thanks for such an amazing video. This is very clever and I like your technique of combining whisper API with Embedding and Completion API. This is really a great insight. Thanks a ton.

@nicolatje 1 year ago

thanks for sharing all these beauties with us, trying to make the world a better place. Keep up your amazing work!

@kingtrippy5006 1 year ago

Thank you Larry, you're a beast ✊

@tohando 1 year ago

Best video in the series! I like how you combined the different services and built something amazing. Looking forward to the pipeline video! Keep up the good work!

@SergiRodriguesRius 1 year ago

Thanks for such clear spoken English. My ability to understand spoken English is a bit limited, but I can understand 99% of your video tutorials without using captions! Indeed, the way you explain how to do these kinds of projects is spectacularly easy to understand. You're a great teacher! Thanks for these tutorials about OpenAI. They are the best I have seen these weeks.

@parttimelarry 1 year ago

Thanks for watching, much more to come!

@SergiRodriguesRius 1 year ago

Larry, have you tested doing the "completions" part with a model other than davinci? I'm asking just because of the price, you know.

@trainspotting02 1 year ago

PTL great video and series. You are a brilliant lad! Thank you.

@IshmeetSinghahuja 1 year ago

Amazing tutorial!! Thank you so much. Now I can't wait for your next part. Any ideas?

@macrobody 1 year ago

Very nice! Can't wait for the next one.

@marcysalty 1 year ago

I’m about to use the same process in my thesis project… when it’s done I’ll show you!! BTW I think you’ll be named in the bibliography!! Thanks again for the great content you continuously provide!!

@megaedwin2363 1 year ago

Hi, how did it go?

@marcysalty 1 year ago

@megaedwin2363 Still in progress!! For this part of the project I'm wrapping everything up in a Discord bot that will answer as a tutor in an online course… I'll keep you posted on the final results!!

@megaedwin2363 1 year ago

@marcysalty Great!!! Keep me posted

@ChefRodKnight 1 year ago

Your lessons are incredible! Thanks for sharing

@jayasimhanmasilamani9078 1 year ago

Part Time Larry, I am a full-time fan!

@rotoboter 1 year ago

Love your lessons Larry. Thank you for your videos ❤

@keridince 1 year ago

I love this content, this is very useful, thank you

@robin7769 1 year ago

You are giving me a lot of ideas, love your efforts.

@Steve-js7bp 1 year ago

This was incredibly good. As someone just learning to code, I was able to follow along. Thank you so much for putting this together!

@JOANCARLESAGUILAR 1 year ago

Great video!!! Thanks for awesome lessons

@simple-security 1 year ago

I've gotten as far as you show in this video. Now I'm trying to figure out:
- splitting data into chunks to fit max tokens (OpenAI has a great ipynb example for this).
- how to loop through all data in chunks to fully complete the results of a question.
- OpenAI functions to improve consistency of output format.
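
The first two items in the comment above (chunking to a token budget, then looping over every chunk) can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes one whitespace-separated word counts as roughly one token (a real tokenizer would be more accurate), and `ask` is a hypothetical callable standing in for whatever completion call you use.

```python
def chunk_words(text, max_tokens=500):
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word is treated as one token.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def answer_over_chunks(question, chunks, ask):
    """Ask the same question against every chunk and collect the answers.

    `ask` is a hypothetical function (question, context) -> answer.
    """
    return [ask(question, chunk) for chunk in chunks]


if __name__ == "__main__":
    transcript = "one two three four five"
    pieces = chunk_words(transcript, max_tokens=2)
    # A stand-in for a real model call: just report the context length.
    fake_ask = lambda q, ctx: f"{q} -> {len(ctx.split())} words of context"
    for answer in answer_over_chunks("What was said?", pieces, fake_ask):
        print(answer)
```

The per-chunk answers would still need to be merged or summarized in a final step, which this sketch leaves out.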

@paraconscious790 1 year ago

Wow, this is insanely valuable, can't believe you are sharing it for free, thank you very much! 🙏 One question though: you mentioned that you can use the OpenAI API with your personal confidential information, but if I am calling the API for embeddings, isn't that sending my information to OpenAI for vectorization, outside the boundaries of my organization?

@bertobertoberto3 1 year ago

BRILLIANT

@DavidDji_1989 1 year ago

Awesome value !

@sriramkrishna6853 1 year ago

2nddddd!!! Letsss gooo made itt! Missed your content, Larry! Hope to see more often. If you do some open source alternative to OpenAI at some point that will be great too.

@rafaeltacconi2065 1 year ago

great content

@eltoroloco28 1 year ago

Curious if you could share high level best practices for getting embeddings? Depending on the use case I'd imagine how you split up the text would be really important and also what are the technical requirements for the input (e.g. input mustn't have white spaces or line breaks?)... Thanks for all these tutorials, they're amazing!

@yellowboat8773 1 year ago

Damn, Larry, are you building these for companies yourself? Feels like you're the tip of the spear here.

@Pork-Chop-Express 1 year ago

It DOES dodge questions. I performed (independently) a Top 25 NBA players of all time analysis based on ... lots of stats, using gaussian distribution, skewness, kurtosis; accounting for career-length differences, utility value, post season, accomplishments, and outliers. In the end- MJ was the GOAT, Wilt at #2, LeBron at #3, and Kobe at #4. I asked ChatGPT to access basketball-reference OR wikipedia. It responded by saying that it DID have that ability, but then neglected to do so and analyze the statistical categories. It refused OVER and OVER saying that "subjective biases exist that can skew perception of this complicated question." I asked it to IGNORE that and just crunch the numbers. IT AGAIN refused to do so. WOW

@NickWindham 1 year ago

Did you give ChatGPT thumbs down feedback so that hopefully OpenAI will fix it?

@Pork-Chop-Express 1 year ago

@NickWindham Oh absolutely. I would be surprised if anything changes. I asked - dispassionately and objectively - over and over for it to "just focus on these categories." To which it replied about perceptions and biases. At this point, I think it is just a Google Copy and Paste ... thing. I don't see it doing anything more than that. No actual analysis or connecting the dots.

@onemanops 1 year ago

aww🎉some

@LeveragedFinance 1 year ago

good vid

@vipwlb 1 year ago

This is really great, thanks for sharing!! One quick question: I noticed that not every episode has a detailed description showing when specific questions are raised. How do you get the question start times in such cases? Just curious. Thanks again!

@andrescastro8961 1 year ago

Great content Larry!! 👏💥 I'm really looking forward to seeing an open-source alternative to OpenAI for doing this kind of project. How sure are we that the content we provide to the AI model stays private? I mean, OpenAI has access to all content fed into their system regardless of whether it is embedded, or in chatGPT format.

@AiDHDtv 11 months ago

Thanks a lot for this really great content. It's quite hard for the novice but extremely interesting. One thing that would be really awesome is if you made the same embedding model for your own videos so that we can ask it aka AI-Larry questions about your how-to videos. For example, I have a series of podcasts that I would like to transcribe and embed and they don't have the perfect time stamps in the descriptions. How would I go about creating the Q&A CSV file for those episodes? Thanks! MW

@anbld9386 1 year ago

Great content as always! Just one question: is there a following video about building the user web interface?

@bobbyhuang4620 1 year ago

Great video! It truly blows my mind! One question: how is this different from fine-tuning a GPT model? Have you tried fine-tuning on the same dataset and compared the results? Might be interesting to look into that.

@FPRowland 1 year ago

Thanks!

@parttimelarry 1 year ago

Thank you! This is very kind!

@yomajo 1 year ago

I wonder how long it actually took to build behind the scenes.

@AlterEgo77763 1 year ago

Love the videos! Side note... I was wondering if you could do a video on the advanced-trade-api I believe this is replacing Coinbase pro's api? Possibly in python? :)

@parttimelarry 1 year ago

I have made some videos on CCXT before, which supports many exchanges. It looks like there are some recent code merges for CCXT on Github that support the new Coinbase stuff. So it should just be a configuration option.

@Joshukend 1 year ago

A thought I have is how podcasters grow over time. Is there a way to weight recent content as more important than old content? While still maintaining all the content in the database
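
One way to approach the recency question above is to keep every row in the database and scale each similarity score by an age-based weight at query time. The sketch below is only one possible scheme, not something shown in the video: the half-life, the `floor` that stops old episodes from being zeroed out, and the row fields `similarity`, `age_days`, and `title` are all assumptions made for illustration.

```python
def recency_weight(age_days, half_life_days=365.0, floor=0.5):
    """Weight in [floor, 1.0] that halves its decaying part every half-life."""
    decay = 0.5 ** (age_days / half_life_days)
    return floor + (1.0 - floor) * decay


def rank_by_recency(rows, top_n=3):
    """Sort rows by similarity scaled by recency, newest-friendly first."""
    scored = [
        (row["similarity"] * recency_weight(row["age_days"]), row["title"])
        for row in rows
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]


if __name__ == "__main__":
    rows = [
        {"title": "old episode", "similarity": 0.80, "age_days": 1460},
        {"title": "new episode", "similarity": 0.70, "age_days": 30},
    ]
    # The newer episode wins despite a slightly lower raw similarity.
    print(rank_by_recency(rows))
```

Raising `floor` toward 1.0 makes age matter less; setting it to 0.0 lets very old content fade out almost entirely.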

@yshaool 1 year ago

Great video!!! Quick question - is it more accurate to calculate the embeddings for the questions in the file and then select the context suitable according to the question that is closest to the question the user is asking?

@SneyDeag 9 months ago

Hi Larry, this no longer works: the latest Python openai library no longer contains embeddings_utils, so the code breaks. I don't know if you could upload an updated version of this video, or one with other embeddings, for example with Azure. Greetings; you have helped motivate me a lot to study for a professional career. I hope to share a coffee. 😀
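
If the missing piece is only the similarity helper from embeddings_utils, it is small enough to write by hand: cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain-Python stand-in under that assumption; it does not fetch embeddings, which in newer versions of the openai library is done through the client object rather than the removed helper module.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have many more dimensions.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

With a pandas data frame, the same function can be applied to an embedding column to produce the similarity column that the rows are then sorted by.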

@5ice1971 1 year ago

This is great! Can you guide me with what I would need to start as in my own server etc... Thanks

@tradissimo9606 1 year ago

Hi, I wanted to start working on your "Full Stack Trading App Tutorial", but I'm missing the lectures on your homepage! Where is the old content of your homepage gone?

@hoomanvan 1 year ago

This is great! Do you think ChatGPT API could be applied to these use cases instead of embedding?

@DamienLuc 1 year ago

When is part 9 coming?!!!

@yellowboat8773 1 year ago

Thoughts on how to do this without question and answer in the original format? Can we just feed in walls of text then extract question and answers from that?

@parttimelarry 1 year ago

You don't necessarily need questions in advance, but you need to divide up your text in a logical way. If you check the OpenAI cookbook, they have an example using Wikipedia articles and asking questions about the Olympics. In this case, they use the headings + paragraphs to divide the text and find the section that is most relevant to the question.
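
The heading-plus-paragraph split described in this reply could look something like the sketch below. It is not the cookbook's code: it assumes headings are lines starting with `#` (Wikipedia markup differs), files any text before the first heading under a made-up "Introduction" label, and the prompt wording in `build_prompt` is illustrative only. Choosing which section is most relevant would still be done by comparing embeddings.

```python
def split_by_headings(text, marker="#"):
    """Split text into {'heading', 'content'} sections at heading lines."""
    sections = []
    heading, body = "Introduction", []
    for line in text.splitlines():
        if line.startswith(marker):
            if body:  # close the previous section before starting a new one
                sections.append({"heading": heading, "content": " ".join(body)})
            heading, body = line.lstrip(marker).strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        sections.append({"heading": heading, "content": " ".join(body)})
    return sections


def build_prompt(question, section):
    """Wrap the chosen section as context around the user's question."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context ({section['heading']}): {section['content']}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    doc = "# Medals\nText about medals.\n\n# Venues\nText about venues."
    sections = split_by_headings(doc)
    print(build_prompt("Where were the events held?", sections[1]))
```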

@wangjueliang 1 year ago

When you have a long text, how do you chunk it by sentences so each chunk fits within the max tokens? Also, if the chunks could overlap, e.g. the last sentence of the previous chunk is included in the next chunk, that would give the model better context.
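
The sentence-based chunking with overlap asked about here can be sketched in plain Python. The assumptions are loud ones: sentences are split naively on `.`, `!`, or `?` followed by whitespace, the budget is counted in words rather than real tokens, and a chunk may exceed the budget when the carried-over sentences plus the next sentence are already too long.

```python
import re


def split_sentences(text):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(sentences, max_words=50, overlap=1):
    """Group sentences into chunks of about max_words words.

    The last `overlap` sentences of each chunk are repeated at the
    start of the next chunk to carry context across the boundary.
    """
    chunks, current = [], []
    for sentence in sentences:
        words_so_far = sum(len(s.split()) for s in current)
        if current and words_so_far + len(sentence.split()) > max_words:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    text = "A b. C d. E f."
    for chunk in chunk_sentences(split_sentences(text), max_words=4, overlap=1):
        print(chunk)
```

Libraries built for this, such as the ones mentioned in the reply below, handle more edge cases; this version only shows the shape of the idea.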

@parttimelarry 1 year ago

There are some great tools that handle common patterns like this that I need to cover - Langchain and Llama-Index.

@user-wr4yl7tx3w 1 year ago

I would have thought you would compute the embedding for 'question' column and do a cosine similarity between that versus the embedded form of your question. Then sort it by similarity. And take the 'context' corresponding to the closest similarity. Since you want to match question with question.

@parttimelarry 1 year ago

Many of the timestamps in the video are not full questions. There are many timestamps with 1 or 2 word titles like "Market sell-off", "Tax Strategy", etc, so I thought it made sense to check a combination of the question + the answer in case a user question was answered but wasn't directly contained in a timestamped question.

@SergiRodriguesRius 1 year ago

@parttimelarry Maybe a third useful way would be to add 2 more columns to the CSV of contexts: one to store an ABSTRACT, written by davinci, of the CONTEXT column (the transcribed audio between 2 time marks), and another column with the EMBEDDING of that ABSTRACT 😁 It would probably make it clearer which rows (Q&A) are closest to a new user question. And by using these "abstract embeddings", the request to the davinci model would be much shorter and so much cheaper. It would need to be tested, of course. Maybe you would lose too much useful information to build the answers... who knows.

@arnaudlelong2342 1 year ago

What's in the mug dude? Hahaha just kidding thanks for the video.

@koetje071 11 months ago

hey Larry, what would happen if I don't use the timestamp and just use the whole transcribed podcast as the source data? Would it just be more expensive and slower or would the resulting answer be different?

@Siyar-sb2ub 5 months ago

So do I need to know the timestamps before I can do this? Or is there a way to do this without knowing the timestamps of the questions/answers?

@vtrandal 1 year ago

ChatGPT's knowledge cutoff is September 2021. Not 2019.

@GauravGarg-dq4js 1 year ago

Given a YouTube playlist, how did you extract all the playlist URLs?

@parttimelarry 1 year ago

This can be done in a few ways: 1) with the YouTube API, 2) with some screen scraping, or 3) by hand :). I can touch on this when I show how to process in batch.

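@GauravGarg-dq4js 1 year ago

@parttimelarry Right-click > Inspect (or Ctrl+Shift+I), then run this in the browser console:
// Scroll down once per second so the whole playlist loads
var scroll = setInterval(function () { window.scrollBy(0, 1000); }, 1000);
// Once everything has loaded, stop scrolling and print each title and URL
window.clearInterval(scroll);
console.clear();
urls = $$('a');
urls.forEach(function (v, i, a) { if (v.id == "video-title") { console.log('\t' + v.title + '\t' + v.href + '\t'); } });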

@dongnguyenanh7282 1 year ago

Why does it say "Streaming data not found for the video. Unable to download." even though the YouTube video is available?

@Kmysiak1 1 year ago

How can we be sure our internal data being fed into the model isn’t saved somewhere with openai?

@CodeCoachh 1 year ago

I was wondering if I could get a little help. I have successfully added an embedding column to my data sheet but when I embed my question and try to sort my data sheet by similarities I run into the following error: numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('

@CodeCoachh 1 year ago

The code seems to break only when I try to find similarities from a df just by loading the csv file with embeddings. When I create the csv file with embeddings from a data frame that data frame seems to work with finding similarities.
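
A likely cause of the behaviour described above (an assumption, since the full traceback is cut off) is that a CSV stores each embedding as text, so after reloading, the column holds strings like "[0.1, 0.2, 0.3]" instead of lists of floats, and numeric operations on them fail. The sketch below reproduces the round trip with the standard library and parses the strings back with `ast.literal_eval`; the column names are made up for the example.

```python
import ast
import csv
import io


def save_rows(rows):
    """Write rows to an in-memory CSV; each embedding list becomes text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["question", "embedding"])
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {"question": row["question"], "embedding": str(row["embedding"])}
        )
    return buffer.getvalue()


def load_rows(csv_text):
    """Read the CSV back and turn each embedding string into a list."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # Without this step the value stays a str and math on it fails.
        row["embedding"] = ast.literal_eval(row["embedding"])
    return rows


if __name__ == "__main__":
    original = [{"question": "Tax Strategy", "embedding": [0.1, 0.2, 0.3]}]
    restored = load_rows(save_rows(original))
    print(type(restored[0]["embedding"]), restored[0]["embedding"])
```

With pandas the same idea applies: after `read_csv`, convert the embedding column from strings back to lists (or arrays) before computing similarities.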

@cocoarecords 1 year ago

Hello Larry, can i do a similar thing using Ruby? 😢

@kebab-case 1 year ago

I have a PDF file with about 20k words. How can I make a chatbot that will answer questions whose answers are inside the PDF? I tried to play with the OpenAI GPT playground but it has a limit of 4096 words. Please give me tips.

@dservais1 1 year ago

1st to like today 😀

@rshrott 1 year ago

Nice. Only issue is that gpt-3 calls will be expensive with that much text, will be totally unscalable i think

@parttimelarry 1 year ago

Thanks for the feedback, this gives me an idea for a cost calculation video. The embeddings calls I use in the project are very cheap. Will do the full batch of podcasts and show my costs for a large project. Also planning to do some projects with open source alternatives to compare results. Cheers.
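
As a rough illustration of the cost question in this thread, the arithmetic separates into a one-time embedding cost and a per-question completion cost. Every number in the sketch below is a placeholder, not a real OpenAI price or a measurement from this project; plug in current per-1K-token prices and your own token counts.

```python
def estimate_cost(token_count, price_per_1k_tokens):
    """Cost in dollars for token_count tokens at a per-1K-token price."""
    return token_count / 1000 * price_per_1k_tokens


def project_cost(embedding_tokens, questions, prompt_tokens_per_question,
                 embedding_price_per_1k, completion_price_per_1k):
    """One-time embedding cost plus the cost of answering each question."""
    one_time = estimate_cost(embedding_tokens, embedding_price_per_1k)
    per_question = estimate_cost(
        prompt_tokens_per_question, completion_price_per_1k
    )
    return one_time + questions * per_question


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    total = project_cost(
        embedding_tokens=100_000,
        questions=10,
        prompt_tokens_per_question=2_000,
        embedding_price_per_1k=0.001,
        completion_price_per_1k=0.02,
    )
    print(f"Estimated total: ${total:.2f}")
```

The structure shows why the context size matters: the embedding cost is paid once, while the completion cost scales with both the number of questions and the amount of context pasted into each prompt.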

@rshrott 1 year ago

@parttimelarry I think GPT-3 is overkill for this task. One reason embeddings are great is the cost. Summarizing text should be fine for Curie, or even a cheaper model. BTW, what would you do if the text was a book without distinct Q&A? I guess you need to determine how to split the text. You could maybe automate the splitting using another embedding model: make an embedding of each sentence and then split into contexts based on the similarity of sentences? Hmm, interesting

@SergiRodriguesRius 1 year ago

@rshrott For books/documentation, I suppose it could be useful to treat paragraphs the same way Larry has worked with the CONTEXTS in this video (sentences between 2 time marks), and to index the CHAPTER of those paragraphs the same way Larry has indexed the YT URLs 😁 Indeed, if you think about it, "knowing the question" is not so important. You only need to find the text most related to the question the user asks, and then ask the model to use it as context to return an answer to the question. Sincerely... I want so much to try all this by myself...!! Super! The applications are endless... finally WE HAVE A SEMANTIC TEXT CALCULATOR !!! It is the ultimate dream for those of us who have been in AI since the 90's.

@knddlbr 1 year ago

@parttimelarry Yes, cost calculation becomes extremely interesting. It looks like one could extract the text, summarize it with a cheaper model, and then do the embedding. That way you also limit the context, as you will paste less text into the prompt. Thanks for the best video on embeddings and specifically on how to feed the context back to the model (and I watched half a dozen)

@rileyclubb 1 year ago

What is COMPLETIONS_MODEL? Is that a custom model you trained?

@parttimelarry 1 year ago

It's just a variable that I defined closer to the top of the notebook. I set it to text-davinci-003 (the OpenAI model) but you can change the value there to use a cheaper model if desired.
COMPLETIONS_MODEL = "text-davinci-003"

@rileyclubb 1 year ago

@parttimelarry d'oh! 😉 Excellent vid as always, super stoked for your new AI channel

@RickHunter-fz7oh 1 year ago

I never subscribe to any YouTube channel, too much noise. Yours, I did.

@jmasked5082 1 year ago

6:25 the answer sounds very chatgpt, said nothing at all to avoid the risk of being wrong.

@JohnRoodAMZ 1 year ago

Shouldn’t you say database instead of model? Because you aren’t training, right… just collecting the info and then prompting with it, right?

@parttimelarry 1 year ago

Totally said model a few times in the last few videos where it wasn't appropriate. Noticed this later but hard to go back since it takes a long time to record and edit.

@SneyDeag 1 year ago

Hello, can someone help me? I get this error:
KeyError: 'streamingData'
stream = youtube_video.streams.filter(only_audio=True).first()
stream.download(filename='financial_advisor.mp4')