LLMs for Devs
Comments
@NickaGillis 10 days ago
Can Jina handle sites with lazy loading? I'm looking at dealership websites.
@alienPear 11 days ago
Thanks for sharing, bro! Greetings from Colombia
@devlearnllm 10 days ago
My pleasure!
@juroo 11 days ago
pure gold, thanks man!
@Breaking_Bold 12 days ago
I like this video format, with the large monitor in the background. Nice video.
@SergeyNumerov 14 days ago
I wonder how this would handle dynamic content, as in scraping websites where you have to click things to reveal the valuable content.
@ratnpriyarai4793 15 days ago
It was quite useful for me.
@__________________________6910 16 days ago
No, I know why you want long-term memory: because your AI girlfriend doesn't remember your past conversations. She forgets you, that's why.
@devlearnllm 4 days ago
T.T
@ajkdrag 19 days ago
Hi. I have a video request. Is there a way to contact you?
@devlearnllm 10 days ago
tally.so/r/n9djRQ
@ajkdrag 9 days ago
@devlearnllm done. Thanks
@robelbelay4065 22 days ago
Great stuff man, thanks a lot!
@devlearnllm 21 days ago
Cheers!
@Phoenix-gi3gu 23 days ago
For experimenting, I would recommend using no database at all. You can simply use cosine similarity (e.g. from torch.nn.functional) or quickly implement it yourself, and you are nearly done. Just use argsort to get the best matches. It's about five lines of code. For easy store/load, you can use pickle to serialize/deserialize the object that holds the embeddings. It is fast on CPU too, but of course you can run it on GPU without any major changes. No services required.
@devlearnllm 23 days ago
good point
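For anyone who wants to try the no-database idea above, here is a minimal sketch in plain Python. It is not code from the video: it hand-rolls cosine similarity instead of using torch, uses a sort in place of argsort, and the toy three-dimensional vectors and the names `cosine_similarity`, `top_k`, and `store` are made up for illustration. Real embeddings would come from an embedding model.

```python
import math
import pickle


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, store, k=3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scores = [(cosine_similarity(query, vec), text) for text, vec in store]
    # Sort by score, highest first (the plain-Python stand-in for argsort).
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores[:k]


# Toy "embeddings" (assumption: hand-written vectors, not model output).
store = [
    ("likes hiking", [0.9, 0.1, 0.0]),
    ("likes pizza", [0.1, 0.9, 0.0]),
    ("dislikes traffic", [0.0, 0.2, 0.9]),
]

# Store/load with pickle; writing the bytes to a file works the same way.
blob = pickle.dumps(store)
restored = pickle.loads(blob)

best = top_k([1.0, 0.0, 0.0], restored, k=2)
```

Only unpickle data you created yourself, since `pickle.loads` can execute arbitrary code from an untrusted file.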
@jonathanpark873 27 days ago
I wonder if you would update it to use gpt-4o-mini, as it's much cheaper.
@devlearnllm 23 days ago
yep
@jakobkristensen2390 1 month ago
I'm curious how you handle pages where the content exceeds the token window.
@devlearnllm 1 month ago
I'm sure Firecrawl or Jina would have a rolling context window for extraction. It's an easy thing to implement.
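A rough sketch of what a rolling window could look like, under some loud assumptions: this is not how Firecrawl or Jina do it, whitespace-separated words stand in for real tokens, and `rolling_windows`, `window_size`, and `overlap` are illustrative names. Each window would be sent to the LLM for extraction separately, and the overlap reduces the chance that a fact is cut in half at a boundary.

```python
def rolling_windows(text, window_size=200, overlap=20):
    """Split text into overlapping windows of at most window_size words."""
    if window_size <= 0 or overlap < 0 or overlap >= window_size:
        raise ValueError("need window_size > overlap >= 0")
    words = text.split()
    step = window_size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_size]))
        # Stop once this window has reached the end of the text.
        if start + window_size >= len(words):
            break
    return windows


page = " ".join(str(i) for i in range(10))
chunks = rolling_windows(page, window_size=4, overlap=1)
```

With a real model you would count tokens with the model's own tokenizer rather than words, and merge or deduplicate the per-window results afterwards.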
@vijishmadhavan6093 1 month ago
What happens if we use all 25000 cases? Will it work?
@devlearnllm 1 month ago
Most likely. Pinecone, Weaviate and pgvector are very performant.
@switch8291 1 month ago
You haven't updated us on how much scrapegraph-ai takes in comparison.
@devlearnllm 1 month ago
Ah shoot I forgot about that.
@antronx7 1 month ago
It would be cool to make an AI website scraper that strips away all the JavaScript bloat from a webpage and converts it into a lightweight basic HTML page while preserving functionality. It would be great as a proxy service to make modern web pages load fast on slow phones with poor data connections. The modern web is way too bloated. I sometimes manually archive a page by deleting all the JavaScript in Notepad++ and modifying the image embed links to point to locally saved .png files. That takes a long time, but I can reduce a 5MB page down to 200kB and save that. It would be nice to have a smart automated tool to do that in seconds.
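The manual step described above (deleting all the JavaScript) can be roughly automated with Python's standard library. This is only a sketch with no AI involved: it drops `<script>` elements and inline `on...` event-handler attributes, it also drops comments and the doctype, it does not rewrite image links, and it will break pages that need JavaScript to work. `ScriptStripper` and `strip_scripts` are made-up names.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emits HTML while skipping <script> elements and on* attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []
        self._in_script = False

    def _render_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, ... (inline handlers)
                continue
            if value is None:
                kept.append(f" {name}")
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            return
        self.parts.append(f"<{tag}{self._render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.parts.append(f"<{tag}{self._render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_scripts(html_text):
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


cleaned = strip_scripts('<p onclick="x()">Hi &amp; bye</p><script>alert(1)</script>')
```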
@Ckoraybingol 1 month ago
Great intro and workflow. Thanks a lot.
@devlearnllm 1 month ago
Much appreciated!
@MrPkmonster 1 month ago
Thank you so much for the presentation. Just in time with the latest scraping technology
@devlearnllm 1 month ago
You bet!
@IkerCasilliasrocks 1 month ago
Why not use ChatGPT and ask it to find the information? Can't GPT just search the web itself?
@manfredmichael_3ia097 1 month ago
I think there should be a microphone in the middle of the audience. You had an insightful discussion with them, amazing audience!
@devlearnllm 1 month ago
Great idea!
@thisiswill 1 month ago
The motion-tracking is a bit distracting.
@zaid6527 1 month ago
I don't know if my question is stupid, but can we take snapshots of a website and use OCR and LLMs to scrape the useful info instead of sending requests to that website? It would look more human and also use fewer requests.
@devlearnllm 1 month ago
Yeah you can probably do that!
@zaid6527 1 month ago
@devlearnllm thanks 🤝
@dhineshprabakaran1786 1 month ago
Hi, I'm trying to scrape web data from my org's docs, which are accessible only within a VPN. I get: Failed to goto 'docs url'. Can you help me with this?
@ThoughtfullySo 1 month ago
You should've tried Qdrant.
@artur50 1 month ago
Is it possible to run it with Ollama?
@devlearnllm 1 month ago
Most likely
@ArunKumar-bp5lo 1 month ago
great
@ofrylivney367 1 month ago
Nice workshop! I'll definitely try out the hybrid search. Do you reckon it'll work with nomic text embeddings and Ollama?
@devlearnllm 1 month ago
Most likely!
@zuowang5185 1 month ago
Is the OpenAI embedding v3 model better than BERT?
@devlearnllm 1 month ago
Hard to tell unless experiments are run. huggingface.co/spaces/mteb/leaderboard
@JamesRBentley 1 month ago
Nice video sir. I have already been experimenting with the colab - sincerest thanks
@devlearnllm 1 month ago
Great to hear!
@devlearnllm 1 month ago
Hey y'all, in case you didn't get good full-text search results like me, the CEO of Supabase (Paul Copplestone) sent me this to use instead: supabase.com/docs/guides/database/extensions/pgroonga
@florianhonicke5448 1 month ago
@LLMs for Devs. I'm from Jina AI. Cool that you are using our reader app. I like seeing the exact use cases people have for it - very interesting.
@devlearnllm 1 month ago
Big fan of Jina.
@jeffc1736 21 days ago
@devlearnllm hi times are tough. can I borrow 10000k? I need rent money and lost my job as a retail worker at Dicks sporting goods in dallas.
@gregmeldrum 1 month ago
Very informative! A great resource. Thanks for sharing your wealth of knowledge!!
@flor.7797 1 month ago
I just use Google 🙃
@ironbondar 1 month ago
very good workshop. straight to the point
@You.Got.Lucky_ 1 month ago
This video was really helpful for people like me looking for web scraping tools. Though I wonder if Jina AI is really free. Is there any challenge in using it for a larger number of links? Does it have a rate limit on hitting URLs with the prefix? Any clarification on this is appreciated. : )
@devlearnllm 1 month ago
No hard limits as far as I know. Free for now (I think this is intentional), but definitely will change in the future.
@danieldesenna7611 2 months ago
Great video!! Thanks for sharing the code! One question though: inside the A:tier code -> "print_ai_answer" function, you wrote:

    for like in extracted_personality["likes"]:
        text_to_embed = f"The user likes {like}"
        current_embeddings = embedding_client.embed_query(text_to_embed)
        dislike_with_metadata = {
            "id": str(uuid.uuid4()),
            "values": current_embeddings,
            "metadata": {"type": "likes", "content": like}
        }
        embeddings.append(dislike_with_metadata)

Was it not supposed to be something like "likes_with_metadata = {...}" and then "embeddings.append(likes_with_metadata)"? I guess repeating "dislike_with_metadata" does not make a difference to the code's functionality, but it made the code a bit confusing to follow for a moment. Thanks!
@devlearnllm 1 month ago
Good catch!
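For reference, here is how the loop could read with the rename suggested above. Behavior is unchanged; only the variable name differs. To keep the snippet runnable on its own, `FakeEmbeddingClient` is a hypothetical stand-in for the real `embedding_client` (it returns meaningless numbers), and wrapping the loop in `build_like_embeddings` is also an assumption made for the example, not the structure of the original Colab.

```python
import uuid


class FakeEmbeddingClient:
    """Hypothetical stand-in for the real embedding client (not real embeddings)."""

    def embed_query(self, text):
        return [float(len(text)), float(sum(map(ord, text)) % 97)]


def build_like_embeddings(extracted_personality, embedding_client):
    embeddings = []
    for like in extracted_personality["likes"]:
        text_to_embed = f"The user likes {like}"
        current_embeddings = embedding_client.embed_query(text_to_embed)
        # Renamed from dislike_with_metadata so the name matches the data.
        likes_with_metadata = {
            "id": str(uuid.uuid4()),
            "values": current_embeddings,
            "metadata": {"type": "likes", "content": like},
        }
        embeddings.append(likes_with_metadata)
    return embeddings


records = build_like_embeddings({"likes": ["tea", "jazz"]}, FakeEmbeddingClient())
```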
@nve-c5d 2 months ago
So what did you find out about scrapegraph-ai performance and tokens?
@chanliah1918 2 months ago
Great demo, thank you!
@devlearnllm 1 month ago
My pleasure!
@frasonfrancis9698 2 months ago
I don't know how effective this will be in the long run, especially given Cloudflare's security update to block AI web scraping agents.
@shuaiwang4092 2 months ago
Such valuable video content! Many thanks for sharing~~
@SonGoku-pc7jl 2 months ago
Thanks, but what's the difference, or which is better: Jina Reader or Scrapegraph-ai?
@theadaloguy 2 months ago
Great video, thanks. Is there a way to provide our own scraped data (so we can make sure we use a good stealth scraper and get all the content), and then have the LLM analyse it like this?
@devlearnllm 2 months ago
Yeah, you can always just build an LLM chain to extract the data. You can find the example in the Google Colab I provided.
@lomash_irl 2 months ago
I guess Selenium is still the choice for JavaScript-heavy websites... any tips on this?
@user-xh9uj1tf3l 2 months ago
Bro, I watched 4 minutes of ads before getting to the actual video.
@devlearnllm 2 months ago
That's crazy. Let me see if I can change that somehow
@ronaldokun 2 months ago
Thank you!!!!!!
@eyoo369 3 months ago
Jina is almost perfect.. too bad it's not smart enough to scrape content from "accordions" where you first have to click to make the content visible. I feel a smart AI scraper should be able to grab that text and determine from the CSS class that it's probably valuable text.. just hidden at the time
@devlearnllm 3 months ago
That's too bad. What's the alternative?
@PaulFidika 3 months ago
"The entire internet hates him for this one simple trick"
@devlearnllm 3 months ago
9/10 prompt engineers recommend this
@mrRambleGamble 3 months ago
The camera moves too much
@devlearnllm 3 months ago
it's the worst
@mrRambleGamble 3 months ago
@devlearnllm Aside from that, great video.
@SuperLano98 3 months ago
How is your final implementation? I'm really curious about it, because it makes sense to have this abstracted, but there are some differences between them that can make this process tricky.
@devlearnllm 3 months ago
Honestly handy, because I went from Pinecone -> pgvector, and with all the abstracted methods declared, I was confident in flipping, and nothing broke.
@SuperLano98 3 months ago
@devlearnllm Oh amazing, great news! I will try to start with this implementation pattern, thank you!
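To illustrate the kind of abstraction being discussed, here is a small sketch. It is an assumption about the general pattern, not the author's implementation: `VectorStore`, `InMemoryVectorStore`, `upsert`, `query`, and `recall` are hypothetical names, and a Pinecone or pgvector backend would be another class with the same two methods. Calling code depends only on the interface, which is what makes swapping backends low-risk.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface that each backend would implement."""

    def upsert(self, records: list) -> None: ...

    def query(self, vector: list, top_k: int) -> list: ...


class InMemoryVectorStore:
    """Toy backend used here so the example runs without any service."""

    def __init__(self):
        self._records = {}

    def upsert(self, records):
        for record in records:
            self._records[record["id"]] = record

    def query(self, vector, top_k):
        def score(record):
            values = record["values"]
            norm = math.hypot(*values) * math.hypot(*vector)
            if norm == 0.0:
                return 0.0
            return sum(a * b for a, b in zip(values, vector)) / norm

        ranked = sorted(self._records.values(), key=score, reverse=True)
        return ranked[:top_k]


def recall(store: VectorStore, query_vector, top_k=1):
    """Application code: knows only the interface, not the backend."""
    return [r["metadata"]["content"] for r in store.query(query_vector, top_k)]


store = InMemoryVectorStore()
store.upsert([
    {"id": "1", "values": [1.0, 0.0], "metadata": {"content": "hiking"}},
    {"id": "2", "values": [0.0, 1.0], "metadata": {"content": "pizza"}},
])
matches = recall(store, [0.9, 0.1])
```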
@JohnMcclaned 3 months ago
such an inefficient and unreliable way to scrape the web
@rwz 3 months ago
Please do not move the camera all the time
@haganlife 3 months ago
Definitely loosen up the tracking to center. OSBTail?
@devlearnllm 2 months ago
It's actually built into the DJI Pocket 3 camera. I've only had it for a few weeks. Just need to find the settings for it.
@forrest714 2 months ago
@devlearnllm change the follow speed to slow instead of fast.