
LangChain - Using Hugging Face Models locally (code walkthrough) 

Sam Witteveen
62K subscribers
103K views

Colab Code Notebook: drp.li/m1mbM
Load Hugging Face models locally so that you can use models that aren't available via the API endpoints. This video shows you how to use the endpoints, how to load the models locally (and access models that don't work in the endpoints), and how to load the embedding models locally.
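A minimal sketch of the hosted and embedding pieces mentioned above: calling a model through the Hugging Face Hub endpoint and loading an embedding model locally. Model names and keyword arguments here are illustrative assumptions, not the exact notebook code:

```python
import os
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFaceHub
from langchain.embeddings import HuggingFaceEmbeddings

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # your own Hub token

prompt = PromptTemplate(
    input_variables=["question"],
    template="Question: {question}\nAnswer:",
)

# 1) Hosted inference endpoint: no local GPU needed, but not every model works here
hub_llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 0.1, "max_length": 64},
)
print(LLMChain(prompt=prompt, llm=hub_llm).run("What is the capital of France?"))

# 2) Embedding model downloaded and run locally (uses sentence-transformers under the hood)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vector = embeddings.embed_query("How are you today?")
print(len(vector))
```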
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t...
github.com/samwit/llm-tutorials

Science

Published: 7 Mar 2023

Comments: 82
@insightbuilder
@insightbuilder 1 year ago
Keep up the great work. And thanks for curating the important HF models that we can use as alternatives to paid LLMs. When learning new tech, using the free LLMs can give the learner a lot of benefits.
@bandui4021
@bandui4021 1 year ago
Thank you! I am a newbie in this area and your vids are helping me a lot to get a better picture of the current landscape.
@prestigious5s23
@prestigious5s23 9 months ago
Great tutorial. I need to train a model on some private company documents that aren't publicly released yet and this looks like it could be a big help to me. Subbed!!
@tushaar9027
@tushaar9027 7 months ago
Great video Sam, I don't know how I missed this.
@luis96xd
@luis96xd 1 year ago
Amazing video, everything was well explained, I needed it, thank you so much!
@sakshikumar7679
@sakshikumar7679 15 days ago
Saved me from hours of debugging and research! Thanks a ton.
@steev3d
@steev3d 1 year ago
Nice video. I'm trying to connect an LLM and use Unity 3D as my interface for STT and TTS with 3D characters. I just found a tool that enables connecting to an LLM on Hugging Face, which is how I discovered that you need a paid endpoint with GPU support to even run most of them. I kinda wish I had found this video when you posted it. Very useful info.
@AdrienSales
@AdrienSales 1 year ago
Excellent tutorial, and so well explained. Thanks a lot.
@hnikenna
@hnikenna 1 year ago
Thanks for this video. You just earned a subscriber
@markomilenkovic2714
@markomilenkovic2714 9 months ago
If we cannot afford to get an A100, what's the cheaper option you would recommend to run these? I understand the models also differ in size. Thanks Sam.
@binitapriya4976
@binitapriya4976 8 months ago
Hi Sam, is there any way to generate questions and answers from a given text in a .txt file and save those questions and answers in another .txt file with the help of a free Hugging Face model?
@Chris-se3nc
@Chris-se3nc 1 year ago
Thanks for the video. Is there any way to get an example using the LangChain JavaScript library? I am new to this area, and I think many developers would have a Node rather than a Python background.
@azzeddine1
@azzeddine1 1 year ago
How can the ready-made projects on the platform be linked to Blogger blogs? I have spent long days searching, to no avail.
@DanielWeikert
@DanielWeikert 1 year ago
I tried to store what the YouTube downloader loads in FAISS using HuggingFace Embeddings, but the LLM was not able to do the similarity search. Colab eventually ran into a timeout. Can you share how to do this instead of using OpenAI? With OpenAI I had no issues, but I'd like to do it with HF models instead, e.g. Flan. br
@morespinach9832
@morespinach9832 4 months ago
This is helpful because in some industries like banking or telcos, it's impossible to use open source things. So we need to host.
@jzam5426
@jzam5426 9 months ago
Thanks for the content!! Is there a way to run a HuggingfacePipeline loaded model using M1/M2 processors on Mac? How would one set that up?
@luis96xd
@luis96xd 1 year ago
I have a problem: when I use low_cpu_mem_usage or load_in_8bit, I get an error that I need to install xformers. When I install xformers, I get an error that I need to install accelerate. When I install accelerate, I get an error that I need to install bitsandbytes, and so on: einops, accelerate, sentence_transformers, bitsandbytes. But finally I get the error *NameError: name 'init_empty_weights' is not defined*. I don't know how to solve this error or why it happens. Could you help me please?
@yves1893
@yves1893 1 year ago
I am using the Hugging Face model chavinlo/alpaca-native. However, when I use those embeddings with this model: pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=248, temperature=0.4, top_p=0.95, repetition_penalty=1.2); local_llm = HuggingFacePipeline(pipeline=pipe), my output is always only 1 word long. Can anyone explain this?
@venkatesanr9455
@venkatesanr9455 1 year ago
Thanks for the valuable series, it's highly informative. Can you provide some discussion on in-context learning (providing context/query), reasoning, and chain of thought?
@samwitteveenai
@samwitteveenai 1 year ago
Hi, glad it is helpful. I am thinking about doing some vids on Chain of Thought prompting, Self Consistency, and PAL, going through the basics of the papers and then looking at how they work in practice with an LLM. I will include the basics of in-context learning as well. Let me know if there are any others you think I should cover.
@anubhavsarkar1238
@anubhavsarkar1238 29 days ago
Hello. Can you please make a video on how to use the SeamlessM4T HuggingFace model with langchain ? Particularly for text to text translation. I am trying to do some prompt engineering with the model using Langchain's LLMChain module. But it does not seem to work ...
@brianrowe1152
@brianrowe1152 1 year ago
Stupid question, so I'll take a link to another video/docs/anything. Which Python version, CUDA version, and PyTorch are the best to use for this work? I see many using Python 3.9 or 3.10.6 specifically. The PyTorch site recommends 3.6/3.7/3.8 on the install page. Then the CUDA version 11.7 or 11.8; it looks like 11.8 is experimental? Then when I look at my nvcc output it says 11.5, but my nvidia-smi says CUDA Version 12.0... head explodes... I'm on Ubuntu 22.04. I will google some more, but if someone knows the ideal setup, or at least the "it works" setup, I'd appreciate it! Thank you.
@atharvaparanjape9585
@atharvaparanjape9585 1 month ago
How can I load the model again later, once I have downloaded it to the local drive?
@surajnarayanakaimal
@surajnarayanakaimal 1 year ago
Thank you for the awesome content. It would be very helpful if you made a tutorial on how to use a custom model with LangChain and embed documents with it. I want to train on some documentation; currently we can use OpenAI or other service APIs, but consuming their APIs is very costly, so can you teach how to do that locally? Please consider training on a site's custom documentation so it can answer from the documentation, be more context aware, and also remember history. Currently we depend on OpenAI APIs for that, so if it's achievable using a local model it would be very helpful.
@stonez56
@stonez56 2 months ago
Please make a video on how to convert Safetensors to GGUF format, or a format that can be used with Ollama? Thanks for these great AI videos!
@megz_2387
@megz_2387 11 months ago
How do I fine-tune this model so that it can follow instructions on the data provided?
@magnanimist
@magnanimist 1 year ago
Just curious, do you need to re-download the model every time you run scripts like these? Is there a way to save the model and use it after it's been downloaded?
@samwitteveenai
@samwitteveenai 1 year ago
If you are doing this on a local machine the model will be there, and Hugging Face should cache it locally. You can also do model.save_pretrained('model_name').
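A minimal sketch of what that looks like with the standard transformers calls (the model name and directory path are just placeholders): save the model and tokenizer once, then reload them from disk instead of the Hub.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-small"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Save both to a local directory...
model.save_pretrained("./flan-t5-small-local")
tokenizer.save_pretrained("./flan-t5-small-local")

# ...and later load from that directory instead of downloading again.
model = AutoModelForSeq2SeqLM.from_pretrained("./flan-t5-small-local")
tokenizer = AutoTokenizer.from_pretrained("./flan-t5-small-local")
```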
@MohamedNihal-rq6cz
@MohamedNihal-rq6cz 1 year ago
Hi Sam, how do you feed in your personal documents and query them, returning the response in a generative question-answering format rather than extractive question answering? I am a bit new to this library and I don't want to use OpenAI API keys. Please provide some guidance on using it with open-source LLM models, thanks in advance!
@samwitteveenai
@samwitteveenai 1 year ago
that would require fine tuning the model, if you want to put the facts in there. That is probably not the best way to go though.
@srimantht8302
@srimantht8302 1 year ago
Awesome video! Was wondering how I could use Langchain with a custom model running on sagemaker? Is that possible?
@samwitteveenai
@samwitteveenai 1 year ago
yeah that should be possible in a similar way.
@halgurgenci4834
@halgurgenci4834 1 year ago
These are great videos Sam. I am using a Mac M1. Therefore, it is impossible to run any model locally. I understand this is because PyTorch has not caught up with M1 yet.
@samwitteveenai
@samwitteveenai 1 year ago
Actually I think that's wrong. I use an M1 and M2 as well, but I run models in the cloud. I might try to get them to run on my M2 and make a video if it works.
@botondvasvari5758
@botondvasvari5758 1 month ago
And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, some of them 130 GB+. Any thoughts?
@samwitteveenai
@samwitteveenai 1 month ago
You need a machine with multiple GPUs.
@younginnovatorscenterofint8986
Hello Sam, how do you solve "Token indices sequence length is longer than the specified maximum sequence length for this model (2842 > 512). Running this sequence through the model will result in indexing errors"? Thank you in advance.
@samwitteveenai
@samwitteveenai 1 year ago
This is a limitation of the model, not LangChain. There are some models on HF that support 2048 tokens.
@alexandremarhic5526
@alexandremarhic5526 1 year ago
Thanks for the work. Just to let you know, the Loire Valley is in the north of France ;)
@samwitteveenai
@samwitteveenai 1 year ago
Good for wine ? :D
@alexandremarhic5526
@alexandremarhic5526 1 year ago
@@samwitteveenai Depends on your taste. If you love sweet wine, the south is better, especially for white wine like "Jurançon".
@evanshlom1
@evanshlom1 10 months ago
U a legend
@induu954
@induu954 1 year ago
Hi, I would like to know: can we chain 2 models, like a classification model and a pretrained model, using LangChain?
@samwitteveenai
@samwitteveenai 1 year ago
You could do it through a tool. Not sure there is anything built into LangChain for the classification models, if you mean something like a BERT etc.
@daryladhityahenry
@daryladhityahenry 1 year ago
Hi! Did you find a way to load the Vicuna GPTQ version using this? I tried your video with GPT-Neo 125M and it's working, but not Vicuna GPTQ. Thank you!
@human_agi
@human_agi 1 year ago
What kind of Colab do you need? Because I am using the $10 version with high RAM and the GPU on, and still cannot run it: ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run `.from_pretrained` with `device_map='auto'`
@samwitteveenai
@samwitteveenai 1 year ago
If you don't have access to the bigger GPU then go with a smaller T5 model etc.
@rudy.d
@rudy.d 1 year ago
I think you just need to add the argument device_map='auto' to the same list of arguments of your model's "*LM.from_pretrained(xxxx)" call, where you have "load_in_8bit=True".
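A minimal sketch of that fix, assuming `accelerate` and `bitsandbytes` are installed (the model name is only an example, not necessarily the one from the video):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-neo-2.7B"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # lets accelerate place layers across the available GPU(s)/CPU
    load_in_8bit=True,   # 8-bit weights via bitsandbytes to reduce GPU memory use
)
```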
@computadorhumano949
@computadorhumano949 1 year ago
Hey, why does it take so long to respond? Does my CPU need to be fast for this?
@samwitteveenai
@samwitteveenai 1 year ago
yeah for the local stuff you really need a GPU rather than a CPU
@hiramcoriarodriguez1252
@hiramcoriarodriguez1252 1 year ago
I'm a Transformers user and I still don't get the point of learning this new library. Is it just for very specific use cases?
@samwitteveenai
@samwitteveenai 1 year ago
Think of it as an abstraction layer for prompting and for managing the user interactions with your LLM. It is not an LLM in itself.
@hiramcoriarodriguez1252
@hiramcoriarodriguez1252 1 year ago
@@samwitteveenai I know it's not an LLM; the biggest problem that I see is learning a new library that wraps the OpenAI and HuggingFace libraries just to save 3 or 5 lines of code. I will follow your work, maybe that will change my mind.
@insightbuilder
@insightbuilder 1 year ago
Consider Transformers as the first layer of abstraction over the neural nets which make up the LLMs. In order to interface with LLMs, we can use many libraries, including HF. The HF Hub / LangChain will be the 2nd layer. The USP of LangChain is the ecosystem that is built around it, especially the Agents and Utility Chains. This ecosystem lets the LLMs be connected with the outside world... The devs at LC have done a great job. Do learn it, and share these absolutely brilliant vids with your friends/team members etc.
@samwitteveenai
@samwitteveenai 1 year ago
great way of describing it @Kamalraj M M
@neilzedd8777
@neilzedd8777 1 year ago
​@@insightbuilder beyond impressed with how healthy their documentation is. Working on a flan-ul2 + lc app right now, very fun times.
@fintech1378
@fintech1378 8 months ago
How do I make a Telegram chatbot with this?
@DarrenTarmey
@DarrenTarmey 1 year ago
It would be nice to have someone do a review for noobies, as there is so much to learn and it's hard to know where to start.
@samwitteveenai
@samwitteveenai 1 year ago
what exactly would you like me to cover? Any questions I am happy to make more vids etc.
@SomuNayakVlogs
@SomuNayakVlogs 1 year ago
Can you create one for CSV as input?
@samwitteveenai
@samwitteveenai 1 year ago
I made another video on using CSVs with LangChain, check that out.
@SomuNayakVlogs
@SomuNayakVlogs 1 year ago
@@samwitteveenai Thanks Sam, I already watched that video, but that is with OpenAI; I wanted LangChain with CSV and Hugging Face.
@SomuNayakVlogs
@SomuNayakVlogs 1 year ago
Can you please help me with that?
@Marvaniamehul
@Marvaniamehul 1 year ago
I am also curious whether we can use the Hugging Face pipeline (run locally) and LangChain to load a CSV file.
@SD-rg5mj
@SD-rg5mj 1 year ago
Hello, and thank you very much for this video. On the other hand, the problem is that I am not sure I understood everything; I speak English badly, I am French.
@samwitteveenai
@samwitteveenai 1 year ago
Did you try the French subtitles? I upload English subtitles, so I hope YouTube does a decent job translating them. Also feel free to ask any questions if you are not sure.
@KittenisKitten
@KittenisKitten 1 year ago
Would be useful if you explained what program you're using, or what page you're looking at; seems like a waste of time if you don't know anything about the programs or what you're doing. 1/5
@samwitteveenai
@samwitteveenai 1 year ago
The Colab is linked in the description, it's all there to use.
@XiOh
@XiOh 9 months ago
u are not doing it locally in this video.....
@samwitteveenai
@samwitteveenai 9 months ago
The LLMs are running locally on the machine where the code is running. The first bit shows pinging the API as a comparison.
@mrsupremegascon
@mrsupremegascon 9 months ago
Ok, great tutorial, but as a Frenchman from Bordeaux, I am deeply disappointed by Google's answer about the best area to grow wine. The Loire Valley? Seriously? Name one great wine coming from the Loire, Google, I dare you. They are in the B league at best. The answer is obviously Bordeaux; I would maybe have accepted Agen (wrong) or even Bourg*gne (very very wrong). But the Loire? It's outrageous, and this answer made me certain that I will never use this cursed model.
@samwitteveenai
@samwitteveenai 9 months ago
lol well at least you live in a very nice area of the world.
@nemtii_
@nemtii_ 1 year ago
What always happens with this setup (LangChain + HuggingFaceHub) is that it only adds about 80 characters on each call. Anyone else having this problem? I tried max_length: 400 and still the same issue.
@nemtii_
@nemtii_ 1 year ago
It's not specific to LangChain; I used the client directly and am still getting the same issue.
@samwitteveenai
@samwitteveenai 1 year ago
I think this could be an issue with their API. Perhaps on the Pro/paid version they allow more? I am not sure; to be honest I don't use their API, I tend to load the models etc. You could also try the max_new_tokens setting rather than max_length, that could help.
@nemtii_
@nemtii_ 1 year ago
@@samwitteveenai wow! thank youuu!! worked with max_new_tokens
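For anyone hitting the same roughly 80-character cap, a minimal sketch of the fix discussed above: pass `max_new_tokens` instead of `max_length` in `model_kwargs` (the repo_id and values are just examples).

```python
from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 0.5, "max_new_tokens": 400},  # limits only the generated text, not prompt + output
)
print(llm("Write a short paragraph about the Loire Valley."))
```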
@nemtii_
@nemtii_ 1 year ago
@@samwitteveenai I wish someone would do a list mapping which model sizes run on the free Google Colab versus the paid Colab, to see whether it's worth paying and what you can experiment with within that tier. I'm kinda lost in that sense, at a stage where I just want to evaluate models myself and look at a production env later.
@samwitteveenai
@samwitteveenai 1 year ago
This would be good I agree
@ELECOEST
@ELECOEST 6 months ago
Hello, thanks for your video. For now it's: llm_chain = LLMChain(prompt=prompt, llm=HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature":0.9, "max_length":64})). The temperature must be >0 and the model is flan-t5-xxl.
@litttlemooncream5049
@litttlemooncream5049 4 months ago
Thanks! Helped a lot! But I'm stuck at loading the model... it says google/flan-t5-xl is too large to be loaded automatically (11GB > 10GB)... qaq
@samwitteveenai
@samwitteveenai 4 months ago
Try a smaller model if your GPU isn't big enough, google/flan-t5-small or something like that.
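A minimal sketch of that suggestion, running a small model fully locally and wrapping it for LangChain (arguments are illustrative, not the exact notebook code):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain.llms import HuggingFacePipeline

model_id = "google/flan-t5-small"  # small enough for a free Colab or a laptop
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Build a standard transformers pipeline, then wrap it as a LangChain LLM
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100,
)
local_llm = HuggingFacePipeline(pipeline=pipe)
print(local_llm("What is the capital of France?"))
```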