Best way to do Named Entity Recognition in 2024 with GliNER and spaCy - Zero Shot NER

Python Tutorials for Digital Humanities

Подписаться 28 тыс.

Просмотров 7 тыс.

50% 1

Видео Поделиться Скачать Добавить в

Опубликовано:

3 окт 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 49

@urchadezaratiana6781 6 месяцев назад

gliner-spacy is awesome 💯

@python-programming 6 месяцев назад

Thanks so much! The same can be said of GliNER!! 😀

@VenkatesanVenkat-fd4hg 6 месяцев назад

@@python-programmingWhether NER cannot be achieved using prompt Engg + LLM...? Can you educate on this..

@critical-chris 3 месяца назад

Regarding your example with Auschwitz: how exactly did it learn that Auschwitz belongs to concentration_camp type? Is it because your example sentence happened to say exactly that or is that just a coincidence?

@sid40000 3 месяца назад

Great solution! Thank you!

@lfmtube 5 месяцев назад

Best video as always. Thanks!

@null4624 12 дней назад

From all examples you could pick, you come up with this .. ?

@LexPodgorny 5 месяцев назад

Hi, great video! You've mentioned "... if you don't have training data". I am assuming that you mean that annotated data is not required, and instead the model relies on unsupervised approach. If this is correct, than for specialized texts it must rely on embedding training? Thanks!

@JoseSanchez-xz5wt 6 месяцев назад

Really cool! Can you make a video on how to further train the LatinCy model? I have a ton of additions to the lemma fixer custom component and I've noticed a few recurring patterns I want to fix generally

@python-programming 6 месяцев назад

I can but it may be better to retrain from scratch. In these instances you can experience catastrophic forgetting. If you want to train from scratch, you could modify the original training data or add to it with your own. That said, if you simply need to adjust a component, that is entirely different. Would you mind explaining a bit more about what you want to do?

@JoseSanchez-xz5wt 6 месяцев назад

Let me start by saying I'm new to this! I put together a Latin corpus of texts and I'm counting lemma frequencies. But I noticed that some verb forms are consistently off, like almost all pluperfect forms (like counting uiderat as the lemma instead of uideo). Instead of having to add a correction to the component for each verb, I wanted to see if there was a way to train the model to make it better at recognizing the lemma of verbs in pluperfect forms. Thanks for responding!@@python-programming

@python-programming 6 месяцев назад

@@JoseSanchez-xz5wt ahh I see! In that case, I would reach out to Patrick directly: twitter.com/diyclassics?lang=en --- he's on Twitter as diyclassics and if you look him up on Google you can find his email as well. I don't want to put it here and have him get spam messages.

@julkul99 5 месяцев назад

this is very cool! Whats the benefit of using gliner-spacy over just using gliner by itsself?

@adilgun2775 6 месяцев назад

Thanks for the video. One question, Is it possible to make a few shot in addition to zero shot with GliNER (without finetuning)

@python-programming 6 месяцев назад

No problem! Great question. I have not seen an example of this. Sorry! There are already a few examples of few-shot spaCy libraries. Concicse concepts is one.

@adilgun2775 6 месяцев назад

Thanks @@python-programming

@daviddeisadze7037 6 месяцев назад

Great video! What would you do to extract hard skills and soft skills from a resume and job description? I am thinking entity rulers from spacy and match it but I was wondering what you were thinking. Thanks!

@python-programming 6 месяцев назад

Thanks! If you have a controlled vocab for these things then maybe a rules pipeline would work, but an ML model would likely be better since it wouls find things that are not in your list. You could also have a combination of both.

@emmanuelteitelbaum 2 месяца назад

Hi thanks for the informative video! Let's say, like in your book, you had a list of concentration camps that you wanted to feed to the model to improve its accuracy. How would do that? Or would you not do it and just use a more conventional spaCy pipeline?

@ifrasaifi1124 6 месяцев назад

Great explanation! Can we use gliner to extract medicinal plants scientific name and their medicinal effects?

@python-programming 6 месяцев назад

Thanks so much! Like most ML things, the best thing to do is try it out. Change the lagels to those exact label names and run it over a text. If you want to extract label names with spaCy, though I created bio spaCy that does precisely this.

@ifrasaifi1124 6 месяцев назад

@@python-programmingThank you so much, can you please share link to bio spacy? Additionally thank you so much for such amazing videos, they are really helpful!

@python-programming 6 месяцев назад

@@ifrasaifi1124 Sure! Here it is: github.com/wjbmattingly/biospacy --- you are very welcome! I'm glad to hear you are finding my content helpful!

@ifrasaifi1124 6 месяцев назад

@@python-programming Thank you, can I also use it for relation extraction for example plant name linked to its medicinal effect?

@python-programming 6 месяцев назад

@@ifrasaifi1124 That would be a separate component that does not yet exist. You want to look into entity linking and connecting the plant to a wiki_id and then connect that to a database of medicinal effects.

@CMAZZONI 2 месяца назад

Great video. The only problem is that gliner is not easy to implement in production such as in a remote server or a huggingface endpoint. Has anyone able to make this work?

@python-programming 2 месяца назад

Thanks! Not sure if you were the one who left this comment on GitHub, but I just responded to an issue there. I'm curious if a good solution would be to upload a spaCy pipelien with gliner-spacy to HuggingFace. This would make it a standard spaCy token classification pipeline and then allow the HuggingFace endpoint to work. I haven't tested this, so I'd be curious to learn if it works! You will likely need to drop the gliner-spacy component into the repository.

@Abishek_B 2 месяца назад

Can we use those as a backend model for flutter app?

@NavyaVedachala 3 месяца назад

Are there any resources for finetuning GLiNER? The repo for GLiNER is giving me bugs when I attempt to finetune

@python-programming 3 месяца назад

I built a library called gliner-finetune github.com/wjbmattingly/gliner-finetune it may help

@SuiGio Месяц назад

How are you fine tuning such a model? What extra you need it to do? Thanks

@dorellazara2010 6 месяцев назад

It’s amazing

@python-programming 6 месяцев назад

Thanks! Glad you like it! Curious how well it works for you.

@Noelh86579 4 месяца назад

does it work on live streaming call?

@python-programming 4 месяца назад

It will, but the latency may be an issue. I'm not sure of anything that can do real-time NER the way in which you can get transcriptions in near real-time.

@facundozupel4166 4 месяца назад

im starting to use Space for entity extraction from the content of my competitors on the serp for a keyword (I work on SEO) but the entities that extracts are very very weird, leaving behind some more important (I use it in Spanish). Gliner might help?

@VenkatesanVenkat-fd4hg 6 месяцев назад

Whether NER cannot be achieved using prompt Engg + LLM...? Can you educate on this..

@akshaysrivastava4304 6 месяцев назад

yes it can be used

@WalkAloneLive 5 месяцев назад

I had better results on GLiNER then on OpenAI 3.5 on zero-shot. A lot of False Positive. But at least we have what to filter later, and good is that it works very fast on low CPU needs. Still waiting for few-shot learning example, sure it will help a lot. Anyone tested domain-knowledge way of doing staff?

@aaroldaaroldson708 6 месяцев назад

is there anything like this, but for text classification? e.g.: I have a list of labels (topics) and a list of texts. And it has to tell me what topics are mentioned in which text