Thank you for these fantastic videos. The generate_rules function needs to be updated for spaCy 3: comment out the lines ruler = EntityRuler(nlp) and nlp.add_pipe(ruler) and replace them with this single line: ruler = nlp.add_pipe("entity_ruler")
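For anyone following along, a minimal sketch of the v3 change described above (the pattern and sentence are placeholder examples, not from the tutorial):

```python
from spacy.lang.en import English

nlp = English()

# spaCy 2: ruler = EntityRuler(nlp); nlp.add_pipe(ruler)
# spaCy 3: pass the registered factory name instead of a component object
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Harry Potter"}])

doc = nlp("Harry Potter raised his wand.")
print([(ent.text, ent.label_) for ent in doc.ents])
# → [('Harry Potter', 'PERSON')]
```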
This line: item = item.replace("The", "").replace("the", "").replace("and", "").replace("And", "") will affect names like "Andrew" or "Theodore" as well, since it removes matching substrings inside words. It might be better to include the space after "The" or "And" in the replace pattern.
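A quick illustration of the substring problem (the name is made up):

```python
name = "Andrew Theodore"

# Substring replace eats the inside of names: "And" and "The" both match
bad = name.replace("The", "").replace("the", "").replace("and", "").replace("And", "")
print(bad)  # → "rew odore"

# Including the trailing space only strips the standalone words
good = name.replace("The ", "").replace("the ", "").replace("and ", "").replace("And ", "")
print(good)  # → "Andrew Theodore"

print("The Boy Who Lived".replace("The ", ""))  # → "Boy Who Lived"
```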
In the test_model function, why does doc = nlp(text) not give an error when nlp is not passed as a parameter? And what is the point of passing model to that same function?
In line 80, nlp.add_pipe(ruler) gave me an error, so I changed the whole function to: def generate_rules(patterns): nlp = English() source_nlp = spacy.load("en_core_web_sm") nlp.add_pipe("ner", source=source_nlp) ruler = EntityRuler(nlp) ruler.add_patterns(patterns) nlp.to_disk("hp_ner")
but this just loads the normal NER with its default categories, right? I just tried it and I'm not sure it's working as intended. Thanks so much anyway; I've already spent so much time trying to follow this and get it to work.
Based on the update video, the function should look like this: def generate_rules(patterns): nlp = English() ruler = nlp.add_pipe("entity_ruler") ruler.add_patterns(patterns) # type: ignore (because VS Code somehow can't find this function, although it works) nlp.to_disk("hp_ner") Link to the update video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-16Ujcah_-h0.html
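For anyone who wants to sanity-check the v3 version of generate_rules, here is a sketch that writes to a temporary directory instead of "hp_ner" and reloads the saved pipeline (the pattern and sentence are placeholders):

```python
import tempfile

import spacy
from spacy.lang.en import English

def generate_rules(patterns, out_dir):
    nlp = English()
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    nlp.to_disk(out_dir)

with tempfile.TemporaryDirectory() as out_dir:
    generate_rules([{"label": "PERSON", "pattern": "Hermione Granger"}], out_dir)
    # The ruler's patterns survive the save/load round trip
    nlp = spacy.load(out_dir)
    doc = nlp("Hermione Granger opened the book.")
    print([(ent.text, ent.label_) for ent in doc.ents])
    # → [('Hermione Granger', 'PERSON')]
```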
Thank you so much for this great tutorial. I want to ask why you used the nlp model instead of the created model's name in the testing function.
While browsing through your playlist videos I noticed that the 6 videos in Neural Networks for Digital Humanities (DH) and Machine Learning for Digital Humanities (DH) are the same. Maybe the whole playlist is duplicated.
Good question. I may need to explain that a bit more clearly. Sorry about that. I gathered the names from Wikipedia using BeautifulSoup and Requests (en.wikipedia.org/wiki/List_of_Harry_Potter_characters). That was the original knowledge base.
You absolutely could. I would recommend using pandas to import the data. If you do not know how to use pandas, I have a playlist and book on it: Pandas.pythonhumanities.com
@@python-programming Thanks for replying. One more thing: in one of the videos, i.e. Train a spaCy NER model, you used hp_training_data.json. Could you provide that JSON file? I'm getting an error such as: ValueError: not enough values to unpack (expected 2).
You absolutely can! BERT is a trade-off: it is expensive to train, but it has higher accuracy metrics. In spaCy 3, you can train BERT NER models. This series will be getting to that soon.
Thanks for the brilliant tutorial! I would like to add patterns multiple times. However, if I only use the ruler.add_patterns function it returns "name 'ruler' is not defined", and if I execute ruler = nlp.add_pipe("entity_ruler") it returns "entity_ruler already exists in pipeline".
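One way around the "already exists in pipeline" error is to add the component once and then fetch the existing one with get_pipe whenever you need to add more patterns (a sketch; the names are placeholders):

```python
from spacy.lang.en import English

nlp = English()
nlp.add_pipe("entity_ruler")

# Later: fetch the existing component instead of adding it a second time
ruler = nlp.get_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Ron Weasley"}])
ruler.add_patterns([{"label": "PERSON", "pattern": "Ginny Weasley"}])

doc = nlp("Ron Weasley and Ginny Weasley left.")
print([ent.text for ent in doc.ents])  # → ['Ron Weasley', 'Ginny Weasley']
```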
Thank you for these videos. I'm following along using my own data, from which I want to retrieve objects or nouns instead of names, but I noticed that this method is case-sensitive. So I tried converting both my text and my JSON list to upper case and got wrong results (when my JSON list had matching-case letters, the results were perfect). How can we make spaCy case-insensitive?
Thanks for the comment! I am glad you are finding these videos useful. A standard way to do this is data augmentation: create an entity ruler that has upper-case, lower-case, capitalized, and non-capitalized words. Or, and this is less computationally expensive, use only lowercase and make sure to lower your texts before running the entity ruler over them. Or you can use a pattern matcher in spaCy that will do that automatically if you pass a pattern with the LOWER attribute. If that does not help, let me know and I will give a better response when I get to a computer on Tuesday.
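A sketch of the LOWER-attribute approach using the EntityRuler's token patterns (the same attribute works in spaCy's Matcher; the name is a placeholder):

```python
from spacy.lang.en import English

nlp = English()
ruler = nlp.add_pipe("entity_ruler")

# A token pattern on LOWER matches regardless of casing in the text
ruler.add_patterns([{"label": "PERSON",
                     "pattern": [{"LOWER": "harry"}, {"LOWER": "potter"}]}])

for text in ("Harry Potter", "HARRY POTTER", "harry potter"):
    doc = nlp(text)
    print(text, "->", [(ent.text, ent.label_) for ent in doc.ents])
```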
@@python-programming Great, thank you for clarifying. I just tried with lowercase and the same thing happened. Then I realized I was doing it wrong. I was doing it like: for item in data: new = item.upper() This was wrong. I tried with: for i in range(len(data)): data[i] = data[i].upper() And it worked! Thank you for making me try it another way. I also tried it with lowercase, and I did not see much of an issue with the performance. Those workflows are pretty interesting; I'd love to see them in action! hehehe
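The difference the commenter ran into, as a minimal example: rebinding the loop variable never touches the list, while assigning by index (or rebuilding the list) does.

```python
data = ["Harry", "Ron"]

# Rebinding a name inside the loop leaves the list unchanged
for item in data:
    new = item.upper()
print(data)  # → ['Harry', 'Ron']

# Assigning back by index, or using a list comprehension, actually updates it
data = [item.upper() for item in data]
print(data)  # → ['HARRY', 'RON']
```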
For anyone trying this with v3: based on the update video, the function should look like this: def generate_rules(patterns): nlp = English() ruler = nlp.add_pipe("entity_ruler") ruler.add_patterns(patterns) # type: ignore (because VS Code somehow can't find this function, although it works) nlp.to_disk("hp_ner") Link to the update video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-16Ujcah_-h0.html
Thanks for the videos. I am pretty new to spaCy. Trying to run the code from your textbook, I encountered this error: ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None'). - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead. - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`. - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline. Then, updating nlp.create_pipe("entity_ruler") to nlp.add_pipe("textcat") as the error suggested, I get this new error: AttributeError: 'TextCategorizer' object has no attribute 'add_patterns' Do you have any idea what could be wrong?
These videos are for spaCy 2.0. They upgraded to 3.0 earlier this month. I am working on updating the notebook and videos. If you pip install spaCy 2.0, the code will work.
This actually worked for me (found here: spacy.io/api/entityruler ) : entity_ruler = nlp.add_pipe("entity_ruler") entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
@@egomalego Thanks for sharing this! I was just about to link to my new spaCy 3.0 text classification training that introduces viewers to the new config system.
Hi, a really nice way to explain every single thing; I simply loved it. When I try to implement the generate_rules function, I get an error: "ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None'). - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead. - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`. - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline."
I have a question: can the same custom-trained model work on multiple languages if my dataset has words from different Latin languages using the same tags?
Good question. Yes and no. It will work, but it needs to have encountered the other languages in training to learn context. You could use an entity ruler, which would be language-agnostic.
I am responding from my phone; I can give a better response on Monday. But essentially, you open your saved model, use the get_pipe method to grab the entity ruler, and add patterns to it the same way you did initially. You then save the new model, and you should have those patterns saved. Alternatively, you can do it by opening the JSONL file that contains the patterns and placing the new patterns on new lines.
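A sketch of that round trip, with a temporary directory standing in for the saved model's path and made-up placeholder patterns:

```python
import tempfile

import spacy
from spacy.lang.en import English

with tempfile.TemporaryDirectory() as model_dir:
    # Stand-in for the model saved earlier in the thread
    nlp = English()
    nlp.add_pipe("entity_ruler").add_patterns(
        [{"label": "PERSON", "pattern": "Harry Potter"}])
    nlp.to_disk(model_dir)

    # Reopen the saved model, grab the ruler, add more patterns, and resave
    nlp = spacy.load(model_dir)
    ruler = nlp.get_pipe("entity_ruler")
    ruler.add_patterns([{"label": "PERSON", "pattern": "Luna Lovegood"}])
    nlp.to_disk(model_dir)

    # The reloaded model now knows both the old and the new patterns
    nlp = spacy.load(model_dir)
    doc = nlp("Harry Potter met Luna Lovegood.")
    print([ent.text for ent in doc.ents])
    # → ['Harry Potter', 'Luna Lovegood']
```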