Thank you for these fantastic videos. The generate_rules function needs to be updated for spaCy 3: comment out the lines ruler = EntityRuler(nlp) and nlp.add_pipe(ruler) and replace them with this single line: ruler = nlp.add_pipe("entity_ruler")
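For anyone following along, a minimal sketch of the v3 change described above (the pattern and sentence are placeholder examples, not from the tutorial):

```python
from spacy.lang.en import English

nlp = English()

# spaCy 2: ruler = EntityRuler(nlp); nlp.add_pipe(ruler)
# spaCy 3: pass the registered factory name instead of a component object
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Harry Potter"}])

doc = nlp("Harry Potter raised his wand.")
print([(ent.text, ent.label_) for ent in doc.ents])
# → [('Harry Potter', 'PERSON')]
```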
This line: item = item.replace("The", "").replace("the", "").replace("and", "").replace("And", "") will affect names like "Andrew" or "Theodore" as well, since it removes matching substrings inside words. It might be better to include the space after "The" or "And" in the replace pattern.
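A quick illustration of the substring problem (the name is made up):

```python
name = "Andrew Theodore"

# Substring replace eats the inside of names: "And" and "The" both match
bad = name.replace("The", "").replace("the", "").replace("and", "").replace("And", "")
print(bad)  # → "rew odore"

# Including the trailing space only strips the standalone words
good = name.replace("The ", "").replace("the ", "").replace("and ", "").replace("And ", "")
print(good)  # → "Andrew Theodore"

print("The Boy Who Lived".replace("The ", ""))  # → "Boy Who Lived"
```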
In the test_model function, why does doc = nlp(text) not give an error when nlp is not passed as a parameter? And what is the point of passing model to that same function?
In line 80, nlp.add_pipe(ruler) gave me an error, so I changed the whole function to: def generate_rules(patterns): nlp = English() source_nlp = spacy.load("en_core_web_sm") nlp.add_pipe("ner", source=source_nlp) ruler = EntityRuler(nlp) ruler.add_patterns(patterns) nlp.to_disk("hp_ner")
but this just loads the normal NER with its default categories, right? I just tried it and I'm not sure it's working as intended. Thanks so much anyway; I've already spent so much time trying to follow this and get it to work.
Based on the update video, the function should look like this: def generate_rules(patterns): nlp = English() ruler = nlp.add_pipe("entity_ruler") ruler.add_patterns(patterns) # type: ignore (because VS Code somehow can't find this function, although it works) nlp.to_disk("hp_ner") Link to the update video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-16Ujcah_-h0.html
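For anyone who wants to sanity-check the v3 version of generate_rules, here is a sketch that writes to a temporary directory instead of "hp_ner" and reloads the saved pipeline (the pattern and sentence are placeholders):

```python
import tempfile

import spacy
from spacy.lang.en import English

def generate_rules(patterns, out_dir):
    nlp = English()
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    nlp.to_disk(out_dir)

with tempfile.TemporaryDirectory() as out_dir:
    generate_rules([{"label": "PERSON", "pattern": "Hermione Granger"}], out_dir)
    # The ruler's patterns survive the save/load round trip
    nlp = spacy.load(out_dir)
    doc = nlp("Hermione Granger opened the book.")
    print([(ent.text, ent.label_) for ent in doc.ents])
    # → [('Hermione Granger', 'PERSON')]
```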
Thank you so much for this great tutorial. I want to ask why you used the nlp model instead of the created model's name in the testing function.
While browsing through your playlist videos I noticed that the 6 videos in Neural Networks for Digital Humanities (DH) and Machine Learning for Digital Humanities (DH) are the same. Maybe the whole playlist is duplicated.
Good question. I may need to explain that a bit more clearly. Sorry about that. I gathered the names from Wikipedia using BeautifulSoup and Requests (en.wikipedia.org/wiki/List_of_Harry_Potter_characters). That was the original knowledge base.
You absolutely could. I would recommend using pandas to import the data. If you do not know how to use pandas, I have a playlist and book on it: Pandas.pythonhumanities.com
@@python-programming Thanks for replying. One more thing: in one of the videos, i.e. Train a spaCy NER model, you used hp_training_data.json. Could you provide that JSON file? I'm getting an error such as: ValueError: not enough values to unpack (expected 2).
You absolutely can! BERT is a trade-off: it is expensive to train, but it has higher accuracy metrics. In spaCy 3, you can train BERT NER models. This series will be getting to that soon.
Thanks for the brilliant tutorial! I would like to add patterns multiple times. However, if I only use the ruler.add_patterns function it returns "name 'ruler' is not defined", and if I execute ruler = nlp.add_pipe("entity_ruler") it returns "entity_ruler already exists in pipeline".
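One way around the "already exists in pipeline" error is to add the component once and then fetch the existing one with get_pipe whenever you need to add more patterns (a sketch; the names are placeholders):

```python
from spacy.lang.en import English

nlp = English()
nlp.add_pipe("entity_ruler")

# Later: fetch the existing component instead of adding it a second time
ruler = nlp.get_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Ron Weasley"}])
ruler.add_patterns([{"label": "PERSON", "pattern": "Ginny Weasley"}])

doc = nlp("Ron Weasley and Ginny Weasley left.")
print([ent.text for ent in doc.ents])  # → ['Ron Weasley', 'Ginny Weasley']
```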
Thank you for these videos. I'm following along using my own data, from which I want to retrieve objects or nouns instead of names, but I noticed that this method is case-sensitive. So I tried converting both my text and my JSON list to upper case and got wrong results (when my JSON list had matching-case letters, the results were perfect). How can we make spaCy case-insensitive?
Thanks for the comment! I am glad you are finding these videos useful. A standard way to do this is data augmentation: create an entity ruler that has upper-case, lower-case, capitalized, and non-capitalized words. Or, and this is less computationally expensive, use only lowercase and make sure to lower your texts before running the entity ruler over them. Or you can use a pattern matcher in spaCy that will do that automatically if you pass a pattern with the LOWER attribute. If that does not help, let me know and I will give a better response when I get to a computer on Tuesday.
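A sketch of the LOWER-attribute approach using the EntityRuler's token patterns (the same attribute works in spaCy's Matcher; the name is a placeholder):

```python
from spacy.lang.en import English

nlp = English()
ruler = nlp.add_pipe("entity_ruler")

# A token pattern on LOWER matches regardless of casing in the text
ruler.add_patterns([{"label": "PERSON",
                     "pattern": [{"LOWER": "harry"}, {"LOWER": "potter"}]}])

for text in ("Harry Potter", "HARRY POTTER", "harry potter"):
    doc = nlp(text)
    print(text, "->", [(ent.text, ent.label_) for ent in doc.ents])
```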
@@python-programming Great, thank you for clarifying. I just tried with lowercase and the same thing happened. Then I realized I was doing it wrong. I was doing it like: for item in data: new = item.upper() This was wrong. I tried with: for i in range(len(data)): data[i] = data[i].upper() And it worked! Thank you for making me try it another way. I also tried it with lowercase, and I did not see much of an issue with the performance. Those workflows are pretty interesting; I'd love to see them in action! hehehe
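The difference the commenter ran into, as a minimal example: rebinding the loop variable never touches the list, while assigning by index (or rebuilding the list) does.

```python
data = ["Harry", "Ron"]

# Rebinding a name inside the loop leaves the list unchanged
for item in data:
    new = item.upper()
print(data)  # → ['Harry', 'Ron']

# Assigning back by index, or using a list comprehension, actually updates it
data = [item.upper() for item in data]
print(data)  # → ['HARRY', 'RON']
```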
For anyone trying this with v3: based on the update video, the function should look like this: def generate_rules(patterns): nlp = English() ruler = nlp.add_pipe("entity_ruler") ruler.add_patterns(patterns) # type: ignore (because VS Code somehow can't find this function, although it works) nlp.to_disk("hp_ner") Link to the update video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-16Ujcah_-h0.html
Thanks for the videos. I am pretty new to spaCy. Trying to run the code from your textbook, I encountered this error: ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None'). - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead. - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`. - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline. Then, updating nlp.create_pipe("entity_ruler") to nlp.add_pipe("textcat") as the error suggested, I get this new error: AttributeError: 'TextCategorizer' object has no attribute 'add_patterns' Do you have any idea what could be wrong?
These videos are for spaCy 2.0. They upgraded to 3.0 earlier this month. I am working on updating the notebook and videos. If you pip install spaCy 2.0, the code will work.
This actually worked for me (found here: spacy.io/api/entityruler ) : entity_ruler = nlp.add_pipe("entity_ruler") entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
@@egomalego Thanks for sharing this! I was just about to link to my new spaCy 3.0 text classification training that introduces viewers to the new config system.
Hi, a really nice way to explain every single thing; I simply loved it. When I try to implement the generate_rules function, I get an error: "ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None'). - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead. - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`. - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline."
I have a question: can the same custom-trained model work on multiple languages if my dataset has words from different Latin languages using the same tags?
Good question. Yes and no. It will work, but it needs to have encountered the other languages in training to learn context. You could use an entity ruler, which would be language-agnostic.
I am responding from my phone; I can give a better response on Monday. But essentially, you open your saved model, use the get_pipe method to grab the entity ruler, and add patterns to it the same way you did initially. You then save the new model, and you should have those patterns saved. Alternatively, you can do it by opening the JSONL file that contains the patterns and placing the new patterns on new lines.
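A sketch of that round trip, with a temporary directory standing in for the saved model's path and made-up placeholder patterns:

```python
import tempfile

import spacy
from spacy.lang.en import English

with tempfile.TemporaryDirectory() as model_dir:
    # Stand-in for the model saved earlier in the thread
    nlp = English()
    nlp.add_pipe("entity_ruler").add_patterns(
        [{"label": "PERSON", "pattern": "Harry Potter"}])
    nlp.to_disk(model_dir)

    # Reopen the saved model, grab the ruler, add more patterns, and resave
    nlp = spacy.load(model_dir)
    ruler = nlp.get_pipe("entity_ruler")
    ruler.add_patterns([{"label": "PERSON", "pattern": "Luna Lovegood"}])
    nlp.to_disk(model_dir)

    # The reloaded model now knows both the old and the new patterns
    nlp = spacy.load(model_dir)
    doc = nlp("Harry Potter met Luna Lovegood.")
    print([ent.text for ent in doc.ents])
    # → ['Harry Potter', 'Luna Lovegood']
```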