Fuzzy Matching with spaCy 3.5 (spaCy 3.5 update) 

Python Tutorials for Digital Humanities
6K views

Link to Update Docs: spacy.io/usage/v3-5
Join this channel to get access to perks:
/ @python-programming
If you enjoy this video, please subscribe.
✅Be my Patron: / wjbmattingly
✅PayPal: www.paypal.com/cgi-bin/webscr...
If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
You can follow me at:
/ wjb_mattingly

Category: Science

Published: Feb 5, 2023

Comments: 7
@andyr8833 8 months ago
Something very useful would be a pretrained model that can match terms with similar meanings, for example "US" with "American". Thanks for your videos!
@abnormia 10 months ago
Loving your vids! I have a question about using fuzzy matching in spaCy to correct name variations. I am using spaCy to train an NER model to recognize entities in my text, and with fuzzy matching I can ensure it tags the ones that have spelling variations. However, when I analyze the tagged entities, the variants are treated as different entities rather than as the correctly spelled one. How would I use this to correct the spelling to the one in my entity list?
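One possible approach to the question above, sketched under assumptions: entity ruler patterns accept an optional `id` field, so a fuzzy pattern can carry the canonical spelling, which is then read back from `ent.ent_id_`. The name "Mattingly", the `PERSON` label, and the sample sentence are illustrative only, and the sketch assumes spaCy 3.5 or later.

```python
import spacy

# A blank English pipeline is enough here; no trained model is required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# FUZZY1 allows at most one edit per token. The "id" stores the
# canonical spelling (an illustrative choice, not a spaCy requirement).
ruler.add_patterns([
    {
        "label": "PERSON",
        "pattern": [{"LOWER": {"FUZZY1": "mattingly"}}],
        "id": "Mattingly",
    },
])

doc = nlp("Mattingley and Matingly both appear in the register.")

# Pair each surface form with the canonical spelling from the pattern id.
canonical = [(ent.text, ent.ent_id_) for ent in doc.ents]
print(canonical)
```

Grouping on `ent.ent_id_` rather than `ent.text` during analysis should then treat the variants as one entity.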
@jesusmtz29 1 year ago
Great stuff! Loved the concise presentation. I wonder if there's a way to change the string metric to something other than Levenshtein, or a way to control where the permissible edits can occur. If I'm attempting to capture a bigram, how would I control where the edits can happen? Thanks again.
@python-programming 1 year ago
Thanks! There is, actually. It is a bit more complex, which is why I did not include it in this video. I had this exact same question. You can find the answer here: spacy.io/usage/v3-5 "If you’d prefer an alternate fuzzy matching algorithm, you can provide your own custom method to the Matcher or as a config option for an entity ruler and span ruler." As for bigrams, this is an interesting question. I have not tried it yet, but I suspect you can do it the exact same way. Remember, the EntityRuler and the SpanRuler work on a token-by-token basis, so you just want to do a fuzzy match for each of the tokens in the bigram, trigram, etc. I am curious whether anyone else has thoughts on this too. Please respond if you do.
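A minimal sketch of both points in the reply above, assuming spaCy 3.5 or later: a custom comparison function is passed to the `Matcher` through its `fuzzy_compare` argument, and a bigram is matched by giving each token its own fuzzy pattern. The function `ratio_compare`, its 0.8 threshold, and the sample sentence are hypothetical choices for illustration; the function only mirrors the `(input_text, pattern_text, fuzzy)` shape of spaCy's default `levenshtein_compare`.

```python
from difflib import SequenceMatcher

import spacy
from spacy.matcher import Matcher


def ratio_compare(input_text: str, pattern_text: str, fuzzy: int = -1) -> bool:
    """Hypothetical replacement for the default Levenshtein comparison.

    Accepts a token when difflib's similarity ratio is at least 0.8.
    The `fuzzy` argument (the N in FUZZYN, or -1 for plain FUZZY) is
    ignored in this sketch.
    """
    return SequenceMatcher(None, input_text, pattern_text).ratio() >= 0.8


nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab, fuzzy_compare=ratio_compare)

# One fuzzy pattern per token: edits are judged separately for each
# token of the bigram, never across the token boundary.
matcher.add("NEW_YORK", [[
    {"LOWER": {"FUZZY": "new"}},
    {"LOWER": {"FUZZY": "york"}},
]])

doc = nlp("I visited Neww Yrk last year.")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

With the default comparison, the operators `FUZZY1` through `FUZZY9` cap the number of edits allowed for an individual token, which gives some control over where in a bigram the edits may fall.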
@mariomendes5024 1 year ago
Hello Dr. WJB Mattingly, I have a specific question. Can I use this feature to annotate things like "Article n.º3 from Book" with something like "Article 3 from book1"? I want to find all variations of phrases without losing any punctuation.
@python-programming 1 year ago
Thanks for the question! Absolutely. This would be a great use case for fuzzy matching. You would need to experiment a bit to see how fuzzy you need to make it. I would recommend using the PhraseMatcher here, although I have not tested this with the PhraseMatcher. I say this because of how fuzzy matching works on tokens. It sounds like you may need to anticipate the different ways tokens may appear, not necessarily the character variants inside a token. If it does not work, here is another resource that may be helpful: github.com/gandersen101/spaczz
@TC-bv4on 1 year ago
Out of curiosity -- do you have anything for stance detection?