Vincent Warmerdam - Bulk Labelling Techniques 

PyData

Let's say you've got some unlabelled data and you want to train a classifier. You need annotations before you can model, but because you're time-bound you must stay pragmatic. You only have an afternoon to spend. What would you do?
It turns out there are a few techniques that can help you with this. You can get an interesting subset annotated quickly by leveraging:
a quick search engine
pre-trained models
sentence/image embeddings
a trick to generate phrase embeddings
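The first idea in the list, using a quick search engine to pull out a subset that can be labelled in one go, can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the approach shown in the talk: the tiny inverted index stands in for a real search engine, and the example texts, the `shipping` label, and the helper names `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(texts):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical unlabelled examples, invented for illustration.
texts = [
    "The delivery was late and the box was damaged",
    "Great phone, the battery lasts for days",
    "Late delivery again, very disappointed",
    "Battery died after a week",
]

index = build_index(texts)
labels = {}  # doc_id -> label, filled in bulk

# One query, one labelling decision for the whole result set.
for doc_id in search(index, "late delivery"):
    labels[doc_id] = "shipping"
```

In practice you would skim the search results before accepting the label for all of them; the point is that one query replaces many individual annotation decisions.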
In this talk I will explain these techniques for bulk labelling, and I will also highlight some tools to get all of this to work. In particular you'll see:
lunr.py (a lightweight search engine)
sentimany (a library with pretrained sentiment models)
embetter (adds pretrained embeddings for scikit-learn)
umap (an amazing dimensionality reduction library)
spaCy (a great NLP tool)
sense2vec (phrase embeddings trained on Reddit)
bulk (a user interface for bulk labelling embeddings)
For this talk I'll assume you're familiar with scikit-learn and that you've heard of embeddings before.
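As a rough illustration of how embeddings can help with bulk labelling, the sketch below picks every example whose embedding lies close to a seed example, so that the whole group can be reviewed and labelled together. It is an assumption-laden toy, not the workflow from the talk: the three-dimensional vectors are made up (real ones would come from a pretrained model), cosine similarity with a fixed threshold is just one possible selection rule, and `cosine` and `similar_to` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_to(seed_id, embeddings, threshold=0.9):
    """Ids of all other examples at least `threshold` similar to the seed."""
    seed = embeddings[seed_id]
    return [
        i
        for i, vec in enumerate(embeddings)
        if i != seed_id and cosine(seed, vec) >= threshold
    ]


# Toy vectors, invented for illustration only.
embeddings = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.00, 0.10, 0.90],
    [0.85, 0.15, 0.05],
    [0.10, 0.00, 0.95],
]

# Label one seed by hand, then propose the same label for its neighbours.
seed_id, seed_label = 0, "positive"
candidates = similar_to(seed_id, embeddings)
proposed = {i: seed_label for i in [seed_id] + candidates}
```

The proposed labels are candidates for a quick human check rather than final annotations; the threshold trades the size of each batch against how clean it is.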

Published: 5 Jan 2023

Comments: 2
@omarelsayed247 10 months ago
Thanks for sharing your ideas, please keep sharing them.
@AlexanderWeurding 4 months ago
Thanks for sharing!