
Prof. David Blei - Probabilistic Topic Models and User Behavior 

The School of Informatics at the University of Edinburgh

David Blei, Professor of Statistics and Computer Science at Columbia University, delivered a lecture entitled 'Probabilistic Topic Models and User Behavior' on Friday 27th January 2017 at the University of Edinburgh.

Published: 26 Aug 2024

Comments: 20
@DILLIPKUMARSAHOOIITM · 7 years ago
Beautiful lecture on topic modeling. Thanks Prof Blei and the University of Edinburgh for making this lecture available.
@jagadeeshakanihal · 6 years ago
Link to the PDF of the presentation: www.cs.columbia.edu/~blei/talks/Blei_User_Behavior.pdf
@kapilkumar2650 · 6 years ago
If LDA captures any semantic meaning, I think it's because of the Gibbs sampling step, which tries to push a word into a topic that its neighboring words are already in. In a broader sense, what Gibbs sampling is doing is P(selected word | each topic) × P(neighboring words | each topic).
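For readers who want to see the step this comment alludes to, below is a minimal sketch of one collapsed Gibbs sampling update for LDA. The variable names (`n_dk`, `n_kw`, `n_k`) and the toy counts are my own illustration, not code from the talk; the point is that a word's new topic is drawn in proportion to how much its document already uses each topic times how much each topic already uses that word.

```python
import numpy as np

def gibbs_resample_word(d, w, old_k, n_dk, n_kw, n_k, alpha, beta, V):
    """Resample the topic of one word token (document d, word id w)
    using the collapsed Gibbs update for LDA (illustrative sketch)."""
    # Remove the token's current topic assignment from the counts.
    n_dk[d, old_k] -= 1
    n_kw[old_k, w] -= 1
    n_k[old_k] -= 1

    # P(topic k | everything else) is proportional to
    # (how much doc d already uses topic k) * (how much topic k already uses word w).
    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
    p /= p.sum()

    # Draw the new topic and add the token back into the counts.
    new_k = np.random.choice(len(p), p=p)
    n_dk[d, new_k] += 1
    n_kw[new_k, w] += 1
    n_k[new_k] += 1
    return new_k

# Toy setup: 2 documents, 3 topics, vocabulary of 5 word ids (all hypothetical).
D, K, V = 2, 3, 5
alpha, beta = 0.1, 0.01
n_dk = np.ones((D, K), dtype=int)   # document-topic counts
n_kw = np.ones((K, V), dtype=int)   # topic-word counts
n_k = n_kw.sum(axis=1)              # total words assigned to each topic

new_topic = gibbs_resample_word(d=0, w=2, old_k=1, n_dk=n_dk, n_kw=n_kw,
                                n_k=n_k, alpha=alpha, beta=beta, V=V)
```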
@saqibwarriach · 1 year ago
A very informative session
@Consistent_studying · 4 years ago
Very beautiful! Thanks for sharing.
@Jack-lg9mq · 5 years ago
Does anyone know where his other talk is that describes how to perform inference? 16:12
@phpn99 · 6 years ago
I think the weakness of LDA is that it conflates semantics with words. Meaning arises from the relations between words, which totally escape LDA's analysis. All LDA is good for is estimating the word proximity between documents; it's effectively incapable of extracting precise topics from documents, only generic topics.
@RPDBY · 6 years ago
It's good enough if you have to deal with hundreds of documents containing thousands of words each.
@phpn99 · 6 years ago
Sure; but what is it good at? What is the semantic value of the (let's call it Cartesian) distance between two LDA signatures? I know what I'm talking about: I worked for a couple of years on an LDA-based classification project, and the semantic value of the topics extracted from the documents was too general to be truly useful. I think Blei et al. have found an interesting statistical method and a cool idea, but what they fail to express in this entire approach is precisely in what way their metric, and the methods by which they choose words, yield any meaningful insight into the analyzed texts. I find this whole thing very superficial. Without connecting your word net to some semantic ontology, you are doing nothing but an arbitrary match; arbitrary in the sense that meaning in language arises in more complex ways than through individual nouns, verbs and adjectives.
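On the "distance between two LDA signatures" point: since a document's LDA signature is a probability distribution over topics, a distance designed for distributions (Hellinger, Jensen-Shannon, etc.) is usually a more natural choice than a plain Euclidean one. A minimal sketch with hypothetical topic proportions:

```python
import numpy as np

def hellinger(theta_a, theta_b):
    """Hellinger distance between two document-topic distributions
    (both are length-K vectors that sum to 1)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(theta_a) - np.sqrt(theta_b)) ** 2))

# Two hypothetical documents over K = 4 topics.
doc1 = np.array([0.70, 0.20, 0.05, 0.05])
doc2 = np.array([0.10, 0.15, 0.70, 0.05])
print(hellinger(doc1, doc2))  # larger value -> the documents use different topics
```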
@aahirip737 · 6 years ago
I'm a noob at this, a few weeks into NLP; I'm trying to solve a use case and I'm hitting exactly this issue. Ultimately LDA just gives me a bunch of topic IDs with words that don't mean anything together. I read that I have to name the topics myself! And I landed here looking for a 'solution'... so I'm not the only one. Meanwhile I found something interesting, don't know its worth: ieeexplore.ieee.org/document/6405699/. It introduces the term 'concept' between topic and word; I could not find any implementations as yet.
@RPDBY · 6 years ago
Pritish N I applied LDA to public speeches and was able to compare the results to manual results (i.e., people read the speeches and distinguished the main topics), and LDA performed rather well, discovering 12 out of 15 distinct topics. For instance, the health care topic had words such as health, care, afford, insurance, cost at the top, so you won't confuse it with anything else. I also have a few topics that are hard to interpret, but it gave me the main topics I needed across 6000 documents. I should mention that in addition to stopwords I had to exclude about 30 other words that were frequent but uninformative, such as year, state, always, because, etc. These will depend on your domain, of course, but they pollute the results, and excluding them helped a lot.
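The kind of preprocessing this commenter describes (standard stopwords plus a hand-picked list of frequent but uninformative words) might look roughly like the sketch below. The commenter does not say which software they used; gensim is only an assumption here, and the toy corpus, extra stopword list, and parameter values are all placeholders.

```python
from gensim import corpora, models

# Toy stand-in for a corpus of speeches: each document is a list of lowercase tokens.
speeches = [
    ["health", "care", "afford", "insurance", "cost", "year", "state"],
    ["tax", "budget", "spending", "deficit", "year", "always"],
    ["health", "insurance", "coverage", "cost", "because", "state"],
]

# Extra frequent-but-uninformative words, chosen by inspecting the corpus
# (the commenter removed about 30 such words in addition to standard stopwords).
extra_stopwords = {"year", "state", "always", "because"}
cleaned = [[w for w in doc if w not in extra_stopwords] for doc in speeches]

dictionary = corpora.Dictionary(cleaned)
corpus = [dictionary.doc2bow(doc) for doc in cleaned]

# num_topics=2 for this toy corpus; the commenter used 15 topics on ~6000 speeches.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10,
                      random_state=0)
print(lda.print_topics(num_words=4))  # top words per topic, to be named by hand
```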
@phpn99 · 6 years ago
The problem is that the whole concept of the "topic" is grossly inflated. It has very shallow semantic value. A topic is a broad and ambiguous category.
@bennri · 3 years ago
16:36 add one for Tomotopy
@user-qu2gk6qb3d · 11 months ago
Trivial contrivance
@RPDBY · 7 years ago
How do you make the graph from 2:30 in R?
@HarpreetKaur-qq8rx · 5 years ago
Hello Professor, can LDA be used to categorize documents into strict categories? Your video suggests otherwise, but I wanted to confirm.
@manishthapliyal6372 · 5 years ago
I think you should use a hard clustering algorithm like k-means or hierarchical clustering for strict topics, because topic modelling is a soft clustering approach.
@HarpreetKaur-qq8rx · 5 years ago
Thank you Manish for the reply, but could you further elaborate on what is meant by soft and hard techniques?
@guoguozheng32 · 3 years ago
@HarpreetKaur-qq8rx To my understanding, a hard clustering would assume all the documents in a corpus have the same probability of showing each of the topics: each document is assumed to show only one of the topics, and all the words in that document are assumed to show that topic. A soft clustering assumes each document has its own probabilities of showing each of the topics, and a document shows all the topics rather than just one; each word in a document shows one of the topics, and the words in a document may show different topics.
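The hard/soft distinction in this thread can be seen concretely in the shape of the output: a hard method returns one label per document, while LDA returns a distribution over topics per document. A minimal sketch, assuming scikit-learn and a hypothetical four-document corpus (none of this comes from the talk):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "health care insurance cost",
    "insurance coverage health cost",
    "tax budget spending deficit",
    "budget deficit tax policy",
]
X = CountVectorizer().fit_transform(docs)

# Hard clustering: each document is assigned to exactly one cluster.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(hard_labels)         # one label per document, e.g. [0 0 1 1]

# Soft clustering: each document gets a probability distribution over topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)
print(doc_topic.round(2))  # each row sums to 1 across the two topics
```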