Thank you so much Christine! Your explanations are clear and you capture the essence of LDA. You just saved me from my final project in my Adv. Machine Learning class at Duke U., thank you again!
Your explanations are nice and illustrative. Thank you so much Christine! I am interested in comparing text documents in a topic space in order to remove redundancy (if any) from a collection of documents. Could you please provide a link to a data set that would be of some help?
The Latent Metonymical Analysis and Indexing (LMai) algorithm, invented in 2006-2007... does much of this in an unsupervised way (with zero guidance)... The algorithm decomposes each document to decide a topic and then clusters the relevant topics. Description: The present invention relates to Latent Metonymical Analysis and Indexing (LMai), a novel concept for advanced or unsupervised machine learning techniques, which uses a statistical approach to identify the relationships between the words in a set of given documents (unstructured data). This approach does not need training data to decide which related words to match together; it is able to do the classification by itself. All that is needed is to give the algorithm a set of natural documents. The method is elegant enough to classify the relationships automatically, without any human guidance during the process, as shown in FIGS. 6 and 7.
Thank you for your review, it's really interesting. I am in IT too, and I have used all of the techniques you talked about: machine learning and unsupervised learning in my work, and even Gaussian SVM too. Please go ahead, you did very well. Regards
What about the difference between linear discriminant analysis and latent Dirichlet allocation? It was mentioned that latent Dirichlet allocation is used for unsupervised classification. Does that mean it will never work for supervised purposes, or just that it is not useful for supervised classification?
Unsupervised learning does not use labels, so if you want to do supervised learning instead, you can take the TF-IDF matrix as features and train any classifier on it (SVM, ANN, Random Forest).
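To make that concrete, here is a minimal sketch of the idea in plain Python. Everything in it is an assumption for illustration: the four-document corpus and its labels are made up, the helper names (`fit_idf`, `tfidf_vector`, `train_perceptron`, `predict`) are hypothetical, the IDF formula is one common smoothed variant, and a perceptron (a single-layer neural network) stands in for the SVM, ANN, or Random Forest mentioned above.

```python
import math
from collections import Counter

# Hypothetical toy corpus and labels, invented for illustration only.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "stocks fell as the market closed",
]
labels = [1, 1, -1, -1]  # 1 = sports, -1 = finance


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def fit_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalised sparse TF-IDF vector as a dict."""
    counts = Counter(t for t in tokens if t in idf)  # skip unseen terms
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def score(vec, weights, bias):
    """Linear decision value for one sparse vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_perceptron(vectors, targets, epochs=20):
    """Classic perceptron: update only on misclassified examples."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, targets):
            if y * score(vec, weights, bias) <= 0:
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + y * v
                bias += y
    return weights, bias


def predict(text, idf, weights, bias):
    """Classify a new text with the trained model."""
    vec = tfidf_vector(tokenize(text), idf)
    return 1 if score(vec, weights, bias) > 0 else -1


# Build the TF-IDF features, then train the classifier on them.
tokenized = [tokenize(d) for d in docs]
idf = fit_idf(tokenized)
vectors = [tfidf_vector(toks, idf) for toks in tokenized]
weights, bias = train_perceptron(vectors, labels)

print(predict("the bank raised interest rates", idf, weights, bias))
```

The point is only the shape of the pipeline: text goes to TF-IDF features, and the features plus labels go to a classifier. With a real corpus you would hold out test documents and would probably use a library's vectorizer and classifier rather than hand-rolled ones.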
I tried to follow the tutorial for topic modelling with the gensim Python library. The input file seems very large, at 12.9 GB. Is that right? (dumps.wikimedia.org/enwiki/latest/)