I am an assistant professor of Chinese Studies and an affiliate in Data Science at William & Mary in Virginia and a former assistant professor of the Digital Humanities at Leiden University in the Netherlands (though this channel is not affiliated with either university). I am a scholar of late imperial Chinese literature and book history. On this channel I'll be posting videos related to the digital humanities, text mining, and general fun stuff!
This is amazing, man! I'm a musicologist and I'm currently working on my PhD project, investigating travelogues from the 16th to 19th century for intertextual references with regard to descriptions of musical encounters between colonialists and Indigenous people from the Americas. I would love to incorporate computational methods into my PhD, so your videos REALLY come in handy. Do you plan on continuing this series or is there a way of contacting you somehow? I would really love to get in touch and talk with you about how you would approach my research topic, especially since I'm very new to Python and programming.
Hi, it's really interesting, thank you! I'm trying to do the same, but the articles I use are PDF files, and it doesn't work with them. What format are the documents that you stored and used for the word embeddings?
Hi there, would you please link your paper here? I want to submit a project on intertextuality, but I'd like to learn a lot more before I submit it to my professor. It would be great if I could read the paper. TIA
Here you go: culturalanalytics.org/article/11054-a-blast-based-language-agnostic-text-reuse-algorithm-with-a-markus-implementation-and-sequence-alignment-optimized-for-large-chinese-corpora
Thanks for this video. It's very helpful :) I have a problem with my code. I have set the mallet path, but I get AttributeError: module 'gensim.models' has no attribute 'wrappers'. I then tried from gensim.models.wrappers import LdaMallet. I hope you can help with my problem. Thanks
Hi. I'm having an error after running the python script.
Traceback (most recent call last):
  File "tm.py", line 38, in <module>
    lda_model = gensim.models.wrappers.ldamallet.LdaMallet(
  File "C:\Users\mmb\anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py", line 126, in __init__
    self.train(corpus)
  File "C:\Users\mmb\anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py", line 279, in train
    self.word_topics = self.load_word_topics()
  File "C:\Users\mmb\anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py", line 337, in load_word_topics
    with utils.smart_open(self.fstate()) as fin:
  File "C:\Users\mmb\anaconda3\lib\site-packages\smart_open\smart_open_lib.py", line 138, in smart_open
    return file_smart_open(parsed_uri.uri_path, mode)
  File "C:\Users\mmb\anaconda3\lib\site-packages\smart_open\smart_open_lib.py", line 642, in file_smart_open
    return compression_wrapper(open(fname, mode), fname, mode)
  File "C:\Users\mmb\anaconda3\lib\site-packages\smart_open\smart_open_lib.py", line 630, in compression_wrapper
    return make_closing(GzipFile)(file_obj, mode)
  File "C:\Users\mmb\anaconda3\lib\gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
TypeError: expected str, bytes or os.PathLike object, not _io.BufferedReader
Great video! It helped me a lot, especially to wrap my head around QGIS after 2 years of not using any GIS software and barely knowing how to use ArcGIS xd. Thanks a lot😉 Your explanations were super clear and straightforward🙂
@kaoutarlanjri7412 Hello, were you able to find this dataset? The one on GitHub is not complete; it is an edited version. If you found the full one, could you send the link?
Hey, Paul! Thank you so much for your video, it's very helpful! I'm a Mac user and I put mallet_path = "./bin/mallet" on line 35 instead of your code, but I still get the error: "subprocess.CalledProcessError: Command './bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input fed_corpus.txt --output fed_corpus.mallet' returned non-zero exit status 127". Do you have any ideas about how I could fix that? I don't understand what exactly is wrong. I would be very thankful!
This suggests to me that gensim is not finding the mallet executable! Double check that the path to mallet is correct. As you currently have it, the path assumes that the bin folder is in the same directory as your code. I usually have it set up so that it stays in the original folder it was downloaded in (mallet-2.0.8). In that case your mallet path should be "./mallet-2.0.8/bin/mallet"
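A quick way to test the path before handing it to gensim, as a minimal sketch: exit status 127 is the shell's code for "command not found", so it is worth confirming that the file exists and is executable. The helper name describe_mallet_path is made up for this illustration, and the path at the bottom assumes the mallet-2.0.8 folder sits next to your script.

```python
import os
from pathlib import Path


def describe_mallet_path(mallet_path: str) -> str:
    """Report whether mallet_path points at an existing, executable file.

    Hypothetical helper for illustration; it is not part of gensim or MALLET.
    """
    # Resolve relative paths against the current working directory,
    # which is where "./..." paths are looked up when the script runs.
    resolved = Path(mallet_path).expanduser().resolve()
    if not resolved.is_file():
        return f"missing: {resolved}"
    if not os.access(resolved, os.X_OK):
        return f"not executable: {resolved}"
    return f"ok: {resolved}"


# Assumed layout: the mallet-2.0.8 folder is in the same directory as the script.
print(describe_mallet_path("./mallet-2.0.8/bin/mallet"))
```

If this prints "missing", the printed absolute path shows where Python actually looked, which usually makes the wrong folder obvious.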
I just want to leave this here in case someone else runs into the same issue: using the plain string 'C:\mallet\bin\mallet' to specify the path to mallet didn't work for me because of a unicode error. This is because Python treats a backslash in a normal string as the start of an escape sequence: \u and \U begin unicode escapes and fail unless hex digits follow, and \b is read as a backspace character. Doubling every backslash should fix this (so, 'C:\\mallet\\bin\\mallet'); a raw string, r'C:\mallet\bin\mallet', also works.
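Here is a small sketch of the backslash behaviour in Python string literals, using the example path from this comment (your own install location may differ). It only uses the standard library and shows three spellings that give the same Windows path.

```python
from pathlib import PureWindowsPath

# In a normal string literal "\b" is the backspace escape,
# so "\bin" holds 3 characters: backspace, "i", "n".
fragment = "\bin"
print(len(fragment))

# Three safe ways to write the same Windows path:
escaped = "C:\\mallet\\bin\\mallet"  # every backslash doubled
raw = r"C:\mallet\bin\mallet"        # raw string, backslashes kept as typed
forward = "C:/mallet/bin/mallet"     # forward slashes, normalised by pathlib

print(escaped == raw)
print(str(PureWindowsPath(forward)) == raw)
```

The raw string is probably the least error-prone choice, with one caveat: a raw string cannot end in a single backslash, so leave off any trailing \ at the end of the path.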
I appreciate your courses and the way you go about teaching. However, I wish that one of the episodes so far had covered how to open files from our local computers. This is perhaps a very basic thing, but most people would want to open their own files on their computers right away and play around.