
#4 Introduction to Corpus Linguistics - Part-of-Speech Tagging and Working with Tagged Data. 

Yassine Iabdounane
2.1K subscribers
9K views

Published: 29 Oct 2024

Comments: 63
@YassineIabdounane 4 years ago
Hi all! I used AntConc 3.5.9 in this video. There are a few changes in the latest version of the software (4.0.2), so some of the things I show here may not look or work exactly the same way. If anything is unclear, feel free to email me. If you want to follow things exactly as I show in the video, you can always download the previous version: www.laurenceanthony.net/software/antconc/
2:30 Load data into LancsBox
4:43 Look for tags in LancsBox
5:59 Look for instances of a specific part of speech in LancsBox
7:24 Look for instances of a specific word class in LancsBox
11:50 Save results in LancsBox
14:28 Tag corpus with TagAnt
17:46 Look for a specific part of speech in AntConc
18:15 Hide tags in AntConc
19:24 Look for instances of a specific word class in AntConc
20:45 Look for irregular word forms in AntConc
@hudaalsoghayer9153 3 years ago
How can I look for all instances of phrasal verbs (verb + preposition, adverb, or both)? They can also be separable.
@YassineIabdounane 3 years ago
It depends on how the corpus is tagged. In this video, the tags are those of TreeTagger: prepositions are tagged with _IN, adverbs with _RB, and particles with _RP. In this case, if you want to look for all instances of phrasal verbs, go to Concordance, click on Advanced in the search box, enable "use search term(s) from list below", and type:
*_V* *_IN
*_V* *_RB
*_V* *_RP
(Write each term with no spaces between the asterisks and the tag; spaces go only between terms.) Then you'll have to clean your results, as this gives you all instances of phrasal verbs but also sequences that are not phrasal verbs. As for separable ones, you can add these strings to your search terms:
*_V* *_N* *_IN
*_V* *_N* *_RB
*_V* *_N* *_RP
These give you instances where a noun comes between the verb and the preposition/adverb/particle. If you want the combination verb + determiner + noun + preposition/adverb/particle, use these strings:
*_V* *_DT *_N* *_IN
*_V* *_DT *_N* *_RB
*_V* *_DT *_N* *_RP
As I said, these work if the corpus is tagged with TreeTagger. If not, just check how it is tagged and modify the tags I showed you. I hope you find this helpful; if not, feel free to email me :)
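For anyone who prefers to run the same kind of search outside AntConc, here is a rough Python sketch of the three search patterns described above. It assumes the corpus is plain text in word_TAG format with TreeTagger-style tags; the sample sentence, the helper names (token, find_candidates), and the pattern labels are made up for illustration and are not part of AntConc or TagAnt. As with the AntConc search, the hits are only candidates and still need manual cleaning.

```python
import re

# Hypothetical sample in word_TAG format (TreeTagger-style tags).
TAGGED = (
    "She_PP picked_VVD up_RP the_DT book_NN ._SENT "
    "He_PP turned_VVD the_DT light_NN off_RP ._SENT "
    "They_PP gave_VVD money_NN away_RB ._SENT"
)


def token(tag: str, prefix: bool = False) -> str:
    """Regex for one word_TAG token.

    With prefix=True the tag may continue after `tag` (V matches VVD, VBZ, ...),
    which mirrors a trailing * in the AntConc search term.
    """
    tail = r"[^\s_]*" if prefix else ""
    return rf"(?<!\S)[^\s_]+_{tag}{tail}(?!\S)"


VERB = token("V", prefix=True)       # like *_V*
NOUN = token("N", prefix=True)       # like *_N*
DET = token("DT")                    # like *_DT
PARTICLE = token("(?:IN|RB|RP)")     # like *_IN, *_RB, *_RP

PATTERNS = {
    "verb + particle": rf"{VERB}\s+{PARTICLE}",
    "verb + noun + particle": rf"{VERB}\s+{NOUN}\s+{PARTICLE}",
    "verb + det + noun + particle": rf"{VERB}\s+{DET}\s+{NOUN}\s+{PARTICLE}",
}


def find_candidates(tagged_text: str) -> list[tuple[str, str]]:
    """Return (pattern label, matched string) pairs for every candidate sequence."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, tagged_text):
            hits.append((label, match.group(0)))
    return hits


for label, text in find_candidates(TAGGED):
    print(f"{label}: {text}")
```

If the corpus uses a different tagset, only the tag strings passed to token() should need changing.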
@jinnamao4670 2 years ago
Greetings from China! You saved my bachelor's thesis! Thank you sooooooo much! Your videos are very approachable to new learners of corpus tools :)
@YassineIabdounane 2 years ago
That is so nice to know! Thanks for the kind words and good luck with your thesis! Cheers from Morocco :)
@DaNOliveiraDaN 1 year ago
You're so adorable. I am writing a final paper for my postgraduate course and you are being of great help.
@YassineIabdounane 1 year ago
Thanks for the kind words and best of luck with the paper!
@GhAt-f7d 2 years ago
Salaam Yassine, I can't thank you enough for your detailed explanation. You are a gem and I am so happy that I have found you!
@YassineIabdounane 2 years ago
Salam Ghizlane, many thanks for the kind words! I appreciate them a lot and I'm glad you found the video helpful. Cheers!
@Valdis249 4 years ago
Really good and informative video series. Thanks a lot for your time and effort, now I might finally be able to finish my bachelor's thesis. Greetings from Germany.
@YassineIabdounane 4 years ago
My pleasure! Thank you for your kind words. All the best with your thesis, Valdis :)
@luyaoli3294 2 years ago
Thank you for the lectures. I'm new to the field, and your lectures have really been a great shortcut for me, clear and informative. And I do like your design of the applause part during your course XD Really appreciated!!
@YassineIabdounane 2 years ago
hahah trying to make the videos a bit interactive xD Thanks for the kind words and I'm glad the videos were helpful!
@Alloh_buyukdir571 3 years ago
So informative, thanks a lot, Sir Yassine.
@YassineIabdounane 3 years ago
My pleasure! I am happy you find it helpful :)
@themightyenglish5310 8 months ago
Hi Yassine. How can I download the BAWE Corpus? Thank you so much.
@Beloved_Digital 9 months ago
Thank you, sir, for this informative video. Is tagging the same as annotation?
@shaheenullah3212 3 years ago
Dear sir, thank you, respected sir. Your suggestions are very helpful and will help me move ahead in my research.
@nicolasbustos9686 2 years ago
Appreciate your work bro, contributing to the community.
@YassineIabdounane 2 years ago
My pleasure! Thanks for the kind words :)
@grandetoujga1496 4 years ago
Greetings to my Moroccan brothers, from your brother in Algeria.
@YassineIabdounane 4 years ago
Welcome, brother. May you always shine.
@yangzhimiao9367 3 years ago
Thank you so much for your introduction. This is quite helpful!
@YassineIabdounane 3 years ago
My pleasure! I'm happy you find it helpful 🙂
@homanma1110 2 years ago
Hi, I'm using the latest version of AntConc, but I can't find the Concordance button. There are KWIC, Plot, File, Cluster, N-Gram, Collocate, Word, Keyword, and Wordcloud tabs. Which one should I use instead?
@andresgarciaalvarez4397 3 years ago
Thanks a lot for your amazing job, it is so encouraging. If I may, I have a question. If I want to tag a language other than English, in my case Spanish, do I have to download a particular software package or change a parameter in these programs (AntConc or TreeTagger)?
@YassineIabdounane 3 years ago
Hi Andrés. Many thanks for the nice words! If you are using LancsBox, once you load the corpus, make sure to change the "language" option to Spanish, then click on import. Your Spanish corpus will be tagged automatically. If you want to work with AntConc, download TagAnt, go to input files, load the data, change "language" to Spanish, and all your files will be tagged for part of speech. Then you can work on them using AntConc as I'm doing in the video :)
@andresgarciaalvarez4397 3 years ago
@YassineIabdounane Thanks a lot, Yassine, for your quick reply and for teaching people in such an amazing way.
@oleksandrapolanska5631 2 years ago
thank you sm for the video, my friends and i would not have passed our exam if not for you haha
@YassineIabdounane 2 years ago
Oh wow! So happy to know that! Best of luck to you all :)
@xelilsalih3957 3 years ago
Hi sir, thanks very much for the informative video. Would it be possible to upload a video on preparing a novel to be loaded as a corpus and then working on it in LancsBox, please? I am working in the field of corpus stylistics. Much appreciated.
@shaheenullah3212 3 years ago
Dear sir, when I import the corpus, TreeTagger cannot be applied to it. I want to know how to tag a new corpus if its language is not available in LancsBox.
@shaheenullah3212 3 years ago
Dear sir, can I tag a new corpus with this software if no work has been done on this corpus? My question is that I want to tag a new small dataset/corpus for POS tagging using a deep learning approach. No tagged dataset is available yet, but I want to create a tagged corpus and then use it for POS tagging. Please guide me. Thank you, respected sir.
@shaheenullah3212 3 years ago
Great video, can you please tell me how to perform PoS tagging in languages other than English? Any tutorial in this direction, please?
@YassineIabdounane 3 years ago
Thanks! You could try TagAnt if you want to save the tagged data separately. Before you load your texts, change the language setting to the language you want, if it is among the 6 languages that are available. Or you could tag the data automatically with LancsBox (as shown in the video). Just change the language setting before you click on 'import' (there are far more languages to choose from than in TagAnt).
@shaheenullah3212 3 years ago
@YassineIabdounane Thank you, respected sir.
@shaheenullah3212 3 years ago
Dear sir, I want to tag a new dataset for POS tagging. What should I do if no POS-tagged dataset is available for the language?
@0101799 5 months ago
Thank you so much!!!
@sabrinamalik2972 4 years ago
I want to know the difference between annotation, encoding, tagging, and markup. To me they are all almost the same. Let me know if there is any difference between them. Thank you.
@YassineIabdounane 4 years ago
Hi! Yes, some of these terms have been used interchangeably, so it gets confusing sometimes. Here's the difference. Markup provides information about the data, such as authorship, publication dates, paragraph boundaries, sentence boundaries, chapter starts, headings, proficiency level (for learner corpora), and so on; as you can see, this is not linguistic information. Annotation adds linguistic information, such as syntax, stylistics, discourse, semantics, or part of speech. So, to answer your other question, part-of-speech tagging, like the one you see in this video, is a form of annotation. As for encoding, it refers to how these annotations, tags, or markup are represented in the data. One example of markup encoding is XML. It looks something like this:
Charles Dickens
or
The files are not to be distributed without permission. The files are used for research purposes only.
If you want to read more about this, check out this reference (especially chapters 2, 3, and 4): users.ox.ac.uk/~martinw/dlc/index.htm
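To make the distinction a bit more concrete, here is a small Python sketch of a text that carries both kinds of information. The XML element names (text, header, author, availability, body, s) and the tagged sentence are purely hypothetical and not taken from any particular corpus or standard: the header holds markup (information about the data), the word_TAG tokens hold annotation (part of speech), and XML is the encoding that wraps both.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded text: element names are illustrative only.
SAMPLE = """<text>
  <header>
    <author>Charles Dickens</author>
    <availability>The files are used for research purposes only.</availability>
  </header>
  <body>
    <s>It_PP was_VBD the_DT best_JJS of_IN times_NNS</s>
  </body>
</text>"""


def read_sample(xml_string: str):
    """Split an encoded text into its markup (metadata) and its annotation (POS tags)."""
    root = ET.fromstring(xml_string)

    # Markup: non-linguistic information about the data.
    metadata = {
        "author": root.findtext("header/author"),
        "availability": root.findtext("header/availability"),
    }

    # Annotation: linguistic information attached to each word.
    sentence = root.findtext("body/s")
    tokens = [tuple(item.rsplit("_", 1)) for item in sentence.split()]
    return metadata, tokens


metadata, tokens = read_sample(SAMPLE)
print(metadata["author"])
for word, tag in tokens:
    print(word, tag)
```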
@sabrinamalik2972 4 years ago
Thanks a lot. Now I have a clear understanding of these terms.
@YassineIabdounane 4 years ago
My pleasure! I'm glad to hear that!
@febrianalestari1851 2 years ago
I tried to check the part-of-speech tagging using AntConc and clicked the 'File' tab, but I did not see the words tagged for part of speech. Do you know why? Thanks.
@YassineIabdounane 2 years ago
I'm not sure I understand what the problem is exactly. Is the corpus already tagged and you can't see the tags when you load it into AntConc?
@hudahadi1340 2 years ago
What about tagging Arabic corpora? Would you please make a video on that, or just briefly show the way here?
@nad-im6629 1 year ago
Same question, I have that problem. Are there any answers, please?
@saralassri964 4 years ago
Finally I understood those regular expressions 😂
@YassineIabdounane 4 years ago
hhhh I'm glad to hear that!
@kollisoraya2938 2 years ago
Please, I have an exam and I must talk about representativeness, sampling, size, and balance. Could you help me?
@YassineIabdounane 2 years ago
Hi there, this will help: www.lancaster.ac.uk/fass/projects/corpus/ZJU/xCBLS/chapters/A02.pdf The chapter discusses representativeness, sampling, and balance.
@sarahissac2301 2 years ago
The TagAnt PDF is not on the website.
@YassineIabdounane 2 years ago
Do you mean the manual?
@sarahissac2301 2 years ago
@YassineIabdounane The file at 4:14.
@sarahissac2301 2 years ago
How do I download BAWE?
@YassineIabdounane 2 years ago
There you go: ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2539
@shaheenullah3212 3 years ago
Please guide me, sir.
@badawyrabie8920 18 days ago
I emailed you a while ago but got no response. It's urgent please. Thanks @yassinelabdounane
@hthangkhanhau4876 3 years ago
Great video, can you please tell me how to perform PoS tagging in languages other than English? Any tutorial in this direction, please?
@YassineIabdounane 3 years ago
Thank you! Kindly check your email :)
@ilyanaazzeaty8420 3 years ago
I was wondering the same. Is there another way?