@Enjoy.your_life34 A specialized corpus includes texts of a particular type. An example would be the Michigan Corpus of Academic Spoken English (MICASE).
It depends on what exactly you want to study. If you are interested in its historical development, I would suggest using a historical corpus of English to see how the use of 'they' changes over the years.
Thank you so much for responding to my question, sir. I am working on a research study titled THE SINGULARIZATION "THEY" IN AN UNDERGRADUATE THESIS. In the matrix in our methodology, we wrote that we will use a corpus as our instrument instead. So, in your opinion, which corpus exactly should we use for our research? I'm not that familiar with corpora yet, and I have a lot of questions about them. Thank you again for responding.
To look for binomials in COCA, simply use this expression: * _n* and * _n* That's about it bro :) PS: please delete the spaces between * and _ when you use the expression. I added them because a character between two * is printed in bold here in the comments, like this: *_n* and *_n*
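For anyone who wants to try the same "noun and noun" idea on their own tagged text, here is a minimal Python sketch. It is not how COCA works internally. It assumes the text is already a list of (word, tag) pairs with CLAWS-style tags where noun tags start with "n". The function name `find_noun_binomials` and the hand-tagged sample sentence are made up for illustration.

```python
from collections import Counter


def find_noun_binomials(tagged_tokens):
    """Count 'NOUN and NOUN' sequences in a list of (word, tag) pairs.

    Assumption: noun tags start with 'n' (e.g. nn1, nn2), mirroring
    the _n* wildcard in the query above.
    """
    hits = Counter()
    # Slide a three-token window over the text.
    triples = zip(tagged_tokens, tagged_tokens[1:], tagged_tokens[2:])
    for (w1, t1), (w2, _), (w3, t3) in triples:
        is_noun_1 = t1.lower().startswith("n")
        is_noun_3 = t3.lower().startswith("n")
        if is_noun_1 and w2.lower() == "and" and is_noun_3:
            hits[(w1.lower(), w3.lower())] += 1
    return hits


# Hypothetical hand-tagged sample, not taken from any real corpus.
sample = [
    ("salt", "nn1"), ("and", "cc"), ("pepper", "nn1"),
    ("are", "vbr"), ("on", "ii"), ("the", "at"), ("table", "nn1"),
    ("with", "iw"), ("bread", "nn1"), ("and", "cc"), ("butter", "nn1"),
]

for pair, count in find_noun_binomials(sample).items():
    print(pair, count)
```

On a real corpus you would get the tags from a part-of-speech tagger rather than typing them by hand.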
On statistical significance and significance testing:
Say that you have two corpora: one contains texts produced by men, and the other contains texts produced by women. You would like to see whether men use the word ‘wonderful’ more than women do. You compare the frequencies and find that men have used the word 128 times while women have used it only 110 times. So it seems that men do indeed use ‘wonderful’ more than women do. Nevertheless, there are a number of things to consider, corpus size for example!
Here’s the question: is the observed difference actually significant enough to claim that, in general, men use that word more than women do? Or is it just a matter of chance that has nothing to do with men’s and women’s speech? To determine whether the difference is statistically significant and not due to chance, we need to use significance tests. One example would be the chi-square test. The chi-square test compares the actual observed frequencies (128 and 110 in our case) with the expected frequencies (the ones that we would expect if no factor other than chance had been involved). The closer these two sets of frequencies are to each other, the greater the probability that the observed frequencies are influenced by chance alone, in which case the difference would not be significant.
If you want to read more about it, I recommend this: www.lancaster.ac.uk/fss/courses/ling/corpus/Corpus3/3SIG.HTM
Here’s more on expected frequencies and the chi-square test: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ZUGKFoHUHQI.html&t
On type/token ratio:
Type/token ratio is a measure of lexical richness. In essence, it gives you an idea of how many distinct words (types) are used in a text relative to the total number of words (tokens). It is calculated by dividing the total number of types by the total number of tokens.
The closer the score is to 1, the richer the text (the more distinct words are used); the further it is from 1, the more repetition you have in the text.
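Both measures above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the comment gives the frequencies 128 and 110 but no corpus sizes, so the sizes used below are invented. The function names, the 2x2 layout (the word vs. all other tokens, in each corpus), and the whitespace tokenization are also my own choices. No continuity correction is applied.

```python
# Critical chi-square value for 1 degree of freedom at p = 0.05.
CRITICAL_VALUE_P05 = 3.841


def chi_square_2x2(freq_a, size_a, freq_b, size_b):
    """Pearson chi-square for one word's frequency in two corpora."""
    observed = [
        [freq_a, size_a - freq_a],  # corpus A: the word, all other tokens
        [freq_b, size_b - freq_b],  # corpus B: the word, all other tokens
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Frequency expected if only chance were involved.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def type_token_ratio(text):
    """Distinct words (types) divided by total words (tokens)."""
    tokens = text.lower().split()  # naive whitespace tokenization
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)


# ASSUMED corpus sizes (not given in the comment above).
equal_sizes = chi_square_2x2(128, 1_000_000, 110, 1_000_000)
unequal_sizes = chi_square_2x2(128, 2_000_000, 110, 1_000_000)
print(equal_sizes > CRITICAL_VALUE_P05)
print(unequal_sizes > CRITICAL_VALUE_P05)

print(type_token_ratio("the cat sat on the mat"))
```

With the two assumed corpora of equal size, the statistic stays below 3.841, so the 128 vs. 110 difference would not count as significant at p = 0.05. If the men's corpus were assumed to be twice as large, the difference would be significant, but in the opposite direction, because women would then use the word more often relative to corpus size. That is why corpus size has to be considered before comparing raw counts.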
Hi Sabrina! A reference corpus is a corpus that you choose as a standard of comparison for the corpus you're working with. It is usually more general and representative of the language as a whole, and it is large enough to represent all relevant varieties of the language and its features. Here's how it is useful. Say you are working with a corpus of biology texts, and you want to display a list of keywords that are particularly characteristic of the type of discourse or language contained in that biology corpus. In this case, you'd need to compare this 'specialized corpus' with a more general 'reference corpus' so as to see the list of words that are particular to 'biology'.
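To make the keyword comparison concrete, here is a minimal Python sketch. It scores each word with the log-likelihood statistic that is commonly used for keyness, which is one possible choice and not the only one. The two tiny "corpora" are invented toy sentences, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word in two corpora."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Frequencies expected if the word were equally common in both.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(study_tokens, ref_tokens, top_n=5):
    """Words relatively more frequent in the study corpus, best first."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Skip words that are not over-represented in the study corpus.
        if freq / n_study > ref[word] / n_ref:
            score = log_likelihood(freq, n_study, ref[word], n_ref)
            scored.append((word, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data standing in for a biology corpus and a general corpus.
biology = "the cell divides and the cell membrane protects the cell".split()
general = "the dog and the cat sat on the mat and the bird sang".split()

for word, score in keywords(biology, general):
    print(word, round(score, 2))
```

Common words such as 'the' drop out because they are just as frequent in the reference corpus, which is the whole point of having one.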
Not all monitor corpora can be used as a reference. A monitor corpus is one which grows in size over time. Still, the data that make up the corpus may not be general enough for it to be used as a reference. For instance, a monitor corpus of newspaper data is certainly not a general corpus, or one to be viewed as 'a standard' for comparison.
A corpus is intended to be a representative sample of authentic language use. There are various types of corpora, as you can see, so the specific research purposes vary depending on the type of corpus chosen. But the general aim, I would say, is to study how a language is used authentically in a given context (either generally, or across different regions, time periods, domains, etc.).