Opening new doors for psycholinguistic research: word and sound frequencies of Cantonese
Abstract
Our study analyzed and compared three Cantonese corpora (large textual databases) to document the type frequencies (counts of distinct units) and token frequencies (counts of all occurrences) of words and of the sounds within them. Frequencies were calculated for each corpus, and the correlations and similarities among the corpora were analyzed. The analyses reveal that while the three corpora are similar in many regards, they are statistically independent and should not be used interchangeably. We hope this study will help future empirical work make an informed choice of which corpus to base its investigation on. Furthermore, the documented word and sound frequencies can support a range of research on natural language, such as the study of speech errors, which have been shown to be sensitive to the frequencies of words and their component sounds. By pinning down the impact of frequencies on speech errors, we can better understand the nature of normal (non-erroneous) speech processes.
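The type/token distinction and the idea of correlating frequencies across corpora can be made concrete with a short Python sketch. It is not the authors' pipeline. The toy "corpora", the Jyutping word strings, treating syllables as the "sounds" inside words, and using Pearson's r over word types shared by both corpora are all illustrative assumptions.

```python
import re
from collections import Counter
from math import sqrt

# Assumption: a "sound" is approximated here as a Jyutping syllable (letters + tone digit 1-6).
SYLLABLE = re.compile(r"[a-z]+[1-6]")

def type_token_counts(tokens):
    """Return (number of types, number of tokens, per-type frequency table)."""
    freq = Counter(tokens)
    return len(freq), sum(freq.values()), freq

def syllable_counts(tokens):
    """Token frequencies of syllables, counted inside every word token."""
    return Counter(syl for word in tokens for syl in SYLLABLE.findall(word))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def compare_corpora(tokens_a, tokens_b):
    """Correlate token frequencies of the word types that occur in both corpora."""
    _, _, fa = type_token_counts(tokens_a)
    _, _, fb = type_token_counts(tokens_b)
    shared = sorted(set(fa) & set(fb))
    return pearson([fa[w] for w in shared], [fb[w] for w in shared])

# Hypothetical toy corpora of pre-segmented words (not real corpus data).
corpus_a = ["ngo5", "hai6", "ngo5", "sik6faan6", "hai6", "ngo5", "nei5"]
corpus_b = ["ngo5", "hai6", "sik6faan6", "sik6faan6", "hai6", "ngo5", "ngo5", "nei5"]

print(type_token_counts(corpus_a)[:2])  # (4, 7): 4 word types, 7 word tokens
print(sum(syllable_counts(corpus_a).values()))  # 8 syllable tokens
print(round(compare_corpora(corpus_a, corpus_b), 3))  # 0.853
```

Restricting the correlation to shared types is one design choice among several; a real comparison might instead include types missing from one corpus as zero counts, log-transform frequencies, or use a rank correlation. A high correlation alone also does not establish that two corpora can be used interchangeably.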