Statistical transformation of word frequency lists in corpus-based studies: the 2008 election manifestos

  • Imran Ho Abdullah, Universiti Kebangsaan Malaysia


This article examines statistical transformations of word frequency information in corpus-based studies. Drawing on a case study of the 2008 election manifestos, it compares frequencies (raw frequency and relative frequency) and visualizations of the frequency data across the manifestos of BN, PKR, PAS and DAP. It then uses the frequencies to generate keywords by means of the chi-square test and the log-likelihood statistic. A statistical transformation of the word frequency data using a multivariate method, correspondence analysis, is also explored to assess the effectiveness of these various statistical methods. The findings and implications of the study show that statistical transformation of frequency data, and the type of statistic used, can influence the findings and lead to different conclusions about the linguistic relationships between texts in a corpus.
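
The frequency measures named in the abstract can be illustrated with a minimal Python sketch. It is not the article's own procedure, which the abstract does not spell out. It assumes a two-corpus comparison, a normalization base of 1,000 tokens, and the usual 2x2 contingency formulation of the chi-square and log-likelihood keyness statistics. The function names and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def relative_frequency(count, corpus_size, per=1000):
    """Raw count normalized to occurrences per `per` tokens (assumed base: 1,000)."""
    return count / corpus_size * per


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one word across two corpora.

    a, b: raw frequency of the word in corpus 1 and corpus 2
    c, d: total token counts of corpus 1 and corpus 2
    """
    # Expected frequencies if the word were spread evenly across both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    total = 0.0
    # A zero observed count contributes nothing (the limit of x*log(x) is 0).
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def chi_square(a, b, c, d):
    """Pearson chi-square on the 2x2 table (word vs. all other words, corpus 1 vs. 2)."""
    observed = [[a, b], [c - a, d - b]]
    n = c + d
    row_totals = [a + b, n - (a + b)]
    col_totals = [c, d]
    chi = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty cells to avoid division by zero
                chi += (observed[i][j] - expected) ** 2 / expected
    return chi


# Hypothetical counts: a word occurring 30 times in a 1,000-token manifesto
# and 10 times in another 1,000-token manifesto.
print(relative_frequency(30, 1000))
print(log_likelihood(30, 10, 1000, 1000))
print(chi_square(30, 10, 1000, 1000))
```

With one degree of freedom, a value above 3.84 on either statistic corresponds to p < 0.05, so the two measures can be compared word by word to see where they rank candidate keywords differently.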


How to cite
Abdullah, I. (2009). Transformasi statistik senarai kekerapan kata dalam kajian berasaskan korpus: manifesto pilihan raya 2008. Jurnal Bahasa, 9(2), 189-218. Retrieved from