Statistical transformation of word frequency lists in corpus-based studies: the 2008 general election manifestos

Imran Ho Abdullah
Universiti Kebangsaan Malaysia

Abstract

This article examines the statistical transformation of wordlist frequencies in corpus-based studies. Using a case study of the 2008 General Election manifestos of four parties (BN, PKR, PAS and DAP), it compares the manifestos on the basis of raw frequencies, relative frequencies, and different techniques of frequency visualisation. The article also compares keywords generated from the frequency data with different statistical tests, such as chi-square and log-likelihood, and transforms the frequency data with a multivariate correspondence analysis to uncover the linguistic relationships between the manifestos in question. The results show that different statistical procedures can lead to different conclusions about the linguistic relationships between the texts.
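The measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of relative frequency, the two-term log-likelihood (G2) keyness statistic, and the Pearson chi-square on a 2x2 table without continuity correction; it is not the article's own procedure. The function names and all counts are hypothetical and serve only as illustration.

```python
import math


def relative_frequency(count, corpus_size, per=1000):
    """Normalise a raw count to a rate per `per` tokens."""
    return count / corpus_size * per


def log_likelihood(a, b, c, d):
    """Two-term G2 statistic for a word that occurs `a` times in a
    corpus of `c` tokens and `b` times in a corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing
            total += observed * math.log(observed / expected)
    return 2 * total


def chi_square(a, b, c, d):
    """Pearson chi-square (no continuity correction) on the 2x2 table
    [[a, b], [c - a, d - b]]; assumes the word occurs at least once."""
    n = c + d
    table = [[a, b], [c - a, d - b]]
    rows = [a + b, n - (a + b)]
    cols = [c, d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: a word seen 30 times in a 1,000-token manifesto
# and 10 times in another 1,000-token manifesto.
rate = relative_frequency(30, 1000)
g2 = log_likelihood(30, 10, 1000, 1000)
x2 = chi_square(30, 10, 1000, 1000)
```

The two statistics generally return different values for the same counts, and keyword rankings based on them can diverge, most noticeably for low-frequency words. This matches the abstract's point that the choice of procedure can affect the conclusions drawn.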
