BOBC
Resource type: Book Chapter
Language: English (en)
DOI: 10.1163/9789401206884_012
BibTeX citation key: UnserSchutz2011a
Categories: General
Keywords: Digitalization, Japan, Language, Manga
Creators: Baayen, Newman, Rice, Unser-Schutz
Publisher: Rodopi (Amsterdam [etc.])
Collection: Corpus-based Studies in Language Use, Language Learning, and Language Documentation
Abstract
Although demand for corpora drawn from media that mix visual and linguistic elements has grown in recent years alongside developments in corpus-based linguistic research, actually creating and designing such corpora presents many unique problems. Most centrally, much remains to be worked out about how to isolate their linguistic data and represent it meaningfully. In line with these trends, in this paper I introduce a 687,654-character (55,415-entry) corpus of the language used in Japanese comics (manga). Many of the issues encountered in its design also arise with other media that mix the visual with the linguistic, such as newspaper stories, advertisements, and political cartoons. Beyond describing how such unusual text could be of interest to other researchers, the paper sets out approaches that may help others undertaking similar projects.