<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ListRecords>

<record>
<header>
<identifier>oai:DiVA.org:uu-126666</identifier><datestamp>2010-08-25T14:00:43Z</datestamp><setSpec>uu</setSpec><setSpec>conferencePaper</setSpec><setSpec>refereed</setSpec><setSpec>PUB</setSpec></header>
<metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>Bootstrapping Language Description: The case of Mpiemo (Bantu A, Central African Republic)</dc:title><dc:identifier>http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-126666</dc:identifier><dc:date>2008</dc:date><dc:creator>Hammarstr&#246;m, Harald</dc:creator><dc:creator>Thornell, Christina</dc:creator><dc:creator>Petzell, Malin</dc:creator><dc:creator>Westerlund, Torbj&#246;rn</dc:creator><dc:description>Linguists have long been producing grammatical descriptions of as yet undescribed languages. This is a time-consuming process, which has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap analysis of collected data and speed up manual selection work. To be more precise, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful.
We exemplify this on Mpiemo, a so-far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.</dc:description><dc:subject>Mpiemo</dc:subject><dc:subject>Bantu A</dc:subject><dc:subject>Central African Republic</dc:subject><dc:subject>NLP</dc:subject><dc:subject>Latent Semantic Analysis</dc:subject><dc:subject>bootstrapping</dc:subject><dc:subject>African languages</dc:subject><dc:subject>Afrikanska spr&#229;k</dc:subject><dc:subject>Computational linguistics</dc:subject><dc:subject>Datorlingvistik</dc:subject><dc:language>en</dc:language><dc:publisher>Uppsala University, Department of Linguistics and Philology</dc:publisher><dc:publisher>Department of Computing Science, Chalmers University, Gothenburg</dc:publisher><dc:publisher>Department of African Languages, Gothenburg University, Gothenburg</dc:publisher><dc:publisher>Department of African Languages, Gothenburg University, Gothenburg</dc:publisher><dc:type>Conference paper</dc:type><dc:type>text</dc:type></oai_dc:dc></metadata><about><provenance><originDescription harvestDate="2010-06-30T05:17:00Z"><baseURL>http://www.diva-portal.org/oai/ntnu/OAI</baseURL><identifier>oai:DiVA.org:uu-126666</identifier><datestamp>2010-06-30T05:17:00Z</datestamp><metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace></originDescription></provenance></about></record>
</ListRecords>