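Latento Dirihlē sadalījumu modeļa izmantojums laikraksta Latvijas Kareivis tematu analīzē: Oskara Kalpaka gadījuma izpēte [Application of the latent Dirichlet allocation model to topic analysis of the newspaper Latvijas Kareivis: a case study of Oskars Kalpaks]. (Latvian)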
In: Letonica, 2022-10-01, Issue 47, pp. 150-166
academicJournal
The paper presents a case study applying the latent Dirichlet allocation (LDA) model to the analysis of topics in the corpus of the historical daily newspaper of the Latvian armed forces, Latvian Soldier (Latvijas Kareivis; 1925–1940). Although topic modelling is one of the most popular techniques for analysing text in the digital humanities, the methodology has not been extensively tested on texts in Latvian. The case study was conducted to explore the possibility of implementing topic models as new functionality for exploring newspapers in the digital library of the National Library of Latvia. To simulate different use cases of topic modelling, two models were created: a 50-topic model of the whole Latvian Soldier corpus, and a six-topic model of a subcorpus compiled from articles that contain the name ‘Kalpaks’. Both models produced usable, semantically coherent topics that could aid the exploration of historical newspapers. The authors conclude that the quality of the models in their current state is sufficient to follow the approach of topic instrumentalism, which views topics as incomplete representations of texts that usefully augment the investigative process. The resulting topic models seem particularly useful for combining the research practices of distant and close reading. Further testing and adjustment of the parameters are needed to produce concise and unambiguous topics that could be reliably used in research situations where extensive analysis and verification of the sources are not expected. [ABSTRACT FROM AUTHOR]
Since 1999, the National Library of Latvia (LNB) has been digitising its collections of historical newspapers, books, images, audio, and video (Krūmiņa 2012; Zariņš 2014). Text collections have received the most attention; it is estimated that the digitised newspaper collections cover more than 80% of the periodical materials published up to the mid-1990s. The materials have been segmented and processed with optical character recognition, so users can search for words in the full text. However, given the development of language technologies and current trends in digital humanities research, there is demand for new services that would offer further opportunities for in-depth exploration of text documents (Ehrmann et al. 2020; Ūdre et al. 2019). [ABSTRACT FROM AUTHOR]
Copyright of Letonica is the property of University of Latvia, Institute of Literature, Folklore & Art and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Title: | Latento Dirihlē sadalījumu modeļa izmantojums laikraksta Latvijas Kareivis tematu analīzē: Oskara Kalpaka gadījuma izpēte. (Latvian) |
---|---|
Author / Contributor: | Baklāne, Anda ; Saulespurēns, Valdis |
Journal: | Letonica, 2022-10-01, Issue 47, pp. 150-166 |
Published: | 2022 |
Media type: | academicJournal |
ISSN: | 1407-3110 (print) |
DOI: | 10.35539/LTNC.2022.0047.A.B.V.S.150.167 |