Einzeltreffer — DigiBib

С развитием все более сложных методов автоматического анализа текста повышается важность задачи объяснения пользователю, почему прикладная интеллектуальная информационная система выделяет некоторые тексты как схожие по смыслу. В работе рассмотрены ограничения, которые такая постановка накладывает на используемые интеллектуальные алгоритмы. Проведенный авторами эксперимент показал, что абсолютное значение схожести документов не универсально по отношению к интеллектуальному алгоритму, поэтому оптимальную пороговую величину схожести необходимо устанавливать отдельно для каждой решаемой задачи. Полученные результаты могут быть использованы при оценке применимости различных методов установления смысловой схожести между документами в прикладных информационных системах, а также при выборе оптимальных параметров модели с учетом требований объяснимости решения.

The problem of providing a comprehensive explanation to any user why the applied intelligent information system suggests meaning similarity in certain texts imposes significant requirements on the intelligent algorithms. The article covers the entire set of technologies involved in the solution of the text clustering problem and several conclusions are stated thereof. Matrix decomposition aimed at reducing the dimension of the vector representation of a corpus does not provide clear explanatiom of the algorithmic principles to a user. Ranking using the TF-IDF function and its modifications finds a few documents that are similar in meaning, however, this method is the easiest for users to comprehend, since algorithms of this type detect specific matching words in the compared texts. Topic modeling methods (LSI, LDA, ARTM) assign large similarity values to texts despite a few matching words, while a person can easily tell that the general subject of the texts is the same. Yet the explanation of how topic modeling works requires additional effort for interpretation of the detected ones. This interpretation gets easier as the model quality grows, while the quality can be optimized by its average coherence. The experiment demonstrated that the absolute value of documents similarity is not invariant for different intelligent algorithms, so the optimal threshold value of similarity must be set separately for each problem to be solved. The results of the work can be further used to assess which of the various methods developed to detect meaning similarity in texts can be effectively implemented in applied information systems and to determine the optimal model parameters based on the solution explicability requirements.

Вычислительные технологии, Выпуск 5 (25) 2020, Pages 107-123

Titel:	Фактор объяснимости алгоритма в задачах поиска схожести текстовых документов
Link:	View record in OpenAIRE (Volltext)
Veröffentlichung:	Институт Вычислительных технологий СО РАН, 2020
Medientyp:	unknown
DOI:	10.25743/ict.2020.25.5.009
Schlagwort:	ranking function Numerical Analysis explainable artificial intelligence Computer Networks and Communications Computer science business.industry Applied Mathematics схожесть документов Pattern recognition функция ранжирования document similarity Computational Mathematics XAI Computational Theory and Mathematics Similarity (network science) Factor (programming language) Artificial intelligence business computer Software computer.programming_language
Sonstiges:	Nachgewiesen in: OpenAIRE Sprachen: Russian Language: Russian

Klicken Sie ein Format an und speichern Sie dann die Daten oder geben Sie eine Empfänger-Adresse ein und lassen Sie sich per Email zusenden.

BibTeX Citavi, JabRef, u.a.
(Literaturverwaltung)

PDF kein Volltext!
(Merkzettel, Notizen)

RIS Endnote, Citavi u.a.
(Literaturverwaltung)

MODS
(XML zur Weiterverarbeitung)

oder

Wählen Sie das für Sie passende Zitationsformat und kopieren Sie es dann in die Zwischenablage, lassen es sich per Mail zusenden oder speichern es als PDF-Datei.

Gewünschter Zitations-Stil:

oder

Bitte prüfen Sie, ob die Zitation formal korrekt ist, bevor Sie sie in einer Arbeit verwenden. Benutzen Sie gegebenenfalls den "Exportieren"-Dialog, wenn Sie ein Literaturverwaltungsprogramm verwenden und die Zitat-Angaben selbst formatieren wollen.

Фактор объяснимости алгоритма в задачах поиска схожести текстовых документов