A Hybrid Word-Character Approach to Abstractive Summarization
2018
Hochschulschrift
Zugriff:
106
Automatic abstractive text summarization is an important and challeng- ing research topic of natural language processing. Among many widely used languages, the Chinese language has a special property that a Chinese char- acter contains rich information comparable to a word. Existing Chinese text summarization methods, either adopt totally character-based or word-based representations, fail to fully exploit the information carried by both repre- sentations. To accurately capture the essence of articles, we propose a hy- brid word-character approach (HWC) which preserves the advantages of both word-based and character-based representations. We evaluate the advantage of the proposed HWC approach by applying it to two existing methods, and discover that it generates state-of-the-art performance with a margin of 24 ROUGE points on a widely used dataset LCSTS. In addition, we find an is- sue contained in the LCSTS dataset and offer a script to remove overlapping pairs (a summary and a short text) to create a clean dataset for the commu- nity. The proposed HWC approach also generates the best performance on the new, clean LCSTS dataset.
Titel: |
A Hybrid Word-Character Approach to Abstractive Summarization
|
---|---|
Autor/in / Beteiligte Person: | Chang, Chieh-Teng ; 張介騰 |
Link: | |
Veröffentlichung: | 2018 |
Medientyp: | Hochschulschrift |
Sonstiges: |
|