Extraction of Professional Details from Web-URLs using DeepDive.
In: Procedia Computer Science, Vol. 132 (2018-04-01), pp. 1602-1610
Online
academicJournal
Manual extraction of data from unstructured sources such as websites is labour-intensive and becomes almost infeasible at large scale. Recent state-of-the-art information extraction techniques show encouraging results. In this work, we attempt to extract professional details such as name, email, address, contact number and specialization from doctors' home pages. The work covers two scenarios: in the first, a website contains the details of a single doctor; in the second, a website may contain information about multiple doctors or other professionals at the same time. We treat the problem as a relation extraction task within information extraction. The proposed solution is built on top of DeepDive, a tool developed at Stanford. In both scenarios, DeepDive takes pre-processed sentences as input and constructs entity-relations. For each entity-relation, DeepDive computes the probability that the relation is a correct match, using distant supervision and user-defined heuristic rules. In Experiment 1, our system achieves 69.14% accuracy for name, 88.67% for location and 100% for email, contact number and specialization. In Experiment 2, the observed probabilities are less conclusive, mostly between 0.5 and 0.7; we outline possible remedies as future work. The techniques presented here can easily be generalized to other types of professionals, not just doctors. [ABSTRACT FROM AUTHOR]
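The abstract only summarises the pipeline: candidate entity-relations are built from pre-processed sentences, and each is given a probability derived from distant supervision and heuristic rules. The Python sketch below illustrates that idea for a single (name, email) relation. It is not the authors' code and does not use DeepDive's API. The candidate generator, the rule functions, the `WEIGHTS` values and the logistic combination are all hypothetical. DeepDive learns rule weights through statistical inference over a factor graph, whereas this toy version hard-codes them.

```python
import math
import re
from dataclasses import dataclass

# Loose email pattern (assumption); real pages need more robust normalisation.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical hand-set weights. DeepDive would learn these from data instead.
WEIGHTS = {"name_in_email": 2.0, "far_apart": 1.0, "generic_mailbox": 1.5}
GENERIC_MAILBOXES = {"info", "contact", "admin", "office", "reception"}


@dataclass(frozen=True)
class Candidate:
    name: str
    email: str
    distance: int  # token distance between the name mention and the email


def generate_candidates(tokens, name_mentions):
    """Pair every name mention with every email-like token in one sentence.

    name_mentions: list of (start_token_index, name_string), assumed to come
    from an upstream named-entity recognition step.
    """
    emails = [(i, tok) for i, tok in enumerate(tokens) if EMAIL_RE.fullmatch(tok)]
    return [
        Candidate(name, email, abs(e_idx - n_idx))
        for n_idx, name in name_mentions
        for e_idx, email in emails
    ]


# Heuristic rules: each returns +1 (supports), -1 (contradicts) or 0 (abstains).
def name_in_email(c):
    local = c.email.split("@")[0].lower()
    parts = [p.lower() for p in c.name.split() if len(p) > 2]
    return 1 if any(p in local for p in parts) else 0


def far_apart(c):
    return -1 if c.distance > 15 else 0


def generic_mailbox(c):
    return -1 if c.email.split("@")[0].lower() in GENERIC_MAILBOXES else 0


RULES = {
    "name_in_email": name_in_email,
    "far_apart": far_apart,
    "generic_mailbox": generic_mailbox,
}


def probability(c):
    """Logistic function of the weighted rule votes (toy stand-in for learned inference)."""
    score = sum(WEIGHTS[rule_name] * rule(c) for rule_name, rule in RULES.items())
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    tokens = "Dr. Jane Smith : jsmith@clinic.org , general queries : info@clinic.org".split()
    for cand in generate_candidates(tokens, [(1, "Jane Smith")]):
        print(cand.email, round(probability(cand), 3))
```

With this made-up input, the personal address scores about 0.881 and the generic `info@` mailbox about 0.182. The sketch shows how independent weak rules can be combined into a single match probability. It should not be read as a reproduction of the paper's rules or reported numbers.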
Title: | Extraction of Professional Details from Web-URLs using DeepDive. |
---|---|
Author / Contributor: | Vyas, Aditya; Kadakia, Urmil; Jat, Pokhar Mal |
Journal: | Procedia Computer Science, Vol. 132 (2018-04-01), pp. 1602-1610 |
Published: | 2018 |
Media type: | academicJournal |
ISSN: | 1877-0509 (print) |
DOI: | 10.1016/j.procs.2018.05.125 |