Layout- and Activity-based Textbook Modeling for Automatic PDF Textbook Extraction
In: Proceedings of the 5th International Workshop on Intelligent Textbooks (iTextbooks 2023) co-located with the 24th International Conference on Artificial Intelligence in Education (AIED 2023) ; Intelligent Textbooks 2023, Jul 2023, Tokyo, Japan. pp. 37-53 ; https://hal.science/hal-04184895 ; https://ceur-ws.org/Vol-3444/, 2023
Online
Conference
International audience ; Ensuring accessible textbooks for children with disabilities is essential for inclusive education. However, providing native accessibility for educational content remains a challenge. In the meantime, existing educational materials need to be adapted, for example by providing interactive versions that overcome difficulties caused by disabilities. In this context, our project aims to automatically adapt PDF textbooks to make them accessible to children with disabilities. The first step towards this adaptation is to extract and structure the content of textbooks. In this paper, we introduce textbook models, propose an automated extraction pipeline, and conduct preliminary experiments. Our textbook models are based on the various activities involved and provide layout and semantic information. They enable normalized and structured representations of educational content at both document and page levels, facilitating the automatic extraction process and the conversion to popular formats such as TEI and DocBook. Our experiments on automatically extracting the structure of PDF textbooks, which use a state-of-the-art multimodal transformer for a token classification task, show promising results. However, these experiments also highlight the difficulty of the task, especially generalization across textbook collections. Finally, we discuss the extraction pipeline and directions for future work.
Title: | Layout- and Activity-based Textbook Modeling for Automatic PDF Textbook Extraction |
---|---|
Author / Contributor: | Lincker, Elise ; Pons, Olivier ; Guinaudeau, Camille ; Barbet, Isabelle ; Dupire, Jérôme ; Hudelot, Céline ; Mousseau, Vincent ; Huron, Caroline ; CEDRIC - Interactivité pour Lire et Jouer (CEDRIC - ILJ) ; Centre d'études et de recherche en informatique et communications (CEDRIC) ; Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE)-Conservatoire National des Arts et Métiers CNAM (CNAM) ; HESAM Université - Communauté d'universités et d'établissements Hautes écoles Sorbonne Arts et métiers université (HESAM)-HESAM Université - Communauté d'universités et d'établissements Hautes écoles Sorbonne Arts et métiers université (HESAM)-Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE)-Conservatoire National des Arts et Métiers CNAM (CNAM) ; HESAM Université - Communauté d'universités et d'établissements Hautes écoles Sorbonne Arts et métiers université (HESAM)-HESAM Université - Communauté d'universités et d'établissements Hautes écoles Sorbonne Arts et métiers université (HESAM) ; CEDRIC. Systèmes sûrs (CEDRIC - SYS) ; Japanese French Laboratory for Informatics (JFLI) ; National Institute of Informatics (NII)-The University of Tokyo (UTokyo)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS) ; Paris-Saclay, Université ; Mathématiques et Informatique pour la Complexité et les Systèmes (MICS) ; Paris-Saclay, CentraleSupélec-Université ; Evolution et ingénierie de systèmes dynamiques (SEED (UMR-S 1284/U 1284)) ; Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris Cité (UPCité) ; Learning Planet Institute Paris (LPI) ; Sosnovsky, Sergey ; Brusilovsky, Peter ; Lan, Andrew ; ANR-21-CE38-0014,MALIN,MAnuels scoLaires INclusifs(2021) |
Journal: | Proceedings of the 5th International Workshop on Intelligent Textbooks (iTextbooks 2023) co-located with the 24th International Conference on Artificial Intelligence in Education (AIED 2023) ; Intelligent Textbooks 2023, Jul 2023, Tokyo, Japan. pp. 37-53 ; https://hal.science/hal-04184895 ; https://ceur-ws.org/Vol-3444/, 2023 |
Publication: | HAL CCSD ; CEUR-WS, 2023 |
Media type: | Conference |