C4Corpus (CC BY-SA part)
In: https://dkpro.github.io/dkpro-c4corpus/, 2016
Online
unknown
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Title: | C4Corpus (CC BY-SA part) |
---|---|
Author / Contributor: | Gurevych, Iryna ; Habernal, Ivan ; Zayed, Omnia |
Link: | |
Journal: | https://dkpro.github.io/dkpro-c4corpus/, 2016 |
Publication: | Technische Universität Darmstadt, 2016 |
Media type: | unknown |
Keyword: | |
Other: | |