(LDC), of which LLD is a member.
For information on corpora from sources other than the LDC, see David Lee's list and Chris Manning's list.
Some of the corpora listed below can be downloaded directly from our private server (registered users only). Others are marked "CD" or "DVD", which means that we hold the data on CD or DVD in the LLD office.
| Catalog ID | Description | Format |
|---|---|---|
| LDC95T7 | Penn Treebank, Release 2 | download |
| LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 | download |
| LDC2005S25 | Santa Barbara Corpus of Spoken American English | 1 DVD |
| LDC2005S26 | CSLU: 22 Languages Corpus | 2 DVD |
| LDC2005T01 | Chinese Treebank 5.0 | download |
| LDC2005T06 | Chinese News Translation Text Part 1 | download |
| LDC2005T10 | Chinese English News Magazine Parallel Text | 1 CD |
| LDC2005T12 | English Gigaword Second Edition | 2 DVD |
| LDC2005T13 | CCGbank | download |
| LDC2005T14 | Chinese Gigaword Second Edition | 1 DVD |
| LDC2005T23 | Chinese Proposition Bank 1.0 | download |
| LDC2005T28 | HARD 2004 Text | 1 DVD |
| LDC2005T33 | BBN Pronoun Coreference and Entity Type Corpus | online |
| LDC2005T35 | ANC Second Release | 2 DVD |
| LDC2006S34 | Russian through Switched Telephone Network (RuSTeN) | 1 DVD |
| LDC2006T04 | Multiple Translation Chinese (MTC) Part 4 | download |
| LDC2006T13 | Web 1T 5-gram Version 1 | 6 DVD |
| LDC2006T17 | French Gigaword First Edition | 1 DVD |
| LDC2007T02 | English Chinese Translation Treebank v 1.0 | download |
| LDC2007T09 | ISI Chinese-English Automatically Extracted Parallel Text | download |
| LDC2007T40 | Arabic Gigaword Third Edition | 1 DVD |
