DISCO - Download
The DISCO API is open source and licensed under the Apache License, version 2.0. Most of the language data packets are also freely available (see table below).
New: Version 1.2 of the DISCO API can load language data packets (word spaces) into main memory, provided that you have enough RAM. This greatly reduces computation time. See the Javadoc.
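Whether a packet fits into memory depends on the heap available to the Java virtual machine, not just on the RAM installed. The following is a minimal sketch of such a check. It is not part of the DISCO API: the class `PacketMemoryCheck`, the method `fitsInHeap`, and the headroom factor of 1.5 are assumptions made for illustration, and the real memory footprint of a loaded word space may differ from its size on disk.

```java
// Hypothetical helper, NOT part of the DISCO API: a rough check of whether
// a language data packet of a given on-disk size is likely to fit into the
// JVM heap before attempting to load it into main memory.
final class PacketMemoryCheck {

    static final long MB = 1024L * 1024L;

    /**
     * Returns true if packetBytes, scaled by a safety factor, fits into
     * maxHeapBytes. The headroom factor is an assumption; the in-memory
     * size of a packet is not documented here.
     */
    static boolean fitsInHeap(long packetBytes, long maxHeapBytes, double headroom) {
        if (packetBytes < 0 || maxHeapBytes < 0 || headroom < 1.0) {
            throw new IllegalArgumentException("sizes must be >= 0 and headroom >= 1.0");
        }
        return packetBytes * headroom <= maxHeapBytes;
    }

    public static void main(String[] args) {
        // Maximum heap the JVM will try to use (controlled by the -Xmx option).
        long maxHeap = Runtime.getRuntime().maxMemory();

        // Packet size taken from the download table (ar-general-20120124: 518 MB).
        long packetBytes = 518 * MB;

        // 1.5 is an assumed safety margin, not a measured value.
        boolean ok = fitsInHeap(packetBytes, maxHeap, 1.5);
        System.out.println("Max heap: " + (maxHeap / MB) + " MB");
        System.out.println(ok
                ? "Packet will probably fit into memory."
                : "Packet is probably too large; raise -Xmx or load it from disk.");
    }
}
```

If the check fails, starting the JVM with a larger `-Xmx` value is the usual remedy, as long as the machine has the physical RAM to back it.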
You need the Java archive disco-1.2.jar and a language data packet from the table below. Click on a link in the column Packet Name for a packet description on the download page.
Other Downloads:
Old API version 1.1:
| Language | Packet Name | Corpus Size | Number of Words | Packet Size | License |
| --- | --- | --- | --- | --- | --- |
| Arabic | ar-general-20120124 | 188 million tokens | 134,000 | 518 MB | no commercial usage! |
| Czech | cz-general-20080115 | 163 million tokens | 300,000 | 5.6 GB | Apache 2.0 |
| Dutch | nl-general-20081004 | 114 million tokens | 200,000 | 4.0 GB | Apache 2.0 |
| English | en-BNC-20080721 | 119 million tokens | 122,000 | 1.7 GB | Apache 2.0 |
| English | en-PubMedOA-20070501 | 181 million tokens | 60,000 | 864 MB | Apache 2.0 |
| English | en-wikipedia-20080101 | 267 million tokens | 220,000 | 5.9 GB | Apache 2.0 |
| French | fr-wikipedia-20110201-lemma | 458 million tokens | 154,000 | 513 MB | Apache 2.0 |
| French | fr-wikipedia-20080713 | 105 million tokens | 188,000 | 2.4 GB | Apache 2.0 |
| German | de-general-20080727 | 400 million tokens | 200,000 | 3.6 GB | no commercial usage! |
| Italian | it-general-20080815 | 104 million tokens | 164,000 | 2.3 GB | Apache 2.0 |
| Russian | ru-wikipedia-20110804 | 230 million tokens | 112,000 | 544 MB | Apache 2.0 |
| Spanish | es-general-20080720 | 232 million tokens | 260,000 | 5.0 GB | no commercial usage! |

