Cluster

java.lang.Object
- de.linguatools.disco.Cluster

```
public class Cluster
extends java.lang.Object
```
This class provides methods that operate on sets of semantically similar words or collocations.

Constructor Summary

Constructors
Constructor and Description

Cluster()

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`clutoClusterSimilarityGraph(DISCO disco, int n, float minSim, java.lang.String outputDir)` Creates a sparse graph file that can be clustered with CLUTO's `scluster` program. Important note: This method only works with word spaces of type `DISCO.WordspaceType.SIM`!
`void`	`clutoClusterVectors(DISCO disco, java.util.ArrayList<java.lang.String> wordList, java.lang.String outputDir)` Creates a sparse matrix file for use with CLUTO's `vcluster` program.
`static ReturnDataBN`	`filterOutliers(DISCO disco, java.lang.String word, int n)` This method takes the list of the n most similar words of the input word and filters out all words that do not appear in the similarity list of at least one of the other similar words of the input word. The resulting list of similar words will have size <= n. Important note: This method only works with word spaces of type `DISCO.WordspaceType.SIM`.
`static java.lang.String[]`	`growSet(DISCO disco, java.lang.String[] inputSet)` Retrieves the similar words for all the words in the input set and extends the input set by all words that appear in the similarity lists of all the input words.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Cluster
```
public Cluster()
```
- Method Detail
  - filterOutliers
```
public static ReturnDataBN filterOutliers(DISCO disco,
                                          java.lang.String word,
                                          int n)
                                   throws java.io.IOException,
                                          WrongWordspaceTypeException
```
    This method takes the list of the n most similar words of the input word and removes every word that does not appear in the similarity list of at least one of the other words in that list.
    The resulting list of similar words therefore contains at most n entries.
    Important note: This method only works with word spaces of type DISCO.WordspaceType.SIM.
    
    Parameters:
    disco - DISCO word space of type DISCO.WordspaceType.SIM.
    word - input word (must be a single token).
    n - number of most similar words of the input word to consider.
    
    Returns:
    the filtered similarity list as a ReturnDataBN data structure, or null
    
    Throws:
    
    java.io.IOException
    
    WrongWordspaceTypeException - if the disco word space is not of type DISCO.WordspaceType.SIM.
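    The filtering rule described above can be illustrated without a word space. The following is a minimal sketch, not the DISCO implementation: the class `OutlierFilterSketch` and the in-memory map `simLists` (word to similarity list, ordered by decreasing similarity) are hypothetical stand-ins for the lookups that `filterOutliers` performs on a word space of type SIM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it is not part of de.linguatools.disco.
final class OutlierFilterSketch {

    // Keeps a candidate from the top-n list of `word` only if it also occurs
    // in the similarity list of at least one OTHER candidate in that list.
    static List<String> filterOutliers(Map<String, List<String>> simLists,
                                       String word, int n) {
        List<String> all = simLists.getOrDefault(word, List.of());
        List<String> topN = all.subList(0, Math.max(0, Math.min(n, all.size())));
        List<String> kept = new ArrayList<>();
        for (String candidate : topN) {
            for (String other : topN) {
                if (!other.equals(candidate)
                        && simLists.getOrDefault(other, List.of()).contains(candidate)) {
                    kept.add(candidate);
                    break; // one supporting neighbour is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy similarity lists (assumed data, for illustration only).
        Map<String, List<String>> simLists = Map.of(
                "apple", List.of("pear", "banana", "laptop"),
                "pear", List.of("apple", "banana"),
                "banana", List.of("pear", "apple"),
                "laptop", List.of("computer", "apple"));
        // "laptop" is supported by neither "pear" nor "banana", so it is dropped.
        System.out.println(filterOutliers(simLists, "apple", 3)); // [pear, banana]
    }
}
```

    The sketch returns a plain list; the real method returns a ReturnDataBN, whose fields are not described on this page.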
  - growSet
```
public static java.lang.String[] growSet(DISCO disco,
                                         java.lang.String[] inputSet)
                                  throws java.io.IOException,
                                         WrongWordspaceTypeException
```
    Retrieves the similar words for all the words in the input set and extends the input set by every word that appears in the similarity list of each input word. In other words, the intersection of the similarity lists of the input words is added to the input set.
    This method is comparable to Google Sets.
    Important note: This method only works with word spaces of type DISCO.WordspaceType.SIM!
    
    Parameters:
    disco - DISCO word space of type DISCO.WordspaceType.SIM.
    inputSet - set of input words (must be single tokens).
    
    Returns:
    input set plus the similar words that were found (if any)
    
    Throws:
    
    java.io.IOException
    
    WrongWordspaceTypeException - if the disco word space is not of type DISCO.WordspaceType.SIM.
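    The intersection step described above can be sketched in plain Java. This is an illustration under stated assumptions, not the DISCO implementation: the class `GrowSetSketch` and the map `simLists` (word to similarity list) are hypothetical and replace the word space lookups.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it is not part of de.linguatools.disco.
final class GrowSetSketch {

    // Returns the input words followed by every word that occurs in the
    // similarity list of each input word (the intersection of the lists).
    static String[] growSet(Map<String, List<String>> simLists, String[] inputSet) {
        Set<String> common = null;
        for (String word : inputSet) {
            List<String> sims = simLists.getOrDefault(word, List.of());
            if (common == null) {
                common = new LinkedHashSet<>(sims); // start with the first list
            } else {
                common.retainAll(sims);             // intersect with each further list
            }
        }
        Set<String> result = new LinkedHashSet<>(Arrays.asList(inputSet));
        if (common != null) {
            result.addAll(common);
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Toy similarity lists (assumed data, for illustration only).
        Map<String, List<String>> simLists = Map.of(
                "red", List.of("blue", "green", "wine"),
                "blue", List.of("red", "green", "sky"));
        // Only "green" occurs in both lists, so it is the only word added.
        System.out.println(Arrays.toString(
                growSet(simLists, new String[] {"red", "blue"}))); // [red, blue, green]
    }
}
```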
  - clutoClusterSimilarityGraph
```
public void clutoClusterSimilarityGraph(DISCO disco,
                                        int n,
                                        float minSim,
                                        java.lang.String outputDir)
                                 throws org.apache.lucene.index.CorruptIndexException,
                                        java.io.IOException,
                                        WrongWordspaceTypeException
```
    Creates a sparse graph file that can be clustered with CLUTO's scluster program.
    Important note: This method only works with word spaces of type DISCO.WordspaceType.SIM!
    
    Parameters:
    disco - DISCO word space loaded into RAM. The word space has to be of type DISCO.WordspaceType.SIM.
    n - cluster the first n words in the word space index.
    minSim - create an edge between words that have a similarity value of at least minSim.
    outputDir - output directory. Two files are created in the output directory outputDir: sparseGraph.dat and rowLabels.dat. Existing files with these names are overwritten.
    
    Throws:
    
    org.apache.lucene.index.CorruptIndexException
    
    java.io.IOException
    
    WrongWordspaceTypeException - if the disco word space is not of type DISCO.WordspaceType.SIM.
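    The role of minSim can be sketched as follows. This is not the DISCO implementation: the class `SparseGraphSketch` and the dense similarity matrix `sim` are hypothetical, and the line layout (a header with the vertex count and the number of stored entries, then one line per vertex holding 1-based neighbour index and weight pairs) is an assumption about CLUTO's sparse graph format that should be checked against the CLUTO manual.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it is not part of de.linguatools.disco.
final class SparseGraphSketch {

    // Builds the lines of a sparse graph file: an edge i-j is written only
    // if sim[i][j] >= minSim. The layout is assumed, see the text above.
    static List<String> toSparseGraph(float[][] sim, float minSim) {
        int n = sim.length;
        List<String> rows = new ArrayList<>();
        int nonZero = 0;
        for (int i = 0; i < n; i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < n; j++) {
                if (i != j && sim[i][j] >= minSim) {
                    if (row.length() > 0) {
                        row.append(' ');
                    }
                    // Neighbour indices are written 1-based.
                    row.append(j + 1).append(' ')
                       .append(String.format(Locale.ROOT, "%.4f", sim[i][j]));
                    nonZero++;
                }
            }
            rows.add(row.toString()); // an isolated vertex yields an empty line
        }
        List<String> lines = new ArrayList<>();
        lines.add(n + " " + nonZero);
        lines.addAll(rows);
        return lines;
    }

    public static void main(String[] args) {
        // Toy symmetric similarity matrix for three words (assumed data).
        float[][] sim = {
                {1.0f, 0.8f, 0.1f},
                {0.8f, 1.0f, 0.3f},
                {0.1f, 0.3f, 1.0f}};
        // With minSim = 0.5 only the edge between words 1 and 2 survives.
        toSparseGraph(sim, 0.5f).forEach(System.out::println);
    }
}
```

    In the real method the vertices are the first n words of the word space index and the row order is recorded in rowLabels.dat; the sketch leaves out both the word space access and the file output.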
  - clutoClusterVectors
```
public void clutoClusterVectors(DISCO disco,
                                java.util.ArrayList<java.lang.String> wordList,
                                java.lang.String outputDir)
                         throws java.io.IOException
```
    Creates a sparse matrix file for use with CLUTO's vcluster program. For every word in the word list, its word vector is retrieved from the DISCO index and written to the sparse matrix file. A row label file that maps the row numbers to the words is also created.
    
    Parameters:
    disco - DISCO word space loaded into RAM. The word space may be of any type.
    wordList - list of words to be clustered.
    outputDir - output directory. Two files are created in the output directory outputDir: sparseMatrix.dat and rowLabels.dat. Existing files with these names are overwritten.
    
    Throws:
    
    java.io.IOException
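    The mapping from word vectors to matrix rows can be sketched as follows. This is not the DISCO implementation: the class `SparseMatrixSketch` and the map `vectors` (word to feature weights) are hypothetical, and the layout (a header with rows, columns and non-zero entries, then one line per row holding 1-based column index and value pairs) is an assumption about CLUTO's sparse matrix format that should be checked against the CLUTO manual.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it is not part of de.linguatools.disco.
final class SparseMatrixSketch {

    // Builds the lines of a sparse matrix file, one row per word in wordList.
    // Each distinct feature gets the next free 1-based column index.
    static List<String> matrixLines(List<String> wordList,
                                    Map<String, Map<String, Float>> vectors) {
        Map<String, Integer> columnOf = new LinkedHashMap<>();
        List<String> rows = new ArrayList<>();
        int nonZero = 0;
        for (String word : wordList) {
            StringBuilder row = new StringBuilder();
            Map<String, Float> vector = vectors.getOrDefault(word, Map.of());
            for (Map.Entry<String, Float> entry : vector.entrySet()) {
                Integer column = columnOf.get(entry.getKey());
                if (column == null) {
                    column = columnOf.size() + 1;
                    columnOf.put(entry.getKey(), column);
                }
                if (row.length() > 0) {
                    row.append(' ');
                }
                row.append(column).append(' ').append(entry.getValue());
                nonZero++;
            }
            rows.add(row.toString());
        }
        List<String> lines = new ArrayList<>();
        lines.add(wordList.size() + " " + columnOf.size() + " " + nonZero);
        lines.addAll(rows);
        return lines;
    }

    public static void main(String[] args) {
        // Toy word vectors (assumed data); LinkedHashMap keeps feature order stable.
        Map<String, Float> cat = new LinkedHashMap<>();
        cat.put("purr", 2.0f);
        cat.put("pet", 0.5f);
        Map<String, Float> dog = new LinkedHashMap<>();
        dog.put("pet", 1.5f);
        dog.put("bark", 3.0f);
        Map<String, Map<String, Float>> vectors = new LinkedHashMap<>();
        vectors.put("cat", cat);
        vectors.put("dog", dog);
        // Row i of the matrix belongs to the i-th word, which is what a row label file records.
        matrixLines(List.of("cat", "dog"), vectors).forEach(System.out::println);
    }
}
```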
