java.lang.Object
  de.linguatools.disco.DISCO
public class DISCO
DISCO (Extracting DIStributionally Similar Words Using CO-occurrences) provides methods for computing the distributional (i.e. semantic) similarity between arbitrary words and for retrieving a word's collocations or its corpus frequency. It also provides a method for retrieving the words that are semantically most similar to a given word.
| Constructor Summary | |
|---|---|
| DISCO() | Deprecated. |
| DISCO(java.lang.String idxName, boolean loadIntoRAM) | With DISCO version 1.2, a complete word space can be loaded into RAM to speed up similarity computations. |
| Method Summary | |
|---|---|
| ReturnDataCol[] | collocations(java.lang.String word): Returns the collocations for the input word together with their significance values, ordered by significance value (highest significance first). |
| ReturnDataCol[] | collocations(java.lang.String idxName, java.lang.String word): Deprecated. |
| void | destroy(): This method closes the RAMDirectory where the word space is stored and sets all internal variables of the DISCO instance to null. |
| float | firstOrderSimilarity(java.lang.String w1, java.lang.String w2): Computes the first order similarity (according to Lin's vector similarity measure) between the input words based on their collocation sets. |
| float | firstOrderSimilarity(java.lang.String idxName, java.lang.String w1, java.lang.String w2): Deprecated. |
| int | frequency(java.lang.String word): Looks up the input word in the index and returns its frequency. |
| int | frequency(java.lang.String idxName, java.lang.String word): Deprecated. |
| int | numberOfWords(): Returns the number of Documents (i.e. words) in the index. |
| int | numberOfWords(java.lang.String idxName): Deprecated. |
| org.apache.lucene.document.Document | searchIndex(java.lang.String word): Searches for a word in index field "word" and returns the first hit Document or null. |
| org.apache.lucene.document.Document | searchIndex(java.lang.String idxName, java.lang.String word): Deprecated. |
| float | secondOrderSimilarity(java.lang.String w1, java.lang.String w2): Computes the second order similarity (according to Lin's measure) between the input words based on the sets of their distributionally similar words. |
| float | secondOrderSimilarity(java.lang.String idxName, java.lang.String w1, java.lang.String w2): Deprecated. |
| ReturnDataBN | similarWords(java.lang.String word): Looks up the input word in the index and returns its distributionally similar words ordered by decreasing similarity together with similarity values. |
| ReturnDataBN | similarWords(java.lang.String idxName, java.lang.String word): Deprecated. |
| ReturnDataCol[] | wordvector(java.lang.String word): Returns the collocations with their exact positions and their significance values, in other words the word vector representing the input word. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public DISCO(java.lang.String idxName,
        boolean loadIntoRAM)
        throws java.io.IOException
Parameters:
idxName - the word space directory
loadIntoRAM - if true the word space is loaded into RAM
Throws:
java.io.IOException

public DISCO()
Deprecated.
| Method Detail |
|---|
public org.apache.lucene.document.Document searchIndex(java.lang.String word)
        throws java.io.IOException
Parameters:
word - word to be looked up in index
Throws:
java.io.IOException

public org.apache.lucene.document.Document searchIndex(java.lang.String idxName,
        java.lang.String word)
        throws org.apache.lucene.index.CorruptIndexException,
        java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
word - word to be looked up in index
Throws:
org.apache.lucene.index.CorruptIndexException
java.io.IOException

public int numberOfWords()
        throws java.io.IOException
Throws:
java.io.IOException

public int numberOfWords(java.lang.String idxName)
        throws org.apache.lucene.index.CorruptIndexException,
        java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
Throws:
org.apache.lucene.index.CorruptIndexException
java.io.IOException

public int frequency(java.lang.String word)
        throws java.io.IOException
Parameters:
word - word to be looked up
Throws:
java.io.IOException

public int frequency(java.lang.String idxName,
        java.lang.String word)
        throws org.apache.lucene.index.CorruptIndexException,
        java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
word - word to be looked up
Throws:
org.apache.lucene.index.CorruptIndexException
java.io.IOException

public ReturnDataBN similarWords(java.lang.String word)
        throws java.io.IOException
Parameters:
word - word to be looked up
Throws:
java.io.IOException

public ReturnDataBN similarWords(java.lang.String idxName,
        java.lang.String word)
        throws java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
word - word to be looked up
Throws:
java.io.IOException
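
public ReturnDataCol[] collocations(java.lang.String word)
        throws java.io.IOException
Returns the collocations for the input word together with their significance values, ordered by significance value (highest significance first). In the returned data structure, relation is not set.
Parameters:
word - the input word
Throws:
java.io.IOException
See Also:
wordvector(java.lang.String)

public ReturnDataCol[] collocations(java.lang.String idxName,
        java.lang.String word)
        throws java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
word - input word
Throws:
java.io.IOException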

public ReturnDataCol[] wordvector(java.lang.String word)
        throws java.io.IOException
Parameters:
word - input word
Throws:
java.io.IOException

public float firstOrderSimilarity(java.lang.String w1,
        java.lang.String w2)
        throws java.io.IOException
Parameters:
w1 - input word #1
w2 - input word #2
Throws:
java.io.IOException

public float firstOrderSimilarity(java.lang.String idxName,
        java.lang.String w1,
        java.lang.String w2)
        throws java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
w1 - input word #1
w2 - input word #2
Throws:
java.io.IOException
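
The method summary states that firstOrderSimilarity uses Lin's vector similarity measure over the collocation sets of the two words. As a rough illustration of that measure, the sketch below divides the summed weights of the features shared by both vectors by the summed weights of all features. It does not call DISCO. The class name LinSimilaritySketch, the Map-based vector representation and the sample weights are assumptions made for this example, and whether DISCO computes exactly this formula internally is not confirmed by this page.

```java
import java.util.HashMap;
import java.util.Map;

public class LinSimilaritySketch {

    /**
     * Lin-style similarity between two sparse feature vectors.
     * Keys are features (for example collocates); values are
     * non-negative significance weights. Hypothetical helper,
     * not part of the DISCO API.
     */
    public static double lin(Map<String, Double> v1, Map<String, Double> v2) {
        // Numerator: for every feature present in both vectors,
        // add the weight from each side.
        double shared = 0.0;
        for (Map.Entry<String, Double> e : v1.entrySet()) {
            Double other = v2.get(e.getKey());
            if (other != null) {
                shared += e.getValue() + other;
            }
        }
        // Denominator: all weights of both vectors.
        double total = 0.0;
        for (double w : v1.values()) {
            total += w;
        }
        for (double w : v2.values()) {
            total += w;
        }
        // Two empty vectors are treated as having similarity 0.
        return total == 0.0 ? 0.0 : shared / total;
    }

    public static void main(String[] args) {
        // Made-up weights, for illustration only.
        Map<String, Double> coffee = new HashMap<>();
        coffee.put("drink", 2.0);
        coffee.put("cup", 3.0);
        coffee.put("bean", 1.0);

        Map<String, Double> tea = new HashMap<>();
        tea.put("drink", 2.0);
        tea.put("cup", 1.0);
        tea.put("leaf", 3.0);

        // Shared features: drink (2 + 2) and cup (3 + 1) give 8; all weights sum to 12.
        System.out.println(lin(coffee, tea));
    }
}
```

With this definition, identical non-empty vectors score 1 and vectors without any common feature score 0.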

public float secondOrderSimilarity(java.lang.String w1,
        java.lang.String w2)
        throws java.io.IOException
Parameters:
w1 - input word #1
w2 - input word #2
Throws:
java.io.IOException

public float secondOrderSimilarity(java.lang.String idxName,
        java.lang.String w1,
        java.lang.String w2)
        throws java.io.IOException
Deprecated.
Parameters:
idxName - name of index directory
w1 - input word #1
w2 - input word #2
Throws:
java.io.IOException

public void destroy()
This method closes the RAMDirectory where the word space is stored and sets all internal variables of the DISCO instance to null. Its sole purpose is to release the memory associated with a word space loaded into RAM. Any subsequent call on the DISCO instance will throw a NullPointerException. In most cases a program does not need to call this method, and a DISCO instance normally does not have to be destroyed after use.
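
The behaviour described for destroy() suggests releasing the instance only once, after the last lookup. The sketch below shows that pattern with a hypothetical stand-in class, InMemoryWordSpace, which is not the DISCO class and only mimics the documented effect: after destroy() its internal state is null, so a later call fails with a NullPointerException. All names and the sample frequency are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class DestroySketch {

    /** Hypothetical stand-in for a word space held in RAM; not the DISCO class. */
    public static class InMemoryWordSpace {
        private Map<String, Integer> frequencies = new HashMap<>();

        public InMemoryWordSpace() {
            frequencies.put("coffee", 42); // made-up value
        }

        public int frequency(String word) {
            // Throws NullPointerException once destroy() has been called.
            return frequencies.getOrDefault(word, 0);
        }

        /** Drops the internal reference so the memory can be reclaimed. */
        public void destroy() {
            frequencies = null;
        }
    }

    public static void main(String[] args) {
        InMemoryWordSpace space = new InMemoryWordSpace();
        try {
            // Do all lookups before releasing the instance.
            System.out.println(space.frequency("coffee"));
        } finally {
            space.destroy();
        }

        try {
            space.frequency("coffee");
        } catch (NullPointerException e) {
            System.out.println("instance is unusable after destroy()");
        }
    }
}
```

Placing the release in a finally block keeps it at a single point after all lookups, so no later code path touches the destroyed instance by accident.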