- closingTag - Variable in class de.linguatools.disco.ConfigFile
-
- Cluster - Class in de.linguatools.disco
-
This class provides methods that operate on sets of semantically similar
words or collocations.
- Cluster() - Constructor for class de.linguatools.disco.Cluster
-
- clutoClusterSimilarityGraph(DISCO, int, float, String) - Method in class de.linguatools.disco.Cluster
-
Creates a sparse graph file that can be clustered with CLUTO's scluster
program.
Important note: This method only works with word spaces of type
DISCO.WordspaceType.SIM!
- clutoClusterVectors(DISCO, ArrayList<String>, String) - Method in class de.linguatools.disco.Cluster
-
Creates a sparse matrix file for use with CLUTO's vcluster program.
- collocationalValue(String, String) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Returns the collocational strength between words w1 and w2, summed up
over all relations.
- collocations(String) - Method in class de.linguatools.disco.DenseMatrix
-
- collocations(String) - Method in class de.linguatools.disco.DISCO
-
Returns the collocations for the input word together with their
significance values, ordered by significance value (highest significance
first).
- collocations(String) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Returns the collocations for the input word together with their
significance values, ordered by significance value (highest significance
first).
- compareTo(Rank.WordAndRank) - Method in class de.linguatools.disco.Rank.WordAndRank
-
- compareTo(ReturnDataCol) - Method in class de.linguatools.disco.ReturnDataCol
-
Sorts from largest to smallest.
- composeVectorsByCombinedMultAdd(Map<String, Float>, Map<String, Float>, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Compose vectors wv1 and wv2 by a combination of addition and
multiplication:
p = a*wv1 + b*wv2 + c*wv1*wv2
The contribution of multiplication and addition, as well
as the contribution of each of the two vectors can be controlled by the
three parameters a, b and c.
For instance, in Mitchell and Lapata 2008 where wv1 is a verb and wv2 is
a noun, the parameters a, b and c are set as follows:
a = 0.95
b = 0
c = 0.05.
If one of a, b, c is null, then these default values are used.
- composeVectorsByCombinedMultAdd(float[], float[], Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Compose vectors wv1 and wv2 by a combination of addition and
multiplication:
p = a*wv1 + b*wv2 + c*wv1*wv2
The contribution of multiplication and addition, as well
as the contribution of each of the two vectors can be controlled by the
three parameters a, b and c.
For instance, in Mitchell and Lapata 2008 where wv1 is a verb and wv2 is
a noun, the parameters a, b and c are set as follows:
a = 0.95
b = 0
c = 0.05.
If one of a, b, c is null, then these default values are used.
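The formula above can be sketched for dense vectors as follows. This is a minimal illustration, not the library's implementation: the class name CombinedMultAddSketch and the method name compose are hypothetical, wv1*wv2 is assumed to mean the element-wise product, and the sketch assumes that all three defaults are applied as soon as any one of a, b, c is null.

```java
// Hypothetical sketch of p = a*wv1 + b*wv2 + c*wv1*wv2 for dense vectors.
final class CombinedMultAddSketch {

    static float[] compose(float[] wv1, float[] wv2, Float a, Float b, Float c) {
        if (wv1.length != wv2.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        // Assumption: if any parameter is missing, fall back to all three
        // defaults quoted in the description (a = 0.95, b = 0, c = 0.05).
        if (a == null || b == null || c == null) {
            a = 0.95f;
            b = 0.0f;
            c = 0.05f;
        }
        float[] p = new float[wv1.length];
        for (int i = 0; i < wv1.length; i++) {
            // Additive part plus the (assumed) element-wise multiplicative part.
            p[i] = a * wv1[i] + b * wv2[i] + c * wv1[i] * wv2[i];
        }
        return p;
    }
}
```

With the default parameters, wv1 dominates the result and the product term only adds a small correction in dimensions where both vectors are non-zero.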
- composeVectorsByDilation(Map<String, Float>, Map<String, Float>, Float) - Static method in class de.linguatools.disco.Compositionality
-
The following formula is used:
(wv1*wv1)wv2 + (lambda-1)(wv1*wv2)wv1
The default value (if lambda is null) for lambda is 2.0.
This composition method only works with the SimilarityMeasures.COSINE
similarity measure.
- composeVectorsByDilation(float[], float[], Float) - Static method in class de.linguatools.disco.Compositionality
-
The following formula is used:
(wv1*wv1)wv2 + (lambda-1)(wv1*wv2)wv1
The default value (if lambda is null) for lambda is 2.0.
This composition method only works with the SimilarityMeasures.COSINE
similarity measure.
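A minimal sketch of the dilation formula for dense vectors, assuming that the products in parentheses, (wv1*wv1) and (wv1*wv2), are dot products that scale the vector that follows them. The class name DilationSketch and its method names are hypothetical and not part of the library.

```java
// Hypothetical sketch of (wv1*wv1)wv2 + (lambda-1)(wv1*wv2)wv1.
final class DilationSketch {

    // Dot product of two equally long dense vectors.
    static float dot(float[] x, float[] y) {
        float sum = 0f;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    static float[] dilate(float[] wv1, float[] wv2, Float lambda) {
        if (wv1.length != wv2.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        // Default from the description: lambda = 2.0 when null is passed.
        float l = (lambda == null) ? 2.0f : lambda;
        float uu = dot(wv1, wv1);
        float uv = dot(wv1, wv2);
        float[] p = new float[wv1.length];
        for (int i = 0; i < wv1.length; i++) {
            p[i] = uu * wv2[i] + (l - 1f) * uv * wv1[i];
        }
        return p;
    }
}
```

With lambda = 1 the second term vanishes and the result is wv2 scaled by the squared length of wv1; larger values of lambda stretch wv2 further along the direction of wv1.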
- composeWordVectors(Map<String, Float>, Map<String, Float>, Compositionality.VectorCompositionMethod, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Compose two word vectors by the composition method given in
compositionMethod.
- composeWordVectors(float[], float[], Compositionality.VectorCompositionMethod, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Compose two word vectors by the composition method given in
compositionMethod.
- composeWordVectors(ArrayList<Map<String, Float>>, Compositionality.VectorCompositionMethod, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Compose two or more word vectors by the composition method given in
compositionMethod.
- composeWordVectors(List<float[]>, Compositionality.VectorCompositionMethod, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Compose two or more word vectors by the composition method given in
compositionMethod.
- Compositionality - Class in de.linguatools.disco
-
This class provides support for compositional distributional semantics.
- Compositionality() - Constructor for class de.linguatools.disco.Compositionality
-
- Compositionality.VectorCompositionMethod - Enum in de.linguatools.disco
-
Implemented methods of vector composition.
- compositionalSemanticSimilarity(String, String, Compositionality.VectorCompositionMethod, DISCO.SimilarityMeasure, DISCO, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
This method computes the semantic similarity between two multi-word terms,
phrases, sentences or paragraphs.
- computeAvgDenseOffsetVector(List<String[]>, DISCO) - Static method in class de.linguatools.disco.Compositionality
-
Computes the average vector over all offset vectors in the wordPairs
list.
- computeSimilarity(float[], float[]) - Method in class de.linguatools.disco.CosineVectorSimilarity
-
- computeSimilarity(Map<String, Float>, Map<String, Float>) - Method in class de.linguatools.disco.CosineVectorSimilarity
-
- computeSimilarity(Document, Document) - Method in class de.linguatools.disco.CosineVectorSimilarity
-
- computeSimilarity(float[], float[]) - Method in class de.linguatools.disco.KolbVectorSimilarity
-
- computeSimilarity(Document, Document) - Method in class de.linguatools.disco.KolbVectorSimilarity
-
This method compares two word vectors from a DISCOLuceneIndex using the
similarity measure SimilarityMeasures.KOLB that is described in the paper
Peter Kolb.
- computeSimilarity(Map<String, Float>, Map<String, Float>) - Method in class de.linguatools.disco.KolbVectorSimilarity
-
- computeSimilarity(float[], float[]) - Method in interface de.linguatools.disco.VectorSimilarity
-
Compute similarity between two dense vectors.
- computeSimilarity(Map<String, Float>, Map<String, Float>) - Method in interface de.linguatools.disco.VectorSimilarity
-
Compute similarity between two sparse vectors.
- computeSimilarity(Document, Document) - Method in interface de.linguatools.disco.VectorSimilarity
-
Compute similarity between two vectors stored in Lucene Documents.
- computeWordVector(String[], Compositionality.VectorCompositionMethod, DISCO, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Construct a word vector that represents the multi-word phrase.
- computeWordVector(String[], DenseMatrix, Compositionality.VectorCompositionMethod, Float, Float, Float, Float) - Static method in class de.linguatools.disco.Compositionality
-
Construct a word embedding that represents the multi-word phrase.
- ConfigFile - Class in de.linguatools.disco
-
This class contains methods for creating, reading and accessing the
configuration file "disco.config".
- ConfigFile() - Constructor for class de.linguatools.disco.ConfigFile
-
Constructor 1: create class with empty (default) fields.
- ConfigFile(String) - Constructor for class de.linguatools.disco.ConfigFile
-
Constructor 2: read data from file into class fields.
- ConfigFile.FileFormat - Enum in de.linguatools.disco
-
Known file formats.
- CorruptConfigFileException - Exception in de.linguatools.disco
-
Exception that is thrown when some information that should be present in the
file disco.config in the word space directory cannot be read.
- CorruptConfigFileException() - Constructor for exception de.linguatools.disco.CorruptConfigFileException
-
- CorruptConfigFileException(String) - Constructor for exception de.linguatools.disco.CorruptConfigFileException
-
- CosineVectorSimilarity - Class in de.linguatools.disco
-
Computes the cosine similarity between two vectors.
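For reference, cosine similarity over sparse vectors stored as maps (the Map<String, Float> form used by computeSimilarity above) can be sketched as follows. The class name SparseCosineSketch is hypothetical, and returning 0 for a zero-length vector is an assumption of this sketch rather than documented library behavior.

```java
import java.util.Map;

// Hypothetical sketch of cosine similarity for sparse map vectors.
final class SparseCosineSketch {

    static float cosine(Map<String, Float> v1, Map<String, Float> v2) {
        double dot = 0.0;
        double norm1 = 0.0;
        double norm2 = 0.0;
        for (Map.Entry<String, Float> e : v1.entrySet()) {
            float x = e.getValue();
            norm1 += x * x;
            // Only features present in both vectors contribute to the dot product.
            Float y = v2.get(e.getKey());
            if (y != null) {
                dot += x * y;
            }
        }
        for (float y : v2.values()) {
            norm2 += y * y;
        }
        // Assumption: similarity with an all-zero (or empty) vector is 0.
        if (norm1 == 0.0 || norm2 == 0.0) {
            return 0f;
        }
        return (float) (dot / (Math.sqrt(norm1) * Math.sqrt(norm2)));
    }
}
```

Because only shared keys enter the dot product, the cost is driven by the size of the first map plus one pass over the values of the second.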
- CosineVectorSimilarity() - Constructor for class de.linguatools.disco.CosineVectorSimilarity
-
- getConfig() - Method in class de.linguatools.disco.DenseMatrix
-
- getEmbeddingForOov(String, DenseMatrix) - Static method in class de.linguatools.disco.Subword
-
Compute word embedding for out of vocabulary word.
- getMapVector(Document) - Static method in class de.linguatools.disco.SparseVector
-
Convert a Document from a DISCOLuceneIndex (which stores a word's data)
to a map vector.
- getMatrixRowNumber(String) - Method in class de.linguatools.disco.DenseMatrix
-
- getMaxFreq() - Method in class de.linguatools.disco.DenseMatrix
-
- getMaxFreq() - Method in class de.linguatools.disco.DISCO
-
Get corpus frequency of the most frequent word in the word space (that
was not filtered out by the stop word list that was used).
- getMaxFreq() - Method in class de.linguatools.disco.DISCOLuceneIndex
-
- getMaxN() - Method in class de.linguatools.disco.DenseMatrix
-
- getMinFreq() - Method in class de.linguatools.disco.DenseMatrix
-
- getMinFreq() - Method in class de.linguatools.disco.DISCO
-
Get minimum frequency of tokens in corpus.
- getMinFreq() - Method in class de.linguatools.disco.DISCOLuceneIndex
-
- getMinN() - Method in class de.linguatools.disco.DenseMatrix
-
- getMostSimilar(int, int) - Method in class de.linguatools.disco.DenseMatrix
-
Compute the max most similar words for word with ID wordId.
- getNgramVector(String) - Method in class de.linguatools.disco.DenseMatrix
-
- getNormalizedVector(Map<String, Float>) - Static method in class de.linguatools.disco.SparseVector
-
- getSecondOrderMapVector(Document) - Static method in class de.linguatools.disco.SparseVector
-
Get the second order word vector from doc as a sparse vector.
Important: this only works with documents from a DISCOLuceneIndex
of type WordspaceType.SIM.
- getSecondOrderWordvector(String) - Method in class de.linguatools.disco.DenseMatrix
-
The second order word vector contains words as keys, namely the most
similar words for word.
- getSecondOrderWordvector(int) - Method in class de.linguatools.disco.DenseMatrix
-
- getSecondOrderWordvector(String) - Method in class de.linguatools.disco.DISCO
-
The second order word vector contains the nBest most similar words for
word as features (instead of the directly co-occurring words that you get
with getWordvector).
- getSecondOrderWordvector(String) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
- getSimilarityMeasure(String) - Static method in class de.linguatools.disco.DISCO
-
Get SimilarityMeasure object from its String name.
- getStopwords() - Method in class de.linguatools.disco.DenseMatrix
-
- getStopwords() - Method in class de.linguatools.disco.DISCO
-
Gets list of stopwords from the disco.config file in the word space.
- getStopwords() - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Get the stopwords for this word space instance.
- getTokenCount() - Method in class de.linguatools.disco.DenseMatrix
-
- getTokenCount() - Method in class de.linguatools.disco.DISCO
-
Size of the underlying corpus.
- getTokenCount() - Method in class de.linguatools.disco.DISCOLuceneIndex
-
- getVectorSimilarity(DISCO.SimilarityMeasure) - Static method in class de.linguatools.disco.DISCO
-
Get VectorSimilarity class for a SimilarityMeasure.
- getVocabularyIterator() - Method in class de.linguatools.disco.DenseMatrix
-
- getVocabularyIterator() - Method in class de.linguatools.disco.DISCO
-
Returns an iterator that iterates over all words in the word space (the
vocabulary).
- getVocabularyIterator() - Method in class de.linguatools.disco.DISCOLuceneIndex
-
- getWord(int) - Method in class de.linguatools.disco.DenseMatrix
-
- getWord(int) - Method in class de.linguatools.disco.DISCO
-
Returns the id-th word in the vocabulary.
- getWord(int) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
- getWordEmbedding(String) - Method in class de.linguatools.disco.DenseMatrix
-
Get embedding vector for word.
- getWordId(String) - Method in class de.linguatools.disco.DenseMatrix
-
- getWordspaceType() - Method in class de.linguatools.disco.DenseMatrix
-
Returns the type of the word space instance.
- getWordspaceType() - Method in class de.linguatools.disco.DISCO
-
Get type of this word space.
- getWordspaceType() - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Returns the type of the word space instance.
- getWordvector(String) - Method in class de.linguatools.disco.DenseMatrix
-
Returns a word embedding converted to a sparse vector.
- getWordVector(int) - Method in class de.linguatools.disco.DenseMatrix
-
- getWordvector(String) - Method in class de.linguatools.disco.DISCO
-
Get word vector for word as map feature - value.
- getWordvector(String) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Returns the word vector representing the distribution of the input word
in the corpus.
The word vector can be used with the methods in the class Compositionality.
- growSet(DISCO, String[]) - Static method in class de.linguatools.disco.Cluster
-
Retrieves the similar words for all the words in the input set and
extends the input set by all words that appear in the similarity lists of
all the input words.
- searchIndex(String) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Searches for an input word in index field word and returns the first hit
Document or null.
DISCOLuceneIndex uses the Lucene index.
- secondOrderSimilarity(String, String, VectorSimilarity) - Method in class de.linguatools.disco.DenseMatrix
-
- secondOrderSimilarity(String, String, VectorSimilarity) - Method in class de.linguatools.disco.DISCO
-
Computes the similarity between words w1 and w2 by comparing the set of
the most similar words for w1 with the set of the most similar words for w2.
The size of the set of similar words stored for each word in the word space
is given by the DISCOBuilder parameter -nBest.
- secondOrderSimilarity(String, String, VectorSimilarity) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Computes the second order semantic similarity between the input words
based on the sets of their distributionally similar words.
Important note: This method only works with word spaces of type
WordspaceType.SIM.
- semanticSimilarity(String, String, VectorSimilarity) - Method in class de.linguatools.disco.DenseMatrix
-
- semanticSimilarity(String, String, VectorSimilarity) - Method in class de.linguatools.disco.DISCO
-
Computes the similarity between words w1 and w2 by comparing their word
vectors using the vectorSimilarity measure of choice.
- semanticSimilarity(String, String, VectorSimilarity) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Computes the semantic similarity (according to the vector similarity
measure similarityMeasure) between the two input words based
on their collocation sets (i.e.
- serialize(DenseMatrix, String) - Static method in class de.linguatools.disco.DenseMatrix
-
Serialize DenseMatrix object to file.
- setNumberOfSimilarWords(int) - Method in class de.linguatools.disco.DenseMatrix
-
- setSimMatrix(int[][]) - Method in class de.linguatools.disco.DenseMatrix
-
- setSimValues(float[][]) - Method in class de.linguatools.disco.DenseMatrix
-
- setWordspaceType(DISCO.WordspaceType) - Method in class de.linguatools.disco.DenseMatrix
-
- similarityMeasure - Variable in class de.linguatools.disco.ConfigFile
-
- similarWords(Map<String, Float>, DISCO, DISCO.SimilarityMeasure, int) - Static method in class de.linguatools.disco.Compositionality
-
Find the most similar words in the DISCO word space for an input word
vector.
- similarWords(float[], DenseMatrix, DISCO.SimilarityMeasure, int) - Static method in class de.linguatools.disco.Compositionality
-
Find the most similar words in the DISCO word space for an input word
vector.
- similarWords(String) - Method in class de.linguatools.disco.DenseMatrix
-
Only works with word spaces of type DISCO.WordspaceType.SIM.
- similarWords(String) - Method in class de.linguatools.disco.DISCO
-
Returns the list of the most similar words for word (according to
DISCO.semanticSimilarity).
- similarWords(String) - Method in class de.linguatools.disco.DISCOLuceneIndex
-
Looks up the input word in the index and returns its semantically
similar words ordered by decreasing similarity together
with their similarity values.
If the search word isn't found in the word space, the return value is null.
The similarity values in the result can differ from the values you get
with DISCOLuceneIndex.semanticSimilarity for the same word pair.
- similarWordsGraphSearch(Map<String, Float>, DISCO, DISCO.SimilarityMeasure, int) - Static method in class de.linguatools.disco.Compositionality
-
Approximate nearest neighbor search to find the most similar word in the
vocabulary for an input wordvector.
- similarWordsGraphSearch(float[], DenseMatrix, DISCO.SimilarityMeasure, int) - Static method in class de.linguatools.disco.Compositionality
-
Approximate nearest neighbor search to find the most similar word in the
vocabulary for an input wordvector.
- solveAnalogy(String, String, String, DISCO) - Static method in class de.linguatools.disco.Compositionality
-
This method solves the analogy "a1 is to b1 like a2 is to b2", i.e.
- solveAnalogyApprox(String, String, String, DISCO) - Static method in class de.linguatools.disco.Compositionality
-
Fast approximation of solveAnalogy.
- solveAnalogyAverageOffset(String, List<String[]>, DISCO) - Static method in class de.linguatools.disco.Compositionality
-
Solves the analogy a1 : b1 = a2 : b2 by returning the missing word a1.
- SparseVector - Class in de.linguatools.disco
-
Methods for sparse vectors.
- SparseVector() - Constructor for class de.linguatools.disco.SparseVector
-
- stopwordFile - Variable in class de.linguatools.disco.ConfigFile
-
- stopwords - Variable in class de.linguatools.disco.ConfigFile
-
- sub(float[], float[]) - Static method in class de.linguatools.disco.DenseVector
-
Subtracts the second vector from the first.
- sub(Map<String, Float>, Map<String, Float>) - Static method in class de.linguatools.disco.SparseVector
-
Subtract the second word vector from the first.
- Subword - Class in de.linguatools.disco
-
This deals with ngrams stored in a DenseMatrix.
- Subword() - Constructor for class de.linguatools.disco.Subword
-
- value - Variable in class de.linguatools.disco.ReturnDataCol
-
- valueOf(String) - Static method in enum de.linguatools.disco.Compositionality.VectorCompositionMethod
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum de.linguatools.disco.ConfigFile.FileFormat
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum de.linguatools.disco.DISCO.SimilarityMeasure
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum de.linguatools.disco.DISCO.WordspaceType
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum de.linguatools.disco.Compositionality.VectorCompositionMethod
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum de.linguatools.disco.ConfigFile.FileFormat
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum de.linguatools.disco.DISCO.SimilarityMeasure
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum de.linguatools.disco.DISCO.WordspaceType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values - Variable in class de.linguatools.disco.ReturnDataBN
-
- vectorExtrema(float[], float[]) - Static method in class de.linguatools.disco.DenseVector
-
Choose for each dimension the highest absolute value.
- vectorExtrema(Map<String, Float>, Map<String, Float>) - Static method in class de.linguatools.disco.SparseVector
-
Choose for each dimension the highest absolute value.
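One possible reading of "highest absolute value" for dense vectors is sketched below: for each dimension, keep the signed component whose magnitude is larger. The class name VectorExtremaSketch is hypothetical, and both the sign-preserving interpretation and the tie rule (prefer the first vector) are assumptions of this sketch.

```java
// Hypothetical sketch: per dimension, keep the component with the larger magnitude.
final class VectorExtremaSketch {

    static float[] extrema(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        float[] result = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            // Compare magnitudes but keep the original sign (assumption);
            // on a tie the component of the first vector is used.
            result[i] = (Math.abs(a[i]) >= Math.abs(b[i])) ? a[i] : b[i];
        }
        return result;
    }
}
```

For example, under these assumptions the inputs (1, -5, 2) and (-3, 4, 2) yield (-3, -5, 2).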
- vectorRejection(Map<String, Float>, Map<String, Float>) - Static method in class de.linguatools.disco.Compositionality
-
Computes vector rejection of a on b.
- vectorRejection(float[], float[]) - Static method in class de.linguatools.disco.Compositionality
-
Computes vector rejection of a on b.
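Vector rejection of a on b is the part of a that is orthogonal to b, that is, a minus the projection of a onto b. A minimal sketch for dense vectors follows; the class name VectorRejectionSketch is hypothetical, and returning a copy of a when b is the zero vector is an assumption of this sketch.

```java
// Hypothetical sketch: rejection(a, b) = a - ((a . b) / (b . b)) * b.
final class VectorRejectionSketch {

    static float[] rejection(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        float ab = 0f;
        float bb = 0f;
        for (int i = 0; i < a.length; i++) {
            ab += a[i] * b[i];
            bb += b[i] * b[i];
        }
        // Assumption: with a zero vector b there is nothing to project onto,
        // so a is returned unchanged (as a copy).
        if (bb == 0f) {
            return a.clone();
        }
        float scale = ab / bb;
        float[] r = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] - scale * b[i];
        }
        return r;
    }
}
```

The result is orthogonal to b, which gives a quick sanity check: its dot product with b should be close to zero.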
- VectorSimilarity - Interface in de.linguatools.disco
-
Interface for implementing vector similarity measures.
- vocabularySize - Variable in class de.linguatools.disco.ConfigFile
-