Full API documentation
Once a Linguistica object (such as lxa_object below, built from the Brown corpus)
is initialized, various methods and attributes are available for automatic
linguistic analysis:
>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # e.g., the wordlist sorted by frequency
Basic information
- Return the number of word tokens.
- Return the number of word types.
Word ngrams
Parameter: max_word_tokens
- Return a wordlist sorted by word frequency in descending order.
- Return a dict of words with their counts.
- Return a dict of word bigrams with their counts.
- Return a dict of word trigrams with their counts.
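The shapes of these return values can be made concrete with a minimal sketch that builds equivalent structures from a plain token list using `collections.Counter`. This is not Linguistica's implementation. The helper name `toy_word_ngrams` is hypothetical, and the sketch ignores `max_word_tokens`. The tuple keys for bigrams and trigrams follow the `dict[tuple[str, str, str], int]` type documented for `word_trigram_counter()`.

```python
from collections import Counter

def toy_word_ngrams(tokens):
    """Hypothetical helper: wordlist plus word unigram, bigram, and trigram counts."""
    unigrams = Counter(tokens)
    # zip over shifted copies of the list yields consecutive pairs and triples.
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Wordlist sorted by descending frequency (ties broken alphabetically).
    wordlist = sorted(unigrams, key=lambda word: (-unigrams[word], word))
    return wordlist, dict(unigrams), dict(bigrams), dict(trigrams)

tokens = "the cat saw the dog and the cat ran".split()
wordlist, unigrams, bigrams, trigrams = toy_word_ngrams(tokens)
print(wordlist[:2])              # ['the', 'cat']
print(bigrams[("the", "cat")])   # 2
```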
Morphological signatures
Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing
- Return a set of morphological signatures.
- Return a set of stems.
- Return a set of affixes.
- Return a dict of morphological signatures to stems.
- Return a dict of morphological signatures to words.
- Return a dict of affixes to morphological signatures.
- Return a dict of stems to morphological signatures.
- Return a dict of stems to words.
- Return a set of words in at least one morphological signature.
- Return a dict of words to morphological signatures.
- Return a dict of words to signature transforms.
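The signature dicts can be illustrated with a deliberately simplified procedure, sketched below under stated assumptions:

- Split each word at every point that leaves a stem of at least `min_stem_length` characters and a suffix of at most `max_affix_length` characters.
- Take a stem's signature to be the sorted tuple of the suffixes it occurs with (at least two).
- Keep only signatures shared by at least `min_sig_count` stems.

The function names, the `"NULL"` marker for the empty suffix, and these parameter semantics are assumptions for illustration. Only suffixes are handled, and Linguistica's actual signature induction is more involved. The hypothetical `invert` helper derives the other views (stems to signatures, affixes to signatures, signatures to words) from a signature-to-stem dict.

```python
from collections import defaultdict

def toy_signatures(words, min_stem_length=3, max_affix_length=3, min_sig_count=2):
    """Simplified suffix-signature induction; an illustration, not Linguistica's algorithm."""
    # Record every suffix (up to max_affix_length) seen after each candidate stem.
    stem_to_suffixes = defaultdict(set)
    for word in words:
        for cut in range(min_stem_length, len(word) + 1):
            stem, suffix = word[:cut], word[cut:]
            if len(suffix) <= max_affix_length:
                stem_to_suffixes[stem].add(suffix or "NULL")  # "NULL" marks the empty suffix
    # A stem's signature is the sorted tuple of its suffixes (at least two).
    sigs_to_stems = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        if len(suffixes) >= 2:
            sigs_to_stems[tuple(sorted(suffixes))].add(stem)
    # Keep signatures shared by at least min_sig_count stems.
    return {sig: stems for sig, stems in sigs_to_stems.items() if len(stems) >= min_sig_count}

def invert(signatures_to_stems):
    """Hypothetical helper: stem->signatures, affix->signatures, and signature->words views."""
    stems_to_sigs, affixes_to_sigs = defaultdict(set), defaultdict(set)
    sigs_to_words = {}
    for sig, stems in signatures_to_stems.items():
        for stem in stems:
            stems_to_sigs[stem].add(sig)
        for affix in sig:
            affixes_to_sigs[affix].add(sig)
        # Rebuild the words by attaching every affix in the signature to every stem.
        sigs_to_words[sig] = {
            stem + ("" if affix == "NULL" else affix) for stem in stems for affix in sig
        }
    return dict(stems_to_sigs), dict(affixes_to_sigs), sigs_to_words

words = ["jump", "jumps", "jumped", "walk", "walks", "walked"]
sigs = toy_signatures(words)
print(list(sigs))                          # [('NULL', 'ed', 's')]
print(sorted(sigs[("NULL", "ed", "s")]))   # ['jump', 'walk']
```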
Word manifolds and syntactic word neighborhood
Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors
- Return a dict of words to syntactic neighbors.
- Return the syntactic word neighborhood graph.
- Return a dict of words to contexts with counts.
- Return a dict of contexts to words with counts.
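The context dictionaries can be illustrated roughly as follows. This sketch records each word's immediate left and right neighbors as a context tuple, with `"_"` for the word's own slot and `"#"` for a sentence boundary. That context format is an assumption and may not match Linguistica's. The sketch then ranks neighbors by the number of distinct contexts they share, which is a deliberately simple stand-in. The documented `n_eigenvectors` parameter indicates that Linguistica's word manifold uses an eigenvector-based method, which this sketch does not reproduce. `max_word_types` and `min_context_count` are ignored, and the helper names are hypothetical.

```python
from collections import defaultdict

def toy_contexts(sentences):
    """Build words->contexts and contexts->words count dicts from tokenized sentences."""
    words_to_contexts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        padded = ["#"] + sentence + ["#"]  # "#" marks a sentence boundary
        for i in range(1, len(padded) - 1):
            context = (padded[i - 1], "_", padded[i + 1])  # "_" is the word's own slot
            words_to_contexts[padded[i]][context] += 1
    contexts_to_words = defaultdict(dict)
    for word, contexts in words_to_contexts.items():
        for context, count in contexts.items():
            contexts_to_words[context][word] = count
    return {w: dict(c) for w, c in words_to_contexts.items()}, dict(contexts_to_words)

def toy_neighbors(words_to_contexts, n_neighbors=1):
    """Rank other words by how many distinct contexts they share (simplified stand-in)."""
    neighbors = {}
    for word, contexts in words_to_contexts.items():
        shared = {
            other: len(contexts.keys() & other_contexts.keys())
            for other, other_contexts in words_to_contexts.items()
            if other != word
        }
        ranked = sorted(shared, key=lambda other: (-shared[other], other))
        neighbors[word] = ranked[:n_neighbors]
    return neighbors

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
w2c, c2w = toy_contexts(sentences)
print(c2w[("the", "_", "sat")])    # {'cat': 1, 'dog': 1}
print(toy_neighbors(w2c)["cat"])   # ['dog']
```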
Phonology
- Return a dict of phone unigrams with counts.
- Return a dict of phone bigrams with counts.
- Return a dict of phone trigrams with counts.
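Phone n-gram counting can be sketched minimally as below, assuming each character of a word is one phone and that a word contributes its n-grams once per occurrence. Word-boundary symbols are omitted. The helper name is hypothetical. The tuple keys follow the `dict[tuple[str, str, str], int]` type documented for `phone_trigram_counter()`.

```python
from collections import Counter

def toy_phone_ngrams(word_counts):
    """Hypothetical helper: phone n-gram counts, treating each character as one phone."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for word, count in word_counts.items():
        phones = list(word)  # assumption: one character = one phone
        # Each occurrence of the word contributes its n-grams, hence `+= count`.
        for phone in phones:
            unigrams[phone] += count
        for pair in zip(phones, phones[1:]):
            bigrams[pair] += count
        for triple in zip(phones, phones[1:], phones[2:]):
            trigrams[triple] += count
    return dict(unigrams), dict(bigrams), dict(trigrams)

uni, bi, tri = toy_phone_ngrams({"sat": 2, "at": 3})
print(uni["a"], bi[("a", "t")], tri[("s", "a", "t")])  # 5 5 2
```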
Tries
Parameter: min_stem_length
- Return a dict of words to their left-to-right broken form.
- Return a dict of words to their right-to-left broken form.
- Return a dict of word (sub)strings to their successors.
- Return a dict of word (sub)strings to their predecessors.
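The trie results can be approximated with a successor-variety heuristic in the style of Harris. The sketch records which characters follow each prefix, then cuts a word after any prefix of at least `min_stem_length` characters that has more than one successor. This is an assumed, simplified reading of "broken form" and "successors", and it is not necessarily how Linguistica computes them. The helper names and the `"#"` end-of-word marker are hypothetical. Running the same procedure on reversed words would give a right-to-left, predecessor-based analogue.

```python
from collections import defaultdict

def toy_successors(words):
    """Map each prefix to the set of characters that follow it ("#" = end of word)."""
    successors = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            successors[word[:i]].add(word[i])
        successors[word].add("#")
    return successors

def toy_break_left_to_right(words, min_stem_length=2):
    """Cut each word after every prefix (at least min_stem_length long) with 2+ successors."""
    successors = toy_successors(words)
    broken = {}
    for word in words:
        pieces, start = [], 0
        for i in range(min_stem_length, len(word)):
            if len(successors[word[:i]]) > 1:
                pieces.append(word[start:i])
                start = i
        pieces.append(word[start:])
        broken[word] = pieces
    return broken

words = ["jump", "jumps", "jumped", "jumping"]
print(toy_break_left_to_right(words)["jumping"])  # ['jump', 'ing']
```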
Other methods and attributes
- Return the parameter dict.
- Change the specified parameters.
- Reset parameters to their default values.
- Reset the Linguistica object.
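Resetting keeps the input but discards computed results and recomputes them on demand, which is a lazy-caching pattern. The hypothetical `ToyLexicon` class below sketches that pattern. Only `change_parameters()` and `reset()` mirror documented names; `parameters()` and `restore_default_parameters()` are made up. The default values are invented. Clearing cached results inside `change_parameters()` is a design choice of this sketch, not documented Linguistica behavior.

```python
class ToyLexicon:
    """Hypothetical stand-in showing lazy computation, parameters, and reset()."""

    # Illustrative values only; not Linguistica's real defaults.
    DEFAULTS = {"min_stem_length": 4, "max_affix_length": 4}

    def __init__(self, tokens, **kwargs):
        self.tokens = tokens                        # input is retained across reset()
        self._parameters = {**self.DEFAULTS, **kwargs}
        self._wordlist = None                       # computed lazily on first request

    def parameters(self):
        return dict(self._parameters)               # a copy, so callers cannot mutate state

    def change_parameters(self, **kwargs):
        unknown = set(kwargs) - set(self.DEFAULTS)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        self._parameters.update(kwargs)
        self.reset()                                # design choice: drop results that may be stale

    def restore_default_parameters(self):
        self.change_parameters(**self.DEFAULTS)

    def reset(self):
        self._wordlist = None                       # recomputed the next time it is requested

    def wordlist(self):
        if self._wordlist is None:
            counts = {}
            for token in self.tokens:
                counts[token] = counts.get(token, 0) + 1
            self._wordlist = sorted(counts, key=lambda w: (-counts[w], w))
        return self._wordlist

lex = ToyLexicon("the cat saw the dog".split())
print(lex.wordlist())                        # ['the', 'cat', 'dog', 'saw']
lex.change_parameters(min_stem_length=5)     # also drops the cached wordlist
print(lex.parameters()["min_stem_length"])   # 5
```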
- class linguistica.lexicon.Lexicon(file_path: str | PathLike[str] | None = None, wordlist_file: bool = False, corpus_object: str | list[str] | None = None, wordlist_object: dict[str, int] | Iterable[str] | None = None, encoding: str = 'utf8', **kwargs: Unpack[ParametersKwargs])
A class for a Linguistica object.
- affixes_to_signatures() → dict[str, set[tuple[str, ...]]]
Return a dict of affixes to morphological signatures.
- biphone_dict() → dict[tuple[str, str], Biphone]
Return a dict of phone bigrams to Biphone objects.
A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().
- broken_words_left_to_right() → dict[str, list[str]]
Return a dict of words to their left-to-right broken form.
- broken_words_right_to_left() → dict[str, list[str]]
Return a dict of words to their right-to-left broken form.
- change_parameters(**kwargs: Unpack[ParametersKwargs]) → None
Change the specified parameters.
- Parameters:
kwargs – keyword arguments for parameters and their new values
- contexts_to_words() → dict[tuple[str, ...], dict[str, int]] | None
Return a dict of contexts to words with counts.
- output_all_results(directory: str | PathLike[str] | None = None, verbose: bool = False, test: bool = False) → None
Output all Linguistica results to a directory.
- Parameters:
directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().
- phone_dict() → dict[str, Phone]
Return a dict of phone unigrams to Phone objects.
A Phone instance has the methods spelling(), count(), frequency(), and plog().
- phone_trigram_counter() → dict[tuple[str, str, str], int]
Return a dict of phone trigrams with counts.
- reset() → None
Reset the Linguistica object.
While the file path information is retained, all computed objects (ngrams, signatures, word neighbors, etc.) are reset to None and are recomputed the next time they are called.
- signatures_to_stems() → dict[tuple[str, ...], set[str]]
Return a dict of morphological signatures to stems.
- signatures_to_words() → dict[tuple[str, ...], set[str]]
Return a dict of morphological signatures to words.
- stems_to_signatures() → dict[str, set[tuple[str, ...]]]
Return a dict of stems to morphological signatures.
- word_phonology_dict() → dict[str, Word]
Return a dict of words to Word objects.
A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().
- word_trigram_counter() → dict[tuple[str, str, str], int]
Return a dict of word trigrams with their counts.
- words_to_contexts() → dict[str, dict[tuple[str, ...], int]] | None
Return a dict of words to contexts with counts.
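The methods `Phone.plog()`, `Biphone.MI()`, `Biphone.weighted_MI()`, and the Word plog methods are listed here without formulas. The sketch below computes them assuming the usual definitions, and all of these definitions are assumptions:

- plog is the negative base-2 log of a phone's relative frequency.
- MI is the pointwise mutual information of two adjacent phones.
- Weighted MI is MI scaled by the bigram's probability.
- A word's unigram plog is the sum of its phones' plogs, and the average divides by the number of phones.

Each character is treated as one phone. Linguistica's exact definitions (log base, boundary handling, bigram-based plogs) may differ, and the helper names are hypothetical.

```python
import math
from collections import Counter

def phone_statistics(words):
    """Assumed definitions: plog = -log2(p), MI = log2(p(ab) / (p(a) * p(b)))."""
    unigrams = Counter(phone for word in words for phone in word)
    bigrams = Counter(pair for word in words for pair in zip(word, word[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    freq = {phone: count / n_uni for phone, count in unigrams.items()}
    plog = {phone: -math.log2(f) for phone, f in freq.items()}
    mi, weighted_mi = {}, {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        mi[(a, b)] = math.log2(p_ab / (freq[a] * freq[b]))
        weighted_mi[(a, b)] = p_ab * mi[(a, b)]  # MI weighted by bigram probability
    return plog, mi, weighted_mi

def word_unigram_plog(word, plog):
    """Total and per-phone average plog of one word (assumed definition)."""
    total = sum(plog[phone] for phone in word)
    return total, total / len(word)

plog, mi, weighted_mi = phone_statistics(["ab", "ab", "ba", "ba"])
print(plog["a"], mi[("a", "b")], weighted_mi[("a", "b")])  # 1.0 1.0 0.5
print(word_unigram_plog("ab", plog))                       # (2.0, 1.0)
```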