Full API documentation

Once a Linguistica object is initialized (such as lxa_object below, built from the Brown corpus), various methods and attributes are available for automatic linguistic analysis:

>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # using wordlist()

Basic information

number_of_word_tokens()

Return the number of word tokens.

number_of_word_types()

Return the number of word types.
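The difference between tokens and types can be shown without the library. The following is a minimal sketch on a toy corpus; the whitespace tokenization and the variable names are assumptions for illustration, and the library's own tokenization may differ.

```python
from collections import Counter

# Toy corpus; splitting on whitespace is an assumption made for this sketch.
text = "the cat saw the dog and the dog saw the cat"
tokens = text.split()

counts = Counter(tokens)
n_tokens = sum(counts.values())  # every occurrence of a word counts
n_types = len(counts)            # each distinct word counts once

print(n_tokens, n_types)
```

Here "the" contributes four tokens but only one type.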

Word ngrams

Parameter: max_word_tokens

wordlist()

Return a wordlist sorted by word frequency in descending order.

word_unigram_counter()

Return a dict of words with their counts.

word_bigram_counter()

Return a dict of word bigrams with their counts.

word_trigram_counter()

Return a dict of word trigrams with their counts.
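To make the shape of these return values concrete, here is a minimal sketch that builds comparable structures from a toy token list with the standard library. It is not the library's implementation; the alphabetical tie-breaking in the wordlist is an assumption, since the documentation only specifies descending frequency.

```python
from collections import Counter

tokens = "the cat saw the dog and the dog saw the cat".split()

# Unigram keys are strings; bigram and trigram keys are tuples of strings.
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

# Words by descending frequency; ties broken alphabetically (an assumption).
wordlist = sorted(unigrams, key=lambda w: (-unigrams[w], w))

print(wordlist)
print(bigrams[("the", "dog")])
```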

Morphological signatures

Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing

signatures()

Return a set of morphological signatures.

stems()

Return a set of stems.

affixes()

Return a set of affixes.

signatures_to_stems()

Return a dict of morphological signatures to stems.

signatures_to_words()

Return a dict of morphological signatures to words.

affixes_to_signatures()

Return a dict of affixes to morphological signatures.

stems_to_signatures()

Return a dict of stems to morphological signatures.

stems_to_words()

Return a dict of stems to words.

words_in_signatures()

Return a set of words in at least one morphological signature.

words_to_signatures()

Return a dict of words to morphological signatures.

words_to_sigtransforms()

Return a dict of words to signature transforms.
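The mappings in this group are different views of the same analysis. The sketch below starts from hypothetical toy data in the shape of signatures_to_stems() and derives several of the other views. The toy signatures, the use of "NULL" for the empty affix, and the helper attach() are assumptions for illustration; suffixing is assumed throughout.

```python
# Hypothetical toy data in the shape of signatures_to_stems():
# each signature is a tuple of affixes. "NULL" for the empty affix
# is an assumption made for this sketch.
signatures_to_stems = {
    ("NULL", "ed", "ing"): {"walk", "jump"},
    ("NULL", "s"): {"dog"},
}


def attach(stem, affix):
    """Attach a suffix to a stem, treating "NULL" as the empty suffix."""
    return stem if affix == "NULL" else stem + affix


signatures = set(signatures_to_stems)
stems = set().union(*signatures_to_stems.values())
affixes = {affix for sig in signatures for affix in sig}

stems_to_signatures = {}
stems_to_words = {}
for sig, sig_stems in signatures_to_stems.items():
    for stem in sig_stems:
        stems_to_signatures.setdefault(stem, set()).add(sig)
        stems_to_words.setdefault(stem, set()).update(
            attach(stem, affix) for affix in sig
        )

# Every word that belongs to at least one signature.
words_in_signatures = set().union(*stems_to_words.values())

print(sorted(words_in_signatures))
```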

Word manifolds and syntactic word neighborhood

Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors

words_to_neighbors()

Return a dict of words to syntactic neighbors.

neighbor_graph()

Return the syntactic word neighborhood graph.

words_to_contexts()

Return a dict of words to contexts with counts.

contexts_to_words()

Return a dict of contexts to words with counts.
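The two context mappings can be read as inverses of each other. The sketch below inverts a nested dict in the shape of words_to_contexts(), assuming both views carry the same counts. The context tuples in the toy data are placeholders, not the library's actual context format, and the None check mirrors the documented return type of these methods.

```python
def invert(words_to_contexts):
    """Invert word -> {context: count} into context -> {word: count}."""
    if words_to_contexts is None:  # the documented return type allows None
        return None
    contexts_to_words = {}
    for word, contexts in words_to_contexts.items():
        for context, count in contexts.items():
            contexts_to_words.setdefault(context, {})[word] = count
    return contexts_to_words


# Hypothetical toy data; the tuple layout is a placeholder.
toy_words_to_contexts = {
    "cat": {("the", "_"): 2, ("_", "saw"): 1},
    "dog": {("the", "_"): 2},
}
toy_contexts_to_words = invert(toy_words_to_contexts)

print(toy_contexts_to_words[("the", "_")])
```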

Phonology

phone_unigram_counter()

Return a dict of phone unigrams with counts.

phone_bigram_counter()

Return a dict of phone bigrams with counts.

phone_trigram_counter()

Return a dict of phone trigrams with counts.

Tries

Parameter: min_stem_length

broken_words_left_to_right()

Return a dict of words to their left-to-right broken form.

broken_words_right_to_left()

Return a dict of words to their right-to-left broken form.

successors()

Return a dict of word (sub)strings to their successors.

predecessors()

Return a dict of word (sub)strings to their predecessors.
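One common reading of successors and predecessors is sketched below: each proper prefix maps to the set of letters that can follow it, and each proper suffix to the set of letters that can precede it. This is an illustration under that assumption, not the library's code; the functions here are standalone helpers, and the library may restrict the substrings it considers (for example via min_stem_length).

```python
def successors(words):
    """Map each proper prefix of each word to the letters that follow it."""
    result = {}
    for word in words:
        for i in range(1, len(word)):
            result.setdefault(word[:i], set()).add(word[i])
    return result


def predecessors(words):
    """Map each proper suffix of each word to the letters that precede it."""
    result = {}
    for word in words:
        for i in range(1, len(word)):
            result.setdefault(word[i:], set()).add(word[i - 1])
    return result


toy_words = ["walk", "walks", "wall"]
succ = successors(toy_words)
pred = predecessors(toy_words)

print(sorted(succ["wal"]))
print(sorted(pred["s"]))
```

A prefix with several successors, such as "wal" here, is a candidate morpheme boundary in this style of analysis.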

Other methods and attributes

parameters()

Return the parameter dict.

change_parameters(**kwargs)

Change the specified parameters.

use_default_parameters()

Reset parameters to their default values.

reset()

Reset the Linguistica object.

class linguistica.lexicon.Lexicon(file_path: str | PathLike[str] | None = None, wordlist_file: bool = False, corpus_object: str | list[str] | None = None, wordlist_object: dict[str, int] | Iterable[str] | None = None, encoding: str = 'utf8', **kwargs: Unpack[ParametersKwargs])

A class for a Linguistica object.

affixes() -> set[str]

Return a set of affixes.

affixes_to_signatures() -> dict[str, set[tuple[str, ...]]]

Return a dict of affixes to morphological signatures.

biphone_dict() -> dict[tuple[str, str], Biphone]

Return a dict of phone bigrams to Biphone objects.

A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().

broken_words_left_to_right() -> dict[str, list[str]]

Return a dict of words to their left-to-right broken form.

broken_words_right_to_left() -> dict[str, list[str]]

Return a dict of words to their right-to-left broken form.

change_parameters(**kwargs: Unpack[ParametersKwargs]) -> None

Change the specified parameters.

Parameters:

kwargs – keyword arguments for parameters and their new values

contexts_to_words() -> dict[tuple[str, ...], dict[str, int]] | None

Return a dict of contexts to words with counts.

neighbor_graph() -> Graph | None

Return the syntactic word neighborhood graph.

number_of_word_tokens() -> int

Return the number of word tokens.

number_of_word_types() -> int

Return the number of word types.

output_all_results(directory: str | PathLike[str] | None = None, verbose: bool = False, test: bool = False) -> None

Output all Linguistica results to a directory.

Parameters:

directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().

parameters() -> dict[str, int]

Return the parameter dict.

phone_bigram_counter() -> dict[tuple[str, str], int]

Return a dict of phone bigrams with counts.

phone_dict() -> dict[str, Phone]

Return a dict of phone unigrams to Phone objects.

A Phone instance has the methods spelling(), count(), frequency(), and plog().

phone_trigram_counter() -> dict[tuple[str, str, str], int]

Return a dict of phone trigrams with counts.

phone_unigram_counter() -> dict[str, int]

Return a dict of phone unigrams with counts.

predecessors() -> dict[str, set[str]]

Return a dict of word (sub)strings to their predecessors.

reset() -> None

Reset the Linguistica object.

While the file path information is retained, all computed objects (ngrams, signatures, word neighbors, etc.) are reset to None; they are re-computed the next time the corresponding methods are called.
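The behavior described for reset() follows a compute-on-demand pattern with cached results. The sketch below illustrates that pattern with a hypothetical stand-in class, LazyCounts; it is not the Lexicon implementation, and the n_computations counter exists only to make the recomputation visible.

```python
class LazyCounts:
    """Hypothetical stand-in showing compute-on-demand with reset."""

    def __init__(self, tokens):
        self._tokens = list(tokens)  # the input is retained across resets
        self._unigrams = None        # computed lazily on first access
        self.n_computations = 0      # instrumentation for this demo only

    def unigram_counter(self):
        # Compute only if no cached result is available.
        if self._unigrams is None:
            counts = {}
            for token in self._tokens:
                counts[token] = counts.get(token, 0) + 1
            self._unigrams = counts
            self.n_computations += 1
        return self._unigrams

    def reset(self):
        # Drop the cached result; the next call recomputes it.
        self._unigrams = None


demo = LazyCounts(["a", "b", "a"])
demo.unigram_counter()
demo.unigram_counter()           # served from the cache
calls_before_reset = demo.n_computations
demo.reset()
result_after_reset = demo.unigram_counter()  # recomputed after reset
calls_after_reset = demo.n_computations
```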

run_all_modules(verbose: bool = False) -> None

Run all modules.

run_manifold_module(verbose: bool = False) -> None

Run the manifold module.

run_ngram_module(verbose: bool = False) -> None

Run the ngram module.

run_phon_module(verbose: bool = False) -> None

Run the phon module.

run_signature_module(verbose: bool = False) -> None

Run the signature module.

run_trie_module(verbose: bool = False) -> None

Run the trie module.

signatures() -> set[tuple[str, ...]]

Return a set of morphological signatures.

signatures_to_stems() -> dict[tuple[str, ...], set[str]]

Return a dict of morphological signatures to stems.

signatures_to_words() -> dict[tuple[str, ...], set[str]]

Return a dict of morphological signatures to words.

stems() -> set[str]

Return a set of stems.

stems_to_signatures() -> dict[str, set[tuple[str, ...]]]

Return a dict of stems to morphological signatures.

stems_to_words() -> dict[str, set[str]]

Return a dict of stems to words.

successors() -> dict[str, set[str]]

Return a dict of word (sub)strings to their successors.

use_default_parameters() -> None

Reset parameters to their default values.

word_bigram_counter() -> dict[tuple[str, str], int]

Return a dict of word bigrams with their counts.

word_phonology_dict() -> dict[str, Word]

Return a dict of words to Word objects.

A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().

word_trigram_counter() -> dict[tuple[str, str, str], int]

Return a dict of word trigrams with their counts.

word_unigram_counter() -> dict[str, int]

Return a dict of words with their counts.

wordlist() -> list[str]

Return a wordlist sorted by word frequency in descending order.

words_in_signatures() -> set[str]

Return a set of words in at least one morphological signature.

words_to_contexts() -> dict[str, dict[tuple[str, ...], int]] | None

Return a dict of words to contexts with counts.

words_to_neighbors() -> dict[str, list[str]] | None

Return a dict of words to syntactic neighbors.

words_to_phones() -> dict[str, list[str]] | None

Return a dict of words with their phones.

words_to_signatures() -> dict[str, set[tuple[str, ...]]]

Return a dict of words to morphological signatures.

words_to_sigtransforms() -> dict[str, set[tuple[tuple[str, ...], str]]]

Return a dict of words to signature transforms.
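Going by the return type, each signature transform pairs a signature with a single affix. The sketch below builds a dict of that shape from hypothetical toy data, assuming that a transform records which affix of a signature produces the word from its stem. The toy signatures, the "NULL" empty-affix convention, and suffixing are assumptions for illustration only.

```python
# Hypothetical toy data in the shape of signatures_to_stems().
toy_signatures_to_stems = {
    ("NULL", "ed", "ing"): {"walk"},
    ("NULL", "s"): {"walk", "dog"},
}

toy_words_to_sigtransforms = {}
for sig, sig_stems in toy_signatures_to_stems.items():
    for stem in sig_stems:
        for affix in sig:
            # "NULL" stands for the empty suffix (an assumption).
            word = stem if affix == "NULL" else stem + affix
            toy_words_to_sigtransforms.setdefault(word, set()).add((sig, affix))

# Unpack each (signature, affix) pair for one word.
for signature, affix in sorted(toy_words_to_sigtransforms["walk"]):
    print(signature, affix)
```

In this toy data "walk" is a stem in two signatures, so it has two transforms, both with the "NULL" affix.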