Full API documentation

Once a Linguistica object is initialized (such as lxa_object below, built from the Brown corpus), the following methods and attributes are available for automatic linguistic analysis:

>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # words sorted by frequency, most frequent first

Basic information

`number_of_word_tokens`()	Return the number of word tokens.
`number_of_word_types`()	Return the number of word types.

Word ngrams

Parameter: max_word_tokens

`wordlist`()	Return a wordlist sorted by word frequency in descending order.
`word_unigram_counter`()	Return a dict of words with their counts.
`word_bigram_counter`()	Return a dict of word bigrams with their counts.
`word_trigram_counter`()	Return a dict of word trigrams with their counts.
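The return shapes of the ngram methods can be illustrated without the library. The following is a minimal stand-in, not Linguistica's own implementation: it counts unigrams, bigrams, and trigrams over a made-up token list with `collections.Counter` and derives a frequency-sorted wordlist. The token list, the helper name `ngram_counter`, and the alphabetical tie-breaking are assumptions made for illustration.

```python
from collections import Counter

# Toy token sequence (made up for illustration; not from the Brown corpus).
tokens = "the dog saw the cat and the dog ran".split()


def ngram_counter(tokens, n):
    """Count n-grams; unigrams are keyed by str, longer n-grams by tuple."""
    if n == 1:
        return Counter(tokens)
    return Counter(zip(*(tokens[i:] for i in range(n))))


unigrams = ngram_counter(tokens, 1)
bigrams = ngram_counter(tokens, 2)
trigrams = ngram_counter(tokens, 3)

# Token and type counts fall out of the unigram counter.
n_tokens = sum(unigrams.values())
n_types = len(unigrams)

# Frequency-descending wordlist; ties broken alphabetically (an assumption).
wordlist = sorted(unigrams, key=lambda w: (-unigrams[w], w))
```

The keys mirror the documented return types: plain strings for unigrams, and tuples of two or three strings for bigrams and trigrams.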

Morphological signatures

Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing

`signatures`()	Return a set of morphological signatures.
`stems`()	Return a set of stems.
`affixes`()	Return a set of affixes.
`signatures_to_stems`()	Return a dict of morphological signatures to stems.
`signatures_to_words`()	Return a dict of morphological signatures to words.
`affixes_to_signatures`()	Return a dict of affixes to morphological signatures.
`stems_to_signatures`()	Return a dict of stems to morphological signatures.
`stems_to_words`()	Return a dict of stems to words.
`words_in_signatures`()	Return a set of words in at least one morphological signature.
`words_to_signatures`()	Return a dict of words to morphological signatures.
`words_to_sigtransforms`()	Return a dict of words to signature transforms.
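The dicts in this group are different views of the same stem-and-affix analysis. The sketch below shows, under stated assumptions, how several of them could be derived from a signatures-to-stems mapping. The data is made up, the helper `attach` is hypothetical, and the use of "NULL" for the empty affix is an assumption here; none of this is taken from Linguistica's implementation.

```python
from collections import defaultdict

# Toy signatures-to-stems mapping (made up). "NULL" marks the empty suffix
# in this sketch; treat that spelling as an assumption.
signatures_to_stems = {
    ("NULL", "ed", "ing"): {"walk", "jump"},
    ("NULL", "s"): {"dog", "cat"},
}


def attach(stem, affix, suffixing=True):
    """Join a stem and an affix; "NULL" contributes nothing."""
    if affix == "NULL":
        return stem
    return stem + affix if suffixing else affix + stem


# Every stem combined with every affix of its signature.
signatures_to_words = {
    sig: {attach(stem, affix) for stem in stems for affix in sig}
    for sig, stems in signatures_to_stems.items()
}

# Reverse lookups from stems and affixes back to signatures.
stems_to_signatures = defaultdict(set)
affixes_to_signatures = defaultdict(set)
for sig, stems in signatures_to_stems.items():
    for stem in stems:
        stems_to_signatures[stem].add(sig)
    for affix in sig:
        affixes_to_signatures[affix].add(sig)

# Words covered by at least one signature.
words_in_signatures = set().union(*signatures_to_words.values())
```

Signatures are tuples rather than lists so that they can serve as dict keys and set members, which matches the `tuple[str, ...]` types in the method signatures.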

Word manifolds and syntactic word neighborhood

Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors

`words_to_neighbors`()	Return a dict of words to syntactic neighbors.
`neighbor_graph`()	Return the syntactic word neighborhood graph.
`words_to_contexts`()	Return a dict of words to contexts with counts.
`contexts_to_words`()	Return a dict of contexts to words with counts.
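The two context dicts have mirrored nested shapes: words to contexts to counts, and contexts to words to counts. Assuming they are plain inverses of each other (the source does not state this), one can be derived from the other as sketched below. The data is made up, the helper `invert` is hypothetical, and the "_" placeholder for the target word's position is an assumed notation.

```python
from collections import defaultdict

# Toy data shaped like dict[str, dict[tuple[str, ...], int]] (made up).
# "_" marks the target word's slot; this context notation is an assumption.
words_to_contexts = {
    "dog": {("the", "_"): 2, ("_", "ran"): 1},
    "cat": {("the", "_"): 1},
}


def invert(nested):
    """Turn {outer: {inner: count}} into {inner: {outer: count}}."""
    inverted = defaultdict(dict)
    for outer, inner_counts in nested.items():
        for inner, count in inner_counts.items():
            inverted[inner][outer] = count
    return dict(inverted)


contexts_to_words = invert(words_to_contexts)
```

Since these methods may return None according to their type annotations, calling code would need to check for None before iterating over a result.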

Phonology

`phone_unigram_counter`()	Return a dict of phone unigrams with counts.
`phone_bigram_counter`()	Return a dict of phone bigrams with counts.
`phone_trigram_counter`()	Return a dict of phone trigrams with counts.

Tries

Parameter: min_stem_length

`broken_words_left_to_right`()	Return a dict of words to their left-to-right broken form.
`broken_words_right_to_left`()	Return a dict of words to their right-to-left broken form.
`successors`()	Return a dict of word (sub)strings to their successors.
`predecessors`()	Return a dict of word (sub)strings to their predecessors.
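To make the idea of successors and predecessors concrete, here is a minimal sketch over a made-up wordlist. It maps each proper prefix to the letters that can follow it, and each proper suffix to the letters that can precede it. Linguistica's actual conventions may differ (for example, which substrings are included, or how min_stem_length applies), so treat this as an illustration of the data shape `dict[str, set[str]]` only.

```python
from collections import defaultdict


def successors(words):
    """Map each proper prefix of each word to the letters that follow it."""
    result = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            result[word[:i]].add(word[i])
    return dict(result)


def predecessors(words):
    """Mirror image: map each proper suffix to the letters that precede it."""
    result = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            result[word[i:]].add(word[i - 1])
    return dict(result)


# Toy wordlist (made up for illustration).
words = ["walk", "walks", "walked", "wall", "talk"]
succ = successors(words)
pred = predecessors(words)
```

A prefix with several successors, such as "walk" followed by "s" or "e" in this toy data, is a candidate morpheme boundary, which is the intuition behind breaking words with a trie.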

Other methods and attributes

`parameters`()	Return the parameter dict.
`change_parameters`(**kwargs)	Change the specified parameters.
`use_default_parameters`()	Reset parameters to their default values.
`reset`()	Reset the Linguistica object.

A class for a Linguistica object.

affixes() → set[str]: Return a set of affixes.

affixes_to_signatures() → dict[str, set[tuple[str, ...]]]: Return a dict of affixes to morphological signatures.

biphone_dict() → dict[tuple[str, str], Biphone]

Return a dict of phone bigrams to Biphone objects.

A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().

broken_words_left_to_right() → dict[str, list[str]]: Return a dict of words to their left-to-right broken form.

broken_words_right_to_left() → dict[str, list[str]]: Return a dict of words to their right-to-left broken form.

change_parameters(**kwargs: Unpack[ParametersKwargs]) → None

Change the specified parameters.

Parameters: kwargs – keyword arguments for parameters and their new values

contexts_to_words() → dict[tuple[str, ...], dict[str, int]] | None: Return a dict of contexts to words with counts.

neighbor_graph() → Graph | None: Return the syntactic word neighborhood graph.

number_of_word_tokens() → int: Return the number of word tokens.

number_of_word_types() → int: Return the number of word types.

output_all_results(directory: str | PathLike[str] | None = None, verbose: bool = False, test: bool = False) → None

Output all Linguistica results to a directory.

Parameters: directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().

parameters() → dict[str, int]: Return the parameter dict.

phone_bigram_counter() → dict[tuple[str, str], int]: Return a dict of phone bigrams with counts.

phone_dict() → dict[str, Phone]

Return a dict of phone unigrams to Phone objects.

A Phone instance has the methods spelling(), count(), frequency(), and plog().
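As a rough illustration of what count(), frequency(), and plog() report, the sketch below computes them for phone unigrams over made-up data. It assumes that plog is the positive log probability, -log2(frequency), and that phones are counted over word types. Both are assumptions rather than facts confirmed by this page, and the free functions here are hypothetical stand-ins for the Phone methods.

```python
import math
from collections import Counter

# Toy phone sequences (made up): each word is a list of phone symbols.
words_to_phones = {
    "cat": ["k", "ae", "t"],
    "tack": ["t", "ae", "k"],
    "at": ["ae", "t"],
}

# Count each phone once per occurrence in a word type (an assumption).
phone_counts = Counter(p for phones in words_to_phones.values() for p in phones)
total = sum(phone_counts.values())


def frequency(phone):
    """Relative frequency of a phone among all phone occurrences."""
    return phone_counts[phone] / total


def plog(phone):
    """Positive log probability in bits (assumed: -log2 of the frequency)."""
    return -math.log2(frequency(phone))
```

Under this definition, rarer phones receive larger plog values.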

phone_trigram_counter() → dict[tuple[str, str, str], int]: Return a dict of phone trigrams with counts.

phone_unigram_counter() → dict[str, int]: Return a dict of phone unigrams with counts.

predecessors() → dict[str, set[str]]: Return a dict of word (sub)strings to their predecessors.

reset() → None

Reset the Linguistica object.

While the file path information is retained, all computed objects (ngrams, signatures, word neighbors, etc.) are reset to None; if the corresponding methods are called again, the results are recomputed.
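The behavior described for reset() follows a common lazy-caching pattern: compute on first request, store the result, and clear the store on reset. The class below is a hypothetical stand-in named `ToyLexicon`, written only to show that pattern; it is not Linguistica's class, and the `compute_calls` counter exists purely for the demonstration.

```python
from collections import Counter


class ToyLexicon:
    """Hypothetical stand-in showing lazy computation with reset()."""

    def __init__(self, tokens):
        self._tokens = list(tokens)  # input is kept across reset()
        self._unigrams = None        # computed on first request
        self.compute_calls = 0       # bookkeeping for the demonstration

    def word_unigram_counter(self):
        if self._unigrams is None:
            self.compute_calls += 1
            self._unigrams = dict(Counter(self._tokens))
        return self._unigrams

    def reset(self):
        """Drop cached results; the input tokens are retained."""
        self._unigrams = None


lex = ToyLexicon(["a", "b", "a"])
lex.word_unigram_counter()
lex.word_unigram_counter()  # served from the cache
calls_before_reset = lex.compute_calls
lex.reset()
lex.word_unigram_counter()  # recomputed after reset()
calls_after_reset = lex.compute_calls
```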

run_all_modules(verbose: bool = False) → None: Run all modules.

run_manifold_module(verbose: bool = False) → None: Run the manifold module.

run_ngram_module(verbose: bool = False) → None: Run the ngram module.

run_phon_module(verbose: bool = False) → None: Run the phon module.

run_signature_module(verbose: bool = False) → None: Run the signature module.

run_trie_module(verbose: bool = False) → None: Run the trie module.

signatures() → set[tuple[str, ...]]: Return a set of morphological signatures.

signatures_to_stems() → dict[tuple[str, ...], set[str]]: Return a dict of morphological signatures to stems.

signatures_to_words() → dict[tuple[str, ...], set[str]]: Return a dict of morphological signatures to words.

stems() → set[str]: Return a set of stems.

stems_to_signatures() → dict[str, set[tuple[str, ...]]]: Return a dict of stems to morphological signatures.

stems_to_words() → dict[str, set[str]]: Return a dict of stems to words.

successors() → dict[str, set[str]]: Return a dict of word (sub)strings to their successors.

use_default_parameters() → None: Reset parameters to their default values.

word_bigram_counter() → dict[tuple[str, str], int]: Return a dict of word bigrams with their counts.

word_phonology_dict() → dict[str, Word]

Return a dict of words to Word objects.

A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().

word_trigram_counter() → dict[tuple[str, str, str], int]: Return a dict of word trigrams with their counts.

word_unigram_counter() → dict[str, int]: Return a dict of words with their counts.

wordlist() → list[str]: Return a wordlist sorted by word frequency in descending order.

words_in_signatures() → set[str]: Return a set of words in at least one morphological signature.

words_to_contexts() → dict[str, dict[tuple[str, ...], int]] | None: Return a dict of words to contexts with counts.

words_to_neighbors() → dict[str, list[str]] | None: Return a dict of words to syntactic neighbors.

words_to_phones() → dict[str, list[str]] | None: Return a dict of words to their phones.

words_to_signatures() → dict[str, set[tuple[str, ...]]]: Return a dict of words to morphological signatures.

words_to_sigtransforms() → dict[str, set[tuple[tuple[str, ...], str]]]: Return a dict of words to signature transforms.
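The return type of words_to_sigtransforms() suggests that a signature transform is a pair of a signature and one affix from it. Under that reading, which is an assumption, the mapping could be built as sketched below. The data is made up and deliberately contrived so that one word receives two transforms, and "NULL" as the empty affix is likewise an assumption.

```python
from collections import defaultdict

# Toy signatures-to-stems mapping (made up; "walk" is placed in two
# signatures on purpose to show why the values are sets).
signatures_to_stems = {
    ("NULL", "ed", "ing"): {"walk"},
    ("NULL", "s"): {"walk", "dog"},
}

words_to_sigtransforms = defaultdict(set)
for sig, stems in signatures_to_stems.items():
    for stem in stems:
        for affix in sig:
            # "NULL" stands for the empty affix in this sketch.
            word = stem if affix == "NULL" else stem + affix
            words_to_sigtransforms[word].add((sig, affix))
```

Each value is a set of `(signature, affix)` pairs, matching the shape `set[tuple[tuple[str, ...], str]]`.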