Public API

CLI utilities

hunspellcheck.hunspellchecker_argument_parser(parser, version=False, version_prog=None, version_number=None, hunspell_version=True, ispell_version=True, version_template='{% if version_number %}{{version_prog}} {{version_number}}{% endif %}{% if version_number and (hunspell_version or ispell_version) %} - {% endif %}{% if hunspell_version %}Hunspell {{hunspell_version}}{% endif %}{% if hunspell_version and ispell_version %} - {% endif %}{% if ispell_version %}Ispell {{ispell_version}}{% endif %}', version_template_context={}, version_name_or_flags=['--version'], version_kwargs={}, files=True, files_kwargs={}, languages=True, languages_name_or_flags=['-l', '--language'], languages_kwargs={}, negotiate_languages=True, personal_dicts=True, personal_dicts_name_or_flags=['-p', '--personal-dict'], personal_dicts_kwargs={}, encoding=True, encoding_name_or_flags=['-i', '--input-encoding'], encoding_kwargs={}, digits_are_words=True, digits_are_words_name_or_flags=['--digits-are-words'], digits_are_words_kwargs={}, words_not_contain_digits=True, words_not_contain_digits_name_or_flags=['--words-not-contain-digits'], words_not_contain_digits_kwargs={}, words_not_startswith_dash=True, words_not_startswith_dash_name_or_flags=['--words-not-startswith-dash'], words_not_startswith_dash_kwargs={}, words_not_endswith_dash=True, words_not_endswith_dash_name_or_flags=['--words-not-endswith-dash'], words_not_endswith_dash_kwargs={}, words_not_contain_dash=True, words_not_contain_dash_name_or_flags=['--words-not-contain-dash'], words_not_contain_dash_kwargs={}, words_not_contain_two_upper=True, words_not_contain_two_upper_name_or_flags=['--words-not-contain-two-upper'], words_not_contain_two_upper_kwargs={}, no_include_filename=True, no_include_filename_name_or_flags=['--no-include-filename'], no_include_filename_kwargs={}, no_include_line_number=True, no_include_line_number_name_or_flags=['--no-include-line-number'], no_include_line_number_kwargs={}, no_include_word=True, 
no_include_word_name_or_flags=['--no-include-word'], no_include_word_kwargs={}, no_include_word_line_index=True, no_include_word_line_index_name_or_flags=['--no-include-word-line-index'], no_include_word_line_index_kwargs={}, include_line=True, include_line_name_or_flags=['--include-line'], include_line_kwargs={}, include_text=True, include_text_name_or_flags=['--include-text'], include_text_kwargs={}, include_error_number=True, include_error_number_name_or_flags=['--include-error-number'], include_error_number_kwargs={}, include_near_misses=True, include_near_misses_name_or_flags=['--include-near-misses'], include_near_misses_kwargs={})

Extends an argparse.ArgumentParser instance by adding common spellchecking parameters.

By default, it adds the following parameters:

  • A positional argument, stored as files in the options namespace, which accepts multiple globs as input.

  • A required argument -l/--language, which can be passed multiple times and takes language dictionary names or file paths. It checks whether each passed language is recognized by Hunspell (or, if it is a dictionary file, whether the file exists) and, if not, prints a list of all available dictionaries.

  • An optional argument -p/--personal-dict, which can be passed multiple times and takes the path to a file used to exclude certain words from being reported as positives.

  • An optional argument -i/--input-encoding, which defines the encoding of the input content.
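
As a rough illustration of the defaults listed above, the following sketch builds a comparable parser by hand using only the standard library. It is not the code used by hunspellcheck: the append action, the nargs value and the personal_dicts and encoding destination names are assumptions made for the example, and no dictionary validation or language negotiation is performed.

```python
import argparse


def build_sketch_parser():
    """Build a parser resembling the defaults described above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="my-spellchecker")
    # Positional argument: zero or more files or globs.
    parser.add_argument("files", nargs="*")
    # Required, repeatable option for dictionary names or file paths.
    parser.add_argument(
        "-l", "--language", dest="languages", action="append", required=True
    )
    # Optional, repeatable option for personal dictionaries (dest name assumed).
    parser.add_argument(
        "-p", "--personal-dict", dest="personal_dicts", action="append"
    )
    # Optional input encoding (dest name assumed).
    parser.add_argument("-i", "--input-encoding", dest="encoding")
    return parser


opts = build_sketch_parser().parse_args(
    ["-l", "es_ES", "-l", "en_US", "-p", "words.txt", "docs/*.md"]
)
print(opts.languages, opts.personal_dicts, opts.files)
```

Repeating -l accumulates the values in a list, which is the behaviour the "could be passed multiple times" wording suggests.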

Parameters
  • version (bool) – Include a convenient --version option that will print the version of the program, and optionally the installed versions of Hunspell and Ispell. See version_prog, version_number, hunspell_version and ispell_version parameters below.

  • version_prog (str) – Name of the program shown along with the version. If not provided, it is taken from the parser.prog property.

  • version_number (str) – Version of the program. See version_template argument below for details about the formatting.

  • hunspell_version (bool) – Include the version of Hunspell in the output shown when --version is passed.

  • ispell_version (bool) – Include the version of Ispell in the output shown when --version is passed.

  • version_template (str) – Template for the version string, passed to a jinja2.Template object for rendering. By default, if version_number is provided and hunspell_version and ispell_version are True, it renders a string like "<version_prog> <X.Y.Z> - Hunspell <X.Y.Z> - Ispell <X.Y.Z>". By default, the rendering context consists of the fields version_prog, version_number, hunspell_version and ispell_version. To pass other fields, include them in the version_template_context argument.

  • version_template_context (dict) – Additional data to use in the version string rendering.

  • version_name_or_flags (list, str) – Flag name defined constructing the --version argument using the method argparse.ArgumentParser.add_argument().

  • version_kwargs (dict) – Optional kwargs which override the default kwargs passed to argparse.ArgumentParser.add_argument() constructing the --version option.

  • files (bool) – Include the files positional argument inside the argument parser.

  • files_kwargs (dict) – Optional kwargs which override the default kwargs passed to argparse.ArgumentParser.add_argument() constructing the files positional argument.

  • languages (bool) – Include the -l/--language option inside the argument parser.

  • languages_name_or_flags (list, str) – Flag name defined constructing the -l/--language option using the method argparse.ArgumentParser.add_argument().

  • languages_kwargs (dict) – Optional kwargs which override the default kwargs passed to argparse.ArgumentParser.add_argument() constructing the -l/--language option.

  • negotiate_languages (bool) – Enables language negotiation. If this is enabled and the CLI consumer passes a locale code instead of a full language name (for example es instead of es_ES), hunspellcheck will convert es to an available territorialized language dictionary name using the function babel.core.Locale.negotiate(). If disabled, a language dictionary passed as a locale code such as es is considered invalid.

  • personal_dicts (bool) – Include the -p/--personal-dict option inside the argument parser.

  • personal_dicts_name_or_flags (list, str) – Flag name defined constructing the -p/--personal-dict option using the method argparse.ArgumentParser.add_argument().

  • personal_dicts_kwargs (dict) – Optional kwargs which override the default kwargs passed to argparse.ArgumentParser.add_argument() constructing the -p/--personal-dict option.

  • encoding (bool) – Include the -i/--input-encoding hunspell option inside the argument parser.

  • encoding_name_or_flags (list, str) – Flag name defined constructing the -i/--input-encoding option using the method argparse.ArgumentParser.add_argument().

  • encoding_kwargs (dict) – Optional kwargs which override the default kwargs passed to argparse.ArgumentParser.add_argument() building the -i/--input-encoding option.

  • digits_are_words (bool) – Include the option --digits-are-words, which defines whether a value made up entirely of digits is considered a word to be checked for misspellings.

  • digits_are_words_name_or_flags (list, str) – Flag name defined constructing the --digits-are-words option using the method argparse.ArgumentParser.add_argument().

  • digits_are_words_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --digits-are-words option.

  • words_not_contain_digits (bool) – Include the option --words-not-contain-digits. When it is passed in the CLI, words that contain digits are ignored when checking for misspelling errors.

  • words_not_contain_digits_name_or_flags (list) – Flag name defined constructing the --words-not-contain-digits option using the method argparse.ArgumentParser.add_argument().

  • words_not_contain_digits_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --words-not-contain-digits option.

  • words_not_startswith_dash (bool) – Include the option --words-not-startswith-dash. When it is passed in the CLI, words starting with the - character are ignored when checking for misspelling errors.

  • words_not_startswith_dash_name_or_flags (list) – Flag name defined constructing the --words-not-startswith-dash option using the method argparse.ArgumentParser.add_argument().

  • words_not_startswith_dash_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --words-not-startswith-dash option.

  • words_not_endswith_dash (bool) – Include the option --words-not-endswith-dash. When it is passed in the CLI, words ending with the - character are ignored when checking for misspelling errors.

  • words_not_endswith_dash_name_or_flags (list) – Flag name defined constructing the --words-not-endswith-dash option using the method argparse.ArgumentParser.add_argument().

  • words_not_endswith_dash_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --words-not-endswith-dash option.

  • words_not_contain_dash (bool) – Include the option --words-not-contain-dash. When it is passed in the CLI, words containing the - character are ignored when checking for misspelling errors.

  • words_not_contain_dash_name_or_flags (list) – Flag name defined constructing the --words-not-contain-dash option using the method argparse.ArgumentParser.add_argument().

  • words_not_contain_dash_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --words-not-contain-dash option.

  • words_not_contain_two_upper (bool) – Include the option --words-not-contain-two-upper. When it is passed in the CLI, words containing two or more uppercase letters are ignored when checking for misspelling errors.

  • words_not_contain_two_upper_name_or_flags (list) – Flag name defined constructing the --words-not-contain-two-upper option using the method argparse.ArgumentParser.add_argument().

  • words_not_contain_two_upper_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --words-not-contain-two-upper option.

  • no_include_filename (bool) – Include the option --no-include-filename. When it is passed in the CLI, the paths of the files in which misspelling errors are found are not shown in the output.

  • no_include_filename_name_or_flags (list) – Flag name defined constructing the --no-include-filename option using the method argparse.ArgumentParser.add_argument().

  • no_include_filename_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --no-include-filename option.

  • no_include_line_number (bool) – Include the option --no-include-line-number. When it is passed in the CLI, the line numbers at which misspelling errors are found are not shown in the output.

  • no_include_line_number_name_or_flags (list) – Flag name defined constructing the --no-include-line-number option using the method argparse.ArgumentParser.add_argument().

  • no_include_line_number_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --no-include-line-number option.

  • no_include_word (bool) – Include the option --no-include-word. When it is passed in the CLI, the misspelled words are not shown in the output.

  • no_include_word_name_or_flags (list) – Flag name defined constructing the --no-include-word option using the method argparse.ArgumentParser.add_argument().

  • no_include_word_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --no-include-word option.

  • no_include_word_line_index (bool) – Include the option --no-include-word-line-index. When it is passed in the CLI, the index of each misspelled word inside its line is not shown in the output.

  • no_include_word_line_index_name_or_flags (list) – Flag name defined constructing the --no-include-word-line-index option using the method argparse.ArgumentParser.add_argument().

  • no_include_word_line_index_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --no-include-word-line-index option.

  • include_line (bool) – Include the option --include-line. When it is passed in the CLI, the line in which each misspelled word is found is shown in the output.

  • include_line_name_or_flags (list) – Flag name defined constructing the --include-line option using the method argparse.ArgumentParser.add_argument().

  • include_line_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --include-line option.

  • include_text (bool) – Include the option --include-text. When it is passed in the CLI, the full text in which each misspelled word is found is shown in the output.

  • include_text_name_or_flags (list) – Flag name defined constructing the --include-text option using the method argparse.ArgumentParser.add_argument().

  • include_text_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --include-text option.

  • include_error_number (bool) – Include the option --include-error-number. When it is passed in the CLI, the number of each error is shown in the output.

  • include_error_number_name_or_flags (list) – Flag name defined building the --include-error-number option using the method argparse.ArgumentParser.add_argument().

  • include_error_number_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --include-error-number option.

  • include_near_misses (bool) – Include the option --include-near-misses. When it is passed in the CLI, some Hunspell suggestions are shown for each misspelled word in the report.

  • include_near_misses_name_or_flags (list) – Flag name defined building the --include-near-misses option using the method argparse.ArgumentParser.add_argument().

  • include_near_misses_kwargs (dict) – Optional kwargs which override default kwargs passed to argparse.ArgumentParser.add_argument() building the --include-near-misses option.

Examples

>>> import argparse
>>> from hunspellcheck import hunspellchecker_argument_parser
>>>
>>> parser = argparse.ArgumentParser()
>>> hunspellchecker_argument_parser(
...     parser,
...     version=True,
...     version_number="1.0.0",
... )
>>> opts = parser.parse_args(["--language", "es"])
>>> print(opts)
Namespace(languages=["es_ES"])

Spellchecker interface

class hunspellcheck.HunspellChecker(filenames_contents, languages, personal_dicts=None, looks_like_a_word=<function looks_like_a_word>, encoding=None)

Main spellchecking interface of hunspellcheck.

Parameters
  • filenames_contents (dict) – Dictionary mapping filenames to content of those files.

  • languages (list, str) – Languages against which the contents will be checked.

  • personal_dicts (str, list) – Files containing custom words to exclude from being reported as positives. Accepts globs or file paths, as a string or a list of strings.

  • looks_like_a_word (types.FunctionType) – Function used to filter out values that should not be checked. It takes a possible word string and returns whether the value can be considered a word to be checked for misspelling errors. By default, hunspellcheck.word.looks_like_a_word_creator() is called with its default arguments to build a basic validator.

  • encoding (str) – Input encoding. If not defined, it is autodetected by hunspell.

check(include_filename=True, include_line_number=True, include_word=True, include_word_line_index=True, include_line=False, include_text=False, include_error_number=False, include_near_misses=False)

Spellchecking function.

Yields the data of each misspelled word found in the contents. The data generated for each word depends on the optional include_<field> arguments passed to this function, where field is the name of the key inside the yielded dictionary.

Parameters
  • include_filename (bool) – Includes the filename where the misspelled word was found in the yielded error data.

  • include_line_number (bool) – Includes the line number where the misspelled word was found in the content in the yielded error data.

  • include_word (bool) – Includes the misspelled word in the yielded error data.

  • include_word_line_index (bool) – Includes the index of the character at which the misspelled word starts in its line (starting at index 0).

  • include_line (bool) – Includes the entire line in which the misspelled word resides.

  • include_text (bool) – Includes the full text of the content in which the misspelled word resides.

  • include_error_number (bool) – Includes the number of the error in the yielded data. This can be useful to avoid having to define a counter.

  • include_near_misses (bool) – Includes a list of near misses for the misspelled word.

Yields

dict – Dictionary with all the included data for each misspelled word.
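
To make the relationship between the include_<field> arguments and the yielded dictionaries concrete, here is a minimal stand-in written for illustration. It does not call Hunspell and is not the library's implementation; build_error_data and the sample record are hypothetical, and only the field names and default flags are taken from the signature of check() above.

```python
def build_error_data(found, **include):
    """Keep only the fields whose ``include_<field>`` flag is true.

    ``found`` is a complete record for one misspelled word; the flag
    defaults mirror the signature of ``check()``.
    """
    defaults = {
        "filename": True,
        "line_number": True,
        "word": True,
        "word_line_index": True,
        "line": False,
        "text": False,
        "error_number": False,
        "near_misses": False,
    }
    data = {}
    for field, default in defaults.items():
        if include.get(f"include_{field}", default):
            data[field] = found[field]
    return data


# Hypothetical record for a single misspelled word.
found = {
    "filename": "README.md",
    "line_number": 3,
    "word": "speling",
    "word_line_index": 10,
    "line": "This is a speling error.",
    "text": "Title\n\nThis is a speling error.\n",
    "error_number": 1,
    "near_misses": ["spelling"],
}
data = build_error_data(found, include_word_line_index=False, include_line=True)
print(data)
```

Disabling a default field and enabling an optional one changes only the keys present in the result, so consumers should not assume a fixed set of keys.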

hunspellcheck.render_hunspell_word_error(data, fields=['filename', 'word', 'line_number', 'word_line_index'], sep=':')

Renders a misspelled word data dictionary.

This function provides a convenient way to render each misspelled word data dictionary as a string, which can be useful for printing in spell checker command line interfaces.

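Parameters
  • data (dict) – Misspelled word data dictionary to render.

  • fields (list) – Names of the fields of the data dictionary to render.

  • sep (str) – Separator placed between the rendered fields.
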
Returns

Misspelled word data as a string.

Return type

str
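
A minimal sketch of this kind of rendering is shown below. It assumes that the selected fields are joined in the order given by fields and that fields missing from the dictionary are skipped; both are assumptions for illustration, and render_word_error is a hypothetical name, not the library function.

```python
def render_word_error(
    data,
    fields=("filename", "word", "line_number", "word_line_index"),
    sep=":",
):
    """Join the selected fields of ``data`` with ``sep``, skipping absent ones."""
    return sep.join(str(data[field]) for field in fields if field in data)


rendered = render_word_error(
    {"filename": "README.md", "word": "speling", "line_number": 3, "word_line_index": 10}
)
print(rendered)
```

With the default separator, the result resembles the filename:line style locations printed by many command line tools.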

hunspellcheck.word.looks_like_a_word_creator(digits_are_words=False, words_can_contain_digits=True, words_can_startswith_dash=True, words_can_endswith_dash=True, words_can_contain_dash=True, words_can_contain_two_upper=True)

Dynamically generates the looks_like_a_word function used to filter out the words that must not be checked for misspelling errors.

Parameters
  • digits_are_words (bool) – If False, values with all characters as digits will not be considered words.

  • words_can_contain_digits (bool) – If False, values with at least one digit character will not be considered words.

  • words_can_startswith_dash (bool) – If False, values starting with the - character will not be considered words.

  • words_can_endswith_dash (bool) – If False, values ending with the - character will not be considered words.

  • words_can_contain_dash (bool) – If False, values containing the - character will not be considered words.

  • words_can_contain_two_upper (bool) – If False, values that contain at least two uppercase letters, like CPython, will not be considered words and will not be checked for possible misspellings.

Returns

Function that takes a possible word as a parameter and returns whether that value is considered a word. This function can be passed to hunspellcheck.spellchecker.HunspellChecker.

Return type

function
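
The rules described by the parameters above can be sketched as a closure. This is an illustrative re-implementation under stated assumptions, not the library's source: the name looks_like_a_word_creator_sketch is hypothetical, and rejecting empty values is an assumption.

```python
def looks_like_a_word_creator_sketch(
    digits_are_words=False,
    words_can_contain_digits=True,
    words_can_startswith_dash=True,
    words_can_endswith_dash=True,
    words_can_contain_dash=True,
    words_can_contain_two_upper=True,
):
    """Return a predicate implementing the rules listed in the parameters."""

    def looks_like_a_word(value):
        if not value:  # assumption: empty values are never words
            return False
        if not digits_are_words and value.isdigit():
            return False
        if not words_can_contain_digits and any(c.isdigit() for c in value):
            return False
        if not words_can_startswith_dash and value.startswith("-"):
            return False
        if not words_can_endswith_dash and value.endswith("-"):
            return False
        if not words_can_contain_dash and "-" in value:
            return False
        if not words_can_contain_two_upper and sum(c.isupper() for c in value) >= 2:
            return False
        return True

    return looks_like_a_word


is_word = looks_like_a_word_creator_sketch(words_can_contain_two_upper=False)
print(is_word("hello"), is_word("2021"), is_word("CPython"))
```

Building the predicate once and reusing it keeps the per-word check cheap, since the configuration is captured by the closure.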

Hunspell utilities

hunspellcheck.get_hunspell_version(hunspell=True, ispell=True)

Returns the version number of Hunspell and the version of Ispell that the installed Hunspell program is using.

Parameters
  • hunspell (bool) – Include the version of Hunspell in the response.

  • ispell (bool) – Include the version of Ispell in the response.

Returns

Dictionary whose fields are hunspell and ispell, when both are included through the kwargs of this function.

Return type

dict
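
As a hedged illustration of the returned dictionary, the sketch below extracts both version numbers from a sample version line. The sample string reflects the format commonly printed by hunspell --version, but that format, the regular expressions and the parse_versions helper are assumptions for the example; the library may obtain the versions differently.

```python
import re

# Sample first line of ``hunspell --version`` output (assumed format).
SAMPLE = "@(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)"


def parse_versions(output, hunspell=True, ispell=True):
    """Extract the requested version numbers from ``output`` into a dictionary."""
    versions = {}
    if hunspell:
        match = re.search(r"Hunspell (\d+(?:\.\d+)*)", output)
        if match:
            versions["hunspell"] = match.group(1)
    if ispell:
        match = re.search(r"Ispell Version (\d+(?:\.\d+)*)", output)
        if match:
            versions["ispell"] = match.group(1)
    return versions


print(parse_versions(SAMPLE))
print(parse_versions(SAMPLE, ispell=False))
```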

hunspellcheck.is_valid_dictionary_language(dictionary_name, negotiate_languages=False)

Check if a dictionary name is a valid dictionary installed for your Hunspell version.

Parameters
  • dictionary_name (str) – Dictionary language.

  • negotiate_languages (bool) – Enable language negotiation from locale name to territory.

Returns

Has 3 values:

  • The first value is a boolean and indicates if the language is valid.

  • The second value is the dictionary language name, which may differ from the input if language negotiation is enabled.

  • The third value is a list with all available dictionaries.

Return type

tuple
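
The shape of the returned tuple can be sketched as follows. The list of available dictionaries is passed in explicitly so that the example runs without Hunspell, and the negotiation here is a naive prefix match, whereas hunspellcheck is described above as using babel.core.Locale.negotiate(), which may choose a different territory. The name check_dictionary_language is hypothetical.

```python
def check_dictionary_language(dictionary_name, available, negotiate_languages=False):
    """Return ``(is_valid, dictionary_name, available)`` for a requested name."""
    if dictionary_name in available:
        return True, dictionary_name, available
    if negotiate_languages:
        # Naive negotiation: pick the first dictionary for the same language.
        for candidate in available:
            if candidate.split("_")[0] == dictionary_name:
                return True, candidate, available
    return False, dictionary_name, available


available = ["en_US", "es_ES", "es_MX"]
print(check_dictionary_language("es", available, negotiate_languages=True))
print(check_dictionary_language("es", available))
```

Returning the available dictionaries alongside the verdict lets a caller print helpful alternatives when the language is rejected.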

hunspellcheck.is_valid_dictionary_language_or_filename(value, negotiate_languages=False)

Returns whether a value is a valid dictionary language name or an existing file defined by its path.

Parameters
  • value (str) – Dictionary language or filepath.

  • negotiate_languages (bool) – Enable language negotiation from locale name to territory.

Returns

Indicates whether the value is a valid dictionary supported by Hunspell.

Return type

bool

hunspellcheck.assert_is_valid_dictionary_language_or_filename(value, negotiate_languages=False)

Asserts that a value is a valid dictionary language name or an existing file defined by its path. If it is not, raises a hunspellcheck.InvalidLanguageDictionaryError.

Parameters
  • value (str, list) – Dictionary language/s or filepath/s.

  • negotiate_languages (bool) – Enable language negotiation from locale name to territory.

hunspellcheck.gen_available_dictionaries(full_paths=False)

Generates the available dictionaries contained inside the search paths configured by hunspell.

These dictionaries can be used without specifying the full path to their location in the system when calling hunspell; only their name is needed.

Parameters

full_paths (bool) – Yield complete paths to dictionaries (True) or their names only (False).

Yields

str – Dictionary names (locale with territory).

hunspellcheck.gen_available_dictionaries_with_langcodes(sort=True, full_paths=False)

Generates all available dictionaries installed along with their locale names (without territories).

For example, if es_ES is installed, es also will be included in the response.

Parameters

sort (bool) – Sort languages alphabetically.

Yields

str – Locale or dictionary names (locale with territory).
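
The expansion from dictionary names to locale codes can be sketched with a small generator. This is an assumption-laden illustration rather than the library's code: with_langcodes is a hypothetical helper, the dictionary names are supplied by the caller, and splitting on the underscore is assumed to be enough to obtain the locale without its territory.

```python
def with_langcodes(dictionaries, sort=True):
    """Yield each dictionary name plus its locale code without territory."""
    names = set()
    for name in dictionaries:
        names.add(name)
        names.add(name.split("_")[0])
    yield from (sorted(names) if sort else names)


print(list(with_langcodes(["es_ES", "en_US", "en_GB"])))
```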

hunspellcheck.list_available_dictionaries(full_paths=False)

Convenient wrapper around the generator hunspellcheck.gen_available_dictionaries() which returns the dictionary names in a list.

Parameters

full_paths (bool) – Include complete paths to dictionaries (True) or their names only (False).

Returns

Available installed dictionaries.

Return type

list

hunspellcheck.print_available_dictionaries(sort=True, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, full_paths=False)

Prints the available hunspell dictionaries to a stream.

By default, they are printed to the standard output of the system (STDOUT).

Parameters
  • sort (bool) – Indicates if the dictionaries will be printed in alphabetical order.

  • stream (object) – Stream to which the dictionaries will be printed. Must be any object that provides a write method.

  • full_paths (bool) – Print complete paths to dictionaries (True) or their names only (False).
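
Because the stream only needs a write method, the output can be captured in memory, for example in tests. The sketch below uses a hypothetical stand-in, print_dictionaries, that takes the dictionary names as an argument; writing one name per line is an assumption about the output format.

```python
import io
import sys


def print_dictionaries(dictionaries, sort=True, stream=sys.stdout):
    """Write one dictionary name per line to ``stream``."""
    names = sorted(dictionaries) if sort else list(dictionaries)
    for name in names:
        stream.write(f"{name}\n")


# Capture the output in memory instead of printing to STDOUT.
buffer = io.StringIO()
print_dictionaries(["es_ES", "en_US"], stream=buffer)
captured = buffer.getvalue()
```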