g_docformatter.collectors

Module with classes to collect tokens.

type g_docformatter.collectors.Paragraph = list[DocstringToken]

A list of DocstringToken objects representing a paragraph of docstring tokens.

class g_docformatter.collectors.CollectorContext(state: str | None = None, paragraphs: list[tuple[str, FrozenParagraph]]=<factory>, current_paragraph: Paragraph = <factory>, base_indent: int = 0)

Bases: object

Context for token collectors.

state: str | None = None

The current paragraph type being collected, or None if no paragraph is currently being collected.

paragraphs: list[tuple[str, FrozenParagraph]]

List of finalized paragraphs, each represented as a tuple of paragraph type and frozen paragraph.

current_paragraph: Paragraph

The current paragraph being collected.

base_indent: int = 0

The base indentation amount of the docstring being collected in number of spaces.

finalize_current(*, new_state: str | None = None) None

Finalize the current paragraph by appending it to the paragraphs list, clearing the current paragraph, and optionally updating the state.

Keyword Arguments:

new_state – The new state to set after finalizing the paragraph. Default is None.

clear_state() None

Clear the current state.

Sets the state property to an empty string and clears the current_paragraph list.
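A minimal sketch of how such a context might behave, assuming tokens are plain strings and a tuple stands in for FrozenParagraph. SketchContext is a hypothetical stand-in, not the library class. Pairing each finalized paragraph with the value of state is an assumption drawn from the attribute descriptions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SketchContext:
    """Hypothetical stand-in for CollectorContext (tokens are plain strings)."""

    state: Optional[str] = None
    paragraphs: List[Tuple[Optional[str], Tuple[str, ...]]] = field(default_factory=list)
    current_paragraph: List[str] = field(default_factory=list)
    base_indent: int = 0

    def finalize_current(self, *, new_state: Optional[str] = None) -> None:
        # Assumption: the finalized paragraph is stored with the paragraph
        # type held in ``state``; a tuple stands in for FrozenParagraph.
        self.paragraphs.append((self.state, tuple(self.current_paragraph)))
        self.current_paragraph = []
        self.state = new_state


ctx = SketchContext(state="SUMMARY")
ctx.current_paragraph.append("Short summary line.")
ctx.finalize_current(new_state="DESCRIPTION")
```

After the call, the summary paragraph sits in ctx.paragraphs, the working list is empty, and the state has moved on to the next paragraph type.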

exception g_docformatter.collectors.CollectorMethodContextError

Bases: RuntimeError

Exception raised if a collector method is called directly, instead of through the collect_tokens method of a TokenCollectorABC subclass.

class g_docformatter.collectors.TokenCollectorMeta(name, bases, namespace, /, **kwargs)

Bases: ABCMeta

Metaclass for TokenCollectorABC to initialize collection_methods class variable.

class g_docformatter.collectors.TokenCollectorABC(settings: FormatterSettings, context: CollectorContext | None = None)

Bases: object

Abstract base class for token collectors.

Loops through an iterable of DocstringToken and collects tokens according to some logic, updating the CollectorContext. Collection methods should be decorated with the @collector_method decorator. Collection methods should return True if they successfully collected the token, and False otherwise. If no collection method collects a token, a warning is emitted. Collection methods are called in the order they are defined in the class.
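The dispatch behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the marker decorator, SketchCollector, and the string tokens are all hypothetical, and the real collector_method is a descriptor class set up through a metaclass.

```python
import warnings


def collector_method(func):
    """Hypothetical marker decorator (the real one is a descriptor class)."""
    func._is_collector = True
    return func


class SketchCollector:
    def __init__(self):
        self.collected = []

    def collect_tokens(self, tokens):
        # The class namespace preserves definition order, so the marked
        # methods are tried in the order they were written.
        methods = [
            m for m in vars(type(self)).values()
            if getattr(m, "_is_collector", False)
        ]
        for token in tokens:
            # any() stops at the first method that returns True.
            if not any(m(self, token) for m in methods):
                warnings.warn(f"uncollected token: {token!r}")
        return self.collected

    @collector_method
    def collect_words(self, token):
        if token.isalpha():
            self.collected.append(("WORD", token))
            return True
        return False

    @collector_method
    def collect_numbers(self, token):
        if token.isdigit():
            self.collected.append(("NUMBER", token))
            return True
        return False
```

A token that no method accepts (for example "?!") is left out of the result and triggers a warning instead.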

class CollectorMethod(method: Callable[[T, int, DocstringToken], bool])

Bases: object

Descriptor class for collector methods.

collect_tokens(docstring_tokens: ParagraphArgument, base_indent: int) list[tuple[str, FrozenParagraph]]

Collect tokens from an iterable of DocstringToken and update the collector context.

get_token_indent_diff(token: DocstringToken, base_indent: int | None = None) int

Helper method to get the indent difference, in number of spaces, of a token from the base indent.

Parameters:

token – The token to get the indent difference for.

Keyword Arguments:

base_indent – The base indent to calculate the difference from. If None, the base_indent from the collector context is used.

Returns:

The indent difference in number of spaces, calculated as the length of the token’s text minus the length of the token’s text with leading spaces removed, minus the base indent of the collector context or the provided base_indent.
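The calculation described under Returns can be written out as a small sketch. The function names here are hypothetical and operate on the token's text directly rather than on a DocstringToken.

```python
def indent_of(text: str) -> int:
    # Length of the text minus its length with leading spaces removed.
    return len(text) - len(text.lstrip(" "))


def indent_diff(text: str, base_indent: int) -> int:
    # Indent of the text relative to the base indent, in spaces.
    return indent_of(text) - base_indent


diff = indent_diff("        x (int): value", 4)  # 8 leading spaces, base of 4
```

A negative result means the text is indented less than the base indent.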

spaces_to_indent_level(spaces: int) float

Helper method to convert a number of spaces to an indent level based on the settings.

Parameters:

spaces – The number of spaces to convert to an indent level.

Returns:

The indent level, calculated as the number of spaces divided by the indent size specified in the settings.

It is possible for the indent level to be a float if the number of spaces is not a multiple of the indent size. This allows for more precise handling of indentation in cases where the indentation is not consistent.
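As a sketch of the conversion, assuming the indent size is available as a plain integer (the parameter name indent_size is hypothetical; the real value comes from the settings object):

```python
def spaces_to_indent_level(spaces: int, indent_size: int = 4) -> float:
    # True division keeps the fractional part when the number of spaces
    # is not a multiple of the indent size.
    return spaces / indent_size


whole = spaces_to_indent_level(8)    # two full indent steps
partial = spaces_to_indent_level(6)  # one and a half indent steps
```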

get_token_indent_level_diff(token: DocstringToken) float

Helper method to get the indent level difference of a token from the base indent.

Parameters:

token – The token to get the indent level difference for.

Returns:

The indent level difference, calculated by getting the indent difference in spaces using get_token_indent_diff and then converting that to an indent level using spaces_to_indent_level.

get_token_indent(token: DocstringToken, *, relative_to: DocstringToken | None = None) int

Helper method to get the indent of a token in number of spaces.

Parameters:

token – The token to get the indent of.

Keyword Arguments:

relative_to – If provided, calculate the indent relative to this token.

Returns:

The indent in number of spaces.

Note

The indent is calculated as the length of the token’s text minus the length of the token’s text with leading spaces removed.

g_docformatter.collectors.collector_method

alias of CollectorMethod

class g_docformatter.collectors.RootCollector(settings: FormatterSettings, context: CollectorContext | None = None)

Bases: TokenCollectorABC

Token collector for the root level of a docstring.

SUMMARY_ONLY = 'SUMMARY_ONLY'

Paragraph type for summary-only docstrings.

SUMMARY = 'SUMMARY'

Paragraph type for the summary section.

EOD = 'EOD'

Paragraph type for the end-of-docstring token (triple quotes).

DESCRIPTION = 'DESCRIPTION'

Paragraph type for description sections.

STD_SECTION = 'STD_SECTION'

Paragraph type for standard sections (IE ‘Note:’, ‘Warning:’, etc.).

LIST_SECTION = 'LIST_SECTION'

Paragraph type for list sections (IE ‘Parameters:’, ‘Attributes:’, etc.).

GENERAL_TOKEN_TYPES = ('REPL_START', 'REPL_CONTINUE', 'CODE_BLOCK_START', 'SPHINX_OPTION', 'STRING', 'LIST_ITEM')

Token types that are a part of paragraphs but do not define paragraph boundaries.

process_summary_only_docstring(token: DocstringToken) bool

Collect summary-only docstrings.

finalize_current_paragraph(token: DocstringToken) bool

Finalize the current paragraph if the token should start a new paragraph.

process_eod_token(token: DocstringToken) bool

When encountering the EOD token, add an EOD paragraph.

start_new_paragraph(token: DocstringToken) bool

Start a new paragraph if the token indicates the start of a new paragraph.

process_current_paragraph_token(token: DocstringToken) bool

Collect tokens for the current paragraph.

class g_docformatter.collectors.BodyTokenCollector(settings: FormatterSettings, context: CollectorContext | None = None)

Bases: TokenCollectorABC

Token collector for the body of a docstring section.

Collects tokens into one of three paragraph types:

  • TEXT for plain text

  • REPL_CODE for REPL-style code blocks (lines starting with >>>)

  • SPHINX_CODE for Sphinx .. code-block:: directives

A new paragraph begins when a REPL_START or CODE_BLOCK_START token is encountered (starting a REPL_CODE or SPHINX_CODE paragraph respectively), or defaults to TEXT for any other token.

Finalization rules differ by paragraph type:

  • TEXT paragraphs are finalized by a blank line (NL token), a REPL_START token, or a CODE_BLOCK_START token.

  • REPL_CODE paragraphs are finalized by a blank line (NL token) or a CODE_BLOCK_START token (but not by a REPL_START token).

  • SPHINX_CODE paragraphs are finalized when a dedented non-blank token is seen (indent_diff <= 0); blank lines within the block are kept.

Any in-progress paragraph is also finalized after all tokens have been collected.
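The start and finalization rules above can be sketched as a single loop. This is a simplified illustration, not the collector's code: tokens are hypothetical (type, text) pairs, indentation is measured from the text itself, and dropping stray blank lines between paragraphs is an assumption.

```python
def split_body(tokens, base_indent=0):
    """Group (type, text) pairs into (paragraph_type, [texts]) paragraphs."""
    paragraphs, current, state = [], [], None

    def finalize():
        nonlocal current, state
        if state is not None:
            paragraphs.append((state, current))
        current, state = [], None

    for tok_type, text in tokens:
        indent_diff = len(text) - len(text.lstrip(" ")) - base_indent
        # 1. Finalization rules, which differ by paragraph type.
        if state in ("TEXT", "REPL_CODE"):
            if tok_type == "NL":
                finalize()
                continue  # the blank line is consumed
            if tok_type == "CODE_BLOCK_START" or (
                state == "TEXT" and tok_type == "REPL_START"
            ):
                finalize()
        elif state == "SPHINX_CODE" and tok_type != "NL" and indent_diff <= 0:
            finalize()
        # 2. Start a new paragraph if none is in progress.
        if state is None:
            if tok_type == "NL":
                continue  # assumption: stray blank lines are dropped
            state = {
                "REPL_START": "REPL_CODE",
                "CODE_BLOCK_START": "SPHINX_CODE",
            }.get(tok_type, "TEXT")
        # 3. Append the token to the current paragraph.
        current.append(text)

    finalize()  # close any in-progress paragraph at the end
    return paragraphs


body = split_body([
    ("STRING", "Some text."),
    ("NL", ""),
    ("REPL_START", ">>> x = 1"),
    ("REPL_CONTINUE", "... y = 2"),
    ("NL", ""),
    ("CODE_BLOCK_START", ".. code-block:: python"),
    ("NL", ""),
    ("STRING", "    print(1)"),
    ("STRING", "After the block."),
])
```

In this example the blank line inside the Sphinx block is kept, while the dedented line "After the block." closes the block and opens a new TEXT paragraph.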

TEXT = 'TEXT'

Paragraph type for text in body paragraphs.

REPL_CODE = 'REPL_CODE'

Paragraph type for REPL code blocks in body paragraphs.

SPHINX_CODE = 'SPHINX_CODE'

Paragraph type for Sphinx code blocks in body paragraphs.

finalize_current_paragraph(token: DocstringToken) bool

Finalize the current paragraph when the incoming token terminates it.

For TEXT and REPL_CODE paragraphs a blank line (NL token) always ends the paragraph, and the method returns True to signal that the token has been consumed and should not be processed further. A new code-block start token also finalizes the current paragraph, but the method returns False so that the start token can be handled by the subsequent collection methods.

SPHINX_CODE paragraphs close when a non-blank token dedented to or below the level of the opening directive (indent_diff <= 0) is seen. The return value is always False so that the token triggering the break can be re-evaluated.

start_new_paragraph(token: DocstringToken) bool

Start a new paragraph if the token indicates the start of a new paragraph.

collect_current_paragraph_token(token: DocstringToken) bool

Collect tokens for the current paragraph.

class g_docformatter.collectors.StdSectionTokenCollector(settings: FormatterSettings, context: CollectorContext | None = None)

Bases: TokenCollectorABC

Token collector for standard sections (e.g. Note:, Warning:) of a docstring.

Splits the section into exactly two paragraph types:

  • HEADER: the first token (the section keyword line, e.g. Note:), immediately finalized as a single-token paragraph.

  • BODY: all remaining tokens collected together into a single paragraph.

The two-paragraph split is intentional. HEADER should be formatted directly by the formatter which uses this collector, while BODY should be passed to a formatter that uses a BodyTokenCollector.
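The split itself is simple enough to sketch in a few lines. The function below is hypothetical and works on plain strings. Omitting the BODY paragraph when nothing follows the header is an assumption.

```python
def split_std_section(lines):
    """Split section lines into a HEADER paragraph and a BODY paragraph."""
    header, *body = lines
    paragraphs = [("HEADER", [header])]
    if body:  # assumption: no BODY paragraph when the section is header-only
        paragraphs.append(("BODY", body))
    return paragraphs


section = split_std_section(["Note:", "    First line.", "    Second line."])
```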

HEADER = 'HEADER'

Paragraph type for the header of a standard section (IE ‘Note:’, ‘Warning:’, etc.).

BODY = 'BODY'

Paragraph type for the body of a standard section.

collect_and_finalize_header(token: DocstringToken) bool

Collect the header token of the standard section and finalize it as a paragraph.

collect_remaining_tokens(token: DocstringToken) bool

Collect the tokens that occur after the header.

class g_docformatter.collectors.ListSectionTokenCollector(settings: FormatterSettings, context: CollectorContext | None = None)

Bases: TokenCollectorABC

Token collector for list sections (e.g. Parameters:, Attributes:) of a docstring.

Splits the section into up to three paragraph types:

  • HEADER: the first token (the section keyword line, e.g. Parameters:), immediately finalized as a single-token paragraph.

  • LIST_ITEM: the opening line of each individual list entry (e.g. param_name (type): description). Each LIST_ITEM paragraph contains exactly one item.

  • LIST_ITEM_BODY: any continuation lines that follow a LIST_ITEM within the same logical entry. These are collected into a single paragraph per entry and are intended to be passed to a formatter that uses a BodyTokenCollector.

HEADER = 'HEADER'

Paragraph type for the header of a list section (IE ‘Parameters:’, ‘Attributes:’, etc.).

LIST_ITEM = 'LIST_ITEM'

Paragraph type for list items in a list section.

LIST_ITEM_BODY = 'LIST_ITEM_BODY'

Paragraph type for any additional paragraphs that occur within a list item, after the initial line.

collect_and_finalize_header(token: DocstringToken) bool

Collect the header token of the list section and finalize it as a paragraph.

finalize_current_paragraph(token: DocstringToken) bool

Finalize the current paragraph if the token should start a new paragraph.

Paragraph boundaries within a list section are driven by indentation and token type. The logic is as follows:

  1. A LIST_ITEM token that is dedented to level 0 or 1, or that matches the indent level of the previous list item, always terminates the current paragraph. This covers the start of a new top-level list item.

  2. When the collector is currently in state LIST_ITEM and an NL token is seen, the paragraph is finalized. Blank lines within a list item body are not handled here; they remain inside LIST_ITEM_BODY paragraphs so that downstream formatters can split them according to normal body rules.

  3. While in LIST_ITEM_BODY state, blank lines and other non-LIST_ITEM tokens do not trigger finalization. Only a LIST_ITEM token that is dedented to level 0 or 1, or that matches the indent level of the previous list item, will end the body paragraph (via rule 1 above).

The method returns True only when the triggering token has been consumed (currently only the NL case), allowing the caller to avoid reprocessing it.

start_new_paragraph(token: DocstringToken) bool

Start a new paragraph if the token indicates a paragraph boundary.

This method is only called when no paragraph is currently in progress (context.state is None). It uses the token’s type and its indentation level (computed relative to context.base_indent) to decide whether a new paragraph should begin, and which kind of list section paragraph it should be.

The header line itself (HEADER state) is handled by collect_and_finalize_header() and therefore is not included in the logic below.

Two paragraph states are possible:

  1. LIST_ITEM - begins when the token is a LIST_ITEM and any of these conditions hold:

    • the token’s indent level is 0 or 1;

    • or there are no previous paragraphs (first item in section);

    • or the last paragraph was not a LIST_ITEM (for example, the header or an item body), giving precedence to a new top-level entry;

    • or the token’s indent exactly matches the indent level of the previous list item (after subtracting base_indent).

    The last two bullets ensure that a deeply-nested item will still start a new LIST_ITEM if it is indented the same as the prior item, and that isolated dedented items are treated as new list entries.

  2. LIST_ITEM_BODY - any token whose indent level is greater than 1 and that does not satisfy the LIST_ITEM criteria above will open a body paragraph. This category includes both non-LIST_ITEM continuation lines and LIST_ITEM tokens that are so deeply indented that they don’t align with the previous item. Paragraphs in this state are intentionally not closed by blank lines; they persist until a new list item or header is encountered.

The method always returns False so that the caller will forward the same token to collect_current_paragraph_token(), ensuring the triggering token is appended to whatever paragraph has just started.
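The decision described above can be condensed into a small function. This is a sketch that follows the listed conditions literally. The function name, its parameters, and the (type, content) paragraph pairs are hypothetical, and the real method updates the collector context rather than returning a label.

```python
def classify_new_paragraph(token_type, indent_level, paragraphs, prev_item_level):
    """Return 'LIST_ITEM', 'LIST_ITEM_BODY', or None for the incoming token."""
    if token_type == "LIST_ITEM" and (
        indent_level in (0, 1)                 # top-level entry
        or not paragraphs                      # first item in the section
        or paragraphs[-1][0] != "LIST_ITEM"    # follows the header or a body
        or indent_level == prev_item_level     # aligned with the previous item
    ):
        return "LIST_ITEM"
    if indent_level > 1:
        return "LIST_ITEM_BODY"
    return None


after_header = [("HEADER", ["Parameters:"])]
after_item = after_header + [("LIST_ITEM", ["    x (int): value"])]

first_item = classify_new_paragraph("LIST_ITEM", 1, after_header, None)
continuation = classify_new_paragraph("STRING", 2, after_item, 1)
nested_item = classify_new_paragraph("LIST_ITEM", 3, after_item, 1)
aligned_item = classify_new_paragraph("LIST_ITEM", 3, after_item, 3)
```

Here nested_item falls into a body paragraph because it is a LIST_ITEM token that neither sits at level 0 or 1 nor aligns with the previous item, while aligned_item starts a new LIST_ITEM.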

collect_current_paragraph_token(token: DocstringToken) bool

Collect tokens for the current paragraph.

class g_docformatter.collectors.SphinxCodeBlockTokenCollector(settings: FormatterSettings, context: CollectorContext | None = None)

Bases: TokenCollectorABC

Token collector for Sphinx code blocks in a docstring.

Splits the code block into up to three paragraph types:

  • HEADER: the first token (the directive line, e.g. .. code-block:: python), immediately finalized as a single-token paragraph.

  • OPTION: each option line following the header (e.g. :caption: My caption, :linenos:), finalized individually as single-token paragraphs. Zero or more OPTION paragraphs may appear.

  • CODE: all remaining tokens collected together into a single paragraph. Blank lines between the header or options and the first code line are not included in the paragraph.

HEADER = 'HEADER'

Paragraph type for the .. code-block:: directive line that starts a Sphinx code block.

OPTION = 'OPTION'

Paragraph type for an option line in a Sphinx code block (the lines that specify options for the code block, e.g. :caption: This is a caption).

CODE = 'CODE'

Paragraph type for the lines of code in a Sphinx code block.

collect_and_finalize_header(token: DocstringToken) bool

Collect the header token of the Sphinx code block and finalize it as a paragraph.

collect_and_finalize_options(token: DocstringToken) bool

Collect and finalize each option line of the Sphinx code block.

Option lines appear after the header or other option lines.

collect_code(token: DocstringToken) bool

Collect the code lines of the Sphinx code block.

Blank lines (NL tokens) that appear before the first code line are skipped: they are marked as collected to suppress uncollected-token warnings but are not added to the CODE paragraph. Once the first non-blank token is seen, the state transitions to CODE and all subsequent tokens (including blank lines) are appended to the current paragraph.
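Putting the three collection methods together, the overall split might look like the sketch below. It is an illustration only: tokens are hypothetical (type, text) pairs, and omitting the CODE paragraph when the block has no code lines is an assumption.

```python
def split_code_block(tokens):
    """Split a code block's (type, text) pairs into HEADER, OPTION, and CODE."""
    (_, header), *rest = tokens
    paragraphs = [("HEADER", [header])]
    code = []
    in_code = False
    for tok_type, text in rest:
        if not in_code and tok_type == "SPHINX_OPTION":
            # Each option line becomes its own single-token paragraph.
            paragraphs.append(("OPTION", [text]))
        elif not in_code and tok_type == "NL":
            continue  # blank lines before the first code line are skipped
        else:
            in_code = True
            code.append(text)  # blank lines inside the code are kept
    if code:
        paragraphs.append(("CODE", code))
    return paragraphs


block = split_code_block([
    ("CODE_BLOCK_START", ".. code-block:: python"),
    ("SPHINX_OPTION", "    :linenos:"),
    ("NL", ""),
    ("STRING", "    a = 1"),
    ("NL", ""),
    ("STRING", "    b = 2"),
])
```

The blank line after the option is dropped, but the blank line between the two code lines stays in the CODE paragraph.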