DITA Topic to Keymap Converter

A Python command-line tool that extracts standard references from DITA XML topic files and generates DITA key definition maps.

Overview

This tool parses DITA topic files containing structured lists of standards and documentation references, then automatically generates a DITA keymap file with key definitions that can be reused across your documentation project.

Features

  • Multiple XML Structure Support: Handles various DITA XML structures for standard references
  • Intelligent Parsing: Uses a chain-of-responsibility pattern with specialized handlers for different XML patterns
  • Automatic ID Generation: Creates standardized IDs when not explicitly provided
  • Debugging Tools: Extract and inspect individual elements for troubleshooting
  • Configurable Logging: Multiple verbosity levels for detailed operation insight

Installation

```bash
# clone the repository:
git clone git@github.com:elizaluszczyk/dita-topic-to-keymap.git
cd dita-topic-to-keymap

# install development dependencies:
pip install -e .
pip install -r ./requirements/dev.txt
pre-commit install
```

Usage

Parse DITA Topic File

Convert a DITA topic file to a keymap:

```bash
ditatk parse input.xml
```

With custom output file:

```bash
ditatk parse input.xml -o standards-keymap.xml
```

With verbose logging:

```bash
# INFO level
ditatk parse input.xml -v

# DEBUG level (most detailed)
ditatk parse input.xml -vv
```

Extract Individual Elements

Extract and display a specific list item element (useful for debugging):

```bash
ditatk extract input.xml 5
```

This displays the 5th <li> element from the input file.
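To make "the 5th <li> element" concrete, the lookup can be sketched with the standard library. This is a minimal sketch, not the tool's own code: the function name `nth_list_item`, the 1-based position, the document-order traversal, and the sample XML are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def nth_list_item(xml_text: str, position: int) -> str:
    """Return the serialized <li> at a 1-based position in document order."""
    root = ET.fromstring(xml_text)
    items = list(root.iter("li"))  # every <li>, nested ones included
    if not 1 <= position <= len(items):
        raise IndexError(f"no <li> at position {position}; found {len(items)}")
    item = items[position - 1]
    item.tail = None  # drop whitespace that follows the element in the source
    return ET.tostring(item, encoding="unicode")


# Hypothetical input, invented for this example.
SAMPLE_TOPIC = """<topic id="standards"><body><ul>
  <li>IEEE 802.11</li>
  <li><keyword id="iso-9001">ISO 9001</keyword></li>
</ul></body></topic>"""

print(nth_list_item(SAMPLE_TOPIC, 2))
```

Whether `ditatk extract` counts nested list items or starts from 1 is not stated here, so check its output against a known element before relying on the numbering.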

Supported XML Structures

The tool handles eight different XML patterns commonly found in DITA topics:

1. Keyword with ID

  • Example item: ISO 9001
  • **Handler:** `KeywordWithIdHandler`
  • **Result:** `(iso-9001, "ISO 9001")`

2. Keyword without ID (with list item ID)

  • Example item: ISO 14001
  • **Handler:** `KeywordWithoutIdHandler`
  • **Result:** `(std-iso-14001, "ISO 14001")`

3. Keyword without ID (auto-generated)

  • Example item: ISO 27001
  • **Handler:** `KeywordWithoutIdHandler`
  • **Result:** `(std_iso-27001, "ISO 27001")`, with the ID auto-generated from the keyword text

4. List Item with Text Only (no keyword, no ID)

  • Example item: IEEE 802.11
  • **Handler:** `ListItemWithoutKeywordHandler`
  • **Result:** `(std_ieee-802-11, "IEEE 802.11")`, with the ID auto-generated from the text
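The auto-generated IDs in patterns 3 and 4 suggest a simple slug rule. Below is a minimal sketch that reproduces those two results, assuming a `std_` prefix, lowercasing, and every run of non-alphanumeric characters collapsed into one hyphen. The function name `make_key_id` is hypothetical, and the tool's real rule may differ for inputs not shown here.

```python
import re


def make_key_id(text: str, prefix: str = "std_") -> str:
    """Build a key ID from free text (assumed rule, inferred from the examples)."""
    # Lowercase, then replace each run of characters outside a-z/0-9 with "-".
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{prefix}{slug}"


print(make_key_id("ISO 27001"))    # std_iso-27001
print(make_key_id("IEEE 802.11"))  # std_ieee-802-11
```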

5. List Item with ID and Citation (no keyword)

  • Example item: NIST Special Publication 800-53
  • **Handler:** `ListItemWithoutKeywordHandler`
  • **Result:** `(nist-sp-800-53, "NIST Special Publication 800-53")`

6. List Item with Citation Containing Text

  • Example item: Federal Information Processing Standard 140-2
  • **Handler:** `ListItemWithCiteHandler`
  • **Result:** `(fips-140-2, "Federal Information Processing Standard 140-2")`, using the cite text as the description

7. Citation with Keyword Reference and Tail Text

  • Example item: Hypertext Transfer Protocol Version 2 (HTTP/2)
  • **Handler:** `ListItemWithCiteHandler`
  • **Result:** `(rfc-7540, "Hypertext Transfer Protocol Version 2 (HTTP/2)")`, using only the text that follows the keyword reference

8. Keyword Nested in Citation

  • Example item: General Data Protection Regulation
  • **Handler:** `KeywordNestedInCiteHandler`
  • **Result:** `(gdpr, "General Data Protection Regulation")`

Handling Unparseable Elements

Elements Without Descriptions

The tool may encounter <li> elements that no handler can process. This happens when an element lacks the content needed to generate a keymap entry (e.g., no keyword text, no citation text, or empty content). The tool logs a warning for each such element.

Example scenario:

```bash
$ ditatk parse data/r_standards.xml
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 286
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 287
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 288
```
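Items like these can be located ahead of time with a short script. The sketch below treats an <li> as suspect when its whole subtree contains no non-whitespace text. That criterion, the function name `empty_list_items`, and the 1-based positions are assumptions; the tool's handlers may reject other elements too, and its warning numbers may be counted differently.

```python
import xml.etree.ElementTree as ET


def empty_list_items(xml_text: str) -> list[int]:
    """Return 1-based positions of <li> elements with no text in their subtree."""
    root = ET.fromstring(xml_text)
    positions = []
    for position, item in enumerate(root.iter("li"), start=1):
        # itertext() walks the element and all descendants in document order.
        if not "".join(item.itertext()).strip():
            positions.append(position)
    return positions


# Hypothetical input, invented for this example.
SAMPLE_LIST = (
    "<ul>"
    "<li>ISO 9001</li>"
    "<li/>"
    "<li><cite>  </cite></li>"
    '<li><keyword id="x"/></li>'
    "</ul>"
)

print(empty_list_items(SAMPLE_LIST))  # [2, 3, 4]
```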

Output Format

Generated keymap files follow DITA map standards. The map is titled "Standards and Documentation Key Definitions" and contains one key definition per parsed reference, for example one for ISO 9001.
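One plausible shape for such a keymap can be sketched with the standard library. The layout used here (a `<keydef>` with a `keys` attribute wrapping `<topicmeta>`, `<keywords>`, and `<keyword>`) is the conventional DITA way to define a text key and is an assumption, as is the function name `build_keymap`. The tool's actual output may use different elements or attributes.

```python
import xml.etree.ElementTree as ET


def build_keymap(title: str, entries: list[tuple[str, str]]) -> str:
    """Serialize (key, text) pairs as a DITA map of key definitions."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for key, text in entries:
        keydef = ET.SubElement(root, "keydef", keys=key)
        keywords = ET.SubElement(ET.SubElement(keydef, "topicmeta"), "keywords")
        ET.SubElement(keywords, "keyword").text = text
    ET.indent(root)  # pretty-print in place; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


print(build_keymap(
    "Standards and Documentation Key Definitions",
    [("iso-9001", "ISO 9001")],
))
```

A topic can then pull in the text with a key reference such as `<keyword keyref="iso-9001"/>`, which is what makes the definitions reusable across a project.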

Handler Chain

The tool uses specialized handlers in priority order:

1. KeywordWithIdHandler - Processes <keyword> with explicit id
2. KeywordWithoutIdHandler - Processes <keyword> without id
3. ListItemWithoutKeywordHandler - Processes <li> without <keyword>
4. ListItemWithCiteHandler - Processes <li> with <cite> elements
5. KeywordNestedInCiteHandler - Processes <keyword> nested in <cite>
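The chain-of-responsibility idea behind this ordering can be sketched as follows. This is not the tool's implementation: the classes `Handler`, `KeywordIdSketch`, and `ItemIdTextSketch`, the XML snippets, and the `ieee-802-11` ID are invented for illustration, and only two simplified links are shown. Each link either returns a `(key ID, description)` pair or passes the element to the next link.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

Entry = Tuple[str, str]  # (key ID, description)


class Handler:
    """Base link: try to parse, otherwise pass the element down the chain."""

    def __init__(self, successor: Optional["Handler"] = None) -> None:
        self.successor = successor

    def parse(self, item: ET.Element) -> Optional[Entry]:
        raise NotImplementedError

    def handle(self, item: ET.Element) -> Optional[Entry]:
        result = self.parse(item)
        if result is None and self.successor is not None:
            return self.successor.handle(item)
        return result


class KeywordIdSketch(Handler):
    """Matches <li><keyword id="...">text</keyword></li>."""

    def parse(self, item: ET.Element) -> Optional[Entry]:
        keyword = item.find("keyword")
        if keyword is not None and keyword.get("id") and keyword.text:
            return keyword.get("id"), keyword.text.strip()
        return None


class ItemIdTextSketch(Handler):
    """Matches <li id="...">text</li> with no <keyword> child."""

    def parse(self, item: ET.Element) -> Optional[Entry]:
        text = (item.text or "").strip()
        if item.find("keyword") is None and item.get("id") and text:
            return item.get("id"), text
        return None


# Priority order is the order in which the links are nested.
chain = KeywordIdSketch(ItemIdTextSketch())

for snippet in (
    '<li><keyword id="iso-9001">ISO 9001</keyword></li>',
    '<li id="ieee-802-11">IEEE 802.11</li>',
    "<li/>",  # nothing matches, so the chain yields None
):
    print(chain.handle(ET.fromstring(snippet)))
```

A `None` result from the whole chain corresponds to the "No handler was able to parse element" case: the caller can log a warning and move on to the next item.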

Logging Levels

  • No flag: WARNING (default)
  • -v: INFO
  • -vv: DEBUG
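A repeated flag like this is commonly implemented by counting occurrences. The sketch below shows that approach with `argparse` and the standard `logging` module. Which CLI framework the tool actually uses is not stated here, so the parser, the `verbosity` attribute, and the function name `level_from_verbosity` are assumptions.

```python
import argparse
import logging


def level_from_verbosity(count: int) -> int:
    """Map the number of -v flags to a logging level."""
    if count >= 2:
        return logging.DEBUG
    if count == 1:
        return logging.INFO
    return logging.WARNING


parser = argparse.ArgumentParser(prog="ditatk-sketch")
# action="count" turns -v into 1 and -vv into 2; default=0 covers "no flag".
parser.add_argument("-v", action="count", default=0, dest="verbosity")

args = parser.parse_args(["-vv"])
logging.basicConfig(level=level_from_verbosity(args.verbosity))
print(logging.getLevelName(level_from_verbosity(args.verbosity)))  # DEBUG
```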

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Eliza Łuszczyk
