A Python command-line tool that extracts standard references from DITA XML topic files and generates DITA key definition maps.
This tool parses DITA topic files containing structured lists of standards and documentation references, then automatically generates a DITA keymap file with key definitions that can be reused across your documentation project.
```bash
git clone git@github.com:elizaluszczyk/dita-topic-to-keymap.git
cd dita-topic-to-keymap

pip install -e .
pip install -r ./requirements/dev.txt
pre-commit install
```
Convert a DITA topic file to a keymap:
```bash
ditatk parse input.xml
```

With custom output file:

```bash
ditatk parse input.xml -o standards-keymap.xml
```

With verbose logging:

```bash
ditatk parse input.xml -v
ditatk parse input.xml -vv
```
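Repeated `-v` flags are usually implemented as a counting option. The following is a minimal sketch of that pattern, assuming an `argparse`-based parser; the tool's actual CLI code is not shown here, and `level_for`, `build_parser`, and the `verbosity` attribute are hypothetical names. The mapping follows the levels this README documents: WARNING by default, INFO for `-v`, DEBUG for `-vv`.

```python
import argparse
import logging


def level_for(verbosity: int) -> int:
    """Map a -v count to a logging level (0: WARNING, 1: INFO, 2 or more: DEBUG)."""
    if verbosity >= 2:
        return logging.DEBUG
    if verbosity == 1:
        return logging.INFO
    return logging.WARNING


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the counting -v flag is modeled here.
    parser = argparse.ArgumentParser(prog="ditatk")
    parser.add_argument("-v", action="count", default=0, dest="verbosity")
    return parser


# Example: "-vv" is counted as 2, which selects DEBUG.
args = build_parser().parse_args(["-vv"])
logging.basicConfig(level=level_for(args.verbosity))
```

Using `action="count"` keeps `-v` and `-vv` as one option instead of two separate flags.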
Extract and display a specific list item element (useful for debugging):
```bash
ditatk extract input.xml 5
```
This displays the 5th `<li>` element from the input file.
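To make the indexing concrete, here is a rough sketch of selecting the n-th `<li>` element with the standard library. It assumes 1-based numbering in document order, with nested `<li>` elements counted as well; the tool's own traversal may differ, and `extract_list_item` is a hypothetical helper, not part of the `ditatk` API.

```python
import xml.etree.ElementTree as ET


def extract_list_item(xml_text: str, index: int) -> str:
    """Return the index-th <li> element (1-based, document order) as XML text."""
    root = ET.fromstring(xml_text)
    items = list(root.iter("li"))  # includes nested <li> elements
    if not 1 <= index <= len(items):
        raise IndexError(f"no <li> element number {index}; found {len(items)}")
    return ET.tostring(items[index - 1], encoding="unicode").strip()


sample = (
    "<topic><body><ul>"
    "<li>A</li>"
    "<li><keyword>B</keyword></li>"
    "</ul></body></topic>"
)
print(extract_list_item(sample, 2))  # prints <li><keyword>B</keyword></li>
```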
The tool handles eight different XML patterns commonly found in DITA topics:
1. **Handler:** `KeywordWithIdHandler`; **Result:** `(iso-9001, "ISO 9001")`
2. **Handler:** `KeywordWithoutIdHandler`; **Result:** `(std-iso-14001, "ISO 14001")`
3. **Handler:** `KeywordWithoutIdHandler`; **Result:** `(std_iso-27001, "ISO 27001")` - ID auto-generated from keyword text
4. **Handler:** `ListItemWithoutKeywordHandler`; **Result:** `(std_ieee-802-11, "IEEE 802.11")` - ID auto-generated from text
5. **Handler:** `ListItemWithoutKeywordHandler`; **Result:** `(nist-sp-800-53, "NIST Special Publication 800-53")`
6. **Handler:** `ListItemWithCiteHandler`; **Result:** `(fips-140-2, "Federal Information Processing Standard 140-2")` - Uses cite text as description
7. **Handler:** `ListItemWithCiteHandler`; **Result:** `(rfc-7540, "Hypertext Transfer Protocol Version 2 (HTTP/2)")` - Uses only the text following the keyword reference
8. **Handler:** `KeywordNestedInCiteHandler`; **Result:** `(gdpr, "General Data Protection Regulation")`
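Two of the results above carry auto-generated IDs (`std_iso-27001` from "ISO 27001" and `std_ieee-802-11` from "IEEE 802.11"). One way to produce IDs of that shape is sketched below. This is an assumption inferred from those two examples only, not the tool's confirmed algorithm; `generate_key` and its `prefix` parameter are hypothetical.

```python
import re


def generate_key(text: str, prefix: str = "std_") -> str:
    """Build a key from free text: lowercase, then collapse every run of
    characters other than a-z and 0-9 into a single hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{prefix}{slug}"


print(generate_key("ISO 27001"))    # prints std_iso-27001
print(generate_key("IEEE 802.11"))  # prints std_ieee-802-11
```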
The tool may encounter `<li>` elements that cannot be processed by any handler. This occurs when an element lacks sufficient content to generate a valid keymap entry (e.g., no keyword text, no citation text, or empty content).
Example scenario:
```bash
$ ditatk parse data/r_standards.xml
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 286
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 287
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 288
```
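Warnings like these typically come from a fallback at the end of a handler chain: each handler is tried in turn, and a warning is logged only when none of them returns a result. The sketch below illustrates that control flow with two simplified stand-in handlers. The function names, the XML shapes they accept, and the 1-based element numbering are all assumptions for illustration, not the tool's actual classes.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("handler_chain_sketch")

KeyDef = Tuple[str, str]  # (key, display text)
Handler = Callable[[ET.Element], Optional[KeyDef]]


def keyword_with_id(li: ET.Element) -> Optional[KeyDef]:
    # Stand-in handler: <li><keyword id="...">text</keyword></li>
    kw = li.find("keyword")
    if kw is not None and kw.get("id") and (kw.text or "").strip():
        return kw.get("id"), kw.text.strip()
    return None


def plain_text_item(li: ET.Element) -> Optional[KeyDef]:
    # Stand-in handler: <li id="...">text</li>
    text = (li.text or "").strip()
    if text and li.get("id"):
        return li.get("id"), text
    return None


HANDLERS: List[Handler] = [keyword_with_id, plain_text_item]  # priority order


def parse_items(items: List[ET.Element]) -> List[KeyDef]:
    results = []
    for number, li in enumerate(items, start=1):
        for handler in HANDLERS:
            parsed = handler(li)
            if parsed is not None:
                results.append(parsed)
                break  # first matching handler wins
        else:
            # Reached only when no handler returned a result.
            logger.warning("No handler was able to parse element %d", number)
    return results
```

An element that triggers the warning is skipped rather than aborting the run, so one malformed `<li>` does not prevent the remaining entries from being collected.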
Generated keymap files follow DITA map standards.
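As a rough illustration of what a keymap built from `(key, text)` pairs can look like, the sketch below assembles a `<map>` of `<keydef>` elements, storing the display text under `<topicmeta>/<keywords>/<keyword>`, which is a common DITA pattern for text-only keys. The exact structure that `ditatk` emits is not shown here and may differ; `build_keymap` is a hypothetical helper, and the DOCTYPE declaration a real map file needs is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_keymap(entries: Iterable[Tuple[str, str]]) -> str:
    """Serialize (key, text) pairs as a DITA map of <keydef> elements."""
    root = ET.Element("map")
    for key, text in entries:
        keydef = ET.SubElement(root, "keydef", keys=key)
        topicmeta = ET.SubElement(keydef, "topicmeta")
        keywords = ET.SubElement(topicmeta, "keywords")
        ET.SubElement(keywords, "keyword").text = text
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(build_keymap([("iso-9001", "ISO 9001"), ("gdpr", "General Data Protection Regulation")]))
```

A topic can then reference an entry with `<keyword keyref="iso-9001"/>`, so the display text is maintained in one place.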
The tool uses specialized handlers in priority order:
1. `<keyword>` with explicit `id`
2. `<keyword>` without `id`
3. `<li>` without `<keyword>`
4. `<li>` with `<cite>` elements
5. `<keyword>` nested in `<cite>`

Logging levels:

- WARNING (default)
- `-v`: INFO
- `-vv`: DEBUG

This project is licensed under the MIT License. See the LICENSE file for details.

Eliza Łuszczyk