markitdown

v1.0.0

Convert documents (PDF, Word, PowerPoint, Excel, and more) to Markdown for use as spec reference material in Spec Kit workflows.

Community extension: independently maintained. Use at your own discretion.

spec-kit-markitdown

Document to Markdown Conversion for Spec Kit — Convert documents (PDF, Word, PowerPoint, Excel, and more) to LLM-friendly Markdown for use as reference material in Spec Kit workflows.

What It Does

This extension provides a command that converts documents into Markdown using Microsoft's markitdown CLI, placing the output directly into your spec directory as reference material.

| Command | Description |
| --- | --- |
| /speckit.markitdown.convert | Convert a document (PDF, Word, PowerPoint, Excel, etc.) to Markdown and place it in the spec directory as reference material |

Key Features

  • Broad format support — PDF, Word, PowerPoint, Excel, HTML, CSV, JSON, XML, images, audio, EPub, and ZIP
  • Spec directory integration — Output defaults to .specify/specs/ (when the project has a .specify/ directory), so converted files are immediately available in your spec workflow
  • Source metadata headers — Every converted file includes YAML frontmatter linking back to the original document
  • Output quality checks — Flags empty output, garbled text, or missing content with actionable suggestions
  • Azure Document Intelligence support — For higher-quality PDF extraction, markitdown can use Azure Document Intelligence instead of local extraction (markitdown <file> -d -e "<endpoint>")

Prerequisites

1. markitdown CLI

Install Microsoft's markitdown CLI:

pip install 'markitdown[all]'

Or install only the formats you need:

pip install 'markitdown[pdf,docx,pptx,xlsx]'

Verify the installation:

markitdown --version

2. Python 3.10+

markitdown requires Python 3.10 or later:

python --version

3. Spec Kit

This extension requires Spec Kit v0.1.0 or later.
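The first two prerequisite checks can also be scripted. The following is a minimal sketch, assuming a Python helper is convenient for your setup; the name check_prerequisites is hypothetical and not part of the extension. It only checks the running interpreter and whether a markitdown executable is on PATH; it does not check the Spec Kit version.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper: mirrors the manual checks above (Python 3.10+,
    markitdown on PATH). It does not verify the Spec Kit version.
    """
    problems = []
    # Compare (major, minor) of the interpreter running this script.
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which searches this process's PATH, the same lookup a shell does.
    if shutil.which("markitdown") is None:
        problems.append("markitdown: command not found (pip install 'markitdown[all]')")
    return problems


if __name__ == "__main__":
    for issue in check_prerequisites():
        print(f"- {issue}")
```

If pip installed markitdown into a directory that is not on PATH (for example a user-level scripts directory), the PATH check fails even though the package is installed, which matches what the shell reports as "command not found".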


Installation

From the community catalog

specify extension add markitdown

From a local path (for development/testing)

specify extension add ./path/to/spec-kit-markitdown

Usage

Basic usage

/speckit.markitdown.convert file="requirements.pdf"

The command will:

  1. Verify markitdown is installed
  2. Validate the file exists and detect its format
  3. Convert the document to Markdown
  4. Add source metadata headers
  5. Place the output in .specify/specs/ (or next to the input file if the project has no .specify/ directory)

With custom output path

/speckit.markitdown.convert file="design.pptx" output="docs/design-notes.md"

Interactive mode

/speckit.markitdown.convert

If no file is provided, the command will prompt you to specify the document path.

Supported Formats

| Tier | Formats | Extensions |
| --- | --- | --- |
| Core | PDF, Word, PowerPoint, Excel | .pdf, .docx, .pptx, .xlsx |
| Extended | HTML, CSV, JSON, XML | .html, .csv, .json, .xml |
| Extended | Images (EXIF/OCR) | .jpg, .png |
| Extended | Audio (EXIF/transcription) | .mp3, .wav |
| Extended | Other (EPub, ZIP) | .epub, .zip |
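To make the extension-to-tier mapping concrete, here is a sketch that transcribes the table above into a lookup, in the spirit of step 2 of the command ("detect its format"). SUPPORTED_FORMATS and detect_format are hypothetical names; this is not the extension's or markitdown's actual detection code.

```python
from pathlib import Path

# Extension -> (tier, format label), transcribed from the Supported Formats table.
SUPPORTED_FORMATS = {
    ".pdf": ("Core", "PDF"),
    ".docx": ("Core", "Word"),
    ".pptx": ("Core", "PowerPoint"),
    ".xlsx": ("Core", "Excel"),
    ".html": ("Extended", "HTML"),
    ".csv": ("Extended", "CSV"),
    ".json": ("Extended", "JSON"),
    ".xml": ("Extended", "XML"),
    ".jpg": ("Extended", "Image"),
    ".png": ("Extended", "Image"),
    ".mp3": ("Extended", "Audio"),
    ".wav": ("Extended", "Audio"),
    ".epub": ("Extended", "EPub"),
    ".zip": ("Extended", "ZIP"),
}


def detect_format(path):
    """Return (tier, label, extension without the dot) or raise ValueError.

    Only the file name is inspected, so a misnamed file is classified by its
    extension rather than its content.
    """
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported extension {ext!r} for {path}")
    tier, label = SUPPORTED_FORMATS[ext]
    return tier, label, ext.lstrip(".")
```

Lower-casing the suffix means design.PPTX and design.pptx are treated the same. The returned extension is the kind of value that could fill originalFormat in the metadata header.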

Output

File location

By default, converted files are placed at:

  • .specify/specs/<filename>.md — if a .specify/ directory exists
  • Same directory as the input file — otherwise
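The default-location rule above, plus the output="..." override from Usage, fits in a few lines. This is a minimal sketch, assuming <filename> means the input's name without its extension (requirements.pdf becomes requirements.md); resolve_output_path is a hypothetical name, not the extension's code.

```python
from pathlib import Path


def resolve_output_path(input_file, project_root=".", output=None):
    """Decide where the converted Markdown goes (hypothetical helper).

    Assumes <filename> is the input's stem, e.g. requirements.pdf -> requirements.md.
    """
    if output is not None:
        # An explicit output="..." argument always wins.
        return Path(output)
    source = Path(input_file)
    specify_dir = Path(project_root) / ".specify"
    if specify_dir.is_dir():
        # Spec Kit project: collect converted files under .specify/specs/.
        return specify_dir / "specs" / f"{source.stem}.md"
    # No .specify/ directory: write next to the input file.
    return source.with_suffix(".md")
```

The function only computes a path; a caller would still need to create .specify/specs/ (for example with mkdir(parents=True, exist_ok=True)) before writing.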

Metadata format

Each converted file includes a YAML frontmatter header:

---
Source:
  type: document-conversion
  originalFile: "requirements.pdf"
  originalFormat: "pdf"
  convertedAt: "2026-04-23T10:30:00Z"
  converter: "markitdown"
  converterVersion: "0.1.0"
---
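A header in the shape shown above can be generated with the standard library. This is a sketch, not the extension's implementation; build_frontmatter and prepend_frontmatter are hypothetical names. Values are written as JSON string literals, which are also valid YAML double-quoted scalars, so a quote in a file name cannot break the header.

```python
import json
from datetime import datetime, timezone


def build_frontmatter(original_file, original_format, converter_version, converted_at=None):
    """Render the Source frontmatter block as a string (hypothetical helper)."""
    if converted_at is None:
        converted_at = datetime.now(timezone.utc)
    # ISO 8601 in UTC with a trailing Z, matching the sample (2026-04-23T10:30:00Z).
    stamp = converted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    def q(value):
        # JSON string literal == safe YAML double-quoted scalar.
        return json.dumps(str(value))

    lines = [
        "---",
        "Source:",
        "  type: document-conversion",
        f"  originalFile: {q(original_file)}",
        f"  originalFormat: {q(original_format)}",
        f"  convertedAt: {q(stamp)}",
        f"  converter: {q('markitdown')}",
        f"  converterVersion: {q(converter_version)}",
        "---",
        "",
    ]
    return "\n".join(lines)


def prepend_frontmatter(markdown, **header_fields):
    """Return the converted Markdown with the frontmatter block in front of it."""
    return build_frontmatter(**header_fields) + "\n" + markdown
```

When markitdown is installed, converter_version could be read with importlib.metadata.version("markitdown") rather than hard-coded.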

Next steps after conversion

The converted Markdown is a reference artifact. Use it in your Spec Kit workflow:

  1. Review the converted content for accuracy
  2. /speckit.specify — Create a specification using the converted document as input
  3. /speckit.plan — Create a technical implementation plan
  4. /speckit.tasks — Generate actionable task breakdown
  5. /speckit.implement — Execute the implementation

Examples

See the docs/examples/ directory for sample outputs, such as docs/examples/convert-example.md.


Known Limitations

  • PDF quality varies — Scanned or image-heavy PDFs may produce poor text output. For better results, use Azure Document Intelligence (markitdown <file> -d -e "<endpoint>")
  • Image-heavy documents — Embedded images are not converted to text by default. Install the markitdown-ocr plugin for OCR support
  • Large files may be slow — Very large documents (100+ pages) may take significant time to process
  • Format detection — markitdown relies on file extensions for format detection. Misnamed files may not convert correctly
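When scripting conversions, the Azure Document Intelligence fallback from the first limitation is easiest to keep correct as an argument list. This is a minimal sketch; build_markitdown_command is a hypothetical name. It uses the -d and -e flags quoted in this README, plus -o, which markitdown's own README documents for writing the result to a file.

```python
def build_markitdown_command(input_file, output_file=None, docintel_endpoint=None):
    """Assemble a markitdown CLI invocation as an argv list (hypothetical helper)."""
    cmd = ["markitdown", str(input_file)]
    if output_file is not None:
        # -o writes the Markdown to a file instead of stdout.
        cmd += ["-o", str(output_file)]
    if docintel_endpoint is not None:
        # -d switches to Azure Document Intelligence; -e supplies its endpoint.
        cmd += ["-d", "-e", docintel_endpoint]
    return cmd
```

Passing the list to subprocess.run(cmd, check=True) avoids shell-quoting problems with paths that contain spaces, and check=True raises CalledProcessError if markitdown exits with an error.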

Troubleshooting

| Problem | Solution |
| --- | --- |
| markitdown: command not found | Install with pip install 'markitdown[all]' |
| Python version too old | Upgrade to Python 3.10+ |
| Empty output file | For PDFs, try Azure Document Intelligence: markitdown <file> -d -e "<endpoint>" |
| Missing dependency for format | Install the specific extra, e.g. pip install 'markitdown[pdf]' |
| Permission denied | Check read permissions on the source document |
| Garbled text output | The file may have encoding issues or be a scanned image; try the markitdown-ocr plugin |
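The empty-output and garbled-text symptoms in this table are the kind of thing the extension's output quality checks flag. The sketch below shows one way such checks could work; check_output_quality is a hypothetical name, and both the heuristics and the thresholds are illustrative assumptions rather than the extension's actual rules. Run it on the converted text before the metadata header is added.

```python
def check_output_quality(markdown, min_chars=200, max_garbled_ratio=0.01):
    """Return warnings about converted Markdown; an empty list means nothing was flagged.

    Heuristics and thresholds are illustrative assumptions.
    """
    text = markdown.strip()
    if not text:
        return ["Empty output: for PDFs, try Azure Document Intelligence (-d -e)"]

    warnings = []
    # U+FFFD replacement characters and stray control characters usually mean
    # the text was decoded with the wrong encoding or extraction went wrong.
    suspicious = sum(
        1 for ch in text if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\n\r\t")
    )
    if suspicious / len(text) > max_garbled_ratio:
        warnings.append("Garbled text: check the source encoding, or try OCR if the file is a scanned image")
    # Very little text from a whole document often means content was skipped,
    # e.g. scanned pages or images with no text layer.
    if len(text) < min_chars:
        warnings.append("Possibly missing content: only a little text was extracted")
    return warnings
```

A fixed character threshold is crude; scaling min_chars to the source's page count would be a natural refinement.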

Development

Project structure

spec-kit-markitdown/
├── extension.yml           # Extension manifest
├── commands/
│   └── convert.md          # Convert command (AI agent instructions)
├── docs/examples/
│   └── convert-example.md  # Example conversion output
├── README.md
├── CHANGELOG.md
└── LICENSE

Testing locally

  1. Clone this repository
  2. Install in a test project: specify extension add ./path/to/spec-kit-markitdown
  3. Ensure markitdown is installed: pip install 'markitdown[all]'
  4. Run /speckit.markitdown.convert against a test document

License

MIT

Install

Using the Specify CLI

specify extension add markitdown --from https://github.com/BenBtg/spec-kit-markitdown/archive/refs/tags/v1.0.0.zip
