Convert Word documents to beautiful Markdown, via the command line or in your browser. An even better version of the original word-to-markdown.
- Paragraphs
- Numbered lists
- Bullet lists
- Nested Lists
- Headings
- Lists
- Tables
- Footnotes and endnotes
- Images
- Bold, italics, underlines, strikethrough, superscript and subscript
- Links
- Line breaks
- Text boxes
- Comments
TL;DR: This project is a complete rewrite, using modern tools and libraries, and is much faster and more reliable. The output should be the same or better. Feedback welcome!
Word to Markdown is designed with privacy as a core principle. The web interface operates entirely client-side:
- Complete client-side processing: When using the web interface, all document conversion happens locally in your browser using JavaScript. Your documents never leave your computer.
- No server uploads: When using the web interface, files are processed entirely on your device. No document content is ever transmitted to any server.
- HTTP API option: The optional HTTP API server (for programmatic access) processes documents temporarily on your chosen server without permanent storage or logging.
- No personal data collection: The application does not collect, store, or transmit any personal information or document contents.
- Privacy-first analytics: The hosted version at word2md.com uses only privacy-centric Cloudflare Analytics for anonymous usage statistics. No Google Analytics or user tracking.
- Self-hosting option: For maximum privacy, you can run the application locally or self-host it without any analytics whatsoever.
Whether you use the command line tool, run it locally in your browser, or use the hosted version, your documents and privacy are protected.
- Clone the repo
- Run `npm install`
- Run `w2m path/to/your/file.docx`

To run the web interface locally:
npm run server:web
You can also run Word to Markdown as an HTTP API server and send it conversion requests from other applications.
npm run server
The server exposes a POST /raw endpoint, which returns the converted Markdown.
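As a rough illustration of calling that endpoint from Node 18+ (which ships global `fetch`, `FormData`, and `Blob`), here is a minimal sketch. The README does not document the request format, so the base URL and the multipart field name `doc` are assumptions; check the server source before relying on them. `convertViaApi` is a hypothetical helper name, not part of the project.

```javascript
// Hypothetical client for the POST /raw endpoint.
// Assumptions (not confirmed by this README): the server address and the
// multipart field name "doc". Adjust both to match the actual server.
async function convertViaApi(docxBytes, options = {}) {
  const {
    baseUrl = 'http://localhost:3000', // assumed address of `npm run server`
    fieldName = 'doc',                 // assumed upload field name
    fetchImpl = fetch,                 // injectable so the sketch is easy to test
  } = options;

  // Wrap the raw bytes as a file upload.
  const form = new FormData();
  form.append(fieldName, new Blob([docxBytes]), 'document.docx');

  const response = await fetchImpl(`${baseUrl}/raw`, {
    method: 'POST',
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP status ${response.status}`);
  }
  // The endpoint returns the converted Markdown as the response body.
  return response.text();
}
```

With the server running, usage would look like `convertViaApi(fs.readFileSync('file.docx')).then(console.log)`.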
To self-host Word to Markdown using Docker Compose:
- Clone the repository
- Run `npm install && npm run build:web`
- Run `docker-compose up -d`
- Access at http://localhost:3000
See the README of the original Word to Markdown for the project's motivation.
The original Word to Markdown is 10 years old. Its conversion process was as follows:
- Use LibreOffice to convert the Word document to HTML.
- Use a bunch of RegEx to clean up the HTML.
- Use Premailer to inline the CSS.
- Use Nokogiri to manipulate the HTML further.
- Use Reverse Markdown to convert the HTML to Markdown.
- Use a bunch of RegEx to clean up the Markdown.
Not only did this process require installing and shelling out to a huge binary (LibreOffice), but it was very fragile, and key projects like Reverse Markdown are no longer maintained. I tried experimenting with Pandoc, but it had many of the same limitations.
The new conversion process is much simpler:
- Use Mammoth.js to convert the Word document to HTML.
- Use Turndown to convert the HTML to Markdown.
- Use Markdownlint to clean up the Markdown.
All three of these projects are actively maintained and heavily used, and they allow us to convert the document faster and entirely in JavaScript. Heck, I think theoretically, this could run in the browser for added privacy.
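The three-step pipeline can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `docxToMarkdown`, `defaultStages`, and `tidyMarkdown` are hypothetical names, and `tidyMarkdown` is only a crude stand-in for the Markdownlint cleanup step. The Mammoth.js and Turndown calls are the libraries' documented entry points, loaded lazily so the stages can be swapped out.

```javascript
// Stand-in for step 3. The real project uses Markdownlint; this placeholder
// only collapses runs of blank lines and normalizes the trailing newline.
function tidyMarkdown(markdown) {
  return markdown.replace(/\n{3,}/g, '\n\n').trim() + '\n';
}

// Default stages, required lazily so the sketch also runs with stand-ins.
function defaultStages() {
  const mammoth = require('mammoth');          // npm install mammoth
  const TurndownService = require('turndown'); // npm install turndown
  const turndown = new TurndownService({ headingStyle: 'atx' });
  return {
    // Step 1: Word document (Node Buffer) -> HTML string.
    toHtml: async (buffer) => (await mammoth.convertToHtml({ buffer })).value,
    // Step 2: HTML string -> Markdown string.
    toMarkdown: (html) => turndown.turndown(html),
    // Step 3: clean up the Markdown (placeholder, see above).
    tidy: tidyMarkdown,
  };
}

// Chain the three stages. Pass custom stages to replace any step.
async function docxToMarkdown(buffer, stages = defaultStages()) {
  const html = await stages.toHtml(buffer);
  const markdown = await stages.toMarkdown(html);
  return stages.tidy(markdown);
}
```

Keeping each stage behind a small function makes it easy to replace one library without touching the others, which is the main advantage over the older shell-out approach.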
It's still in beta, but so far, I've found the output to be better, with much less manual cleanup required. Notice something is off? Please open an issue.
One note: This project does not yet attempt to guess heading levels based on font size. It could, but it's not yet implemented.
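For illustration only, one way such a guess could work is to treat the most common font size as body text and rank the larger sizes as heading levels. Nothing below exists in this project: `guessHeadingLevels` and the `{ text, fontSize }` paragraph shape are hypothetical, and extracting font sizes from the Word document is a separate problem this sketch does not address.

```javascript
// Hypothetical heuristic, not part of this project.
// Each paragraph is assumed to look like { text: string, fontSize: number }.
function guessHeadingLevels(paragraphs) {
  // Count how often each font size occurs.
  const counts = new Map();
  for (const { fontSize } of paragraphs) {
    counts.set(fontSize, (counts.get(fontSize) || 0) + 1);
  }

  // Treat the most frequent size as body text (ties go to the smaller size).
  let bodySize = 0;
  let bodyCount = 0;
  for (const [size, count] of counts) {
    if (count > bodyCount || (count === bodyCount && size < bodySize)) {
      bodySize = size;
      bodyCount = count;
    }
  }

  // Distinct sizes larger than body text, largest first.
  const headingSizes = [...counts.keys()]
    .filter((size) => size > bodySize)
    .sort((a, b) => b - a);

  // Level 0 means "not a heading"; Markdown supports levels 1 through 6.
  return paragraphs.map((paragraph) => {
    const rank = headingSizes.indexOf(paragraph.fontSize);
    const level = rank === -1 ? 0 : Math.min(rank + 1, 6);
    return { ...paragraph, level };
  });
}
```

A real implementation would likely also weigh text length and bold formatting, since short documents may contain more headings than body paragraphs.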