Word to HTML

The OfficeIMO.Word.Html package converts in both directions between Word documents and HTML. It uses AngleSharp for HTML parsing and DOM manipulation, and supports both synchronous and asynchronous workflows.

Installation

dotnet add package OfficeIMO.Word.Html

This package depends on OfficeIMO.Word, AngleSharp, and AngleSharp.Css.

Word to HTML

Convert to HTML String

using OfficeIMO.Word;
using OfficeIMO.Word.Html;

using var document = WordDocument.Load("report.docx");

// Convert to a full HTML document
string html = document.ToHtml();
Console.WriteLine(html);

Async Conversion

string html = await document.ToHtmlAsync();

Save as HTML File

document.SaveAsHtml("report.html");

// Async version
await document.SaveAsHtmlAsync("report.html");

Save to a Stream

using var stream = new MemoryStream();
document.SaveAsHtml(stream);
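
To use the HTML that was written to the stream, the bytes have to be read back out. The following is a minimal sketch rather than a documented pattern: it builds a small document with calls shown elsewhere on this page, and it assumes the output is UTF-8 (or carries a byte-order mark that StreamReader can detect). The file name stream-demo.docx is made up for the example.

```csharp
using System;
using System.IO;
using OfficeIMO.Word;
using OfficeIMO.Word.Html;

// Build a small document so the example does not need an input file.
using var document = WordDocument.Create("stream-demo.docx");
document.AddParagraph("Hello from a stream");

using var stream = new MemoryStream();
document.SaveAsHtml(stream);

// ToArray() returns everything written so far, regardless of the current
// stream position, and still works if the MemoryStream has been closed.
using var copy = new MemoryStream(stream.ToArray());

// Assumption: the HTML is UTF-8; StreamReader defaults to UTF-8 and
// honours a byte-order mark if one is present.
using var reader = new StreamReader(copy);
string html = reader.ReadToEnd();

Console.WriteLine(html.Length);
```

Copying through ToArray() avoids depending on whether SaveAsHtml leaves the stream open or where it leaves the position, neither of which is stated here.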

Conversion Options

Customize the conversion with WordToHtmlOptions:

using OfficeIMO.Word;
using OfficeIMO.Word.Html;

using var document = WordDocument.Load("report.docx");

var options = new WordToHtmlOptions {
    // Include font and list styling in the generated HTML
    IncludeFontStyles = true,
    IncludeListStyles = true,

    // Control image handling
    EmbedImagesAsBase64 = true,

    // CSS options
    IncludeDefaultCss = true
};

string html = document.ToHtml(options);

HTML to Word

Create a Word Document from HTML

using OfficeIMO.Word.Html;

string html = """
<html>
<body>
    <h1>Report Title</h1>
    <p>This is a <strong>bold</strong> and <em>italic</em> paragraph.</p>
    <table>
        <tr><th>Name</th><th>Value</th></tr>
        <tr><td>Alpha</td><td>100</td></tr>
        <tr><td>Beta</td><td>200</td></tr>
    </table>
</body>
</html>
""";

// Parse the HTML string into a new Word document.
// The markup is passed as-is: entity-decoding it first would turn
// escaped text such as &lt; into real tags before parsing.
using var document = html.LoadFromHtml();
document.Save("from-html.docx");

Async HTML to Word

using var document = await html.LoadFromHtmlAsync();
document.Save("from-html.docx");

HTML to Word Options

using OfficeIMO.Word.Html;

var options = new HtmlToWordOptions {
    FontFamily = "Calibri",
    IncludeListStyles = true,
    BasePath = AppContext.BaseDirectory
};

using var document = html.LoadFromHtml(options);
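
Since both directions are available, a quick round trip is a practical way to check how a given piece of markup survives conversion. This is a minimal sketch that uses only calls shown on this page; the input markup is made up, and the output should not be expected to match the input character for character.

```csharp
using System;
using OfficeIMO.Word;
using OfficeIMO.Word.Html;

// Hypothetical input used only for this illustration.
string source = """
<html>
<body>
    <h1>Title</h1>
    <p>Some <strong>bold</strong> text.</p>
</body>
</html>
""";

// HTML -> Word
using var document = source.LoadFromHtml();

// Word -> HTML
string roundTripped = document.ToHtml();

// Inspect the result by eye; the generated markup and styling
// are chosen by the converter and will differ from the input.
Console.WriteLine(roundTripped);
```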

Adding HTML to an Existing Document

You can inject HTML content into the body, headers, or footers of an existing Word document.

Append HTML to Document Body

using var document = WordDocument.Create("mixed.docx");
document.AddParagraph("Native OfficeIMO paragraph");

// Append HTML-sourced content
document.AddHtmlToBody("<p>This came from <strong>HTML</strong>.</p>");
document.Save();

Add HTML to Headers

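// HeaderFooterValues is defined in the Open XML SDK
using DocumentFormat.OpenXml.Wordprocessing;

// Headers and footers must exist before HTML can be added to them
document.AddHeadersAndFooters();
document.AddHtmlToHeader(
    "<p style='text-align: center;'>Company Header</p>",
    HeaderFooterValues.Default
);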

Add HTML to Footers

document.AddHtmlToFooter(
    "<p style='font-size: 8pt; color: gray;'>Confidential</p>",
    HeaderFooterValues.Default
);

Async Versions

All AddHtml* methods have async counterparts:

await document.AddHtmlToBodyAsync("<p>Async HTML content</p>");
await document.AddHtmlToHeaderAsync("<p>Header from HTML</p>");
await document.AddHtmlToFooterAsync("<p>Footer from HTML</p>");

Supported HTML Elements

The converter handles the following HTML elements:

HTML Element                  Word Equivalent
<h1> to <h6>                  Heading1 to Heading6 paragraph styles
<p>                           Normal paragraph
<strong>, <b>                 Bold formatting
<em>, <i>                     Italic formatting
<u>                           Underline formatting
<s>, <del>                    Strikethrough
<table>, <tr>, <td>, <th>     Word tables with cells
<ul>, <ol>, <li>              Bulleted and numbered lists
<img>                         Inline images
<a>                           Hyperlinks
<br>                          Line breaks
<hr>                          Horizontal rules
<code>, <pre>                 Monospace formatting
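
As an illustration of the elements listed above, the sketch below feeds a document that combines several of them through LoadFromHtml. The markup, the example.com link, and the output file name elements.docx are invented for the example; how each element looks in Word depends on the converter.

```csharp
using OfficeIMO.Word;
using OfficeIMO.Word.Html;

// Hypothetical sample markup covering headings, inline formatting,
// lists, links, line breaks, and a horizontal rule.
string html = """
<html>
<body>
    <h2>Release notes</h2>
    <p>Fixed in <code>v1.2</code>: <s>old behaviour</s> <u>new behaviour</u>.</p>
    <ul>
        <li>First item</li>
        <li>Second item with a <a href="https://example.com">link</a></li>
    </ul>
    <ol>
        <li>Step one</li>
        <li>Step two</li>
    </ol>
    <hr>
    <p>Line one<br>Line two</p>
</body>
</html>
""";

using var document = html.LoadFromHtml();
document.Save("elements.docx");
```

Opening elements.docx in Word and comparing it against the table is a simple way to confirm the mapping for the elements you rely on.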

CSS Style Mapping

The converter maps common CSS properties to Word formatting:

  • font-family maps to Word font family
  • font-size maps to Word font size
  • font-weight: bold maps to bold
  • font-style: italic maps to italic
  • text-align maps to paragraph alignment
  • color maps to text color
  • background-color maps to paragraph shading
  • text-decoration maps to underline/strikethrough
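
The properties above can be exercised with inline style attributes. This is a minimal sketch: the markup and the output file name styled.docx are made up, and the specific value formats used here (hex colors, pt font sizes) are assumptions about what the converter accepts rather than documented behavior.

```csharp
using OfficeIMO.Word;
using OfficeIMO.Word.Html;

// Hypothetical markup: each paragraph uses properties from the list above.
string html = """
<html>
<body>
    <p style="font-family: Georgia; font-size: 14pt; text-align: center;">Centered Georgia text</p>
    <p style="color: #c00000; font-weight: bold;">Bold red warning</p>
    <p style="background-color: #ffff00; font-style: italic;">Italic text on a shaded paragraph</p>
    <p style="text-decoration: underline;">Underlined text</p>
</body>
</html>
""";

using var document = html.LoadFromHtml();
document.Save("styled.docx");
```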