Word to HTML
Bidirectional conversion between Word documents and HTML using OfficeIMO.Word.Html.
The OfficeIMO.Word.Html package converts Word documents to HTML and HTML back to Word documents. It uses AngleSharp for HTML parsing and DOM manipulation, and it supports both synchronous and asynchronous workflows.
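As a quick orientation, the two directions can be combined into a round trip. The sketch below uses only methods described later on this page (`ToHtml`, `LoadFromHtml`, `Save`) and assumes the package is already installed; the file names `report.docx` and `report-roundtrip.docx` are placeholders, and how closely the round-tripped file matches the original is not something this page specifies.

```csharp
using OfficeIMO.Word;
using OfficeIMO.Word.Html;

// Load an existing Word document (placeholder file name).
using var source = WordDocument.Load("report.docx");

// Word -> HTML
string html = source.ToHtml();

// HTML -> Word, saved under a new (placeholder) name.
using var roundTripped = html.LoadFromHtml();
roundTripped.Save("report-roundtrip.docx");
```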
Installation
dotnet add package OfficeIMO.Word.Html
This package depends on OfficeIMO.Word, AngleSharp, and AngleSharp.Css.
Word to HTML
Convert to HTML String
using OfficeIMO.Word;
using OfficeIMO.Word.Html;
using var document = WordDocument.Load("report.docx");
// Convert to a full HTML document
string html = document.ToHtml();
Console.WriteLine(html);
Async Conversion
string html = await document.ToHtmlAsync();
Save as HTML File
document.SaveAsHtml("report.html");
// Async version
await document.SaveAsHtmlAsync("report.html");
Save to a Stream
using var stream = new MemoryStream();
document.SaveAsHtml(stream);
Conversion Options
Customize the conversion with WordToHtmlOptions:
using OfficeIMO.Word;
using OfficeIMO.Word.Html;
using var document = WordDocument.Load("report.docx");
var options = new WordToHtmlOptions {
    // Include font and list styling in the generated HTML
    IncludeFontStyles = true,
    IncludeListStyles = true,
    // Control image handling
    EmbedImagesAsBase64 = true,
    // CSS options
    IncludeDefaultCss = true
};
string html = document.ToHtml(options);
HTML to Word
Create a Word Document from HTML
using OfficeIMO.Word.Html;
using System.Net;
string html = """
<html>
  <body>
    <h1>Report Title</h1>
    <p>This is a <strong>bold</strong> and <em>italic</em> paragraph.</p>
    <table>
      <tr><th>Name</th><th>Value</th></tr>
      <tr><td>Alpha</td><td>100</td></tr>
      <tr><td>Beta</td><td>200</td></tr>
    </table>
  </body>
</html>
""";
using var document = WebUtility.HtmlDecode(html).LoadFromHtml();
document.Save("from-html.docx");
Async HTML to Word
using var document = await html.LoadFromHtmlAsync();
document.Save("from-html.docx");
HTML to Word Options
using OfficeIMO.Word.Html;
var options = new HtmlToWordOptions {
    FontFamily = "Calibri",
    IncludeListStyles = true,
    BasePath = AppContext.BaseDirectory
};
using var document = html.LoadFromHtml(options);
Adding HTML to an Existing Document
You can inject HTML content into specific parts of an existing Word document:
Append HTML to Document Body
using var document = WordDocument.Create("mixed.docx");
document.AddParagraph("Native OfficeIMO paragraph");
// Append HTML-sourced content
document.AddHtmlToBody("<p>This came from <strong>HTML</strong>.</p>");
document.Save();
Add HTML to Headers
document.AddHeadersAndFooters();
document.AddHtmlToHeader(
    "<p style='text-align: center;'>Company Header</p>",
    HeaderFooterValues.Default
);
Add HTML to Footers
document.AddHtmlToFooter(
    "<p style='font-size: 8pt; color: gray;'>Confidential</p>",
    HeaderFooterValues.Default
);
Async Versions
All AddHtml* methods have async counterparts:
await document.AddHtmlToBodyAsync("<p>Async HTML content</p>");
await document.AddHtmlToHeaderAsync("<p>Header from HTML</p>");
await document.AddHtmlToFooterAsync("<p>Footer from HTML</p>");
Supported HTML Elements
The converter handles the following HTML elements:
| HTML Element | Word Equivalent |
|---|---|
| `<h1>` to `<h6>` | Heading1 to Heading6 paragraph styles |
| `<p>` | Normal paragraph |
| `<strong>`, `<b>` | Bold formatting |
| `<em>`, `<i>` | Italic formatting |
| `<u>` | Underline formatting |
| `<s>`, `<del>` | Strikethrough |
| `<table>`, `<tr>`, `<td>`, `<th>` | Word tables with cells |
| `<ul>`, `<ol>`, `<li>` | Bulleted and numbered lists |
| `<img>` | Inline images |
| `<a>` | Hyperlinks |
| `<br>` | Line breaks |
| `<hr>` | Horizontal rules |
| `<code>`, `<pre>` | Monospace formatting |
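To see several rows of this table in one place, the following sketch feeds a small HTML fragment through `LoadFromHtml`. It is an illustration only: the markup, the `https://example.com/changelog` URL, and the `elements.docx` file name are made up for the example, and the exact Word output for each element depends on the converter.

```csharp
using OfficeIMO.Word.Html;

// Sample markup (hypothetical content) touching headings, links,
// lists, inline formatting, a horizontal rule, and a code block.
string html = """
<h2>Release Notes</h2>
<p>See the <a href="https://example.com/changelog">changelog</a> for details.</p>
<ul>
  <li><strong>Added</strong> asynchronous export</li>
  <li><s>Removed</s> legacy option</li>
</ul>
<hr>
<pre><code>dotnet add package OfficeIMO.Word.Html</code></pre>
""";

// Convert the fragment and save it under a placeholder file name.
using var document = html.LoadFromHtml();
document.Save("elements.docx");
```

Opening the saved file in Word is a simple way to check how each element was mapped.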
CSS Style Mapping
The converter maps common CSS properties to Word formatting:
- `font-family` maps to the Word font family
- `font-size` maps to the Word font size
- `font-weight: bold` maps to bold
- `font-style: italic` maps to italic
- `text-align` maps to paragraph alignment
- `color` maps to text color
- `background-color` maps to paragraph shading
- `text-decoration` maps to underline/strikethrough
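A minimal sketch of these mappings in use, assuming inline `style` attributes are honored as listed above. The sample text, colors, and the `styled.docx` file name are hypothetical, and the precise Word formatting produced for each property is up to the converter.

```csharp
using OfficeIMO.Word.Html;

// Inline styles (hypothetical values) covering each property in the list above.
string html = """
<p style="font-family: Georgia; font-size: 14pt; text-align: center; color: #1F4E79;">
  Quarterly Summary
</p>
<p style="background-color: #FFF2CC;">
  <span style="font-weight: bold;">Bold</span>,
  <span style="font-style: italic;">italic</span>, and
  <span style="text-decoration: underline;">underlined</span> text.
</p>
""";

// Convert the styled fragment and save it under a placeholder file name.
using var document = html.LoadFromHtml();
document.Save("styled.docx");
```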