This *.docx file was generated by OpenXmlUtility.GenerateOfficeDocument(package), which implies that it has no local styles or effects. It depends entirely on the styles and settings of the Word application that opens it.
Eric White’s original HtmlConverter (used in Open XML Power Tools for PowerShell) has been enhanced and is included in the WordWalkingStick utility (a VSTO add-in). This utility converts a Word 2010 document to HTML, adding support for the following:
Bold |
The word here should be bold. |
Italic |
The word here should be italic. |
Underline |
The word here should be underlined. |
Strikethrough |
This entire sentence should be marked strikethrough. |
Subscript/Superscript |
This should be the 1st superscript. And the variable xi should have i as a subscript. |
Small caps |
The next word, Microsoft, should be in small caps. |
Combinations |
The word here should be bold and italic. The word here should be underlined and bold. The last word in this sentence should have “everything”: here. |
Block Text |
This Paragraph Style translates into the <blockquote> element.
To support the |
HTML Cite |
This Character Style is translated into the <cite> element. The book One is about the second Arabic numeral. The |
HTML Code |
This Character Style is translated into the <code> element. |
HTML Definition |
This Character Style is translated into the <dfn> element. The word one represents the famous numeral of unity. |
HTML Preformatted |
This Paragraph Style is translated into the <pre> element. We can indent words with spaces — one two three
|
HTML Sample |
This Character Style is translated into the <samp> element. When counting, we can use words like: one, two or three. |
HTML Typewriter |
The assumption here is that this Word style maps to the <tt> element. |
HTML Variable |
This Character Style is translated into the <var> element.
It follows that the We have |
List Bullet |
This style is translated into an unordered list (<ul>).
|
List Paragraph |
This style is translated into an ordered list (<ol>).
|
Quote |
This Character Style is translated into the <q> element. When counting, he said the words, “One, two, three…” |
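The style-to-element translations listed above can be pictured as a simple lookup. The following Python sketch is illustrative only and is not the WordWalkingStick implementation: the table `STYLE_TO_ELEMENT` and the function `wrap_styled_text` are hypothetical names, and the element names are assumptions inferred from the Word style names. The two list styles are left out because list items must be grouped under a shared parent element rather than wrapped one at a time.

```python
from html import escape

# Hypothetical lookup table: Word style name -> assumed HTML element name.
STYLE_TO_ELEMENT = {
    "Block Text": "blockquote",
    "HTML Cite": "cite",
    "HTML Code": "code",
    "HTML Definition": "dfn",
    "HTML Preformatted": "pre",
    "HTML Sample": "samp",
    "HTML Typewriter": "tt",
    "HTML Variable": "var",
    "Quote": "q",
}


def wrap_styled_text(style_name, text):
    """Wrap escaped text in the element assumed for the given Word style.

    Text in an unknown style is escaped and returned without a wrapper.
    """
    safe_text = escape(text)
    element = STYLE_TO_ELEMENT.get(style_name)
    if element is None:
        return safe_text
    return f"<{element}>{safe_text}</{element}>"


if __name__ == "__main__":
    print(wrap_styled_text("HTML Code", "a < b"))
    print(wrap_styled_text("Normal", "plain text"))
```

A dictionary keeps the mapping declarative, so supporting another style would mean adding one entry rather than another branch.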
In Microsoft Word, a Character Style like “HTML Code” will clash with the “Hyperlink” Character Style when text marked as HTML Code has a hyperlink assigned to it. The resultant Open XML Word Processing Markup Language might look like this:
<w:hyperlink>
  <w:r w:rsidRPr="002B526C">
    <w:rPr>
      <w:rStyle w:val="Hyperlink" />
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:cs="Consolas" />
      <w:noProof />
    </w:rPr>
    <w:t>samp</w:t>
  </w:r>
</w:hyperlink>
References to the font Consolas are the only clues that this Hyperlink style was once the HTML Code style.
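The font clue described here can be checked mechanically. The following Python sketch is a minimal illustration, not the WordWalkingStick implementation: the function names are hypothetical, the choice of Consolas as the code font is taken from the fragment above, and the sample adds the `xmlns:w` namespace declaration that the fragment omits so that it parses on its own.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# The fragment from the text, with a namespace declaration added (assumption)
# so that it is a self-contained, parseable document.
SAMPLE = f"""<w:hyperlink xmlns:w="{W_NS}">
  <w:r w:rsidRPr="002B526C">
    <w:rPr>
      <w:rStyle w:val="Hyperlink" />
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:cs="Consolas" />
      <w:noProof />
    </w:rPr>
    <w:t>samp</w:t>
  </w:r>
</w:hyperlink>"""


def looks_like_code_hyperlink(run, code_font="Consolas"):
    """Return True when a run styled as Hyperlink still carries the code font."""
    run_props = run.find(W + "rPr")
    if run_props is None:
        return False
    style = run_props.find(W + "rStyle")
    fonts = run_props.find(W + "rFonts")
    if style is None or fonts is None:
        return False
    return (
        style.get(W + "val") == "Hyperlink"
        and fonts.get(W + "ascii") == code_font
    )


def code_hyperlink_texts(xml_text):
    """Collect the text of every run that matches the heuristic above."""
    root = ET.fromstring(xml_text)
    texts = []
    for run in root.iter(W + "r"):
        if looks_like_code_hyperlink(run):
            texts.append("".join(t.text or "" for t in run.iter(W + "t")))
    return texts


if __name__ == "__main__":
    print(code_hyperlink_texts(SAMPLE))
```

This is only a heuristic: a hyperlink whose font was set to Consolas for some other reason would be matched as well.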
Line Break (<br />) |
This sentence should break here |
No-break hyphen. |
This sentence contains a word in italics, censor-ious, that should not break between the r and the i. |
No-break space. |
|
Modern word processing file formats need a standard way to store metadata. The research of Peter Sefton, namely his work, “Embedding metadata and other semantics in word processing documents,” details this issue. After litigation in 2010, Office Word 2010 has only one (legal) way to store metadata: Content Controls. Moreover, Office Word 2010 effectively stands alone in providing both metadata entry and a robust API for access and manipulation.
The WordWalkingStick utility supports “micro-formats” that transform Content Controls into HTML. The following table summarizes them: