xml2docx

Important

FOR NOW, THIS IS A DRAFT OF THE DOCUMENTATION. IT REQUIRES IMPROVEMENTS, INCLUDING STYLE, GRAMMAR, AND SPELLING FIXES. SOME THINGS MAY BE MISSING.

FEEDBACK IS WELCOME. SEE DOCUMENTATION ISSUES ON GITHUB.

xml2docx

xml2docx is a command line tool and a web tool that creates a Word document (with the .docx extension) from a simple XML file.

You can start with the tutorial, which guides you through all the basic features of this tool.

There are two versions of xml2docx: the command line tool and the web tool.

Tutorial

Empty document

Let's start with an empty document. The input is an XML file with a <document> element at the top level, so put the following in a file and name it hello.xml:

<?xml version="1.0" encoding="UTF-8"?>
<document>

</document>

If you are using the web interface, just put the code in the hello.xml file and click the Download document button. Open the downloaded file to see the result.

If you are using the command line, generate the hello.docx output with the following command:

xml2docx hello.xml

Paragraph

Any text in the document must be placed inside paragraphs (<p> tag).

<?xml version="1.0" encoding="UTF-8"?>
<document>
    <p>
        Hello World!
    </p>
</document>

You can control the style of a paragraph using its attributes. Refer to the <p> tag documentation for the full list of attributes. For example, the align attribute controls the paragraph alignment.

<p align="center">
    Hello World!
</p>

You can break a line without starting a new paragraph using the <br/> tag.

<p>
    Hello<br/>World!
</p>

Paragraph attributes preservation

A useful feature of paragraphs is the preserve attribute. Setting it to true reuses the same attributes in the following paragraphs until a paragraph with at least one attribute of its own is reached. For example:

<p align="center">
    Centered.
</p>
<p>
    Back to left-alignment.
</p>
<p align="center" preserve="y">
    Centered again, but this time we use "preserve" attribute.
</p>
<p>
    This is also centered since attributes are copied from
    the previous paragraph.
</p>
<p>
    And centered again.
</p>
<p background="yellow">
    Not centered any more because this paragraph has some attributes.
</p>

Tabulators

Use tabulators to align text within paragraphs in a table-like structure. Each paragraph defines its own set of tabulator stops using the tabs attribute.

<h1>Table of contents</h1>
<p tabs="1cm right, 1.3cm, 15cm dot right" preserve="y">
    <tab/>1.<tab/>Table of contents<tab/>1
</p>
<p>
    <tab/>2.<tab/>Introduction<tab/>2
</p>
<p>
    <tab/>3.<tab/>Content<tab/>4
</p>

Each tabulator stop contains a distance (required), an alignment (optional, left by default) and a leader sign (optional, none by default).

You can add tabulators to text using:

  • an actual tabulator character,
  • the &#9; character reference,
  • the <tab/> element.
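
As a sketch, two of the ways listed above can be mixed in one paragraph. The tab positions and the text are only examples:

<p tabs="3cm, 8cm">
    Name:&#9;John Smith<tab/>Plumber
</p>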

Sections

A section is a set of pages with a specific size, style, headers, etc. You can start a section by putting a <section> element inside <document>. If you add any content before the first <section>, a new default section will be created.

We will put our document on landscape A5 paper with 1 cm margins:

<?xml version="1.0" encoding="UTF-8"?>
<document>
    <section
        margin="1cm"
        width="148mm" height="210mm"
        orientation="landscape"
    />
    <p>Hello World!</p>
</document>

Margins

A margin attribute defines four margins separated by spaces or commas. The order is "top", "right", "bottom", "left" (similar to CSS). In other words, the order is clockwise starting from 12 o'clock. If you specify fewer values, they are interpreted as follows:

  • one value: "all four margins",
  • two values: "top and bottom", "right and left",
  • three values: "top", "right and left", "bottom".

For example, if you want 2 cm margins on top and bottom, and 1 cm margins on left and right:

<section margin="2cm 1cm"/>

The same rules apply to margins and borders in other tags.
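
As an illustration of these rules (the values are arbitrary), the first section below uses three values, giving a 2 cm top margin, 1 cm right and left margins and a 3 cm bottom margin. The second one sets all four margins separately, clockwise from the top:

<section margin="2cm 1cm 3cm"/>

<section margin="1cm, 2cm, 3cm, 4cm"/>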

Headers and footers

The header and footer are added to the <document> element using the <header> and <footer> elements. They are assigned to the most recent section.

<?xml version="1.0" encoding="UTF-8"?>
<document>
    <header>
        <p>Page <page-number/> of <total-pages/></p>
    </header>
    <p>Hello World!</p>
    <p page-break="y">Next page</p>
</document>

The <page-number/> and <total-pages/> elements insert fields showing the current page number and the total number of pages, respectively. They can be used only in a header or footer.

If you want to have a different header or footer on the first page, you have two options.

  1. Create a separate section for the first page.

    <section/>
    <title>First page</title>
    <section/>
    <footer>
        <p>Page <page-number/> of <total-pages/></p>
    </footer>
    <p>Next page in a new section.</p>
    
  2. Use the page="first" attribute.

    <footer page="first"></footer>
    <footer>
        <p>Page <page-number/> of <total-pages/></p>
    </footer>
    <title>First page</title>
    <p page-break="y">Next page in the same section.</p>
    

Text formatting

You can change text format with the <font> tag. It has numerous attributes controlling the format.

<p>
    <font face="Arial" color="green">Hello</font>
    <font face="Times New Roman" bold="y">World</font>!
</p>

Commonly used styles that have boolean (true/false) values are available as shorthand, e.g. <b>, <i>, <u>.

<p>
    Hello <b>World</b>!
</p>
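
Several <font> attributes can be combined on one element. The sketch below assumes the color, bold, italics and size attributes described in the <font> reference; the values are only examples:

<p>
    <font color="#c00" bold="y" italics="y" size="14pt">Warning:</font>
    read this <u>first</u>.
</p>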

Tables

Tables must be put on the same level as paragraphs. They cannot be inside paragraphs.

Table structure is similar to HTML, but limited to <table>, <tr> (table row) and <td> (table cell) tags.

<?xml version="1.0" encoding="UTF-8"?>
<document>
    <table>
        <tr><td>Name:</td><td>John Smith</td></tr>
        <tr><td>Job:</td><td>Plumber</td></tr>
    </table>
</document>

This creates an "auto-fit" table, which means that the column sizes are determined automatically based on the content. You can provide the column-widths attribute to get a "fixed" table.

<table column-widths="3cm 4cm">
    <tr><td>Name:</td><td>John Smith</td></tr>
    <tr><td>Job:</td><td>Plumber</td></tr>
</table>

Alternatively, you can set the width of the entire table and let the columns adjust accordingly. The width can also be given as a percentage.

<table width="100%">
    <tr><td>Name:</td><td>John Smith</td></tr>
    <tr><td>Job:</td><td>Plumber</td></tr>
</table>

Images

Images can be added to the <p> element using the <img> tag. You have to provide at least src, width and height attributes.

<p>
    <img src="globe.png" width="1cm" height="1cm"/> - this is globe icon.
</p>

By default, an image is placed inside the text like any other character. You can create "floating" images with the horizontal and vertical attributes, which control horizontal and vertical positioning respectively.

The first value inside the horizontal and vertical attributes is the anchor, which tells which object the position is calculated from. The second value is an offset or an alignment. The offset is the distance of the image from the anchor. The alignment tells how the image should be aligned relative to the anchor.

For example, the following image will be centered horizontally on the page and placed 1 cm above the beginning of the current paragraph.

<p>
    <img src="globe.png"
        width="10cm" height="10cm"
        horizontal="page center" vertical="paragraph -1cm"
    />
    This is a text in the same paragraph.
</p>
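
Both attributes can also take an alignment instead of an offset. The following sketch aligns the image to the top right of the margin area and lets the text flow on its left side. It assumes the anchor, alignment and wrap values listed in the <img> reference; the sizes are only examples:

<p>
    <img src="globe.png"
        width="3cm" height="3cm"
        horizontal="margin right" vertical="margin top"
        wrap="left square"
    />
    This text wraps around the image.
</p>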

Command line arguments

USAGE:

xml2docx [options] <input.xml> [output.docx]

Options:

  • <input.xml>

    Input XML file.

  • [output.docx]

    Output document. By default it is <input> with .docx extension.

  • -d <data.json5>

    Interpret the input file as a template and use the <data.json5> file for template input data.

    See Templates for details.

Caution

ACTIVATING THIS OPTION WILL PERMIT THE EXECUTION OF ARBITRARY CODE FROM THE <input.xml> FILE WITHOUT LIMITATIONS. USE ONLY XML FILES FROM A TRUSTED SOURCE.

  • --debug

    Dump intermediate files alongside the output after each step of processing and show more verbose output in case of errors. This option is mainly useful when debugging the template or the tool.

  • --help

    Command line help.

  • --license

    Show license information.

  • --sources

    Dump source files to a _src directory (for debug only).

Attribute values

Some XML attributes take values of common types, which are described below.

Boolean value

The table below shows all possible strings representing true or false value (case insensitive).

True value    False value
true          false
t             f
yes           no
y             n
1             0
on            off

Example:

<font bold="y">Bold text</font>
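
Because the strings are case insensitive, the following spellings should all be accepted. This is a sketch based on the table above; the first two lines enable bold text and the last one disables it:

<font bold="TRUE">Bold text</font>
<font bold="On">Bold text</font>
<font bold="off">Regular text</font>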

Universal measure

Universal measures are represented as a number followed by a unit name. All allowed units are summarized below.

Unit    Length in millimeters    Length in inches
mm      1                        5/127 ≈ 0.03937
cm      10                       50/127 ≈ 0.3937
in      127/5 = 25.4             1
pt      127/360 ≈ 0.3528         1/72 ≈ 0.01389
pi      127/30 ≈ 4.233           1/6 ≈ 0.1667
pc      127/30 ≈ 4.233           1/6 ≈ 0.1667
px      127/480 ≈ 0.2646         1/96 ≈ 0.01042

Example:

<img width="10cm" height="13cm" src="cat.jpeg"/>
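
Different units can be used side by side. According to the table above, 1 pt is 1/72 in, so 72 pt equals 1 in and the following image should be a square one inch on each side (the file name is only an example):

<img width="1in" height="72pt" src="cat.jpeg"/>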

Positive universal measure

A positive universal measure is similar to a universal measure, except that negative numbers are not allowed.

Color

Color is represented as:

  • HTML 6 digit hex number preceded by # character,
  • HTML 3 digit hex number preceded by # character,
  • CSS color name. The names are listed below.

aliceblue
antiquewhite
aqua
aquamarine
azure
beige
bisque
black
blanchedalmond
blue
blueviolet
brown
burlywood
cadetblue
chartreuse
chocolate
coral
cornflowerblue
cornsilk
crimson
cyan
darkblue
darkcyan
darkgoldenrod
darkgray
darkgreen
darkgrey
darkkhaki
darkmagenta
darkolivegreen
darkorange
darkorchid
darkred
darksalmon
darkseagreen
darkslateblue
darkslategray
darkslategrey
darkturquoise
darkviolet
deeppink
deepskyblue
dimgray
dimgrey
dodgerblue
firebrick
floralwhite
forestgreen
fuchsia
gainsboro
ghostwhite
gold
goldenrod
gray
green
greenyellow
grey
honeydew
hotpink
indianred
indigo
ivory
khaki
lavender
lavenderblush
lawngreen
lemonchiffon
lightblue
lightcoral
lightcyan
lightgoldenrodyellow
lightgray
lightgreen
lightgrey
lightpink
lightsalmon
lightseagreen
lightskyblue
lightslategray
lightslategrey
lightsteelblue
lightyellow
lime
limegreen
linen
magenta
maroon
mediumaquamarine
mediumblue
mediumorchid
mediumpurple
mediumseagreen
mediumslateblue
mediumspringgreen
mediumturquoise
mediumvioletred
midnightblue
mintcream
mistyrose
moccasin
navajowhite
navy
oldlace
olive
olivedrab
orange
orangered
orchid
palegoldenrod
palegreen
paleturquoise
palevioletred
papayawhip
peachpuff
peru
pink
plum
powderblue
purple
rebeccapurple
red
rosybrown
royalblue
saddlebrown
salmon
sandybrown
seagreen
seashell
sienna
silver
skyblue
slateblue
slategray
slategrey
snow
springgreen
steelblue
tan
teal
thistle
tomato
turquoise
violet
wheat
white
whitesmoke
yellow
yellowgreen
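
For instance, assuming that the 3 digit form expands as in HTML (each digit is doubled), the three paragraphs below should all use the same red color:

<p><font color="#ff0000">Six digit hex number</font></p>
<p><font color="#f00">Three digit hex number</font></p>
<p><font color="red">CSS color name</font></p>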

Document

<document>

Top level document element.

  • title [optional]

    Title in document properties.

  • subject [optional]

    Subject in document properties.

  • creator [optional]

    Creator name in document properties.

  • keywords [optional]

    Keywords in document properties.

  • description [optional]

    Description in document properties.

  • last-modified-by [optional]

    Last modified by name in document properties.
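
A sketch of a document with some of these properties set. The values are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<document
    title="Quarterly report"
    subject="Sales"
    creator="John Smith"
    keywords="report, sales"
>
    <p>Hello World!</p>
</document>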

<header> <footer>

Page header or footer.

  • page [optional]

    On which page this header or footer will be displayed. Enumeration values:

    • default
    • even
    • first

    Using the first value automatically enables the title page in the current section.
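
For example, different headers for the first page and for even pages could look like the sketch below. It assumes that a header without the page attribute is treated as the default one and is used on the remaining pages:

<header page="first"><p>Title page header</p></header>
<header page="even"><p>Even page header</p></header>
<header><p>Header of the remaining pages</p></header>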

<section>

Section.

  • border-display [optional]

    On which pages to display the borders. Enumeration values:

    • all-pages
    • first-page
    • not-first-page
  • border-offset-from [optional]

    The base from which the border distance is calculated. Enumeration values:

    • page
    • text
  • border-z-order [optional]

    Defines whether the border should be above or below the content. Enumeration values:

    • back
    • front
  • margin="top right bottom left" [optional]

    Page margins. Positive universal measure.

    • top - Top margin.
    • right - Right margin. Default: the same as top.
    • bottom - Bottom margin. Default: the same as top.
    • left - Left margin. Default: the same as right.
  • title-page [optional]

    Enable title page in this section. Boolean value.

  • type [optional]

    Section type. Enumeration values:

    • continuous
    • even-page
    • next-column
    • next-page
    • odd-page
  • vertical-align [optional]

    Vertical alignment. Enumeration values:

    • bottom
    • center
    • top
  • border="top, right, bottom, left" [optional]

    Page borders.

    • top - Top border.
    • right - Right border. Default: the same as top.
    • bottom - Bottom border. Default: the same as top.
    • left - Left border. Default: the same as right.

    Each side of the border is color style size space:

    • color - Border color. Hex color value or color name.
    • style - Border style. Enumeration values:
      • dash-dot-stroked
      • dash-small-gap
      • dashed
      • dot-dash
      • dot-dot-dash
      • dotted
      • double
      • double-wave
      • inset
      • nil
      • none
      • outset
      • single
      • thick
      • thick-thin-large-gap
      • thick-thin-medium-gap
      • thick-thin-small-gap
      • thin-thick-large-gap
      • thin-thick-medium-gap
      • thin-thick-small-gap
      • thin-thick-thin-large-gap
      • thin-thick-thin-medium-gap
      • thin-thick-thin-small-gap
      • three-d-emboss
      • three-d-engrave
      • triple
      • wave
    • size - Border size. Positive universal measure without zero.
    • space - Space between border and content. Positive universal measure.
  • header-margin [optional]

    Header margin length. Positive universal measure.

  • footer-margin [optional]

    Footer margin length. Positive universal measure.

  • gutter-margin [optional]

    Gutter margin length. Positive universal measure.

  • width [optional]

    Page width. Positive universal measure.

  • height [optional]

    Page height. Positive universal measure.

  • orientation [optional]

    Page orientation. Enumeration values:

    • landscape
    • portrait

<total-pages/>

Adds total pages count. Can be used only in header and footer.

<page-number/>

Adds current page number. Can be used only in header and footer.

Paragraphs

<p>

Paragraph.

The paragraph contains formatted text and images. Any whitespace at the beginning and end of the paragraph is removed.

You can avoid repeating the same attributes with the preserve attribute. A paragraph preserves its attributes if its preserve attribute is set to true. All following paragraphs without any attributes reuse the preserved attributes. The reuse stops as soon as a new paragraph specifies at least one attribute.

The default text format in the paragraph can be changed using attributes from the <font> tag with the font- prefix.

docx.js API class: Paragraph.

  • preserve [optional]

    Preserve the attributes. See description above. Boolean value.

  • border="top, right, bottom, left" [optional]

    Paragraph border.

    • top - Top border.
    • right - Right border. Default: the same as top.
    • bottom - Bottom border. Default: the same as top.
    • left - Left border. Default: the same as right.

    Each side of the border is color style size space:

    • color - Border color. Hex color value or color name.
    • style - Border style. Enumeration values:
      • dash-dot-stroked
      • dash-small-gap
      • dashed
      • dot-dash
      • dot-dot-dash
      • dotted
      • double
      • double-wave
      • inset
      • nil
      • none
      • outset
      • single
      • thick
      • thick-thin-large-gap
      • thick-thin-medium-gap
      • thick-thin-small-gap
      • thin-thick-large-gap
      • thin-thick-medium-gap
      • thin-thick-small-gap
      • thin-thick-thin-large-gap
      • thin-thick-thin-medium-gap
      • thin-thick-thin-small-gap
      • three-d-emboss
      • three-d-engrave
      • triple
      • wave
    • size - Border size. Positive universal measure without zero.
    • space - Space between border and content. Positive universal measure.
  • page-break [optional]

    Force page break before this paragraph. Boolean value.

  • tabs="position type leader, ..." [optional]

    Tabulator stops.

    • type [optional] - Type of tab. Enumeration values:
      • bar
      • center
      • clear
      • decimal
      • end
      • left
      • num
      • right
      • start
    • leader [optional] - Type of tab leader. Enumeration values:
      • dot
      • hyphen
      • middle-dot
      • none
      • underscore
    • position [required] - Tab position. Universal measure.
  • spacing="before after contextual" [optional]

    Vertical spacing of the paragraph.

  • line-spacing="exactly|at-least distance|multiple" [optional]

    Spacing between lines.

    • exactly|at-least [optional] - Use exactly or at least the value. at-least by default.
    • distance [optional] - Absolute distance. Positive universal measure without zero.
    • multiple [optional] - Multiple of one line, fractions allowed.

    Provide exactly one of distance or multiple.

  • align [optional]

    Text alignment. Enumeration values:

    • center
    • distribute
    • end
    • high-kashida
    • justify (aliases: justified, both)
    • left
    • low-kashida
    • medium-kashida
    • num-tab
    • right
    • start
    • thai-distribute
  • indent="left right first-line" [optional]

    Text indentation.

  • keep-lines [optional]

    Keep text lines. Boolean value.

  • keep-next [optional]

    Keep next. Boolean value.

  • outline="positive integer" [optional]

    Outline level if this paragraph should be part of document outline.
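
As a sketch of several of these attributes used together (the values are arbitrary), the first paragraph below is justified and uses a line spacing of one and a half lines. The keep-lines and keep-next attributes are presumably the usual Word options that keep the lines of the paragraph together and keep it on the same page as the next paragraph:

<p align="justify" line-spacing="1.5" keep-lines="y" keep-next="y">
    This paragraph should stay on one page together with the next one.
</p>
<p>
    Next paragraph.
</p>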

<title> <h1> <h2> <h3>

Heading paragraphs. They take the same attributes as the <p> tag.

When preserving attributes with the preserve attribute, each tag preserves the attributes on its own, so preserving attributes for <h1> does not affect the <p> elements.
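
The following sketch illustrates this. It assumes that the <p> between the two headings does not interrupt the attributes preserved for <h1>:

<h1 align="center" preserve="y">Chapter 1</h1>
<p>Left-aligned text, because the preserved heading attributes do not apply here.</p>
<h1>Chapter 2, centered like the first heading</h1>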

<tab/>

Adds tabulation.

<br/>

Adds line break without breaking the paragraph.

<vwnbsp>

If used alone (<vwnbsp/>), adds a "zero width no-break space" character followed by a "normal space" character, which is a workaround to achieve a "variable width no-break space" in docx. If used with content inside, replaces all "no-break spaces" in that content with "variable width no-break space" sequences. This workaround works with the desktop Word application. It will not work in browsers and probably not in other applications.
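
Both forms might be used as in the sketch below. The first paragraph inserts a single sequence between a number and its unit. The second one wraps text that already contains no-break spaces, written here as the &#160; character reference:

<p>
    The package weighs 10<vwnbsp/>kg.
</p>
<p>
    <vwnbsp>See Figure&#160;1 and Table&#160;2.</vwnbsp>
</p>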

<p-style>

Define a paragraph style.

The default font style inside the paragraph can be set using a <font> element inside this element.

  • id [required]

    Style id. Use it to identify the style.

  • based-on [optional]

    Style id of the parent style.

  • name [required]

    User friendly name of the style.

  • next [optional]

    Id of the style for new paragraphs following this style.

  • spacing="before after contextual" [optional]

    Vertical spacing of the paragraph.

  • line-spacing="exactly|at-least distance|multiple" [optional]

    Spacing between lines.

    • exactly|at-least [optional] - Use exactly or at least the value. at-least by default.
    • distance [optional] - Absolute distance. Positive universal measure without zero.
    • multiple [optional] - Multiple of one line, fractions allowed.

    Provide exactly one of distance or multiple.

  • align [optional]

    Text alignment. Enumeration values:

    • center
    • distribute
    • end
    • high-kashida
    • justify (aliases: justified, both)
    • left
    • low-kashida
    • medium-kashida
    • num-tab
    • right
    • start
    • thai-distribute
  • indent="left right first-line" [optional]

    Text indentation.

  • keep-lines [optional]

    Keep text lines. Boolean value.

  • keep-next [optional]

    Keep next. Boolean value.

  • outline="positive integer" [optional]

    Outline level if this paragraph should be part of document outline.
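
A sketch of a paragraph style definition. The id and name are made up for this example, and the self-closing form of the nested <font> element is an assumption rather than documented behavior:

<p-style id="quote" name="Quote" align="justify" line-spacing="1.5">
    <font italics="y" color="dimgray"/>
</p-style>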

Formatting

<font>

Change the font options. Alias: <span>.

You can use font option shorthand tags, which work the same as <font> but additionally set a specific attribute.

For example:

<b>Bold text</b> is a shorthand for <font bold="y">Bold text</font>.

The shorthand tags are: <b> (alias <bold>), <i> (alias <italics>), <u> (alias <underline>), <s> (alias <strike>), <double-strike>, <sub> (alias <sub-script>), <sup> (alias <super-script>), <small-caps>, <all-caps>, <emboss>, <imprint>, <vanish>, <spec-vanish>, <no-proof>, <snap-to-grid>, <math>, <bold-complex-script>, <italics-complex-script>, <right-to-left>.

  • style [optional]

    Font style id.

  • avoid-orphans="positive integer" [optional]

    Avoid orphans at the end of a line by replacing the space after them with a "no-break space". The value is the maximum number of orphan characters, usually 1 or 2. The value 0 disables this feature. If the vwnbsp font attribute or tag is active, the "variable width no-break space" sequence is used instead of the "no-break space".

  • underline="type color" [optional]

    Text underline.

    • type - Underline type. Enumeration values:
      • dash
      • dashdotdotheavy (alias dash-dot-dot-heavy)
      • dashdotheavy (alias dash-dot-heavy)
      • dashedheavy (alias dashed-heavy)
      • dashlong (alias dash-long)
      • dashlongheavy (alias dash-long-heavy)
      • dotdash (alias dot-dash)
      • dotdotdash (alias dot-dot-dash)
      • dotted
      • dottedheavy (alias dotted-heavy)
      • double
      • none
      • single
      • thick
      • wave
      • wavydouble (alias wavy-double)
      • wavyheavy (alias wavy-heavy)
      • words
    • color - Underline color. Hex color value or color name.
  • color [optional]

    Text color. Hex color value or color name.

  • kern [optional]

    Text kerning. Positive universal measure.

  • position [optional]

    Position. Universal measure.

  • size [optional]

    Font size. Positive universal measure.

  • font [optional]

    Font name.

  • face [optional]

    Alias of font attribute.

  • family [optional]

    Alias of font attribute.

  • highlight [optional]

    Text Highlighting. Enumeration values:

    • black
    • blue
    • cyan
    • dark-blue
    • dark-cyan
    • dark-gray
    • dark-green
    • dark-magenta
    • dark-red
    • dark-yellow
    • green
    • light-gray
    • magenta
    • red
    • white
    • yellow
  • background [optional]

    Background color. Hex color value or color name.

  • border="color style size space" [optional]

    Border around the text.

    • color - Border color. Hex color value or color name.
    • style - Border style. Enumeration values:
      • dash-dot-stroked
      • dash-small-gap
      • dashed
      • dot-dash
      • dot-dot-dash
      • dotted
      • double
      • double-wave
      • inset
      • nil
      • none
      • outset
      • single
      • thick
      • thick-thin-large-gap
      • thick-thin-medium-gap
      • thick-thin-small-gap
      • thin-thick-large-gap
      • thin-thick-medium-gap
      • thin-thick-small-gap
      • thin-thick-thin-large-gap
      • thin-thick-thin-medium-gap
      • thin-thick-thin-small-gap
      • three-d-emboss
      • three-d-engrave
      • triple
      • wave
    • size - Border size. Positive universal measure without zero.
    • space - Space between border and content. Positive universal measure.
  • scale="positive number" [optional]

    Font scale.

The following attributes are optional boolean values defining the text format.

  • bold
  • italics
  • strike
  • double-strike
  • sub (alias sub-script)
  • super (alias super-script)
  • small-caps
  • all-caps
  • emboss
  • imprint
  • vanish
  • spec-vanish
  • no-proof
  • snap-to-grid
  • math
  • bold-complex-script
  • italics-complex-script
  • size-complex-script
  • highlight-complex-script
  • right-to-left
  • vwnbsp - see <vwnbsp> tag
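
A sketch that combines a few of the attributes above. The values are only examples:

<p>
    <font underline="wave red" highlight="yellow">Check this sentence.</font>
    <font avoid-orphans="1">A line of this text should not end with a single-letter word such as a or I.</font>
</p>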

<font-style>

Define a font style.

This tag inherits all the attributes from the <font> tag except the style attribute. It also defines the following attributes of its own:

  • id [required]

    Style id. Use it to identify the style.

  • based-on [optional]

    Style id of the parent style.

  • name [required]

    User friendly name of the style.
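
A sketch of a font style definition and its use through the style attribute of <font>. The id, name and values are made up for this example, and placing the definition directly inside <document> is an assumption:

<?xml version="1.0" encoding="UTF-8"?>
<document>
    <font-style id="code" name="Code" face="Courier New" size="10pt" background="gainsboro"/>
    <p>
        Run <font style="code">xml2docx hello.xml</font> to generate the document.
    </p>
</document>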

Images

<img>

Adds an image to the document.

You must put it inside a <p> element. The suggested image formats are JPEG and PNG. BMP and GIF are also supported, but not recommended.

One of the src and data attributes is required. They are mutually exclusive, so use exactly one of them.

docx.js API class: ImageRun.

  • margin="top right bottom left" [optional]

    Margins around the image. Positive universal measure.

    • top - Top margin.
    • right - Right margin. Default: the same as top.
    • bottom - Bottom margin. Default: the same as top.
    • left - Left margin. Default: the same as right.
  • src [optional]

    Image source path. An absolute path or a path relative to the main input file.

  • data [optional]

    Raw image data in Base64 encoding.

  • width [required]

    Width of the image. Positive universal measure without zero.

  • height [required]

    Height of the image. Positive universal measure without zero.

  • rotate="integer" [optional]

    Clockwise rotation in degrees.

  • flip [optional]

    Image flip. A combination of horizontal (mirror) and vertical. You can also use the short forms h and v.

  • allow-overlap [optional]

    Allow overlapping. Boolean value.

  • behind-document [optional]

    Put image behind text. Boolean value.

  • layout-in-cell [optional]

    Layout in cell. Boolean value.

  • lock-anchor [optional]

    Lock the image anchor in a single place. Boolean value.

  • z-index="integer" [optional]

    Image z-index. Decides which image is on top of another.

  • horizontal="anchor align|offset" [optional]

    Horizontal position in floating mode.

    • anchor - Anchor that the position is relative to. Enumeration values:
      • character
      • column
      • inside-margin
      • left-margin
      • margin
      • outside-margin
      • page
      • right-margin
    • align - Image alignment relative to the anchor. Enumeration values:
      • center
      • inside
      • left
      • outside
      • right
    • offset - Offset of the absolute position from the anchor. Universal measure.

    The align and offset fields are mutually exclusive. Specify just one of them.

    You must provide both vertical and horizontal attributes or none. Specifying just one of them is an error.

  • vertical="anchor align|offset" [optional]

    Vertical position in floating mode.

    • anchor - Anchor that the position is relative to. Enumeration values:
      • bottom-margin
      • inside-margin
      • line
      • margin
      • outside-margin
      • page
      • paragraph
      • top-margin
    • align - Image alignment relative to the anchor. Enumeration values:
      • bottom
      • center
      • inside
      • outside
      • top
    • offset - Offset of the absolute position from the anchor. Universal measure.

    The align and offset fields are mutually exclusive. Specify just one of them.

    You must provide both vertical and horizontal attributes or none. Specifying just one of them is an error.

  • wrap="side type" [optional]

    Text wrapping around the image.

    • side - Wrapping side. Enumeration values:
      • both-sides
      • largest
      • left
      • right
    • type - Wrapping type. Enumeration values:
      • none
      • square
      • tight
      • top-and-bottom
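
A sketch of an inline image that is rotated by a quarter turn clockwise and mirrored horizontally. The file name and the values are only examples:

<p>
    <img src="globe.png"
        width="2cm" height="2cm"
        rotate="90" flip="h"
    />
</p>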

Tables

<table>

Table.

Child elements of the table are <tr> (or its associated docx.js API class) or <tc>.

All attributes starting with the td-, tr-, tc-, p- and font- prefixes will be passed to all cells, rows, columns, paragraphs (as preserved attributes) and the paragraphs' default text format, respectively.

docx.js API class: Table.

  • horizontal="anchor absolute|relative" [optional]

    Horizontal floating position.

    • anchor - Anchor that the position is relative to. Enumeration values:
      • margin
      • page
      • text
    • absolute - Absolute position. Universal measure.
    • relative - Relative position. Enumeration values:
      • center
      • inside
      • left
      • outside
      • right

    The absolute and relative fields are mutually exclusive. Specify just one of them.

  • vertical="anchor absolute|relative" [optional]

    Vertical floating position.

    • anchor - Anchor that the position is relative to. Enumeration values:
      • margin
      • page
      • text
    • absolute - Absolute position. Universal measure.
    • relative - Relative position. Enumeration values:
      • bottom
      • center
      • inline
      • inside
      • outside
      • top

    The absolute and relative fields are mutually exclusive. Specify just one of them.

  • float-margins="top right bottom left" [optional]

    Distance between table and surrounding text in floating mode. Positive universal measure.

    • top - Top margin.
    • right - Right margin. Default: the same as top.
    • bottom - Bottom margin. Default: the same as top.
    • left - Left margin. Default: the same as right.
  • border="top, right, bottom, left" [optional]

    Table border.

    • top - Top border.
    • right - Right border. Default: the same as top.
    • bottom - Bottom border. Default: the same as top.
    • left - Left border. Default: the same as right.

    Each side of the border is color style size space:

    • color - Border color. Hex color value or color name.
    • style - Border style. Enumeration values:
      • dash-dot-stroked
      • dash-small-gap
      • dashed
      • dot-dash
      • dot-dot-dash
      • dotted
      • double
      • double-wave
      • inset
      • nil
      • none
      • outset
      • single
      • thick
      • thick-thin-large-gap
      • thick-thin-medium-gap
      • thick-thin-small-gap
      • thin-thick-large-gap
      • thin-thick-medium-gap
      • thin-thick-small-gap
      • thin-thick-thin-large-gap
      • thin-thick-thin-medium-gap
      • thin-thick-thin-small-gap
      • three-d-emboss
      • three-d-engrave
      • triple
      • wave
    • size - Border size. Positive universal measure without zero.
    • space - Space between border and content. Positive universal measure.
  • inside-border="horizontal, vertical" [optional]

    Default border between cells.

    • horizontal - Horizontal borders.
    • vertical - Vertical borders.

    Each type of the border is color style size space:

    • color - Border color. Hex color value or color name.
    • style - Border style. Enumeration values:
      • dash-dot-stroked
      • dash-small-gap
      • dashed
      • dot-dash
      • dot-dot-dash
      • dotted
      • double
      • double-wave
      • inset
      • nil
      • none
      • outset
      • single
      • thick
      • thick-thin-large-gap
      • thick-thin-medium-gap
      • thick-thin-small-gap
      • thin-thick-large-gap
      • thin-thick-medium-gap
      • thin-thick-small-gap
      • thin-thick-thin-large-gap
      • thin-thick-thin-medium-gap
      • thin-thick-thin-small-gap
      • three-d-emboss
      • three-d-engrave
      • triple
      • wave
    • size - Border size. Positive universal measure without zero.
    • space - Space between border and content. Positive universal measure.
  • column-widths [optional]

    List of column widths for fixed table layout. Positive universal measure.

  • align [optional]

    Table alignment. Enumeration values:

    • center
    • distribute
    • end
    • high-kashida
    • justified (alias both)
    • left
    • low-kashida
    • medium-kashida
    • num-tab
    • right
    • start
    • thai-distribute
  • width [optional]

    Table width. It can be expressed as percentage of entire available space (with % sign) or straightforward distance. Positive universal measure.

  • cell-margin="top right bottom left" [optional]

    Default cell margins. Positive universal measure.

    • top - Top margin.
    • right - Right margin. Default: the same as top.
    • bottom - Bottom margin. Default: the same as top.
    • left - Left margin. Default: the same as right.
  • overlap [optional]

    Enable overlapping for floating mode. Boolean value.

<tr>

Table row.

Child elements of the row are <td> elements (or the associated docx.js API class).

Attributes starting with the td-, p-, and font- prefixes are passed to all cells, to all paragraphs (as preserved attributes), and to the paragraphs' default text format, respectively.

TableRow.

  • cant-split [optional]

    Prevents the row from being split across multiple pages. Boolean value.

  • header [optional]

    This row is a table header. Boolean value.

  • height="rule value" [optional]

    Row height.

    • rule - Rule that determines the row height. Enumeration values:
      • atleast (alias at-least)
      • auto
      • exact
    • value - Height value. Positive universal measure.

<tc>

Table column.

This element has no children; it only defines a column of the table. The actual cells are located in rows.

Attributes starting with the td-, p-, and font- prefixes are passed to all cells, to all paragraphs (as preserved attributes), and to the paragraphs' default text format, respectively.

  • colspan="non-zero positive integer" [optional]

    Number of columns that this element applies to.

  • width [optional]

    Width of the column. Positive universal measure without zero.

<td>

Table cell.

Child elements of the cell must be <p> or <table> elements (or their associated docx.js API classes). If they are not, the content of the cell is put into an automatically generated <p> element.

The cell inherits all attributes prefixed by td- from the associated <table>, <tc>, and <tr> elements. If a single attribute comes from several sources, the priority is as follows (highest first): the current <td> element, the <tr> element, the <tc> element, the <table> element.
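
The priority order above can be pictured as a sequence of object merges. The following JavaScript is only an illustrative sketch, not the converter's actual code: the function names and the plain-object representation of attributes are hypothetical, and it assumes that the td- prefix is removed when an attribute is handed to the cell.

```javascript
// Hypothetical sketch: pick the "td-" attributes of a parent element.
function inheritedCellAttributes(attributes) {
    const result = {};
    for (const [name, value] of Object.entries(attributes)) {
        if (name.startsWith("td-")) {
            // Assumption: the prefix is stripped before the cell sees it.
            result[name.slice(3)] = value;
        }
    }
    return result;
}

function resolveCellAttributes(table, tc, tr, td) {
    // Later sources override earlier ones: <table>, <tc>, <tr>, then <td>.
    return {
        ...inheritedCellAttributes(table),
        ...inheritedCellAttributes(tc),
        ...inheritedCellAttributes(tr),
        ...td,
    };
}

const resolved = resolveCellAttributes(
    { "td-valign": "top", "td-background": "white", width: "100%" },
    { "td-background": "yellow" },
    { "td-valign": "middle" },
    { background: "red" },
);
console.log(resolved); // { valign: 'middle', background: 'red' }
```

Here the row wins over the table for valign, the cell's own background wins over both the column and the table, and the unprefixed width of the table is not inherited.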

Attributes starting with the p- and font- prefixes are passed to all paragraphs (as preserved attributes) and to the paragraphs' default text format, respectively.

TableCell.

  • border="top, left, bottom, right" [optional]

    Cell border.

    • top - Top border.
    • right - Right border. Default: the same as top.
    • bottom - Bottom border. Default: the same as top.
    • left - Left border. Default: the same as right.

    Each side of the border is color style size space:

    • color - Border color. Hex color value or color name.
    • style - Border style. Enumeration values:
      • dash-dot-stroked
      • dash-small-gap
      • dashed
      • dot-dash
      • dot-dot-dash
      • dotted
      • double
      • double-wave
      • inset
      • nil
      • none
      • outset
      • single
      • thick
      • thick-thin-large-gap
      • thick-thin-medium-gap
      • thick-thin-small-gap
      • thin-thick-large-gap
      • thin-thick-medium-gap
      • thin-thick-small-gap
      • thin-thick-thin-large-gap
      • thin-thick-thin-medium-gap
      • thin-thick-thin-small-gap
      • three-d-emboss
      • three-d-engrave
      • triple
      • wave
    • size - Border size. Positive universal measure without zero.
    • space - Space between border and content. Positive universal measure.
  • colspan="non-zero positive integer" [optional]

    Number of spanning columns.

  • rowspan="non-zero positive integer" [optional]

    Number of spanning rows.

  • margin="top left bottom right" [optional]

    Cell inner margins. Positive universal measure.

    • top - Top margin.
    • right - Right margin. Default: the same as top.
    • bottom - Bottom margin. Default: the same as top.
    • left - Left margin. Default: the same as right.
  • dir [optional]

    Text direction. Enumeration values:

    • bottom-to-top (aliases: bottom-to-top-left-to-right, bt-lr)
    • left-to-right (aliases: left-to-right-top-to-bottom, lr-tb)
    • top-to-bottom (aliases: top-to-bottom-right-to-left, tb-rl)
  • valign [optional]

    Vertical alignment. Enumeration values:

    • bottom
    • middle (alias center)
    • top
  • background [optional]

    Background color. Hex color value or color name.
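
The defaults of the four-sided border and margin attributes above follow one rule: a missing right or bottom value falls back to top, and a missing left value falls back to right. A minimal sketch of that rule follows. The name resolveSides is hypothetical, and the function takes already-named values, so it makes no assumption about the order of values inside the attribute string.

```javascript
// Hypothetical helper: fill in missing sides using the documented defaults.
function resolveSides({ top, right, bottom, left }) {
    const r = right ?? top;   // right defaults to top
    const b = bottom ?? top;  // bottom defaults to top
    const l = left ?? r;      // left defaults to right
    return { top, right: r, bottom: b, left: l };
}

console.log(resolveSides({ top: "1mm" }));
// { top: '1mm', right: '1mm', bottom: '1mm', left: '1mm' }
console.log(resolveSides({ top: "1mm", right: "2mm" }));
// { top: '1mm', right: '2mm', bottom: '1mm', left: '2mm' }
```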

Templates

When you use a JSON5 data file as input, the XML file becomes a template. You can use JavaScript expressions and statements in it.

Caution

ACTIVATING THIS OPTION WILL PERMIT THE EXECUTION OF ARBITRARY CODE FROM THE TEMPLATE FILE WITHOUT LIMITATIONS. USE ONLY XML FILES FROM A TRUSTED SOURCE.

Data file

The data file contains JSON5. JSON5 is an extension of the standard JSON format and is backward compatible with it.

Example JSON5 file:

{
    /* "text" is a string */
    text: "What would you like to eat?",

    /* "formattedText" contains some XML tags. */
    formattedText: "What <b color=\"red\">snacks</b> do you want?",

    /* "fruit" is an object with the details of an apple. */
    fruit: {
        name: "Apple",
        size: "Large",
        color: "Red",
        pieces: 4,
    },

    /* "snacks" is an array of snack names. */
    snacks: [ "Popcorn", "Chocolate", "Crisps" ],
}

Interpolate <% … %>, <%= … %>

You can place data in the document by interpolating it with <% … %>.

All the following examples use the sample data file from the section above.

<p>Max: <% text %></p>
<p>Zoe: <% fruit.size %> <% fruit.color %> <% fruit.name %>.</p>

You can put any JavaScript expression in the <% … %>.

<p>Max: How many pieces?</p>
<p>Zoe: I want <% fruit.pieces %> piece<% (fruit.pieces != 1) ? 's' : '' %>.</p>

The <% … %> interpolation performs XML escaping, which means that you cannot add any XML markup with it. If you need to insert markup, use the interpolation without escaping, <%= … %>.

<p>Max: <%= formattedText %></p>
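
To make the difference between the two forms concrete, here is a small JavaScript sketch of what XML escaping does to the formattedText value from the sample data. The escapeXml helper is hypothetical and assumes that the five predefined XML entities are escaped; the exact set used by the tool is not stated here.

```javascript
// Hypothetical escaping helper; assumes the five predefined XML entities.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

const formattedText = 'What <b color="red">snacks</b> do you want?';

// Roughly what <% formattedText %> would insert: the tags become plain text.
console.log(escapeXml(formattedText));
// What &lt;b color=&quot;red&quot;&gt;snacks&lt;/b&gt; do you want?

// Roughly what <%= formattedText %> would insert: the markup is kept.
console.log(formattedText);
// What <b color="red">snacks</b> do you want?
```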

Execute <%! … %>

You can execute a JavaScript statement without interpolating anything. It is useful for if, for, etc.

<p>Zoe: <%! for (let snack of snacks) { %>
            <% snack %>,
        <%! } %>
        and that's all.</p>
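
One way to read the template above is as ordinary JavaScript that appends pieces to an output string. The sketch below shows this mental model only; it is not the converter's internal code, the output variable is hypothetical, and whitespace handling is simplified.

```javascript
// Sample data, as in the data file above.
const snacks = ["Popcorn", "Chocolate", "Crisps"];

// Mental model: text outside the tags is appended as-is, <% ... %> appends
// a value, and <%! ... %> is copied into the program unchanged.
let output = "<p>Zoe: ";
for (let snack of snacks) {       // <%! for (let snack of snacks) { %>
    output += snack + ", ";       //     <% snack %>,
}                                 // <%! } %>
output += "and that's all.</p>";

console.log(output);
// <p>Zoe: Popcorn, Chocolate, Crisps, and that's all.</p>
```

In the real template, <% snack %> would also XML-escape each name; the sample names contain no special characters, so the escaping step is omitted here.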

Utils

In addition to the data from your data file, you also have access to the utils object.

It contains some utility properties and functions:

  • utils.templateFile - path to current template file,

  • utils.templateDir - path to directory containing current template,

  • utils.dataFile - path to data file,

  • utils.data - your data file as an object,

  • utils.include(file: string): string - a function that reads and executes another template file and returns the result as a string. The file path is relative to templateDir.

    You can use it to add plain text from a file using escaped interpolation.

    <% utils.include("my_file.txt") %>

    Or, you can add another XML file using interpolation without escaping.

    <%= utils.include("my_file.xml") %>

    Or, you can include a JavaScript file using the eval() function.

    <%! eval(utils.include("my_script.js")) %>

If your data file overrides the utils object, you can use the alias name __utils__ instead.

Advanced XML file syntax

The following sections describe more advanced XML syntax that you can use.

Aliases allow you to reuse the same parts of XML in multiple places.

Raw docx.js allows you to use features that are available in the underlying docx.js API but may not be available through the default tags.

Aliases

An alias lets you avoid repeating the same pattern in many places and helps you organize your XML structure.

After parsing the source XML, the converter performs an alias resolution pass that produces new XML with all aliases resolved to their actual content.

Alias definition <DEF:…>

You must define an alias before its first use. After the alias resolution pass, the definition element is removed.

The alias definition syntax is as follows:

<DEF:ALIAS_NAME attr1="value1" attr2="value2" ...>
    ...
    children
    ...
</DEF:ALIAS_NAME>

ALIAS_NAME must be a valid XML tag name. It is good practice to use upper-case letters to avoid collisions with other tags. Attributes and children are optional.

Now, you can place children of this definition anywhere or merge it with any other element.

Place

You can use an alias as an XML element. <ALIAS_NAME/> will be replaced by all the children of this alias.

For example:

<!-- Alias definition -->
<DEF:WARNING_SIGN><b color="red">WARNING!</b></DEF:WARNING_SIGN>

<!-- Alias usage -->
<WARNING_SIGN/> Mind the gap

<!-- It will produce -->
<b color="red">WARNING!</b> Mind the gap

Merge

You can merge an alias into another element. <…:ALIAS_NAME> produces an element that has all the attributes and children from both the alias and the element.

Attributes from the local element override attributes with the same name from the alias. Children from the alias are placed before the children of the local element.

For example:

<!-- Alias definition -->
<DEF:WARNING_PARAGRAPH border="single red 1mm 2mm">
    <b color="red">WARNING!</b>
</DEF:WARNING_PARAGRAPH>

<!-- Alias usage -->
<p:WARNING_PARAGRAPH>
    Mind the gap
</p:WARNING_PARAGRAPH>

<!-- It will produce -->
<p border="single red 1mm 2mm">
    <b color="red">WARNING!</b> Mind the gap
</p>
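
The two merge rules (local attributes win, alias children come first) can be sketched in JavaScript as follows. This is an illustration under assumptions, not the converter's code: the { name, attributes, children } element model and the mergeAlias function are hypothetical, and an extra align attribute is added to the alias only to show an override.

```javascript
// Hypothetical element model: { name, attributes, children }.
function mergeAlias(alias, local) {
    return {
        name: local.name,
        // Local attributes override alias attributes with the same name.
        attributes: { ...alias.attributes, ...local.attributes },
        // Alias children come first, then the local children.
        children: [...alias.children, ...local.children],
    };
}

const warningParagraph = {
    name: "WARNING_PARAGRAPH",
    attributes: { border: "single red 1mm 2mm", align: "left" },
    children: ['<b color="red">WARNING!</b>'],
};
const usage = {
    name: "p",
    attributes: { align: "center" },
    children: ["Mind the gap"],
};

const merged = mergeAlias(warningParagraph, usage);
console.log(merged.attributes); // { border: 'single red 1mm 2mm', align: 'center' }
console.log(merged.children);   // [ '<b color="red">WARNING!</b>', 'Mind the gap' ]
```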

Raw docx.js access

Note

You should have a basic knowledge of JavaScript to read this section.

The xml2docx tool uses docx.js to generate the docx output. Not all features of docx.js are available through the xml2docx tags, but you can access most of them using XML that maps directly to the docx.js API.

Class creation

If you want to create a docx.js class inside any element, you can simply use its name as a tag. For example:

<p>
    First page.
    <!-- Create PageBreak class object directly into paragraph. -->
    <PageBreak/>
    Second page.
</p>

When you create a class, everything inside the element is converted to a JavaScript object and passed as the first parameter of the constructor. In most of the docx.js API, the first parameter is the options parameter.

Once you are inside a docx.js class, you are in a different context and normal XML tags do not apply anymore. Everything inside this context becomes a JavaScript value. Let's call this the "API context" and the rest the "normal context".

The following pseudo-classes can be used inside the <document> element. They will be added to appropriate place in the document structure.

  • Section - implements ISectionOptions,
  • ParagraphStyle - implements IParagraphStyleOptions,
  • CharacterStyle - implements ICharacterStyleOptions.

Property

You can use the docx.js API on normal tags by adding properties to them. A property is a child element with :property appended to its name. The context is switched to the API context on this element. All properties must be placed before any other child element. A property overrides the corresponding attribute of the parent element.

For example, the emphasisMark property of the IRunOptions interface is not exposed in the <font> tag, but you can set it with the :property.

<font><emphasisMark:property type="dot"/>DOTS</font>

The property element is already in API context, so you can also use filters in it. See Filters for details.

<img data="" width="1cm" height="1cm">
    <data:base64:textFile:property>picture.png.b64</data:base64:textFile:property>
</img>

The image data above is read from a text file, decoded from Base64, and passed to the data property of the image. The dummy attribute data="" will be overwritten, but it must be present because the <img> tag requires it.

API context

When you create an object from an API class, you switch to the API context. For example:

<TextRun font="Arial" highlight="lightGray">
    <text>This is text.</text>
</TextRun>

The code above creates a JavaScript object with the font and highlight properties taken from the attributes and one text property taken from the child element. The object is passed to the constructor of the TextRun class.

It is equivalent to the following JavaScript expression:

new TextRun({
    font: "Arial",
    highlight: "lightGray",
    text: "This is text."
})

XML inside the API context has the following syntax.

In the tag names below, ... stands for a property name when adding a property to an object, and for _ when adding an item to an array.

String

<...>Text</...>
<...><![CDATA[Text]]></...>
<...></...>

These lines generate string values and add them to the parent object or array. For example:

<TextRun>
    <!-- The following "text" property is a string. -->
    <text>Hello World.</text>
</TextRun>

Object

<... prop1="..." prop2="..." ...>
    <property3>...</property3>
    <property4>...</property4>
    ...
</...>

It generates an object and adds it to the parent object or array. The object properties are taken from the attributes and children. You can use attributes or children, whichever suits you best. For example:

<TextRun text="Foo bar.">
    <!-- The following "border" property is an object. -->
    <border size="32">
        <style>thinThickThinMediumGap</style>
        <color>#FF0000</color>
    </border>
</TextRun>

Array

<...>
    <_>...</_>
    <_>...</_>
    ...
</...>

It generates an array and adds it to the parent object or array. The array items are <_>...</_>. For example:

<TextRun>
    <!-- The following "children" property is an array of three items. -->
    <children>
        <_>Now is the year </_> <!-- string -->
        <_:YearLong/> <!-- object of class YearLong (see filters below) -->
        <_>.</_> <!-- string -->
    </children>
</TextRun>

When you pass an array to the constructor, each array item becomes one parameter, so you can use arrays to pass multiple parameters to the constructor.
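
In JavaScript terms, the array rule corresponds to spreading the array into the constructor call. The sketch below illustrates the idea with a stand-in class. The construct function and the Example class are hypothetical and are not part of xml2docx or docx.js.

```javascript
// Hypothetical sketch of how a value could be passed to a class constructor.
function construct(Class, value) {
    // An array is spread into separate parameters; anything else is passed
    // as the single (usually "options") parameter.
    return Array.isArray(value) ? new Class(...value) : new Class(value);
}

// Stand-in class used only for this illustration.
class Example {
    constructor(...params) {
        this.params = params;
    }
}

console.log(construct(Example, { text: "Hi" }).params.length);  // 1
console.log(construct(Example, ["a", "b", "c"]).params.length); // 3
```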

Switch to normal context

<...:_>
    ...
</...:_>

It generates an array from elements created using normal context, i.e. containing <p>, <b>, <img> elements.

For example:

<Paragraph alignment="center">
    <!-- The following "children" property is an array of objects created
    using normal context. -->
    <children:_>
        This is <b>normal context</b> again.
    </children:_>
</Paragraph>

Filters

The .docx format uses many different units. Also, the API requires some types other than object, array, or string. To solve these problems, you can use filters.

A filter is a function that takes one value, transforms it, and returns the transformed value.

The simplest examples are measurement units. Border thickness is expressed as an integer number of 1/8 pt units. The pt8 filter takes a string in Universal measure and converts it to 1/8 pt units.

<TextRun text="Foo bar.">
    <border style="single" size:pt8="0.5mm"/>
</TextRun>
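
The arithmetic behind such a conversion can be sketched in JavaScript. The toPt8 function below is a hypothetical re-implementation for illustration only: it supports just a subset of units (pt, in, cm, mm) and assumes rounding to the nearest integer, neither of which is confirmed by this documentation.

```javascript
// Points per unit for a few common units (assumed subset).
const POINTS_PER_UNIT = { pt: 1, in: 72, cm: 72 / 2.54, mm: 72 / 25.4 };

// Hypothetical re-implementation of a "pt8"-style conversion.
function toPt8(measure) {
    const match = /^(\d+(?:\.\d+)?)(pt|in|cm|mm)$/.exec(measure.trim());
    if (!match) {
        throw new Error(`Unsupported measure: ${measure}`);
    }
    const points = Number(match[1]) * POINTS_PER_UNIT[match[2]];
    // Assumption: the result is rounded to the nearest integer.
    return Math.round(points * 8);
}

console.log(toPt8("0.5mm")); // 11
console.log(toPt8("1pt"));   // 8
```

With these assumptions, 0.5 mm is about 1.417 pt, which is about 11.34 eighths of a point and rounds to 11.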

Filters can be added to:

  • attributes (size:pt8="0.5mm"),
  • properties (<size:pt8>0.5mm</size:pt8>),
  • array items (<_:textFile>my_text_file.txt</_:textFile>),
  • classes (<TextRun:json>{text:"Hello World!"}</TextRun:json>).

Filters can be chained. They are executed from right to left, for example:

<TextRun:json:textFile>external_text_run.json5</TextRun:json:textFile>

It will read the external_text_run.json5 file, parse it as JSON5, and pass the result to the TextRun constructor.
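
The right-to-left order can be sketched as a reduceRight over the filter names. This is an illustration under assumptions, not the tool's implementation: the filters here are stand-ins, an in-memory files object replaces real file access, and the built-in JSON.parse is used in place of a JSON5 parser (so the sample file contains strict JSON).

```javascript
// Stand-in filters for illustration; the in-memory "files" object replaces
// real file access.
const files = { "external_text_run.json": '{"text": "Hello World!"}' };
const filters = {
    textFile: (path) => files[path],
    json: (text) => JSON.parse(text),
};

// Apply the filters of a name such as "TextRun:json:textFile" right to left.
function applyFilters(qualifiedName, value) {
    const [target, ...names] = qualifiedName.split(":");
    const result = names.reduceRight((acc, name) => filters[name](acc), value);
    return { target, result };
}

console.log(applyFilters("TextRun:json:textFile", "external_text_run.json"));
// { target: 'TextRun', result: { text: 'Hello World!' } }
```

The textFile filter runs first and turns the path into the file content, and the json filter then turns that content into an object that would be handed to the TextRun constructor.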

You can use an API class name as a filter. It takes the value, passes it to the constructor, and returns the constructed object. For example:

<Paragraph>
    <children>
        <!-- The array item is a TextRun constructed with one parameter
        which is an object containing "text" and "color" properties. -->
        <_:TextRun text="Hello World!" color="#008800"/>
    </children>
</Paragraph>

List of filters:

  • Unit conversion filters take a Universal measure and return an integer in the specified units.
    • pt - 1 pt
    • pt3q - 3/4 pt
    • pt8 - 1/8 pt
    • pt20 and dxa - 1/20 pt
    • emu - 1/12700 pt
  • pass - do nothing
  • file - read file and return Uint8Array with content. Path must be absolute or relative to the directory containing main file.
  • textFile - read UTF-8 file and return string with content. Path must be absolute or relative to the directory containing main file.
  • int - convert string to number and round it.
  • float - convert string to number.
  • bool - convert string to boolean, see Boolean value.
  • enum - use docx.js API enum to convert or verify the value. The input is EnumName:value.
  • color - convert string to color, see Hex color value or color name.
  • json and json5 - parse input string as JSON5.
  • first - return first element of input array. Can be useful if you want to create just one element from normal context within API context.
  • emptyArray - return empty array. The array syntax described above does not allow empty arrays. Use this filter instead.
  • emptyObject - return empty object. The object syntax described above does not allow empty objects. Use this filter instead.
  • base64 - decode base64 string to Uint8Array.