
The Docutils Publisher


David Goodger


The docutils.core.Publisher class is the core of Docutils, managing all the processing and relationships between components. See PEP 258 for an overview of Docutils components. Configuration is done via runtime settings assembled from several sources. The Publisher convenience functions are the normal entry points for using Docutils as a library.

Publisher Convenience Functions

There are several convenience functions in the docutils.core module. Each of these functions sets up a docutils.core.Publisher object, then calls its publish() method. docutils.core.Publisher.publish() handles everything else.

See the module docstring, help(docutils.core), and the function docstrings, e.g., help(docutils.core.publish_string), for details and a description of the function arguments.


publish_cmdline()

Function for custom command-line front-end tools (like tools/) or "console_scripts" entry points (like core.rst2html()) with file I/O. In addition to writing the output document to a file-like object, it also returns the document as a str instance (or as bytes for binary output document formats).


publish_file()

For programmatic use with file I/O. In addition to writing the output document to a file-like object, it also returns the document as a str instance (or as bytes for binary output document formats).


publish_string()

For programmatic use with string I/O:

Input can be a str or bytes instance. bytes are decoded with input_encoding.

Output is handled similarly to xml.etree.ElementTree.tostring():

  • return a bytes instance, if output_encoding is set to an encoding registered with Python's "codecs" module (default: "utf-8"),

  • return str instance, if output_encoding is set to the special value "unicode".

This function is provisional because in Python 3 the name and behaviour no longer match.
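The two return types described above can be seen in a minimal sketch. It assumes the "html" writer alias and the long-standing writer_name keyword (check help(docutils.core.publish_string) for the spelling your Docutils version prefers); the sample source text is made up for illustration.

```python
from docutils.core import publish_string

source = "Hello, *world*!"  # illustrative reStructuredText input

# Default output_encoding ("utf-8"): the result is a bytes instance.
as_bytes = publish_string(source, writer_name="html")

# Special value "unicode": encoding is skipped and a str is returned.
as_str = publish_string(
    source,
    writer_name="html",
    settings_overrides={"output_encoding": "unicode"},
)

print(type(as_bytes).__name__, type(as_str).__name__)
```

Passing the setting through settings_overrides keeps the choice in application code rather than in a configuration file.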


publish_doctree()

Parse string input (cf. string I/O) into a Docutils document tree data structure (doctree). The doctree can be modified, pickled and unpickled, etc., and then reprocessed with publish_from_doctree().


publish_from_doctree()

Render from an existing document tree data structure (doctree). Returns the output document as a memory object (cf. string I/O).

This function is provisional because in Python 3 the name and behaviour of the string output interface no longer match.
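The round trip from source to doctree and on to output might look like the following sketch. Clearing the reporter and transformer attributes before pickling is an assumption about what is safe to drop once parsing is done; the source text and the choice of the "html" writer are illustrative.

```python
import pickle

from docutils.core import publish_doctree, publish_from_doctree

# Parse string input into a document tree.
doctree = publish_doctree("Hello, *world*!")

# Assumption: reporter and transformer are only needed while processing,
# so they are dropped to keep them out of the pickle.
doctree.reporter = None
doctree.transformer = None

# Store and restore the tree, e.g. to cache parsed documents.
restored = pickle.loads(pickle.dumps(doctree))

# Render the restored tree; "unicode" requests a str result.
html = publish_from_doctree(
    restored,
    writer_name="html",
    settings_overrides={"output_encoding": "unicode"},
)
```

The same restored tree can be rendered several times with different writers, which avoids parsing the source again.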


publish_programmatically()

Auxiliary function used by publish_file(), publish_string(), publish_doctree(), and publish_parts(). Applications should not need to call this function directly.


publish_parts()

For programmatic use with string input (cf. string I/O). Returns a dictionary of document parts as str instances. [1] Dictionary keys are the part names. Each Writer component may publish a different set of document parts, described below.

Example: post-process the output document with a custom function post_process() before encoding it with the user-customizable encoding and error handler:

from docutils.core import publish_parts

def publish_bytes_with_postprocessing(*args, **kwargs):
    # post_process() is the application's own str -> str function.
    parts = publish_parts(*args, **kwargs)
    out_str = post_process(parts['whole'])
    return out_str.encode(parts['encoding'], parts['errors'])

There are more usage examples in the docutils/ module.

Parts Provided By All Writers


parts['encoding'] contains the output_encoding setting.


parts['errors'] contains the output_encoding_error_handler setting.


parts['version'] contains the version of Docutils used.


parts['whole'] contains the entire formatted document. [1]

Parts Provided By the HTML Writers

HTML4 Writer

parts['body'] is equivalent to parts['fragment']. It is not equivalent to parts['html_body'].


parts['body_prefix'] contains:

<div class="document" ...>

and, if applicable:

<div class="header">

parts['body_pre_docinfo'] contains (as applicable):

<h1 class="title">...</h1>
<h2 class="subtitle" id="...">...</h2>

parts['body_suffix'] contains:

</div>

(the end-tag for <div class="document">), the footer division if applicable:

<div class="footer">



parts['docinfo'] contains the document bibliographic data, the docinfo field list rendered as a table.


parts['footer'] contains the document footer content, meant to appear at the bottom of a web page, or repeated at the bottom of every printed page.


parts['fragment'] contains the document body (not the HTML <body>). In other words, it contains the entire document, less the document title, subtitle, docinfo, header, and footer.


parts['head'] contains <meta ... /> tags and the document <title>...</title>.


parts['head_prefix'] contains the XML declaration, the DOCTYPE declaration, the <html ...> start tag and the <head> start tag.


parts['header'] contains the document header content, meant to appear at the top of a web page, or repeated at the top of every printed page.


parts['html_body'] contains the HTML <body> content, less the <body> and </body> tags themselves.


parts['html_head'] contains the HTML <head> content, less the stylesheet link and the <head> and </head> tags themselves. Since publish_parts() returns str instances which do not know about the output encoding, the "Content-Type" meta tag's "charset" value is left unresolved, as "%s":

<meta http-equiv="Content-Type" content="text/html; charset=%s" />

The interpolation should be done by client code.


parts['html_prolog'] contains the XML declaration and the doctype declaration. The XML declaration's "encoding" attribute's value is left unresolved, as "%s":

<?xml version="1.0" encoding="%s" ?>

The interpolation should be done by client code.
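Client-side interpolation of the two unresolved "%s" placeholders can be sketched with plain string formatting. The template strings are copied from the descriptions above; with real output they would come from parts['html_head'] and parts['html_prolog']. The chosen encoding is an assumption.

```python
# Templates as documented for the 'html_head' and 'html_prolog' parts.
html_head_template = (
    '<meta http-equiv="Content-Type" content="text/html; charset=%s" />'
)
html_prolog_template = '<?xml version="1.0" encoding="%s" ?>'

encoding = "utf-8"  # assumption: the encoding the page will be served in

# Fill in the placeholder left unresolved by publish_parts().
html_head = html_head_template % encoding
html_prolog = html_prolog_template % encoding

print(html_prolog)
print(html_head)
```

If a part might contain other literal percent signs, replacing only the first "%s" with str.replace("%s", encoding, 1) is a more defensive alternative.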


parts['html_subtitle'] contains the document subtitle, including the enclosing <h2 class="subtitle"> and </h2> tags.


parts['html_title'] contains the document title, including the enclosing <h1 class="title"> and </h1> tags.


parts['meta'] contains all <meta ... /> tags.


parts['stylesheet'] contains the embedded stylesheet or stylesheet link.


parts['subtitle'] contains the document subtitle text and any inline markup. It does not include the enclosing <h2> and </h2> tags.


parts['title'] contains the document title text and any inline markup. It does not include the enclosing <h1> and </h1> tags.
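As a sketch of how these parts can be combined, the following embeds a document in an application's own page layout using only the 'title' and 'fragment' parts. The surrounding <article> markup and the sample source are made up for illustration.

```python
from docutils.core import publish_parts

source = """\
My Title
========

Some *text*.
"""

parts = publish_parts(source, writer_name="html")

# 'title' is the bare title text; 'fragment' is the body without
# title, subtitle, docinfo, header, and footer.
page = (
    "<article>\n"
    f"<h2>{parts['title']}</h2>\n"
    f"{parts['fragment']}"
    "</article>\n"
)
print(page)
```

Using 'title' rather than 'html_title' lets the application pick its own heading level and attributes.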


PEP/HTML Writer

The PEP/HTML writer provides the same parts as the HTML4 writer, plus the following:


parts['pepnum'] contains the PEP number (extracted from the header preamble).

S5/HTML Writer

The S5/HTML writer provides the same parts as the HTML4 writer.

HTML5 Writer

The HTML5 writer provides the same parts as the HTML4 writer. However, it uses semantic HTML5 elements for the document, header and footer.

Parts Provided by the "LaTeX2e" and "XeTeX" Writers

See the template files default.tex, titlepage.tex, titlingpage.tex, and xelatex.tex for examples of how these parts can be combined into a valid LaTeX document.


parts['abstract'] contains the formatted content of the 'abstract' docinfo field.


parts['body'] contains the document's content. In other words, it contains the entire document, except the document title, subtitle, and docinfo.

This part can be included into another LaTeX document body using the \input{} command.
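A minimal sketch of exporting the body for inclusion in another LaTeX document follows. The file name and temporary directory are hypothetical, and the host document would presumably also need the definitions carried in parts such as 'requirements' and 'fallbacks'.

```python
import tempfile
from pathlib import Path

from docutils.core import publish_parts

parts = publish_parts("Some *text*.", writer_name="latex")

# Hypothetical target file; a host document would include it with the
# LaTeX \input command.
body_file = Path(tempfile.mkdtemp()) / "docutils_body.tex"
body_file.write_text(parts["body"], encoding="utf-8")

print(body_file)
```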


parts['body_pre_docinfo'] contains the \maketitle command.


parts['dedication'] contains the formatted content of the 'dedication' docinfo field.


parts['docinfo'] contains the document bibliographic data, the docinfo field list rendered as a table.

With --use-latex-docinfo, the 'author', 'organization', 'contact', 'address' and 'date' info is moved to titledata.

'dedication' and 'abstract' are always moved to separate parts.


parts['fallbacks'] contains fallback definitions for Docutils-specific commands and environments.


parts['head_prefix'] contains the declaration of documentclass and document options.


parts['latex_preamble'] contains the argument of the --latex-preamble option.


parts['pdfsetup'] contains the PDF properties ("hyperref" package setup).


parts['requirements'] contains required packages and setup before the stylesheet inclusion.


parts['stylesheet'] contains the embedded stylesheet(s) or stylesheet loading command(s).


parts['subtitle'] contains the document subtitle text and any inline markup.


parts['title'] contains the document title text and any inline markup.


parts['titledata'] contains the combined title data in \title, \author, and \date macros.

With --use-latex-docinfo, this includes the 'author', 'organization', 'contact', 'address' and 'date' docinfo items.


Configuration

Docutils is configured by runtime settings assembled from several sources. It overlays default and explicitly specified values from these sources so that the resulting settings behave as users expect. For details, see Docutils Runtime Settings. The individual settings are described in Docutils Configuration.

To pass application-specific setting defaults to the Publisher convenience functions, use the settings_overrides parameter. Pass a dictionary of setting names & values, like this:

app_defaults = {'input_encoding': 'ascii',
                'output_encoding': 'latin-1'}
output = publish_string(..., settings_overrides=app_defaults)

Settings from command-line options override configuration file settings, which in turn override application defaults.
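A complete, runnable variant of the snippet above might look like this. The source text and the "pseudoxml" writer are illustrative choices; since the input is already a str, the input_encoding default is not exercised here.

```python
from docutils.core import publish_string

# Application-specific defaults, as in the example above.
app_defaults = {"input_encoding": "ascii",
                "output_encoding": "latin-1"}

output = publish_string(
    "Café",
    writer_name="pseudoxml",
    settings_overrides=app_defaults,
)

# The result is bytes, encoded with the overridden output_encoding.
print(output.decode("latin-1"))
```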

See Docutils Runtime Settings or the docstring of publish_programmatically() for a description of all configuration arguments of the Publisher convenience functions.


Encodings

Docutils supports all standard encodings and encodings registered with the codecs module. The special value "unicode" can be used with publish_string() to skip encoding and return a str instance instead of bytes.

The input encoding can be specified with the input_encoding setting. The default is "utf-8". The output encoding can be specified with the output_encoding setting. The default is "utf-8", too.
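The interplay of the two settings can be sketched with bytes input in a non-default encoding. The sample text, the "latin-1" encoding, and the "pseudoxml" writer are assumptions chosen for illustration.

```python
from docutils.core import publish_string

# Source delivered as bytes in an encoding other than the default.
source_bytes = "Grüße".encode("latin-1")

text = publish_string(
    source_bytes,
    writer_name="pseudoxml",
    settings_overrides={
        "input_encoding": "latin-1",   # used to decode the bytes input
        "output_encoding": "unicode",  # skip encoding, return str
    },
)

print(text)
```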


Up to Docutils 0.21, the input encoding was detected from a Unicode byte order mark (BOM) or an encoding declaration [2] in the source unless an input_encoding was specified.

The default input encoding changed to "utf-8" in Docutils 0.22. Currently, auto-detection can be selected by setting input_encoding to None (or to an empty string in a configuration file). However, input encoding auto-detection is deprecated and will be removed in Docutils 1.0. See the inspecting_codecs package for a possible replacement.