Date: 2004-04-15
Web site: http://docutils.sourceforge.net/
Copyright: This document has been placed in the public domain.
This is a work in progress. Please feel free to ask questions and/or provide answers; send email to the Docutils-Users mailing list. Project members should feel free to edit the source text file directly.
Docutils is a system for processing plaintext documentation into useful formats, such as HTML, XML, and TeX. It supports multiple types of input, such as standalone files (implemented), inline documentation from Python modules and packages (under development), PEPs (Python Enhancement Proposals) (implemented), and others as discovered.
For an overview of the Docutils project implementation, see PEP 258, "Docutils Design Specification".
Docutils is implemented in Python.
Docutils is short for "Python Documentation Utilities". The name "Docutils" was inspired by "Distutils", the Python Distribution Utilities architected by Greg Ward, a component of Python's standard library.
The earliest known use of the term "docutils" in a Python context was a fleeting reference in a message by Fred Drake on 1999-12-02 in the Python Doc-SIG mailing list. It was suggested as a project name on 2000-11-27 on Doc-SIG, again by Fred Drake, in response to a question from Tony "Tibs" Ibbs: "What do we want to call this thing?". This was shortly after David Goodger first announced reStructuredText on Doc-SIG.
Tibs used the name "Docutils" for his effort "to document what the Python docutils package should support, with a particular emphasis on documentation strings". Tibs joined the current project (and its predecessors) and graciously donated the name.
For more history of reStructuredText and the Docutils project, see An Introduction to reStructuredText.
Please note that the name is "Docutils", not "DocUtils" or "Doc-Utils" or any other variation.
DocFactory is under development. It uses wxPython and looks very promising.
Although useful and relatively stable, Docutils is experimental code, with APIs and architecture subject to change.
Our highest priority is to fix bugs as they are reported. So the latest code from CVS (or development snapshots) is almost always the most stable (bug-free) as well as the most featureful.
It ought to be "release early & often", but official releases are a significant effort and aren't done that often. We have automatically-generated development snapshots which always contain the latest code from CVS. As the project matures, we may formalize on a stable/development-branch scheme, but we're not using anything like that yet.
If anyone would like to volunteer as a release coordinator, please contact the project coordinator.
reStructuredText is an easy-to-read, what-you-see-is-what-you-get plaintext markup syntax and parser system. The reStructuredText parser is a component of Docutils. reStructuredText is a revision and reinterpretation of the StructuredText and Setext lightweight markup systems.
If you are reading this on the web, you can see for yourself. The source for this FAQ is written in reStructuredText; open it in another window and compare them side by side.
A ReStructuredText Primer and the Quick reStructuredText user reference are good places to start. The reStructuredText Markup Specification is a detailed technical specification.
The name came from a combination of "StructuredText", one of reStructuredText's predecessors, with "re": "revised", "reworked", and "reinterpreted", and as in the re.py regular expression module. For a detailed history of reStructuredText and the Docutils project, see An Introduction to reStructuredText.
"RST" and "ReST" (or "reST") are both acceptable. Care should be taken with capitalization, to avoid confusion with "REST", an acronym for "Representational State Transfer".
The abbreviations "reSTX" and "rSTX"/"rstx" should not be used; they overemphasize reStructuredText's predecessor, Zope's StructuredText.
It's ".txt". Some people would like to use ".rest" or ".rst" or ".restx", but why bother? ReStructuredText source files are meant to be readable as plaintext, and most operating systems already associate ".txt" with text files. Using a specialized filename extension would require that users alter their OS settings, which is something that many users will not be willing or able to do.
There is some code under development for Emacs.
Extensions for other editors are welcome.
A uniquely-adorned section title at the beginning of a document is treated specially, as the document title. Similarly, a uniquely-adorned section title immediately after the document title becomes the document subtitle. For example:
    This is the Document Title
    ==========================

    This is the Document Subtitle
    -----------------------------

    Here's an ordinary paragraph.
Counterexample:
    Here's an ordinary paragraph.

    This is *not* a Document Title
    ==============================

    The "ordinary paragraph" above the section title
    prevents it from becoming the document title.
For example, say you want an em-dash (XML character entity &mdash;, Unicode character \u2014) in your document: use a real em-dash. Insert concrete characters (e.g. type a real em-dash) into your input file, using whatever encoding suits your application, and tell Docutils the input encoding. Docutils uses Unicode internally, so the em-dash character is a real em-dash internally.
ReStructuredText has no character entity subsystem; it doesn't know anything about XML charents. To Docutils, "&mdash;" in input text is 7 discrete characters; no interpretation happens. When writing HTML, the "&" is converted to "&amp;", so in the raw output you'd see "&amp;mdash;". There's no difference in interpretation for text inside or outside inline literals or literal blocks -- there's no character entity interpretation in either case.
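The character counts above are easy to check in Python. This is a minimal sketch using only the standard library; `html.escape` stands in for the escaping an HTML writer performs and is an assumption here, not Docutils's own code.

```python
import html

entity_text = "&mdash;"   # an entity reference typed into the source: plain characters
real_dash = "\u2014"      # a real em-dash character

# The entity reference is seven separate characters; nothing interprets it.
print(len(entity_text))
print(list(entity_text))

# Escaping for HTML output turns the "&" into "&amp;".
escaped = html.escape(entity_text)
print(escaped)

# A real em-dash is a single character; its UTF-8 encoding is three bytes.
print(len(real_dash), real_dash.encode("utf-8"))
```

The last line shows why typing the real character and declaring the input encoding is the simpler route: the parser only ever sees one Unicode character.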
If you can't use a Unicode-compatible encoding and must rely on 7-bit ASCII, there is a workaround. Files containing character entity set substitution definitions using the "unicode" directive are available (tarball). A description and instructions for use are here. Thanks to David Priest for the original idea. Incorporating these files into Docutils is on the to-do list.
If you insist on using XML-style charents, you'll have to implement a pre-processing system to convert to UTF-8 or something. That introduces complications though; you can no longer write about charents naturally; instead of writing "&mdash;" you'd have to write "&amp;mdash;".
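Such a pre-processing step could look roughly like the sketch below. It assumes the standard library's `html.unescape` is an acceptable way to resolve entity names; the function name `expand_charents` is hypothetical and not part of Docutils.

```python
import html


def expand_charents(text: str) -> str:
    """Replace character entity references with the real characters.

    Hypothetical helper: run it over the source text before the text
    is handed to Docutils.
    """
    return html.unescape(text)


source = "Docutils &mdash; Python Documentation Utilities"
expanded = expand_charents(source)
print(expanded)

# The complication: references are resolved exactly one level, so a
# literal entity reference now has to be written in escaped form.
literal = expand_charents("&amp;mdash;")
print(literal)
```

After this step the text contains real Unicode characters, so it should be written out in (or passed along as) an encoding that can represent them, such as UTF-8.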
People have tossed the idea around, but little if any actual work has ever been done. There's no reason why reStructuredText should not be round-trippable to/from XML; any technicalities which prevent round-tripping would be considered bugs. Whitespace would not be identical, but paragraphs shouldn't suffer. The tricky parts would be the smaller details, like links and IDs and other bookkeeping.
For HTML, true round-tripping may not be possible. Even adding lots of extra "class" attributes may not be enough. A "simple HTML" to RST filter is possible -- for some definition of "simple HTML" -- but HTML is used as dumb formatting so much that such a filter may not be particularly useful. No general-purpose filter exists. An 80/20 approach should work though: build a tool that does 80% of the work automatically, leaving the other 20% for manual tweaks.
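To make the 80/20 idea concrete, here is a rough sketch of a "simple HTML" to reStructuredText filter built on the standard library's `html.parser`. Everything in it is an assumption made for illustration: the class name `SimpleHtmlToRst`, the handled tag set (h1, h2, p, em, strong), and the choice of adornment characters. It is not an existing Docutils tool.

```python
from html.parser import HTMLParser


class SimpleHtmlToRst(HTMLParser):
    """Convert a tiny HTML subset to reStructuredText (hypothetical sketch)."""

    ADORNMENTS = {"h1": "=", "h2": "-"}        # title underline characters
    INLINE = {"em": "*", "strong": "**"}       # inline markup delimiters

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished paragraphs and titles
        self.buffer = []      # text pieces of the block being collected
        self.unhandled = []   # tags left for manual cleanup (the "20%")

    def _is_block(self, tag):
        return tag == "p" or tag in self.ADORNMENTS

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.buffer.append(self.INLINE[tag])
        elif self._is_block(tag):
            self.buffer = []  # drop stray whitespace between blocks
        else:
            self.unhandled.append(tag)

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.buffer.append(self.INLINE[tag])
        elif self._is_block(tag):
            # Collapse runs of whitespace the way a browser would.
            text = " ".join("".join(self.buffer).split())
            self.buffer = []
            if text:
                if tag in self.ADORNMENTS:
                    text += "\n" + self.ADORNMENTS[tag] * len(text)
                self.blocks.append(text)

    def handle_data(self, data):
        self.buffer.append(data)

    def result(self):
        return "\n\n".join(self.blocks) + "\n"


converter = SimpleHtmlToRst()
converter.feed("<h1>Title</h1>\n<p>Some <em>emphasized</em> text.</p><table></table>")
converter.close()
rst = converter.result()
print(rst)
print("needs manual work:", converter.unhandled)
```

Tags outside the handled subset are only recorded, not converted, and text that sits outside a handled block is dropped; that is the part left for manual tweaks.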
There are several, with various degrees of completeness. With no implied endorsement or recommendation, and in no particular order:
Please let us know of any other reStructuredText Wikis.
The example application for the Web Framework Shootout article is a Wiki using reStructuredText.
With no implied endorsement or recommendation, and in no particular order:
Please let us know of any other reStructuredText Blogs.
Some people like to write lists with indentation, without intending a block quote context, like this:
    paragraph

      * list item 1
      * list item 2
There has been a lot of discussion about this, but there are some issues that would need to be resolved before it could be implemented. There is a summary of the issues and pointers to the discussions in the to-do list.
Short answer: no.
In reStructuredText, it would be impossible to unambiguously mark up and parse lists without blank lines before and after. Deeply nested lists may look ugly with so many blank lines, but it's a price we pay for unambiguous markup. Some other plaintext markup systems do not require blank lines in nested lists, but they have to compromise somehow, either accepting ambiguity or requiring extra complexity. For example, Epytext does not require blank lines around lists, but it does require that lists be indented and that ambiguous cases be escaped.
There is no elegant built-in way, yet. There are several ideas, but no obvious winner. This issue requires a champion to solve the technical and aesthetic issues and implement a generic solution. Here's the to-do list entry.
There are several quick & dirty ways to include equations in documents:
The HTML Writer module, docutils/writers/html4css1.py, is a proof-of-concept reference implementation. While it is a complete implementation, some aspects of the HTML it produces may be incompatible with older browsers or specialized applications (such as web templating). Alternate implementations are welcome.
It produces XHTML compatible with the HTML 4.01 and XHTML 1.0 specifications. A cascading style sheet ("default.css" by default) is required for proper viewing with a modern graphical browser. Correct rendering of the HTML produced depends on the CSS support of the browser.
No specific browser is targeted; all modern graphical browsers should work. Some older browsers, text-only browsers, and browsers without full CSS support are known to produce inferior results. Mozilla (version 1.0 and up) and MS Internet Explorer (version 5.0 and up) are known to give good results. Reports of experiences with other browsers are welcome.
Here's the question in full:
I have this text:
    Heading 1
    =========

    All my life, I wanted to be H1.

    Heading 1.1
    -----------

    But along came H1, and so shouldn't I be H2?
    No!  I'm H1!

    Heading 1.1.1
    *************

    Yeah, imagine me, I'm stuck at H3!  No?!?

When I run it through tools/html.py, I get unexpected results (below). I was expecting H1, H2, then H3; instead, I get H1, H1, H2:
    ...
    <html lang="en">
    <head>
    ...
    <title>Heading 1</title>
    <link rel="stylesheet" href="default.css" type="text/css" />
    </head>
    <body>
    <div class="document" id="heading-1">
    <h1 class="title">Heading 1</h1>                <-- first H1
    <p>All my life, I wanted to be H1.</p>
    <div class="section" id="heading-1-1">
    <h1><a name="heading-1-1">Heading 1.1</a></h1>        <-- H1
    <p>But along came H1, and so now I must be H2.</p>
    <div class="section" id="heading-1-1-1">
    <h2><a name="heading-1-1-1">Heading 1.1.1</a></h2>
    <p>Yeah, imagine me, I'm stuck at H3!</p>
    ...

What gives?
Check the "class" attribute on the H1 tags, and you will see a difference. The first H1 is actually <h1 class="title">; this is the document title, and the default stylesheet renders it centered. There can also be an <h2 class="subtitle"> for the document subtitle.
If there's only one highest-level section title at the beginning of a document, it is treated specially, as the document title. (Similarly, a lone second-highest-level section title may become the document subtitle.) Rather than use a plain H1 for that, we use <h1 class="title"> so that we can use H1 again within the document. Why do we do this? HTML only has H1-H6, so by making H1 do double duty, we effectively reserve these tags to provide 6 levels of heading beyond the single document title.
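The mapping described here can be sketched as a small function. This is an illustration of the rule only, not the code in html4css1.py; the function name `heading_tag` and its parameters are hypothetical, and rejecting depths beyond 6 is an assumption of this sketch.

```python
def heading_tag(role: str, depth: int = 1) -> str:
    """Return the opening HTML tag for a heading.

    role  -- "title", "subtitle", or "section"
    depth -- nesting level of a section, 1 for top-level sections
    """
    if role == "title":
        return '<h1 class="title">'
    if role == "subtitle":
        return '<h2 class="subtitle">'
    if role != "section":
        raise ValueError(f"unknown role: {role!r}")
    if not 1 <= depth <= 6:
        # Assumption of this sketch: HTML only offers h1 through h6.
        raise ValueError("section depth must be between 1 and 6")
    return f"<h{depth}>"


# The three headings from the question above:
for label, role, depth in [
    ("Heading 1", "title", 1),
    ("Heading 1.1", "section", 1),
    ("Heading 1.1.1", "section", 2),
]:
    print(label, "->", heading_tag(role, depth))
```

Because the title carries its own class, top-level sections can start again at H1, which is why the output in the question shows H1, H1, H2.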
HTML is being used for dumb formatting for nothing but final display. A stylesheet is required, and one is provided: tools/stylesheets/default.css. Of course, you're welcome to roll your own. The default stylesheet provides rules to format <h1 class="title"> and <h2 class="subtitle"> differently from ordinary <h1> and <h2>:
    h1.title { text-align: center }
    h2.subtitle { text-align: center }
(Thanks to Mark McEahern for the question and much of the answer.)
The rendering of enumerators (the numbers or letters acting as list markers) is completely governed by the stylesheet, so either the browser can't find the stylesheet (try using the "--embed-stylesheet" option), or the browser can't understand it (try a recent Mozilla or MSIE).
Yes, in conjunction with other projects.
Docstring extraction functionality from within Docutils is still under development. There is most of a source code parsing module in docutils/readers/python/moduleparser.py. We do plan to finish it eventually. Ian Bicking wrote an initial front end for the moduleparser.py module, in sandbox/ianb/extractor/extractor.py. Ian also did some work on the Python Source Reader (docutils.readers.python) component at PyCon DC 2004.
Version 2.0 of Ed Loper's Epydoc supports reStructuredText-format docstrings for HTML output. Docutils 0.3 or newer is required. Development of a Docutils-specific auto-documentation tool will continue. Epydoc works by importing Python modules to be documented, whereas the Docutils-specific tool, described above, will parse modules without importing them (as with HappyDoc, which doesn't support reStructuredText).
The advantages of parsing over importing are security and flexibility; the disadvantage is complexity/difficulty.
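As an illustration of the parsing approach, the sketch below pulls docstrings out of source text with the standard library's `ast` module. It is not the moduleparser.py implementation mentioned above; the function name `extract_docstrings` and the sample source are made up for this example.

```python
import ast

# Sample module source. The "raise" would fire on import, but parsing
# never executes the code, which is the security advantage.
SOURCE = '''
"""Module docstring."""

raise RuntimeError("this line never runs while parsing")

def greet(name):
    """Return a greeting for *name*."""
    return "Hello, " + name

class Greeter:
    """Greets people."""
'''


def extract_docstrings(source: str) -> dict:
    """Map names to docstrings without importing or executing the module."""
    tree = ast.parse(source)
    found = {"<module>": ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found[node.name] = ast.get_docstring(node)
    return found


docs = extract_docstrings(SOURCE)
for name, text in docs.items():
    print(f"{name}: {text}")
```

The trade-off named above still applies: anything that only exists at run time, such as attributes created dynamically, is invisible to a parser and takes extra work to recover.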
For more details, please see "Docstring Extraction Rules" in PEP 258, item 3 ("How").
Not directly, no. It borrows bits from DocBook, HTML, and others. I (David Goodger) have designed several document models over the years, and have my own biases. The Docutils document model is designed for simplicity and extensibility, and has been influenced by the needs of the reStructuredText markup.