Scriv2DocBook

Scriv2DocBook is a simple workflow for writing technical books in Scrivener, the text composition application from Literature & Latte. The end result is DocBook XML suitable for delivery to a publisher, or rendering into other formats using DocBook tools.

Scriv2DocBook consists of:

The tools are tuned especially for writing for O’Reilly Media, but the result is standard DocBook.

The Scriv2DocBook conversion tools are free software, released under the Apache License, Version 2.0.

Caveats

Scrivener is not an XML editor. I can’t stress this enough. Scriv2DocBook expects you to hand-type XML markup into a Scrivener project. This is error prone, and even the most experienced XML typist will encounter syntax and validation errors. You will not catch these errors until you export, and tracing these errors back to their origin in your Scrivener project will be tedious.

My general recommendation is to not use Scriv2DocBook! Most people will be happier with an XML editing environment, ideally one that validates to a schema as you type. Oxygen, Syntext Serna, or Emacs with nxml mode are all good options, and highly recommended over editing XML in an application not suited to that purpose. Only proceed if you, like me, are overwhelmed by Scrivener’s other considerable charms, and are willing to take on this burden.

I wrote Scriv2DocBook for my own use on a project for O’Reilly Media (Programming Google App Engine, if you’re interested). These tools are designed and tested around O’Reilly’s workflow and house style. Scriv2DocBook is not necessarily suitable for all DocBook projects in all situations.

Like most home-grown book author tools, Scriv2DocBook represents a way of working that suits me, and it may not suit you. I present this publicly because people interested in doing something similar have asked me about it.

Familiarity with DocBook XML and related command-line tools is required.

Installing Scriv2DocBook

To use Scriv2DocBook, you need the following:

Download Scriv2DocBook and unpack the archive. The archive includes the s2d and d2s commands, which are intended to be run from the unpacked directory.

Setting Up Scrivener

By default, Scrivener acts as a “rich text” editor, offering variable-width typefaces and modest formatting capabilities such as bold and italic text. It also enables automatic assistance with typographic characters (“smart quotes”) by default. These features interfere with Scriv2DocBook’s purpose, and it’s best to turn them off. You’ll also want to use a fixed-width typeface, to make inline XML tags easier to read.

  1. Open the Preferences panel (for Mac, select the Scrivener menu, Preferences...). In the Formatting tab, on the tab stop ruler above the sample text, grab the paragraph indent marker (the small rectangle) and drag it all the way to the left, so paragraphs are not indented.

  2. Open the font selector (click the “A” button), then select Courier New, or your favorite fixed-width font family. Select a typeface and font size, as desired.

  3. In the Corrections tab, uncheck “Use smart quotes,” “Replace double hyphens with em-dashes,” and “Replace triple periods with ellipses.” These must be unchecked to prevent accidentally corrupting DocBook XML attributes and computer source code in your text, which must use “straight quotes.” O’Reilly actually wants typographic characters (such as smart quotes) in DocBook XML source for prose, but O’Reilly’s tool chain replaces these automatically, and in the appropriate places, when you submit text.

    Uncheck other options, as desired. Personally, I leave “Check spelling as you type in new projects” on, and disable everything else. The goal is to get Scrivener to behave as much like a plain text editor as possible.

  4. In the Import/Export tab, under OPML, “Import notes into,” confirm that “Main Text (with Synopsis)” is selected. (This is the default.)

See also section 21.6 of the Scrivener manual (select the Help menu, Scrivener Manual), which describes how to do something similar for the purposes of MultiMarkdown (another Scrivener export option).

Writing Your Book

With Scriv2DocBook, you use Scrivener sections to define the structure of your book. When you export, the hierarchy of Scrivener sections is converted into a hierarchy of DocBook chapters and sections.

The content of each section is DocBook XML, with one major exception: paragraphs. To avoid cluttering the display with <para> tags, Scriv2DocBook recognizes blocks of text separated by a blank line as paragraphs, and adds the <para> tags during export.

Everything else that appears in the text is treated as literal DocBook XML. Inline elements appear directly within paragraph text. Block elements can be separated from surrounding paragraphs with blank lines, and otherwise appear as typed in the final XML.
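The paragraph rule can be illustrated with a small Python sketch. This is not the s2d implementation. In particular, the BLOCK_ELEMENTS set and the wrap_paragraphs function are hypothetical names, and I am assuming that a block counts as a block element when it starts with one of a fixed list of tags. The real tool may decide differently.

```python
import re

# Hypothetical list of block-level DocBook elements, for illustration only.
# The real tool's rules for telling blocks from paragraphs may differ.
BLOCK_ELEMENTS = {"table", "note", "programlisting", "itemizedlist",
                  "orderedlist", "figure", "example"}

def wrap_paragraphs(text):
    """Wrap blank-line-separated blocks in <para> tags, leaving blocks
    that start with a known block element untouched.

    Limitation of this sketch: a block element that itself contains a
    blank line (such as a long <programlisting>) would be split.
    """
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.strip():
            continue  # skip empty input
        match = re.match(r"<([A-Za-z][\w.-]*)", block)
        if match and match.group(1) in BLOCK_ELEMENTS:
            out.append(block)  # literal DocBook block, passed through as typed
        else:
            out.append("<para>" + block + "</para>")
    return "\n\n".join(out)
```

Under these assumptions, a paragraph that begins with plain text or an inline element such as <code> gets <para> tags, while a <note> or <table> block is passed through unchanged.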

Chapters and Prefaces

Top-level Scrivener sections in your draft become chapters. Each chapter gets a <chapter> element in the XML output. Inner sections become sub-sections of the chapters, in the hierarchy represented in the Scrivener project; these use <sect#> elements.

DocBook has a special element for preface chapters: <preface>. Scriv2DocBook will use the <preface> element only if the chapter title is Preface. In all other cases, it uses <chapter>.

Scriv2DocBook only knows how to make prefaces, chapters, and sections. It does not support “parts” (the level of hierarchy above chapters).

Titles and Section IDs

Chapters and sections have titles. Scriv2DocBook uses the name of the Scrivener section as the title of the chapter or section.

Chapters and sections also have XML identifiers, which appear in the XML source as id="..." attributes. You use these attributes when creating cross references in your text. Scriv2DocBook generates these IDs automatically based on the section titles. It does so by replacing all characters that aren’t letters or numbers with underscores.

For example, the chapter named “Entities, Keys, and Properties” has an ID of Entities__Keys__and_Properties. Notice how both spaces and punctuation characters become underscores. This section would be referred to in text with an <xref> element, like so:

For more information about keys, see <xref linkend="Entities__Keys__and_Properties"/>.

An ID must be unique across all IDs in the entire document. If Scriv2DocBook finds multiple sections with the same title, it generates IDs by appending a number at the end of the ID, where the number is sequential in the order the sections appear in the document. For example, if each of your chapters ends with a section entitled “Exercises,” the ID of the first section is Exercises, the second is Exercises_1, the third is Exercises_2, and so on.
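If you want to predict the ID for a given title, the rules described here can be sketched in a few lines of Python. This is a rough reconstruction from the description, not the tool's own code. The make_ids name is hypothetical, and I am assuming that "letters or numbers" means ASCII letters and digits.

```python
import re

def make_ids(titles):
    """Return one ID per title, in document order.

    Assumptions of this sketch: only ASCII letters and digits are kept,
    and a title that already looks like a generated ID (for example
    "Exercises 1") is not checked for collisions.
    """
    seen = {}  # base ID -> number of times it has appeared so far
    ids = []
    for title in titles:
        base = re.sub(r"[^A-Za-z0-9]", "_", title)
        count = seen.get(base, 0)
        # First occurrence keeps the bare ID; later ones get _1, _2, ...
        ids.append(base if count == 0 else f"{base}_{count}")
        seen[base] = count + 1
    return ids
```

Calling make_ids(["Entities, Keys, and Properties"]) gives the ID shown in the example above, and three sections titled "Exercises" get Exercises, Exercises_1, and Exercises_2.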

It is an XML validation error if an <xref>’s linkend attribute refers to an element ID that doesn’t exist in the document. Scriv2DocBook reports such errors when it validates the output. These errors can be difficult to troubleshoot. You may wish to examine the generated output to confirm what ID Scriv2DocBook generated for a problematic section.
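When troubleshooting, it can help to list the broken references directly. The following Python sketch is one way to do that with the standard library. It is separate from Scriv2DocBook, the dangling_linkends name is hypothetical, and it assumes you pass it the text of a single well-formed XML file (it does not resolve XInclude, so a reference into another chapter file would be reported as missing).

```python
import xml.etree.ElementTree as ET

def dangling_linkends(xml_text):
    """Return the sorted linkend values of <xref> elements that do not
    match any id attribute in the same document."""
    root = ET.fromstring(xml_text)
    # Collect every id attribute in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    missing = set()
    for xref in root.iter("xref"):
        target = xref.get("linkend")
        if target is not None and target not in ids:
            missing.add(target)
    return sorted(missing)
```

Run against a generated chapter file, an empty list means every <xref> in that file points at an ID defined in the same file.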

Block Elements

As shown above, you add paragraphs to a section of your Scrivener document by typing them directly, omitting the <para> tags and separating each paragraph with a blank line. To add another kind of block element, such as a <table>, separate it from the preceding and following paragraphs with blank lines.

The <code>WHERE</code> clause is equivalent to one or more
filters. It is not like SQL’s <code>WHERE</code> clause,
and does not support arbitrary logical expressions. In particular, it
does not support testing the logical-OR of two conditions.

The value on the righthand side of a condition can be a literal value
that appears inside the query string. Seven of the datastore value
types have string literal representations, as shown in <xref
linkend="gql_value_literals"/>.

<table id="gql_value_literals">
  <title>GQL value literals for datastore types</title>
  <tgroup cols="3">
    <colspec colname="c1"/>
    <colspec colname="c2"/>
    <colspec colname="c3"/>
    <thead>
      <row>
        <entry>Type</entry>
        <entry>Literal syntax</entry>
        ...

Block Elements Containing Paragraphs

Some block elements can (or must) contain paragraphs, such as <note> or <listitem>. In these cases, you must type the <para> and </para> tags, as in any other XML document. Scriv2DocBook only adds <para> tags automatically for top-level paragraphs.

<note>
  <para>Be sure to use the <code>logging</code>
  level name (such as <code>FINEST</code>) and not the App
  Engine level name for values in
  <filename>logging.properties</filename>. App Engine log
  levels only affect how messages are represented in the Admin
  Console.</para>
</note>

Other Features of XML and DocBook

Beyond sections and section-level paragraphs, your Scrivener project is simply DocBook XML. Scriv2DocBook takes what you type into your project, and drops it directly into the XML document. The result must be valid XML, and Scriv2DocBook checks that it is and reports errors.
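Because errors only surface at export time, you may want to test a suspicious snippet on its own before exporting. Here is a minimal Python sketch that checks whether a fragment is well-formed XML. It is not part of Scriv2DocBook, the fragment_error name is hypothetical, and it only checks well-formedness, not validity against the DocBook schema.

```python
import xml.etree.ElementTree as ET

def fragment_error(fragment):
    """Return None if the fragment parses as XML, else the parser's message.

    The fragment is wrapped in a dummy <root> element so that text with
    several top-level elements (or none) can be parsed. Named entities
    other than the five built into XML are reported as errors here,
    because no DTD is loaded.
    """
    try:
        ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError as err:
        return str(err)
    return None
```

For example, pasting the text of one Scrivener section into fragment_error flags an unclosed tag or a bare < character without a full export.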

It’s easy to forget that the text you type into your Scrivener project will be interpreted as XML, especially since your document omits structural tags and <para> tags. In particular, you must remember that characters with special meaning to XML must be typed as XML entities, especially in regions such as code samples. Watch for these three characters:

Character                               Symbol  Entity
left angle bracket (“less than”)        <       &lt;
right angle bracket (“greater than”)    >       &gt;
ampersand                               &       &amp;

Note that you must also use XML entities in section titles if you want them to appear as characters. DocBook markup is typically not allowed in titles as a matter of style.
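If you paste a lot of source code into your project, escaping by hand gets tedious. One option is to run the code through a small script first and paste the result. The sketch below uses Python's standard xml.sax.saxutils.escape function, which replaces exactly these three characters. The escape_for_docbook wrapper is a hypothetical name and is not part of Scriv2DocBook.

```python
from xml.sax.saxutils import escape

def escape_for_docbook(source_code):
    """Replace &, <, and > with their XML entities so the text can be
    pasted inside an element such as <programlisting>."""
    return escape(source_code)

print(escape_for_docbook("if (a < b && b > c)"))
```

This is meant for literal text only. Running it over text that already contains DocBook tags would escape the tags as well.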

Other features, such as elements containing preformatted text (<programlisting>) and XML Includes, are also supported. Simply type them as they should appear in the final XML document into your Scrivener project.

Exporting DocBook XML

Once you have a draft of your project that you want to export to DocBook, you use Scrivener’s OPML export feature and the s2d tool to produce the DocBook XML.

To export the Scriv2DocBook project as DocBook XML:

  1. In Scrivener, select the chapters you wish to export. You can do this in one of two ways:

    • Select each of the individual top-level sections to export, holding down the command key (Mac OS X) or control key (Windows) while clicking to select more than one. Each section you select becomes a chapter in the result.
    • Select the “Draft” section. If the export contains only one section and it is named “Draft,” then each of the sections inside “Draft” becomes a chapter in the result.

  2. Select the File menu, Export..., OPML File.... In the dialog, select a location and enter a filename. Select Titles and Text. Make sure Export entire binder is not checked. Click Export.

  3. At a command prompt (e.g. Mac OS X Terminal), run the s2d command with the exported OPML file, the path to the output directory, and the path to your DocBook schema files using the --schema-dir argument. This might look something like this:

    ./scriv2docbook/s2d mybook.opml mybook --schema-dir=docbook-xml-4

    Scriv2DocBook converts the project to DocBook XML files in the given directory (e.g. mybook/). It also validates the XML. If no XML errors are reported, then the XML is valid.

If you are exporting your project for the first time to an empty directory, be sure to edit the book.xml file to update the <title> tag with the actual title of your book.

If you are writing for O’Reilly, you can export your project directly into the SVN client directory containing the template files that O’Reilly provided you for your book. s2d preserves the elements of book.xml not related to the inclusion of chapters.

How the XML is Organized

The exported XML project consists of multiple files: a book.xml file that represents the structure of the book, and one file for each chapter. Chapter files are named with both a number and the chapter ID, to make it easier to trace XML validation errors back to the Scrivener project.

Here are the files for my book:

book.xml
ch00_Preface.xml
ch01_Introducing_Google_App_Engine.xml
ch02_Creating_an_Application.xml
ch03_Handling_Web_Requests.xml
ch04_Datastore_Entities.xml
ch05_Datastore_Queries.xml
ch06_Datastore_Transactions.xml
ch07_Data_Modeling_with_Python.xml
ch08_The_Java_Persistence_API.xml
ch09_The_Memory_Cache.xml
ch10_Fetching_URLs_and_Web_Resources.xml
ch11_Sending_and_Receiving_Mail_and_Instant_Messages.xml
ch12_Bulk_Data_Operations_and_Remote_Access.xml
ch13_Task_Queues_and_Scheduled_Tasks.xml
ch14_The_Django_Web_Application_Framework.xml
ch15_Deploying_and_Managing_Applications.xml
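The naming pattern in this listing can be expressed as a short Python sketch. It is inferred from the file names above rather than taken from the s2d source, and the chapter_filename name is hypothetical. The sketch takes the sequence number as an argument instead of guessing how s2d assigns it.

```python
import re

def chapter_filename(number, title):
    """Build a chapter file name from a sequence number and a title,
    following the pattern seen in the listing: chNN_<ID>.xml.

    The ID part assumes the rule described under Titles and Section IDs:
    every character that is not an ASCII letter or digit becomes an
    underscore.
    """
    chapter_id = re.sub(r"[^A-Za-z0-9]", "_", title)
    return f"ch{number:02d}_{chapter_id}.xml"
```

For instance, chapter_filename(0, "Preface") reproduces the first chapter file name in the listing.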

The book.xml file contains the DocBook root element, the book <title> element, and XInclude directives that refer to the files for each chapter. For example:

<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

<book id="I_book_d1e1">
  <title>Programming Google App Engine</title>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="bookinfo.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dedication.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ch00_Preface.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ch01_Introducing_Google_App_Engine.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ch02_Creating_an_Application.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ch03_Handling_Web_Requests.xml"/>
  <!-- ... -->
  <index/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="colo.xml"/>
</book>

If you export into a directory that already contains DocBook XML in this structure, s2d attempts to preserve the contents of book.xml, and merely replaces the lines related to the inclusion of chapters. Any chapter file it finds that has the same sequence number and ID as a chapter being exported will be overwritten with the new text. All other files are preserved.
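To make the idea of "replacing the lines related to the inclusion of chapters" concrete, here is a rough line-based Python sketch. It is my own illustration and not how s2d is actually written. The replace_chapter_includes name and the regular expression are assumptions, and the sketch expects each xi:include element to sit on its own line, as in the example book.xml.

```python
import re

# Matches an xi:include line whose href looks like a chapter file (chNN_*.xml).
CHAPTER_INCLUDE = re.compile(r'<xi:include\b[^>]*href="ch\d\d_[^"]*\.xml"')

def replace_chapter_includes(book_xml, chapter_files):
    """Return book_xml with its chapter include lines replaced by one
    include per name in chapter_files. All other lines are kept as-is.

    Limitation of this sketch: if book_xml has no chapter includes yet,
    nothing is inserted.
    """
    new_lines = [
        f'  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="{name}"/>'
        for name in chapter_files
    ]
    out = []
    inserted = False
    for line in book_xml.splitlines():
        if CHAPTER_INCLUDE.search(line):
            if not inserted:
                out.extend(new_lines)  # new includes go where the old ones started
                inserted = True
            continue  # drop the old chapter include line
        out.append(line)
    return "\n".join(out)
```

Lines such as the bookinfo.xml include and the <index/> element do not match the chapter pattern, so they pass through untouched.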

If s2d finds old chapter files that use this naming convention but are no longer used by the project, it will not delete them. Instead, it will display the commands you can run to delete the files. If your output directory is an SVN client, it will display the SVN commands needed to do this.

Importing a DocBook Project to Scrivener

You can use the d2s tool to import a DocBook project into Scrivener.

Note: The current version of d2s doesn’t use a real XML parser, so results may vary. It supports the layout generated by s2d (a book.xml file with XInclude directives for each of the chapters), which is also the layout of the O’Reilly project template. It does not support XInclude elsewhere in the document. The tool does support single-file DocBook XML data. It supports <sect#> tags, but not nested <section> tags.

To import a (conforming) DocBook project:

  1. If necessary, create a new Scrivener project, making sure that it uses the configuration you set above (monospace fonts, etc.) for new topics.

  2. At a command prompt (e.g. Mac OS X Terminal), run the d2s command on your DocBook data to produce an OPML file. This might look something like this:

    ./scriv2docbook/d2s mybook/book.xml mybook.opml

  3. Locate the OPML file in your computer’s file browser (e.g. Mac OS X Finder) and drag it to the Binder of your Scrivener project. Before letting go, align the insertion indicator to where you want the new Scrivener sections to be created. In a fresh project, this would be one level inside Draft.

  4. If prompted, confirm the Import Files dialog by clicking the “Import” button. Scrivener creates the new sections.

  5. Save the project.

License

Scriv2DocBook is copyright © 2011 Dan Sanderson.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.