SGML (Standard Generalised Markup Language), an ISO standard ([8879]), was created to ease the exchange of electronic documents.
SGML is a meta-language, that is, a language for describing other languages. It defines methods for the logical representation of electronic texts, independently of the systems and hardware used. Its markup notation gives structure to texts and eases their analysis by making explicit what would otherwise be implicit or assumed. All markup conventions are described in a formal grammar called the DTD, since such a language must specify what is allowed, what is mandatory, and how markup is distinguished from the text.
SGML defines the notion of Document Type Definition (DTD). The type of a document is defined by its components and their structure. For instance, a report can be described as a document with a title, an author, an abstract and a sequence of paragraphs. A document without a title, or a document in which the sequence of paragraphs is followed by the abstract, is not a report.
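To make the idea of a document type concrete, here is a toy Python sketch (not an SGML parser) that treats the report type above as a regular expression over the names of a document's child elements. The element names title, author, abstract and para are assumptions chosen for illustration; in a real DTD the same rule would be a content model such as (title, author, abstract, para+).

```python
import re

# Hypothetical content model for the "report" type described above:
# a title, an author, an abstract, then one or more paragraphs.
# It mirrors a DTD content model such as (title, author, abstract, para+).
REPORT_MODEL = re.compile(r"title author abstract( para)+")


def is_report(children: list[str]) -> bool:
    """Return True if the child element names, in order, match the report model."""
    return REPORT_MODEL.fullmatch(" ".join(children)) is not None


print(is_report(["title", "author", "abstract", "para", "para"]))  # True
print(is_report(["author", "abstract", "para"]))  # False: no title
print(is_report(["title", "author", "para", "abstract"]))  # False: abstract after the paragraphs
```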
One of the main goals of SGML is the portability of documents: they must not lose information when moving from one system to another or from one piece of software to another.
SGML is a format for the semantic description of text, and semantic description keeps form and content clearly separate. SGML does not describe how a document will be displayed or printed; it describes its structure. SGML authors use markup that describes the parts of a text (e.g. to mark a word as the name of a person). The printable or displayable form is produced by a compilation process that uses stylesheets, which determine how the document will eventually be displayed or printed.
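The separation between content and presentation can be illustrated with a toy Python model; it is not how real SGML stylesheet languages work. The same content, a hypothetical list of (element, text) pairs, is rendered by two hypothetical stylesheets, which here are just dictionaries of format strings.

```python
# Content only: each pair is (element name, text). Nothing here says how it looks.
DOCUMENT = [("title", "Annual report"), ("para", "Sales grew.")]

# Two hypothetical stylesheets, each mapping an element name to a presentation.
HTML_STYLESHEET = {"title": "<h1>{}</h1>", "para": "<p>{}</p>"}
TEXT_STYLESHEET = {"title": "== {} ==", "para": "{}"}


def render(document, stylesheet) -> str:
    """Apply a stylesheet to every (element, text) pair and join the results."""
    return "\n".join(stylesheet[name].format(text) for name, text in document)


print(render(DOCUMENT, HTML_STYLESHEET))
# <h1>Annual report</h1>
# <p>Sales grew.</p>
print(render(DOCUMENT, TEXT_STYLESHEET))
# == Annual report ==
# Sales grew.
```

Changing the output format only requires another stylesheet; the document itself is untouched.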
Modelling the document content is the first step in building the structure of an SGML document, and this model is part of the DTD. When creating a DTD, it is important to strike a balance between simple construction rules and the complexity of the text. Keep in mind that every DTD is an interpretation of the text, and that the same text can be interpreted by several DTDs.
Textual content is separated from the description of the document structure by a particular syntax. SGML defines elements, which are used to mark up and structure parts of the text. Elements are named and carry meaning; their tags are written between the < and > characters. For example, the opening tag of the article element is written <article>.
SGML elements are usually written as pairs of tags: an opening tag and a closing tag (the name in a closing tag is prefixed with a / character, e.g. </article>). The text between these two tags is "contained" by the element (we will later say that it is in the context of that element). Not all elements are written as pairs, however.
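The pairing rule can be checked with a stack: each opening tag is pushed, and each closing tag must match the most recently opened element. This is a minimal Python sketch under simplifying assumptions: it ignores comments and SGML tag minimization, and the set of unpaired elements (here only xref) is hypothetical.

```python
import re

# An opening tag such as <para> or <xref linkend="a1">, or a closing tag such as </para>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

# Hypothetical set of elements that are written without a closing tag.
UNPAIRED = {"xref"}


def tags_balanced(source: str) -> bool:
    """Return True if every paired opening tag is closed, in nesting order."""
    stack = []
    for match in TAG.finditer(source):
        is_closing, name = match.group(1) == "/", match.group(2)
        if name in UNPAIRED:
            continue
        if not is_closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # closing tag does not match the innermost open element
    return not stack  # any leftover opening tags were never closed


print(tags_balanced("<article><para>Hello</para></article>"))  # True
print(tags_balanced("<article><para>Hello</article></para>"))  # False
```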
The DTD of a document specifies the contexts in which each element is valid. For instance, in AlcoveBook the appendix element is valid only in certain contexts, such as article and book. Moreover, the DTD specifies which elements are mandatory in a given context: for example, a listitem element is mandatory in an itemizedlist context.
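Both kinds of rule (where an element may appear, and what it must contain) can be sketched as two lookup tables walked over an element tree. The rules below are a hypothetical, incomplete excerpt based on the two examples above, and the (name, children) tuple representation is an assumption of this Python sketch, not how a DTD is written.

```python
# Hypothetical, incomplete excerpt of AlcoveBook context rules, for illustration.
# Allowed parents: an element listed here may only appear inside these elements.
ALLOWED_PARENTS = {"appendix": {"article", "book"}}
# Required children: an element listed here must contain at least one of each.
REQUIRED_CHILDREN = {"itemizedlist": {"listitem"}}


def check_context(element, parent=None) -> list[str]:
    """Return a list of context errors for an element tree.

    An element is a tuple (name, children), where children is a list of
    elements; character data is left out to keep the sketch short.
    """
    name, children = element
    errors = []
    allowed = ALLOWED_PARENTS.get(name)
    if allowed is not None and parent not in allowed:
        errors.append(f"{name} not allowed in {parent}")
    present = {child_name for child_name, _ in children}
    for missing in sorted(REQUIRED_CHILDREN.get(name, set()) - present):
        errors.append(f"{name} requires {missing}")
    for child in children:
        errors.extend(check_context(child, name))
    return errors


book = ("book", [("appendix", []), ("itemizedlist", [("listitem", [])])])
print(check_context(book))  # []
print(check_context(("itemizedlist", [("appendix", [])])))
# ['itemizedlist requires listitem', 'appendix not allowed in itemizedlist']
```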
Tip: Since elements are hierarchically organized, we suggest that you use the following convention to make the SGML source more readable:
for each level of nesting, indent the text by two characters;
for non-textual content (i.e. an element that only contains sub-elements), whitespace is not significant, and you should put a line break after the opening tag and before each closing tag;
for elements with textual content, the presence of whitespace is significant[1], and you should usually not add spaces or newlines after an opening tag or before a closing tag.
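The indentation convention in this tip can be captured by a small serializer. In this Python sketch a node is either a string (character data) or a hypothetical (name, children) tuple. Element-only content is laid out one child per line, two spaces deeper, while textual or mixed content is written inline so that no whitespace is added next to the tags.

```python
def serialize(node, depth: int = 0) -> str:
    """Write a node as SGML source following the indentation convention above."""
    if isinstance(node, str):
        return node  # character data is copied as-is
    name, children = node
    indent = "  " * depth
    if children and all(isinstance(child, tuple) for child in children):
        # Element-only content: whitespace is not significant, so each child
        # goes on its own line, indented two more characters.
        inner = "\n".join(indent + "  " + serialize(child, depth + 1) for child in children)
        return f"<{name}>\n{inner}\n{indent}</{name}>"
    # Textual or mixed content: whitespace is significant, so nothing is
    # added after the opening tag or before the closing tag.
    return f"<{name}>" + "".join(serialize(child, depth + 1) for child in children) + f"</{name}>"


doc = ("article", [("title", ["SGML"]), ("para", ["Some ", ("emphasis", ["key"]), " text."])])
print(serialize(doc))
# <article>
#   <title>SGML</title>
#   <para>Some <emphasis>key</emphasis> text.</para>
# </article>
```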
Elements may carry attributes that extend their meaning, for example attributes that identify an element (identifiers) or that specify parameters such as the language. Attributes are named and typed. They are written inside the opening tag of an element, in the form name="value".
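As an illustration of the name="value" form, this Python sketch splits an opening tag into its element name and a dictionary of attributes. It is a simplification: only double-quoted values are handled, whereas SGML also accepts single-quoted and, in some cases, unquoted values. The attribute names id and lang in the example are used for illustration only.

```python
import re

# A name="value" pair inside an opening tag (double-quoted values only).
ATTRIBUTE = re.compile(r'([A-Za-z][A-Za-z0-9.-]*)\s*=\s*"([^"]*)"')


def parse_start_tag(tag: str) -> tuple[str, dict[str, str]]:
    """Split an opening tag such as <para id="intro"> into its name and attributes."""
    match = re.fullmatch(r"<([A-Za-z][A-Za-z0-9]*)([^>]*)>", tag)
    if match is None:
        raise ValueError(f"not an opening tag: {tag!r}")
    name, rest = match.groups()
    return name, dict(ATTRIBUTE.findall(rest))


print(parse_start_tag('<para id="intro" lang="en">'))
# ('para', {'id': 'intro', 'lang': 'en'})
```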
SGML entities are named strings that are interpolated when a document is transformed into a printable or displayable document. There are several types of entities, including "general internal entities" and "general external entities".
A general internal entity refers to a string defined in the document and is interpolated during the transformation (much like a variable in a programming language).
A general external entity refers to the content of another file: during the transformation, the reference is replaced by the content of that file. This makes it possible to split a large document into several files, e.g. one per chapter.
An entity reference starts with a & character and ends with a ; character, as in &name;.
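The interpolation of both entity kinds can be sketched in a few lines of Python. This is a toy model, not an SGML parser: internal entities come from a dictionary of strings, external entities from a dictionary of file paths, and replacement text is expanded again so that entities may refer to other entities. In a real document the declarations would look like <!ENTITY company "Alcôve"> and <!ENTITY chap1 SYSTEM "chap1.sgml">, where the names and file name are hypothetical.

```python
import re
from pathlib import Path

# An entity reference: '&', a name, then ';' (e.g. &chap1;).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand(text: str, internal: dict[str, str], external: dict[str, str]) -> str:
    """Replace every &name; reference in text.

    internal maps a general internal entity name to its replacement string;
    external maps a general external entity name to a file path, whose
    content replaces the reference. Replacement text is expanded again, so
    entities may refer to other entities (cyclic definitions are not detected).
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in internal:
            value = internal[name]
        elif name in external:
            value = Path(external[name]).read_text(encoding="utf-8")
        else:
            raise KeyError(f"undefined entity: {name}")
        return expand(value, internal, external)

    return ENTITY_REF.sub(replace, text)


print(expand("Written at &company;.", {"company": "Alcôve"}, {}))  # Written at Alcôve.
```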
Textual content (known as character data, and declared as #PCDATA, i.e. parsed character data, in SGML DTDs) is only allowed in some contexts. Thus, each element of an SGML document contains either textual content (possibly with inline markup such as emphasis, in which case we speak of mixed content) or only elements.
In AlcoveBook, examples of elements with mixed content include para, title and literal.
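The rule that text may only appear in some contexts can be checked by walking an element tree and reporting elements that directly hold non-whitespace text without being mixed-content elements. In this Python sketch the set of mixed-content elements is the examples above plus emphasis, which is assumed here to have mixed content too; the (name, children) tuple representation is also an assumption.

```python
# Mixed-content elements: the AlcoveBook examples above, plus emphasis (assumed).
MIXED_CONTENT = {"para", "title", "literal", "emphasis"}


def stray_text(node) -> list[str]:
    """Return the names of elements that hold text but do not allow mixed content.

    A node is a tuple (name, children); each child is either a string
    (character data) or another node. Whitespace-only strings are ignored,
    since whitespace between elements is not significant in element content.
    """
    name, children = node
    found = []
    has_text = any(isinstance(child, str) and child.strip() for child in children)
    if has_text and name not in MIXED_CONTENT:
        found.append(name)
    for child in children:
        if isinstance(child, tuple):
            found.extend(stray_text(child))
    return found


good = ("article", ["\n  ", ("title", ["SGML"]), ("para", ["Some ", ("emphasis", ["key"]), " text."])])
print(stray_text(good))  # []
print(stray_text(("itemizedlist", ["loose text", ("listitem", [("para", ["ok"])])])))  # ['itemizedlist']
```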
As explained above, in most places SGML processors are allowed to "compress" multiple whitespace characters into a single space, and line breaks count as whitespace. The exceptions are a few well-defined elements, notably literallayout and programlisting. We nevertheless recommend not putting superfluous spaces or line breaks at the beginning and end of textual contexts[1].
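The whitespace rule can be modelled as a function that collapses every run of whitespace, including newlines, into one space, except inside verbatim elements. This Python sketch is a toy model of the rule as stated above, not the behaviour of any particular SGML processor; the verbatim set contains only the two elements named in the text.

```python
import re

# Elements in which whitespace and line breaks are kept verbatim.
VERBATIM = {"literallayout", "programlisting"}


def normalize(text: str, element: str) -> str:
    """Collapse whitespace runs to one space, unless the element is verbatim."""
    if element in VERBATIM:
        return text
    return re.sub(r"\s+", " ", text)


print(repr(normalize("Some   text\n  here", "para")))  # 'Some text here'
print(repr(normalize("\nSome text\n", "para")))  # ' Some text '
```

The second call shows why the recommendation matters: a newline right after an opening tag does not disappear, it becomes a single space at the start of the content.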
AlcoveBook is a subset of DocBook: a lighter version of DocBook made specifically for Alcôve's internal documentation needs. Being a subset of DocBook, AlcoveBook documents can be processed directly with DocBook tools, while the AlcoveBook tools produce Alcôve-specific documents from them. AlcoveBook is packaged for Debian GNU/Linux as the alcovebook-sgml package, available in the potato-alcove internal apt source.
[1] In most places, the number of spaces, tabs and newline characters is not significant; what matters is whether there is any whitespace at all.