BRIEF DTD TUTORIAL

This is a very brief introduction to DTDs that explains the basic notation.

An XML DTD, or Document Type Definition, defines the formal grammar of an XML-based markup language. Basically, a DTD contains a list of the elements that can occur in the markup, a list of the attributes of each element, the possible attribute values or value types (it may declare default attribute values too) and a content model that specifies the allowed nesting of elements. This information can be used in several ways.

1. One can use a DTD to validate a document, i.e., to check whether the document follows the formal rules defined in the DTD. In this way one can detect errors (misspelled element names, wrong attribute names or values, wrongly nested elements etc.) that would otherwise be difficult to notice.

2. One can use a DTD just to provide an accurate description of a markup language. Here much depends on the markup language itself, as not all XML applications can be accurately described using an XML DTD.

3. One can use a DTD to define character entities, specify default attributes and bind elements to XML namespaces.

ELEMENT TYPE DECLARATION

Elements used in a markup language are declared as follows

    <!ELEMENT ElementName FormalContentModel>

where ElementName is the name of an element like h1, par, table, ul etc. (note that each element must be declared only once; multiple element type declarations with the same element name are not allowed) and FormalContentModel is an expression that specifies its content model. In an XML DTD the content model may specify which elements can be children of the given element (and in what order they may appear) and whether the element may contain character data. There are several possible content models. They are described below.

1. EMPTY

This is the simplest content model. It says that the element is empty and must not contain any character data or any nested elements. For example, the XHTML 'br' element, which is used for forced line breaks,
is an empty element. In a DTD it is declared as follows:

    <!ELEMENT br EMPTY>

Usually empty elements are represented by empty-element tags like <br/>, but <br></br> is also valid markup.

2. ANY

A simple content model. It says that the element may contain anything, including character data and any other elements (that are declared in the DTD). This content model is rarely used as it is too general.

3. Mixed

The mixed content model should be used when an element may contain both character data and other elements. The content model looks like

    (#PCDATA | ChildName1 | ChildName2 | ... | ChildNameN)*

where the ChildNames are the names of the possible child elements. If no child elements are allowed, this content model reduces to

    (#PCDATA)

Example: Suppose that a 'group' element may contain text or 'subgroup' elements, and that a 'subgroup' element may contain only text, with no tags inside, like

    <group>My Group
      <subgroup>First Subgroup</subgroup>
      <subgroup>Second Subgroup</subgroup>
    </group>

In a DTD these elements can be described as:

    <!ELEMENT group (#PCDATA | subgroup)*>
    <!ELEMENT subgroup (#PCDATA)>

Note that in an XML DTD the 'Mixed' content model does not define the order of child elements, does not specify how many times a child element may be repeated in the markup, and cannot be combined with other content models. For example, the following models are illegal:

    (#PCDATA | em | strong | strong)*
    (#PCDATA | em | strong)+
    (#PCDATA | (em | strong))*

4. children

Unlike the 'Mixed' content model, this one applies to elements that may contain only child elements and must not contain any child text nodes. It may specify a list of child elements; in addition it may impose restrictions on their order or specify how many times a certain element may occur in the content. This is achieved by combining sequences and choices.

A sequence is an ordered list of child elements that looks like

    (FirstChild, SecondChild, ThirdChild)

A choice is a list of alternatives, exactly one of which may appear, like

    (Child | AnotherChild | YetAnotherChild)

Sequences and choices can be combined to describe more complex content models (note that in the Mixed content model you can't do this). The signs '?', '+' and '*' can be used to specify how many times a content model may be repeated ('?'
means zero or one time, '+' means one or more times, '*' means any number of times, including zero); they may appear after a sequence or choice and after any element name inside a sequence or choice.

Examples (using XHTML element names):

    <!ELEMENT html (head, body)>
    <!ELEMENT ul (li)+>
    <!ELEMENT dl (dt | dd)+>
    <!ELEMENT table (caption?, (col* | colgroup*), thead?, tfoot?, (tbody+ | tr+))>

DECLARING ATTRIBUTES

If an element has attributes, they must be declared in the DTD as follows

    <!ATTLIST ElementName AttributeName AttributeType DefaultDeclaration>

AttributeName is the full (qualified) name of the attribute, like href, xml:lang, title. If an element has more than one attribute list declaration, these lists are simply merged, and if a certain attribute is declared several times, the first declaration takes precedence and all subsequent ones are ignored.

AttributeType is either the string type (CDATA), which means the attribute value may be arbitrary, a tokenized type like ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, or an enumerated type (a list of all possible attribute values).

DefaultDeclaration may specify whether the attribute is required and, if the attribute is not required, it may specify a default attribute value.

STRING TYPE

The string type (CDATA) imposes no restriction on the attribute value; it may carry arbitrary character data that does not break the well-formedness of the document.

TOKENIZED TYPES

The most important tokenized types are the following (the element and attribute names in the examples are illustrative):

NMTOKEN. Attributes of this type must have values that consist of letters (not necessarily Latin), digits or the characters '_', '-', '.', ':'. Example:

    <!ATTLIST item code NMTOKEN #IMPLIED>

NMTOKENS. The same as NMTOKEN, or a space-separated list of NMTOKENs. Example:

    <!ATTLIST item class NMTOKENS #IMPLIED>

ID. The same as NMTOKEN, but the first character must be a letter, '_' or ':'. In addition, values of ID type attributes must be unique (two ID type attributes that appear in a single document are not allowed to carry the same value). Example:

    <!ATTLIST item id ID #IMPLIED>

IDREF. Must contain a reference to a unique ID (the value of any ID type attribute). Example:

    <!ATTLIST link target IDREF #REQUIRED>

IDREFS. Must contain a reference to a unique ID or a space-separated list of such references. Example:

    <!ATTLIST link targets IDREFS #IMPLIED>

ENUMERATED TYPE

Attributes of this type may have only a limited number of predefined values. Example:

    <!ATTLIST par align (left | right | center | justify) #IMPLIED>

Note that each value must be of the NMTOKEN type.
For example, the following declaration (with illustrative names) is not allowed, because the forward slash breaks well-formedness:

    <!ATTLIST style type (text/css | text/xsl) #IMPLIED>

DEFAULT ATTRIBUTE DECLARATION

There are several types of default declarations. The most important are:

#IMPLIED
The keyword #IMPLIED specifies that the attribute can be omitted.

#REQUIRED
The keyword #REQUIRED means that the attribute value must be explicitly specified in the markup.

Default
The same as #IMPLIED, but if the attribute is omitted the XML parser must attach the attribute with the default value to the element and pass it to the application. Example (names are illustrative):

    <!ATTLIST par align (left | right | center) "left">

    <par>Text</par>

will be treated as

    <par align="left">Text</par>

#FIXED
The same as a default, but in this case the default value is the only possible attribute value. Example:

    <!ATTLIST html xmlns CDATA #FIXED "http://www.w3.org/1999/xhtml">

ATTRIBUTE VALUE NORMALIZATION

Note that the values of all attributes are normalized by the XML parser. Basically, this means that all tab, carriage return and line feed characters are replaced with spaces, and, if the attribute is not of the CDATA type (i.e., it is of a tokenized or enumerated type), runs of multiple spaces in the attribute value are replaced by a single space, while leading and trailing spaces are stripped.

CHARACTER ENTITIES

Custom character entities can be defined as follows

    <!ENTITY EntityName "EntityValue">

They can then be referred to in the XML document as &EntityName;. They can be used to define convenient notations for frequently used constructions or for characters that are difficult to type. If a character entity is declared several times, the first declaration takes precedence over later ones. Example:

    <!ENTITY nbsp "&#160;">
    <!ENTITY copy "&#169;">

PARAMETER ENTITIES

Parameter entities can be used to introduce a convenient notation for frequently used constructions. Parameter entities are used within the DTD (not in the XML markup). They are declared as follows

    <!ENTITY % EntityName "EntityValue">

and can then be referred to in the DTD as %EntityName;. For example:

    <!ENTITY % inline "em | strong">
    <!ELEMENT par (#PCDATA | %inline;)*>

is equivalent to

    <!ELEMENT par (#PCDATA | em | strong)*>

If a parameter entity is declared several times, the first declaration takes precedence over later ones.

Parameter entities may be stored in external DTDs. In this case they can be declared as follows:

    <!ENTITY % EntityName SYSTEM "URI">

OR

    <!ENTITY % EntityName PUBLIC "PublicIdentifier" "URI">

Note that non-validating XML parsers are not required to read external DTDs.
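The element, attribute and entity declarations described above can be combined in a single external DTD file. The following is only a minimal sketch: the file name memo.dtd and all element, attribute and entity names in it are hypothetical and chosen for illustration. It is written as an external DTD because parameter entity references inside other declarations, as used here, are not allowed in the internal subset.

```xml
<!-- memo.dtd: hypothetical external DTD, for illustration only -->

<!-- Parameter entities: reusable fragments, usable only inside the DTD -->
<!ENTITY % inline "em | strong">
<!ENTITY % coreattrs
  "id        ID        #IMPLIED
   xml:lang  NMTOKEN   #IMPLIED">

<!-- children content model: one title followed by one or more paragraphs -->
<!ELEMENT memo (title, para+)>
<!ATTLIST memo
  %coreattrs;
  status (draft | final) "draft">

<!-- Mixed content models: text only, or text mixed with inline elements -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA | %inline; | br)*>
<!ATTLIST para %coreattrs;>
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>

<!-- EMPTY content model -->
<!ELEMENT br EMPTY>

<!-- Character entity that documents can reference as &nbsp; -->
<!ENTITY nbsp "&#160;">
```

Keeping the list of inline elements and the shared attributes in parameter entities means that adding a new inline element or common attribute later requires changing only one declaration.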
CONDITIONAL SECTIONS

Conditional sections are used to include certain sections of a DTD or to ignore them. They look like

    <![INCLUDE[ declarations that are included ]]>
    <![IGNORE[ declarations that are ignored ]]>

Usually they are combined with parameter entities as follows

    <!ENTITY % EntityName "INCLUDE">
    <![%EntityName;[ declarations ]]>

In this way one can reconfigure a DTD by redefining certain parameter entities.

PROCESSING INSTRUCTIONS

Processing instructions look like

    <?target instruction?>

They are used to pass certain information to applications. For example, the following instruction, included in the XHTML 1.1 DTD, passes the title of the DTD to the W3C markup validator:

    <?doc type="doctype" role="title" { XHTML 1.1 } ?>

INTERNAL AND EXTERNAL DTDS

A Document Type Definition can be internal, external or a combination of the two. An internal DTD is included in the document's prolog before the root element. It looks like

    <!DOCTYPE RootElement [
      declarations
    ]>

An external DTD is stored in a separate dtd file, served as application/xml-dtd, and can be linked to the document like

    <!DOCTYPE RootElement SYSTEM "URI">

OR

    <!DOCTYPE RootElement PUBLIC "PublicIdentifier" "URI">

Internal and external DTDs can be combined:

    <!DOCTYPE RootElement SYSTEM "URI" [
      declarations
    ]>

OR

    <!DOCTYPE RootElement PUBLIC "PublicIdentifier" "URI" [
      declarations
    ]>

Note that non-validating XML parsers are NOT required to read external DTDs; therefore information that may influence the rendering of an XML document should be stored in the internal DTD subset (basically this applies to definitions of character entities and default attribute values).

Note that any attribute, parameter entity and character entity declarations specified in the internal DTD (or in external entities that are included in the internal subset) take precedence over those specified in the external DTD.
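As a minimal sketch of the combined form, the document below links a hypothetical external DTD file, memo.dtd (assumed to declare the memo, title and para elements), and keeps a character entity and a default attribute value in the internal subset. The element names, the entity name 'company' and its value are illustrative assumptions, not part of any real DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical document; "memo" and "memo.dtd" are illustrative names -->
<!DOCTYPE memo SYSTEM "memo.dtd" [
  <!-- Declared in the internal subset, so even a parser that skips
       memo.dtd can expand the entity and supply the default attribute -->
  <!ENTITY company "Example Corp.">
  <!ATTLIST memo status (draft | final) "draft">
]>
<memo>
  <title>Quarterly report</title>
  <para>Prepared by &company;</para>
</memo>
```

Because the internal subset is read first, its declarations of 'company' and of the 'status' attribute would also take precedence over any declarations of the same names in memo.dtd.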