9 Text

Contents

  1. White space
  2. Structured text
    1. Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM
    2. Quotations: The BLOCKQUOTE and Q elements
    3. Subscripts and superscripts: the SUB and SUP elements
  3. Lines and Paragraphs
    1. Paragraphs: the P element
    2. Controlling line breaks
    3. Hyphenation
    4. Preformatted text: The PRE element
    5. Visual rendering of paragraphs
  4. Marking document changes: The INS and DEL elements

The following sections discuss issues surrounding the structuring of text. Elements that present text (alignment elements, font elements, style sheets, etc.) are discussed elsewhere in the specification. For information about characters, please consult the section on the document character set.

9.1 White space

The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters: ASCII space (&#x0020;), ASCII tab (&#x0009;), ASCII form feed (&#x000C;), and zero-width space (&#x200B;).

Line breaks are also white space characters. Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML, nor does this specification include them in the more general category of white space characters.

This specification does not indicate the behavior, rendering or otherwise, of space characters other than those explicitly identified here as white space characters. For this reason, authors should use appropriate elements and styles to achieve visual formatting effects that involve white space, rather than space characters.

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.

This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;), while in Thai it is a zero-width word separator (&#x200B;). In Japanese and Chinese, inter-word space is not typically rendered at all.

Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2616], section 14.12), user agent settings, etc.).
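
As an illustration only, the collapsing behavior can be sketched in Python. The sketch assumes the white space set is ASCII space, tab, form feed and zero-width space, plus carriage return and line feed, and it assumes the Latin-script convention of one ASCII space between words. The name collapse_white_space is hypothetical; real user agents also take script and element context into account.

```python
import re

# Assumed white space set: ASCII space, tab, form feed, zero-width space,
# plus carriage return and line feed (line breaks also count as white space).
WHITE_SPACE_RUN = re.compile("[ \t\f\u200b\r\n]+")


def collapse_white_space(text):
    """Replace each run of white space with one ASCII space.

    This models the Latin-script convention only; it does not apply
    inside PRE, where white space is significant.
    """
    return WHITE_SPACE_RUN.sub(" ", text)


print(collapse_white_space("We   offer\r\n\tfree  support"))  # We offer free support
```

An explicit character class is used instead of the regular expression shorthand for white space because that shorthand matches additional Unicode characters that are not in the set assumed here.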

The PRE element is used for preformatted text, where white space is significant.

In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write:

  <P>We offer free <A>technical support</A> for subscribers.</P>

and not:

  <P>We offer free<A> technical support </A>for subscribers.</P>
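
An authoring tool could apply this advice mechanically. The following Python sketch is one possible approach, not a prescribed algorithm: it moves white space that sits just inside an inline element to just outside it. The function name hoist_space and the list of inline element names are assumptions made for the example, and the regular expressions do not attempt to handle PRE content, comments or malformed markup.

```python
import re

# Assumed subset of inline element names, for illustration only.
INLINE = "A|EM|STRONG|CITE|Q|ABBR|ACRONYM|CODE|SAMP|KBD|VAR|DFN|SUB|SUP"

# White space immediately after a start tag of an inline element.
AFTER_START = re.compile(r"(<(?:%s)\b[^>]*>)([ \t\r\n]+)" % INLINE, re.IGNORECASE)
# White space immediately before an end tag of an inline element.
BEFORE_END = re.compile(r"([ \t\r\n]+)(</(?:%s)>)" % INLINE, re.IGNORECASE)


def hoist_space(markup):
    """Move white space from just inside an inline element to just outside it."""
    markup = AFTER_START.sub(r"\2\1", markup)
    return BEFORE_END.sub(r"\2\1", markup)


bad = "<P>We offer free<A> technical support </A>for subscribers.</P>"
print(hoist_space(bad))
```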

9.2 Structured text

9.2.1 Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM

<!ENTITY % phrase "EM | STRONG | DFN | CODE |
                   SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >
<!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*>
<!ATTLIST (%fontstyle;|%phrase;)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Phrase elements add structural information to text fragments. The usual meanings of phrase elements are as follows:

EM:
Indicates emphasis.
STRONG:
Indicates stronger emphasis.
CITE:
Contains a citation or a reference to other sources.
DFN:
Indicates that this is the defining instance of the enclosed term.
CODE:
Designates a fragment of computer code.
SAMP:
Designates sample output from programs, scripts, etc.
KBD:
Indicates text to be entered by the user.
VAR:
Indicates an instance of a variable or program argument.
ABBR:
Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., etc.).
ACRONYM:
Indicates an acronym (e.g., WAC, radar, etc.).

EM and STRONG are used to indicate emphasis. The other phrase elements have particular significance in technical documents. These examples illustrate some of the phrase elements:

As <CITE>Harry S. Truman</CITE> said,
<Q lang="en-us">The buck stops here.</Q>

More information can be found in <CITE>[ISO-0000]</CITE>.

Please refer to the following reference number in future correspondence: <STRONG>1-234-55</STRONG>

The presentation of phrase elements depends on the user agent. Generally, visual user agents present EM text in italics and STRONG text in bold font. Speech synthesizer user agents may change the synthesis parameters, such as volume, pitch and rate accordingly.

The ABBR and ACRONYM elements allow authors to clearly indicate occurrences of abbreviations and acronyms. Western languages make extensive use of acronyms such as "GmbH", "NATO", and "F.B.I.", as well as abbreviations like "Mr.", "Inc.", "et al.", "etc.". Both Chinese and Japanese use analogous abbreviation mechanisms, wherein a long name is referred to subsequently with a subset of the Han characters from the original occurrence. Marking up these constructs provides useful information to user agents and tools such as spell checkers, speech synthesizers, translation systems and search-engine indexers.

The content of the ABBR and ACRONYM elements specifies the abbreviated expression itself, as it would normally appear in running text. The title attribute of these elements may be used to provide the full or expanded form of the expression.

Here are some sample uses of ABBR:

  <P>
  <ABBR title="World Wide Web">WWW</ABBR>
  <ABBR lang="fr" 
        title="Soci&eacute;t&eacute; Nationale des Chemins de Fer">
     SNCF
  </ABBR>
  <ABBR lang="es" title="Do&ntilde;a">Do&ntilde;a</ABBR>
  <ABBR title="Abbreviation">abbr.</ABBR>
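
To illustrate how a tool such as an indexer might use this markup, the following Python sketch collects each abbreviated form together with the expansion given in its title attribute. It relies on the standard library's html.parser module; the class name AbbrCollector and the helper collect_abbr are hypothetical names introduced for this example, and nested ABBR or ACRONYM elements are not handled.

```python
from html.parser import HTMLParser


class AbbrCollector(HTMLParser):
    """Collect {abbreviated form: expansion} from ABBR and ACRONYM elements."""

    def __init__(self):
        super().__init__()
        self.expansions = {}
        self._title = None   # title of the element being read, if any
        self._text = []      # text chunks seen inside that element

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in ("abbr", "acronym"):
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("abbr", "acronym") and self._title is not None:
            self.expansions["".join(self._text).strip()] = self._title
            self._title = None


def collect_abbr(markup):
    parser = AbbrCollector()
    parser.feed(markup)
    parser.close()
    return parser.expansions


print(collect_abbr('<P><ABBR title="World Wide Web">WWW</ABBR>'))
```

Elements without a title attribute are skipped, since they carry no expansion to record.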

Note that abbreviations and acronyms often have idiosyncratic pronunciations. For example, while "IRS" and "BBC" are typically pronounced letter by letter, "NATO" and "UNESCO" are pronounced phonetically. Still other abbreviated forms (e.g., "URI" and "SQL") are spelled out by some people and pronounced as words by other people. When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form.

9.2.2 Quotations: The BLOCKQUOTE and Q elements

<!ELEMENT BLOCKQUOTE - - (%block;|SCRIPT)+ -- long quotation -->
<!ATTLIST BLOCKQUOTE
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or message --
  >
<!ELEMENT Q - - (%inline;)*            -- short inline quotation -->
<!ATTLIST Q
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or message --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.

Attributes defined elsewhere

These two elements designate quoted text. BLOCKQUOTE is for long quotations (block-level content) and Q is intended for short quotations (inline content) that don't require paragraph breaks.

This example formats an excerpt from "The Two Towers", by J.R.R. Tolkien, as a blockquote.

<BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">
<P>They went in single file, running like hounds on a strong scent,
and an eager light was in their eyes. Nearly due west the broad
swath of the marching Orcs tramped its ugly slot; the sweet grass
of Rohan had been bruised and blackened as they passed.</P>
</BLOCKQUOTE>

Rendering quotations 

Visual user agents generally render BLOCKQUOTE as an indented block.

Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. Authors should not put quotation marks at the beginning and end of the content of a Q element.

User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user-agents.

The following example illustrates nested quotations with the Q element.

John said, <Q lang="en-us">I saw Lucy at lunch, she told me
<Q lang="en-us">Mary wants you
to get some ice cream on your way home.</Q> I think I will get
some at Ben and Jerry's, on Gloucester Road.</Q>

Since the language of both quotations is American English, user agents should render them appropriately, for example with single quote marks around the inner quotation and double quote marks around the outer quotation:

  John said, "I saw Lucy at lunch, she told me 'Mary wants you
  to get some ice cream on your way home.' I think I will get some
  at Ben and Jerry's, on Gloucester Road."
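
The alternation of quotation marks by nesting depth can be sketched as follows. This is only an illustration in Python under stated assumptions: it hard-codes the American English convention (double marks for the outer quotation, single marks for the next level), ignores the lang attribute, and recognizes Q tags with a regular expression instead of a full parser. The name render_q is hypothetical.

```python
import re

# Assumed American English convention: double marks outermost,
# single marks one level down, alternating for deeper levels.
EN_US_QUOTES = [('"', '"'), ("'", "'")]

# Matches a Q start tag (with any attributes) or a Q end tag.
Q_TAG = re.compile(r"<(/?)Q\b[^>]*>", re.IGNORECASE)


def render_q(markup):
    """Replace Q start and end tags with depth-dependent quotation marks."""
    depth = 0
    out = []
    pos = 0
    for match in Q_TAG.finditer(markup):
        out.append(markup[pos:match.start()])
        if match.group(1):            # end tag: close the current level
            depth -= 1
            out.append(EN_US_QUOTES[depth % 2][1])
        else:                         # start tag: open a new level
            out.append(EN_US_QUOTES[depth % 2][0])
            depth += 1
        pos = match.end()
    out.append(markup[pos:])
    return "".join(out)


print(render_q('John said, <Q lang="en-us">she told me '
               '<Q lang="en-us">Mary wants ice cream.</Q> I will get some.</Q>'))
```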

Note. We recommend that style sheet implementations provide a mechanism for inserting quotation marks before and after a quotation delimited by BLOCKQUOTE in a manner appropriate to the current language context and the degree of nesting of quotations.

However, as some authors have used BLOCKQUOTE merely as a mechanism to indent text, in order to preserve the intention of the authors, user agents should not insert quotation marks in the default style.

The usage of BLOCKQUOTE to indent text is deprecated in favor of style sheets.

9.2.3 Subscripts and superscripts: the SUB and SUP elements

<!ELEMENT (SUB|SUP) - - (%inline;)*    -- subscript, superscript -->
<!ATTLIST (SUB|SUP)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Many scripts (e.g., French) require superscripts or subscripts for proper rendering. The SUB and SUP elements should be used to mark up text in these cases.

      H<sub>2</sub>O
      E = mc<sup>2</sup>
      <SPAN lang="fr">M<sup>lle</sup> Dupont</SPAN>

9.3 Lines and Paragraphs

Authors traditionally divide their thoughts and arguments into sequences of paragraphs. The organization of information into paragraphs is not affected by how the paragraphs are presented: paragraphs that are double-justified contain the same thoughts as those that are left-justified.

The HTML markup for defining a paragraph is straightforward: the P element defines a paragraph.

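The visual presentation of paragraphs is not so simple. A number of issues, both stylistic and technical, must be addressed: treatment of white space; line breaking and word wrapping; justification; hyphenation; written language conventions and text directionality; and formatting of paragraphs with respect to surrounding content.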

We address these questions below. Paragraph alignment and floating objects are discussed later in this document.

9.3.1 Paragraphs: the P element

<!ELEMENT P - O (%inline;)*            -- paragraph -->
<!ATTLIST P
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: optional

Attributes defined elsewhere

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

We discourage authors from using empty P elements. User agents should ignore empty P elements.

9.3.2 Controlling line breaks

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
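
A minimal Python sketch of this definition follows. It splits text into lines so that a carriage return/line feed pair counts as a single line break. The name split_lines is hypothetical. Python's built-in str.splitlines is deliberately not used, because it also splits on characters such as form feed and U+2028 that are not line breaks under this definition.

```python
import re

# The CR/LF pair is listed first so that it is consumed as one line break
# and not as a carriage return followed by a separate line feed.
LINE_BREAK = re.compile("\r\n|\r|\n")


def split_lines(text):
    """Split text at each line break: CR, LF, or a CR/LF pair."""
    return LINE_BREAK.split(text)


print(split_lines("one\r\ntwo\rthree\nfour"))  # ['one', 'two', 'three', 'four']
```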

For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.

Forcing a line break: the BR element 

<!ELEMENT BR - O EMPTY                 -- forced line break -->
<!ATTLIST BR
  %coreattrs;                          -- id, class, style, title --
  >

Start tag: required, End tag: forbidden

Attributes defined elsewhere

The BR element forcibly breaks (ends) the current line of text.

For visual user agents, the clear attribute can be used to determine whether markup following the BR element flows around images and other objects floated to the left or right margin, or whether it starts after the bottom of such objects. Further details are given in the section on alignment and floating objects. Authors are advised to use style sheets to control text flow around floating images and other objects.

With respect to bidirectional formatting, the BR element should behave the same way the [ISO10646] LINE SEPARATOR character behaves in the bidirectional algorithm.

Prohibiting a line break 

Sometimes authors may want to prevent a line break from occurring between two words. The &nbsp; entity (&#160; or &#xA0;) acts as a space where user agents should not cause a line break.

9.3.3 Hyphenation

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).
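
The soft hyphen semantics can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it treats a single word in isolation, measures width in characters, and the names search_key and break_word are hypothetical. A real user agent would fold this into its line-breaking algorithm.

```python
SOFT_HYPHEN = "\u00ad"  # the character referenced by &shy;


def search_key(text):
    """For searching and sorting, soft hyphens are ignored."""
    return text.replace(SOFT_HYPHEN, "")


def break_word(word, room):
    """Lay out one word given `room` characters left on the current line.

    Returns one line if the word fits (no hyphen shown), or two lines if it
    is broken at a soft hyphen (a visible hyphen ends the first line).
    """
    parts = word.split(SOFT_HYPHEN)
    whole = "".join(parts)
    if len(whole) <= room:
        return [whole]
    # Try the latest break opportunity first, counting the visible hyphen.
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i])
        if len(head) + 1 <= room:
            return [head + "-", "".join(parts[i:])]
    return [whole]  # no usable break opportunity; leave the word intact


print(break_word("hyphen\u00adation", 8))   # ['hyphen-', 'ation']
print(break_word("hyphen\u00adation", 20))  # ['hyphenation']
```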

9.3.4 Preformatted text: The PRE element

<!ENTITY % pre.exclusion "IMG|OBJECT|BIG|SMALL|SUB|SUP">

<!ELEMENT PRE - - (%inline;)* -(%pre.exclusion;) -- preformatted text -->
<!ATTLIST PRE
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attribute definitions

width = number [CN]
Deprecated. This attribute provides a hint to visual user agents about the desired width of the formatted block. The user agent can use this information to select an appropriate font size or to indent the content appropriately. The desired width is expressed in number of characters. This attribute is not widely supported currently.

Attributes defined elsewhere

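The PRE element tells visual user agents that the enclosed text is "preformatted". When handling preformatted text, visual user agents: may leave white space intact; may render text with a fixed-pitch font; may disable automatic word wrap; must not disable bidirectional processing.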

Non-visual user agents are not required to respect extra white space in the content of a PRE element.

For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.

The DTD fragment above indicates which elements may not appear within a PRE declaration. This is the same as in HTML 3.2, and is intended to preserve constant line spacing and column alignment for text rendered in a fixed pitch font. Authors are discouraged from altering this behavior through style sheets.

The following example shows a preformatted verse from Shelley's poem To a Skylark:

<PRE>
       Higher still and higher
         From the earth thou springest
       Like a cloud of fire;
         The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.
</PRE>

Here is how this is typically rendered:

       Higher still and higher
         From the earth thou springest
       Like a cloud of fire;
         The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.

The horizontal tab character
The horizontal tab character (decimal 9 in [ISO10646] and [ISO88591] ) is usually interpreted by visual user agents as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters. We strongly discourage using horizontal tabs in preformatted text since it is common practice, when editing, to set the tab-spacing to other values, leading to misaligned documents.
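
The tab-stop rule, and the misalignment that results from a different tab spacing, can be sketched in Python. The function name expand_tabs and the sample line are assumptions for illustration. With the default stop of 8, the result matches Python's built-in str.expandtabs(8).

```python
def expand_tabs(line, tab_stop=8):
    """Replace each tab with the smallest non-zero number of spaces
    that moves the column to the next multiple of `tab_stop`."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_stop - (col % tab_stop)  # always between 1 and tab_stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


line = "ab\tc"
print(repr(expand_tabs(line)))              # 'ab      c' (c lands in column 8)
print(repr(expand_tabs(line, tab_stop=4)))  # 'ab  c'     (c lands in column 4)
```

The two results place the same character in different columns, which is the misalignment the text warns about when an editor uses a tab spacing other than 8.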

9.3.5 Visual rendering of paragraphs

Note. The following section is an informative description of the behavior of some current visual user agents when formatting paragraphs. Style sheets allow better control of paragraph formatting.

How paragraphs are rendered visually depends on the user agent. Paragraphs are usually rendered flush left with a ragged right margin. Other defaults are appropriate for right-to-left scripts.

HTML user agents have traditionally rendered paragraphs with white space before and after, e.g.,

  At the same time, there began to take form a system of numbering,
  the calendar, hieroglyphic writing, and a technically advanced
  art, all of which later influenced other peoples.

  Within the framework of this gradual evolution or cultural
  progress the Preclassic horizon has been divided into Lower,
  Middle and Upper periods, to which can be added a transitional
  or Protoclassic period with several features that would later
  distinguish the emerging civilizations of Mesoamerica.

This contrasts with the style used in novels which indents the first line of the paragraph and uses the regular line spacing between the final line of the current paragraph and the first line of the next, e.g.,

     At the same time, there began to take form a system of
  numbering, the calendar, hieroglyphic writing, and a technically
  advanced art, all of which later influenced other peoples.
     Within the framework of this gradual evolution or cultural
  progress the Preclassic horizon has been divided into Lower,
  Middle and Upper periods, to which can be added a transitional
  or Protoclassic period with several features that would later
  distinguish the emerging civilizations of Mesoamerica.

Following the precedent set by the NCSA Mosaic browser in 1993, user agents generally don't justify both margins, in part because it's hard to do this effectively without sophisticated hyphenation routines. The advent of style sheets, and anti-aliased fonts with subpixel positioning promises to offer richer choices to HTML authors than previously possible.

Style sheets provide rich control over the size and style of a font, the margins, space before and after a paragraph, the first line indent, justification and many other details. The user agent's default style sheet renders P elements in a familiar form, as illustrated above. One could, in principle, override this to render paragraphs without the breaks that conventionally distinguish successive paragraphs. In general, since this may confuse readers, we discourage this practice.

By convention, visual HTML user agents wrap text lines to fit within the available margins. Wrapping algorithms depend on the script being formatted.

In Western scripts, for example, text should only be wrapped at white space. Early user agents incorrectly wrapped lines just after the start tag or just before the end tag of an element, which resulted in dangling punctuation. For example, consider this sentence:

   A statue of the <A href="cih78">Cihuateteus</A>, who are patron ...

Wrapping the line just before the end tag of the A element causes the comma to be stranded at the beginning of the next line:

  A statue of the Cihuateteus
  , who are patron ...

This is an error since there was no white space at that point in the markup.

9.4 Marking document changes: The INS and DEL elements

<!-- INS/DEL are handled by inclusion on BODY -->
<!ELEMENT (INS|DEL) - - (%flow;)*      -- inserted text, deleted text -->
<!ATTLIST (INS|DEL)
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- info on reason for change --
  datetime    %Datetime;     #IMPLIED  -- date and time of change --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to point to information explaining why a document was changed.
datetime = datetime [CS]
The value of this attribute specifies the date and time when the change was made.

Attributes defined elsewhere

INS and DEL are used to mark up sections of the document that have been inserted or deleted with respect to a different version of a document (e.g., in draft legislation where lawmakers need to view the changes).

These two elements are unusual for HTML in that they may serve as either block-level or inline elements (but not both). They may contain one or more words within a paragraph or contain one or more block-level elements such as paragraphs, lists and tables.

This example could be from a bill to change the legislation for how many deputies a County Sheriff can employ from 3 to 5.

<P>
  A Sheriff can employ <DEL>3</DEL><INS>5</INS> deputies.
</P>

The INS and DEL elements must not contain block-level content when these elements behave as inline elements.

ILLEGAL EXAMPLE:
The following is not legal HTML.

<P>
<INS><DIV>...block-level content...</DIV></INS>
</P>

User agents should render inserted and deleted text in ways that make the change obvious. For instance, inserted text may appear in a special font, deleted text may not be shown at all or be shown as struck-through or with special markings, etc.

Both of the following examples correspond to November 5, 1994, 8:15:30 am, US Eastern Standard Time.

     1994-11-05T13:15:30Z
     1994-11-05T08:15:30-05:00

Used with INS, this gives:

<INS datetime="1994-11-05T08:15:30-05:00"
        cite="http://www.foo.org/mydoc/comments.html">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>

The document "http://www.foo.org/mydoc/comments.html" would contain comments about why information was inserted into the document.

Authors may also make comments about inserted or deleted text by means of the title attribute for the INS and DEL elements. User agents may present this information to the user (e.g., as a popup note). For example:

<INS datetime="1994-11-05T08:15:30-05:00"
        title="Changed as a result of Steve B's comments in meeting.">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>
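
As a check on the two datetime values given earlier for November 5, 1994, the following Python sketch parses both and compares them. It uses datetime.fromisoformat from the standard library; the helper name parse_datetime is hypothetical, and the handling of the trailing "Z" is a workaround for Python versions before 3.11, which do not accept it directly.

```python
from datetime import datetime


def parse_datetime(value):
    """Parse a datetime attribute value of the form YYYY-MM-DDThh:mm:ssTZD."""
    # Map the UTC designator "Z" to an explicit offset, since older
    # versions of fromisoformat() reject a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


utc = parse_datetime("1994-11-05T13:15:30Z")
est = parse_datetime("1994-11-05T08:15:30-05:00")
print(utc == est)  # True: both values denote the same instant
```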