Selecting a Translation Tool for DITA

By Rodolfo M. Raya (rmraya@maxprograms.com)
Chief Technical Officer, Maxprograms
July 2012

Introduction

DITA is an XML vocabulary, but not just any XML. It has particularities that an ordinary XML editor or translation tool cannot handle easily.

Like an XML editor that is good for authoring in DITA, a translation tool capable of properly handling DITA files should:

  • Be able to resolve DITA content references, supporting the conref attribute or the keyref mechanism;
  • Be able to support DITA specializations, allowing the customization of translatable elements and attributes;
  • Understand the translate attribute.

The content referencing problem

The DITA file shown in Listing 1 below has conref attributes that reference elements from the file shown in Listing 2.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="task_hdj_drv_bh">
    <title>Applying XSL Transformation</title>
    <taskbody>
        <steps>
            <step>
                <cmd>Open the document to transform.</cmd>
            </step>
            <step>
                <cmd>In <uicontrol conref="ui_reference.dita#uiref/xsl_menu"/> menu, select
                        <uicontrol conref="ui_reference.dita#uiref/xsl_trans"/>.</cmd>
            </step>
            <step>
                <cmd>Select the appropriate XSL Stylesheet</cmd>
            </step>
            <step>
                <cmd>Click the <uicontrol conref="ui_reference.dita#uiref/xsl_apply"/> button.</cmd>
            </step>
        </steps>
    </taskbody>
</task>

Listing 1 - DITA topic that uses conref mechanism

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="uiref">
 <title>UI Elements</title>
 <conbody>
  <p><uicontrol id="xsl_menu">Transformation</uicontrol>: program menu that contains all
   transformation options.</p>
  <p><uicontrol id="xsl_trans">XSL Transformation</uicontrol>: applies an XSL Stylesheet to an XML
   document.</p>
  <p><uicontrol id="xsl_apply">Apply Transformation</uicontrol>: applies the selected XSL Stylesheet
   to the current open document.</p>
 </conbody>
</concept>

Listing 2 - DITA topic that contains referenced text

An XML editor able to resolve the conref attributes in Listing 1 would display that file in WYSIWYG mode as shown in Figure 1.

Figure 1 - File from Listing 1 displayed by <oXygen/>, a DITA-enabled XML Editor (conref resolved by an XML editor)

For a technical writer working with DITA, it is important that the chosen XML editor resolves conref attributes and displays the referenced content.

For a translator it is also essential to see the text being translated in a complete representation. If conref content is not resolved when translatable text is extracted from the DITA file, the translator will lack the necessary context for performing the translation task.

In Figure 2 below you can see translatable text from Listing 1 extracted by a Computer Aided Translation (CAT) tool that supports DITA content referencing. In Figure 3 and Figure 4 you see the same text extracted by two tools that treat DITA documents as regular XML.

Figure 2 - Swordfish Translation Editor (conref resolved by Swordfish III)
Figure 3 - Trados Studio (conref ignored by Trados Studio)
Figure 4 - memoQ (conref ignored by memoQ 6)

The pictures shown above include markers that represent the original DITA markup. In one case (Figure 2) you can see the actual text referenced by the conref attributes; in the other two (Figure 3 and Figure 4) you see only markers.

By using tools that extract complete sentences from your DITA sources, you give translators the context they need. Although this raises the price if your Localization Service Provider (LSP) charges by the word, the increase should be offset by better translation quality, which requires less review work.
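
The resolution step described in this section can be illustrated with a short Python sketch. It is not how any of the tools mentioned above is implemented; it assumes a trimmed copy of Listing 2, one step from Listing 1, and a hypothetical `documents` lookup table in place of real file loading. It copies only the text of the referenced element and ignores nested markup, keyref and same-file references.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of ui_reference.dita (Listing 2), kept inline for the sketch.
UI_REFERENCE = """<concept id="uiref">
 <title>UI Elements</title>
 <conbody>
  <p><uicontrol id="xsl_menu">Transformation</uicontrol>: program menu.</p>
  <p><uicontrol id="xsl_trans">XSL Transformation</uicontrol>: applies a stylesheet.</p>
 </conbody>
</concept>"""

# The second step of Listing 1.
STEP = ('<cmd>In <uicontrol conref="ui_reference.dita#uiref/xsl_menu"/> menu, select '
        '<uicontrol conref="ui_reference.dita#uiref/xsl_trans"/>.</cmd>')


def resolve_conrefs(element, documents):
    """Copy the referenced text into every descendant that carries a conref.

    `documents` is a hypothetical mapping from file name to parsed root
    element; a real tool would load the files from disk.
    """
    for node in element.iter():
        conref = node.get("conref")
        if conref is None:
            continue
        # A conref has the form file.dita#topic_id/element_id.
        file_name, _, fragment = conref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        root = documents.get(file_name)
        if root is None or root.get("id") != topic_id:
            continue  # target not available: leave the reference unresolved
        for candidate in root.iter(node.tag):
            if candidate.get("id") == element_id:
                node.text = candidate.text  # text only, no nested markup
                break
    return element


def flat_text(element):
    """Return the element's text with inline markup removed."""
    return " ".join("".join(element.itertext()).split())


documents = {"ui_reference.dita": ET.fromstring(UI_REFERENCE)}
cmd = resolve_conrefs(ET.fromstring(STEP), documents)
print(flat_text(cmd))  # In Transformation menu, select XSL Transformation.
```

Without the call to `resolve_conrefs`, the same step flattens to "In menu, select ." which is roughly what a translator sees in a tool that treats DITA as regular XML.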

The customization problem

DITA includes a set of DTDs and XML Schemas that contain almost all elements and attributes needed in a standard documentation project. Nevertheless, sometimes the standard set of elements and attributes is not enough and custom extensions are needed.

DITA has a standard extension mechanism known as "specialization". DITA users are allowed to modify the default set of DTDs and XML Schemas, following certain rules, to incorporate the pieces they need.

As DITA becomes more popular, many translation tool vendors ship configuration files for their XML filters that facilitate text extraction from standard DITA documents. Unfortunately, not all tools support DITA specializations.

If you use specialization in your DITA projects, the translation tool used to process your files should:

  • Allow you to customize the list of translatable elements and attributes;
  • Allow you to incorporate your custom DTDs and XML Schemas in the tool's XML catalog (if it uses one).
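
One way a tool could handle specialized elements without knowing their names in advance is to inspect the class attribute, which in DITA records the ancestry of every element and is normally defaulted by the DTD or XML Schema. The Python sketch below assumes a hypothetical configuration (`TRANSLATABLE_BASES`) and two made-up specializations, `<warningtext>` and `<partnumber>`, in an invented `acme-d` domain; the class values are written out in the sample so that no DTD is needed to run it.

```python
import xml.etree.ElementTree as ET

# Hypothetical user configuration: base DITA types treated as translatable.
TRANSLATABLE_BASES = {"topic/p", "topic/title", "topic/ph"}

# Made-up specializations; class values are spelled out instead of being
# supplied by a DTD, which is an assumption of this sketch.
SAMPLE = """<conbody class="- topic/body concept/conbody ">
  <warningtext class="+ topic/p acme-d/warningtext ">Keep hands clear.</warningtext>
  <partnumber class="+ topic/data acme-d/partnumber ">AX-100</partnumber>
</conbody>"""


def is_translatable(element, bases=TRANSLATABLE_BASES):
    """True when any ancestor type listed in the class attribute is configured."""
    tokens = element.get("class", "").split()
    return any(token in bases for token in tokens)


root = ET.fromstring(SAMPLE)
extracted = [child.text for child in root if is_translatable(child)]
print(extracted)  # ['Keep hands clear.']
```

Because the decision is driven by a configurable set of base types, adding a new specialization requires no change to the code, only a DTD or catalog entry that supplies the class attribute.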

Even if you don't use specializations, you may still require customized translations. For example, the standard <draft-comment> element is normally used for internal consumption and readers of the published documentation almost never see its content. Therefore, the <draft-comment> element is usually treated as untranslatable by CAT tools. However, you may still need a translation of <draft-comment> for your content reviewers. Only if you or your LSP use customizable CAT tools will you be able to get the desired translations.
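
The <draft-comment> case can be sketched in Python as a filter whose list of skipped elements is a parameter rather than a constant. The default set and the sample content below are assumptions made for illustration, not the configuration of any particular CAT tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical default: elements a filter skips unless told otherwise.
DEFAULT_SKIPPED = frozenset({"draft-comment", "required-cleanup"})

SAMPLE = """<conbody>
  <p>Select the appropriate XSL Stylesheet.</p>
  <draft-comment>Check the menu name with the UI team.</draft-comment>
</conbody>"""


def extract_segments(root, skipped=DEFAULT_SKIPPED):
    """Return the text of each child element whose name is not in `skipped`."""
    return [" ".join("".join(child.itertext()).split())
            for child in root if child.tag not in skipped]


root = ET.fromstring(SAMPLE)
default_run = extract_segments(root)
# A reviewer-oriented run that also sends <draft-comment> for translation.
review_run = extract_segments(root, skipped=DEFAULT_SKIPPED - {"draft-comment"})
print(default_run)
print(review_run)
```

The first call returns only the paragraph; the second also returns the comment, which is the behaviour a customizable tool would let you configure.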

Dealing with the translate attribute

Sometimes you will include portions of text in your DITA files that should not be translated. To mark those pieces as untranslatable you simply set the value of the translate attribute to no, as shown below in Listing 3.

	<p translate="no">Warning: this text should not be translated.</p>

Listing 3 - Untranslatable text

Some translation tools simply ignore the translate attribute and extract the text for translation anyway.

Notice that the translate attribute should be used with block-level elements (those that contain full paragraphs or sentences), like <p>. Setting the translate attribute to no in an element that appears in the middle of a sentence is a bad idea, as the translator working with the surrounding text still needs to see the element content for context. Listing 4 shows how you can safely protect untranslatable text that appears in the middle of a sentence by referencing a copy stored in an untranslatable element.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="locking">
 <title translate="no">Untranslatable Title</title>
 <conbody>
  <p>This sentence contains <ph conref="#locking/lock"/> text.</p>
  <draft-comment translate="no"><ph id="lock">untranslatable</ph></draft-comment>
 </conbody>
</concept>

Listing 4 - Untranslatable inline text protected in <draft-comment>

A translation tool parsing Listing 4 should be able to:

  • Ignore the <title> element;
  • Include the word "untranslatable" when extracting the <p> element;
  • Ignore the <draft-comment> element.
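
The three expectations above can be expressed as a small Python sketch run against Listing 4. It assumes a tiny, hypothetical set of block elements, resolves only same-file conref values by element id, and does not lock the referenced word against editing as a real tool should.

```python
import xml.etree.ElementTree as ET

# Listing 4 without the XML declaration and DOCTYPE.
LISTING_4 = """<concept id="locking">
 <title translate="no">Untranslatable Title</title>
 <conbody>
  <p>This sentence contains <ph conref="#locking/lock"/> text.</p>
  <draft-comment translate="no"><ph id="lock">untranslatable</ph></draft-comment>
 </conbody>
</concept>"""

# Hypothetical, deliberately tiny list of block-level elements.
BLOCK_ELEMENTS = {"title", "p", "draft-comment"}


def resolve_local_conrefs(root):
    """Resolve conref values of the form #topic_id/element_id in place."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in root.iter():
        conref = el.get("conref", "")
        if conref.startswith("#"):
            # Simplification: look up the element id only, ignoring the topic id.
            target = by_id.get(conref.rpartition("/")[2])
            if target is not None:
                el.text = target.text


def translatable_segments(root):
    """Collect block-level text, skipping any subtree marked translate="no"."""
    segments = []

    def walk(el):
        if el.get("translate") == "no":
            return  # ignore the whole element, including its children
        if el.tag in BLOCK_ELEMENTS:
            segments.append(" ".join("".join(el.itertext()).split()))
            return
        for child in el:
            walk(child)

    walk(root)
    return segments


root = ET.fromstring(LISTING_4)
resolve_local_conrefs(root)
segments = translatable_segments(root)
print(segments)  # ['This sentence contains untranslatable text.']
```

Only the <p> element is extracted, and it carries the referenced word for context; the <title> and <draft-comment> elements produce no segments.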

Below, in Figure 5, Figure 6 and Figure 7, you can see how three translation tools interpreted the content of Listing 4.

  • All respected the translate attribute in <title>;
  • Only one was able to include the referenced text in <p> for context;
  • One of them presents the <draft-comment> element with nothing to translate in it.

Figure 5 - Swordfish Translation Editor (untranslatable text resolved by Swordfish III)
Figure 6 - Trados Studio (untranslatable text as handled by Trados Studio)
Figure 7 - memoQ (untranslatable text as handled by memoQ 6)

Make sure your translation tool can ignore block elements that have the translate attribute set to no.

The file handling problem

A DITA project may contain hundreds of small files. That is not unusual, but it makes file handling tedious.

When working with a large number of files, DITA teams may opt for using a Content Management System (CMS) or a version management system like CVS or SVN. A CMS is not really required for working with DITA but it may simplify project management.

A CMS may help you separate the files referenced by a DITA map and prepare a package for translation. If you don't have a CMS, you may use a DITA-enabled translation tool for separating the files that need translation from those that don't.

A DITA-enabled translation tool should be able to parse a DITA map and resolve the references to all topics and subtopics, preparing a unified package that you can send to your LSP.
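
As a rough Python illustration of the map-parsing step, the sketch below walks a DITA map and lists the local files it references. The map content and most file names are hypothetical (only ui_reference.dita comes from Listing 1). The sketch does not follow nested maps, does not open the topics to pick up files referenced through conref, and checks the scope attribute only on the <topicref> itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical map; transform_task.dita and locking.dita are assumed names.
SAMPLE_MAP = """<map>
  <title>Sample project</title>
  <topicref href="transform_task.dita">
    <topicref href="ui_reference.dita"/>
  </topicref>
  <topicref href="locking.dita"/>
  <topicref href="ui_reference.dita#uiref"/>
  <topicref href="https://example.com/help" scope="external"/>
</map>"""


def files_to_translate(map_root):
    """Return local files referenced by <topicref>, in order, without duplicates."""
    files = []
    for ref in map_root.iter("topicref"):  # visits nested topicrefs too
        href = ref.get("href")
        if not href or ref.get("scope") == "external":
            continue
        path = href.split("#", 1)[0]  # drop the topic id fragment, if any
        if path not in files:
            files.append(path)
    return files


package = files_to_translate(ET.fromstring(SAMPLE_MAP))
print(package)  # ['transform_task.dita', 'ui_reference.dita', 'locking.dita']
```

The resulting list is the starting point for a unified package; a complete tool would then resolve references inside each listed topic as well.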

If your LSP charges you for file management, you can reduce cost by preparing a consolidated translation package in-house.

Resources

  • Download a test package containing the files shown in Listing 1, Listing 2 and Listing 4 plus a DITA map, and verify whether your translation tool can:
    • resolve conref and href attributes;
    • understand the translate attribute;
    • generate a unified package by parsing a DITA map.
  • Read "Using XLIFF to Translate DITA Projects", an article prepared by the OASIS DITA Adoption Technical Committee, and learn how to improve your translation workflow.
  • Download a copy of Fluenta DITA Translation Manager, a tool that implements the translation workflows suggested by the DITA Adoption TC at OASIS.

About the author

Rodolfo M. Raya

Rodolfo Raya is Chief Technical Officer (CTO) at Maxprograms, where he develops multi-platform translation/localisation and content publishing tools using XML and Java technology. He can be reached at rmraya@maxprograms.com.