XML For Beginners

Author: Emmanuel KARTMANN <emmanuel@kartmann.org>.

Date: January 26th, 2000

Contents

- Introduction
    - Why did I write this article?
    - What the Hell is XML?
    - What's next in this article?
- XML Basics
    - XML in ten points
    - XSL, the hidden brother of XML
    - W3C Document Object Model
    - XML tools
- Dominate the Microsoft DOM
    - Introduction
    - XML Text to Tree
    - SAMPLE CODE (VBScript)
    - SAMPLE CODE (VC++)
    - SAMPLE CODE (JavaScript)
- XML Online References and Resources

Introduction

Why did I write this article?

I started working with XML in September 1999. At first, it seemed like yet another Web-related language, but in that short period (less than six months), more than 20 people asked me "Do you know XML?" or "Can you give me pointers to XML info?". Some were friends, some were recruiters, others were potential employers... one was my brother (even my family is XML-addicted).

I was both glad to be a valuable source of information and sick of repeating the same thing over and over again. So let me put this straight: I KNOW XML. I don't know everything about it, and I don't consider myself an XML expert (!), but I know enough to share it and give all of you a good starting point for using this hot technology. Now let's get to the point: XML!

What the Hell is XML?

XML stands for eXtensible Markup Language; it defines a text format for structured documents and data. Because you define your own tags (hence the term "eXtensible"), any kind of structured data can be represented in XML: spreadsheets, texts, phone book entries, etc.

Like HTML, XML is a markup language, i.e. it uses tags (words between '<' and '>') and attributes (in the form name="value"). Unlike HTML, however, XML uses tags only to delimit chunks of data, without interpreting the data. HTML gives its tags a fixed meaning (e.g. <B> </B> means that the text should be shown in bold font), whereas XML leaves the interpretation of tags (their meaning) to the application that reads or writes the data. This is also why XML, unlike HTML, lets you create your own tags for use with your application.

Here's a small XML file that models a mailbox:


<?xml version="1.0"?>
<MAILBOX>
    <MESSAGES>
        <MESSAGE>
            <READ/>
            <HEADER>
                <FROM>alexis@kartmann.com</FROM>
                <TO>emmanuel@kartmann.org</TO>
                <SUBJECT>Quid novi sub sole?</SUBJECT>
            </HEADER>
            <BODY>Yo brother, What's up?</BODY>
        </MESSAGE>
        <MESSAGE>
            <HEADER>
                <FROM>eric@kartmann.com</FROM>
                <TO>emmanuel@kartmann.org</TO>
                <SUBJECT>Scr... you! I'm going home!</SUBJECT>
            </HEADER>
            <BODY></BODY>
        </MESSAGE>
    </MESSAGES>
</MAILBOX>

Note that this XML file only defines a structure for the data; it doesn't define any semantics for the tags. However, their meaning is quite obvious when the tag names are well chosen. I will use this file, saved as MyDocument.xml, for my source code examples below. If you have Internet Explorer, you can use it to view this file as a tree of expandable nodes.
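The structure of such a file comes entirely from how its tags nest. As a rough illustration only, here is a C++ sketch (standard library only; TagsAreBalanced is a hypothetical helper, not part of any XML library) that checks whether start-tags and end-tags in a string pair up correctly, skipping the XML declaration, comments and empty-element tags such as <READ/>. It is not a real XML parser: it assumes no '>' appears inside attribute values, and it ignores CDATA sections, DTD internal subsets and the single-root rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: rough nesting check for start-tags, end-tags and
// empty-element tags. NOT a conforming XML parser.
bool TagsAreBalanced(const std::string& xml) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, 4, "<!--") == 0) {        // comment: skip it whole
            std::size_t end = xml.find("-->", pos + 4);
            if (end == std::string::npos) return false;
            pos = end + 3;
            continue;
        }
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;     // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;                  // "<>" is not a tag
        if (tag[0] == '?' || tag[0] == '!') continue;   // <?xml ...?> or <!DOCTYPE ...>
        if (tag[0] == '/') {                            // end-tag: must close the innermost open element
            if (open.empty() || open.back() != tag.substr(1)) return false;
            open.pop_back();
        } else if (tag.back() == '/') {                 // empty-element tag, e.g. <READ/>
            continue;
        } else {                                        // start-tag: keep only the element name
            open.push_back(tag.substr(0, tag.find_first_of(" \t\r\n")));
        }
    }
    return open.empty();                                // every start-tag was closed
}
```

Run on the mailbox file above, it returns true; swapping the last </MESSAGE> and </MESSAGES> would make it return false, because an end-tag must always close the innermost open element.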

What's next in this article?

In this article, I will first define the XML basics, then I will present the Microsoft implementation of the XML Document Object Model (XMLDOM), with sample source code in VBScript, VC++ and JavaScript showing how to use it.

I will finish with a compilation of XML resources from the W3C, Microsoft and other web sites.

XML Basics

XML in ten points

  1. XML is a markup language: it uses tags (words between '<' and '>') and attributes (in the form name="value")
  2. XML files are text files: people can read them, which makes applications easy to debug
  3. XML documents should begin with an XML declaration which specifies the version of XML being used, e.g. <?xml version="1.0"?>
  4. Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags (e.g. <MESSAGE>) and end-tags (e.g. </MESSAGE>), or, for empty elements, by an empty-element tag. Elements must nest properly, and a single root element (MAILBOX in our example) contains all the others
  5. An empty element can be written as an empty-element tag, e.g. <READ/>, which is equivalent to <READ></READ>
  6. The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters. Otherwise, they must be escaped using either numeric character references (e.g. &#38; for ampersand and &#60; for left angle bracket) or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;"
  7. XML comments are delimited by <!-- and -->. For compatibility with SGML, the string "--" (double-hyphen) must not occur within comments
  8. Attributes associate name-value pairs with elements. Attribute types and default values may be declared in a Document Type Definition (DTD). Attribute values are enclosed in single quotes (') or double quotes (")
  9. An attribute value may contain the quote character that delimits it only if that character is escaped: use "&apos;" for the apostrophe or single-quote character (') and "&quot;" for the double-quote character (")
  10. The special attribute xml:space, when set to "preserve", means that white space should be preserved by applications processing the XML document
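Points 6, 7 and 9 are easy to get wrong when an application writes XML by hand. Here is a minimal C++ sketch, assuming plain 8-bit or UTF-8 strings and using hypothetical helper names (EscapeXml, IsValidCommentText): it escapes text for element content or for an attribute value, and checks that a string can safely be placed inside a comment.

```cpp
#include <string>

// Hypothetical helper: escape text for element content (forAttribute = false)
// or for an attribute value (forAttribute = true).
// Point 6: '&' and '<' must be escaped; '>' may be, and is escaped here too.
// Point 9: in attribute values, both quote characters are escaped, which is
// safe whichever quote delimits the value.
std::string EscapeXml(const std::string& text, bool forAttribute) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;"; break;
            case '<':  out += "&lt;"; break;
            case '>':  out += "&gt;"; break;
            case '"':  out += forAttribute ? "&quot;" : "\""; break;
            case '\'': out += forAttribute ? "&apos;" : "'"; break;
            default:   out += c; break;
        }
    }
    return out;
}

// Hypothetical helper for point 7: text placed between <!-- and --> must not
// contain "--" and, per the XML 1.0 grammar, must not end with '-'.
bool IsValidCommentText(const std::string& text) {
    return text.find("--") == std::string::npos &&
           (text.empty() || text.back() != '-');
}
```

For instance, EscapeXml("Tom & Jerry <3", false) yields "Tom &amp; Jerry &lt;3", while an apostrophe is left alone in element content and becomes "&apos;" only in attribute values.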

For more about XML, please refer to the XML references below.

XSL, the hidden brother of XML

XSL stands for eXtensible Stylesheet Language. It is a language for expressing stylesheets. It consists of two parts:

- a language for transforming XML documents: XSL Transformations (XSLT), which uses the XML Path Language (XPath) to select parts of a document. This language provides pattern matching (xsl:template match), conditional statements (xsl:when test), loops (xsl:for-each), etc.
- an XML vocabulary for specifying formatting semantics: similar to the W3C Cascading Style Sheets (CSS), this vocabulary provides enhanced presentation features.

Note that XSL is expressed in XML.

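Here's a simple XSL file which transforms the XML file above (the mailbox model) into an HTML table. It uses the working-draft XSL namespace (http://www.w3.org/TR/WD-xsl) supported by the Microsoft XML Parser shipped with IE5; stylesheets written for the final XSLT 1.0 Recommendation use http://www.w3.org/1999/XSL/Transform instead: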


<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <TABLE>                                                <!-- start the table -->
      <TR>                                                 <!-- first row: header (From/To/Subject) -->
        <TD>From</TD>
        <TD>To</TD>
        <TD>Subject</TD>
      </TR>
      <xsl:for-each select="MAILBOX/MESSAGES/MESSAGE">     <!-- loop on every message -->
        <TR>                                               <!-- start row (one row per message) -->
          <TD><xsl:value-of select="HEADER/FROM"/></TD>    <!-- print the value of FROM -->
          <TD><xsl:value-of select="HEADER/TO"/></TD>      <!-- print the value of TO -->
          <TD><xsl:value-of select="HEADER/SUBJECT"/></TD> <!-- print the value of SUBJECT -->
        </TR>                                              <!-- end row -->
      </xsl:for-each>                                      <!-- next message -->
    </TABLE>
  </xsl:template>
</xsl:stylesheet>

Using XML in combination with XSL, you can truly separate data from presentation (and make them evolve separately). For example, the same XML data could be transformed into another HTML table with a completely different style. Please refer to the JavaScript sample code that comes with this article for an example (plain and fancy tables generated from the same XML file).
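To make the xsl:for-each loop concrete, here is a hand-written C++ equivalent of the stylesheet above. It is a sketch, not XSL: it assumes the HEADER fields have already been copied into a hypothetical Message struct, and RenderTable is a hypothetical function. Its optional rowStyle parameter illustrates the idea of this paragraph: the same data rendered with a different presentation, here by adding a STYLE attribute to each row.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory copy of the HEADER fields of one MESSAGE.
struct Message {
    std::string from, to, subject;
};

// Hand-written equivalent of the stylesheet: one header row, then one <TR>
// per message (the xsl:for-each loop) and one <TD> per xsl:value-of.
// rowStyle (optional) gives every row a STYLE attribute: same data,
// different presentation. Field values are copied as-is (no escaping).
std::string RenderTable(const std::vector<Message>& messages,
                        const std::string& rowStyle = "") {
    const std::string tr = rowStyle.empty()
        ? "<TR>" : "<TR STYLE=\"" + rowStyle + "\">";
    std::string html = "<TABLE>" + tr +
        "<TD>From</TD><TD>To</TD><TD>Subject</TD></TR>";
    for (const Message& m : messages) {
        html += tr + "<TD>" + m.from + "</TD><TD>" + m.to + "</TD><TD>" +
                m.subject + "</TD></TR>";
    }
    return html + "</TABLE>";
}
```

This is precisely the code you avoid writing by delegating the loop to a stylesheet: changing the presentation then means editing an XSL file rather than recompiling a program.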

W3C Document Object Model

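The Document Object Model (DOM) is a W3C standard that defines a programming interface for accessing XML (and HTML) documents. In other words, the DOM defines a set of objects, properties and methods that ease the programmer's work when handling XML documents. It is an API, but a language-independent one: the W3C specifies it in OMG IDL with bindings for Java and ECMAScript (JavaScript), and vendors implement it in other environments (Microsoft, for example, as COM objects).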

For more about the XML DOM, please refer to the W3C Web Site.

XML tools

Microsoft provides an XML editor called XML Notepad. It has a very cool explorer-like GUI (tree view on the left, list view on the right) for viewing and editing XML files. Here's what it looks like when you open the XML document shown above:

[Figure: the mailbox document opened in XML Notepad]

You can download XML Notepad for free from the Microsoft Web site.

Dominate the Microsoft DOM

Introduction

The Microsoft XML Parser is implemented as a COM component; it is embedded in Internet Explorer (version 4.0 SP1 and higher) but you can download and install it separately (see Microsoft XML Parser Redistributable below). Please note that there are several incompatible versions of the Microsoft XML Parser: do not hesitate to upgrade your XML Parser to the latest version (for example, if you have IE4 and do not want to upgrade to IE5, you can still upgrade the XML parser alone).

In order to use the Microsoft XMLDOM, you must first create an instance of the COM object. After that, the XMLDOM provides the following features:

- Load an XML document, i.e. create a tree structure in memory from an XML file (or buffer). The XMLDOM can even download an XML document from a remote location (using a URL as the document location). See methods IXMLDOMDocument::load (from a file or URL) and IXMLDOMDocument::loadXML (from a string) for details.
- Save an XML document, i.e. create an XML file from the tree structure in memory that represents the document. See method IXMLDOMDocument::save for details.
- Access the document tree, i.e. walk through the tree, select one or several nodes, etc. Microsoft provides several COM objects (interfaces) to do so: IXMLDOMNode, IXMLDOMNodeList, IXMLDOMNamedNodeMap.
- Add or remove nodes in the document tree. See methods IXMLDOMDocument::createNode, IXMLDOMDocument::createTextNode, IXMLDOMDocument::createElement, IXMLDOMNode::appendChild, and IXMLDOMNode::removeChild for details.
- Transform an XML document (or any node of it) with an XSL stylesheet. See methods IXMLDOMNode::transformNode (which returns the result as a string) and IXMLDOMNode::transformNodeToObject (which writes the result into another object, such as a second document) for details.

The XMLDOM represents the document tree as follows:
- Every node has a name (a string): the tag name for an element, the attribute name for an attribute, and a fixed name such as "#text" for a text node.
- Every node has a type: NODE_ELEMENT (for tags), NODE_ATTRIBUTE (for attributes), NODE_TEXT (for text within a tag), etc. Please refer to the DOMNodeType enumeration for details.
- Every element node can have zero or more attributes (e.g. "STYLE" of the "TR" node). Attributes are nodes of type NODE_ATTRIBUTE, but they are not among the element's children: they are reached through the element's attributes collection (an IXMLDOMNamedNodeMap), not through childNodes.
- Every text node holds its text (e.g. "emmanuel@kartmann.org" in our example), available through its nodeValue property.
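These rules can be pictured with a small C++ model. This is a sketch using hypothetical types, not the Microsoft interfaces: each node has a kind (numbered like the W3C DOM nodeType values 1, 2 and 3), a name, a value and children, and an element keeps its attributes in a separate list rather than among its children. TextContent concatenates the text nodes below an element, in the spirit of the Microsoft text property.

```cpp
#include <string>
#include <vector>

// Hypothetical node kinds, numbered like the W3C DOM nodeType values.
enum class NodeKind { Element = 1, Attribute = 2, Text = 3 };

// Hypothetical, simplified model of a DOM node.
struct Node {
    NodeKind kind;
    std::string name;              // tag name, attribute name, or "#text"
    std::string value;             // attribute value or text; empty for elements
    std::vector<Node> attributes;  // elements only: kept apart from children
    std::vector<Node> children;    // child elements and text nodes, in order
};

Node MakeElement(const std::string& tag) {
    return Node{NodeKind::Element, tag, "", {}, {}};
}

Node MakeAttribute(const std::string& name, const std::string& value) {
    return Node{NodeKind::Attribute, name, value, {}, {}};
}

Node MakeText(const std::string& text) {
    return Node{NodeKind::Text, "#text", text, {}, {}};
}

// Concatenate the text of all descendant text nodes (attributes are skipped).
std::string TextContent(const Node& node) {
    if (node.kind == NodeKind::Text) return node.value;
    std::string result;
    for (const Node& child : node.children) result += TextContent(child);
    return result;
}
```

With this model, a TR element carrying a STYLE attribute still has an empty children list, which is exactly why code walking childNodes never meets attribute nodes.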

XML Text to Tree

In the example document MySecondStyle.xsl, the tag named "TR" has an attribute "STYLE" with value "font-family:Verdana; font-size:12pt". It is represented by two nodes, as shown in the figure below: an element node for the TR tag, and an attribute node for the STYLE attribute attached to it.

[Figure: XML text string mapped to XML document tree: tags with attributes]

In the other example document, MyDocument.xml, the tag named HEADER contains several child tags (FROM, TO, SUBJECT), each of them containing text. It is represented by a HEADER node with one child node per tag, each of which has a single child node of type NODE_TEXT, as shown in the figure below. Note that the text nodes have no tag name (the DOM gives them the fixed name "#text") and no children.

[Figure: XML text string mapped to XML document tree: tags with text]

Understanding the internal representation of the XML document is essential for walking the XML tree properly; see the VC++ example below.
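Walking the tree is a depth-first recursion over child nodes. The following C++ sketch (hypothetical TreeNode type and Dump function, standard library only) prints a tree as indented XML, one node per line; its indentStep parameter plays the same role as nIndentStep in the VC++ sample. Text is written as-is, without escaping, to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal tree: either an element (tag name + children)
// or a text node (text only, no children).
struct TreeNode {
    bool isText;
    std::string nameOrText;          // tag name for elements, text for text nodes
    std::vector<TreeNode> children;  // always empty for text nodes
};

// Depth-first walk producing indented XML, one node per line.
std::string Dump(const TreeNode& node, int depth = 0, int indentStep = 4) {
    const std::string pad(static_cast<std::size_t>(depth * indentStep), ' ');
    if (node.isText) return pad + node.nameOrText + "\n";
    if (node.children.empty()) return pad + "<" + node.nameOrText + "/>\n";
    std::string out = pad + "<" + node.nameOrText + ">\n";
    for (const TreeNode& child : node.children)
        out += Dump(child, depth + 1, indentStep);  // children one level deeper
    return out + pad + "</" + node.nameOrText + ">\n";
}
```

Note that the element with no children is printed as an empty-element tag, so <READ/> survives a round trip through this sketch.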

SAMPLE CODE (VBScript)


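    ' Create two instances of the DOM (one for the document, one for the style)
    Set XMLDocument = CreateObject("Microsoft.XMLDOM")
    Set XMLStyle = CreateObject("Microsoft.XMLDOM")

    ' Load synchronously, so that each load completes before the next statement
    XMLDocument.async = False
    XMLStyle.async = False

    ' Load XML document from a URL (here, a local file)
    XMLDocument.load "MyDocument.xml"

    ' Load XSL document from a URL (here, a local file)
    XMLStyle.load "MyStyle.xsl"

    ' Stop if either file could not be loaded or parsed
    If XMLDocument.parseError.errorCode <> 0 Then
        MsgBox "MyDocument.xml: " & XMLDocument.parseError.reason
    ElseIf XMLStyle.parseError.errorCode <> 0 Then
        MsgBox "MyStyle.xsl: " & XMLStyle.parseError.reason
    Else
        ' Process XML with XSL (let's make HTML); transformNode returns a string, so no "Set"
        ProcessedText = XMLDocument.transformNode(XMLStyle)

        ' Show generated HTML
        MsgBox ProcessedText
    End If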

SAMPLE CODE (VC++)

The sub-directory SampleCodeVCPP contains applications that show how to use the Microsoft implementation of the XML Document Object Model (XMLDOM):

- XMLBuildTree: creates an XML document tree in memory and prints it to stdout.
- XMLTransform: transforms an XML file using an XSL stylesheet.
Please refer to the "ReadMe.txt" files in the corresponding sub-directories for more details. You can see an extract of the VC++ sample below.

    // =======================================================================
    // Include XMLDOM definition, COM helper classes and iostreams
    // =======================================================================
    #include <msxml.h>
    #include <comdef.h>   // _bstr_t, _variant_t
    #include <iostream>
    using namespace std;

    // =======================================================================
    // Initialize COM (CAUTION: EVERY THREAD THAT USES COM MUST CALL THIS)
    // =======================================================================
    HRESULT hResult = CoInitialize(NULL);
    if (FAILED(hResult)) {
        // COM reports errors through the HRESULT, not through GetLastError()
        cerr << "Cannot initialize COM: Error 0x" << hex << hResult << endl;
    } else {

        // =======================================================================
        // Declare XMLDOM variables
        // =======================================================================
        IXMLDOMDocument *pXMLDocument = NULL;
        IXMLDOMDocument *pXMLStyle = NULL;

        // =======================================================================
        // Get the CLSID of the "Microsoft.XMLDOM" object
        // (this is a trick to check that the component is installed)
        // =======================================================================
        LPCOLESTR lpszProgID = L"Microsoft.XMLDOM";
        CLSID clsid;
        hResult = CLSIDFromProgID(lpszProgID, &clsid);
        if (FAILED(hResult)) {
            cerr << "Microsoft XMLDOM is not installed on your system!" << endl;
        } else {

            // =======================================================================
            // Create first XML Document Object Model (XMLDOM)
            // =======================================================================
            hResult = CoCreateInstance(clsid, NULL, CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument, (void**)&pXMLDocument);
            if (FAILED(hResult)) {
                cerr << "Cannot create 1st Microsoft XMLDOM: Error 0x" << hex << hResult << endl;
            } else {

                // =======================================================================
                // Create second XML Document Object Model (XMLDOM)
                // =======================================================================
                hResult = CoCreateInstance(clsid, NULL, CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument, (void**)&pXMLStyle);
                if (FAILED(hResult)) {
                    cerr << "Cannot create 2nd Microsoft XMLDOM: Error 0x" << hex << hResult << endl;
                } else {

                    _bstr_t BXMLDocumentFile = "MyDocument.xml";
                    _bstr_t BXMLStyleFile = "MyStyle.xsl";
                    int nIndentStep = 4; // 4 spaces
                    VARIANT_BOOL vb = VARIANT_FALSE;

                    // =======================================================================
                    // Load XML Document (synchronous)
                    // load() returns S_FALSE, which is not a failure code, when the file
                    // cannot be loaded or parsed: the VARIANT_BOOL must be checked too
                    // =======================================================================
                    pXMLDocument->put_async(VARIANT_FALSE);
                    hResult = pXMLDocument->load(_variant_t(BXMLDocumentFile), &vb);

                    if (FAILED(hResult) || vb != VARIANT_TRUE) {
                        cerr << "Cannot load XML document "
                             << (const char *)BXMLDocumentFile
                             << endl;
                    } else {

                        // =======================================================================
                        // Load XML Style (synchronous)
                        // =======================================================================
                        pXMLStyle->put_async(VARIANT_FALSE);
                        hResult = pXMLStyle->load(_variant_t(BXMLStyleFile), &vb);

                        if (FAILED(hResult) || vb != VARIANT_TRUE) {
                            cerr << "Cannot load XSL stylesheet "
                                 << (const char *)BXMLStyleFile
                                 << endl;
                        } else {

                            CString szXMLString;

                            // =======================================================================
                            // Transform XML with XSL into a new (output) document
                            // =======================================================================
                            IXMLDOMDocument *pOutputXMLDocument = NULL;
                            IDispatch *pDisp = NULL;
                            hResult = CoCreateInstance(CLSID_DOMDocument, NULL, CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument, (void**)&pOutputXMLDocument);
                            if (SUCCEEDED(hResult)) {
                                hResult = pOutputXMLDocument->QueryInterface(IID_IDispatch, (void **)&pDisp);
                            }
                            if (SUCCEEDED(hResult)) {
                                // The output object is passed as a VARIANT holding its IDispatch
                                VARIANT vObject;
                                VariantInit(&vObject);
                                vObject.vt = VT_DISPATCH;
                                vObject.pdispVal = pDisp;

                                // Use transformNodeToObject
                                hResult = pXMLDocument->transformNodeToObject(pXMLStyle, vObject);
                                if (SUCCEEDED(hResult)) {
                                    CXMLTool::MakeXMLBuffer(pOutputXMLDocument, szXMLString, nIndentStep);
                                }
                            }
                            if (FAILED(hResult)) {
                                cerr << "Cannot transform XML document: Error 0x" << hex << hResult << endl;
                            }

                            // =======================================================================
                            // Print the transformed string
                            // =======================================================================
                            cout << (LPCTSTR)szXMLString;

                            // Release the output document and its IDispatch interface
                            if (pDisp != NULL) pDisp->Release();
                            if (pOutputXMLDocument != NULL) pOutputXMLDocument->Release();
                        }
                    }
                    pXMLStyle->Release();
                }
                pXMLDocument->Release();
            }
        }

        // =======================================================================
        // UnInitialize COM (only after a successful CoInitialize)
        // =======================================================================
        CoUninitialize();
    }

The CXMLTool used in this example is available in the subdirectory SampleCodeVCPP. For more about the CXMLTool class, please refer to the CXMLTool HTML documentation.

SAMPLE CODE (JavaScript)


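    // Create two instances of the DOM (one for the document, one for the style)
    var XMLDocument = new ActiveXObject("Microsoft.XMLDOM");
    var XMLStyle = new ActiveXObject("Microsoft.XMLDOM");
    var ProcessedText;

    // Load synchronously, so that each load completes before the next statement
    XMLDocument.async = false;
    XMLStyle.async = false;

    // Load XML document from a URL (here, a local file)
    XMLDocument.load("MyDocument.xml");

    // Load XSL document from a URL (here, a local file)
    XMLStyle.load("MyStyle.xsl");

    if (XMLDocument.parseError.errorCode != 0) {
        window.alert("MyDocument.xml: " + XMLDocument.parseError.reason);
    } else if (XMLStyle.parseError.errorCode != 0) {
        window.alert("MyStyle.xsl: " + XMLStyle.parseError.reason);
    } else {
        // Process XML with XSL (let's make HTML)
        ProcessedText = XMLDocument.transformNode(XMLStyle);

        // Show generated HTML (dialog box)
        window.alert(ProcessedText);
    }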

XML Online References and Resources

W3C

- Extensible Markup Language (XML) at W3C Web Site
http://www.w3.org/XML/

- XML in 10 points (explains XML briefly)
http://www.w3.org/XML/1999/XML-in-10-points

- Extensible Markup Language (XML) 1.0, W3C Recommendation 10-February-1998
http://www.w3.org/TR/REC-xml

- Extensible Stylesheet Language (XSL) Version 1.0, W3C Working Draft 12 January 2000
http://www.w3.org/TR/xsl/

- XSL Transformations (XSLT) Version 1.0, W3C Recommendation 16 November 1999
http://www.w3.org/TR/xslt

- XML Path Language (XPath) Version 1.0, W3C Recommendation 16 November 1999
http://www.w3.org/TR/xpath

- Document Object Model (DOM) at W3C Web Site
http://www.w3.org/DOM/

- Namespaces in XML, World Wide Web Consortium 14-January-1999
http://www.w3.org/TR/REC-xml-names/

Microsoft

- Extensible Markup Language (XML) at Microsoft's Web Site
http://msdn.microsoft.com/xml/default.asp

- "Introduction to XML", MSDN Online Documentation
http://msdn.microsoft.com/xml/general/intro.asp

- "Why XML?", MSDN Online Documentation
http://msdn.microsoft.com/xml/general/whyxml.asp

- "XML: The ASCII of the Future?", Steve Land, Corbis
http://msdn.microsoft.com/library/techart/xmlfinal.htm

- "XML Developer's Guide", MSDN Online Documentation
http://msdn.microsoft.com/library/psdk/xmlsdk/xmlp91b9.htm

- "A Beginner's Guide to the XML DOM", Brian Randell, DevelopMentor
http://msdn.microsoft.com/xml/articles/beginner.asp

- "Using the XMLDOMDocument Object", MSDN Online Documentation
http://msdn.microsoft.com/xml/xmlguide/dom-guide-document.asp

- "Getting Started with XSL", MSDN Online Documentation
http://msdn.microsoft.com/xml/xslguide/xsl-overview.asp

- "Using the XSL Processor", MSDN Online Documentation
http://msdn.microsoft.com/xml/xslguide/transform-overview.asp

- Downloading XML Notepad
http://msdn.microsoft.com/xml/notepad/download.asp

- Microsoft XML Parser Redistributable (old)
http://msdn.microsoft.com/downloads/tools/xmlparser/xmlparser.asp

- Microsoft XML Parser Technology Preview Release (new)
http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp

- Microsoft Windows DNA XML Resource Kit
http://msdn.microsoft.com/vstudio/xml/default.asp

- DNA XML Resource Kit: Location: Development Platform, English Pack, Disc 1, December 1999
http://msdn.microsoft.com/subscriptions/resources/subdwnld.asp

- Microsoft Windows DNA XML Resource Kit Updates, MSDN Online
http://msdn.microsoft.com/workshop/xml/general/DnaXmlRsk.asp

- XML and XSL Demos, MSDN Online
http://msdn.microsoft.com/workshop/xml/general/xmlxsldemo.asp

Microsoft Newsgroup on XML

- news://microsoft.public.xml

Others

- XML.ORG, The XML Industry Portal, a cool site referencing XML-related subjects and software
http://www.xml.org/

- OASIS, Organization for the Advancement of Structured Information Standards
http://www.oasis-open.org/

- "XML in an Instant: A Non-geeky Introduction" by Charles Goldfarb (inventor of SGML), OASIS White Paper
http://www.xmlbooks.com/press/nongeeky.htm

- "XML, Java, and the future of the Web", by Jon Bosak (Sun Microsystems)
http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm

- "XML - Questions & Answers", by Jon Bosak (Sun Microsystems)
http://www.isgmlug.org/n3-1/n3-1-18.htm

- "An Introduction to XML Processing with Lark and Larval", by Tim Bray (XML Parsers in Java)
http://www.textuality.com/Lark/

- W3Schools.com (XML Tutorial)
http://www.w3schools.com/xml/default.asp

- XML 101
http://www.xml101.com

- CommerceNet's XML Exchange
http://www.xmlx.com

- Java Technology and XML
http://java.sun.com/xml/

- Java Project X Core Library
http://java.sun.com/xml/docs/api/overview-summary.html

- Java Package com.sun.xml.parser
http://java.sun.com/xml/jaxp/dist/1.0.1/docs/api/internal/com/sun/xml/parser/package-summary.html

- The <?XML!> FAQ
http://www.ucc.ie/xml

- <?XML?>.com (O'Reilly & Associates)
http://www.xml.com/xml/pub