Retrieving XML Programmatically in Word (Do we need the Word Object Model anymore?)
In my last entry I wrote about the handy .InsertXML method, a brute-force way to insert WordML and arbitrary XML into a Word document. But what if we want to get XML back out of a document?
From the Word Save As dialog box you can save the entire document as WordML, or save just the data from a schema-marked-up document. However, saving the entire document as XML is overkill if all you want is part of the document as XML, retrieved programmatically.
The Range object in Word has an .XML method that returns the current range as XML. For example, imagine a paragraph of text is selected in a document and you want its WordML (WordML is the XML syntax Word uses to persist document content and formatting). To do this, you simply call:
Debug.Print Selection.XML
The following is a simplified snippet of WordML returned by the .XML method:
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xml:space="preserve">
  <w:body>
    <w:p>
      <w:r>
        <w:t>This is the text</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
Actually, the WordML returned by Word is much more verbose than this, but I have boiled it down to its basic elements.
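Because .XML lives on the Range object, nothing has to be selected first. The following is a minimal sketch, assuming the active document contains ordinary text; the procedure name and the 200-character preview length are my own choices for illustration:

```vba
' Hypothetical helper: prints the start of the WordML for the first
' paragraph of the active document without changing the selection.
Sub PrintFirstParagraphXml()
    Dim rng As Range
    Dim wordMl As String

    ' Any Range will do; here we take the first paragraph.
    Set rng = ActiveDocument.Paragraphs(1).Range

    ' With no argument, .XML returns full WordML for the range.
    wordMl = rng.XML

    ' The real output is verbose, so show only its size and a preview.
    Debug.Print Len(wordMl) & " characters of WordML"
    Debug.Print Left$(wordMl, 200)
End Sub
```

Run it from the Visual Basic Editor with the Immediate window open to see the output.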
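The .XML method also accepts an optional Boolean argument that controls what is returned: False (the default) returns WordML, and True returns data only, that is, just the schema-based XML without the WordML formatting. Suppose the current selection in Word contains XML Schema markup, such as a Customer element with ID and Name child elements,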
and the following code is executed:
Debug.Print Selection.XML(True)
What is returned?
<?xml version="1.0" standalone="no"?>
<Customer><ID>1001</ID><Name>OfficeZealot.com</Name></Customer>
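Once you have the data-only string, you can hand it to an XML parser rather than picking it apart by hand. Here is a rough sketch, assuming MSXML 6.0 is installed on the machine and that the selection carries Customer markup like the example above; the procedure name is made up for illustration. The XPath matches on local names so that it still works if the real output carries a schema namespace, which the simplified output above does not show:

```vba
' Hypothetical helper: reads ID and Name out of the data-only XML
' returned for the current selection.
Sub ReadCustomerFromSelection()
    Dim xmlText As String
    Dim dom As Object
    Dim idNode As Object
    Dim nameNode As Object

    ' True asks for data only instead of WordML.
    xmlText = Selection.XML(True)

    ' Late-bound MSXML 6.0 DOM (assumed to be available).
    Set dom = CreateObject("MSXML2.DOMDocument.6.0")
    dom.async = False
    If Not dom.loadXML(xmlText) Then
        Debug.Print "Could not parse XML: " & dom.parseError.reason
        Exit Sub
    End If

    ' local-name() ignores any namespace the schema may add.
    Set idNode = dom.selectSingleNode("//*[local-name()='ID']")
    Set nameNode = dom.selectSingleNode("//*[local-name()='Name']")

    If Not idNode Is Nothing Then Debug.Print "ID: " & idNode.Text
    If Not nameNode Is Nothing Then Debug.Print "Name: " & nameNode.Text
End Sub
```

I have not checked what .XML(True) returns for a selection with no schema markup at all; the parse check above is there so the routine fails quietly in that case.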
So to summarize, you don’t have to save a document to get the XML out of Word. Simply use the .XML method on the Range object.
Knowing that we can retrieve XML with .XML and insert XML with .InsertXML has led me to ponder the following:
Why do we need the rest of the Word object model?
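To make the question concrete, here is a small round-trip sketch that copies content and formatting using nothing but .XML and .InsertXML. It assumes .InsertXML accepts the WordML that .XML produces, and it overwrites whatever is currently selected, so try it on a scratch document; the procedure name is hypothetical:

```vba
' Hypothetical helper: copies the first paragraph, formatting included,
' to the current selection by round-tripping it through WordML.
Sub CopyFirstParagraphToSelection()
    Dim source As Range

    Set source = ActiveDocument.Paragraphs(1).Range

    ' Read the paragraph as WordML and insert it at the selection.
    ' InsertXML replaces the selected content.
    Selection.InsertXML source.XML
End Sub
```

You may see an extra paragraph mark after the inserted content, depending on what was selected.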
I'll ponder some "deep thoughts" about this later.