Oct 8 Transport People and Cargo quickly and safely with new XML conversion options
Please give a warm welcome to the newest converter in the DataDirect Data Integration Suite family – IATA Cargo-IMP (Cargo Interchange Message Procedures). IATA is leading the charge to move to e-freight, but at the time of this writing fully 5/6 of all cargo is still moved by paper-based processes. To learn how to use these to your advantage, we're hosting two free webinars entitled “ Still Transporting Your EDI Data the Same Old Way? Get Up to Speed and Move Data Faster and Better. ” The first is October 15, 2009 at 12:00 ET (GMT-05:00), and the second October 21, 2009 at 12:00 GMT, and registration is open now. This webinar demonstrates how DataDirect’s Data Integration Suite can help automate mission-critical EDI transactional processing for a variety of commercial industries. Data Integration Suite provides out of the box support for all IATA PADIS, IATA Cargo-IMP, IATA Cargo-FACT, and X12 EDI transactions, including error identification and handling, and custom mapping. The Cargo-IMP converter handles all 76 current messages accepted by both IATA and ATA members, including: FFR – Space allocation request FFA – Space allocation answer FSR – Status request message FSU – Unsolicited status message FSA – Status answer FWB – Air Waybill FMA and FNA – Message acknowledgement and negative acknowledgement Like the other EDI converters in the family, this inherits the bidirectional heritage, so that Cargo-IMP can be read, validated, written, augmented and reshaped. What that means is that this: 1
2
3
4
FFR/6
125-12345675FRAPHL/T25K875/CLOTHING
BA171/19MAR/LHRJFK/NN
REF/FRAFCBA using this little XQuery program, which you can easily execute inside of the Data Integration Suite : 1
2
3
4
5
6
7
8
9
10
declare option ddtek:serialize "indent=yes" ;
let $ msg := doc ( "converter:EDI?BA171.cargo" )
return
  >
    > { concat ( $ msg//FDX.1.1, ' - ' , $ msg//FDX.1.1/comment ( ) [ 2 ] ) } >
    > { data ( $ msg//CDX.3.3 ) } pieces of { data ( $ msg//CDX.7.2 ) } weighing { xs:integer ( $ msg//CDX.3.5 * 2.2 ) } pounds >
    > { concat ( $ msg//FDX.1.5/comment ( ) [ 2 ] , ' ' , $ msg//FDX.1.4 ) } >
    > { concat ( $ msg//FDX.3.2, ' - ' , $ msg//FDX.3.2/comment ( ) [ 2 ] ) } >
    > { concat ( 'Flight ' , $ msg//FDX.1.2 ) } >
  > can yield this: 1
2
3
4
5
6
7
>
    > BA - British Airways >
    > 25 pieces of CLOTHING weighing 1925 pounds >
    > March 19 >
    > JFK - New York, NY, USA John F Kennedy Intl Airport >
    > Flight 171 >
> ( You can try this yourself – the Data Integration Suite now includes the Cargo-IMP XML Converter.) Lose neither your baggage nor your data, with DataDirect IATA PADIS and Cargo-IMP converters.
Oct 2 Progress DataDirect Brings its Data Integration Workshops to International Hot Spots this October
Based on the success of its Data Integration Workshops this September in India, Progress DataDirect will offer software developers in Antwerp, Dubai, Moscow, Munich and Helsinki the same knowledgeable, how-to agenda covering data integration using XML as a common data format. The Data Integration Workshops address a wide variety of data manipulation techniques and processes – discussing how XML can be used to move data from one system to another using well accepted standards and methodologies. Mission-critical systems and end-users rely on scalable, easy access to enterprise data. However, as business grows, connecting systems and users to a variety of disparate data sources is typically a labour-intensive, costly endeavour that consumes precious IT resources. The Data Integration Workshops introduce the DataDirect Data Integration Suite as the answer to these challenges, providing a robust product architecture which includes an easy-to-use graphical integrated development environment, as well as a highly-scalable, production-ready integration processing engine that effectively queries the full spectrum of enterprise data sources using standards-based technologies and interfaces. New implementations of the XQuery, the XML query language and the XQJ interface for Java applications are introduced and scenarios of how various data sources such as XML, relational data and Web services can be both source and target of integration projects is explored in-depth. DataDirect’s XML Converters , being a key component of the Data Integration Suite is also discussed, with a demonstration of how non-XML data such as EDI can be easily converted to XML and then further processed in a Java environment with XQuery, or with Microsoft’s Visual Studio using the .NET versions of the XML Converters. The Data Integration Workshops in India brought a diverse cross-section of attendees, some keen to learn about XQuery and the potential of the new technologies, others already experienced with XQuery but wanting to understand more of how to integrate XML and XQuery with relational databases, Web services and how to convert EDI to XML. The sessions also discussed the scalability aspects of XQuery and how developed XQuery queries can be deployed in a production environment that can easily scale to as many as 100,000s of processes per day. Attendees walked away with practical, actionable knowledge regardless of their starting point at the beginning of the workshops. Join us this October! Register today at http://www.datadirect-tech.net/xml.html
Sep 15 DataDirect HIPAA/ICD Upgrade Toolkit
The Looming Problem in Healthcare EDI On January 15, 2009, the United States Department of Health and Human Services (HHS) released the final rules for the updated X12 transaction sets, version 005010, to used in conjunction with the Health Insurance Portability and Accountability Act (HIPAA). The final compliance date is January, 2013 . At the same time, the International Classification of Diseases (ICD) standard is also being bumped, from ICD-9 to ICD-10. This compliance date follows closely on the heels of the first, October, 2013 . From the viewpoint of the people who use the data, there are many good reasons for this change. In the last six years or so between HIPAA 4010 and 5010, we've learned a lot about who needs what data and when. However, the actual changes to the data format are more of the evolutionary, and not revolutionary, type. However, the ICD-9 to ICD-10 change is a radical step: It's been 30 years since the ICD-9 list was first developedICD-9 is embedded much deeper into systems – the 36 years from 1977 to 2013 means that a junior programmer who wrote ICD-9-related code at the start of his career is probably now the CTO facing the consequences of a change he never planned for an entire career earlier. The list for ICD-10 is far more detailed, at almost 10× the length
The numbering system has changed, the number of diagnostic codes increased, individual codes are more detailed. There are generally not 1:1 mappings between ICD-9 and ICD-10
Although some codes in their old classification map directly to codes in the new, most don't. Sometimes there is a series of codes, sometimes there are alternatives that will takes external information to chose from, and sometimes there are no direct alternatives. This all makes for challenges for those working with either of these standards, and since the “HI” segment of several HIPAA transaction sets can directly refer to ICD-9 or ICD-10 codes, these problems actually intersect . How can we account for the changes in where data has been moved, or where new data must be extracted from other sources to augment the existing feed? Upgrading HIPAA-4010 Systems Transforming data in HIPAA 4010 X12 files and translating ICD-9 codes has been covered in individual blog entries on the XML Connections Blog over the past few months: HIPAA 4010 to 5010: how the XQuery Update Facility helps ICD-9 to ICD-10: what is that about, and how can data integration tools help HIPAA 4010 to 5010 and ICD-9 to ICD-10: more than just a conversion The hardest part about writing a book is that first sentence. It is similar with data conversion projects. When there is so much to convert, where does one begin? To help out, DataDirect Technologies is now giving away the DataDirect HIPAA/ICD Upgrade Toolkit , which includes the following: Tranforming HIPAA 4010 to 5010 with XQuery and XML Converters XQuery source for upgrading each of the 10 transaction sets from the 4010 version to the equivalent 5010 version 50 sample 4010 files to use with the above The ICD-9 to ICD-10 maps A sample tool to compare the changes between ICD-9 codes and their closest ICD-10 analogs, with HTML output An XQuery program which will read a HIPAA file containing ICD-9 codes and report on any potential conversion troubles with ICD-10 An XQuery program to read a HIPAA EDI file, and convert it from 4010 to 5010 and from ICD-9 to ICD-10 simultanously What does XQuery have to do with HIPAA? Well, DataDirect XML Converters make EDI behave like XML. And XQuery is really good at augmenting, splitting and rearranging XML. The two together let a relatively small XQuery program do big things. For example, here is the complete program to read in a HIPAA 4010 270 file, update it to the 5010 structure, and return the results. Notice the little “method=EDI:hipaa=yes” at the start? That tells the XQuery engine that the output is going to be EDI, just like the input. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
declare option ddtek:serialize "method=EDI:hipaa=yes:long=yes" ;
copy $ doc5010 := .
modify
(
rename node $ doc5010/HIPAA/ISA/ISA11 - InterchangeControlStandards as "ISA11-RepetitionSeparator" ,
replace value of node $ doc5010/HIPAA/ISA/ISA12 - InterchangeControlVersion with "00501" ,
rename node $ doc5010/HIPAA/ISA/ISA12 - InterchangeControlVersion as "ISA12-InterchangeControlVersionNumber" ,
rename node $ doc5010/HIPAA/ISA/ISA15 - UsageIndicator as "ISA15-InterchangeUsageIndicator" ,
insert node 005010X279
before $ doc5010/HIPAA/GS/GS08 - VersionReleaseIndustry,
delete node $ doc5010/HIPAA/GS/GS08 - VersionReleaseIndustry,
insert node 005010X279
after $ doc5010/HIPAA/TS_270/ST/ST02 - TransactionSetControlNumber,
delete node $ doc5010/HIPAA/TS_270/GROUP_1/GROUP_2/PER,
for $ eq01 in $ doc5010/HIPAA/TS_270/GROUP_1/GROUP_2/GROUP_3/EQ/EQ03 - CoverageLevelCode [ . != "FAM" ]
return replace value of node $ eq01 with "FAM" ,
delete node $ doc5010/HIPAA/TS_270/GROUP_1/GROUP_2/GROUP_3/III [ III01 - CodeListQualifierCode != "ZZ" ] ,
for $ gp3 in $ doc5010/HIPAA/TS_270/GROUP_1/GROUP_2/GROUP_3
return delete node $ gp3/III [ III01-CodeListQualifierCode = "ZZ" ] [ position ( ) & gt ; 1 ]
)
return $ doc5010 Because the XQuery engine in the Data Integration Suite supports the XQuery Update Facility (XUF), instead of having to map individually every “old” part of the document to its place in the “new” scheme of things, we can just list the parts that change, as “replace,” “rename,” “insert” or “delete”. Anything we leave alone goes through untouched; and really, not everything between 4010 to 5010 has changed. How might you use them? Suppose the next process in line from yours has already upgraded to 5010, and wants to see some of your sample output. Or perhaps someone sending data to you is still sending 4010, but you're all set for 5010. In these cases, having a way to upgrade the 4010 quickly and easily is a real win. Dealing With Your ICD-9 to ICD-10 Issues It is valuable to remember that XQuery is good at augmenting information, and the blend of an XQuery engine with an intelligent mediator sitting atop wire-protocol relational database drivers means that data can be easily mixed and merged from EDI, relational tables and XML. Data can be streamed in (read) or streamed out (written), to XML, EDI, relational databases, or other targets with equal ease and scalability. The programs that deal with ICD-9 to ICD-10 are a good example at enhancing XML data using relational tables as references. XQuery is very powerful at dealing both with tree-like data, as in XML or EDI, and relational data. The ICD-9, ICD-10, and 9-to-10 maps are all stored as relational data. This gives us the fast lookup speed over regular tables that relational databases provide, with a minimum of coding. A small piece of code like this: 1
collection ( "ICD-10" ) /ICD - 10 [ CODE = $ icd10code ] returns the ICD-10 record whose ICD-10 code value is in the $icd10code variable. There is no marshalling of data, no string substitutions, just the simple logic. The XQuery programs including in the toolkit are sufficient to convert all of the samples that come with the 4010 documentation into valid 5010 files. Your mileage may vary – these are a good starting point, but no doubt for your specific application they will need some tweaking. But isn't it a whole lot easier to change someone else's working program, than to start from scratch?
Sep 8 Are You Ready for HIPAA 5010? Take the Survey and Join us for a Free Webinar on Best Practices in Pr..
The looming deadlines for implementing HIPAA 5010 standards and ICD-10 specifications continue to cause headaches for many healthcare IT professionals. We’re taking the pulse of the Healthcare IT industry on their current HIPAA transactions, and we hope you’ll take five minutes to fill out the “ Processing HIPAA Healthcare Transactions Survey ”. The survey results will help healthcare IT professionals better understand the state of the marketplace. We will share the results during our free webinar on “ Processing HIPAA Healthcare Transactions” hosted by DataDirect on Sep 15, 2009. The webinar will cover: Converting any HIPAA 4010 and 5010 transaction set into XML Graphically defined, high performance mapping across HIPAA messages and a large variety of data sources HIPAA message validation and error handling Support for mapping custom HIPAA transactions Support for ICD-9 and ICD-10 Register for the webinar here: http://tinyurl.com/lqz3l2 .
Sep 6 Integrating Office applications with Salesforce.com – Part III
It has been a while since my last post, more than time to continue with the Integrating Office applications with Salesforce.com series. Today we’ll explain in detail with many several concrete examples, how to query and update your Salesforce.com data through XQuery. The idea is to access Salesforce.com through its webservice interface. But before looking into the Salesforce.com specifics, let’s first explain how easy it is to invoke webservices from within XQuery using the DataDirect Integration Suite . The function ddtek:wscall enables access to any webservice over SOAP. ddtek:wscall has two parameters. First a structure with technical information specifying which and how the webservice should be accessed, e.g. the address, credentials, proxy information, etc. The payload is the second parameter, in a format typically defined by the Web Services Description Language ( WSDL ). An overloaded version of ddtek:wscall has than additional parameter, it allows to specify the SOAP header. Finally the function ddtek:wscall returns an XML document with the webservice’s response. For example, assume a zip code webservice. One of the operations could be getCityByZIP, which can be easily invoked as follows. 1
2
3
4
5
6
ddtek:wscall (
address = "http://myServer/ZIPService" soapaction = "getCityByZip" /> ,
>
  > 01730 >
>
) As the parameters are all XML, you can of course call the webservice more dynamically. Assume we get a document with a list of ZIP codes and for each we like to get the city details. 1
2
3
4
5
6
7
8
for $ zip in fn:doc ( "zipcodes.xml" ) /cities/city/@ZIP
return
  ddtek:wscall (
    address = "http://myServer/ZIPService" soapaction = "getCityByZip" /> ,
    >
      > { $ zip } >
    >
  ) We’ve learned how to access webservices, let’s now bring this into practice with Salesforce.com. Salesforce.com offers two APIs, Force.com Enterprise and Force.com Partner. The former offers a strongly typed representation of your organization’s data, the latter is loosely-typed. The Force.com Web Services API Developer's Guide explains the differences in detail. For our examples here, we use the Force.com Partner API. A Salesforce.com application must first log in to the server to start a session. This session is identified by a session id, and allows to query and update your organization’s data. Rather than repeating all these webservice invocations using ddtek:wscall in our XQueries, we’ll abstract those technical details in a library. The sflib.xq library offers three functions. sf:login – allows to login to salesforce with a specific user names and password. Returns a session id sf:query – executes a query, using the Salesforce Object Query Language ( SOQL ) sf:create – creates a new object in your Salesforce.com system This example library doesn’t aim to be a complete XQuery library for accessing Salesforce.com. It is sufficient for our current exercise, and you may consider it as a starting point for your own development projects. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
module namespace   sf = "urn:partner.soap.sforce.com" ;

declare namespace tns = "urn:partner.soap.sforce.com" ;
declare namespace ens = "urn:sobject.partner.soap.sforce.com" ;

declare variable   $ sf:compression := "gzip" ;

declare function sf:login ( $ url as xs:string , $ user as xs:string , $ pwd as xs:string ) {
  ddtek:wscall (
    address = "{$url}" soapaction = ""   timeout = "30000" /> ,
    >
      />
    > ,
    >
      > { $ user } >
      > { $ pwd } >
    >
  )
} ;

declare function sf:query ( $ id as xs:string , $ url as xs:string , $ query as xs:string ) {
  ddtek:wscall (
    compression = "{$sf:compression}" address = "{$url}" soapaction = "" timeout = "30000" /> ,
    >
      > { $ id } >
    > ,
    >
      > { $ query } >
    >
  )
} ;

declare function sf:create ( $ id as xs:string , $ url as xs:string , $ sObjects as element ( * , xs:anyType ) * ) {
  ddtek:wscall (
    compression = "{$sf:compression}" address = "{$url}" soapaction = "" timeout = "30000" /> ,
    >
      > { $ id } >
    > ,
    > { $ sObjects } >
  )
} ; In the following example, we lookup information for an Account with a given name. First sf:login starts a Salesforce.com session. Subsequently the session id is used to query the Salesforce.com system. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import module namespace sf = "urn:partner.soap.sforce.com" at "sflib.xq" ;

declare namespace ens = "urn:sobject.partner.soap.sforce.com" ;

declare variable $ serverUrl := "https://www.salesforce.com/services/Soap/u/9.0" ;
declare variable $ user       := "XXXXXXXX" ;
declare variable $ pwd       := "XXXXXXXX" ;

declare variable $ name as xs:string external ;

(: Login to Salesforce.com to get the session id :)
let $ login := sf:login ( $ serverUrl, $ user, $ pwd )
return
  (: Query the Accounts:)
  sf:query ( $ login//sf:sessionId, $ login//sf:serverUrl,
           fn:concat ( "SELECT id, Name, Phone, BillingStreet, BillingCity, BillingState, BillingCountry FROM Account where Name = '" , $ name, "'" ) ) Assuming we’re looking up the Account with name “GreatCompany” we could get a result as follows. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
xmlns = "urn:partner.soap.sforce.com" >
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" xsi:type = "QueryResult" >
    > true >
    xsi:nil = "true" />
    xsi:type = "sf:sObject" >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > Account >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > 0018000000NfSpOAAV >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > 0018000000NfSpOAAV >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > GreatCompany >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > 123 - 456 - 789 >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > 1st Avenue 123 >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > San Fransisco >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > CA 94087 >
      xmlns:sf = "urn:sobject.partner.soap.sforce.com" > United States >
    >
    > 1 >
  >
> And as we’re in XQuery-world, we can easily transform the data in a more readable format… 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import module namespace sf = "urn:partner.soap.sforce.com" at "sflib.xq" ;

declare namespace ens = "urn:sobject.partner.soap.sforce.com" ;

declare variable $ serverUrl := "https://www.salesforce.com/services/Soap/u/9.0" ;
declare variable $ user       := "XXXXXXXX" ;
declare variable $ pwd       := "XXXXXXXX" ;

declare variable $ name as xs:string external ;

(: Login to Salesforce.com to get the session id :)
let $ login :=
  sf:login ( $ serverUrl, $ user, $ pwd )
(: Query the Accounts:)
let $ result :=
  sf:query ( $ login//sf:sessionId, $ login//sf:serverUrl,
           fn:concat ( "SELECT id, Name, Phone, BillingStreet, BillingCity, BillingState, BillingCountry FROM Account where Name = '" , $ name, "'" ) )
let $ account
Aug 25 Observations from the 2009 XML Workshop in Toronto
I had the opportunity to attend the DataDirect-sponsored XML Workshop in Toronto.   It was presented by Ivan Pedruzzi, Program Manager for the DataDirect XML Products.   The workshop was interactive discussing and demonstrating how the Data Integration Suite solves the problems of accessing data from various data stores.   Many questions asked by the attendees were about implementing XQuery into their specific environments.   When asked if performance was important, most of the attendees raised their hands.   Ivan explained how we use the technique of streaming and projection to query very large XML documents without reading the entire document into memory.   Also, the underlying SQL we generate for relational data access is not generic, but specific to the database.   This provides a more scalable solution for querying relational data.   Lastly, Ivan provided a sneak preview of Stylus Studio 2010’s new feature for building update expressions visually within the XQuery mapper.     All in all, the XML Workshop was a huge success.   Every person seemed engaged and got something out of it.    In fact, I had heard one attendee took notes of the presentation and presented it to her coworkers.   That’s a sign of a great workshop!   More XML Workshops are being planned for this fall.   Click here for details.
Aug 11 Data Integration Suite Summer School
Looking for a quick and easy way to get acquainted with the Data Integration Suite? If so, this Summer School program is what you are looking for! The Data Integration Suite Summer School program is a free, self-paced, online classroom experience. It has been designed to show you how the Data integration Suite has matured into a comprehensive integration software suite that is grounded in qualitative and quantifiable customer success. Summer School provides resources to help architects design strategic and tactical integration schemas, and additionally aids developers as they begin employing the Data Integration Suite within the enterprise. Here's how Summer School works: STEP ONE: Register for Summer School by clicking the Register Now button and then by completing the form. You will receive a confirmation email with the website address for your Summer School Curriculum. STEP TWO: Work your way through the Summer School Curriculum. The Summer School Curriculum is self-paced and offers numerous classes for you to choose from. You can complete every class on the curriculum, or just a few to get you started with the Data Integration Suite. STEP THREE: Sit back and relax. You will not be graded on your Summer School activities – You get an automatic A just for registering! Plus, Every Summer School registrant will be entered to win 1 of 25 Data Integration Suite Lunch Bags !
It is that easy! So jump to the head of the class and Register for the Data Integration Suite Summer School Program today!
Jun 25 Finally, XQJ 1.0
The XQuery API for Java ( XQJ ) is finally released. A single package including specification, javadoc and api jar file can be downloaded here . If you want to know more about XQJ, check out our tutorial on datadirect.com . Or want to try it out? DataDirect XQuery supports XQJ 1.0.
Jun 3 2009 XML Workshop: Processing your customer’s data formats
In June (Washington and Toronto) and September (Houston, Chicago, New York, Minneapolis) we are going to present two XML Workshops describing typical problems that IT organization involved in a variety of industry verticals often need to solve; and we wiil show how Data Integration Suite can help.
As usual the workshops will be “hands-on”, showing in practical terms how to use Data Integration Suite to complete tasks like: Creating applications that deal with different versions and formats of EDI messages Transforming EDI messages or XML documents into user-defined formats Augmenting information in incoming EDI or XML messages with data available in relational databases or Web services Exposing message cleansing operations as data services Outputting data in industry standard EDI formats This is a great opportunity to see the Data Integration Suite used in front of you to solve real problems.
To attend you just need to register; you can do that here now .
May 31 Integrating Office applications with Salesforce.com – Part II
This second post in the series Integrating Office applications with Salesforce.com talks about querying Microsoft Office Word forms. This is in no way a complete introduction on WordProcessingML, or the .docx file format used by Microsoft Word. Such great introduction is , Wouter van Vugt explains in this e-book the different file formats in OOXML with concrete examples. And if not sufficient, you may start to digest the Office Open XML (OOXML) specification . Before making our hands dirty with XQuery coding, first a word on creating Microsoft Office Word forms. You can create a form in Microsoft Office Word, by adding “content controls” to your document. As one of the key benefits of such approach, you can lock down the document, and as such only allow the user to edit the data in the controls. This guarantees the structure of the document will not be altered, and finally this will simplify querying the document as we will discuss later on. Adding “content controls” is done via the Developer tab, which is not enabled by default. Go to the “Word options”, and select “Show Developer tab in the Ribbon”. Clicking “OK”, the Developer tab is now visible. To add controls and define the layout of our form, we need to go into “Design Mode”. There are various types of controls. From plain text controls, to more structured controls to pick a date or choose a value from a drop down list. Dragging the controls on our document, we end up with the following form. Check out our first post in this series how the document looks like from a user’s perspective. Every control has a tag property, as we will explain right away, this is key in order to query the different content controls in our document. The tag is specified in the properties dialog box. We have now our form, we can protect it and have our users only be able to fill in the forms. As we have discussed in the past , an OOXML document is based on the ZIP format. A ZIP file, containing multiple XML documents. For today only the document /word/document.xml matters. Let's first have a closed look and try to understand the structure of the XML. Here is a fragment of the document.xml, which contains the user’s data. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
>
  >
    w:val = "Street" />
    w:val = "street" />
    w:val = "414549433" />
    >
      w:val = "DefaultPlaceholder_22675703" />
    >
    />
  >
  >
    >
      >
        w:w = "7398" w:type = "dxa" />
      >
      w:rsidR = "00252FB9" w:rsidRDefault = "00214A50" w:rsidP = "00214A50" >
        >
          > 1st Avenue 999 >
        >
      >
    >
  >
> The XML document includes a Structured Document element, , for each content control. The element has 2 sections, a properties section (line 2), and the content section (line 11). The element (line 4) is of interest as it allows us to locate a specific content control in the document through the w:val attribute. Remember we specified the tag for each control when designing the document. The element contains the actual data entered by the user. We can gather that data from the single element (line 18). Let’s define a utility function to retrieve the data for a specific content control. The parameters are the document and the tag (name) of the content control. The functions returns a single string with the user’s data. 1
2
3
4
5
declare function local:getField (
                      $ document,
                      $ fieldName as xs:string ) as xs:string {
  $ document//w:sdt [ w:sdtPr/w:tag/@ w:val = $ fieldName ] /w:sdtContent/data ( . )
} ; Using this function we can now query the various content controls in our Microsoft Office Word document. But before doing so, we have to note that text controls have a more complex . And this because they may span multiple lines. Here is an example. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
>
  >
    >
      w:w = "7398" w:type = "dxa" />
    >
    w:rsidR = "00252FB9" w:rsidRDefault = "00214A50" w:rsidP = "00214A50" >
      >
        > Our first trip to Great Company was a success. >
      >
      >
        />
        > Mr. Roxy was excited after our product presentation. And we have scheduled a follow-up conference call to discuss some of the outstanding questions. >
      >
    >
  >
> We can still gather that data from the text elements . But in addition we need to take the elements into account, they indicate a new line. We define a second utility function to query controls supporting multiple lines. 1
2
3
4
5
declare function local:getMultiLineField (
                      $ document,
                      $ fieldName as xs:string ) as xs:string {
  $ document//w:sdt [ w:sdtPr/w:tag/@ w:val = $ fieldName ] /w:sdtContent// ( w:t|w:br ) / ( if ( fn:local - name ( ) = "t" ) then fn:data ( . ) else " " )
} ; We have now all the necessary utility functions. The following query retrieves all data out of our Microsoft Office Word form. We use the greatcompany.docx document, shown in Part I of this series . 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main" ;

declare function local:getField (
                      $ document,
                      $ fieldName as xs:string ) as xs:string {
  $ document//w:sdt [ w:sdtPr/w:tag/@ w:val = $ fieldName ] /w:sdtContent/data ( . )
} ;

declare function local:getMultiLineField (
                      $ document,
                      $ fieldName as xs:string ) as xs:string {
  $ document//w:sdt [ w:sdtPr/w:tag/@ w:val = $ fieldName ] /w:sdtContent// ( w:t|w:br ) / ( if ( fn:local - name ( ) = "t" ) then fn:data ( . ) else " " )
} ;

let $ word := fn:doc ( "zip:file:///C:/example/greatcompany.docx!/word/document.xml" )
return
  >
          > { local:getField ( $ word, "date" ) } >
          > { local:getField ( $ word, "name" ) } >
          > { local:getField ( $ word, "description" ) } >