MINISIS DBMS Models

Introduction to XML

The problem with data exchange

In today’s electronically connected world, different systems must be able to communicate with each other, because it is infeasible for one system to do everything. Unfortunately, this requires that the communicating systems agree on a common protocol for their exchanges of information; this protocol is essentially the language that the systems speak when talking to each other. Two systems trying to communicate using different protocols would have the same trouble understanding each other as two humans speaking different languages.

Business Processes

Data exchange is further complicated by the fact that most business processes are hierarchically structured, whereas most IT systems are relationally structured. For example, see the diagrams below.

The diagram to the left represents how employee records would be viewed from a corporate (or human) point of view. An employee would have his/her own file, which would in turn house his/her contact information, timesheets, and employee history.

However, many systems structure their data differently. The diagram to the right shows another way of representing this model. As you can see, rather than having one file containing all employee information, the data has been separated into four files and linked together. This is how the data would likely be structured in a database system, because databases are easier to administer if they are set up in this fashion.

So, now we have a problem: we have taken our hierarchical business model and restructured it to satisfy a machine. The model now makes less sense to humans, as an employee’s records have been split up over several files.

XML: a standard data exchange format

Enter XML.

What is XML?

XML is one of those buzzwords you’ve probably heard or read somewhere without being sure of its meaning.  XML (the eXtensible Markup Language) is dramatically altering the world of technology, and most people probably don’t even know it.  This is because XML use is usually transparent; you probably use it every day without realizing it.
In its simplest form, XML is just a way of formatting data for storage or transmission.  XML documents are simple and human-readable.  Below is a sample XML record.

Sample XML Record

<PERSON>
  <FIRSTNAME>Steve</FIRSTNAME>
  <LASTNAME>Kwan</LASTNAME>
  <ADDRESS>333 Terminal Ave.</ADDRESS>
  <CITY>Vancouver</CITY>
  <PROVINCE>BC</PROVINCE>
  <COUNTRY>Canada</COUNTRY>
  <EMAIL>steve@minisisinc.com</EMAIL>
</PERSON>

This is an XML record for a person named Steve Kwan, whose address is 333 Terminal Ave., Vancouver BC, Canada.  Note that each piece of information is enclosed between “tags” that specify a name for the enclosed piece of information; for example, “Steve” is classified as FIRSTNAME.  XML formatting is described in more detail later in this document (see the “Understanding XML Data” section).
Needless to say, XML data can get significantly more complicated than what is shown above.  While this document won’t cover every aspect of XML, it will cover the most important concepts and hopefully give you a foothold on this very elaborate technology.
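
To make the idea of "tags that name each piece of information" concrete, here is a minimal sketch that reads the sample record with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; they are not part of MINISIS.

```python
import xml.etree.ElementTree as ET

# The sample record from this section, held in a Python string.
PERSON_XML = """
<PERSON>
  <FIRSTNAME>Steve</FIRSTNAME>
  <LASTNAME>Kwan</LASTNAME>
  <ADDRESS>333 Terminal Ave.</ADDRESS>
  <CITY>Vancouver</CITY>
  <PROVINCE>BC</PROVINCE>
  <COUNTRY>Canada</COUNTRY>
  <EMAIL>steve@minisisinc.com</EMAIL>
</PERSON>
"""

# Parse the text into an element; "person" is the PERSON element.
person = ET.fromstring(PERSON_XML)

# Each child element becomes one (tag name, text value) pair.
fields = {child.tag: child.text for child in person}

print(fields["FIRSTNAME"], fields["LASTNAME"])
print(fields["CITY"], fields["PROVINCE"], fields["COUNTRY"])
```

Any program with an XML parser could read the same record in a similar way, which is the point made in the next section.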

XML Compatibility

At first glance, XML doesn’t look too important.  After all, how can such a simple data format be such a big deal?  But it is a big deal, and here’s why. What makes XML significant is this: because XML data always follows the same syntax, every XML-compatible program can read it.  This makes XML an extremely effective way to get data out of one program and into another. XML is like a universal language that all XML-compliant programs can speak.
As shown in the diagram below, it is possible for a database such as MINISIS to use XML to communicate with other applications.
You are likely wondering when you would use XML.  Actually, it’s likely that you use it all the time; many programs use XML “behind the scenes”.
The next section provides some useful examples of XML in action.

Examples of XML in Action

Below are some useful examples of XML in real world situations.

Dublin Core

The Dublin Core standard is a brief specification that is widely used by museums for their Collections Management Systems (CMS).  The Dublin Core standard specifies fifteen data fields that are common to most museums; through this standard, it is simple to transfer data from one museum database to another. A Dublin Core XML standard has been written to allow Dublin Core data to be exchanged in XML.

The MINISIS Collections Management System, M3, is Dublin Core compliant.

Encoded Archival Description (EAD)

The Encoded Archival Description (EAD) standard is an XML solution used by archival organizations to manage their records. EAD is a complex protocol used to exchange detailed information between two organizations.  Online archival centres such as ARCHEION use EAD.  EAD is also in use in organizations such as the National Archives of Canada and the Archives of Ontario.

The MINISIS Archival Management system, M2A, is EAD compliant.

MARC

The Machine-Readable Cataloging (MARC) standards are a common file format for libraries and other catalogues.  There are several MARC standards currently being used, and thus there may be issues in converting to a standardized MARC format, but there is a MARC specification for XML. M2L (MINISIS’ Management for Libraries application) is currently MARC enabled.  When MINISIS Inc. can verify that a concrete MARC standard has been utilized, M2L will be XML-enabled.

MINISIS and XML

MINISIS Inc. has made a commitment to the XML revolution by offering a fully XML-compliant database engine: the MINISIS RDBMS.

Relational Databases

Whereas most databases have a small degree of XML compatibility, they are usually unable to conduct full XML importing and exporting.
This is due to the relational structure of SQL-based databases (see the diagram below).  While SQL databases are relational, XML is hierarchical in nature.  This makes it quite difficult to use XML data with a SQL database.
This serious limitation is causing many people to rethink their reliance upon SQL-based data models.  XML is quickly becoming a dominant technology.  A database that isn’t XML-compliant will face serious problems in only a few years.

Hierarchical Databases

A recent push towards hierarchical databases is being made in order to make data communications between different databases easier.  MINISIS Inc. is spearheading this revolution with the MINISIS RDBMS.  Other popular database models such as LDAP are taking the same route.
Due to MINISIS’ hierarchical nature, it is one of the few databases in the world that can claim 100% XML compatibility.
For more information on the differences between relational and hierarchical database technology (as well as the hybrid technology that MINISIS deploys), see our Database Architecture documentation.

Understanding XML Data

The first thing to understand about actual XML data is that it consists mainly of tags and values.

Tags and Elements

A tag is the name of a particular piece of data, enclosed in angle brackets (that is, “<” and “>”).  There is an opening tag, which looks like <this>, and a closing tag that contains a slash, which looks like </this>.  Between the two is the item’s value.  Whenever a chunk of data appears between opening and closing tags like this, that value (together with the tags and any associated data) is known as an XML element. For instance, we could have an XML element that looks like this:

<MY_ELEMENT>Hello, world!</MY_ELEMENT>

The above element is named MY_ELEMENT, and it contains the value, “Hello, world!”.  If we were to import this element into a database, we may wish to map it to a field named MY_ELEMENT, but we could map it to a field with another name if we wanted.

Attributes

We can also add additional information to an XML element, which may not be part of the element’s value but is an important piece of information that should be stored with the record.  For example:

<MY_ELEMENT language="English" author="Steve Kwan">
Hello, world!
</MY_ELEMENT>

By tacking the language="English" and author="Steve Kwan" statements onto this element, we are providing additional information about it; namely, we are saying that the element’s language is English and that its author is Steve Kwan.

Extra data stored in this fashion are referred to as attributes.

If you wanted to, you could actually write an entire XML document using attributes, without specifying any element values at all.  For example, we could rewrite our MY_ELEMENT example like so:

<MY_ELEMENT language="English" author="Steve Kwan" value="Hello, world!"> </MY_ELEMENT>
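
As a rough illustration of how a program sees an element's name, value and attributes, the following sketch uses Python's standard xml.etree.ElementTree module. The "publisher" attribute is a hypothetical name, used only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

# The MY_ELEMENT example, written with plain double quotes as XML requires.
element = ET.fromstring(
    '<MY_ELEMENT language="English" author="Steve Kwan">Hello, world!</MY_ELEMENT>'
)

name = element.tag            # the element's name
value = element.text          # the text between the opening and closing tags
attributes = element.attrib   # all attributes, as a dictionary

# get() returns a default when the attribute is missing.
# "publisher" is a made-up attribute that this element does not have.
language = element.get("language")
publisher = element.get("publisher", "unknown")

print(name, value, attributes, language, publisher)
```

Note that the attribute values must be enclosed in plain straight quotes; typographic ("curly") quotes are rejected by XML parsers.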

Parent Elements and Hierarchical Tree Structures

XML data is stored in a nested tree structure (also known as a hierarchical structure).  This means that every item in our XML data can have child items, which are grouped with their parent.  For example:

<PARENT desc="This is the parent.">
  <CHILD desc="This is the child.">My child element</CHILD>
</PARENT>

The above data tells us that we have a record named PARENT, which contains a record named CHILD.  The CHILD’s data is “My child element”.
We could also have multiple children for a single parent.  So, if we were using PARENT / CHILD data for a realistic reason, we could have data like this:

<PARENT name="Bill Sr.">
  <CHILD name="Bill Jr."></CHILD>
  <CHILD name="Bill II"></CHILD>
</PARENT>

You can generally nest as many child elements as you want.

As an example, suppose you had the XML data below:

<PARENT name="Bill Sr.">
  <CHILD name="Bill Jr.">
    <GRANDCHILD name="Bill Jr. Jr."></GRANDCHILD>
  </CHILD>
  <CHILD name="Bill II">
    <GRANDCHILD name="Bill III"></GRANDCHILD>
    <GRANDCHILD name="Bill IV"></GRANDCHILD>
    <GREATGRANDCHILD name="Bill V"></GREATGRANDCHILD>
  </CHILD>
</PARENT>

This would be represented by this XML tree structure:
So there is your crash course on XML data.  There are many other things you can do within an XML record, such as Document Type Definitions (DTDs) and XML Schema, which are discussed in later sections.
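
The nesting described in this section can be made visible with a short recursive walk over the tree. This is only a sketch using Python's standard xml.etree.ElementTree module; the function name outline is an assumption chosen for this example.

```python
import xml.etree.ElementTree as ET

# The family example from this section.
FAMILY_XML = """
<PARENT name="Bill Sr.">
  <CHILD name="Bill Jr.">
    <GRANDCHILD name="Bill Jr. Jr."></GRANDCHILD>
  </CHILD>
  <CHILD name="Bill II">
    <GRANDCHILD name="Bill III"></GRANDCHILD>
    <GRANDCHILD name="Bill IV"></GRANDCHILD>
    <GREATGRANDCHILD name="Bill V"></GREATGRANDCHILD>
  </CHILD>
</PARENT>
"""


def outline(element, depth=0):
    """Return one indented line per element, parents before their children."""
    lines = ["  " * depth + f"{element.tag}: {element.get('name')}"]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(FAMILY_XML)
print("\n".join(outline(root)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the XML data.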

Sample XML Records

This section contains some sample XML records to help drive the basic concepts home.  If you already have a strong grasp on how XML data is stored, you might want to skip this example.
Let’s say we have XML data in the format shown below.  At the top of the format is a master element called PEOPLE.  This is the “root element”; that is, every other element is nested within this element.  There will only ever be one PEOPLE element in the entire document; every other piece of data is stored inside this element.

Sample XML

Inside the PEOPLE element, we have a nested element called PERSON.  Each individual person stored in this record has various information associated with him/her, such as a name (stored as FULL_NAME), a title (TITLE), any number of phone numbers (containing two bits of information: the number type and the number), an address (specifying a city and country) and any number of links to pictures of the person.

XML data format

PEOPLE
  PERSON
    FULL_NAME
    TITLE
    PHONE
      PH_TYPE
      PH_NUMBER
    ADDRESS
      CITY
      COUNTRY
    PICTURE

A sample XML data file in this format

<PEOPLE>
  <PERSON>
    <FULL_NAME>Steve Kwan</FULL_NAME>
    <TITLE>Software Engineer</TITLE>
    <PHONE>
      <PH_TYPE>Work</PH_TYPE>
      <PH_NUMBER>604-255-4366</PH_NUMBER>
    </PHONE>
    <PHONE>
      <PH_TYPE>Home</PH_TYPE>
      <PH_NUMBER>604-111-2222</PH_NUMBER>
    </PHONE>
    <ADDRESS>
      <CITY>Vancouver</CITY>
      <COUNTRY>Canada</COUNTRY>
    </ADDRESS>
    <PICTURE>steve1.jpg</PICTURE>
    <PICTURE>steve2.jpg</PICTURE>
  </PERSON>
  <PERSON>
    <FULL_NAME>Baseer Khan</FULL_NAME>
    <TITLE>Software Engineer</TITLE>
    <PHONE>
      <PH_TYPE>Work</PH_TYPE>
      <PH_NUMBER>604-255-4366</PH_NUMBER>
    </PHONE>
    <PHONE>
      <PH_TYPE>Home</PH_TYPE>
      <PH_NUMBER>604-222-2222</PH_NUMBER>
    </PHONE>
    <ADDRESS>
      <CITY>Vancouver</CITY>
      <COUNTRY>Canada</COUNTRY>
    </ADDRESS>
    <PICTURE>baseer1.jpg</PICTURE>
  </PERSON>
</PEOPLE>
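
To show how a program might pick the repeating and nested pieces out of a PEOPLE document, here is a small sketch using Python's standard xml.etree.ElementTree module. The function name read_person and the dictionary keys are assumptions made for this example, and the document is shortened to one person to keep the listing brief.

```python
import xml.etree.ElementTree as ET

# A shortened PEOPLE document in the format described in this section.
PEOPLE_XML = """
<PEOPLE>
  <PERSON>
    <FULL_NAME>Steve Kwan</FULL_NAME>
    <TITLE>Software Engineer</TITLE>
    <PHONE><PH_TYPE>Work</PH_TYPE><PH_NUMBER>604-255-4366</PH_NUMBER></PHONE>
    <PHONE><PH_TYPE>Home</PH_TYPE><PH_NUMBER>604-111-2222</PH_NUMBER></PHONE>
    <ADDRESS><CITY>Vancouver</CITY><COUNTRY>Canada</COUNTRY></ADDRESS>
    <PICTURE>steve1.jpg</PICTURE>
    <PICTURE>steve2.jpg</PICTURE>
  </PERSON>
</PEOPLE>
"""


def read_person(person):
    """Collect one PERSON element into a plain dictionary."""
    return {
        "full_name": person.findtext("FULL_NAME"),
        "title": person.findtext("TITLE"),
        # PHONE repeats, so gather every occurrence keyed by its type.
        "phones": {
            phone.findtext("PH_TYPE"): phone.findtext("PH_NUMBER")
            for phone in person.findall("PHONE")
        },
        # CITY and COUNTRY are nested one level down, inside ADDRESS.
        "city": person.findtext("ADDRESS/CITY"),
        "country": person.findtext("ADDRESS/COUNTRY"),
        # iter() finds PICTURE elements at any depth below PERSON.
        "pictures": [picture.text for picture in person.iter("PICTURE")],
    }


people = [read_person(p) for p in ET.fromstring(PEOPLE_XML).findall("PERSON")]
print(people[0])
```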

Structuring XML Data

There is a problem with the XML examples shown above: in those examples, you can specify data elements and their values, but you can’t specify their structure.
It’s very important that we be able to specify the format in which we receive XML data.  For example, suppose we are expecting to receive a record in this format:

<PERSON>
  <FULL_NAME>My name</FULL_NAME>
  <ADDRESS>My address</ADDRESS>
</PERSON>

Now, suppose we are expecting data formatted like the above, but instead, we receive data like this:

<OBJECT>
  <OBJECT_NAME>My object</OBJECT_NAME>
  <OBJECT_INFORMATION>
    <OBJECT_DATE>Today</OBJECT_DATE>
    <OBJECT_LOCATION>Some place</OBJECT_LOCATION>
  </OBJECT_INFORMATION>
</OBJECT>

This is a totally different object we’ve received!  Our program won’t know how to handle this data.  Even in the flexible world of XML, we need to establish standards on data transfer.  That’s where Document Type Definitions come into play.
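
The problem can be sketched in a few lines of Python. The check below is a hand-written stand-in for the kind of rule a formal definition expresses, not a DTD and not how MINISIS validates data; the names EXPECTED_ROOT, EXPECTED_CHILDREN and matches_expected_format are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# The structure our program expects: a PERSON with exactly these children,
# in this order.
EXPECTED_ROOT = "PERSON"
EXPECTED_CHILDREN = ["FULL_NAME", "ADDRESS"]


def matches_expected_format(xml_text):
    """Return True only if the record has the expected root and children."""
    root = ET.fromstring(xml_text)
    child_tags = [child.tag for child in root]
    return root.tag == EXPECTED_ROOT and child_tags == EXPECTED_CHILDREN


expected = (
    "<PERSON><FULL_NAME>My name</FULL_NAME>"
    "<ADDRESS>My address</ADDRESS></PERSON>"
)
unexpected = (
    "<OBJECT><OBJECT_NAME>My object</OBJECT_NAME>"
    "<OBJECT_INFORMATION><OBJECT_DATE>Today</OBJECT_DATE>"
    "<OBJECT_LOCATION>Some place</OBJECT_LOCATION></OBJECT_INFORMATION></OBJECT>"
)

print(matches_expected_format(expected))
print(matches_expected_format(unexpected))
```

Both inputs are well-formed XML, so the parser accepts both; only the explicit structural check tells them apart. Writing such checks by hand for every format quickly becomes tedious, which is why a declarative definition is preferable.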

Document Type Definitions (DTDs)

Document Type Definitions (DTDs) allow users to specify the format in which they wish to receive data.  The purpose of this article isn’t to explain DTDs in heavy detail, but rather, to give an example of how they work.
For example, you could add the following restrictions to an XML document via a DTD:

  • Restrict the number of <PERSON> elements transmitted
  • Enforce a rule that only one <PICTURE> element can be transmitted.
A DTD is added to your document by putting a DTD declaration at the beginning of your document.  For example, if you were using the Encoded Archival Description (EAD) XML format, you might use this declaration:

<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN" "../dtd/ead.dtd"
[
<!ENTITY % eadnotat PUBLIC "-//Society of American Archivists//DTD eadnotat.ent (EAD Notation Declarations)//EN" "../dtd/eadnotat.ent">
%eadnotat;
]>

Again, the exact meaning of these statements is outside the scope of this document.  The general idea is that adding this text to the top of your document specifies that XML elements must be transmitted in a particular order and with a particular number of each element.

XML Schema

The weakness of DTDs is that they allow you to specify the XML document structure, but they don’t allow you to specify certain constraints on the content of the elements.  For example, with a DTD, you cannot specify that a <DATE> element should be transmitted in the format YYYY/MM/DD.  So, with a DTD alone, there’s nothing stopping someone from transmitting this:

<DATE>No, I don’t want to enter a valid date.</DATE>

To circumvent problems like this, you’d use XML Schema.
XML Schema, among other things, allows you to dictate data types for an XML document.  For example, you could force an element to contain a numeric value with an XML Schema.
While the details of XML Schema are outside the scope of this document, there is plenty of information available on this topic elsewhere.
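
The kind of content rule that a schema expresses can be illustrated with a hand-written check. This sketch is not XML Schema and does not use a schema validator; it simply tests a <DATE> element against the YYYY/MM/DD format mentioned above, using Python's standard library. The function name has_valid_date is an assumption for this example.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime


def has_valid_date(xml_text):
    """Return True if the element's text is a real date written as YYYY/MM/DD."""
    text = ET.fromstring(xml_text).text or ""
    # First check the shape: four digits, slash, two digits, slash, two digits.
    if not re.fullmatch(r"\d{4}/\d{2}/\d{2}", text):
        return False
    # Then check that the digits form a date that exists on the calendar.
    try:
        datetime.strptime(text, "%Y/%m/%d")
    except ValueError:
        return False
    return True


print(has_valid_date("<DATE>2004/02/29</DATE>"))
print(has_valid_date("<DATE>No, I don't want to enter a valid date.</DATE>"))
```

With XML Schema, a rule like this is declared once alongside the document format instead of being coded into every program that reads the data.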

Manipulating XML Data

So once you have your XML data ready, how do you access it?
There are many different ways to access XML data.  Many programs can read XML data if it is provided in the correct format, with the correct elements and structure.  However, in many instances you will need to manipulate XML data that is stored in your own format.
For example, you may want to define your own XML data structure, using your own DTD and Schema.  If you do this, then you will likely need to code custom software that can manage data in this format.
For programmers, there are many different methods for accessing data.  Many programming libraries contain their own methods for manipulating XML data.  There are several common interfaces for XML data which programmers can use to get at their XML data; here are a few of them.
The topic of manipulating XML data is one that is of concern mainly to programmers; thus, this section of the document is quite technical and may not make much sense to many non-technical readers.  If the nitty-gritty of XML data manipulation doesn’t interest you, you might want to skip to the next section.

XSLT

One of the most useful applications of XML is Extensible Stylesheet Language Transformations (XSLT). The term may be quite a mouthful, but the concept is simple: a stylesheet describes how an XML record is automatically converted into another document, most often a webpage that is displayed to a user.  The principle is illustrated in the diagram below.

So, in short, with an XSLT system you can have your XML-enabled database generate webpages automatically; this is a great technology for institutions who want to publish records online.
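
The idea of turning a record into a webpage can be sketched by hand. Note that the code below is not XSLT: Python's standard library has no XSLT processor, so this is an ordinary function that performs the same kind of conversion a stylesheet would describe. The function name person_to_html and the table layout are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from html import escape


def person_to_html(xml_text):
    """Render each child element of a record as one row of an HTML table."""
    person = ET.fromstring(xml_text)
    rows = "".join(
        # escape() keeps characters such as < and & from breaking the HTML.
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in person
    )
    return f"<table>{rows}</table>"


page = person_to_html(
    "<PERSON><FIRSTNAME>Steve</FIRSTNAME><CITY>Vancouver</CITY></PERSON>"
)
print(page)
```

An XSLT stylesheet expresses the same mapping declaratively, so the presentation can be changed without touching program code.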

Document Object Model (DOM)

The Document Object Model (DOM) allows programmers to access their XML data in hierarchical tree form.  Remember that XML data is stored in a hierarchical tree structure.
DOM generally works in this fashion:

  1. First, DOM creates an XML tree based on the data from a file, or creates a new tree from scratch, which is empty.
  2. Second, a programmer now has access to the entire XML document “tree” of elements.  This entire tree can be manipulated as the programmer desires; this includes adding new elements, changing existing elements, deleting elements, etc.
  3. Third, the programmer can now choose what to do with the data.  Perhaps he/she wishes to save it to a file on disk, or send it over the Net.
DOM is a useful technology, but because it requires the entire XML document to be loaded at once, it can slow down your machine if the XML document is very large.  For this reason, it’s sometimes preferable to use the Simple API for XML, or SAX.
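
The three steps above can be sketched with xml.dom.minidom, a DOM implementation in Python's standard library. The record contents and the new values are made up for illustration.

```python
from xml.dom import minidom

# Step 1: build the whole tree in memory from XML text.
doc = minidom.parseString(
    "<PERSON><FULL_NAME>Steve Kwan</FULL_NAME>"
    "<TITLE>Software Engineer</TITLE></PERSON>"
)
person = doc.documentElement

# Step 2: manipulate the tree.
# Add a new PICTURE element with a text value.
picture = doc.createElement("PICTURE")
picture.appendChild(doc.createTextNode("steve1.jpg"))
person.appendChild(picture)

# Change the value of the existing TITLE element.
title = person.getElementsByTagName("TITLE")[0]
title.firstChild.data = "Senior Software Engineer"

# Step 3: turn the tree back into text, ready to save or send.
result = person.toxml()
print(result)
```

Because the whole document lives in memory, any element can be reached and changed at any time; that convenience is also what makes DOM costly for very large documents.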

Simple API for XML (SAX)

A more efficient alternative to DOM is the Simple API for XML (SAX). Originally designed for the Java programming language, SAX is now available for many different languages, such as C++.
SAX is an event-driven technology.  From a technical perspective, this means that rather than loading the entire XML tree into your computer’s memory (as DOM does), SAX reads the document one piece at a time (for example, the start of an element, its text, or its end) and notifies your program as each piece is read; it then moves on to the next one.  In many situations, SAX may be more efficient than DOM; however, it is often more complex to use.
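
Here is a minimal sketch of the event-driven style, using the xml.sax module from Python's standard library. The handler class PeopleHandler and its attributes are assumptions for this example; the parser calls its methods as it encounters each piece of the document.

```python
import xml.sax

PEOPLE_XML = b"""<PEOPLE>
  <PERSON>
    <FULL_NAME>Steve Kwan</FULL_NAME>
    <PHONE><PH_TYPE>Work</PH_TYPE><PH_NUMBER>604-255-4366</PH_NUMBER></PHONE>
    <PHONE><PH_TYPE>Home</PH_TYPE><PH_NUMBER>604-111-2222</PH_NUMBER></PHONE>
  </PERSON>
  <PERSON>
    <FULL_NAME>Baseer Khan</FULL_NAME>
    <PHONE><PH_TYPE>Work</PH_TYPE><PH_NUMBER>604-255-4366</PH_NUMBER></PHONE>
  </PERSON>
</PEOPLE>"""


class PeopleHandler(xml.sax.ContentHandler):
    """Collects full names and counts PHONE elements as events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.phone_count = 0
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when an opening tag is read.
        if name == "PHONE":
            self.phone_count += 1
        elif name == "FULL_NAME":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Called for text; it may arrive in several chunks, so buffer it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when a closing tag is read.
        if name == "FULL_NAME":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = PeopleHandler()
xml.sax.parseString(PEOPLE_XML, handler)
print(handler.names, handler.phone_count)
```

The handler has to track its own state (here, whether it is inside a FULL_NAME element), which is the extra complexity mentioned above; in exchange, the full tree is never held in memory.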

Importing/Exporting Records in XML

Now that you know the details of XML and how it works, let’s talk about issues relating to XML record management.
The first step to using XML is getting your data into (or out of) XML format.  Converting your data to XML could be a simple task, or an exceedingly difficult one, depending upon your database architecture.
While there are several different types of databases available, we will focus on the two most prominent ones:

  • Hierarchical databases, which model data similar to your business processes, in a top-down hierarchical fashion
  • Relational databases, which break data up into well-organized tables and link them all together.
If you require more information on the topic of different database architectures, I recommend you read MINISIS Inc.’s database architecture discussion.

A Hierarchical Database Approach

Hierarchical databases are ideal for XML data, because XML is stored in hierarchical format.  So, if you are using a hierarchical database, XML data is your friend.
Let’s say you have an XML file that stores data in the format shown below.  (For those of you paying attention, this is the same format used in the Sample XML Records section; you might want to take a look at the examples there if you haven’t already.)

PEOPLE
  PERSON
    FULL_NAME
    TITLE
    PHONE
      PH_TYPE
      PH_NUMBER
    ADDRESS
      CITY
      COUNTRY
    PICTURE

Because XML is stored in a hierarchical format, if you are using a hierarchical database, you can make a near-exact duplicate of the data structure used in the XML file.  For example, if you are using the MINISIS database engine, you could make an equivalent database to the XML file, as shown below.

XML data structure      MINISIS database structure
PEOPLE
  PERSON                PERSON          database
    FULL_NAME             FULL_NAME     character field
    TITLE                 TITLE         repeating field
    PHONE                 PHONE         repeating group
      PH_TYPE               PH_TYPE     character field
      PH_NUMBER             PH_NUMBER   character field
    ADDRESS               ADDRESS       group
      CITY                  CITY        character field
      COUNTRY               COUNTRY     character field
    PICTURE               PICTURE       repeating field

Note that in the MINISIS database, you don’t need the PEOPLE root element as you would in the XML document, so we don’t need to map that over.  However, if you wanted to, you could.
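
To illustrate why a hierarchical record maps so directly onto XML, the sketch below exports a nested record to XML with Python's standard xml.etree.ElementTree module. The nested dictionary is only a stand-in for a hierarchical database record: lists play the role of repeating fields or groups and inner dictionaries the role of groups. None of this is the MINISIS API; the names add_field and record_to_xml are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# A nested Python structure standing in for one hierarchical PERSON record.
record = {
    "FULL_NAME": "Steve Kwan",
    "TITLE": ["Software Engineer"],
    "PHONE": [
        {"PH_TYPE": "Work", "PH_NUMBER": "604-255-4366"},
        {"PH_TYPE": "Home", "PH_NUMBER": "604-111-2222"},
    ],
    "ADDRESS": {"CITY": "Vancouver", "COUNTRY": "Canada"},
    "PICTURE": ["steve1.jpg", "steve2.jpg"],
}


def add_field(parent, name, value):
    """Append one field to parent, recursing into repeats and groups."""
    if isinstance(value, list):
        # Repeating field or group: one element per occurrence.
        for occurrence in value:
            add_field(parent, name, occurrence)
    elif isinstance(value, dict):
        # Group: an element whose children are the group's fields.
        group = ET.SubElement(parent, name)
        for sub_name, sub_value in value.items():
            add_field(group, sub_name, sub_value)
    else:
        # Simple field: an element with a text value.
        ET.SubElement(parent, name).text = str(value)


def record_to_xml(record, tag="PERSON"):
    """Build an XML element from a nested record."""
    element = ET.Element(tag)
    for name, value in record.items():
        add_field(element, name, value)
    return element


# Wrap the record in the PEOPLE root element that an XML document requires.
people = ET.Element("PEOPLE")
people.append(record_to_xml(record))
xml_text = ET.tostring(people, encoding="unicode")
print(xml_text)
```

The export needs no lookups or joins: the shape of the record is already the shape of the XML.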

A Relational Database Approach

As stated earlier, a relational database system operates by storing related data in a lot of small databases and linking them together.  These links are created by assigning ID numbers to each record.
If you were going to build the example system we’ve been using (the PERSON system), you would create a relational database similar to the one depicted to the right. The PERSON database would store basic information about the person, such as their name, whereas the other information would be offloaded to other databases, such as TITLE, ADDRESS, PHONE, and PICTURE.  When someone runs a query on this database, the other databases would also need to be loaded, which would result in a slower system.  This is why, despite their current popularity, relational databases may start to lose ground.
The database definitions for this system would look like the ones shown below.

PERSON Table
id  parent_id  root_id  occurrence  full_name
1                                   Michael Moore
2                                   Henry Moore

TITLE Table
id  parent_id  root_id  occurrence  title
1   1          1        1           Artist
2   1          1        2           Scuba Diver
3   2          2        1           Director

PHONE Table
id  parent_id  root_id  occurrence  type    number
1   1          1        1           mobile  78-76-54-5453
2   2          2        1           home    76-45-435-223
3   2          2        2           mobile  7865-65-4333

ADDRESS Table
id  parent_id  root_id  occurrence  city      country
1   1          1        1           Paris     France
2   2          2        1           New York  USA

PICTURE Table
id  parent_id  root_id  occurrence  picture
1   1          1        1           C:PICSIMG0245.JPG
2   1          1        2           C:PICSIMG890E.JPG
3   2          2        1           C:PICSIMG0289.JPG

As you can see, the relational database structure provided here is quite different from the hierarchical structure provided earlier.  This makes relational databases an awkward fit for XML data.  The best XML compliance will be found in systems with a hierarchical database structure, such as the MINISIS RDBMS.
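
The extra work involved can be seen in a short sketch that breaks an XML document into linked rows like those in the PERSON and TITLE tables above. The XML input is a reconstruction made for this example (the tables do not come with a source document), and the row dictionaries are a stand-in for real database tables.

```python
import xml.etree.ElementTree as ET

# An XML document assumed to hold the same data as the PERSON and TITLE tables.
PEOPLE_XML = """
<PEOPLE>
  <PERSON>
    <FULL_NAME>Michael Moore</FULL_NAME>
    <TITLE>Artist</TITLE>
    <TITLE>Scuba Diver</TITLE>
  </PERSON>
  <PERSON>
    <FULL_NAME>Henry Moore</FULL_NAME>
    <TITLE>Director</TITLE>
  </PERSON>
</PEOPLE>
"""

person_rows = []
title_rows = []

persons = ET.fromstring(PEOPLE_XML).findall("PERSON")
for person_id, person in enumerate(persons, start=1):
    # One row per person in the PERSON table.
    person_rows.append({"id": person_id, "full_name": person.findtext("FULL_NAME")})
    # Each repeated TITLE becomes its own row, linked back by ID numbers.
    for occurrence, title in enumerate(person.findall("TITLE"), start=1):
        title_rows.append({
            "id": len(title_rows) + 1,
            "parent_id": person_id,
            "root_id": person_id,
            "occurrence": occurrence,
            "title": title.text,
        })

for row in title_rows:
    print(row)
```

Every repeating or nested element needs its own table and its own linking columns, and exporting back to XML requires the reverse process of reassembling the rows.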

How Does MINISIS Work With XML?

There are two ways to handle XML data in the MINISIS RDBMS:

  • Batch XML Document File Transfers
  • Online XML Document Transfers.

Batch XML Document File Transfers

A batch transfer involves loading a record (or set of records) from an XML file stored on your hard disk.
As shown in the diagram below, the user generates XML data via an XML client application.  This application could be any application which outputs data in XML format.  The XML data is then loaded into a special conversion program, which is responsible for loading the data into MINISIS.
The batch transfer system is effective for large, infrequent file transfers. If your system requires live transactions, you may be better off using a real-time, online XML solution such as the one described in the next section.
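
As a rough picture of what a batch conversion step does, the sketch below reads PERSON records from an XML file one at a time and hands each to a loading function. It is not the MINISIS conversion program: load_record is a hypothetical placeholder for the database loading step, and an in-memory file stands in for an XML file on disk.

```python
import io
import xml.etree.ElementTree as ET


def load_record(person, store):
    """Hypothetical stand-in for loading one record into a database."""
    store.append(person.findtext("FULL_NAME"))


def batch_import(xml_file, store):
    """Load every PERSON record in the file and return how many were loaded."""
    count = 0
    # iterparse reports each element once its closing tag has been read,
    # so records can be processed without reading the whole file first.
    for event, element in ET.iterparse(xml_file, events=("end",)):
        if element.tag == "PERSON":
            load_record(element, store)
            element.clear()  # discard the record's contents to save memory
            count += 1
    return count


# An in-memory file used here in place of an XML file on the hard disk.
sample_file = io.BytesIO(
    b"<PEOPLE>"
    b"<PERSON><FULL_NAME>Steve Kwan</FULL_NAME></PERSON>"
    b"<PERSON><FULL_NAME>Baseer Khan</FULL_NAME></PERSON>"
    b"</PEOPLE>"
)

loaded = []
imported = batch_import(sample_file, loaded)
print(imported, loaded)
```

Processing records one at a time suits the large, infrequent transfers described above, since memory use stays modest however many records the file contains.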

Online XML Document Transfers

For real-time systems, such as online resources, you will want to use the MINISIS Web Interface (MWI) to web-enable your database and XML-enable it.  With MWI, a user can create, view and update records through a web-enabled XML application, such as a web browser.

Further Information Sources

So now that you’re on the XML bandwagon, you surely want to know where to look to learn more about XML and how to use it.  Below are some suggested resources.

Online Resources

Expressing Simple Dublin Core in RDF/XML
http://dublincore.org/documents/2002/07/31/dcmes-xml/
Dublin Core is a format used for the interchange of information regarding museum collections.  Dublin Core is very useful to museums wishing to XML-enable their collections.
Encoded Archival Description (EAD)
http://www.loc.gov/ead/tglib/index.asp
EAD is an XML interchange format used by archival organizations.
MARC 21 XML Schema
http://www.loc.gov/standards/marcxml/
MARC 21 XML Schema (MARCXML) is an XML representation of the MARC format used by libraries and bibliographic systems.
Extensible Markup Language (XML) 1.0
http://www.w3.org/TR/WD-xml
The definition of the XML language; this document is probably too technical for most users.

 

 

extradrmtech

I have worked on database architecture and data migration protocols for 20 years. I am also a consultant in web content management solutions. I am an experienced web developer with over 10 years of experience developing PHP/MySQL, C# and VB.Net applications, ranging from simple websites to extensive web-based business applications. When not writing code, I like to dance salsa and swing and have fun with my little family.
