XML (Extensible Markup Language) is a simplified subset of the Standard Generalized Markup Language (SGML). It provides a file format for representing data, a schema for describing data structure, and a mechanism for extending and annotating HTML with semantic information.
XML uses tags to encode extra document information. XML will look very familiar to those who know about SGML and HTML.
XML is intended "to make it easy and straightforward to use SGML on the Web, easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web."
It defines "an extremely simple dialect of SGML which is completely described in the XML Specification. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. For this reason, XML has been designed for ease of implementation and for interoperability with both SGML and HTML."
Differences between XML and SGML :
XML has been carefully designed with the goal that every valid XML document should also be an SGML document. There are some areas of difference between XML and SGML, but these are minor and should not cause practical problems.
There are a few areas where XML and SGML really differ:
- XML's white space handling rules are much less elaborate than those of SGML. One effect is that in a few, rarely encountered cases, an XML processor will pass through some white space (mostly line-ends) that an SGML processor will suppress. It is very unlikely that an XML author or user will ever notice this.
- XML defines, for documents, the property of being well formed; this does not really correspond to any SGML concept.
- XML has a very specific built-in method for handling international (non-ASCII) text. It is compatible with SGML.
The biggest difference between XML and HTML is that in XML, you can define your own tags for your own purposes, and if you want, share those tags with other users.
XML differs from HTML in three major respects:
- Information providers can define new tag and attribute names at will.
- Document structures can be nested to any level of complexity.
- Any XML document can contain an optional description of its grammar for use by applications that need to perform structural validation.
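The three points above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. The address-book vocabulary (addressbook, contact, phones, phone) is invented for illustration and does not come from any standard. The DOCTYPE block is the optional grammar description; ElementTree reads past it but does not validate against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: every tag and attribute name here is made up.
ADDRESS_BOOK = """<?xml version="1.0"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (contact*)>
  <!ELEMENT contact (name, phones)>
  <!ATTLIST contact id CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phones (phone*)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST phone kind CDATA #IMPLIED>
]>
<addressbook>
  <contact id="c1">
    <name>Ada Example</name>
    <phones>
      <phone kind="work">555-0100</phone>
      <phone kind="home">555-0101</phone>
    </phones>
  </contact>
</addressbook>
"""

root = ET.fromstring(ADDRESS_BOOK)


def depth(element):
    """Return how many levels of nesting sit at or below this element."""
    return 1 + max((depth(child) for child in element), default=0)


# Collect the custom attribute and text of every <phone> element.
phones = [(p.get("kind"), p.text) for p in root.iter("phone")]
print(root.tag, depth(root), phones)
```

Nothing in the parser knows what a "contact" or a "phone" is; the application decides that, which is the point of user-defined tags.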
XML as a Structured Language :
XML is a markup language for documents containing structured information. Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different meaning from content in a footnote, which means something different than content in a figure caption or content in a database table, etc.). Almost all documents have some structure.
A markup language is a mechanism to identify structures in a document. The XML specification defines a standard way to add markup to documents.
Structured data includes things like spreadsheets, address books, configuration parameters, financial transactions, and technical drawings. XML is a set of rules for designing text formats that let you structure your data. XML is not a programming language, and you don't have to be a programmer to use it or learn it. XML makes it easy for a computer to generate data, read data, and ensure that the data structure is unambiguous. XML avoids common pitfalls in language design: it is extensible, platform-independent, and it supports internationalization and localization. XML is fully Unicode-compliant.
Unicode provides a unique number for every character, no matter what the platform, program and language are.
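As a small illustration of what "a unique number for every character" means in practice, the sketch below uses Python's built-in ord() and the standard xml.etree.ElementTree module. The city element is a made-up example. It shows the same character written literally and as a numeric character reference, which an XML parser treats as identical text.

```python
import xml.etree.ElementTree as ET

# The character "é" has the same code point, U+00E9 (decimal 233), everywhere.
code_point = ord("é")

# The same text written literally and with a numeric character reference.
literal = ET.fromstring("<city>Montréal</city>")
by_reference = ET.fromstring("<city>Montr&#233;al</city>")
same_text = literal.text == by_reference.text

# Serializing to plain ASCII falls back to a character reference for "é".
ascii_bytes = ET.tostring(literal, encoding="us-ascii")

print(code_point, same_text, ascii_bytes)
```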
XML an important development :
XML is an important development which removes two constraints that were holding back Web developments:
- The dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for.
- The complexity of full SGML, whose syntax allows many powerful but hard-to-program options.
Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form name="value"). While HTML specifies what each tag and attribute means, and often how the text between them will look in a browser, XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it. In other words, if you see "<p>" in an XML file, do not assume it is a paragraph. Depending on the context, it may be a price, a parameter, a person, a p... (and who says it has to be a word with a "p"?).
Programs that produce spreadsheets, address books, and other structured data often store that data on disk, using either a binary or text format. One advantage of a text format is that it allows people, if necessary, to look at the data without the program that produced it; in a pinch, you can read a text format with your favorite text editor. Text formats also allow developers to more easily debug applications. Like HTML, XML files are text files that people shouldn't have to read, but may when the need arises. Less like HTML, the rules for XML files are strict. A forgotten tag, or an attribute without quotes makes an XML file unusable, while in HTML such practice is tolerated and is often explicitly allowed. The official XML specification forbids applications from trying to second-guess the creator of a broken XML file; if the file is broken, an application has to stop right there and report an error.
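The strictness described above is easy to observe with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the note documents and the helper name check_well_formed are invented for illustration. A forgotten end tag or an unquoted attribute value makes the parser stop and report an error instead of guessing.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


good = '<note priority="high"><to>Ana</to></note>'
missing_end_tag = '<note priority="high"><to>Ana</note>'
unquoted_attribute = '<note priority=high><to>Ana</to></note>'

for label, text in [("good", good),
                    ("missing end tag", missing_end_tag),
                    ("unquoted attribute", unquoted_attribute)]:
    ok, message = check_well_formed(text)
    print(label, "->", "well formed" if ok else f"rejected: {message}")
```

An HTML browser would typically render something for all three inputs; an XML parser accepts only the first.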
Since XML is a text format and it uses tags to delimit the data, XML files are nearly always larger than comparable binary formats. That was a conscious decision by the designers of XML. The advantages of a text format are evident (as described above), and the disadvantages can usually be compensated for at a different level. Disk space is less expensive than it used to be, and compression programs like zip and gzip can compress files very well and very fast. In addition, communication protocols such as modem protocols and HTTP/1.1, the core protocol of the Web, can compress data on the fly, saving bandwidth as effectively as a binary format.
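To get a feel for how well tagged text compresses, here is a rough sketch using Python's standard gzip module. The table/row data is synthetic and deliberately repetitive, so the ratio it prints is only indicative; real documents will compress more or less well depending on their content.

```python
import gzip

# Build a synthetic, tag-heavy XML document (made-up element names).
rows = "".join(
    f'<row id="{i}"><name>item{i}</name><qty>{i % 7}</qty></row>'
    for i in range(1000)
)
xml_bytes = f"<table>{rows}</table>".encode("utf-8")

# Compress and restore; the round trip must give back the original bytes.
compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)
ratio = len(compressed) / len(xml_bytes)

print(len(xml_bytes), "bytes raw,", len(compressed), "bytes gzipped,",
      f"ratio {ratio:.2f}")
```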
The XML Family :
The "XML family" is a growing set of modules that offer useful services to accomplish important and frequently demanded tasks.
- XLink - XLink describes a standard way to add hyperlinks to an XML file.
- XPointer and XFragments - XPointer and XFragments are syntaxes in development for pointing to parts of an XML document. An XPointer is a bit like a URL, but instead of pointing to documents on the Web, it points to pieces of data inside an XML file.
- CSS - CSS, the style sheet language, is applicable to XML as it is to HTML.
- XSL - XSL is the advanced language for expressing style sheets. It is based on XSLT, a transformation language used for rearranging, adding and deleting tags and attributes.
- DOM - The DOM is a standard set of function calls for manipulating XML (and HTML) files from a programming language.
- XML Schemas - XML Schemas 1 and 2 help developers to precisely define the structures of their own XML-based formats.
There are several more modules and tools available or under development. Keep an eye on W3C's technical reports page.
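Of the modules listed above, the DOM is the easiest to try directly. The sketch below uses xml.dom.minidom from Python's standard library, which implements a subset of the W3C DOM interfaces. The library/book document is a made-up example. It reads an element and an attribute, then creates a new element and attaches it to the tree.

```python
from xml.dom import minidom

doc = minidom.parseString('<library><book id="b1">XML Basics</book></library>')

# Read: standard DOM calls for finding elements, attributes, and text.
first_book = doc.getElementsByTagName("book")[0]
first_id = first_book.getAttribute("id")
first_title = first_book.firstChild.data

# Modify: create a new element with an attribute and a text node.
new_book = doc.createElement("book")
new_book.setAttribute("id", "b2")
new_book.appendChild(doc.createTextNode("DOM in Practice"))
doc.documentElement.appendChild(new_book)

serialized = doc.documentElement.toxml()
print(first_id, first_title)
print(serialized)
```

The same method names (getElementsByTagName, createElement, appendChild) appear in DOM bindings for other programming languages, which is what makes it a standard set of function calls.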
XHTML, short for Extensible Hypertext Markup Language, is a hybrid between HTML and XML specifically designed for Net device displays.
XHTML is a markup language written in XML; therefore, it is an XML application.
XHTML 1.0 comes in three variants, Strict, Transitional, and Frameset, which correspond to the three HTML 4.0 DTDs. Its element and attribute names are qualified by an XML namespace identified by a URI reference. Namespaces prevent identically named custom tags used in different XML documents from being read the same way.
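The name-collision problem that namespaces address can be sketched as follows, using Python's standard xml.etree.ElementTree. The report document and the http://example.com/ns/books URI are invented for illustration; http://www.w3.org/1999/xhtml is the XHTML namespace URI. Two elements are both called title, yet the parser keeps them apart because each is qualified by a different namespace.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define a <title> element.
DOC = """<report xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:b="http://example.com/ns/books">
  <h:title>Quarterly report</h:title>
  <b:title>XML Basics</b:title>
</report>"""

# Prefix-to-URI map used for lookups; the prefixes are local shorthand only.
NS = {"h": "http://www.w3.org/1999/xhtml", "b": "http://example.com/ns/books"}

root = ET.fromstring(DOC)
page_title = root.find("h:title", NS).text
book_title = root.find("b:title", NS).text

# ElementTree stores each tag as "{namespace-uri}local-name".
expanded_tags = [child.tag for child in root]
print(page_title, "|", book_title)
print(expanded_tags)
```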
XHTML markup must conform to the markup standards defined in an HTML DTD.
A DTD states what tags and attributes are used to describe content in an SGML document, where each tag is allowed, and which tags can appear within other tags. For example, in a DTD one could say that LIST tags can contain ITEM tags, but ITEM tags cannot contain LIST tags. In some editors, when authors are inputting information, they can place tags only where the DTD allows. This ensures that all the documentation is formatted the same way.
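The LIST and ITEM rule can be illustrated with a small containment check. Python's standard library does not validate documents against a DTD, so the sketch below is a hand-written stand-in, not a DTD validator: the ALLOWED_CHILDREN table and the violations function are assumptions made for this example, and they check only which tags may appear inside which, not how many times.

```python
import xml.etree.ElementTree as ET

# Roughly equivalent DTD declarations, shown for reference only:
#   <!ELEMENT LIST (ITEM+)>
#   <!ELEMENT ITEM (#PCDATA)>
ALLOWED_CHILDREN = {"LIST": {"ITEM"}, "ITEM": set()}


def violations(element):
    """Return 'parent>child' strings for every nesting the rules forbid."""
    found = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN.get(element.tag, set()):
            found.append(f"{element.tag}>{child.tag}")
        found.extend(violations(child))
    return found


ok = ET.fromstring("<LIST><ITEM>one</ITEM><ITEM>two</ITEM></LIST>")
bad = ET.fromstring(
    "<LIST><ITEM>one<LIST><ITEM>nested</ITEM></LIST></ITEM></LIST>"
)

print(violations(ok))   # no forbidden nesting
print(violations(bad))  # reports the LIST placed inside an ITEM
```

An editor that consults the DTD while the author types is doing the same kind of lookup before it allows a tag to be inserted.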
When applied to Net devices, XHTML must go through a modularization process. This enables XHTML pages to be read by many different platforms.
By choosing XML as the basis for a project, you gain access to a large and growing community of tools (one of which may already do what you need!) and engineers experienced in the technology. Opting for XML is a bit like choosing SQL for databases: you still have to build your own database and your own programs and procedures that manipulate it, and there are many tools available and many people who can help you. And since XML is license-free, you can build your own software around it without paying anybody anything. The large and growing support means that you are also not tied to a single vendor. XML isn't always the best solution, but it is always worth considering.
The alternative to XML for these applications is proprietary code embedded as "script elements" in HTML documents and delivered in conjunction with proprietary browser plug-ins or Java applets. XML derives from a philosophy that data belongs to its creators and that content providers are best served by a data format that does not bind them to particular script languages, authoring tools, and delivery engines but is instead standardized and vendor-independent.
Web applications of XML :
The applications that will drive the acceptance of XML are those that cannot be accomplished within the limitations of HTML. These applications can be divided into four broad categories:
- Applications that require the Web client to mediate between two or more heterogeneous databases.
- Applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client.
- Applications that require the Web client to present different views of the same data to different users.
- Applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.
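The third category, different views of the same data, can be sketched briefly. In the example below, written with Python's standard xml.etree.ElementTree, the orders document and the two view functions (clerk_view and manager_view) are hypothetical. The client receives one XML document and derives two presentations from it without going back to the server.

```python
import xml.etree.ElementTree as ET

# One data set, delivered once (made-up element names and values).
ORDERS = """<orders>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>75.50</total></order>
  <order id="1003"><customer>Acme</customer><total>120.25</total></order>
</orders>"""

root = ET.fromstring(ORDERS)


def clerk_view(orders):
    """Per-order listing: order id and customer name."""
    return [f'{o.get("id")}: {o.findtext("customer")}'
            for o in orders.iter("order")]


def manager_view(orders):
    """Summary view: total order value per customer."""
    totals = {}
    for o in orders.iter("order"):
        name = o.findtext("customer")
        totals[name] = totals.get(name, 0.0) + float(o.findtext("total"))
    return totals


print(clerk_view(root))
print(manager_view(root))
```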
Electronic Data Interchange has been used in e-commerce for many years to exchange documents between commercial partners to a transaction. It has required special proprietary software, but there are now moves to enable EDI documents to travel inside XML.
EDI (Electronic Data Interchange) works by providing a collection of standard message formats and an element dictionary that give businesses a simple way to exchange data via any electronic messaging service.
XML/EDI provides a standard framework for exchanging different types of data, such as an invoice, a healthcare claim, or a project status report. Whether the information sits in a transaction, is exchanged via an Application Program Interface (API), or appears in web automation, a database portal, a catalog, a workflow document, or a message, it can be searched, decoded, manipulated, and displayed consistently and correctly. This is achieved by first implementing EDI dictionaries and then extending the vocabulary via on-line repositories to include a business's own language, rules, and objects. Combining XML and EDI thus creates a powerful new paradigm that is different from either XML or EDI alone.
The combination of XML with EDI holds the promise of extending the advantages of Web-based EDI through an open standard to the millions of small- and medium-sized enterprises.
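As a rough idea of what a business document carried in XML might look like, the sketch below builds and reads back a tiny invoice with Python's standard xml.etree.ElementTree. It does not follow any actual EDI or XML/EDI message standard: the element and attribute names (invoice, buyer, line, sku, qty, unitPrice) and the helper functions are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_invoice(number, buyer, lines):
    """Serialize an invoice as XML text (hypothetical vocabulary)."""
    invoice = ET.Element("invoice", {"number": number})
    ET.SubElement(invoice, "buyer").text = buyer
    for sku, qty, unit_price in lines:
        ET.SubElement(invoice, "line", {
            "sku": sku,
            "qty": str(qty),
            "unitPrice": f"{unit_price:.2f}",
        })
    return ET.tostring(invoice, encoding="unicode")


def invoice_total(xml_text):
    """Receiving side: parse the message and add up the line amounts."""
    root = ET.fromstring(xml_text)
    return sum(int(line.get("qty")) * float(line.get("unitPrice"))
               for line in root.iter("line"))


message = build_invoice("INV-001", "Example Traders",
                        [("A1", 2, 10.00), ("B7", 1, 4.50)])
print(message)
print(invoice_total(message))
```

Because the message is plain XML, the receiving partner needs only a generic parser and an agreed vocabulary rather than special proprietary software.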