Understanding XmlReader, XmlWriter & XmlDocument in C#

Author: shabbir
XML is a standard defined by the World Wide Web Consortium (W3C), and XML documents are widely used for transferring information over the internet. The .NET Framework provides several namespaces and classes that allow developers to read, write and interact with XML documents. In my article on LINQ to XML, I showed how the types in the System.Xml.Linq namespace can be used to work with XML documents. However, there are many other types that can be used to read, write and manipulate XML documents. In this article, we are going to look at three of them from the System.Xml namespace: XmlReader, XmlWriter and XmlDocument.

XmlReader



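The XmlReader class belongs to the System.Xml namespace and reads an XML stream in a forward-only, read-only manner: it visits the nodes one at a time and does not load the whole document into memory, which makes it well suited to large files and low-level stream processing. XmlReader is an abstract class, so you do not construct it directly; you obtain an instance from its static Create method. Without spending more time on theory, let us see how XmlReader reads XML from a file and from a string. Have a look at our first example for this tutorial.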

Example1

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            // Read from a file; the using block disposes the reader (closing the file) when done
            using (XmlReader xmr = XmlReader.Create(@"D:\XMLFile.xml"))
            {
                // Read() advances to the next node and returns false when no nodes are left
                while (xmr.Read())
                {
                    Console.WriteLine(xmr.Name);
                }
            }

            Console.WriteLine("\n Reading from a string ...\n");

            string xmldoc = @"<student id='12' status='archived'>
                            <firstname>Mark</firstname>
                            <lastname>Taylor</lastname>
                            </student>";

            // Wrap the string in a StringReader so that XmlReader can consume it
            using (XmlReader xmr2 = XmlReader.Create(new System.IO.StringReader(xmldoc)))
            {
                while (xmr2.Read())
                {
                    Console.WriteLine(xmr2.Name);
                }
            }
            Console.ReadLine();
        }
    }
}
Pay attention to the code in Example1. First, we imported the System.Xml namespace, which contains the XmlReader class. After that, we created an XmlReader object named ‘xmr’. To obtain an XmlReader, you call the static Create method of the XmlReader class (XmlReader itself is abstract, so it cannot be created with new) and pass it the path of the XML file you want to read. Consider the following line of code:
Code:
XmlReader xmr = XmlReader.Create(@"D:\XMLFile.xml");
Here we are calling the static Create method to read an XML file named ‘XMLFile.xml’ located in the root of drive D. Each call to the Read method on the xmr object advances the reader to the next node, so calling it in a loop enumerates all of the nodes and lets us print their names.

Similarly, if you want to read a string that contains XML data, you can use the following technique, as we did in Example1:
Code:
XmlReader xmr2 = XmlReader.Create(new System.IO.StringReader(xmldoc));
In the above line we wrap the xmldoc string, which contains some XML, in a StringReader and pass it to Create. Again, we loop over the xmr2 reader to display the names of all the nodes.

Output1



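The output shows that when you read from the file, the first name displayed is ‘xml’: it is the name of the XML declaration node (<?xml ... ?>) at the top of the file, and XmlReader reports it like any other node. When you read from the string, which has no declaration, the first node is student. In the string example you will also see blank lines and repeated names: text and whitespace nodes have an empty Name, and end-element nodes such as </firstname> report the same name as their start tag. XmlReader’s Read method keeps returning true until there are no more nodes left to read in the document.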
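To see exactly which nodes Read visits, it can help to print each node's NodeType next to its Name. The following is a minimal sketch; NodeLister and ListNodes are hypothetical names written for this illustration, and the input string is a made-up sample.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration: records "NodeType:Name" for every node the reader visits.
public static class NodeLister
{
    public static List<string> ListNodes(string xml)
    {
        var nodes = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Text and whitespace nodes have an empty Name, so NodeType tells them apart
                nodes.Add(reader.NodeType + ":" + reader.Name);
            }
        }
        return nodes;
    }
}
```

For the input "<a x='1'><b>t</b></a>" this returns Element:a, Element:b, Text:, EndElement:b and EndElement:a. The attribute x does not appear, because Read never stops on attribute nodes; attributes are reached with the methods described later in this article.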

System.Xml also includes the XmlReaderSettings class, whose instances can be passed to Create to change how the reader processes any XML source. Three commonly used properties of XmlReaderSettings control which kinds of nodes the reader skips (all three are false by default):
  • bool IgnoreComments: skip comment nodes
  • bool IgnoreProcessingInstructions: skip processing instructions
  • bool IgnoreWhitespace: skip insignificant whitespace between elements
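As a rough illustration of these settings, the sketch below counts the nodes Read returns for the same string with default settings and with all three options turned on. SettingsDemo, CountNodes, CountFiltered and the sample string are hypothetical, chosen only for this example.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: counts the nodes returned by Read for the given settings.
public static class SettingsDemo
{
    public const string Sample = "<a>\n  <!-- note -->\n  <b/>\n</a>";

    public static int CountNodes(string xml, XmlReaderSettings settings)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                count++;
            }
        }
        return count;
    }

    public static int CountFiltered(string xml)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,                // drop <!-- ... --> nodes
            IgnoreProcessingInstructions = true,  // drop <?target ...?> nodes
            IgnoreWhitespace = true               // drop whitespace between elements
        };
        return CountNodes(xml, settings);
    }
}
```

With default settings the sample yields seven nodes (the elements plus three whitespace nodes and the comment), while CountFiltered returns 3: the start of a, the empty element b and the end of a.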
Reading Elements

If you know the structure and order of the XML document, you can read its elements directly with XmlReader’s element-reading methods. Before doing so, set IgnoreWhitespace to true: otherwise the whitespace between elements is reported as separate nodes, and the element-reading methods throw an exception when the reader is positioned on a whitespace node instead of the element they expect.

The first method to call is MoveToContent, which moves the reader past non-content nodes such as the XML declaration, comments and whitespace to the first content node, normally the root element. Next, call ReadStartElement and pass it the name of the root element. It consumes the start tag and moves the reader on, and from there you can read the child elements by calling ReadElementContentAsXXX, where XXX denotes the type of the element's content. For instance, the firstname element contains string data, so to get its content you call ReadElementContentAsString("firstname", ""). The first parameter is the name of the element you expect next, and the second is its namespace URI, which we leave empty because our document does not use namespaces. If an element contains an integer or a Boolean value, you can use ReadElementContentAsInt or ReadElementContentAsBoolean, respectively.

Keep in mind that these methods can only read the element the reader is currently positioned on, so you must read elements in document order. For instance, after ReadStartElement, if the next element is firstname, that is the only element you can read; if you try to read lastname first, an XmlException is thrown. You can skip over comment nodes by calling MoveToContent again, and finally you call ReadEndElement to consume the closing tag of the root element. All of these concepts are demonstrated in our next example. But before looking at it, first look at the XML file that we will be reading.

Create a text file and copy following content into that file:
Code:
<?xml version="1.0" encoding="utf-8" ?>
<student id="12" status="archived">
  <firstname>Mark</firstname>
  <lastname>Taylor</lastname>
  <!-- Nice Name !-->
  <percentage> 96.77 </percentage>
</student>
Name the file Student and save it with the extension ‘.xml’, so that the complete file name is “Student.xml”. Place this file in the root of your D drive.

In Example2, I will show you how to read the names and contents of the elements in “Student.xml” in order. Have a look at the code of our second example.

Example2

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XmlReaderSettings xmrs = new XmlReaderSettings();

            // Skip whitespace so that each Read* call lands directly on the next element
            xmrs.IgnoreWhitespace = true;
            using (XmlReader xmr = XmlReader.Create(@"D:\Student.xml", xmrs))
            {
                xmr.MoveToContent();               // skip the XML declaration
                xmr.ReadStartElement("student");   // consume <student>

                Console.Write(xmr.Name + ": ");
                Console.WriteLine(xmr.ReadElementContentAsString("firstname", ""));

                Console.Write(xmr.Name + ": ");
                Console.WriteLine(xmr.ReadElementContentAsString("lastname", ""));

                xmr.MoveToContent();               // skip the comment node

                Console.Write(xmr.Name + ": ");
                Console.WriteLine(xmr.ReadElementContentAsDouble("percentage", ""));
                xmr.ReadEndElement();              // consume </student>
            }

            Console.ReadLine();
        }
    }
}
In Example2, we first created an XmlReaderSettings object named xmrs and set its IgnoreWhitespace property to true. Next, we created an XmlReader object xmr by calling the Create method and passing it the path of the Student.xml file we created earlier, along with the settings. We then called the MoveToContent method, which moves the reader to the actual content, skipping the XML declaration. We know that the root element of Student.xml is “student”, so we called ReadStartElement and passed it this name. The reader consumes this start tag and then points to the first element inside it, “firstname”; we displayed the reader's Name property to confirm this. Since the content of ‘firstname’ is a string, we called ReadElementContentAsString, passing “firstname” as the first parameter and an empty string as the second. This method does two things: it returns the content of the “firstname” element, and it moves the reader past the element's end tag to the next node. If you now display the Name property of xmr, it shows “lastname”, the next element in the file. We again called ReadElementContentAsString, this time with “lastname”, to get the content of the “lastname” element. The reader now points to the next node in the file, which is a comment. To skip this comment node, we called MoveToContent, which moves the reader to the next content node, the “percentage” element. We can confirm this by reading the Name property again.
We know that the “percentage” element contains a decimal value, so to read its content we called ReadElementContentAsDouble and displayed the result on the screen. Finally, we called ReadEndElement to consume the closing </student> tag, the end of the root element. The output of the code in Example2 is as follows:

Output2

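The rule that ReadElementContentAsXXX can only read the element the reader is currently on is easy to check. In the minimal sketch below, which uses a hypothetical OrderCheck class and an inline copy of the student data, asking for lastname while the reader is positioned on firstname throws an XmlException, while reading in document order works.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class OrderCheck
{
    const string Xml =
        "<student><firstname>Mark</firstname><lastname>Taylor</lastname></student>";

    static XmlReader OpenAtFirstChild()
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        XmlReader reader = XmlReader.Create(new StringReader(Xml), settings);
        reader.MoveToContent();
        reader.ReadStartElement("student");   // reader is now on <firstname>
        return reader;
    }

    // Reads both names in document order.
    public static string ReadInOrder()
    {
        using (XmlReader reader = OpenAtFirstChild())
        {
            string first = reader.ReadElementContentAsString("firstname", "");
            string last = reader.ReadElementContentAsString("lastname", "");
            return first + " " + last;
        }
    }

    // Returns true if asking for lastname first throws an XmlException.
    public static bool OutOfOrderThrows()
    {
        using (XmlReader reader = OpenAtFirstChild())
        {
            try
            {
                reader.ReadElementContentAsString("lastname", "");
                return false;
            }
            catch (XmlException)
            {
                return true;
            }
        }
    }
}
```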

Reading Attributes

You can also read the attributes of the node the reader is currently positioned on. For instance, after calling MoveToContent but before calling ReadStartElement(), the reader is positioned on the root element. You can get the value of any attribute of the root element with the reader's indexer, reader["Attribute_Name"], where reader is your XmlReader object and Attribute_Name is the name of the attribute whose value you want; the indexer returns null if the element has no such attribute. When you then call ReadStartElement(), the reader moves to the first element inside the root element, and you can read that element's attributes in the same way. Our next example demonstrates this concept. But before moving on to it, modify your Student.xml file so that it looks like this:
Code:
<?xml version="1.0" encoding="utf-8" ?>
<student id="12" status="archived">
  <firstname age="20">Mark</firstname>
  <lastname>Taylor</lastname>
  <!-- Nice Name !-->
  <percentage> 96.77 </percentage>
</student>
We have simply added an attribute named “age” to the “firstname” element and given it the value 20. Now let us see how we can read these attributes from a C# program. Have a look at the third example of this article.

Example3

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XmlReaderSettings xmrs = new XmlReaderSettings();

            xmrs.IgnoreWhitespace = true;
            using (XmlReader xmr = XmlReader.Create(@"D:\Student.xml", xmrs))
            {
                xmr.MoveToContent();                   // now on <student>

                Console.WriteLine(xmr.AttributeCount); // number of attributes on <student>
                Console.WriteLine(xmr["id"]);
                Console.WriteLine(xmr["status"]);

                xmr.ReadStartElement();                // now on <firstname>

                Console.WriteLine(xmr["age"]);
            }
            Console.ReadLine();
        }
    }
}
Pay attention to the code in Example3. Here again we have an XmlReaderSettings object whose IgnoreWhitespace property is set to true. Next, we created an XmlReader object named xmr by calling the static Create method. We called MoveToContent to skip the XML declaration at the start, so the reader is now positioned on the root element.
We know that in our Student.xml file the root element is student, and that it has two attributes: id and status. You can get the number of attributes of the element the reader is positioned on from the AttributeCount property, as we did in this line of code:
Code:
Console.WriteLine(xmr.AttributeCount);
The above line displays the number of attributes of the student element.

To get the values of these attributes, we simply pass the attribute names, in square brackets, to the indexer of the xmr reader, as follows:
Code:
Console.WriteLine(xmr["id"]);
Console.WriteLine(xmr["status"]);
The above lines display the values of the id and status attributes of the root element student. After that, we call ReadStartElement to move to the next element. Now the xmr reader is positioned on the first child element, “firstname”. In our Student.xml file we added an attribute “age” with the value 20 to this element. So, if you now write this line of code:
Code:
Console.WriteLine(xmr["age"]);
The above line displays the value of the age attribute of “firstname”. This is how you can get the values of attributes on different elements as the reader moves through the document. The output of the code in Example3 is as follows:

Output3



In the output, 2 is the number of attributes of the root element student, and 12 is the value of its id attribute. Next comes the value of the status attribute, which is archived. Finally, 20 is displayed, which is the value of the age attribute of the “firstname” element.
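The indexer needs to know each attribute's name in advance. When you do not, XmlReader can walk all attributes of the current element with MoveToFirstAttribute and MoveToNextAttribute, and MoveToElement returns the reader to the owning element afterwards. A minimal sketch, using the hypothetical names AttributeLister and ListRootAttributes:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class AttributeLister
{
    // Returns "name=value" for every attribute on the root element, in document order.
    public static List<string> ListRootAttributes(string xml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();              // position on the root element
            if (reader.MoveToFirstAttribute())   // false if the element has no attributes
            {
                do
                {
                    result.Add(reader.Name + "=" + reader.Value);
                } while (reader.MoveToNextAttribute());
                reader.MoveToElement();          // back from the last attribute to the element
            }
        }
        return result;
    }
}
```

Called with "<student id='12' status='archived'/>", it returns "id=12" and "status=archived".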

There is another way to read attributes besides the indexer used in Example3. We can call the MoveToAttribute method and pass it the name of an attribute of the element the reader is currently positioned on. This moves the reader onto that attribute, and we can then get its value by calling ReadContentAsXXX, where XXX is the type of the attribute's value. This gives us strongly typed attribute parsing, so we can store the value directly in a variable of the corresponding data type. Our next example demonstrates this concept. Have a look at the fourth example of this tutorial:

Example4

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XmlReaderSettings xmrs = new XmlReaderSettings();

            xmrs.IgnoreWhitespace = true;
            using (XmlReader xmr = XmlReader.Create(@"D:\Student.xml", xmrs))
            {
                xmr.MoveToContent();               // now on <student>
                xmr.MoveToAttribute("id");         // now on the id attribute

                int id = xmr.ReadContentAsInt();
                xmr.MoveToAttribute("status");

                string status = xmr.ReadContentAsString();

                Console.WriteLine("id: " + id + "\nStatus: " + status);

                // ReadStartElement first moves back from the attribute to <student>,
                // consumes that start tag, and leaves the reader on <firstname>
                xmr.ReadStartElement();

                xmr.MoveToAttribute("age");
                int age = xmr.ReadContentAsInt();
                Console.WriteLine("age: " + age);
            }
            Console.ReadLine();
        }
    }
}
Look closely at the code in Example4. We have an XmlReaderSettings object xmrs whose IgnoreWhitespace property is set to true. We created an XmlReader object named xmr, passing the path to our Student.xml file as the first parameter of the Create method and the xmrs settings object as the second. Now we have a reader that can read the Student.xml file. We called the MoveToContent method, which skips the XML declaration and positions the reader on the root element, student. We know that the student element has two attributes. We first called MoveToAttribute and passed it the string “id”, which moves the reader onto the “id” attribute of the student element. Since the id attribute holds an integer, we can call ReadContentAsInt to get its value as an int, which we stored in the id variable. Next, we called MoveToAttribute again, this time with “status”, so the reader moves to the “status” attribute of the student element. We called ReadContentAsString to get the value of the “status” attribute, since we know it is a string, and stored it in the local variable status. We then displayed the values of both “id” and “status” on the screen.

Note that until now, the reader has been positioned within the student element (first on its id attribute, then on its status attribute), so we could only access the attributes of student. To access the attributes of other elements, we have to move the reader to those elements first. For instance, the firstname element has an attribute age. To read the value of age, we first have to move the reader to the firstname element and then call MoveToAttribute with the value “age”. We have done this in our code as well. First, we called ReadStartElement; because the reader was on an attribute, it first moves back to the owning student element, then consumes that start tag, leaving the reader on the first child element, firstname. Next, we called the MoveToAttribute method to move the reader to the “age” attribute. Finally, we called ReadContentAsInt to get the value of the “age” attribute, stored it in the local variable age and displayed it on the console. The output of the code in Example4 is as follows:

Output4


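Example4 assumes that every attribute it asks for is present. MoveToAttribute actually returns a bool: false when the current element has no attribute with that name, in which case the reader stays where it was. A sketch that uses this to fall back to a default value might look like the following; AttributeHelper, ReadIntAttribute and ReadRootIntAttribute are hypothetical names.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helpers for illustration only.
public static class AttributeHelper
{
    // Reads an int attribute of the current element, or returns defaultValue if it is absent.
    public static int ReadIntAttribute(XmlReader reader, string name, int defaultValue)
    {
        if (!reader.MoveToAttribute(name))
        {
            return defaultValue;          // reader position is unchanged
        }
        int value = reader.ReadContentAsInt();
        reader.MoveToElement();           // go back from the attribute to its element
        return value;
    }

    public static int ReadRootIntAttribute(string xml, string name, int defaultValue)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();       // position on the root element
            return ReadIntAttribute(reader, name, defaultValue);
        }
    }
}
```

Returning the reader to the element with MoveToElement keeps the helper from leaving the caller positioned on an attribute.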

XmlWriter



In the previous sections, we described how an XML document can be read, including its elements, attributes and their values. In this section, we are going to explain how you can use the XmlWriter class to write an XML document. XmlWriter belongs to the same System.Xml namespace as XmlReader, and the two classes share many common characteristics. Like XmlReader, an XmlWriter object is obtained through the static Create method, where the first parameter is the path of the XML file you want to write and the optional second parameter is an XmlWriterSettings object. To explain the XmlWriter class further, have a look at the fifth example of this article.

Example5

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XmlWriterSettings xmws = new XmlWriterSettings();

            xmws.Indent = true;   // put each element on its own indented line
            XmlWriter xmw = XmlWriter.Create(@"D:\Student1.xml", xmws);

            xmw.WriteStartDocument();                     // writes the <?xml ...?> declaration
            xmw.WriteStartElement("student");             // <student>
            xmw.WriteElementString("firstname", "Jaime"); // <firstname>Jaime</firstname>
            xmw.WriteElementString("lastname", "show");   // <lastname>show</lastname>
            xmw.WriteEndElement();                        // </student>
            xmw.WriteEndDocument();
            xmw.Close();                                  // flush the output and close the file

            Console.WriteLine("The XML document has been written ...");

            Console.ReadLine();
        }
    }
}
Pay attention to the code in Example5. At the beginning we have an XmlWriterSettings object, which is similar to the XmlReaderSettings object but is passed to an XmlWriter to control how it writes. We created the XmlWriterSettings object named ‘xmws’ and set its Indent property to true. The Indent property tells the XmlWriter to start each element on a new line and indent nested elements, which makes the generated file easy to read; by default, the document is written without any added line breaks. Next, we created an XmlWriter object named xmw through the Create method. Notice the similarity between the Create methods of XmlReader and XmlWriter: in both, the first parameter is a file path, and for XmlWriter it is the path and name of the file you want to create. The second parameter for XmlWriter is an optional XmlWriterSettings object; we passed our xmws object.

Now we have a handle for writing the XML document. We then called WriteStartDocument, which marks the start of our document and writes the XML declaration. This method is optional, and it is conventionally paired with a call to WriteEndDocument at the end, which closes any elements that are still open; we included both in Example5 for clarity. Next, we called WriteStartElement with “student”, which writes the start tag of the student element. Until we call the WriteEndElement method, all subsequent elements are written inside this student element. To write those child elements, we call WriteElementString. This method takes two parameters: the first is the name of the element, and the second is its text content. We passed “firstname” as the first parameter and “Jaime” as the second. Next, to add another element inside the student element, we called WriteElementString again, this time passing “lastname” and “show” as the first and second parameters respectively. This creates a “lastname” element after the “firstname” element, with the content “show”. We then called the WriteEndElement method, which writes the end tag of the student element. Finally, we called WriteEndDocument to mark the end of the document. An important thing to note here: DO NOT forget to call the Close method on the XmlWriter object. The writer buffers its output and keeps the underlying file open; Close flushes the remaining output and releases the file, and without it the file may be left incomplete. Alternatively, wrap the writer in a using block, which closes it automatically. The output of the code in Example5 is as follows:

Output5



The console output only confirms that the XML document has been written. To see the actual XML, go to the path you specified in the Create method. You will find an XML file there with the name you specified, and if you open it, you should see the following content:
Code:
<?xml version="1.0" encoding="utf-8"?>
<student>
  <firstname>Jaime</firstname>
  <lastname>show</lastname>
</student>
You can see that an XML declaration has been written at the start of the file. To omit the XML declaration, set the OmitXmlDeclaration property of XmlWriterSettings to true.
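As a sketch of that setting, the hypothetical StudentWriter class below writes the same student element into a StringBuilder instead of a file, with OmitXmlDeclaration set to true. It also uses a using block, so the writer is disposed (and therefore flushed and closed) even without an explicit Close call.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper for illustration only.
public static class StudentWriter
{
    public static string WriteStudent(string first, string last)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            OmitXmlDeclaration = true   // no <?xml ...?> line at the top
        };
        var sb = new StringBuilder();
        // Disposing the writer at the end of the using block flushes it into sb
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("student");
            writer.WriteElementString("firstname", first);
            writer.WriteElementString("lastname", last);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

Writing to a StringBuilder is convenient for testing; for a file, pass a path to Create as in Example5.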

Writing Attributes

Writing attributes is also a simple task. After calling WriteStartElement, and before writing any child content of that element, call WriteAttributeString. This method takes two parameters: the first is the attribute name and the second is the attribute value. Our next example demonstrates how to do this. Have a look at the sixth example of this article.

Example6

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XmlWriterSettings xmws = new XmlWriterSettings();

            xmws.Indent = true;
            XmlWriter xmw = XmlWriter.Create(@"D:\Student1.xml", xmws);

            xmw.WriteStartDocument();
            xmw.WriteStartElement("student");
            // Attributes must be written before any child content of <student>
            xmw.WriteAttributeString("id", "10");
            xmw.WriteAttributeString("status", "archived");

            xmw.WriteElementString("firstname", "Jaime");
            xmw.WriteElementString("lastname", "show");

            xmw.WriteEndElement();
            xmw.WriteEndDocument();
            xmw.Close();

            Console.WriteLine("The XML document has been written ...");
            Console.ReadLine();
        }
    }
}
The code in Example6 is similar to the one in Example5, but here we have added two attributes, “id” and “status”, to the root element student by calling the WriteAttributeString method twice, immediately after WriteStartElement(“student”). First we passed it “id” and “10” as the first and second parameters respectively, which creates an “id” attribute on the student element with the value 10. In the same way, we created another attribute, “status”, with the value “archived”. If you compile and run the code in Example6, you will find that the XML file it creates looks like this:
Code:
<?xml version="1.0" encoding="utf-8"?>
<student id="10" status="archived">
  <firstname>Jaime</firstname>
  <lastname>show</lastname>
</student>
You can see that the ‘id’ and ‘status’ attributes have been added to the root element student, with the values ‘10’ and ‘archived’ respectively.
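Two related details can be sketched briefly. First, WriteAttributeString takes a string, but an attribute can also be written in pieces with WriteStartAttribute, WriteValue and WriteEndAttribute, which lets XmlWriter convert a typed value such as an int. Second, an attribute must come before the element's child content; once a child element has been written, adding an attribute throws an InvalidOperationException. The class AttributeWriter and its methods are hypothetical names for this illustration.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helpers for illustration only.
public static class AttributeWriter
{
    // Writes <student id="..."> using a typed int value for the attribute.
    public static string WriteWithTypedAttribute(int id)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("student");
            writer.WriteStartAttribute("id");
            writer.WriteValue(id);          // converts the int to its XML string form
            writer.WriteEndAttribute();
            writer.WriteElementString("firstname", "Jaime");
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // Returns true if writing an attribute after child content is rejected.
    public static bool AttributeAfterContentThrows()
    {
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            writer.WriteStartElement("student");
            writer.WriteElementString("firstname", "Jaime");
            try
            {
                writer.WriteAttributeString("id", "10");   // too late: content already written
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
    }
}
```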

XmlDocument



The XmlDocument class is an in-memory representation of an XML document, based on the W3C Document Object Model (DOM). It plays a role similar to the X-DOM (XDocument) of LINQ to XML. Unlike XmlReader and XmlWriter, it holds the whole document in memory, so you can navigate it in any direction and modify it. XmlDocument is best explained with an example. In our next example, I am going to show you how XmlDocument can load and save an XML document and how we can navigate through it. Have a look at the seventh example of this article.

Example7

Code:
using System;
using System.Xml;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XmlDocument xmd = new XmlDocument();
            xmd.Load(@"D:\Student1.xml");   // parse the file into an in-memory tree
            xmd.Save(@"D:\Student2.xml");   // write the tree out to another file

            // DocumentElement is the root element, <student>
            Console.WriteLine(xmd.DocumentElement.ChildNodes[0].InnerText);
            Console.WriteLine(xmd.DocumentElement.ChildNodes[1].InnerText);
            Console.WriteLine(xmd.DocumentElement.ChildNodes[0].ParentNode.Name);
            Console.WriteLine(xmd.DocumentElement.FirstChild.Name);
            Console.WriteLine(xmd.DocumentElement.FirstChild.NextSibling.Name);
            Console.ReadLine();
        }
    }
}
Look closely at the code in Example7. First, we created an XmlDocument object by calling its parameterless constructor. To load an XML document into the XmlDocument object, you just call its Load method and pass it the path of the file you want to load. We loaded the Student1.xml file from the root of drive D. Our Student1.xml contains the following data:
Code:
<?xml version="1.0" encoding="utf-8"?>
<student id="10" status="archived">
  <firstname>Jaime</firstname>
  <lastname>show</lastname>
</student>
When you call the Load method of the XmlDocument object xmd and pass it the path of the above file, its content is parsed into xmd, which now holds an in-memory representation of that data.
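Just as XmlReader can read from a string, XmlDocument can parse XML held in a string with LoadXml instead of Load. A minimal sketch, with DocumentFromString and FirstChildText as hypothetical names:

```csharp
using System.Xml;

// Hypothetical helper for illustration only.
public static class DocumentFromString
{
    // Parses an XML string and returns the inner text of the root element's first child.
    public static string FirstChildText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);   // LoadXml parses a string; Load reads a file, stream or reader
        return doc.DocumentElement.FirstChild.InnerText;
    }
}
```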

Now, if you call Save on the xmd object and pass it the path of an XML file, the content above is written to that file (the file is created if it does not exist, and overwritten if it does). Next, consider this line of code:
Code:
Console.WriteLine(xmd.DocumentElement.ChildNodes[0].InnerText);
This is how you access the text content of a child node. DocumentElement returns the root element, student, and in the above line we read the InnerText of its child node at index 0. The child at index 0 in Student1.xml is “firstname”, which contains the inner text “Jaime”, so “Jaime” is displayed. (The whitespace between the elements does not count as a child node here, because XmlDocument discards insignificant whitespace by default.) Similarly, to display the content of the “lastname” node, you can pass index 1 to ChildNodes, which denotes the second child.
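XmlDocument's PreserveWhitespace property, which is false by default, decides whether the whitespace between elements becomes child nodes. The hypothetical WhitespaceDemo sketch below loads the same indented string both ways and reports the type of ChildNodes[0]: an element when whitespace is discarded, a whitespace node when it is preserved.

```csharp
using System.Xml;

// Hypothetical helper for illustration only.
public static class WhitespaceDemo
{
    public const string Sample = "<student>\n  <firstname>Jaime</firstname>\n</student>";

    // Returns the node type of the root element's first child.
    public static XmlNodeType FirstChildType(bool preserveWhitespace)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace;  // must be set before loading
        doc.LoadXml(Sample);
        return doc.DocumentElement.ChildNodes[0].NodeType;
    }
}
```

If you turn PreserveWhitespace on, index-based access such as ChildNodes[0] no longer lands on firstname, so navigating by element name is safer.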

If you want to get the name of the parent of the first child, you can simply use its ParentNode property and then the Name property of that node. This displays the name of the parent of the child node at the specified index. The following line of code does this in our example:
Code:
Console.WriteLine(xmd.DocumentElement.ChildNodes[0].ParentNode.Name);
Here we are printing the name of the parent node of the first child. Two other useful members are the FirstChild and LastChild properties, which every XmlNode (including XmlDocument and its elements) provides. They return the first and last child nodes themselves, and the Name property of those nodes gives their names. Consider the following line of code:
Code:
Console.WriteLine(xmd.DocumentElement.FirstChild.Name);
The above line of code displays the name of the first child node. Similarly, the NextSibling property of a node returns the node that immediately follows it at the same level. For instance, consider the following line of code:
Code:
Console.WriteLine(xmd.DocumentElement.FirstChild.NextSibling.Name);
In the above line of code, we are getting the name of the next sibling of the first child. This would return “lastname” because the next sibling of the first child is “lastname”. The output of the code in Example7 is as follows:

Output7



You can see that first the inner text of the first child node is printed, which is “Jaime”, and then the inner text of the second child node, which is “show”. Next, the name of the parent of the first child node is printed, which is “student”. Then the name of the first child is printed, which is “firstname”, and finally the name of the next sibling of the first child, which is “lastname”.
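To round off the navigation members discussed above, here is a sketch that loops over ChildNodes with foreach, reads an attribute with XmlElement.GetAttribute, uses LastChild, and finds an element with an XPath expression through SelectSingleNode. StudentNavigator and its methods are hypothetical names, and the input is the Student1.xml data held in a string.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration only.
public static class StudentNavigator
{
    // Returns "name=text" for every child node of the root element.
    public static List<string> ChildSummaries(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var result = new List<string>();
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            result.Add(child.Name + "=" + child.InnerText);
        }
        return result;
    }

    // Combines an attribute of the root, an XPath lookup and the name of the last child.
    public static string RootSummary(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        string id = root.GetAttribute("id");                         // "" if the attribute is missing
        XmlNode first = doc.SelectSingleNode("/student/firstname");  // XPath lookup
        return id + "|" + first.InnerText + "|" + root.LastChild.Name;
    }
}
```

For the Student1.xml content, ChildSummaries returns "firstname=Jaime" and "lastname=show", and RootSummary returns "10|Jaime|lastname".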