shabbir 9Apr2014 22:19

LINQ to XML
 
XML stands for Extensible Markup Language and is widely used for exchanging information over the World Wide Web. XML documents are simple to write and easy to understand, both for humans and for software applications. With a little attention, you can usually tell what a given XML document is about. However, this is not enough. An XML document can contain thousands of lines of information, and reading and processing such a document manually would be tedious and time-consuming. The real power of XML comes from its ability to be read by software applications. Because XML documents follow well-defined standards, an application that understands those standards can process large XML documents just as a human would, only much faster. This is the idea behind LINQ to XML.

Luckily for developers, the .NET Framework provides a built-in capability to interact with XML documents via LINQ to XML, which is basically a Document Object Model (DOM) together with a standard set of query operators for working with XML documents. LINQ to XML was introduced in .NET Framework 3.5. Before going into the details of how to interact with XML documents, let us first clear up the concept of a DOM. Consider an XML document that contains the following data:
Code:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<student id="12" status="archived">
    <firstname>Mark</firstname>
    <lastname>Taylor</lastname>
</student>

If you look at the above lines, you will see that this is a traditional XML document with an XML declaration on the first line. The document contains a root element named ‘student’. This root element has two attributes, ‘id’ and ‘status’, with values 12 and ‘archived’ respectively. It also has two child elements, ‘firstname’ and ‘lastname’, which contain the text Mark and Taylor respectively.

If you look closely at the above markup, you can see that this XML document can be represented by a class in C# with elements, attributes, values and text as its members, and that the child elements can be represented by collections within the outer class. Continuing this way, a tree of objects can be built, which is called a DOM (Document Object Model).

Now, coming to LINQ to XML, it basically comprises two major parts:
  1. X-DOM, the XML Document Object Model.
  2. LINQ query operators to interact with the X-DOM.
The X-DOM contains types such as XDocument, XAttribute, XElement etc. It is interesting to note, however, that the X-DOM doesn’t depend upon LINQ. You can create, update, save and load an X-DOM without needing LINQ.

X-DOM



Before going further, let us look at the X-DOM itself. The most commonly used X-DOM type is XElement. XObject is the abstract base class at the root of the X-DOM type hierarchy. XDocument and XElement, on the other hand, sit at the top of the containment hierarchy, because they can contain other nodes. Let us have a look at the first example of this article to get a better understanding of this concept. Don’t forget to import System.Xml.Linq. This is the namespace that allows us to use all the LINQ to XML functionality.

Example1

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            string xmldoc = @"<student id='12' status='archived'>
                            <firstname>Mark</firstname>
                            <lastname>Taylor</lastname>
                            </student>";

            XElement student = XElement.Parse(xmldoc);

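            // Name of the root element
            Console.WriteLine(student.Name);
            // First and last attributes of the root element
            Console.WriteLine(student.FirstAttribute);
            Console.WriteLine(student.LastAttribute);

            // First and last child nodes of the root element
            Console.WriteLine(student.FirstNode);
            Console.WriteLine(student.LastNode);

            Console.ReadLine();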
        }
    }
}

The namespace we require is System.Xml.Linq. Then we have a string, xmldoc, which contains some XML markup and represents the XML document. Next, we have an object of type XElement which we named student. We called the static Parse method of the XElement type and passed it the xmldoc string. Parse reads the string and returns an XElement object, so student represents the root element of the document. Next, we accessed the Name, FirstAttribute, LastAttribute, FirstNode and LastNode properties of this root element and displayed them on the console. The output of the code in Example1 is as follows:

Output1

http://imgs.g4estatic.com/c-sharp/linq-xml/output1.png

XNode is the abstract base class for most of the content in an XML document, such as elements, comments and text. Attributes are the notable exception: XAttribute derives directly from XObject, not from XNode. An advantage of XNode is that nodes of different types can be held together in one ordered collection. An XNode can access its parent XElement through the Parent property, but it cannot access child nodes. For this purpose there is another abstract class, XContainer, which serves as the base class for both XDocument and XElement.
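
The relationship between XNode, XContainer and XAttribute can be checked with a few type tests. The following is only a minimal sketch; the class name NodeHierarchySketch and the helper CanHoldChildren are made up for illustration and are not part of LINQ to XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class NodeHierarchySketch
{
    // Returns true when the given X-DOM object is able to hold child nodes.
    public static bool CanHoldChildren(XObject item)
    {
        return item is XContainer;
    }

    public static void Main()
    {
        XElement student = XElement.Parse(
            "<student id='12'><!-- archived --><firstname>Mark</firstname></student>");

        XNode comment = student.FirstNode;        // the comment is the first child node
        XAttribute id = student.FirstAttribute;   // an attribute is an XObject, not an XNode

        Console.WriteLine(comment is XComment);         // True
        Console.WriteLine(comment.Parent.Name);         // student
        Console.WriteLine(CanHoldChildren(comment));    // False
        Console.WriteLine(CanHoldChildren(student));    // True
        Console.WriteLine(CanHoldChildren(id));         // False
        Console.WriteLine(student.Nodes().Count());     // 2 (comment and firstname)
    }
}
```

Note that the attribute does not show up in Nodes() at all, which is why the count is 2.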

Creating an XML DOM via Parsing & Loading



There are two ways to create an X-DOM from existing XML: you can call either the static Load method or the static Parse method. Load builds an X-DOM from a file path or URI, a TextReader, an XmlReader or a Stream. Parse builds it from a string, as we saw in our first example. Have a look at our second example to further expand the concept.

Example2

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            string configdoc = @"<configuration>
                            <client enabled='true' id= '10'>
                            <expiresin> 50 </expiresin>
                            </client>
                            </configuration>";

            XElement configdom = XElement.Parse(configdoc);

            foreach (XElement conf in configdom.Elements())
            {
                Console.WriteLine(conf.Name);
            }

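            // Read the id attribute; the explicit cast converts its text to int
            XElement client = configdom.Element("client");
            int id = (int)client.Attribute("id");
            Console.WriteLine(id);

            // Update the attribute value in place
            client.Attribute("id").SetValue(12);

            // Read the text of the expiresin element as an int
            int expires = (int)client.Element("expiresin");

            // Update the element value, then print the value read earlier
            client.Element("expiresin").SetValue(50);
            Console.WriteLine(expires);

            // Append a new child element and print the whole modified tree
            client.Add(new XElement("disconnects", 20));
            Console.WriteLine(configdom);

            Console.ReadLine();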
        }
    }
}

In Example2, we are parsing a string which imitates a configuration file. In this configuration file, we are storing information about a client. The client is an element inside the root element, configuration. It has two attributes, enabled and id, and a child element, expiresin, which holds the time after which this client expires. We parse this string and store the result in configdom, an object of type XElement. We then use a foreach loop to go through all the child elements of configdom and display their names. Since there is only one child element at the moment, client, only its name is displayed on the screen.

Next, we read the value of the id attribute of the client element, converted it to int with an explicit cast, and displayed it on the output screen. Then, using the SetValue method of the attribute, we updated the value of the ‘id’ attribute. We then read the text value of the expiresin child element in the same way, changed the element’s value using its SetValue method, and displayed the value we had read. Then, using the Add method, we added a new child element to the client element. We named the child element ‘disconnects’ and gave it the value 20. Remember that Add takes the content to be appended; here we pass it a new XElement created inline, whose constructor takes the element name followed by the element value. Now, if you display the original configdom object which we initialized at the start, you will see all the modifications as well as the newly added child node. The output of the code in Example2 is as follows:

Output2

http://imgs.g4estatic.com/c-sharp/linq-xml/output2.png

You can see in the output that displaying the configdom object of the XElement class resulted in formatted (indented) XML output. This is because calling ToString on any node produces formatted output by default. If you don’t want the formatting, you can pass SaveOptions.DisableFormatting when you call the ToString method. Another very interesting thing to note here is that you can also save your X-DOM object to a file, TextWriter, Stream or XmlWriter by calling the ‘Save’ method of the XElement or XDocument object. The Save method serializes the contents of the X-DOM and writes them to the specified target.
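
As a rough illustration of Save, Load and SaveOptions.DisableFormatting working together, here is a small sketch that writes an element to an in-memory StringWriter instead of a file and then loads it back. The class name SaveLoadSketch and the helper RoundTrip are assumptions made for this sketch only.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class SaveLoadSketch
{
    // Serializes the element with Save(TextWriter) and reads it back with Load(TextReader).
    public static XElement RoundTrip(XElement source)
    {
        var writer = new StringWriter();
        source.Save(writer);
        return XElement.Load(new StringReader(writer.ToString()));
    }

    public static void Main()
    {
        XElement config = new XElement("configuration",
            new XElement("client", new XAttribute("id", 12),
                new XElement("expiresin", 50)));

        // Default ToString: indented, one element per line
        Console.WriteLine(config);

        // Formatting disabled: the whole tree on a single line
        Console.WriteLine(config.ToString(SaveOptions.DisableFormatting));

        // The reloaded tree has the same structure and values as the original
        XElement copy = RoundTrip(config);
        Console.WriteLine(XNode.DeepEquals(config, copy));
    }
}
```

A StringWriter is used here only to keep the sketch self-contained; passing a file name to Save and Load works along the same lines.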

Creating X-DOM through Instantiation



Apart from using the Parse and Load methods of the XElement or XDocument class, you can also instantiate an X-DOM object directly. The next example demonstrates how to do this. Have a look at our 3rd example.

Example3
Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    { 
        public static void Main()
        {
            XElement name = new XElement("name", "james");
            name.Add(new XComment("Good name"));

            XElement student = new XElement("student");
            student.Add(new XElement("id", 10));
            student.Add(name);

            Console.WriteLine(student);
            Console.ReadLine();
        }
    }
}

In the code in Example3, we first created a new element and named it name. To create a new XElement, you simply call its constructor and pass it two parameters. The first parameter is the name of the element and the second parameter is its value. Similarly, in order to add a comment to a node, you call Add on that node and pass it a new XComment object with the comment text as its parameter. Next, we created another node and named it student. We added a new node named ‘id’ to it, and then added the already created ‘name’ node to the student node. Now, if you display the student node, you will see that it has two child nodes. One is the id node with value 10. The second is the name node with value ‘james’, which also contains the comment we specified. The output of the code in Example3 is as follows:

Output3

http://imgs.g4estatic.com/c-sharp/linq-xml/output3.png

Functional X-DOM Construction



In Example3, we created an X-DOM object via simple instantiation. However, if we look at the code, it is difficult to see the structure of the XML document to which this X-DOM corresponds. To make things a little easier, there is another way to instantiate an X-DOM, called functional construction. To explain it further, let’s dive straight into our next example. Have a look at the 4th example of this tutorial.

Example4

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XElement student =
                new XElement("student", new XAttribute("id", 10), new XAttribute("age", 20),
                    new XElement("firstname", "mark"),
                    new XElement("lastname", "taylor", new XComment("good last name"))
                );

            Console.WriteLine(student);
            Console.ReadLine();
        }
    }
}

In Example4, have a look at following lines of code:
Code:

new XElement("student", new XAttribute("id", 10), new XAttribute("age", 20),
    new XElement("firstname", "mark"),
    new XElement("lastname", "taylor", new XComment("good last name"))
);

If you pay attention to the above lines of code, you can see the similarity between the XML document and the X-DOM code that generates it. There is an element student, which has two attributes, id and age. The student node has two child nodes: firstname and lastname. The lastname child node also contains one comment. The nesting of the constructor calls mirrors the nesting of the XML. The output of the code in Example4 is as follows:

Output4

http://imgs.g4estatic.com/c-sharp/linq-xml/output4.png

Apart from its readable syntax, functional construction has another advantage when it comes to generating an X-DOM from entity objects. In that case, functional construction can be combined with the Select operator, so that each object in a sequence is projected into an XElement.
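
To give an idea of how functional construction combines with Select, here is a minimal sketch. The Student class below is a made-up stand-in for an entity type, and ProjectionSketch and ToXml are illustrative names of our own; a real application would project its own entities.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class ProjectionSketch
{
    // Made-up entity type standing in for a database row.
    public class Student
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    // Projects every Student into a <student> element under a <students> root.
    public static XElement ToXml(Student[] students)
    {
        return new XElement("students",
            students.Select(s =>
                new XElement("student", new XAttribute("id", s.Id),
                    new XElement("firstname", s.FirstName),
                    new XElement("lastname", s.LastName))));
    }

    public static void Main()
    {
        var students = new[]
        {
            new Student { Id = 10, FirstName = "mark", LastName = "taylor" },
            new Student { Id = 11, FirstName = "james", LastName = "smith" }
        };

        Console.WriteLine(ToXml(students));
    }
}
```

The XElement constructor accepts a sequence as content and adds each item of the sequence as a child, which is what makes this pattern possible.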

Another very interesting aspect of functional construction is deep cloning. In X-DOM, when you add a node to an element (via the Add method or functional construction), the Parent property of the child node is set to that element. If you then add the same child node to some other element, the node is deep cloned automatically, which means that the two parents hold separate copies stored at different memory locations. This concept is best explained with the help of another example. Have a look at the 5th example of this article.

Example5

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
            XElement report = new XElement("report",
                                  new XElement("English", 85),
                                  new XElement("Mathematics", 100)
                                  );

            XElement student1 = new XElement("student1", report);
            XElement student2 = new XElement("student2", report);

            student2.Element("report").Element("English").SetValue(92);

            Console.WriteLine(student1);
            Console.WriteLine("");
            Console.WriteLine(student2);

            Console.ReadLine();
        }
    }
}

Have a look at the code in Example5. We created a ‘report’ object of the XElement class. This report node has two children, English and Mathematics, which contain the values 85 and 100 respectively. We then created two more elements, student1 and student2, and passed both of them the same ‘report’ node as a child. The first element, student1, becomes the parent of the original report node. When the same node is passed to student2 it already has a parent, so it is deep cloned, which means that the report child of student1 and the report child of student2 are two separate objects stored at different memory locations. We showed this by changing the value of the English child of the report element of student2. When we display the two nodes, you will see that the report element of student2 has changed while the report element of student1 remains the same. The output of the code in Example5 is as follows:

Output5

http://imgs.g4estatic.com/c-sharp/linq-xml/output5.png

It can be seen in the output that the English element of student1 still contains the value 85, whereas the English element of student2 has changed to 92. student1 is not affected because, owing to deep cloning, the two report nodes are located at different memory locations.
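
If you want to see for yourself which parent holds the original node and which holds the copy, a reference comparison makes it visible. This is just a small sketch; the class name CloneSketch and the helper AttachTwice are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class CloneSketch
{
    // Adds the same child to two new parents and returns both parents.
    public static XElement[] AttachTwice(XElement child)
    {
        XElement first = new XElement("student1", child);   // child has no parent yet: attached as-is
        XElement second = new XElement("student2", child);  // child already has a parent: a copy is attached
        return new[] { first, second };
    }

    public static void Main()
    {
        XElement report = new XElement("report", new XElement("English", 85));
        XElement[] students = AttachTwice(report);

        Console.WriteLine(ReferenceEquals(report, students[0].Element("report"))); // True
        Console.WriteLine(ReferenceEquals(report, students[1].Element("report"))); // False
        Console.WriteLine(report.Parent.Name);                                     // student1
    }
}
```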

Navigating and Querying X-DOM



We know that we can execute LINQ queries over any type that implements either the IEnumerable<T> or the IQueryable<T> interface. The navigation methods of the X-DOM return sequences that implement IEnumerable<T>, so we can easily enumerate them with a foreach loop. We can also execute LINQ queries over them.

There are many methods that let you navigate through the X-DOM. In Example1, I showed you how FirstNode and LastNode return the first and last child node. You can also call Nodes, which returns all the child nodes as a sequence that you can enumerate with a foreach loop. The Nodes method returns all kinds of nodes, including comments etc. If you only want element nodes, you can use the Elements method. The usage of both the Nodes and Elements methods is shown in our next example. Have a look at the 6th example of this article.

Example6

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
        var student = new XElement("student",
                          new XElement("Report",
                            new XElement("Maths",85),
                            new XElement("English", 90)
                              ),
                          new XElement("Report",
                            new XElement("Maths",90),
                            new XElement("English", 100)
                              ),
                            new XComment("Reports are good")
                              );
        foreach (XNode node in student.Nodes())
        {
            Console.WriteLine(node.ToString(SaveOptions.DisableFormatting));
        }
        Console.WriteLine("");

        foreach (XElement element in student.Elements())
        {
            Console.WriteLine(element.ToString(SaveOptions.DisableFormatting));
        }
        Console.ReadLine();
        }
    }
}

Here we have an X-DOM which we named student, and it contains two Report elements followed by one comment. When you enumerate this X-DOM by calling the Nodes method, all the child nodes are printed, including the comment node. However, if you enumerate the student X-DOM by calling Elements on it, only the element nodes are displayed. We have also demonstrated in Example6 how you can turn the formatting off by passing SaveOptions.DisableFormatting to the ToString method of nodes and elements. The output of the code in Example6 is as follows:

Output6

http://imgs.g4estatic.com/c-sharp/linq-xml/output6.png

You can see in the output that when the Nodes method is used for enumeration, all the nodes are displayed, including the comment node. But when the Elements method is used, only element nodes are displayed. Also, the formatting is disabled in both cases.

Querying X-DOM via LINQ



This is the section where the real magic begins. I will show you how you can execute LINQ queries on an X-DOM. Let us modify Example6 a bit more. For instance, what if you want to get the values of all the Maths elements under the Report nodes? We have two Report nodes and each of them contains a child node named Maths, but we are only concerned with the values. How can we do this? Also, what if we want to get only the Maths element whose value is equal to 85? The answer to both of these questions lies in our next example. Have a look at the 7th example of this article.

Example7

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
        var student = new XElement("student",
                          new XElement("Report",
                            new XElement("Maths",85),
                            new XElement("English", 90)
                              ),
                          new XElement("Report",
                            new XElement("Maths",90),
                            new XElement("English", 100)
                              ),
                            new XComment("Reports are good")
                              );

        IEnumerable stud = student.Elements("Report").Elements("Maths").Select(m=>m.Value);
        IEnumerable stud2 = student.Elements("Report").Elements("Maths").Where(m => m.Value == "85");

        int numofreports = student.Elements("Report").Count();
         

        foreach (string val in stud)
        {
            Console.WriteLine(val);
        }

        foreach (XElement val in stud2)
        {
            Console.WriteLine(val);
        }
        Console.WriteLine(numofreports);
        Console.ReadLine();
        }
    }
}

Now pay particular attention to the above code. We have the same student X-DOM which we had in Example6. We mentioned that the navigation methods of an X-DOM return sequences of child elements; therefore we can execute LINQ queries over them. First we called the Select operator to select the values of all the elements named Maths. We did this in the following line of code:
Code:

IEnumerable stud = student.Elements("Report").Elements("Maths").Select(m=>m.Value);
Here, we first get the Report elements by calling the Elements method. We then chain another Elements call to get the Maths elements which are children of the Report nodes, and finally we call the LINQ Select operator to select the values of the Maths elements. Since the student X-DOM contains two Maths elements in total, with values 85 and 90 respectively, the returned sequence is a collection containing two strings.

Similarly, we used the Where operator to get only the Maths element whose value is equal to 85. This is done in the following line of code:
Code:

IEnumerable stud2 = student.Elements("Report").Elements("Maths").Where(m => m.Value == "85");
Here we first accessed the Maths elements by chaining the Elements method. Then we called the Where operator and passed it a predicate which says: return those elements whose value is equal to 85. Note that 85 is an integer but we have enclosed it in double quotes. This is because the Value property of an XElement is always a string, so even if the element holds a number, Value gives us its text and we have to compare it as a string. Alternatively, you can cast the element itself to int, as we did in Example2, and compare it numerically.

Next we simply enumerated the two collections and displayed the results. You will see that the values of all the Maths elements are printed first, and then the Maths element with value 85 is printed. Finally, we called the LINQ Count operator on the Report elements to count the total number of Report elements in the student X-DOM. The output of the code in Example7 is as follows:

Output7

http://imgs.g4estatic.com/c-sharp/linq-xml/output7.png

In Example7, we used the Elements method, which returns a sequence of all the matching child elements. However, for simple navigation where you need only one element, you can call the Element method instead, which returns the first matching child element, or null if there is no match.
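
The difference between Element and Elements, and the numeric cast as an alternative to comparing Value strings, can be sketched as follows. The class name ElementLookupSketch and its helper methods are assumptions made for illustration; the tree is the same student X-DOM used in Example7.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class ElementLookupSketch
{
    // Builds the same student tree that Example7 uses.
    public static XElement BuildStudent()
    {
        return new XElement("student",
            new XElement("Report",
                new XElement("Maths", 85),
                new XElement("English", 90)),
            new XElement("Report",
                new XElement("Maths", 90),
                new XElement("English", 100)),
            new XComment("Reports are good"));
    }

    // Counts the Maths elements whose numeric value equals the given score.
    public static int CountMathsWithScore(XElement student, int score)
    {
        return student.Elements("Report").Elements("Maths").Count(m => (int)m == score);
    }

    public static void Main()
    {
        XElement student = BuildStudent();

        // Element returns only the first matching child
        XElement firstReport = student.Element("Report");
        Console.WriteLine((int)firstReport.Element("Maths"));   // 85

        // Element returns null when no child has the requested name
        Console.WriteLine(student.Element("Physics") == null);  // True

        // Numeric comparison through the explicit int conversion
        Console.WriteLine(CountMathsWithScore(student, 85));    // 1
    }
}
```

Because Element can return null, it is worth checking the result before chaining further calls on it.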

Navigating through Descendants



The Elements and Nodes methods return only the immediate child elements or nodes. For example, from the student X-DOM in Example7, you could directly access only the Report elements, as they are the immediate children of the root node, student. If you had to access the Maths elements, you could only reach them via the Report nodes. Till now, we have been accessing all the child elements via their parent element like this:
Code:

var elems= student.Elements("Report").Elements("Maths").Count();
Here, in order to count the number of Maths elements in the student X-DOM, we first access the parent Report elements and then their child “Maths” elements. We cannot count the Maths elements directly from the root. For example, consider the following line of code:
Code:

var elems = student.Elements("Maths").Count();
The above line of code would return ‘0’, because the Elements method only looks at the immediate children of student, and none of them is named Maths. In order to access elements at any depth directly, we can use Descendants and DescendantNodes. These methods return not only the child elements but also their descendants. In other words, you can navigate through the whole X-DOM tree via these methods. In our next example, we demonstrate the usage of the Descendants method. Have a look at the 8th example of this tutorial.

Example8

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
        var student = new XElement("student",
                          new XElement("Report",
                            new XElement("Maths",85),
                            new XElement("English", 90)
                              ),
                          new XElement("Report",
                            new XElement("Maths",90),
                            new XElement("English", 100)
                              ),
                            new XComment("Reports are good")
                              );

        var elems= student.Elements("Report").Elements("Maths").Count();
        Console.WriteLine(elems);

        elems = student.Elements("Maths").Count();
        Console.WriteLine(elems);

        var desc = student.Descendants("Maths");
        Console.WriteLine(desc.Count());
         
        Console.ReadLine();
        }
    }
}

In Example8, we have a student X-DOM similar to the one in the last two examples. First we counted the number of Maths elements by calling the Elements method twice, first for the parent element Report and then for the child element Maths. This returns 2, as there are two Maths elements, one for each Report element. Next we tried to count the Maths elements directly: we did not go through the parent Report element but passed Maths straight to the Elements method. This returns zero, because Maths is not an immediate child of student and therefore cannot be reached via the Elements method. Finally we used the Descendants method to count the total number of Maths elements. Since Descendants searches the complete subtree, the Maths elements can be accessed and counted directly. This returns 2. We displayed the values returned by the three calls on the console. The output of the code in Example8 is as follows:

Output8

http://imgs.g4estatic.com/c-sharp/linq-xml/output8.png
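
Example8 uses Descendants only. To show how DescendantNodes differs, here is a minimal sketch over the same student tree. The class name DescendantsSketch and its helper methods are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class DescendantsSketch
{
    // Builds the same student tree that Example8 uses.
    public static XElement BuildStudent()
    {
        return new XElement("student",
            new XElement("Report",
                new XElement("Maths", 85),
                new XElement("English", 90)),
            new XElement("Report",
                new XElement("Maths", 90),
                new XElement("English", 100)),
            new XComment("Reports are good"));
    }

    // Adds up the numeric values of all Maths elements at any depth.
    public static int TotalMaths(XElement root)
    {
        return root.Descendants("Maths").Sum(m => (int)m);
    }

    public static void Main()
    {
        XElement student = BuildStudent();

        // Descendants without a name returns elements at every depth
        Console.WriteLine(student.Descendants().Count());                          // 6

        // DescendantNodes also includes non-element nodes such as comments and text
        Console.WriteLine(student.DescendantNodes().OfType<XComment>().Count());   // 1
        Console.WriteLine(student.DescendantNodes().OfType<XText>().Count());      // 4

        Console.WriteLine(TotalMaths(student));                                    // 175
    }
}
```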

Navigating to Parent Nodes



There are several members that can be used to navigate upwards through the parent nodes. We will discuss two of them here:

Parent

The Parent property returns the immediate parent element of the specified node. It does not go any further up the tree, even if the immediate parent has a parent of its own.

Ancestors

This method returns all the ancestor elements of a particular node. Basically it returns a sequence of elements where the first element is the parent of the specified node, the second is the parent of that parent, and so on up to the root.

In our next example, we have incorporated these two methods to demonstrate their usage. Have a look at the ninth example of this tutorial.

Example9

Code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace CSharpTutorial
{
    class Program
    {
        public static void Main()
        {
        var student = new XElement("student",
                          new XElement("Report",
                            new XElement("Maths",85),
                            new XElement("English", 90)
                              ),
                          new XElement("Report",
                            new XElement("Maths",90),
                            new XElement("English", 100)
                              ),
                            new XComment("Reports are good")
                              );

        Console.WriteLine("\nSegment 1: ---------------------------\n");

        XElement parent = student.Element("Report").Parent;

        Console.WriteLine(parent);
        Console.WriteLine("\nSegment 2: ---------------------------\n");

        parent = student.Element("Report").Element("Maths").Parent;

        Console.WriteLine(parent);
        Console.WriteLine("\nSegment 3: ---------------------------\n");

        var parent2 = student.Element("Report").Element("Maths").Ancestors();

        foreach (XNode par in parent2)
        {
            Console.WriteLine(par);
        }
        Console.WriteLine("\nSegment 4: ---------------------------\n");

        var parents = student.Descendants("Maths").Ancestors();
        foreach (XNode par in parents)
        {
            Console.WriteLine(par);
        }

        Console.WriteLine("\n---------------------------\n");
        Console.ReadLine();
        }
    }
}

The code has been logically divided into four segments, each demonstrating a specific usage of the Parent property and the Ancestors method. The segments are separated by printing a segment header on the screen. In segment 1, we simply read Parent on the first Report element; this gives us its parent node, which is student, so the complete student tree is printed.

Next, in segment 2, we read the Parent property of the Maths element. This returns the immediate parent of Maths, which is Report. Note that we first called the Element method to get the Report element and then called Element again to reach “Maths”. Since Maths is not a direct child of student, you have to navigate down to it before you can read its Parent.

In segment 3, we called the Ancestors method on the Maths element. This time it first returns the immediate parent node, which is Report, and then the parent of Report, which is student.

In segment 4, we showed how you can directly access the ancestors of every Maths element by calling the Descendants method followed by the Ancestors method. For each Maths element, this returns its immediate parent, Report, and then the parent of Report, which is student. Since there are two Maths elements in the X-DOM tree, the student element is displayed twice, once after each Report. The output of the code in Example9 is as follows:

Output9 (a)

http://imgs.g4estatic.com/c-sharp/linq-xml/output9a.png

Output9 (b)

http://imgs.g4estatic.com/c-sharp/linq-xml/output9b.png
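
If you only need the names of the ancestors and not their full markup, the Ancestors sequence can be projected with Select. The following is just a sketch on a smaller tree; the class name AncestorsSketch and the helper AncestorNames are our own inventions for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the names are ours, not from the library.
public static class AncestorsSketch
{
    // Returns the names of all ancestor elements, nearest parent first.
    public static string[] AncestorNames(XElement element)
    {
        return element.Ancestors().Select(a => a.Name.LocalName).ToArray();
    }

    public static void Main()
    {
        XElement student = new XElement("student",
            new XElement("Report",
                new XElement("Maths", 85)));

        XElement maths = student.Descendants("Maths").First();

        Console.WriteLine(maths.Parent.Name);                        // Report
        Console.WriteLine(string.Join(" > ", AncestorNames(maths))); // Report > student

        // The root element of a standalone tree has no parent
        Console.WriteLine(student.Parent == null);                   // True

        // AncestorsAndSelf starts with the element itself
        string[] withSelf = maths.AncestorsAndSelf().Select(a => a.Name.LocalName).ToArray();
        Console.WriteLine(string.Join(" > ", withSelf));             // Maths > Report > student
    }
}
```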

