XML parsing in Perl

Author: pradeep
This is an article on XML parsing in Perl.
As the world fast becomes aware of the benefits of XML, Perl developers will also want to use XML in their CGI-Perl scripts. XML parsing can seem like one hell of a job when you look at the XML::Parser module, but XML::Simple comes to the rescue with its ease of use.

Installing XML::Simple

XML::Simple works by parsing an XML file and returning the data within it as a Perl hash reference. Within this hash, element names from the original XML file play the role of keys, and the character data between the tags takes the role of values. Once XML::Simple has processed an XML file, its content can be retrieved using standard Perl hash notation.

It can be installed from the CPAN shell. Start the shell first, then run the install command at its prompt:

Code:
shell> perl -MCPAN -e shell
cpan> install XML::Simple
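
To confirm the installation worked, a small check like the following can help. This is only a sketch: it loads the module at run time and reports the version it finds, and the wording of the messages is an assumption for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to load the module at run time so a missing module
# produces a readable message instead of a compile error.
if (eval { require XML::Simple; 1 }) {
    print "XML::Simple ", XML::Simple->VERSION, " is installed\n";
} else {
    print "XML::Simple is not installed: $@";
}
```
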
Basic XML Parsing

Once you've got the module installed, create the following XML file and call it "data.xml":

Code: XML
<?xml version='1.0'?>
   <employee>
          <name>Pradeep</name>
          <age>23</age>
          <sex>M</sex>
          <department>Programming</department>
   </employee>
And then type out the following Perl script, which parses it using the XML::Simple module:

Code: Perl
#!/usr/bin/perl
use strict;
use warnings;

# load modules
use XML::Simple;
use Data::Dumper;

# create object
my $xml = XML::Simple->new;

# read XML file into a hash reference
my $data = $xml->XMLin("data.xml");

# print output
print Dumper($data);

When you run this script, here's what you'll see:

Code:
$VAR1 = {
  		  'department' => 'Programming',
  		  'name' => 'Pradeep',
  		  'sex' => 'M',
  		  'age' => '23'
  		};
As you can see, each element and its associated content has been converted into a key-value pair of a Perl hash (the order of the keys may differ on your system). You can now access the XML data as in the following revision of the script above:

Code: Perl
#!/usr/bin/perl
use strict;
use warnings;

# load module
use XML::Simple;

# create object
my $xml = XML::Simple->new;

# read XML file into a hash reference
my $data = $xml->XMLin("data.xml");

# access XML data
print "$data->{name} is $data->{age} years old and works in the $data->{department} section\n";
Here's the output:

Code:
Pradeep is 23 years old and works in the Programming section
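
Since the result is an ordinary hash reference, it can also be walked generically instead of naming each key. The following is a minimal sketch under two assumptions: the XML is inlined as a string (XMLin accepts an XML string as well as a file name) so the script runs without data.xml, and the keys are sorted only to make the output order stable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Same content as data.xml, inlined so the sketch is self-contained
my $xml_text = <<'XML';
<employee>
  <name>Pradeep</name>
  <age>23</age>
  <sex>M</sex>
  <department>Programming</department>
</employee>
XML

# XMLin is exported by default and also accepts a string of XML
my $data = XMLin($xml_text);

# Walk the hash in sorted key order and print each field
for my $field (sort keys %$data) {
    print "$field: $data->{$field}\n";
}
```
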
XML::Simple can help you achieve more complex parsing, which we'll look at some other day. Till then, happy parsing.
tarunt's Avatar, Join Date: Feb 2010
Newbie Member
Hi pradeep,

Please provide an example of using XML::SAX.
bharatbsharma's Avatar, Join Date: May 2010
Newbie Member
I am using this CPAN module and it is really useful, but I have a query.

My XML file looks like this:
--------------------------------------------
<mac>
<calls>
<scallPs>83234</scallPs>
<sreadPs>7462</sreadPs>
<swritPs>7394</swritPs>

</calls>

<cpu>
<usr>10</usr>
<sys>3</sys>
<wio>0</wio>
<idle>87</idle>
</cpu>
</mac>
------------------------------------------
I can print an individual value like $data->{cpu}->{idle}.

But how can I find the number of elements in <cpu> or <calls>, which is 4 and 3 respectively?
And how can I find the number of elements in <mac>, which is 2, namely <calls> and <cpu>?

Thanks in advance
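
A possible approach to the counting question above, sketched under the assumption that no element name repeats within its parent: because XML::Simple returns nested hash references, the number of child elements is the number of keys at that level. The XML is inlined here so the script is self-contained. If an element name did repeat, XML::Simple would fold the repeats into one key holding an array reference, so this would count distinct names rather than elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# The XML from the post, inlined so the sketch is self-contained
my $xml_text = <<'XML';
<mac>
  <calls>
    <scallPs>83234</scallPs>
    <sreadPs>7462</sreadPs>
    <swritPs>7394</swritPs>
  </calls>
  <cpu>
    <usr>10</usr>
    <sys>3</sys>
    <wio>0</wio>
    <idle>87</idle>
  </cpu>
</mac>
XML

my $data = XMLin($xml_text);

# keys in scalar context gives the number of keys in a hash
my $cpu_count   = scalar keys %{ $data->{cpu} };
my $calls_count = scalar keys %{ $data->{calls} };
my $top_count   = scalar keys %$data;

print "cpu: $cpu_count, calls: $calls_count, mac: $top_count\n";
```
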
amangupta14's Avatar, Join Date: May 2010
Newbie Member
I need to parse a huge XML file; please let me know which module to use for this.

The XML file is something like this:
Code:
<datapoint><name>CMS</name><pid>2416</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><CPU>2.48415111588068</CPU><CPUTime>47500</CPUTime><VirtualBytes>338404</VirtualBytes><PrivateBytes>95096</PrivateBytes><HandleCount>2678</HandleCount><Threads>159</Threads> </datapoint>
<datapoint><name>java</name><pid>420</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><CPU>0</CPU><CPUTime>19656.25</CPUTime><VirtualBytes>493860</VirtualBytes><PrivateBytes>115920</PrivateBytes><HandleCount>1691</HandleCount><Threads>93</Threads> </datapoint>
<datapoint><name>java</name><pid>4880</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><CPU>4.96830223176136</CPU><CPUTime>333437.5</CPUTime><VirtualBytes>589440</VirtualBytes><PrivateBytes>206056</PrivateBytes><HandleCount>1934</HandleCount><Threads>93</Threads> </datapoint>
<datapoint><name>java</name><pid>7280</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><CPU>1.55259444742543</CPU><CPUTime>305546.875</CPUTime><VirtualBytes>819052</VirtualBytes><PrivateBytes>476528</PrivateBytes><HandleCount>6101</HandleCount><Threads>72</Threads> </datapoint>
<datapoint><name>java</name><pid>3048</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><CPU>0.931556668455255</CPU><CPUTime>1125</CPUTime><VirtualBytes>196352</VirtualBytes><PrivateBytes>19536</PrivateBytes><HandleCount>509</HandleCount><Threads>19</Threads> </datapoint>
<datapoint><name>java</name><pid>2752</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><CPU>0.310518889485085</CPU><CPUTime>250</CPUTime><VirtualBytes>190936</VirtualBytes><PrivateBytes>14520</PrivateBytes><HandleCount>337</HandleCount><Threads>14</Threads> </datapoint>
<datapoint><name>Disk (0 C:)</name><pid>0</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><DiskRead_KBPerSec>0.00</DiskRead_KBPerSec><DiskWrite_KBPerSec>28.05</DiskWrite_KBPerSec><DiskBusy_Percent>2.79</DiskBusy_Percent><DiskRead_Percent>0.00</DiskRead_Percent><DiskWrite_Percent>2.79</DiskWrite_Percent><DiskIdle_Percent>97.75</DiskIdle_Percent> </datapoint>
<datapoint><name>Disk (_Total)</name><pid>0</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><DiskRead_KBPerSec>0.00</DiskRead_KBPerSec><DiskWrite_KBPerSec>28.05</DiskWrite_KBPerSec><DiskBusy_Percent>2.79</DiskBusy_Percent><DiskRead_Percent>0.00</DiskRead_Percent><DiskWrite_Percent>2.79</DiskWrite_Percent><DiskIdle_Percent>97.75</DiskIdle_Percent> </datapoint>
<datapoint><name>Network (Broadcom NetXtreme Gigabit Fiber)</name><pid>0</pid><time>5/10/2010 10:51:50</time><machine>DEWDFTF11382S</machine><NetworkReceived_KBPerSec>26.50</NetworkReceived_KBPerSec><NetworkSent_KBPerSec>103.43</NetworkSent_KBPerSec> </datapoint>

Last edited by shabbir; 26May2010 at 20:21.. Reason: Code blocks
rajseo's Avatar
Banned
Nice and great suggestion thanks for sharing....