![]() |
Read and write ODF/ODS files (OpenDocument Spreadsheets)
IntroductionThe OpenDocument Format (ODF) is an XML-based file format for representing electronic documents such as spreadsheets, charts, presentations and word processing documents. The standard was developed by the OASIS (Organization for the Advancement of Structured Information Standards), and it is free and open format. The OpenDocument format is used in free software and in proprietary software. Originally, format was implemented by the OpenOffice.org office suite and, with Office 2007 SP2, Microsoft also supports ODF subset. This article will explain basics of ODF format, and specifically its implementation in spreadsheet applications (OpenOffice.org Calc and Microsoft Office Excel 2007 SP2). Presented is a demo application which writes/reads tabular data to/from .ods files. Application is written in C# using Visual Studio 2008. Created .ods files can be opened using Excel 2007 SP2 and OpenOffice.org Calc. ODF formatOpenDocument format supports document representation:
Every ODF file is a collection of several subdocuments within a package (ZIP file), each of which stores part of the complete document. Each subdocument stores a particular aspect of the document. For example, one subdocument contains the style information and another subdocument contains the content of the document. This approach has following benefits:
Besides them, in package can be many other subdocuments like document thumbnail, images, etc. In order to read the data from an ODF file you need to:
On the other side, if you want to create a new ODF file, you need to:
Spreadsheet DocumentsSpreadsheet document files are the subset of ODF files. Spreadsheet files have .ods file extensions. The content (sheets) is stored in content.xml subdocument. http://www.go4expert.com/images/arti...les/image1.png Picture 1: content.xml subdocument. As we can see on Picture 1, sheets are stored as xml elements. They contain column and row definitions, rows contain cells and so on… On picture is data from one specific document, but from this we can see basic structure of content.xml file (you can also download full ODF specification). ImplementationOur demo is Windows Forms application (Picture 2), written in C# using Visual Studio 2008. http://www.go4expert.com/images/arti...les/image2.png Picture 2: Demo application. Application can:
Creating new document and underlying model of applicationInternal, Spreadsheet document is stored as DataSet. Each sheet is represented with DataTable, sheet’s row with DataRow, and sheet’s column with DataColumn. So, to create a new document we have to create new DataSet, with DataTables. Each DataTable has number of rows and columns that conforms to our needs. To show data from our DataSet (and to allow editing that data) application dynamically creates tabs with DataGridViews (that are connected to our DataTables). Through interface, user can read, write, edit data and add new rows to Spreadsheet document. Zip component and XML parserAlthough classes from System.IO.Packaging namespace (.NET 3.0) provides way to read and write ZIP files, they require different format of ZIP file. Because of that, our demo uses open source component called DotNetZip. Using ZIP component we can extract files, get subdocument, replace (or add) subdocuments that we want and save that file as .ods file (which is ZIP file). For processing documents we have used XmlDocument because it offers easy way to reach part that we want. Note that, if performance is crucial for you, you should use XmlTextReader and XmlTextWriter. That solution need more work (and code), but provides better performance. Reading Spreadsheet DocumentTo read document we follow these steps:
Although ODF specification provides way to specify default row, column and cell style, implementations have nasty practice (that specially applies for Excel) that they rather write sheet as sheet with maximum number of columns and maximum number of rows, and then they write all cells with their style. So you could see that your sheet has more than 1000 columns (1024 in Calc and 16384 in Excel), and even more rows (and each rows contains the number of cells that is equal to the number of columns), although you only have write data to first few rows/columns. ODF specification provides a way that you specify some element (like column/row/cell) and then you specify the number of time it repeats. So above behavior doesn’t affect the size of the file, but that complicates our implementation. Because of that, we can’t just read the number of columns and add equal number of DataColumns to DataTable (because of performance issues). In this implementation, we rather read cells and, if they have data, we first create rows/columns they belong to, and then we add those cells to the DataTable. So, at the end, we allocate only space that we need to. Writing Spreadsheet DocumentTo write document we follow these steps:
In this application, as template, we have use empty document. But application can be easy modified to use some other template (so, that you have preserved styles, etc). Alternative waysAs always in programming, there is more than one method to achieve the same thing. ODF files are just a collection of xml files, packed in zip files so, any of the vast number of tools for handling zip files and XML data can be used to handle OpenDocument. As another option, you could use some third party component which has support for ODF format. This will probably cost you some money but has advantage that usually more than one format (for example: GemBox.Spreadsheet reads/writes XLS, XLSX, CSV, HTML and ODS) are supported within the same API, so your application will be able to target different file formats using the same code. |
Re: Read and write ODF/ODS files (OpenDocument Spreadsheets)
I want more information about this topic.
|
Re: Read and write ODF/ODS files (OpenDocument Spreadsheets)
Nomination for Article of the month - Aug 2009 Started.
|
Re: Read and write ODF/ODS files (OpenDocument Spreadsheets)
Vote for this article for Article of the month - August 2009
|
| All times are GMT +5.5. The time now is 23:47. |