Office Formats

Figure 13-4: Editing XML in emacs

RTF (Rich Text Format, file extension .rtf) is often mentioned as an "open" text-based format for interchanging documents. It was developed by Microsoft. It is a plain text format with markup and, unlike the binary .doc format, it has an openly published specification.

An RTF file is actually not so nice when you look inside it:

[email protected]:~> less afile.rtf
{\rtf1\ansi\deff0\adeflang1025
{\fonttbl{\f0\froman\fprq2\fcharset0 Nimbus Roman No9 L{\*\falt Times New Roman};}{\f1\froman\fprq2\fcharset0 Nimbus Roman No9 L{\*\falt Times New Roman};}{\f2\fswiss\fprq2\fcharset0 Nimbus Sans L{\*\falt
\par {\loch\f4\fs22\lang2057\i0\b0 The key delivery of this project was [...]

Since you are using the less program to paginate the output of the file afile.rtf, you can press q at any time to exit from less and return to the command prompt.

One problem is that it is difficult to extract the pure text from all the markup and formatting instructions. Another is that there have been several revisions of the RTF specification. But RTF files open well in any of the Linux word processing applications, including those that have a smaller footprint than OpenOffice.org.
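To give a feel for why extracting the pure text is awkward, here is a rough sketch of a text extractor in Python. It is not a complete RTF parser: the function name rtf_to_text, the list of skipped groups, and the choice of the cp1252 code page for \'xx escapes are all assumptions made for this illustration, and features such as \u Unicode escapes are ignored entirely.

```python
import re

# One token per match: control word, hex escape, control symbol, brace, or text.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte such as \'e9
    r"|\\(.)"                    # control symbol such as \{ \} \\ \*
    r"|([{}])"                   # group open or close
    r"|([^\\{}]+)",              # run of plain text
    re.DOTALL,
)

# Groups whose content is not document text (an assumed, incomplete list).
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def rtf_to_text(rtf):
    """Return an approximation of the visible text in an RTF string."""
    out = []
    stack = []      # "skip" flags saved for the enclosing groups
    skip = False    # True while inside a group we want no text from
    for match in TOKEN.finditer(rtf):
        word, _number, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word:
            if word in SKIPPED_DESTINATIONS:
                skip = True
            elif not skip and word in ("par", "line"):
                out.append("\n")
            elif not skip and word == "tab":
                out.append("\t")
        elif symbol:
            if symbol == "*":          # \* marks an ignorable group
                skip = True
            elif not skip and symbol in "\\{}":
                out.append(symbol)     # escaped literal character
        elif hexbyte and not skip:
            # Assumption: bytes are Windows-1252 encoded.
            out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif text and not skip:
            # Raw line breaks in the file are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


# A small made-up sample in the style of the listing above.
SAMPLE = (
    r"{\rtf1\ansi{\fonttbl{\f0\froman Times New Roman;}}"
    r"\par {\loch\f4\fs22\i0\b0 The key delivery}\par caf\'e9}"
)
plain = rtf_to_text(SAMPLE)
print(plain)
```

Even this small sketch needs a group stack and a table of special cases, which is a fair indication of how much formatting machinery surrounds the actual text in an RTF file.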

Working with Excel Files

Microsoft Excel files usually open just fine in OpenOffice.org or Gnumeric, provided that they don't include complex macros, in which case you may have difficulties.

Note: OpenOffice.org has its own macro language, but this is not compatible with VBA (Visual Basic for Applications), which is used by Microsoft Office. In general, this means that you will have to convert or rewrite the macros in an Excel workbook to make it work in OpenOffice.org.

Working with Access Files

Microsoft Access databases are a problem in more ways than one: Until recently, there was no freely available open source Linux desktop application with similar functionality. That has changed with the release of Rekall under an open source license. Rekall is included in SUSE Linux Professional.

To deal with the files that Access creates (.mdb files), the Mdbtools project may be useful.

Otherwise, the best approach is to use an intermediate format (such as .csv or an SQL dump) for export and import.
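As an illustration of the intermediate-format approach, the following Python sketch loads a CSV export into a database table. SQLite is used here only as a convenient stand-in for whatever database you are importing into, and the function name load_csv_into_sqlite, the table name, and the sample data are hypothetical. The sketch assumes that the first row of the CSV file holds the column names.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_file, conn, table):
    """Create a table named after `table` and fill it from a CSV file object."""
    reader = csv.reader(csv_file)
    header = next(reader)                       # first row: column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Made-up data standing in for a table exported from Access as CSV.
sample = io.StringIO("id,name\n1,Alice\n2,Bob\n")
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(sample, conn, "customers")
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)
```

Note that CSV carries no type information, so every value arrives as text. Column types, keys, and relationships between tables have to be recreated by hand on the importing side, which is one reason an SQL dump can be the better intermediate format when it is available.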

The Native File Formats

The native OpenOffice.org file formats are zipped archives that contain a variety of XML documents, as in the following example:

[email protected]:~> file afile.sxw
afile.sxw: Zip archive data, at least v2.0 to extract
[email protected]:~> zipinfo afile.sxw
Archive: afile.sxw 9043 bytes 7 files
-rw--------2.0 fat 30 b- stor 23-Jun-04 11:39 mimetype
-rw--------2.0 fat 18 b- stor 23-Jun-04 11:39 layout-cache
-rw--------2.0 fat 10336 bl defN 23-Jun-04 11:39 content.xml
-rw--------2.0 fat 17791 bl defN 23-Jun-04 11:39 styles.xml
-rw--------2.0 fat 1158 b- stor 23-Jun-04 11:39 meta.xml
-rw--------2.0 fat 7064 bl defN 23-Jun-04 11:39 settings.xml
-rw--------2.0 fat 850 bl defN 23-Jun-04 11:39 META-INF/manifest.xml
7 files, 37247 bytes uncompressed, 8261 bytes compressed: 77.8%
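The same kind of listing can be produced from a script. The sketch below uses Python's standard zipfile module. So that it runs without a real document, it first builds a tiny stand-in archive in memory; the function name list_members and the contents of the sample members are assumptions made for this illustration.

```python
import io
import zipfile


def list_members(archive):
    """Return (name, uncompressed size, compressed size) for each member."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in zf.infolist()
        ]


# Build a small stand-in archive in memory (not a valid Writer document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # Stored without compression, like the "stor" entries in the listing.
    zf.writestr("mimetype", "application/vnd.sun.xml.writer",
                compress_type=zipfile.ZIP_STORED)
    # Deflated, like the "defN" entries in the listing.
    zf.writestr("content.xml", "<doc>" + "text " * 200 + "</doc>",
                compress_type=zipfile.ZIP_DEFLATED)
buffer.seek(0)

members = list_members(buffer)
for name, size, packed in members:
    print(f"{size:8d} {packed:8d} {name}")
```

To inspect a real document, pass its path (for example "afile.sxw") to list_members instead of the in-memory buffer.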

XML is a structured, text-based markup language, which means that all OpenOffice documents are ultimately text documents, unlike the traditional Microsoft formats, which are binary. XML was designed as a portable document description format that separates information about the content of a document from the information about how the document is to be formatted, known as its presentation format. XML documents surround portions of the text with tags; a pair of tags together with the text it encloses is known as an element, and each element identifies the way in which the associated text fits into the entire document. Elements identify portions of your document's content such as paragraphs, headings, text to be emphasized, quotations, lists and portions of lists, and so on.

Writing and storing documents in XML makes them usable by any software package that understands XML, and therefore makes them more portable than documents stored in a format that is specific to a certain software package. This in turn means that, in principle at least, a set of documents can be processed with external scripts to extract or change information in them in some uniform way.
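As a sketch of what such an external script might look like, the following Python code pulls the paragraph and heading text out of the content.xml member of a document archive. The function name extract_text and the sample document are hypothetical. The sketch assumes that paragraphs and headings are elements whose local names are p and h, and it matches on local names only, so that it does not depend on which namespace a particular version of the format uses.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_text(archive):
    """Return the text of each paragraph and heading in content.xml."""
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    pieces = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]   # drop the "{namespace}" part
        if local_name in ("p", "h"):
            pieces.append("".join(elem.itertext()))
    return pieces


# A tiny made-up content.xml, packed into an in-memory archive for the demo.
SAMPLE_XML = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body>'
    '<text:h>Summary</text:h>'
    '<text:p>The key <text:span>delivery</text:span> of this project</text:p>'
    '</office:body>'
    '</office:document-content>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("content.xml", SAMPLE_XML)
buffer.seek(0)

paragraphs = extract_text(buffer)
print(paragraphs)
```

Run over a directory of documents, a function like this could feed a search index or a report. A paragraph nested inside another paragraph, for instance in a footnote, would appear twice in the output, so a production script would need more careful handling than this sketch provides.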

OpenOffice documents use a set of XML document descriptions that have been adopted as a draft standard, known as the OpenDocument standard, by OASIS (Organization for the Advancement of Structured Information Standards). See the OASIS website for more information about OASIS and the various standards that it is working on or sponsoring. Documents created in OpenOffice 2.0 will be fully compliant with this standard. Documents produced by versions of OpenOffice prior to version 2.0 are also XML documents, but use a simpler XML document type definition.

For the complete OASIS OpenDocument standard, go to committees/download.php/12573/OpenDocument-v1.0-os.sxw. This is a version of the standard in OpenOffice format. A PDF version is also available at 572/OpenDocument-v1.0-os.pdf.
