Cross-Platform File Formats

Many tools exist to create files in most cross-platform formats. Linux tools to do so are described in other chapters of this book, such as Chapter 7, "Using Linux for Office Productivity" and Chapter 8, "Miscellaneous User Tools." Some important classes of such formats include:

Text Files One of the simplest file formats is plain text, which is encoded using the American Standard Code for Information Interchange (ASCII). Some minor variants on ASCII also exist, most of which relate to the handling of end-of-line characters. Linux and other Unix-like OSs end each line with a single character, which is often referred to as a line feed (LF) or new line (NL). DOS, Windows, and OS/2 all use two characters: a carriage return (CR) followed by an LF. Mac OS uses a CR alone. The conv mount option (see Table 10.1) enables automatic translation of these end-of-line conventions for some or all files, but this translation wreaks havoc on file types other than plain text. Some distributions also ship with simple tools called dos2unix and unix2dos to perform these translations. Some file types are built using ASCII and so can be treated as plain text. For instance, the Hypertext Markup Language (HTML) that's at the core of the Web is nothing but ASCII with special formatting conventions.

Word Processing Files Some word processors and other text-preparation tools, such as Anyware Office and LaTeX, are built atop ASCII, much like HTML. Other word processors use binary formats. Most of these formats are proprietary in at least some sense, although of course open source programs use formats that are public and well documented. One word processing format that's at least close to being cross-platform is Rich Text Format (RTF); most word processors can read and write RTF files. Unfortunately, the results of exchanging RTF files between different programs are unpredictable. An emerging cross-platform standard is the Extensible Markup Language (XML). OpenOffice.org uses a compressed version of XML, and other word processors are adding XML support as well.

Spreadsheet Files Most spreadsheet programs use their own file formats. One common, but low-level, cross-platform format is the comma-separated value (CSV) format, which uses ASCII lines with commas separating spreadsheet fields. All modern spreadsheets can read and write CSV files, although CSV files don't support all spreadsheet features very well. CSV is therefore best used to transfer raw spreadsheet data; equations are likely to be lost.

Graphics Files Many cross-platform graphics file formats exist. The most common are the Tagged Image File Format (TIFF), the Graphics Interchange Format (GIF), the Joint Photographic Experts Group (JPEG) format, the Portable Network Graphics (PNG) format, PostScript, and the Portable Document Format (PDF). All of these formats except PostScript are binary in nature, and all are supported by a wide array of graphics programs. Some of these formats are more open than others, though. Of particular note, GIF uses a compression algorithm that's covered by patents, so many open source programs don't support GIF, or at most support only reading GIF files. The patents are scheduled to expire in 2003. PDF is a proprietary binary derivative of PostScript. I include it as a cross-platform format because both Adobe and third parties have developed PDF creation and viewing tools for a wide range of platforms.

Audio Files Several audio file formats are commonplace. In the Windows world, .wav files are the standard, and many Linux audio tools handle these files quite well. Sun's .au format is common in the Unix world. Moving Picture Experts Group (MPEG) Audio Layer 3 (MP3) files are an increasingly common format for music players, but MP3, like GIF, is covered by patents, so many Linux users favor the newer Ogg Vorbis file format.

Archive Files Certain programs merge files into carrier files known as archives. Some of these formats are most common on one platform or another, but most are open and cross-platform. The most common archive file format is the .zip file, which is standard on Windows systems. You can create .zip archives and extract files from them with the zip and unzip Linux commands, respectively. The approximate Linux equivalent to a .zip file is a tarball, which usually has a .tar.gz, .tgz, or .tar.bz2 extension. Most Linux distributions also support RPM Package Manager (RPM) files or Debian packages, both of which are open cross-platform formats, although they're not often used outside of Linux.
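
If you'd rather script archive handling than type zip and tar commands by hand, the following is a minimal sketch that uses Python's standard zipfile and tarfile modules to store one file in both a .zip archive and a gzip-compressed tarball. The function names (make_archives, list_members) and the file names (example.zip, example.tar.gz, notes.txt) are my own placeholders, not anything from a particular distribution.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def make_archives(source: Path, out_dir: Path):
    """Store one file in both a .zip archive and a .tar.gz tarball."""
    zip_path = out_dir / "example.zip"      # hypothetical output names
    tar_path = out_dir / "example.tar.gz"
    # .zip: the format that's standard on Windows systems
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=source.name)
    # tarball: tar container compressed with gzip ("w:gz")
    with tarfile.open(tar_path, "w:gz") as tf:
        tf.add(source, arcname=source.name)
    return zip_path, tar_path


def list_members(zip_path: Path, tar_path: Path):
    """Return the member names found in each archive."""
    with zipfile.ZipFile(zip_path) as zf:
        zip_names = zf.namelist()
    with tarfile.open(tar_path, "r:gz") as tf:
        tar_names = tf.getnames()
    return zip_names, tar_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        note = work / "notes.txt"
        note.write_text("hello from Linux\n")
        print(list_members(*make_archives(note, work)))
```

Either archive produced this way should be readable by ordinary tools on the other platform, which is the point of using an open format.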

As a general rule, it's easy to exchange graphics and audio files across platforms because the many cross-platform formats are both extremely common and extremely well-supported. Plain text files also don't pose many problems. End-of-line handling is the most common issue with ASCII, but it can be handled with mount options, tools such as unix2dos and dos2unix, or even individual text editors—most can handle all three common end-of-line conventions.
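
To make the three end-of-line conventions concrete, here is a minimal Python sketch of the kind of translation that tools such as dos2unix and unix2dos perform. It is not how those tools are implemented; the function name convert_line_endings and the target labels ("unix", "dos", "mac") are assumptions made for illustration. It should be applied only to plain text, because the same byte substitutions would corrupt binary files.

```python
def convert_line_endings(data: bytes, target: str = "unix") -> bytes:
    """Rewrite every CR LF, lone CR, or LF line ending to one convention."""
    endings = {
        "unix": b"\n",    # LF: Linux and other Unix-like OSs
        "dos": b"\r\n",   # CR LF: DOS, Windows, and OS/2
        "mac": b"\r",     # CR alone: Mac OS
    }
    newline = endings[target]
    # Normalize to LF first. CR LF pairs must be replaced before lone CRs,
    # or each pair would turn into two line endings.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", newline)


if __name__ == "__main__":
    sample = b"one\r\ntwo\rthree\n"   # mixes all three conventions
    print(convert_line_endings(sample, "unix"))  # b'one\ntwo\nthree\n'
    print(convert_line_endings(sample, "dos"))   # b'one\r\ntwo\r\nthree\r\n'
```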

Warning Some Linux configuration files are sensitive to end-of-line conventions; saving a file with DOS or Mac OS conventions may result in the file not working. In a critical startup configuration file, the result can be a system that doesn't boot. This is one of the reasons you shouldn't try editing Linux configuration files from DOS, Windows, or Mac OS unless you really need to do so.

Unfortunately, file formats for office productivity tools are not very well standardized, and the cross-platform formats that do exist are seldom adequate to the task. This is one of the reasons that people run emulators, as described in the upcoming section, "Improving Your Productivity with Emulators."
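
The CSV format mentioned earlier shows why the cross-platform office formats fall short. The following sketch, which uses Python's standard csv module with made-up sample rows and hypothetical helper names (rows_to_csv, csv_to_rows), writes a small table to CSV text and reads it back. Every cell returns as a plain string: a total that a spreadsheet would compute with an equation survives only as its last value.

```python
import csv
import io

# Made-up sample data; the Total column holds values, not equations.
rows = [
    ["Item", "Quantity", "Unit price", "Total"],
    ["Widget", 4, 3, 12],
    ["Gadget, large", 1, 20, 20],   # embedded comma forces quoting
]


def rows_to_csv(table):
    """Serialize a list of rows to CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows (all cells become strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    text = rows_to_csv(rows)
    print(text)
    print(csv_to_rows(text))
```

Note that the field containing a comma is wrapped in quotation marks on output, which is how CSV keeps it from being split into two fields.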
