Antiword

The typical Windows user has no idea what a Microsoft Word file contains. It is a binary file in which bits of text are mixed in with formatting data and other non-text content; try viewing a .doc file with something like emacs or (better) a hex editor such as ghex2. Among other things, it often contains material the author does not suspect is there, such as text she thought she had deleted. Quite a few people have been caught out by this after distributing .doc files and then being confronted with contents they didn't know were there.

From the point of view of Linux users, what matters more is that when people send you .doc files, you don't necessarily want to go to the trouble of opening them with OpenOffice.org or a similar program. You may just want to extract the text. Fortunately, antiword does this very well. All you need to do is type:

antiword filename.doc

This prints the contents of the file to the terminal as plain text.
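
Because antiword writes the extracted text to standard output, the result can also be redirected into a file. The following is a minimal sketch of converting every .doc file in a directory this way. The function name doc_to_txt and the ANTIWORD variable are made up for this illustration and are not part of antiword; the sketch also assumes the files end in a lowercase .doc extension.

```shell
# Convert each .doc file in a directory to a .txt file alongside it.
# doc_to_txt and ANTIWORD are hypothetical names used only in this sketch;
# ANTIWORD lets you point at a different antiword binary if needed.
doc_to_txt() {
    dir="${1:-.}"                       # default to the current directory
    for f in "$dir"/*.doc; do
        [ -e "$f" ] || continue         # skip if no .doc files matched
        # antiword prints plain text on standard output; redirect it
        "${ANTIWORD:-antiword}" "$f" > "${f%.doc}.txt"
    done
}
```

After defining the function, running doc_to_txt ~/Documents would leave a report.txt next to each report.doc, overwriting any existing .txt file of the same name.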

The antiword package is no longer included in openSUSE. However, it is available from the openSUSE Build Service in the following repository: http://download.opensuse.org/repositories/home:/garloff/.
