The file Command

Linux does better than this: the file command determines what a file really is by examining its contents. A very large number of file types include a "magic number" inside the file, which file looks up in the "magic number file" at /usr/share/misc/magic (the human-readable form) and /usr/share/misc/magic.mgc (the compiled binary version, created from /usr/share/misc/magic for speed of access). It can also identify files that have no magic number by looking at characteristic contents (as seen, for example, in a variety of text files with markup).
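The lookup described above can be imitated by hand. The following is a minimal sketch, not how file itself is implemented: it reads the first 8 bytes of a file with od and compares them with one hard-coded signature, the octal byte sequence 320 317 021 340 241 261 032 341 that this section later finds in the magic file for Microsoft Office documents. The function names first_bytes and sniff, and the scratch file, are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first N bytes of a file as octal values,
# separated by single spaces.
first_bytes() {
    # $1 = file to inspect, $2 = number of bytes to read
    od -A n -b -N "$2" "$1" | tr -s ' \n' '  ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: a one-entry stand-in for the magic number lookup.
sniff() {
    case "$(first_bytes "$1" 8)" in
        "320 317 021 340 241 261 032 341") echo "Microsoft Office Document" ;;
        *) echo "unknown" ;;
    esac
}

# Demo on a scratch file that starts with the signature bytes.
tmp=$(mktemp)
printf '\320\317\021\340\241\261\032\341rest of file' > "$tmp"
sniff "$tmp"
rm -f "$tmp"
```

The real file command consults many such entries, at different offsets and with different comparison types, so this sketch only illustrates the basic idea of matching leading bytes against a known pattern.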

To use the file command, simply type file followed by the file or files you want to analyze. For example:

~/temp> file index.html
index.html: HTML document text
~/temp> file realworddoc.doc
realworddoc.doc: Microsoft Office Document

We know that index.html is a real HTML file and realworddoc.doc is a real Microsoft Word document. Let's see what happens if we make an unusual filename change:

~/temp> cp index.html strange.doc
~/temp> file strange.doc
strange.doc: HTML document text

Here, file was not fooled by the changed file extension. It isn't too hard to fool file, though: you need to copy the first 8 bytes of the real Microsoft Word document into a new file and put them in front of other content, as follows:

~/temp> dd if=realworddoc.doc of=8bytes bs=1 count=8
~/temp> cat 8bytes index.html > newfile.doc
~/temp> file newfile.doc
newfile.doc: Microsoft Office Document
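If you have no Word document at hand, the same two steps can be tried on scratch files. This is only a sketch under stated assumptions: graft_header is a hypothetical helper name, and the "donor" file is a made-up stand-in that merely begins with the 8 signature bytes. With bs=1 count=8, dd copies eight one-byte blocks, and cat then writes that header followed by the body.

```shell
#!/bin/sh
# Hypothetical helper: put the first 8 bytes of a donor file in front of
# another file's contents, mirroring the dd + cat steps above.
graft_header() {
    # $1 = donor file, $2 = body file, $3 = output file
    hdr=$(mktemp)
    dd if="$1" of="$hdr" bs=1 count=8 2>/dev/null   # 8 one-byte blocks
    cat "$hdr" "$2" > "$3"                          # header first, then body
    rm -f "$hdr"
}

# Demo with scratch files standing in for realworddoc.doc and index.html.
work=$(mktemp -d)
printf '\320\317\021\340\241\261\032\341(remaining donor bytes)' > "$work/donor"
printf '<!DOCTYPE html>\n' > "$work/body"
graft_header "$work/donor" "$work/body" "$work/out"

# The result should be exactly 8 bytes longer than the body.
echo "body: $(($(wc -c < "$work/body"))) bytes, out: $(($(wc -c < "$work/out"))) bytes"
rm -rf "$work"
```

Note that the result is not a usable Word document; it only carries enough of the signature to change what a signature-based check reports.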

You can check how file reached this result:

~/temp> grep Office /usr/share/misc/magic
0 string \376\067\0\043 Microsoft Office Document
0 string \320\317\021\340\241\261\032\341 Microsoft Office Document
0 string \333\245-\0\0\0 Microsoft Office Document
~/temp> od -b newfile.doc |more
0000000 320 317 021 340 241 261 032 341 074 041 104 117 103 124 131 120
[ ... ]

The command od gives you an octal dump of the file, and you can see that it has the second of the possible signatures of an Office document at its start. Because you are piping the output of od to more, you can terminate the command at any time by pressing q to exit the more command.
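To see what follows the signature, the dump can be split in two: the first 8 bytes in octal and the next 8 as characters. The sketch below runs on a scratch file built to resemble newfile.doc; the function name show_layout and the scratch file are assumptions for illustration. The options -A n, -j and -N are standard od options that suppress the offset column, skip leading bytes and limit the number of bytes dumped.

```shell
#!/bin/sh
# Hypothetical helper: show the 8 signature bytes in octal, then the
# following 8 bytes as printable characters.
show_layout() {
    od -A n -b -N 8 "$1"         # bytes 0-7, one octal value per byte
    od -A n -c -j 8 -N 8 "$1"    # skip 8 bytes, show the next 8 as characters
}

# Scratch file: 8 signature bytes followed by the start of an HTML page.
demo=$(mktemp)
printf '\320\317\021\340\241\261\032\341<!DOCTYPE html>\n' > "$demo"
show_layout "$demo"
rm -f "$demo"
```

The second line of output shows the characters < ! D O C T Y P, which correspond to the octal values 074 041 104 117 103 124 131 120 in the dump above: the start of the HTML file that was appended after the copied signature.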
