codaland
Tuesday, February 10, 2004
       
Java metadata extraction / file format identification library
When running on an operating system that tracks file types independently of file extensions (like Mac OS), querying that information may be an option. However, that approach is too platform-dependent. The only way to be sure about what a file contains is to look at its content. Unfortunately, this requires knowledge of the internal structure of every file format that is to be recognized.

A working solution is to collect information on the most interesting and most common file formats. This has been done in the Unix command line utility file(1) for a long time. It checks each file to be examined against a list of known signatures (the magic(5) file).
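To illustrate the signature idea, here is a minimal sketch in Java. It is not the library's code: the class name SignatureSketch, the method identify and the "unknown" return value are made up for this example, and the table holds only a handful of well-known leading byte sequences. The real magic(5) format is much richer (offsets, masks, nested tests). The file-reading helper assumes Java 11 or later because it uses InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names and structure are assumptions, not the library's API.
final class SignatureSketch {
    // Format name -> bytes expected at offset 0 of a file of that format.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    // Length of the longest signature in the table (the PNG one).
    private static final int MAX_SIGNATURE_LENGTH = 8;

    static {
        SIGNATURES.put("PNG", new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("GIF", new byte[] {'G', 'I', 'F', '8'});
        SIGNATURES.put("JPEG", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put("PDF", new byte[] {'%', 'P', 'D', 'F'});
        SIGNATURES.put("ZIP", new byte[] {'P', 'K', 0x03, 0x04});
    }

    private SignatureSketch() {
    }

    // Returns the name of the first signature that the header starts with,
    // or "unknown" if none matches.
    static String identify(byte[] header) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(header, entry.getValue())) {
                return entry.getKey();
            }
        }
        return "unknown";
    }

    // Reads only as many leading bytes as the longest signature needs.
    static String identify(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return identify(in.readNBytes(MAX_SIGNATURE_LENGTH));
        }
    }

    // True if data is at least as long as prefix and begins with the same bytes.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling SignatureSketch.identify(Path.of("picture.png")) would then report the format from the file's content, whatever its extension says. A linear scan over a small table keeps the sketch short; a larger signature list would probably want an index keyed on the first byte.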

This first version of a yet-to-be-named file format identification library uses the same approach. In the future the library will also be able to extract format-group-specific metadata from files, just as ImageInfo does for images.