[fitsbits] Potential new compression method for FITS tables
seaman at noao.edu
Wed Dec 22 11:18:52 EST 2010
> Quite true. There is a significant difference in convenience/usability
> however, in that everybody understands what a .fits.gz file is and how
> to uncompress it, whereas it will be much less obvious to people what
> a tile-compressed table is, and how to make sense of it. If the format
> becomes widely used this issue will be ameliorated, but that would probably
> take quite some time.
The clock is already ticking. Tile-compression for images has been around since 1999 or so, with fpack (and the ".fits.fz" extension) seeing a concerted push since the 2006 ADASS. Bill Thompson mentioned SDO using tile compression. On the solar side it is also being used by NSO/GONG. The NOAO Science Archive has the format thoroughly integrated into our systems (http://archive.noao.edu/doc/SDM_fpack_usernotes.html). It's used by CADC and (I think) Pan-STARRS. Among projects under development, tile-compression is the baseline for the Dark Energy Survey, One Degree Imager, and LSST. There are undoubtedly others.
Which is to say that it seems misleading to treat gzip as some sort of baseline in our community; the context is richer. In a general computing context, gzip also faces plenty of competition of its own, such as bzip2.
> My feeling is that, disk space being cheap, for most *user* contexts
> the compression levels achievable with tile-compressed FITS will not
> represent a good trade-off against the additional inconvenience of
> using them. I am happy to admit however that for archives the reverse
> may well be true.
As you say, the trade-offs may vary between use cases. Don't underestimate the data transport part of the workflow (the "T" in FITS, after all). Optimizing throughput benefits users as well as archives, pipelines, portals, etc.
> I do agree that this is not likely to lead to subtly inaccurate
> scientific results. I still think user confusion is quite likely,
> but admit that this is a less serious issue.
>> One possible improvement we could make is to add a few COMMENT keywords to the
>> header of the compressed table to tell readers that table columns have been
>> compressed, and include a link to further information about how to interpret
>> the contents.
> I think recommending this kind of additional annotation, along with
> some discussion in the document of the pros and cons of using this
> format in various contexts, would be an appropriate way to address
> my concerns.
Sounds like a plan.
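For illustration only, here is a minimal sketch of what such an annotation could look like as raw header cards. It relies only on FITS cards being 80 characters long, with COMMENT text in columns 9-80. The function name, the wording of the note, and the URL are hypothetical placeholders, not anything proposed in this thread.

```python
import textwrap

CARD_LEN = 80     # every FITS header card is exactly 80 characters
TEXT_WIDTH = 72   # COMMENT text occupies columns 9-80


def comment_cards(text):
    """Split free text into 80-character FITS COMMENT cards."""
    # break_on_hyphens=False keeps hyphenated words and URLs in one piece
    lines = textwrap.wrap(text, width=TEXT_WIDTH, break_on_hyphens=False)
    return [("COMMENT " + line).ljust(CARD_LEN) for line in lines]


# Hypothetical wording; the URL is a placeholder, not a real document.
note = ("The columns of this binary table have been tile-compressed. "
        "See http://example.org/compressed-tables for how to interpret "
        "the contents.")

for card in comment_cards(note):
    print(card.rstrip())
```

A real writer would add these cards through whatever FITS library it already uses; the point is only that a couple of COMMENT cards are enough to carry the note and the link.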