[iostreams] Possible bug in gzip_decompressor?
Dear list,
I have been using boost.iostreams to read gzipped files for some time, but
recently have run into some problems with files that are compressed using
tools other than gzip (namely bgzip from
http://samtools.svn.sourceforge.net/viewvc/samtools/trunk/samtools/).
Here is the file:
$ hexdump -C ../hello.txt.bgz
00000000  1f 8b 08 04 00 00 00 00  00 ff 06 00 42 43 02 00  |............BC..|
00000010  35 00 f3 48 cd c9 c9 d7  51 28 c9 c8 2c 56 00 a2  |5..H....Q(..,V..|
00000020  44 85 92 d4 e2 12 85 b4  cc 9c 54 3d 2e 00 86 1e  |D.........T=....|
00000030  ef a4 1c 00 00 00 1f 8b  08 04 00 00 00 00 00 ff  |................|
00000040  06 00 42 43 02 00 1b 00  03 00 00 00 00 00 00 00  |..BC............|
00000050  00 00                                             |..|
00000052
I think this is valid gzip format (based on reading
http://www.gzip.org/zlib/rfc-gzip.html). It differs from what you get by
compressing the file using the gzip program in two respects: 1. it uses the
'extra' flag (FLG.FEXTRA) to add extra data to each block header, and 2. it
has two blocks, the second of which is empty.
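To make the first difference concrete, here is a minimal C++ sketch of reading a gzip header with the optional extra field, following the layout in the RFC linked above (10 fixed bytes, then a two-byte little-endian XLEN and XLEN bytes of extra data when the extra flag is set). The names GzipHeaderInfo and parse_gzip_header are hypothetical and are not part of Boost.Iostreams; the other optional header fields (name, comment, header CRC) are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch; not a Boost.Iostreams type.
struct GzipHeaderInfo {
    std::uint8_t flags = 0;       // FLG byte
    std::uint16_t xlen = 0;       // length of the extra field (0 if the flag is unset)
    std::size_t header_size = 0;  // bytes up to and including the extra field
};

// Parses the fixed 10-byte gzip header starting at `offset` and, if the
// extra flag (bit 2 of FLG) is set, the two-byte little-endian XLEN plus
// XLEN bytes of extra data. Returns false on truncated input, a bad magic
// number, or a compression method other than deflate (8).
// Assumption: FNAME, FCOMMENT and FHCRC are not handled in this sketch.
bool parse_gzip_header(const std::vector<std::uint8_t>& buf,
                       std::size_t offset,
                       GzipHeaderInfo& out) {
    const std::size_t kFixed = 10;
    const std::uint8_t kFExtra = 0x04;
    if (offset > buf.size() || buf.size() - offset < kFixed) return false;
    const std::size_t avail = buf.size() - offset;
    const std::uint8_t* p = buf.data() + offset;
    if (p[0] != 0x1f || p[1] != 0x8b || p[2] != 0x08) return false;

    GzipHeaderInfo info;
    info.flags = p[3];
    info.header_size = kFixed;
    if (info.flags & kFExtra) {
        // XLEN must be read before the extra data can be skipped.
        if (avail < kFixed + 2) return false;
        info.xlen = static_cast<std::uint16_t>(p[10] | (p[11] << 8));
        if (avail < kFixed + 2 + info.xlen) return false;
        info.header_size = kFixed + 2 + info.xlen;
    }
    out = info;
    return true;
}
```

Applied to the start of hello.txt.bgz above, this gives xlen = 6 and header_size = 18, so the compressed data begins at offset 0x12.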
My file reading code is as follows:
/* test.cpp */
#include <iostream>
#include
' what(): gzip error
I was able to fix this problem with two code changes in the boost.iostreams implementation, as follows.

*Change #1*: at libs/iostreams/src/gzip.cpp:65: change "state_ = s_extra;" to "state_ = s_xlen;"
Rationale: without this change, it looks like state s_xlen is never entered so the length of the extra data is not parsed.
*Update:* I notice this change was already raised in ticket #5908, and has already been fixed in trunk.

*Change #2*: at libs/iostreams/src/zlib.cpp:153: change "crc_imp_ = 0;" to "crc_ = crc_imp_ = 0;"
Rationale: without this change, an empty block of data does not re-initialise the member variable crc_ of zlib_base. This causes an exception on line 447 of boost/iostreams/filter/gzip.hpp.

I confirmed this behaviour a second way, by concatenating two files compressed using gzip, the first nonempty and the second empty. This creates a file which looks like this:

$ hexdump -C hello2.txt.gz
00000000  1f 8b 08 00 0d ea b4 4f  00 03 cb 48 cd c9 c9 57  |.......O...H...W|
00000010  28 c9 48 2d 4a e5 02 00  8e 45 d1 59 0c 00 00 00  |(.H-J....E.Y....|
00000020  1f 8b 08 00 18 ea b4 4f  00 03 03 00 00 00 00 00  |.......O........|
00000030  00 00 00 00                                       |....|
00000034

This file again decompresses with gunzip but not with the program above. Change #2 above fixes this.

Could someone take a look at this and let me know if this change is really appropriate? (Or perhaps I'm doing something wrong.) I can create a bug report if desired.

Many thanks,
Gavin Band.
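The failure mode described under Change #2 can be illustrated with a small, self-contained C++ model. This is only a sketch based on the description in the message, not Boost's actual code: MemberChecker, its fields running and reported (loosely standing in for crc_imp_ and crc_), and check_two_members are all hypothetical names. The idea is that the reported checksum is refreshed only when output is produced, so after an empty block it still holds the previous block's value unless it is cleared on reset.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum stored in
// gzip trailers. `crc` is the value from earlier calls; pass 0 to start.
std::uint32_t crc32_update(std::uint32_t crc, const std::string& data) {
    crc = ~crc;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical model of a decoder that checks one CRC per gzip block.
struct MemberChecker {
    std::uint32_t running = 0;   // working value, always cleared on reset
    std::uint32_t reported = 0;  // value compared against the trailer
    bool clear_reported_on_reset;

    explicit MemberChecker(bool fix) : clear_reported_on_reset(fix) {}

    // Called at the start of every block.
    void reset() {
        running = 0;
        if (clear_reported_on_reset) reported = 0;
    }
    // Called with decompressed output; an empty block produces none,
    // so `reported` is left untouched in that case.
    void on_output(const std::string& chunk) {
        if (chunk.empty()) return;
        running = crc32_update(running, chunk);
        reported = running;
    }
    bool trailer_matches(std::uint32_t trailer_crc) const {
        return reported == trailer_crc;
    }
};

// Feeds a non-empty block followed by an empty one and reports whether
// both trailers verify. The CRC-32 of empty data is 0.
bool check_two_members(bool fix) {
    MemberChecker c(fix);
    const std::string first = "123456789";
    c.reset();
    c.on_output(first);
    const bool ok_first = c.trailer_matches(crc32_update(0, first));
    c.reset();
    c.on_output("");
    const bool ok_second = c.trailer_matches(0);
    return ok_first && ok_second;
}
```

In this model check_two_members(false) fails on the second, empty block because the stale checksum from the first block is compared against a trailer value of 0, while check_two_members(true) passes. Whether this matches the real control flow in zlib.cpp is an assumption drawn from the rationale above.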
On May 17, 2012, at 7:57 AM, Gavin Band wrote:
> I was able to fix this problem with two code changes in the boost.iostreams implementation, as follows.
> Change #1: at libs/iostreams/src/gzip.cpp:65: change "state_ = s_extra;" to "state_ = s_xlen;" Rationale: without this change, it looks like state s_xlen is never entered so the length of the extra data is not parsed. Update: I notice this change was already raised in ticket #5908, and has already been fixed in trunk.
> Change #2: at libs/iostreams/src/zlib.cpp:153: change "crc_imp_ = 0;" to "crc_ = crc_imp_ = 0;" Rationale: without this change, an empty block of data does not re-initialise the member variable crc_ of zlib_base. This causes an exception on line 447 of boost/iostreams/filter/gzip.hpp.
> Could someone take a look at this and let me know if this change is really appropriate? (Or perhaps I'm doing something wrong.) I can create a bug report if desired.
I'm not qualified to comment on your specific fix, but I often do see library authors requesting a Trac ticket - especially to attach a proposed patch. That way it doesn't depend on the author to remember a mail message.
On 18 May 2012 03:59, Nat Goodspeed wrote:
> On May 17, 2012, at 7:57 AM, Gavin Band wrote:
>> *Change #2*: at libs/iostreams/src/zlib.cpp:153: change "crc_imp_ = 0;" to "crc_ = crc_imp_ = 0;" Rationale: without this change, an empty block of data does not re-initialise the member variable crc_ of zlib_base. This causes an exception on line 447 of boost/iostreams/filter/gzip.hpp.
>> Could someone take a look at this and let me know if this change is really appropriate? (Or perhaps I'm doing something wrong.) I can create a bug report if desired.
> I'm not qualified to comment on your specific fix, but I often do see library authors requesting a Trac ticket - especially to attach a proposed patch. That way it doesn't depend on the author to remember a mail message.
Thanks for the reply. I have created ticket #6994 for Change #2 above. Best, Gavin.