[iostreams] Interest in multithreaded lzma filter?

Jeff Bonyun

22 May 2019 22 May '19

3:47 p.m.

I've implemented a multithreaded update to iostreams' lzma filter for my own purposes, and I was wondering if this change would be considered for including in Boost.iostreams? It uses liblzma's own multithreaded functions, so the changes to boost are actually pretty minimal. It does add a new member to the boost-provided lzma options struct -- to specify your desired number of threads -- which changes how people might use the constructor for the filter. My two biggest concerns with including it in Boost.iostreams are: - Making the filter constructor backwards compatible. Right now, users can initialize the options struct with a single unbraced int when calling the filter constructor. With the change to the struct, I want to make sure this old way still works. I.e. not requiring a braced initializer list if you only want to provide the compression level. - Handling older versions of liblzma that didn't offer multithreading. I'll work on identifying the exact versions needed, and branching on their #defined version numbers. If there is appetite for it, I can make a pull request on github. As this would be my first submission to boost, I'm sure I'll screw something up and need correcting, so be patient. Jeff

Show replies by date

James E. King III

22 May 22 May

8:50 p.m.

On Wed, May 22, 2019 at 12:19 PM Jeff Bonyun via Boost <boost@lists.boost.org> wrote:

...

I've implemented a multithreaded update to iostreams' lzma filter for my own purposes, and I was wondering if this change would be considered for including in Boost.iostreams?

Yes this sounds like an improvement. How will we detect whether liblzma is capable? If it is not capable but specified, will we allow the compression anyway?

...

It uses liblzma's own multithreaded functions, so the changes to boost are actually pretty minimal. It does add a new member to the boost-provided lzma options struct -- to specify your desired number of threads -- which changes how people might use the constructor for the filter.

Is there also an option to have it use the number of cores the operating system is advertising?

...

My two biggest concerns with including it in Boost.iostreams are: - Making the filter constructor backwards compatible. Right now, users can initialize the options struct with a single unbraced int when calling the filter constructor. With the change to the struct, I want to make sure this old way still works. I.e. not requiring a braced initializer list if you only want to provide the compression level. - Handling older versions of liblzma that didn't offer multithreading. I'll work on identifying the exact versions needed, and branching on their #defined version numbers.

If there is appetite for it, I can make a pull request on github.

That's the best way to get started. Be sure to include tests that exercise it.

...

As this would be my first submission to boost, I'm sure I'll screw something up and need correcting, so be patient.

Jeff

I'm one of the community maintainers and iostreams is part of that collection, so bear with us as well if we're not immediately responsive. - Jim

Jeff Bonyun

9:40 p.m.

...

... How will we detect whether liblzma is capable? If it is not capable but specified, will we allow the compression anyway?

I've been investigating this. liblzma defines a version macro, LZMA_VERSION. It looks like liblzma obtained its multithreaded function in 5.1.1 (in 2011). So the idea would be to branch on this macro, and fall back to the currently-used single-threaded function for an older liblzma version. In that case, requesting any number of threads will carry out the operation, but will only use a single thread. This makes it tolerant of different versions, but does mark up the code with a few #if branches. I'm assuming boost goes for tolerance in this case.

...

Is there also an option to have it use the number of cores the operating system is advertising?

I'm glad you asked! The xz utility, that liblzma is packaged with, does this. But it isn't inherent to liblzma itself, which considers "0" threads to be an error. Copying the approach of xz, I'm intending to use lzma_cputhreads, which is exposed by liblzma. This was added in 5.2.0 (2014); prior to that it used an internal function that I don't think we should/can use. This would necessitate yet another #if branch. Assuming this is acceptable, using "0" threads would indicate that this all-cores method is desired. This matches the xz utility's approach. For liblzma versions before 5.2.0, a value of "0" would be translated into "1" instead. This isn't ideal, as the user clearly wanted more than 1 thread, but I'm not sure we should guess how many the user wanted. It would still fail in one situation: if your liblzma was built with `--disable-threads`. In this case, the documentation says that lzma_cputhreads is omitted. I would imagine it would result in a link-time error when building Boost.iostreams. I do not have a solution to this problem, because I don't know how to detect this before the link error. The best I can offer is a `#define BOOST_*` to disable it manually if the user runs into this problem. An alternative is using Boost.Thread, which would solve the problem gracefully. But I was going to avoid pulling in the extra library, and not do this. If you think the Official Boost Way would be to use Boost.Thread, I can change directions.

...

...
If there is appetite for it, I can make a pull request on github. That's the best way to get started. Be sure to include tests that exercise it.

I'll work on it! The bones are done. It's the tests, particular formatting for boost, and testing-with-different-liblzma versions that needs to be done. Jeff

James E. King III

23 May 23 May

12:04 a.m.

On Wed, May 22, 2019 at 5:40 PM Jeff Bonyun <jeff@citiumadvisors.com> wrote:

...

...
... How will we detect whether liblzma is capable? If it is not capable but specified, will we allow the compression anyway?

I've been investigating this. liblzma defines a version macro, LZMA_VERSION. It looks like liblzma obtained its multithreaded function in 5.1.1 (in 2011). So the idea would be to branch on this macro, and fall back to the currently-used single-threaded function for an older liblzma version. In that case, requesting any number of threads will carry out the operation, but will only use a single thread. This makes it tolerant of different versions, but does mark up the code with a few #if branches. I'm assuming boost goes for tolerance in this case.

...
Is there also an option to have it use the number of cores the operating system is advertising?

I'm glad you asked! The xz utility, that liblzma is packaged with, does this. But it isn't inherent to liblzma itself, which considers "0" threads to be an error. Copying the approach of xz, I'm intending to use lzma_cputhreads, which is exposed by liblzma. This was added in 5.2.0 (2014); prior to that it used an internal function that I don't think we should/can use. This would necessitate yet another #if branch. Assuming this is acceptable, using "0" threads would indicate that this all-cores method is desired. This matches the xz utility's approach. For liblzma versions before 5.2.0, a value of "0" would be translated into "1" instead. This isn't ideal, as the user clearly wanted more than 1 thread, but I'm not sure we should guess how many the user wanted.

It would still fail in one situation: if your liblzma was built with `--disable-threads`. In this case, the documentation says that lzma_cputhreads is omitted. I would imagine it would result in a link-time error when building Boost.iostreams. I do not have a solution to this problem, because I don't know how to detect this before the link error. The best I can offer is a `#define BOOST_*` to disable it manually if the user runs into this problem.

An alternative is using Boost.Thread, which would solve the problem gracefully. But I was going to avoid pulling in the extra library, and not do this. If you think the Official Boost Way would be to use Boost.Thread, I can change directions.

Fewer dependencies is better, so if lzma exposes something you can use that.

...

...
...
If there is appetite for it, I can make a pull request on github. That's the best way to get started. Be sure to include tests that exercise it.

I'll work on it! The bones are done. It's the tests, particular formatting for boost, and testing-with-different-liblzma versions that needs to be done.

Jeff

Thanks, Jim

Mike

12:39 p.m.

...

Gesendet: Mittwoch, 22. Mai 2019 um 23:40 Uhr Von: "Jeff Bonyun via Boost" <boost@lists.boost.org>

...
... How will we detect whether liblzma is capable? If it is not capable but specified, will we allow the compression anyway?

I've been investigating this. liblzma defines a version macro, LZMA_VERSION. It looks like liblzma obtained its multithreaded function in 5.1.1 (in 2011). So the idea would be to branch on this macro, and fall back to the currently-used single-threaded function for an older liblzma version. In that case, requesting any number of threads will carry out the operation, but will only use a single thread. This makes it tolerant of different versions, but does mark up the code with a few #if branches. I'm assuming boost goes for tolerance in this case.

Does a 2019 Boost release really have to be compatible with a Pre 2011 release of liblzma? Is this combination even tested? I.e. are we sure this would be the only incompatibility? How about just adding this functionality unconditionally and see if someone complains? Mike

James E. King III

1:19 p.m.

On Thu, May 23, 2019 at 8:39 AM Mike via Boost <boost@lists.boost.org> wrote:

...

...
Gesendet: Mittwoch, 22. Mai 2019 um 23:40 Uhr Von: "Jeff Bonyun via Boost" <boost@lists.boost.org>

...
... How will we detect whether liblzma is capable? If it is not capable but specified, will we allow the compression anyway?

I've been investigating this. liblzma defines a version macro, LZMA_VERSION. It looks like liblzma obtained its multithreaded function in 5.1.1 (in 2011). So the idea would be to branch on this macro, and fall back to the currently-used single-threaded function for an older liblzma version. In that case, requesting any number of threads will carry out the operation, but will only use a single thread. This makes it tolerant of different versions, but does mark up the code with a few #if branches. I'm assuming boost goes for tolerance in this case.

Does a 2019 Boost release really have to be compatible with a Pre 2011 release of liblzma? Is this combination even tested? I.e. are we sure this would be the only incompatibility?

How about just adding this functionality unconditionally and see if someone complains?

Mike

_____________

So basically if someone has an older lzma they are limited to boost 1.70.0 or earlier. That's perhaps reasonable as long as it is documented, but it hasn't been the typical way I have seen things added to existing code. Another way to handle it is to keep the old path in a macro like BOOST_IOSTREAMS_NO_MULTITHREADED_LZMA. This is common through boost. Ideally, detecting the liblzma version from the header could then set BOOST_IOSTREAMS_NO_MULTITHREADED_LZMA and it would always do the right thing, but the simplest way to do it is to let the consumer of the library set this if they need it. - Jim

2242

Age (days ago)

2243

Last active (days ago)

List overview

Download

5 comments

3 participants

participants (3)

James E. King III
Jeff Bonyun
Mike