C++ Deflate (zlib-like) library
Hello everyone,

I have recently been working on a C++ compression library very similar to zlib, after trying to implement some HTTP compression support over Boost.Beast and realizing, after some discussion with Vinnie Falco, that while it would be a nice built-in feature for Beast, it would probably be a better idea to have zlib-like compression be a separate library, in order to be properly maintainable and likely more useful.

The current working version can be viewed at https://github.com/ryanjanson/Deflate; however, the API could still be improved in terms of modern C++ usability, and what I currently have in mind can be found annotated here: https://gist.github.com/AeroStun/687ec9ca69404e26f8e02e5084926036

Do you think that this could be useful to have as its own entity in the Boost environment? Any kind of feedback on the idea and the library is warmly welcome.

Regards,
Janson
On Fri, 6 Mar 2020 at 05:51, Janson R. via Boost <boost@lists.boost.org> wrote:
Hello everyone,
I have recently been working on a C++ compression library very similar to zlib after trying to implement some HTTP compression support over Boost.Beast and realizing after some discussion with sir Falco that while it would be a nice builtin feature for Beast, it would possibly be a better idea to have zlib-like compression be a separate library in order to be properly maintainable and likely more useful.
Why not use lzma(2)? (Wasn't there already (wrapped) support for this in Boost?) If you need just in-memory, on-the-fly (de)compression, e.g. for streaming, lz4 is in **very** active development. Roll-your-own seems like a waste of dev time, but it will keep you off the street.

It is the wrong direction for Boost to start offering all those (basic) things next to the 'real thing'. On Windows this won't be an enormous problem, as the fact that it comes in the (Boost) package possibly outweighs the possible resistance to adoption; on Linux (and BSD and probably OSX), however, bypassing the normal (distro-supplied) packages used for these purposes increases complexity as opposed to reducing it.

Additionally, there are a lot, really a lot, of devs/users that have their eye on the ball: a known bug won't last long, and we don't need to wait for Boost to go through another release cycle. The latter is not helpful either, because corporates will not move immediately to this new release, so it will take even longer.

degski

--
@systemdeg
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
First of all, thanks for the feedback.

On 06/03/2020 15:32, degski via Boost wrote:
Why not use lzma(2)? (Wasn't there already (wrapped) support for this in Boost?) If you need just in-memory, on-the-fly (de)compression, e.g. for streaming, lz4 is in **very** active development. Roll-your-own seems like a waste of dev time, but it will keep you off the street.
The point is to facilitate the use of all the formats which use Deflate, such as ZIP, PNG, or PDF, for pure C++ projects and the Boost environment.

As for current wrapped support: there is some in Boost.Iostreams, but if I read the docs correctly it does not support raw Deflate (RFC 1951), which is problematic for formats or protocols that rely on it directly, such as WebSocket's permessage-deflate, or decompressing compressed data from Microsoft HTTP servers (which are non-compliant and use RFC 1951 instead of RFC 1950). Furthermore, it doesn't allow much room for customization of the engine parameters (no predefined dictionary, for example).

Janson
On Fri, Mar 6, 2020 at 5:15 PM Janson R. via Boost <boost@lists.boost.org> wrote:
[...] customization of the engine parameters (no predefined dictionary for example).
You mean something similar to https://facebook.github.io/zstd/#small-data ?
On 06/03/2020 18:26, Dominique Devienne via Boost wrote:
You mean something similar to https://facebook.github.io/zstd/#small-data ?
Yes indeed.
On Fri, Mar 6, 2020 at 9:23 AM Dominique Devienne via Boost <boost@lists.boost.org> wrote:
Here I will quote some of the function declarations from this library:

    size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value);
    size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize);
    size_t ZSTD_compress2(ZSTD_CCtx* cctx,
                          void* dst, size_t dstCapacity,
                          const void* src, size_t srcSize);

As with lzma2, this is just another variation of the same shitty C API used by ZLib. Is that the best we can do for C++? I sure hope not...

Thanks
On Fri, Mar 6, 2020 at 6:33 AM degski via Boost <boost@lists.boost.org> wrote:
Why not use lzma(2)?
This is not a question of implementation but a question of what a clean, Boost-quality, modern C++ API to a codec that operates on memory buffers looks like. The stock ZLib API is very C-oriented (obviously). Can we do better? Thanks
On Fri, Mar 6, 2020 at 6:33 AM degski via Boost <boost@lists.boost.org> wrote:
Why not use lzma(2)?
Anyway, I looked at the native API for lzma and typical usage looks like this:

    rc = elzma_compress_config(hand, ELZMA_LC_DEFAULT, ELZMA_LP_DEFAULT,
                               ELZMA_PB_DEFAULT, 5, (1 << 20) /* 1mb */,
                               format, inLen);
    ...
    rc = elzma_compress_run(hand, inputCallback, (void *) &ds,
                            outputCallback, (void *) &ds);

In other words, the same type of shitty C API found in ZLib - no thanks.

Regards
On 7/03/2020 07:14, Vinnie Falco wrote:
In other words, the same type of shitty C API found in ZLib - no thanks.
There's a reason general-use libraries end up gravitating towards shitty C APIs -- because C++ has a much more shitty ABI, making the C API the only one that can actually be consumed cross-compiler and cross-language-binding. One of the other shitty interface parts -- use of overloaded options parameters -- also tends to be a side effect of trying to maintain cross-version ABI stability.
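A concrete illustration of the point above - the only interface shape that survives every compiler's ABI is an opaque struct plus free functions. All names here are hypothetical, and the definitions are sketched in the same file for brevity; in a real library they would live behind the API boundary:

```cpp
#include <cstddef>

// Declarations as a C header would present them: callers never see
// the layout of `codec`, so the library can change it freely without
// breaking cross-compiler or cross-language users.
extern "C" {
struct codec;
codec* codec_create(int level);
int    codec_process(codec* c, const char* in, std::size_t n);
void   codec_destroy(codec* c);
}

// Definitions, normally hidden inside the library's own sources.
struct codec { int level; std::size_t seen; };
codec* codec_create(int level) { return new codec{level, 0}; }
int codec_process(codec* c, const char*, std::size_t n) { c->seen += n; return 0; }
void codec_destroy(codec* c) { delete c; }
```

The cost is exactly the complaint earlier in the thread: lifetime and error handling become the caller's problem, which is the gap a thin C++ layer can close without touching the ABI.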
On Sun, 8 Mar 2020 at 17:59, Gavin Lambert via Boost <boost@lists.boost.org> wrote:
On 7/03/2020 07:14, Vinnie Falco wrote:
In other words, the same type of shitty C API found in ZLib - no thanks.
There's a reason general-use libraries end up gravitating towards shitty C APIs -- because C++ has a much more shitty ABI, making the C API the only one that can actually be consumed cross-compiler and cross-language-binding.
Thanks for confirming I'm not alone in thinking the above.

I would like to add that, imo, there is nothing shitty about a C API; it is a different language altogether, and the fact that it is different doesn't make it shitty. It's clear, simple and without surprises (cross-platform/std/compiler) - what's not to love?

Dismissing things like zlib (here on the list) or re-writing half of OpenSSL in C++ (also under consideration) is utter madness and shows little appreciation of the age of those libs and the breadth of their adoption/use.

degski
On 09/03/2020 13:18, degski via Boost wrote:

I would like to add that imo, there is nothing shitty about a C API, it is a different language altogether, the fact that it is different doesn't make it shitty. It's clear, simple and without surprises.

Indeed, C is a different language altogether, and a C API is beautiful - in C. C++ is yet another language, and a C API sprinkled with `void *opaque` all over the place does not play nicely with C++ code, ending up far from "clear, simple and without surprises".

There are then two solutions: wrapping, or rewriting.

Janson
On Mon, 9 Mar 2020 at 06:30, Janson R. via Boost <boost@lists.boost.org> wrote:
... a C API sprinkled with `void *opaque` all over the place does not play nicely with C++ code
What do you mean by 'does not play nicely'? I feel the opposite: no casts required - what's not to like? The most complicated object in C (bar the struct hack, which has been made 'optional' in C11) is a compound of builtin types or an array of them. The considerations we might have in C++ do not apply. You cannot expect cross-platform, cross-compiler compatibility without a cost; if that were possible, we would already have it in C++ and C would be a distant memory. Fact is, it's not a distant memory, and for good reason.

degski
On 09/03/2020 13:39, degski via Boost wrote:
What do you mean by 'does not play nicely'? I feel the opposite: no casts required - what's not to like?
Casts are surely required when you need to write a callback handler whose signature is imposed as `void(void* payload)` by the C library. By "not playing nicely" I mean that suddenly everything is a handle to something inside the library and can live wherever the library allocated it (and that assumes the running system even has dynamic memory allocation).
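To make the complaint concrete - with a C-imposed callback signature the user context travels type-erased, and the cast back is unavoidable. The "library" side here is a stand-in; real examples are zlib's alloc hooks or lzma's input/output callbacks:

```cpp
#include <cstddef>
#include <string>

// Imposed C callback shape: the user context travels as void*.
typedef void (*sink_fn)(const char* data, std::size_t n, void* payload);

// Stand-in for a C library routine that emits output via a callback.
void fake_process(const char* in, std::size_t n, sink_fn sink, void* payload) {
    sink(in, n, payload);
}

// User side: the payload must be cast back to its real type by hand,
// with no compiler help if the type is wrong.
void collect(const char* data, std::size_t n, void* payload) {
    static_cast<std::string*>(payload)->append(data, n);
}
```

A C++ wrapper can hide this behind a templated callable or std::function, at the price of no longer being consumable from C.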
You cannot expect cross-platform, cross-compiler compatibility without a cost; if that were possible, we would already have it in C++ and C would be a distant memory. Fact is, it's not a distant memory, and for good reason.
That's the same reasoning that makes people write plain C application code today.

Janson
The "shitty" part of the API is not having constructors and destructors, and relying on the user to call a function to "free" some resources instead of having their lifetime managed automatically via RAII. Anyway... this has veered off-topic.

You came close to discussing the topic when you linked this library wrapper: <https://github.com/do-m-en/libarchive_cpp_wrapper>

But you didn't actually discuss its API and its suitability to zlib. Looking over this library I see:

    template<ns_writer::format FORMAT, ns_writer::filter FILTER>
    static writer make_writer(std::ostream& stream, size_t block_size);

Are you suggesting that ZLib should use ostream and istream as the interface for writing and reading data?

Thanks
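The RAII point can be shown in a few lines: a std::unique_ptr with a custom deleter turns any create/free pair into a self-cleaning handle. ctx_create/ctx_free below are hypothetical stand-ins for pairs like deflateInit/deflateEnd:

```cpp
#include <memory>

// Hypothetical C-style handle pair, standing in for a real codec's
// init/free functions.
struct ctx { int level; };
inline ctx* ctx_create(int level) { return new ctx{level}; }
inline void ctx_free(ctx* c) { delete c; }

// RAII on top: the deleter calls the C cleanup function, so the
// user can no longer forget to free the handle, and early returns
// and exceptions are safe by construction.
struct ctx_deleter {
    void operator()(ctx* c) const { ctx_free(c); }
};
using ctx_ptr = std::unique_ptr<ctx, ctx_deleter>;

inline ctx_ptr make_ctx(int level) { return ctx_ptr(ctx_create(level)); }
```

This is the thinnest possible C++ layer over a C API: zero-cost, ABI-neutral, and it removes the entire class of "forgot to call deflateEnd" bugs.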
On Mon, Mar 9, 2020 at 2:26 PM Vinnie Falco via Boost <boost@lists.boost.org> wrote:
The "Shitty API" is not having constructors and destructors, and relying on the user to call a function to "free" some resources instead of having their lifetime managed automatically via RAII. Anyway... this has veered off-topic.
You came close to discussing the topic when you linked this library wrapper:
<https://github.com/do-m-en/libarchive_cpp_wrapper>
But you didn't actually discuss its API and the suitability to zlib.
I did not intend to, as I'm not a compression/decompression expert, but I did notice a pattern of "let's just use C" in the past week or so on this mailing list and it struck a nerve a bit. But here goes...

First, the C vs C++ API question that I've already touched upon a bit. I quite often (most of the time) use experimental Clang forks on my Ubuntu computer, which means that from time to time I have to "recompile the world" due to Boost library X ABI compatibility not playing nicely between gcc/g++ and Clang (the last one I had to bother with, a couple of months ago, was Boost.DLL, as it depends on Boost.Filesystem even with a C++17 compiler... and Boost.Filesystem was causing compatibility issues - not to mention that I'm now using two different filesystem implementations: std for my code and Boost for Boost.DLL). This is the "in favor of a C API" part - and since for me all libraries below my code are low level, as far as I'm concerned OpenSSL and ZLib are as much low level as Boost.Filesystem or Boost Spirit X3 (you can always draw lines - I draw mine at "when do I have to recompile a library whose same version my distro already provides").

On the other hand, I'm far too old to torture my memory by e.g. remembering which functions should be called in which order and belong to the same "class" that accepts the handle next, or remembering to call free_this_resource. Instead of using:

    out1 = decompress(data, type::gzip); // type is in this case a tag
    out2 = decompress(data, type::bzip2);

I have to write a switch statement because:

    out1 = decompress_gzip(data);
    out2 = decompress_bzip2(data);

so I need to write C++ wrappers (either just abusing std::unique_ptr with a default destructor, or a wrapper on the level of the linked libarchive wrapper). These are the wrappers that I'd like to have in Boost.

So the bottom line is that I'd prefer a C API library that can be provided by the distro (zlib, libarchive, ...) and a C++ part (provided by Boost, which is also part of the distro) that would be compiled alongside my code - though for small libs, header-only for the entire lib would also do the trick.

These are the reasons for me preferring C++ std-lib-only stuff (so migrating Boost libs towards the standard) - it comes with the experimental compiler that I have to compile anyway - and why I'm ranting on the Boost mailing list from time to time that Boost libs should move to std dependencies once those are in the standard. This would solve my "recompile the world" issues (or at least shrink them a bit).

Looking over this library I see:
    template<ns_writer::format FORMAT, ns_writer::filter FILTER>
    static writer make_writer(std::ostream& stream, size_t block_size);
Are you suggesting that ZLib should use ostream and istream as the interface for writing and reading data?
Now for the API of the library proposed by the OP. I've skimmed over the code and a lot of it seems like just-a-bit-above-C flavour - I could be wrong, as I didn't find the examples, and the unit tests also haven't shown me how to easily use the library.

The problem I have with such a library is that it's not useful for my work - it's too low level and is reinventing the wheel. The low-level part was already implemented by zlib, pigz and other C libraries, and I find it counterproductive to try and play catch-up in C++ since those libraries are already compatible with C++.

What I'd like to see from a C++ library in general, and a Boost library in particular, is a high-quality API that would cover the capabilities of libarchive (and use it as the underlying implementation at first). The APIs of primitives (inflate/deflate) should come at a later stage - perhaps not before C++ standardization of those APIs (and even then I'd expect that std libs would use what is already available in distro X instead of reinventing the wheel).

My old lib has a poor API, as it's not exactly configurable (allocators etc.), but it's still a big simplification compared to the libarchive C API, and that's what I expect from a high-quality wrapper - simplify a lot, but still let an advanced user squeeze the performance out of the underlying implementation if/when needed (complexity should be reserved for those users and hidden from us mere mortals).
For the capabilities of such an API, I'd expect it to work seamlessly with std streams (I don't care if people believe they are slow; as long as they are part of the standard and there is no better std replacement, I expect libraries of this type to work with them out of the box) and to support binary data, not just std::string, as in/out buffers in situations where you already have everything in memory or are using memory-mapped files (perhaps also (Boost.)ASIO/Networking TS streams, but that's your domain, so you probably have a better opinion on how ASIO network payloads could be compressed/decompressed on the fly).

For an example of why I believe a libarchive-level C++ API should be preferred in Boost: a while back I needed TLS SNI support for Boost Asio/Beast, as I wanted to use multiple https domains on a single IP. Asio and Beast are useful libraries but don't provide server-side support; on the other hand, the OpenSSL world is evolving at its own pace, so the library already has the support, and since Asio provides a high-level SSL context wrapper and Beast doesn't do any weird magic with it, I just wrote a simple context that just works: https://github.com/do-m-en/boost_asio_tls_sni_context

OpenSSL is a community focused on network cryptography, so rewriting a subset as a Boost lib would have been counterproductive, as I'm quite certain such a lib would not have as much support. On the other hand, networking support is, as far as I'm concerned, on a solid foundation in Boost, so a high-level wrapper combination is IMO the best solution.

Regarding the OP's library as it currently stands, I'd suggest integration into Boost.Beast instead (if it's a benefit) and splitting it into a separate library once it can serve as a high-level Boost library's backbone (and get that high-level API into Boost first).

P.S.
There used to be a zip stream in Boost (it had a bug in at least one Boost version on AIX, so it was useless for me at that point, but it was there) - I don't know what happened to it, but I probably wouldn't use it, as for me libarchive-level functionality is what I mostly need, and once you already have a hammer in the codebase...

Regards,
Domen
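The single-entry-point interface Domen describes above (`decompress(data, type::gzip)`) could be sketched like this - `type` and the per-format backends are hypothetical stand-ins, and the point is only that the format becomes a value, so the switch lives once inside the wrapper instead of in every caller:

```cpp
#include <string>

enum class type { gzip, bzip2 };

// Stand-ins for the real backends (zlib, libbz2); they just tag the
// data so the dispatch is observable.
inline std::string decompress_gzip_impl(const std::string& d)  { return "gz:" + d; }
inline std::string decompress_bzip2_impl(const std::string& d) { return "bz:" + d; }

// One front end; callers pick the format with a tag value and never
// write the switch themselves.
inline std::string decompress(const std::string& data, type t) {
    switch (t) {
    case type::gzip:  return decompress_gzip_impl(data);
    case type::bzip2: return decompress_bzip2_impl(data);
    }
    return {};
}
```

The same shape extends naturally to a compile-time tag (template parameter) when the format is known statically and the dispatch should cost nothing.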
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Mon, Mar 9, 2020 at 9:25 PM Domen Vrankar <domen.vrankar@gmail.com> wrote:
What I'd like to see from a C++ library in general and Boost library in particular is a high quality API that would cover capabilities of libarchive (and use it as the underlying implementation at first). The APIs of primitives (inflate/deflate) should come at a later stage - perhaps not before C++ standardization of the APIs (and even then I'd expect that std libs would use what is already available in distro X instead of reinventing the wheel).
Just to clarify this... From the C++ std lib I'd expect that it would expose the primitives from which a libarchive-level API would be built, but I'd expect the standardization of such low-level APIs to allow using zlib and co. as the underlying implementation of those primitives - and I wouldn't expect the distros to re-implement them, as one implementation of a given library per distro is enough, and effort should go into improving that one instead of having two implementations where possible.
Regards, Domen
On Mon, Mar 9, 2020 at 9:39 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Mon, Mar 9, 2020 at 1:26 PM Domen Vrankar <domen.vrankar@gmail.com> wrote:
For the capabilities of such API I'd expect it to work seamlessly with std streams
Okay, can you propose some function signatures or class declarations?
Actually I've already written part of them :) If I were to write my wrapper once again, the basis would not change, as that API covered my use cases in the past quite well. I'd just:
- add support for archives that were added later by libarchive
- implement some sort of properties system of archive/compression-type capabilities, as currently it relies on libarchive and doesn't care if something is not supported by the underlying format
- add allocator support
- and perhaps add an additional no-exceptions API

Let libarchive handle the rest for starters and gather more feedback from users - the part I never cared to bother with, except from those working on the same project as me (there was one GitHub request for adding files to existing archives, but that came in years after I last actively used it, so I never even bothered - different projects, no longer a need to bother with compression).

Regards,
Domen
On 09/03/2020 21:25, Domen Vrankar via Boost wrote:
Now for the API of the library proposed by OP.
I've skimmed over the code and a lot of it seems like just-a-bit-above-C flavour - I could be wrong, as I didn't find the examples, and the unit tests also haven't shown me how to easily use the library.
Observation is mostly correct, and it is one of the main reasons I've come to ask for feedback here; Vinnie rightfully pointed out that the question being asked is "what [does] a clean, Boost-quality, modern C++ API to a codec that operates on memory buffers [look] like[?]".

Since the email discussions began, I've integrated the following prototype API, after suggestions by a few of the people here:

```
optional<std::string> easy_compress(string_view in, wrap wrapping = wrap::none);
optional<std::string> easy_uncompress(string_view in, wrap wrapping = wrap::none);
```

While it is definitely something that was needed (a friend of mine got to use it and found it much more approachable for his use than the preexisting expert-mode API), it definitely does not cover all the possible use cases of such a library.
What I'd like to see from a C++ library in general, and a Boost library in particular, is a high-quality API that would cover the capabilities of libarchive (and use it as the underlying implementation at first). The APIs of primitives (inflate/deflate) should come at a later stage - perhaps not before C++ standardization of those APIs (and even then I'd expect that std libs would use what is already available in distro X instead of reinventing the wheel). My old lib has a poor API [...] but it's still a big simplification compared to the libarchive C API, and that's what I expect from a high-quality wrapper - simplify a lot, but still let an advanced user squeeze the performance out of the underlying implementation if/when needed (complexity should be reserved for those users and hidden from us mere mortals).
Does that imply that you would welcome quality C++ wrappers for common C libraries in the Boost ecosystem?
For the capabilities of such API I'd expect it to work seamlessly with std streams (I don't care if people believe that they are slow, until they are part of the standard and there is no better std replacement I expect libraries of such type to work with them out of the box) and support for binary data not just std::string as in/out buffers in situations where you have everything already in memory or are using memory mapped files (perhaps also (Boost::)ASIO/Networking TS streams but that's your domain so you probably have a better opinion on how ASIO network payload could be compressed/decompressed on the fly).
That's exactly the kind of feedback and information I'm looking for - so thanks! Concrete function declarations are more than welcome too!
Regarding the OP's library as it currently stands I'd suggest integration of Boost::Beast instead (if it's a benefit) and split it into a separate library once it can serve as a high level Boost library's backbone (and get that high level API into Boost first).
Boost.Beast will, at some point, need at least some of this code transferred back into it to allow some HTTP features to be implemented (which are the very first reason why I got onto this adventure to begin with).

Regards,
Janson
On Mon, Mar 9, 2020 at 9:56 PM Janson R. via Boost <boost@lists.boost.org> wrote:
On 09/03/2020 21:25, Domen Vrankar via Boost wrote:
Now for the API of the library proposed by OP.
I've skimmed over the code and a lot of it seems like just-a-bit-above-C flavour - I could be wrong, as I didn't find the examples, and the unit tests also haven't shown me how to easily use the library.
Observation is mostly correct, and it is one of the main reasons I've come to ask for feedback here; Vinnie rightfully pointed out that the question being asked is "what [does] a clean, Boost-quality, modern C++ API to codec that operates on memory buffers [look] like[?]".
Since the email discussions began, I've integrated the following prototype API, after suggestions by a few of the people here:

```
optional<std::string> easy_compress(string_view in, wrap wrapping = wrap::none);
optional<std::string> easy_uncompress(string_view in, wrap wrapping = wrap::none);
```
While it is definitely something that was needed (a friend of mine got to use it and obviously found it much more trivial for his use than the more expert-mode preexisting API), it definitely does not cover all the possible use cases of such a library.
I've seen this; I didn't know it was a later addition, but I was surprised by the names. The issue I have with this is that it expects a large file to be loaded into memory - I've mostly worked with large files under memory constraints (for various reasons that were out of my control and were policy- rather than technology-related), so I'd expect at least stream support, so that you can use the read -> process -> write pattern, where you only handle part of the data at a time.
What I'd like to see from a C++ library in general, and a Boost library in particular, is a high-quality API that would cover the capabilities of libarchive (and use it as the underlying implementation at first). The APIs of primitives (inflate/deflate) should come at a later stage - perhaps not before C++ standardization of those APIs (and even then I'd expect that std libs would use what is already available in distro X instead of reinventing the wheel). My old lib has a poor API [...] but it's still a big simplification compared to the libarchive C API, and that's what I expect from a high-quality wrapper - simplify a lot, but still let an advanced user squeeze the performance out of the underlying implementation if/when needed (complexity should be reserved for those users and hidden from us mere mortals).
Does that imply that you would welcome quality C++ wrappers for common C libraries in the Boost ecosystem?
I'm not a Boost author or maintainer so it's not my place to say but from where I stand that's the place where Boost belongs.
For the capabilities of such an API, I'd expect it to work seamlessly with std streams (I don't care if people believe that they are slow; as long as they are part of the standard and there is no better std replacement, I expect libraries of this type to work with them out of the box) and to support binary data, not just std::string, as in/out buffers in situations where you have everything already in memory or are using memory-mapped files (perhaps also (Boost.)Asio/Networking TS streams, but that's your domain, so you probably have a better opinion on how Asio network payloads could be compressed/decompressed on the fly).
That's exactly the kind of feedback and information I'm looking for, so thanks! Quick sketches of possible function declarations are more than welcome as well!
You're tackling inflate/deflate - libarchive is handling archives (e.g. .tar.gz, not just the gz part). The way I see it, you're competing in the wrong league and I can't give constructive feedback on that. Your library would be a building block for writing libraries such as libarchive, but that is a C lib and would not bother with your lib. On the other hand, I wouldn't use it, as I usually had to handle archives, not a single compressed file. That's the reason I'm recommending going from the other side - a high-level API that handles the (from my point of view) usual use cases and gives users a reason to use your library instead of just falling back to C libraries by default.
Regarding the OP's library as it currently stands, I'd suggest integrating it into Boost.Beast instead (if it's a benefit) and splitting it out into a separate library once it can serve as a high-level Boost library's backbone (and get that high-level API into Boost first).
At some point, Boost.Beast will need at least some code transferred back into it to allow certain HTTP features to be implemented (which are the very reason I started on that adventure to begin with).
Yeah... That's the problem that I see. The library is handling an internal implementation detail of Boost.Beast, and that's a small subset of what you need for a proper general-purpose compression/decompression lib. Like I said, I'm not certain that this is the right angle to begin from, as it's competing against instead of enhancing higher-level C libs. Regards, Domen
Regards
Janson
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 10/03/2020 09:56, Janson R. wrote:
Your observation is mostly correct, and it is one of the main reasons I've come to ask for feedback here; Vinnie rightfully pointed out that the question being asked is "what [does] a clean, Boost-quality, modern C++ API to [a] codec that operates on memory buffers [look] like[?]".
That answer may be different depending on your context; for example Boost.Asio provides a number of buffer concepts and methods for manipulating them. Which are good, in that they're templates and support all kinds of different buffer concepts (from basic vectors and strings through to dynamic multi-buffer streams), and you only pay for the complexity that you actually use. They're also bad, in that they're templates and they're harder to consume generically without forcing your own code to also be a template. But if you're in the context of Boost.Beast or some other app that uses Asio, it makes sense to use these buffer concepts rather than reinventing the wheel, especially with these things probably-maybe-eventually landing in the standard. Forthcoming std::span might be another interesting candidate.
Since the email discussions began, I've integrated the following prototype API, suggested by a few of the people here: ``` optional<std::string> easy_compress(string_view in, wrap wrapping = wrap::none);
optional<std::string> easy_uncompress(string_view in, wrap wrapping = wrap::none); ``` While it works (and I do sometimes use it myself if I'm especially lazy) I dislike use of std::string and friends for arbitrary binary data.
std::basic_string<uint8_t> / std::basic_string_view<uint8_t> are better, but still carry unfortunate "pretending to be text" implications. std::vector<uint8_t> is probably the best standard container type for a basic block-o-bytes, absent generic template buffer concept magic and/or C++20 spans.
On Mon, 9 Mar 2020 at 20:30, Gavin Lambert via Boost <boost@lists.boost.org> wrote:
std::vector<uint8_t> is probably the best standard container type for a basic block-o-bytes, absent generic template buffer concept magic and/or C++20 spans.
Till C++20 lands, one can use https://github.com/tcbrindle/span . d. -- @systemdeg "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
On 10/03/2020 03:30, Gavin Lambert via Boost wrote:
Forthcoming std::span might be another interesting candidate.
That's for sure; I'll experiment with it more this week
While it works (and I do sometimes use it myself if I'm especially lazy) I dislike use of std::string and friends for arbitrary binary data.
You've definitely caught the "lazy" aspect that went into it; was mostly for checking how usable a solution of that form is. I have a user and he happens to use strings so I just went with the most straightforward way to let him use the library and for myself to test it.
std::vector<uint8_t> is probably the best standard container type for a basic block-o-bytes, absent generic template buffer concept magic and/or C++20 spans.
You've just given me the idea of making a catch-all span-like that takes any contiguous range of `char` or `unsigned char` and turns it into a lightweight span that can be used in ABI boundaries. I'll try that tomorrow; thanks for the feedback on possible input types! All of this C++20 mindset made me wonder if a codec API could be almost as simple and elegant as making it into a range adaptor (requiring memory contiguity of course); what do you and possibly other readers think of that? Regards, Janson
On 10/03/2020 03:47, Janson R. wrote:
All of this C++20 mindset made me wonder if a codec API could be almost as simple and elegant as making it into a range adaptor (requiring memory contiguity of course); what do you and possibly other readers think of that?
It's pretty much what Boost.Iostreams does; I realized that just now. J.
Janson R. wrote:
All of this C++20 mindset made me wonder if a codec API could be almost as simple and elegant as making it into a range adaptor (requiring memory contiguity of course); what do you and possibly other readers think of that?
I suspect that for things like streaming compression and encryption APIs, in a couple of years we'll all agree that the right way to do them will be as co-routines. (Personally, nearly all of my recent use of these things has been for relatively small data; a simple non-streaming API i.e. take a span<byte> and return a vector<byte> would be perfect.) Regards, Phil.
On Tue, Mar 10, 2020, 4:51 PM Phil Endecott via Boost <boost@lists.boost.org> wrote:
Janson R. wrote:
All of this C++20 mindset made me wonder if a codec API could be almost as simple and elegant as making it into a range adaptor (requiring memory contiguity of course); what do you and possibly other readers think of that?
I suspect that for things like streaming compression and encryption APIs, in a couple of years we'll all agree that the right way to do them will be as co-routines.
Co-routines only add overhead in cases that are processing-intensive and run on the local CPU. They are meant for tasks where you wait for the result and the CPU can use that thread for something else in the meantime. (Personally, nearly all of my recent use of these things has been
for relatively small data; a simple non-streaming API i.e. take a span<byte> and return a vector<byte> would be perfect.)
I agree that it would be a nice first step, but as I said, if this project continues I'd still prefer a stream/iterator-supporting API (my general use case) in addition to this. Perhaps I'll write an API when I get home instead of just words... Regards, Domen
Regards, Phil.
Domen Vrankar wrote:
On Tue, Mar 10, 2020, 4:51 PM Phil Endecott via Boost <boost@lists.boost.org> wrote:
I suspect that for things like streaming compression and encryption APIs, in a couple of years we'll all agree that the right way to do them will be as co-routines.
co-routines only add overhead in cases that are processing intensive and run on the local CPU. they are meant for tasks where you wait for the result and CPU can use that thread for something else in the meantime.
Not true - think of generators, for example. You increase performance when something has to pause in the middle - for example a compression function that needs more input or fills its output - and the paused state is more efficiently represented by its stack frame than by an explicit state object. Everyone will understand this better in a couple of years when we have implementations to play with. (I feel old; I've just realised that the last time I used co-routines was with Modula-2 in about 1987!) Regards, Phil.
On Wed, 11 Mar 2020 at 05:02, Phil Endecott via Boost <boost@lists.boost.org> wrote:
Domen Vrankar wrote:
On Tue, Mar 10, 2020, 4:51 PM Phil Endecott via Boost < boost@lists.boost.org> wrote:
I suspect that for things like streaming compression and encryption APIs, in a couple of years we'll all agree that the right way to do them will be as co-routines.
co-routines only add overhead in cases that are processing intensive and run on the local CPU. they are meant for tasks where you wait for the result and CPU can use that thread for something else in the meantime.
Not true - think of generators, for example. You increase performance when something has to pause in the middle - for example a compression function that needs more input or fills its output - and the paused state is more efficiently represented by its stack frame than by an explicit state object.
Everyone will understand this better in a couple of years when we have implementations to play with.
(I feel old; I've just realised that the last time I used co-routines was with Modula-2 in about 1987!)
Luckily, we are all using Modern C++ now (when the implementations finally arrive, that is). I know nothing, but did Modula-2 have modules, maybe? I remember I was running the OS for a little while, also in the '80s. degski
On Mon, Mar 9, 2020 at 7:30 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:
That answer may be different depending on your context; for example Boost.Asio provides a number of buffer concepts and methods for manipulating them.
Which are good, in that they're templates and support all kinds of different buffer concepts (from basic vectors and strings through to dynamic multi-buffer streams), and you only pay for the complexity that you actually use.
They're also bad, in that they're templates and they're harder to consume generically without forcing your own code to also be a template.
But if you're in the context of Boost.Beast or some other app that uses Asio, it makes sense to use these buffer concepts rather than reinventing the wheel, especially with these things probably-maybe-eventually landing in the standard.
Yes, now we're getting to the meat of it! You are proposing: use Asio's BufferSequence concepts (or something substantially similar) in the high-level ZLib interface (or a generic codec interface). On the face of it this is not a bad idea, but the reality is that all codecs work a single buffer at a time. This is different from Asio or OS-level I/O, which support a scatter/gather interface where the buffers are all sent together. Thus, it seems to me that at the lowest level of a codec interface, it will be single-buffer oriented. On top of that single-buffer interface, algorithms which work on a buffer sequence or range of buffers can be built. Asio's buffer sequence concepts are tragically coupled to the asio::const_buffer and asio::mutable_buffer concrete types, which makes it necessary to have all of Asio as a dependency. Using these types out of the box for a low-level ZLib / codec library doesn't seem like a great idea. I did propose that Asio split out its buffer sequence concepts and types into a separate Boost library. This would allow Beast's HTTP parser to be in its own library, without requiring Asio. The experience with the large Beast library makes me now more in favor of smaller, more numerous libraries with fewer dependencies.
Perhaps something like this could be an improvement on ZLib's C API:

struct input_buffer
{
    input_buffer(void const* data, std::size_t size);

    bool is_empty() const noexcept;
    void const* data() const noexcept;
    std::size_t used() const noexcept;
    std::size_t remaining() const noexcept;
    void consume(std::size_t) noexcept;
};

struct output_buffer; // similar to input_buffer

Then we can express a compression function using these types and get better safety:

// returns the number of bytes written to the output buffer
std::size_t compress(output_buffer& out, input_buffer& in);

A function to compress a ConstBufferSequence into a MutableBufferSequence could be built from this one primitive, plus a dependency on <boost/asio/buffer.hpp>. Thanks
On Tue, Mar 10, 2020, 4:15 AM Vinnie Falco via Boost <boost@lists.boost.org> wrote:
On Mon, Mar 9, 2020 at 7:30 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:
That answer may be different depending on your context; for example Boost.Asio provides a number of buffer concepts and methods for manipulating them.
Which are good, in that they're templates and support all kinds of different buffer concepts (from basic vectors and strings through to dynamic multi-buffer streams), and you only pay for the complexity that you actually use.
They're also bad, in that they're templates and they're harder to consume generically without forcing your own code to also be a template.
But if you're in the context of Boost.Beast or some other app that uses Asio, it makes sense to use these buffer concepts rather than reinventing the wheel, especially with these things probably-maybe-eventually landing in the standard.
Yes, now we're getting to the meat of it! You are proposing: use Asio's BufferSequence concepts (or something substantially similar) in the high-level ZLib interface (or a generic codec interface).
On the face of it this is not a bad idea, but the reality is that all codecs work a single buffer at a time. This is different from Asio or OS-level I/O, which support a scatter/gather interface where the buffers are all sent together. Thus, it seems to me that at the lowest level of a codec interface, it will be single-buffer oriented.
On top of that single-buffer interface, algorithms which work on a buffer sequence or range of buffers can be built. Asio's buffer sequence concepts are tragically coupled to the asio::const_buffer and asio::mutable_buffer concrete types, which makes it necessary to have all of Asio as a dependency. Using these types out of the box for a low-level ZLib / codec library doesn't seem like a great idea.
I did propose that Asio split out its buffer sequence concepts and types into a separate Boost library. This would allow Beast's HTTP parser to be in its own library, without requiring Asio. The experience with the large Beast library makes me now more in favor of smaller, more numerous libraries with fewer dependencies.
Perhaps something like this could be an improvement on ZLib's C API:
struct input_buffer { input_buffer( void const* data, std::size_t size);
bool is_empty() const noexcept; void const* data() const noexcept; std::size_t used() const noexcept; std::size_t remaining() const noexcept; void consume(std::size_t) noexcept; };
First question regarding your API: why do we need all those functions? You compress/decompress in chunks, so all you need is a const std::span for input and a std::span for output, and you call decompress in a for loop. Since compression/decompression is not random-access but streaming, you can alternatively just accept an input iterator pair/range for input/output and cover the streaming case without a for loop. With std::istream_iterator you can then handle std::istream (file, memory, etc.) support. You can also chain, for e.g. a Boost.Spirit X3 parser, a lazy read->decompress->parse/process pipeline. Can't existing Asio buffers be written to through spans/iterators? struct output_buffer; // similar to input_buffer
Then we can express a compression function using these types, and get better safety:
// returns the number of bytes written to the output buffer std::size_t compress( output_buffer& out, input_buffer& in );
With the iterators/ranges described above you wouldn't need to return std::size_t, so the return value can be used for error codes instead. Regards, Domen A function to compress a ConstBufferSequence into a
MutableBufferSequence could be built from this one primitive, and a dependency on including <boost/asio/buffer.hpp>
Thanks
On Tue, Mar 10, 2020 at 1:31 AM Domen Vrankar <domen.vrankar@gmail.com> wrote:
You compress/decompress in chunks so all you need is a const std::span for input and std::span for output and call decompress on a for loop.
You need to know: 1. how much input was consumed 2. how much output was produced
...input iterator pair/range for input/output... ...std::istream_iterator... ...boost.spirit x3 parser... ...read->decompress->parse/process ... With the iterators/ranges described above you wouldn't need to return std::size_t, so the return value can be used for error codes instead.
It isn't clear what these things mean without seeing a function signature or class declaration. Thanks
FYI, w.r.t. streaming there is already a proposal in this area -- which I thought got put into the Library Fundamentals TS, but I'm not seeing now that it happened. There is a GitHub repo for it, but I think Peter got tired of championing this and would like others to pick it up. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0448r2.pdf On Tue, Mar 10, 2020 at 9:50 AM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
On Tue, Mar 10, 2020 at 1:31 AM Domen Vrankar <domen.vrankar@gmail.com> wrote:
You compress/decompress in chunks so all you need is a const std::span for input and std::span for output and call decompress on a for loop.
You need to know:
1. how much input was consumed 2. how much output was produced
...input iterator pair/range for input/output... ...std::istream_iterator... ...boost.spirit x3 parser... ...read->decompress->parse/process ... With the iterators/ranges described above you wouldn't need to return std::size_t, so the return value can be used for error codes instead.
It isn't clear what these things mean without seeing a function signature or class declaration.
Thanks
On Tue, Mar 10, 2020 at 5:03 PM Jeff Garland <azswdude@gmail.com> wrote:
FYI, w.r.t. streaming there is already a proposal in this area -- which I thought got put into the Library Fundamentals TS, but I'm not seeing now that it happened. There is a GitHub repo for it, but I think Peter got tired of championing this and would like others to pick it up.
"streambuf" is far too heavyweight to use as a low-level interface. I already pointed out that a codec needs to have as arguments not only the input and output "spans" but also return the number of bytes transacted for input and output. signal-to-noise in this discussion is low...in a sense, zlib gets the crux of it right with the z_params: struct z_params { void const* next_in; std::size_t avail_in; std::size_t total_in = 0; void* next_out; std::size_t avail_out; std::size_t total_out = 0; int data_type = unknown; // best guess about the data type: binary or text }; The question is, can we do better than this? I'm certain that all answers which boil down to "copy std stream APIs" are wrong. Thanks
On Wed, Mar 11, 2020 at 2:18 AM Vinnie Falco <vinnie.falco@gmail.com> wrote:
fyi w.r.t streaming there is already a proposal in this area -- which I
On Tue, Mar 10, 2020 at 5:03 PM Jeff Garland <azswdude@gmail.com> wrote: thought got put into the library fundamentals TS, but not seeing that it happened now. There is a github for it, but I think Peter got tired of championing this and would like others to pick it up.
"streambuf" is far too heavyweight to use as a low-level interface. I already pointed out that a codec needs to have as arguments not only the input and output "spans" but also return the number of bytes transacted for input and output.
signal-to-noise in this discussion is low...in a sense, zlib gets the crux of it right with the z_params:
struct z_params { void const* next_in; std::size_t avail_in; std::size_t total_in = 0; void* next_out; std::size_t avail_out; std::size_t total_out = 0; int data_type = unknown; // best guess about the data type: binary or text };
The question is, can we do better than this? I'm certain that all answers which boil down to "copy std stream APIs" are wrong.
"copy std stream APIs" answer is wrong in my opinion as well but I would most certainly expect such a library to provide a convenient way to adapt a stream - I really dislike libraries that do their own thing and don't provide a convenient way to adapt to existing standard constructs. Regards, Domen
Thanks
On 11/03/2020 14:17, Vinnie Falco wrote:
"streambuf" is far too heavyweight to use as a low-level interface. I already pointed out that a codec needs to have as arguments not only the input and output "spans" but also return the number of bytes transacted for input and output.
signal-to-noise in this discussion is low...in a sense, zlib gets the crux of it right with the z_params:
struct z_params { void const* next_in; std::size_t avail_in; std::size_t total_in = 0; void* next_out; std::size_t avail_out; std::size_t total_out = 0; int data_type = unknown; // best guess about the data type: binary or text };
The question is, can we do better than this? I'm certain that all answers which boil down to "copy std stream APIs" are wrong.
std::streambuf is almost that straightforward in its storage (albeit it uses more pointers, presumably related to supporting non-contiguous streams or something). (And boost::asio::streambuf uses it, which is typically Boost.Asio's DynamicBuffer of choice.) Other than Ye Olde Horrid Naming Conventions, the only particularly heavyweight thing in it that I can see is the locale. Would you like std::streambuf better if it omitted the locale? Or had better naming conventions? Or is there some other objection to it?
On Tue, Mar 10, 2020 at 5:49 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Tue, Mar 10, 2020 at 1:31 AM Domen Vrankar <domen.vrankar@gmail.com> wrote:
You compress/decompress in chunks so all you need is a const std::span for input and std::span for output and call decompress on a for loop.
You need to know:
1. how much input was consumed 2. how much output was produced
Now I understand what you mean. You got me confused as your compress function wasn't receiving any state (like z_stream in zlib) so I assumed that you meant to compress the entire buffer in one go.
...input iterator pair/range for input/output... ...std::istream_iterator... ...boost.spirit x3 parser... ...read->decompress->parse/process ... Whit iterators/ranges described above you wouldn't need to return std::size_t so return can be used for error codes instead.
It isn't clear what these things mean without seeing a function signature or class declaration.
Since you need to hold state, I was talking about something like this:

struct Reader // should be implemented for reading from vector, file stream... some provided as convenience, others user-defined, so an extension point
{
    std::span<std::byte> read(); // remains stable until the next call to read, at which point it is auto-consumed
    bool eof();
};

template<typename InReader, typename WriteBuffer>
class Deflate_range
{
public:
    Deflate_range(InReader& in, WriteBuffer& write);

    class Deflate_iterator{...};
    Deflate_iterator begin(); // can be called only once
    Deflate_iterator end();

private:
    InReader& in_;
    WriteBuffer& write_;
    z_stream zs_; // holds the compression/encryption/... state
};

// packs the iterators into a Reader (an overload of compress could also take in streams directly - it's just a convenience function)
Polymorphic_deflate_range compress(std::istream_iterator<std::byte> begin, std::istream_iterator<std::byte> end);

std::istringstream str{"1234567890"};
auto range = compress(std::istream_iterator<std::byte>{str}, std::istream_iterator<std::byte>{});
bool ok = boost::spirit::x3::phrase_parse(range.begin(), range.end(), ...);

I haven't used std::ranges yet, so perhaps such a range should be implemented differently. Then I got confused by your compress API, so my std::size_t return comment was just me being confused and reading/replying to mails on my phone... Hope this clarifies things a bit. I'm guessing that I could use such a thing in my toy game engine - not certain, but perhaps I won't need the whole libarchive functionality there... Too early to say. Regards, Domen
Thanks
On Wed, Mar 11, 2020 at 7:12 AM Domen Vrankar <domen.vrankar@gmail.com> wrote:
aOn Tue, Mar 10, 2020 at 5:49 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Tue, Mar 10, 2020 at 1:31 AM Domen Vrankar <domen.vrankar@gmail.com> wrote:
You compress/decompress in chunks so all you need is a const std::span for input and std::span for output and call decompress on a for loop.
You need to know:
1. how much input was consumed 2. how much output was produced
Now I understand what you mean. You got me confused as your compress function wasn't receiving any state (like z_stream in zlib) so I assumed that you meant to compress the entire buffer in one go.
...input iterator pair/range for input/output... ...std::istream_iterator... ...boost.spirit x3 parser... ...read->decompress->parse/process ... Whit iterators/ranges described above you wouldn't need to return std::size_t so return can be used for error codes instead.
It isn't clear what these things mean without seeing a function signature or class declaration.
Since you need to hold state I was talking about something like this:
struct Reader // should be implemented for reading from vector, file stream... some provided as convenience, others user-defined, so an extension point { std::span<std::byte> read(); // remains stable until the next call to read, at which point it is auto-consumed bool eof(); };
template<typename InReader, typename WriteBuffer> class Deflate_range { public: Deflate_range(InReader& in, WriteBuffer& write);
Sorry... I meant to write std::span<std::byte> instead of the WriteBuffer template but completely forgot while composing the mail... It's a temporary buffer used by the deflate process for iterator buffering. Just replace it while reading ;)
class Deflate_iterator{...};
Deflate_iterator begin(); // can be called only once Deflate_iterator end();
private: InReader& in_; WriteBuffer& write_; z_stream zs_; // holds the compression/encryption/... state };
Polymorphic_deflate_range compress(std::istream_iterator<std::byte> begin, std::istream_iterator<std::byte> end); // it packs the iterators into a Reader (an overload of compress could also take in streams directly - it's just a convenience function)
std::istringstream str{"1234567890"}; auto range = compress(std::istream_iterator<std::byte>{str}, std::istream_iterator<std::byte>{});
bool ok = boost::spirit::x3::phrase_parse(range.begin(), range.end(), ...);
I haven't used std::ranges yet so perhaps such a range should be implemented differently.
Then I got confused by your compress API, so my std::size_t return comment was just me being confused and reading/replying to mails on my phone...
Hope this clarifies things a bit.
I'm guessing that I could use such a thing in my toy game engine - not certain, but perhaps I won't need the whole libarchive functionality there... Too early to say.
Regards, Domen
Thanks
On Mon, 9 Mar 2020 at 07:11, Janson R. via Boost <boost@lists.boost.org> wrote:
Casts are surely required when you need to write a callback handler whose signature, `void(void* payload)`, is imposed by the C library.
The C++ compiler allows implicit conversions TO a void * type, but to convert FROM a void * type requires an explicit cast. By "not playing nicely" I mean that suddenly everything is a handle to
something in the library and can live wherever the library allocated it (and that assumes that the running system even has dynamic memory allocation).
If C worked like C++, it would be called C++; observing that it doesn't work like C++ is a no-op. d.
On Mon, Mar 9, 2020, 1:19 PM degski via Boost <boost@lists.boost.org> wrote:
On Sun, 8 Mar 2020 at 17:59, Gavin Lambert via Boost < boost@lists.boost.org> wrote:
On 7/03/2020 07:14, Vinnie Falco wrote:
In other words, the same type of shitty C API found in ZLib - no thanks.
There's a reason general-use libraries end up gravitating towards shitty C APIs -- because C++ has a much more shitty ABI, making the C API the only one that can actually be consumed cross-compiler and cross-language-binding.
Thanks for confirming I'm not alone in thinking the above. I would like to add that, imo, there is nothing shitty about a C API; it is a different language altogether, and the fact that it is different doesn't make it shitty. It's clear, simple, and without surprises (cross-platform/std/compiler) - what's not to love?
Dismissing things like zlib (here on the list) or rewriting half of OpenSSL in C++ (also under consideration) is utter madness and shows little appreciation of the age of those libs and the breadth of their adoption/use.
I prefer C for the ABI part but C++ for the API, so I, like you, consider rewrites counterproductive in some cases. However, I do want my code to use a C++ API and statically/dynamically link to C however the header-only wrapper wants (that's why I always wrap OpenGL and other hair-pulling-level C APIs). Everyone has their own taste and reasons, but still: since when did the Boost mailing list become a place for rallies against C++? Btw, in case there is a direction towards a wrapper instead of a rewrite - perhaps the Boost lib should not be limited to a single compression format and should instead wrap libarchive or something like that: https://github.com/do-m-en/libarchive_cpp_wrapper This one is my unmaintained old toy, but I noticed that others can be found via Google, so maybe they could be used as a starting point. Regards, Domen
degski
On Fri, Mar 6, 2020 at 12:50 PM Janson R. via Boost <boost@lists.boost.org> wrote:
I have recently been working on a C++ compression library very similar to zlib after trying to implement some HTTP compression support over Boost.Beast and realizing after some discussion with sir Falco that while it would be a nice builtin feature for Beast, it would possibly be a better idea to have zlib-like compression be a separate library in order to be properly maintainable and likely more useful.
Do you think that this could be useful to have as its own entity in the Boost environment? Any kind of feedback on the idea and the library is warmly welcome.
(As someone who had to recently peek into the zlib C source code...)

Hi. I'd very much welcome a clean pure C++ implementation of basic deflate compression, because the C code I saw did not give me a warm and fuzzy feeling, honestly.

I need access to ZIP files, and since minizip and zlib are pretty much intertwined, as far as I saw, I couldn't easily use this new library if it lacked ZIP support. And if that support allowed efficient *and* multi-threaded access to the ZIP entries, all the better. The raw file IO can still be serial, but at the very least the compression / decompression should be able to run in parallel on multiple threads, via ASIO. I'm dealing with ZIP files which reach into the 100,000s to millions of entries, and having to serially read+uncompress or compress+write per entry is slow! All those cores could be put to good use.

Lastly, a quick glance at the code showed plain enums vs enum classes, capitalized vs all-lowercase enums; such naming inconsistencies were surprising. Perhaps it's for "consistency" with zlib? Not sure it's a good idea. Pick a style and stick to it IMHO.

Thanks, --DD
On Fri, Mar 6, 2020 at 7:48 AM Dominique Devienne via Boost <boost@lists.boost.org> wrote:
Hi. I'd very much welcome a clean pure C++ implementation of basic deflate compression, because the C code I saw did not give me a warm and fuzzy feeling, honestly.
Yep, that's the goal. And we also want robust tests which cover corner cases and known bugs/fixes, along with 100% coverage.
I couldn't easily use this new library, if it lacked ZIP support.
Yes, ZIP, gzip, and other flavors of deflate (which really only differ in the additional material prepended or appended to the compressed data) should be supported, with a clean API.
And if that support allowed efficient *and* multi-threaded access to the ZIP entries, all the better. The raw file IO can still be serial, but at the very least the compression / decompression should be able to run in parallel on multiple threads, via ASIO.
Now this is a bridge too far :) I don't think we need to get Asio involved here. However, we should ensure that the interface we settle on does not present an obstacle to a third party implementing the parallel algorithm you describe on top of the deflate algorithm.
Lastly, a quick glance at the code showed plain enums vs enum classes, capitalized vs all-lowercase enums; such naming inconsistencies were surprising. Perhaps it's for "consistency" with zlib? Not sure it's a good idea. Pick a style and stick to it IMHO.
The zlib implementation in Beast (upon which this new project is based) is unfinished. It does work, though, and is used for the permessage-deflate extension of Beast's websocket. Beast users don't have to deal with the hassle of having a separate zlib dependency, so it has achieved its goal in that sense. However, I did not put all of the polish and design work into it that it needs, as I am only one person. I did port it to header-only C++ though; if you have a look you can see that it is considerably different from the original zlib, with no small effort. It can be further improved. Regards
Le vendredi 06 mars 2020 à 16:51 +0100, Dominique Devienne via Boost a écrit :
On Fri, Mar 6, 2020 at 12:50 PM Janson R. via Boost <boost@lists.boost.org> wrote:
I have recently been working on a C++ compression library very similar to zlib after trying to implement some HTTP compression support over Boost.Beast and realizing after some discussion with sir Falco that while it would be a nice builtin feature for Beast, it would possibly be a better idea to have zlib-like compression be a separate library in order to be properly maintainable and likely more useful. Do you think that this could be useful to have as its own entity in the Boost environment? Any kind of feedback on the idea and the library is warmly welcome.
(As someone who had to recently peek into the zlib C source code...)
Hi. I'd very much welcome a clean pure C++ implementation of basic deflate compression, because the C code I saw did not give me a warm and fuzzy feeling, honestly.
+1. Same feeling here. A decent C++ API for dealing with zlib compression, and on top of that the zip format, would indeed be very useful. Current options suck (about 15 years ago, I had to develop (closed) code to read and write some zip files. I ran into the same need a few months ago, only to see that nothing has improved in this regard). Regards, Julien
I really would appreciate if the library could be used as header only, and contained some sort of abbreviation of these convenience functions:

auto compress(const std::vector<uint8_t>& uncompressed) -> std::vector<uint8_t>;
auto decompress(const std::vector<uint8_t>& compressed) -> std::optional<std::vector<uint8_t>>;

/Viktor

On Fri, Mar 6, 2020 at 12:50 PM Janson R. via Boost <boost@lists.boost.org> wrote:
Hello everyone,
I have recently been working on a C++ compression library very similar to zlib after trying to implement some HTTP compression support over Boost.Beast and realizing after some discussion with sir Falco that while it would be a nice builtin feature for Beast, it would possibly be a better idea to have zlib-like compression be a separate library in order to be properly maintainable and likely more useful. The current working version can be viewed at https://github.com/ryanjanson/Deflate
however the API could still be improved in terms of modern C++ usability, and what I currently have in mind can be found annotated here: https://gist.github.com/AeroStun/687ec9ca69404e26f8e02e5084926036
Do you think that this could be useful to have as its own entity in the Boost environment? Any kind of feedback on the idea and the library is warmly welcome.
Regards, Janson
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Fri, Mar 6, 2020 at 10:03 AM Viktor Sehr via Boost <boost@lists.boost.org> wrote:
I really would appreciate if the library could be used as header only
I'll take it one step farther. The library should:

* Default to compilation into a static or dynamic lib, e.g. libboost_deflate.o
* Compile header-only, by defining BOOST_DEFLATE_HEADER_ONLY
* Require only C++11
* Compile without the rest of Boost (i.e. no dependencies), by defining BOOST_DEFLATE_STANDALONE. In this configuration, C++17 or later will be required. The boost:: namespace will remain.
* Configurably support C++ equivalents of Boost types such as string_view and optional.

All of my new libraries follow this pattern. To assist in building such libraries, I have created a repository "library_template" containing a trivial function, which serves as a template anyone can clone to form the starting point of a Boost library meeting these requirements. It has Bjam and Boost-compatible CMake support, tests, examples, coverage, sanitizers, CI (travis, appveyor, azure), and working badges:

<https://github.com/vinniefalco/library_template>
some sort of abbreviation of these convenience functions:

auto compress(const std::vector<uint8_t>& uncompressed) -> std::vector<uint8_t>;
auto decompress(const std::vector<uint8_t>& compressed) -> std::optional<std::vector<uint8_t>>;
Yes, thank you, this is precisely what the OP was asking for. I agree that having convenience functions is great; something like this too:

std::string compress( string_view s );

Regards
Vinnie Falco wrote:
* Default to compilation into a static or dynamic lib, e.g. libboost_deflate.o * Compile header-only, by defining BOOST_DEFLATE_HEADER_ONLY
Practice shows that this isn't that convenient for a low-level dependency. Inevitably, header-only library A wants to use it header-only, and library B wants to use it as a compiled library, and things go south pretty quickly. We already hit this scenario in https://github.com/boostorg/timer/commit/10bf0e3d6d79e53a79f8d9e56991f855af8.... (See also https://github.com/boostorg/timer/commit/05ae7c47e99038c5f777c9682980d6d7f5d....)
On Fri, Mar 6, 2020 at 10:32 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:
Practice shows that this isn't that convenient for a low-level dependency. Inevitably, header-only library A wants to use it header-only, and library B wants to use it as a compiled library, and things go south pretty quickly.
We already hit this scenario in https://github.com/boostorg/timer/commit/10bf0e3d6d79e53a79f8d9e56991f855af8.... (See also https://github.com/boostorg/timer/commit/05ae7c47e99038c5f777c9682980d6d7f5d....)
That's not applicable to the deflate use-case, because the deflate library has no dependencies. If I did write a library which had a dependency on another library which offered a header-only option, then yes it would be a mistake (if not downright rude and presumptuous) to dictate to the user how the downstream dependency must be consumed (header-only or linked library). Thanks
On 06/03/2020 20:02, Peter Dimov via Boost wrote:
It's about libraries that depend on it.
IMO, libraries that depend on it would rather have the choice of having it as header-only or compiled, and deal with the consequences of their choice themselves, rather than being forced into a model. Janson
On Fri, Mar 6, 2020 at 2:11 PM Janson R. via Boost <boost@lists.boost.org> wrote:
On 06/03/2020 20:02, Peter Dimov via Boost wrote:
It's about libraries that depend on it.
IMO, libraries that depend on it would rather have the choice of having it as header-only or compiled, and deal with the consequences of their choice themselves, rather than being forced into a model.
I think Peter is right in this case. Library A which uses library B should not have an opinion on whether library B is consumed as header-only or as a linked library. The decision on how each library in a linked executable is configured should be up to the top-level build target (i.e. the program) and not any of the individual components. This is why the default for libraries which have a header-only configuration should be a linkable library target, since it creates the least headache. A header-only configuration option is only provided to capture that subset of users who don't want the hassle of integrating another linked library into their build, for whatever reason. It should not be the default, and it should not be encouraged as the status quo. Regards
On 06.03.20 23:42, Vinnie Falco via Boost wrote:
I think Peter is right in this case. Library A which uses library B should not have an opinion on whether library B is consumed as header-only or as a linked library. The decision on how each library in a linked executable is configured should be up to the top-level build target (i.e. the program) and not any of the individual components. This is why the default for libraries which have a header-only configuration, should be a linkable library target, since it creates the least headache.
Wait, what? Linkable libraries are a huge headache to me, to the point where I'll seriously consider rewriting a third-party linkable library to be header-only just to avoid having to link to it. Problems with linkable libraries include:

- I have to compile them, which means either messing with their native build system, trying to get it to build on all of my platforms using all of my cross-compile toolchains, or replacing their native build system with my own. The latter is often easier than the former.
- If the library has linkable libraries as dependencies, I get to do the same thing for them, recursively.
- I have to make sure that I'm linking to the correct version of these libraries, i.e. not the system-wide installed versions.
- If I need to compile multiple copies of a library with different options, I get a combinatorial explosion of different libraries.

My library build matrix is full of red entries where I haven't been able to build a particular library in a particular configuration that I need. The green entries are often the result of days of work. On the other hand, I've never had any trouble with header-only libraries.

-- Rainer Deyke (rainerd@eldwood.com)
On Fri, Mar 6, 2020 at 9:42 PM Rainer Deyke via Boost <boost@lists.boost.org> wrote:
Wait, what? Linkable libraries are a huge headache to me, to the point where I'll seriously consider rewriting a third-party linkable library to be header-only just to avoid having to link to it.
Yes, and this is exactly why I provide a configuration option to use my libraries in a header-only configuration. Thanks
AMDG On 3/6/20 10:41 PM, Rainer Deyke via Boost wrote:
On 06.03.20 23:42, Vinnie Falco via Boost wrote:
I think Peter is right in this case. Library A which uses library B should not have an opinion on whether library B is consumed as header-only or as a linked library. The decision on how each library in a linked executable is configured should be up to the top-level build target (i.e. the program) and not any of the individual components. This is why the default for libraries which have a header-only configuration, should be a linkable library target, since it creates the least headache.
Wait, what? Linkable libraries are a huge headache to me, to the point where I'll seriously consider rewriting a third-party linkable library to be header-only just to avoid having to link to it.
Problems with linkable libraries include:

- I have to compile them, which means either messing with their native build system, trying to get it to build on all of my platforms using all of my cross-compile toolchains, or replacing their native build system with my own. The latter is often easier than the former.
This is not a difference between compiled libraries and header only libraries, per se. It's a result of the fact that being header-only forces you to write portable code, because you can't rely on the build system to handle complicated configuration steps. If you make a compiled library that contains exactly the same code that you would write for a header only library, "replace the native build system" becomes "glob the sources and add them to your project"
- If the library has linkable libraries as dependencies, I get to do the same thing for them, recursively.
- I have to make sure that I'm linking to the correct version of these libraries, i.e. not the system-wide installed versions.
- If I need to compile multiple copies of a library with different options, I get a combinatorial explosion of different libraries.
My library build matrix is full of red entries where I haven't been able to build a particular library in a particular configuration that I need. The green entries are often the result of days of work. On the other hand, I've never had any trouble with header-only libraries.
In Christ, Steven Watanabe
On Sat, Mar 7, 2020 at 5:37 AM Steven Watanabe via Boost <boost@lists.boost.org> wrote:
If you make a compiled library that contains exactly the same code that you would write for a header only library, "replace the native build system" becomes "glob the sources and add them to your project"
Actually, it is even easier than that. I provide a header file called "src.hpp" which you can just include in any one of your translation units, as a third way to consume the library in a non-header-only mode: <https://github.com/vinniefalco/json/blob/develop/include/boost/json/src.hpp> This is exactly how the linkable library is built, a TU called src.cpp just includes this header: <https://github.com/vinniefalco/json/blob/develop/src/src.cpp> Going this route gives you the compilation performance of a linkable library, without the hassle of writing a build script to emit a new library (it is just another TU in your already-existing build script). The library_template project also supports this: <https://github.com/vinniefalco/library_template/blob/develop/include/boost/library_template/src.hpp> Thanks
On 07.03.20 14:36, Steven Watanabe via Boost wrote:
On 3/6/20 10:41 PM, Rainer Deyke via Boost wrote:
Problems with linkable libraries include: - I have to compile them, which means either messing with their native build system, trying to get it to build on all of my platforms using all of my cross-compile toolchains, or replacing their native build system with my own. The latter is often easier than the former.
This is not a difference between compiled libraries and header only libraries, per se. It's a result of the fact that being header-only forces you to write portable code, because you can't rely on the build system to handle complicated configuration steps.
If you make a compiled library that contains exactly the same code that you would write for a header only library, "replace the native build system" becomes "glob the sources and add them to your project"
That's an approach that works fairly well with a fair number of libraries. Things to watch out for include non-library source files mixed in with library source files, #include paths, and of course any complex build processes that go beyond "compile these files and link them". All compiled libraries introduce some friction, but in some cases the amount of friction is trivial, and in other cases it is extreme. A header-only Boost library introduces no friction at all for me. -- Rainer Deyke (rainerd@eldwood.com)
On 3/6/2020 1:11 PM, Vinnie Falco via Boost wrote:
On Fri, Mar 6, 2020 at 10:03 AM Viktor Sehr via Boost <boost@lists.boost.org> wrote:
I really would appreciate if the library could be used as header only
I'll take it one step farther. The library should:
* Default to compilation into a static or dynamic lib, e.g. libboost_deflate.o
* Compile header-only, by defining BOOST_DEFLATE_HEADER_ONLY
* Require only C++11
* Compile without the rest of Boost (i.e. no dependencies), by defining BOOST_DEFLATE_STANDALONE. In this configuration, C++17 or later will be required. The boost:: namespace will remain.
* Configurably support C++ equivalents of Boost types such as string_view and optional.
Please stop with this mantra of library X not depending on any other Boost library and reinventing constructs offered by other Boost libraries (or elsewhere). It is not the way most programmers create library software.
On Sat, Mar 7, 2020 at 1:05 PM Edward Diener via Boost <boost@lists.boost.org> wrote:
Please stop with this mantra of library X not depending on any other Boost library and reinventing constructs offered by other Boost libraries ( or elsewhere ). It is not the way most programmers create library software.
I'm not saying that this applies to all libraries. Having every library invent every wheel is clearly not sustainable. But neither is introducing a dependency on Boost just because you want to call boost::exchange or because you want the BOOST_NODISCARD macro. The decision on whether to introduce a dependency should never be automatic one way or the other. It should be a carefully considered choice which balances the costs against the benefits.

Now it just so happens, for JSON, URL, and ZLib, this balance is obviously in favor of having no dependencies. These libraries make sense as "leaf" libraries (terminal nodes in the directed acyclic graph of dependencies). Although it is irrational, there are many who view Boost as "too big" or having "too much legacy code" or whatever. As JSON libraries are in high demand, there's value in ensuring that my library has no dependencies. This equation changes depending on the library of course.

--

My advice on requiring C++17 for standalone versions of libraries is actually not applicable to deflate, since deflate doesn't need std::optional or std::string_view (or their Boost equivalents). A standalone version of deflate could and should require only C++11.

Thanks
On Sat, Mar 7, 2020, 8:40 PM Vinnie Falco via Boost <boost@lists.boost.org> wrote:
On Sat, Mar 7, 2020 at 1:05 PM Edward Diener via Boost <boost@lists.boost.org> wrote:
Please stop with this mantra of library X not depending on any other Boost library and reinventing constructs offered by other Boost libraries ( or elsewhere ). It is not the way most programmers create library software.
I'm not saying that this applies to all libraries. Having every library invent every wheel is clearly not sustainable. But neither is introducing a dependency on Boost just because you want to call boost::exchange or because you want the BOOST_NODISCARD macro.
The decision on whether to introduce a dependency should never be automatic one way or the other. It should be a carefully considered choice which balances the costs against the benefits. Now it just so happens, for JSON, URL, and ZLib, this balance is obviously in favor of having no dependencies. These libraries make sense as "leaf" libraries (terminal nodes in the directed acyclic graph of dependencies).
Although it is irrational, there are many who view Boost as "too big"

Several GB installed is large and not irrational! On the other hand, the use of C++17 with the oldest compiler, barely working, is irrational.

or having "too much legacy code" or whatever. As JSON libraries are in high demand, there's value in ensuring that my library has no dependencies. This equation changes depending on the library of course.
--
My advice on requiring C++17 for standalone versions of libraries is actually not applicable to deflate, since deflate doesn't need std::optional or std::string_view (or their Boost equivalents). A standalone version of deflate could and should require only C++11.
Thanks
On 06/03/2020 19:02, Viktor Sehr via Boost wrote:
I really would appreciate if the library could be used as header only, and contained some sort of abbreviation of these convenience functions:

auto compress(const std::vector<uint8_t>& uncompressed) -> std::vector<uint8_t>;
auto decompress(const std::vector<uint8_t>& compressed) -> std::optional<std::vector<uint8_t>>;
Good thing the library is already header only; and thanks for the suggestion, a closely resembling version of which I will definitely include, because it makes total sense. Janson
On Fri, Mar 6, 2020 at 1:54 PM Janson R. via Boost <boost@lists.boost.org> wrote:
Good thing the library is already header only;
Yeah, note though that the default should be a linkable library (i.e. you have to opt-in to header only by defining a macro, BOOST_DEFLATE_HEADER_ONLY in this case). Thanks
participants (14)
- degski
- Domen Vrankar
- Dominique Devienne
- Edward Diener
- Gavin Lambert
- Janson R.
- Jeff Garland
- Julien Blanc
- Peter Dimov
- Phil Endecott
- Rainer Deyke
- Steven Watanabe
- Viktor Sehr
- Vinnie Falco