A solution to return dynamic strings without heap allocation. Any interest?
Hi all,

I haven't been successful at attracting interest in a formatting library [1] I've been working on lately. But recently I realized that part of it could be isolated as a small standalone library that could solve an old and common troublesome situation in C++:

Suppose you need to create a function that returns/provides a string whose content and size are unknown at compilation time. The first approach is to make it return a `std::string`. But if it needs to be usable in environments like a bare-metal real-time system, then one usually makes it take a raw string as an output argument, more or less like this:

    struct result {
        char* it;
        bool truncated;
    };
    result get_message(char* dest, std::size_t dest_len);

But this is clearly not a perfect solution, since there's nothing really effective the caller can do when `get_message` fails because the destination string is too small.

So I present the `outbuf` abstract class. It somewhat resembles `std::streambuf`, but with a simpler and lower-level design, which is the result of many attempts looking for the best performance [2] and usability in my formatting library. Afaics, it does not require a hosted C++ implementation, though I would like someone else to confirm that.

Now the caller of `get_message` has to choose or create a suitable class type deriving from `outbuf`, which dictates where the message is written to. For example, if the user wants to get a `std::string`, then `string_maker` will do the job:

    #include <boost/outbuf/string.hpp>
    // ...
    boost::outbuf::string_maker<false> msg;
    get_message(msg);
    std::string str = msg.finish();

Or, if one wants it to write into a raw string, then use `cstr_writer`:

    char buff[buff_size];
    boost::outbuf::cstr_writer csw(buff, buff_size);
    get_message(csw);
    auto result = csw.finish();
    if (result.truncated) {
        // ...
    }

Those `finish` functions above do not belong to `outbuf`. They are defined in the concrete derived types only. It's solely by convention that they share the same name.
Yes, using `string_maker` still leads to heap allocation, and `cstr_writer` to string truncation. These alone don't solve the problem. However, a string object is never the final destination. So the user could instead use another class that writes the message directly into the final destination (an output console, a log file, an LCD display, or whatever). It is not difficult to implement concrete subtypes of `outbuf`.

`outbuf` is actually a type alias:

    template <bool NoExcept, typename CharT>
    class basic_outbuf;

    using outbuf = basic_outbuf<false, char>;
    using outbuf_noexcept = basic_outbuf<true, char>;

That `NoExcept` template parameter is perhaps the controversial part. It is not present in my formatting library originally. Besides the destructor, `basic_outbuf` has only one virtual function: `recycle()`, and it is declared as `noexcept(NoExcept)`. This is the only effect the `NoExcept` template parameter has; all other functions are guaranteed not to throw. Hence, by taking an `outbuf_noexcept&` parameter, a function states that the destination must not throw. That might be particularly good if such an object comes from another module and we must avoid exceptions crossing module boundaries. On the other hand, if a function takes an `outbuf&`, then it also accepts an `outbuf_noexcept&`, because `basic_outbuf<true, CharT>` derives from `basic_outbuf<false, CharT>`.

When using `string_maker` you can choose between the two kinds. `string_maker<true>` derives from `basic_outbuf<true, char>`. So if any exception is raised by its internal `std::string`, it is caught by a try/catch(...) block, stored as an `exception_ptr` and rethrown by `finish()`. This has the undesirable effect of delaying its proper handling (after all, we'd rather stop what's being done as soon as possible when an error appears). So I think the recommendation would be to use `string_maker<false>` if possible, and `string_maker<true>` only if necessary.
The reason why I think it's controversial is that it makes `recycle()` violate the Lakos Rule. And although the Lakos Rule is not a requirement in Boost (afaik), I was wondering whether this library could interest LEWG in the future as well. Anyway, this whole noexcept idea is not a central part of this library and it can be removed.

Now, the other topic is how to implement that `get_message` function, i.e., how to write into an `outbuf` object. One can use the `puts` and `putc` functions to insert strings and characters. One can also use `fmtlib` through an output iterator adapter. Or one can write directly into the buffer. But I will ask you to read the doc [3] for that. It's a quick read, quicker than this email.

The repository is: https://github.com/robhz786/outbuf

I would prefer it to be part of Boost.Core instead of being a standalone library, and also to remove the `outbuf` namespace. But that is up to you.

Best regards,
Roberto

[1] The Stringify library: https://github.com/robhz786/stringify
[2] The great performance of Stringify is mainly thanks to the design of outbuf (named there as `output_buffer`): https://robhz786.github.io/stringify/doc/html/benchmarks/benchmarks.html#ben...
[3] "Writing into and outbuf object" https://robhz786.github.io/outbuf/doc/html/index.html#boost_outbuf.overview....
On Thu, Aug 29, 2019 at 9:24 AM Roberto Hinz via Boost <boost@lists.boost.org> wrote:
I would prefer it to be part of Boost.Core instead of being a standalone library, and also to remove the `outbuf` namespace. But that is up to you.
Boost.Core is for facilities used by other Boost libraries for simpler tasks. You could propose it for Boost.Utility. However, it seems more worthy of being its own library. In either case (Utility, or your own library) the process for a Boost formal review is at: https://www.boost.org/community/reviews.html

Glen
Didn’t you see the reply from Glen Fernandes? Check the archive.
--
Janek Kozicki, PhD. DSc. Arch. Assoc. Prof.
Gdańsk University of Technology
Faculty of Applied Physics and Mathematics
Department of Theoretical Physics and Quantum Information
--
pg.edu.pl/jkozicki (click English flag on top right)

On 3 Sep 2019, 14:21 +0200, Roberto Hinz via Boost <boost@lists.boost.org>, wrote:
No one interested?
On Tue, Sep 3, 2019 at 11:11 AM Janek Kozicki <janek.listy.mailowe@gmail.com> wrote:
Didn’t you see reply from Glen Fernandez? Check archive.
Well, yes, but I didn't see his message as showing interest, just as guidance. And I presume I first need to check whether there is any interest; otherwise, what's the point of going any further? Am I misunderstanding something?
On Tue, Sep 3, 2019 at 5:21 AM Roberto Hinz via Boost <boost@lists.boost.org> wrote:
No one interested?
`std::basic_ostream` is actually quite usable once you figure out how it works (which is admittedly more difficult than it should be). It can be set up to not perform any memory allocations, depending on the implementation of the derived class. It might not be perfect but it is part of the standard library and thus has a natural advantage that would require extraordinary functionality from an external component to overcome. And I'm not seeing that in the proposed `outbuf`. Regards
On Tue, Sep 10, 2019 at 11:05 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:
`std::basic_ostream` is actually quite usable once you figure out how it works
Let me illustrate why I disagree with that. Suppose you want to implement a base64 encoder. You want it to be fast, agnostic, and simple to use. Now suppose you adopt `std::ostream` as the destination type:

    void to_base64( std::ostream& dest, const std::byte* src, std::size_t count );

You will face two issues:

1) It doesn't matter how well you (as the library author) understand `basic_ostream`. The *user* needs to implement classes derived from `basic_ostream` to customize the destination types.

2) It's impossible to achieve decent performance. If you used `outbuf` you could write directly into the buffer. But with `std::ostream` you have to call member functions like `put` or `write` for each little piece of the content, or use an additional intermediate buffer.

And this is far from being a specific use case. The same issues apply to any kind of encoding, binary or text.
On Thu, Sep 12, 2019 at 10:21 AM Roberto Hinz <robhz786@gmail.com> wrote:
Let me illustrate why I disagree with that...
Okay, these are two fair points, but then the title of the original post is not accurate. What you're really proposing is "a better std::ostream" which is an entirely different conversation. Regards
On Fri, Sep 13, 2019 at 9:46 AM Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Thu, Sep 12, 2019 at 10:21 AM Roberto Hinz <robhz786@gmail.com> wrote:
Let me illustrate why I disagree with that...
Okay, these are two fair points, but then the title of the original post is not accurate. What you're really proposing is "a better std::ostream" which is an entirely different conversation.
Regards
You are right. Thanks for the comments
On Fri, Sep 13, 2019 at 6:50 AM Roberto Hinz <robhz786@gmail.com> wrote:
You are right. Thanks for the comments
Please don't take any of these comments as discouragement. Quite the opposite: they are well intended, with a goal of progress in mind. There has been a chorus of voices clamoring for "a better std::ostream", and it has been the subject of a few papers. Offering users better versions of or replacements for standard library types such as std::ostream lands squarely within the purview of the Boost libraries. There is already precedent for this, such as boost::system::error_category and boost::shared_ptr, both of which are superior to their standard library equivalents. However, any proposed replacement needs to address the body of work that has already been done in this area. What makes it better or more usable?

Thanks
On Fri, Sep 13, 2019 at 11:42 AM Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Fri, Sep 13, 2019 at 6:50 AM Roberto Hinz <robhz786@gmail.com> wrote:
You are right. Thanks for the comments
Please don't take any of these comments as discouragement. Quite the opposite: they are well intended, with a goal of progress in mind. There has been a chorus of voices clamoring for "a better std::ostream", and it has been the subject of a few papers. Offering users better versions of or replacements for standard library types such as std::ostream lands squarely within the purview of the Boost libraries. There is already precedent for this, such as boost::system::error_category and boost::shared_ptr, both of which are superior to their standard library equivalents.
However, any proposed replacement needs to address the body of work that has already been done in this area. What makes it better or more usable?
Thanks
Not discouraged at all. I will enhance the documentation based on your feedback and come back later. Thank you
Hi all, this is a continuation of the thread "A solution to return dynamic strings without heap allocation. Any interest?" Just telling you that I rewrote the docs, especially the rationale: https://robhz786.github.io/outbuf/doc/outbuf.html Best regards Robhz
Hi Roberto,
On 3. Oct 2019, at 14:22, Roberto Hinz via Boost <boost@lists.boost.org> wrote:
Hi all, this is a continuation of the thread "A solution to return dynamic strings without heap allocation. Any interest?" Just telling you that I rewrote the docs, especially the rationale: https://robhz786.github.io/outbuf/doc/outbuf.html Best regards Robhz
Quoted from the rationale:

"Your function is complex to use. The user needs to implement a class that derives from ostream to customize the destination. It's a complex task for most C++ programmers."

Agreed, although Boost.Iostreams makes that easier.

"It's impossible to achieve good performance. std::ostream does not provide direct access to the buffer. to_base64 needs to call member functions like write or put for every little piece of the content, or to use an intermediate buffer."

It is not impossible to achieve good performance. Page 68 of http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf lists problems, which are solvable. In practice, increasing the buffer size helps, as does turning off synchronisation with stdio: https://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-b... The SO answer lists several examples where C++'s iostreams beat C's stdio in performance.

Your argument is also not convincing. Just calling member functions doesn't make something slow if you compile with optimisations, which is a must with C++.

I think it is quite natural that the stream makes it hard for you to touch the buffer. The stream objects hide buffer management under an interface. The ostream object handles the buffer for you; you don't have to know when you hit the boundary and things need to be flushed to the device. You can't hide something and expose it at the same time; that would break the encapsulation. So naturally, the streams make it difficult to touch the buffer directly. Although you can, if you really want to, and it is pretty simple to set up:

    char Buffer[N];
    std::ofstream file;
    file.rdbuf()->pubsetbuf(Buffer, N); // set the buffer before opening;
                                        // calling pubsetbuf after I/O has
                                        // begun is implementation-defined
    file.open("file.txt");

Now you can mess around with the stack-allocated buffer. It is not clear to me what the advantage of outbuf is over this.

I think the real problem with iostreams is that it lacks good documentation and tutorials on how to do the more complicated things.

Best regards,
Hans
Hi Hans, On Mon, Oct 7, 2019 at 5:14 AM Hans Dembinski <hans.dembinski@gmail.com> wrote:
"It's impossible to achieve good performance. std::ostream does not provide direct access to the buffer. to_base64 needs to call member functions like write or put for every little piece of the content, or to use an intermediate buffer."
It is not impossible to achieve good performance. Page 68 of http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf lists problems, which are solvable.
In practice, increasing the buffer size helps and turning off synchronisation with stdio:
https://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-b... The SO answer lists several examples where C++'s iostreams beat C's stdio in performance.
Your argument is also not convincing. Just calling member functions doesn't make something slow if you compile with optimisations, which is a must with C++. (...)
Thanks for the feedback. I removed that part from the docs.

I did some benchmarks. First I implemented a base64 encoder using outbuf and std::streambuf, and I couldn't find any conclusive evidence that either one is faster than the other (I get different results from seemingly irrelevant code changes). Then I implemented a simple json writer. In this case the streambuf version was about 30% slower than the outbuf version. Not a tremendous difference.

I chose to write directly into streambuf instead of ostream so that we can disregard many of the possible QoI issues related to std::ostream. The article you reference seems to only address optimizations of facets usage and formatting, which I think should not have any effect on these benchmarks. That SO discussion seems not to apply either, since the streambuf I used does not write into a file but solely into a char array.

The benchmark implementations are available at https://github.com/robhz786/outbuf/tree/master/performance

Anyway, it's clear now that my statement was a bit reckless.

Best regards
participants (5)
- Glen Fernandes
- Hans Dembinski
- Janek Kozicki
- Roberto Hinz
- Vinnie Falco