Regex allocator support

Hi,
We recently had to use the regular expression library in conditions where there was no physical memory available: reporting resource consumption when our console game runs out of memory. This wasn't possible, however, because basic_regex does some dynamic allocations and doesn't accept an allocator, although match_results, for example, does. So we modified the Boost code a little, adding a new template parameter for the allocator and constructors that take allocators as parameters.
This was quite straightforward to do; the allocators just had to be passed on to all the internal classes it uses. The shared_ptrs had to be changed to use the three-parameter constructor, which takes a deleter and an allocator, to prevent them from allocating.
The mem_block_cache system was a bit more complicated. Initially we disabled it completely, but that obviously wasn't the best solution. Then we realized that we could use a custom allocator as the default, instead of std::allocator. That custom allocator then had to be specialized for mem_block_node, and the function calls to get_mem_block replaced by allocator calls. Our implementation isn't ideal, though: we used mem_block_node as the specialization, and to pass the correct size to the allocator we allocate BOOST_REGEX_BLOCKSIZE / sizeof(mem_block_node) of them. This way it also works for allocators that don't specialize for mem_block_node (but only if BOOST_REGEX_BLOCKSIZE is divisible by sizeof(mem_block_node)). It would probably be better to make a new type that is BOOST_REGEX_BLOCKSIZE bytes big.
We also found another problem with allocation: the match_results::format functions only take string_type as their input, and string_type isn't parameterized on the caller's allocator. This made it impossible to call them with strings using different allocators, so we added overloads of those functions for different allocators and traits. There are probably more functions like that, but since our code doesn't use them there was no need to change anything.
So my question is: what are the chances that these kinds of modifications get implemented in the actual library, and possibly in the next C++ standard library as well? I consider it quite a big design flaw not to have any control over the internal allocations, so for me this is quite a big thing. Obviously we are doing well with the modifications we made here internally, but there are other applications that would benefit too, many embedded applications for example.
Would our code changes be of any use? The changes are only the minimum needed to make it work in our case, and there are many compiler and option permutations not taken care of, so obviously a lot of work would still be needed. Also, no additional unit tests have been written.
Fred Sundvik
Bugbear Entertainment Ltd.
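The shared_ptr change Fred describes can be illustrated with a minimal sketch. It uses std::shared_ptr, whose three-argument constructor shared_ptr(p, d, a) has the same shape as boost::shared_ptr's; tracking_allocator, alloc_stats, alloc_deleter and make_tracked are hypothetical names for illustration, not Boost.Regex internals. The allocator records every call, so you can check that both the object and the shared_ptr control block were obtained through it rather than through the global heap.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical counters shared by all copies of one allocator.
struct alloc_stats {
    std::size_t allocs = 0;
    std::size_t frees = 0;
};

// Hypothetical stateful allocator: forwards to ::operator new/delete but
// records every call, so we can see that nothing bypassed it.
template <class T>
struct tracking_allocator {
    using value_type = T;
    alloc_stats* stats;

    explicit tracking_allocator(alloc_stats* s) noexcept : stats(s) {}
    template <class U>
    tracking_allocator(const tracking_allocator<U>& other) noexcept : stats(other.stats) {}

    T* allocate(std::size_t n) {
        ++stats->allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ++stats->frees;
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const tracking_allocator<T>& a, const tracking_allocator<U>& b) noexcept {
    return a.stats == b.stats;
}
template <class T, class U>
bool operator!=(const tracking_allocator<T>& a, const tracking_allocator<U>& b) noexcept {
    return !(a == b);
}

// Deleter that destroys the object and returns its storage to the allocator.
template <class T>
struct alloc_deleter {
    tracking_allocator<T> alloc;
    void operator()(T* p) const {
        using traits = std::allocator_traits<tracking_allocator<T>>;
        tracking_allocator<T> a = alloc;
        traits::destroy(a, p);
        traits::deallocate(a, p, 1);
    }
};

// The object comes from allocate(); the control block comes from the third
// argument of shared_ptr(p, d, a), so neither touches the global heap directly.
template <class T, class... Args>
std::shared_ptr<T> make_tracked(tracking_allocator<T> a, Args&&... args) {
    using traits = std::allocator_traits<tracking_allocator<T>>;
    T* p = traits::allocate(a, 1);
    try {
        traits::construct(a, p, std::forward<Args>(args)...);
    } catch (...) {
        traits::deallocate(a, p, 1);
        throw;
    }
    return std::shared_ptr<T>(p, alloc_deleter<T>{a}, a);
}
```

Copying the resulting shared_ptr only adjusts the reference count; the allocator is used again only when the last owner releases the object and its control block.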

Fred Sundvik wrote:
Hi,
We recently had to use the regular expression library in conditions where there was no physical memory available: reporting resource consumption when our console game runs out of memory. This wasn't possible, however, because basic_regex does some dynamic allocations and doesn't accept an allocator, although match_results, for example, does. So we modified the Boost code a little, adding a new template parameter for the allocator and constructors that take allocators as parameters.
<snip>
So my question is: what are the chances that these kinds of modifications get implemented in the actual library, and possibly in the next C++ standard library as well? I consider it quite a big design flaw not to have any control over the internal allocations, so for me this is quite a big thing.
+1; I agree with the OP's sentiments, though I wouldn't say it's necessarily a "big" design flaw. However, I've never used the Boost.Regex library, and I don't know if the next standard specifies allocator parameters. I gather it doesn't? <snip>

+1; I agree with the OP's sentiments, though I wouldn't say it's necessarily a "big" design flaw. However, I've never used the Boost.Regex library, and I don't know if the next standard specifies allocator parameters. I gather it doesn't?
Sigh... no.
The original regex++ library from which the Boost version is derived *did* have allocators for everything. But during review, folks felt quite strongly that:
a) it was overkill;
b) regex should be free to manage its own memory: it's not a container, and its needs are much more complex than that, so it should be free to optimize memory allocation and caching as it sees fit.
As a result, this facility was removed. Rightly or wrongly, no one on the standards committee questioned that design decision.
Some specific comments on the other issues:
The mem_block_cache system was a bit more complicated. Initially we disabled it completely, but that obviously wasn't the best solution. Then we realized that we could use a custom allocator as the default, instead of std::allocator. That custom allocator then had to be specialized for mem_block_node, and the function calls to get_mem_block replaced by allocator calls. Our implementation isn't ideal, though: we used mem_block_node as the specialization, and to pass the correct size to the allocator we allocate BOOST_REGEX_BLOCKSIZE / sizeof(mem_block_node) of them. This way it also works for allocators that don't specialize for mem_block_node (but only if BOOST_REGEX_BLOCKSIZE is divisible by sizeof(mem_block_node)). It would probably be better to make a new type that is BOOST_REGEX_BLOCKSIZE bytes big.
Can you not just define BOOST_REGEX_RECURSIVE and use the stack-based implementation, which does away with that altogether?
We also found another problem with allocation: the match_results::format functions only take string_type as their input, and string_type isn't parameterized on the caller's allocator. This made it impossible to call them with strings using different allocators, so we added overloads of those functions for different allocators and traits. There are probably more functions like that, but since our code doesn't use them there was no need to change anything.
The current trunk allows any string, container or character type, plus function objects, to be used as arguments to regex_format/regex_replace etc., which should avoid that problem.
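For comparison, the C++11 std::regex interface addresses the format-string part of this problem: std::match_results::format has overloads taking a basic_string with any traits and allocator, and the string-returning overload yields a string of that same type. The sketch below uses std::regex rather than Boost.Regex; tagged_allocator, game_string and format_user_host are hypothetical names, and tagged_allocator merely forwards to operator new as a stand-in for a game's own allocator. Note that the std::smatch object itself still uses std::allocator here.

```cpp
#include <cstddef>
#include <new>
#include <regex>
#include <string>

// Hypothetical stand-in for a game's own allocator (just forwards to operator new).
template <class T>
struct tagged_allocator {
    using value_type = T;
    tagged_allocator() = default;
    template <class U>
    tagged_allocator(const tagged_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const tagged_allocator<T>&, const tagged_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const tagged_allocator<T>&, const tagged_allocator<U>&) noexcept { return false; }

// A string type whose memory comes from the custom allocator.
using game_string = std::basic_string<char, std::char_traits<char>, tagged_allocator<char>>;

// Format the first "user@host" match in `input` into a game_string.
game_string format_user_host(const std::string& input) {
    static const std::regex re("(\\w+)@(\\w+)");
    std::smatch m;
    if (!std::regex_search(input, m, re)) {
        return game_string();
    }
    const game_string fmt("user=$1 host=$2");
    // Selects the template<class ST, class SA> overload of format(), which
    // returns basic_string<char, ST, SA>, i.e. a game_string.
    return m.format(fmt);
}
```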
We recently had to use the regular expression library in conditions where there was no available physical memory, for reporting resource consumption when our console game runs out of memory.
I'm not sure I understand the issue here: is it that operator new doesn't report out-of-memory conditions? What does your allocator do that's different from operator new, and why not replace global new and delete with calls to your custom allocator? Just trying to get a handle on the issue...
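The replacement John suggests is standard C++: a program may define its own global operator new and operator delete, and plain new-expressions throughout the program (unless a class provides its own operator new) then go through them. The minimal sketch below forwards to malloc/free and keeps a count of live blocks; g_live_blocks is a hypothetical tracking counter, not part of any library. As Fred explains in his reply, his team already overloads the global operators for tracking; the limitation is that there is only one such hook per program, so it cannot route one library's allocations to a dedicated pool.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical tracking counter: number of blocks currently live on the heap.
// (Not thread-safe; a real version would use std::atomic.)
static std::size_t g_live_blocks = 0;

// Replacement global allocation function.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;            // new must return a unique, non-null pointer
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();     // a game could dump statistics here instead
    ++g_live_blocks;
    return p;
}

// Replacement global deallocation function.
void operator delete(void* p) noexcept {
    if (!p) return;
    --g_live_blocks;
    std::free(p);
}

// C++14 sized deallocation: forward to the unsized version.
void operator delete(void* p, std::size_t) noexcept {
    ::operator delete(p);
}
```

The array forms (operator new[] / operator delete[]) default to calling these, so they are covered too.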
So my question is: what are the chances that these kinds of modifications get implemented in the actual library, and possibly in the next C++ standard library as well?
As noted above, I was asked to remove this feature during review, so I'm not *that* keen to put it back in! As far as the standard is concerned, it's basically a done deal at this stage in the process: too late for such a big change. WRT Boost, I'd really like to get a better handle on the issue before making a judgment; you seem to be one of very few people actually using custom allocators ;-)
Regards, John.

"John Maddock" <john@johnmaddock.co.uk> wrote in message news:57BB0368898A489496F667E90512F826@acerlaptop...
The mem_block_cache system was a bit more complicated. Initially we disabled it completely, but that obviously wasn't the best solution. Then we realized that we could use a custom allocator as the default, instead of std::allocator. That custom allocator then had to be specialized for mem_block_node, and the function calls to get_mem_block replaced by allocator calls. Our implementation isn't ideal, though: we used mem_block_node as the specialization, and to pass the correct size to the allocator we allocate BOOST_REGEX_BLOCKSIZE / sizeof(mem_block_node) of them. This way it also works for allocators that don't specialize for mem_block_node (but only if BOOST_REGEX_BLOCKSIZE is divisible by sizeof(mem_block_node)). It would probably be better to make a new type that is BOOST_REGEX_BLOCKSIZE bytes big.
Can you not just define BOOST_REGEX_RECURSIVE and use the stack-based implementation, which does away with that altogether?
Yes, I guess we could have; however, it wouldn't have solved our other problems. Also, we use the regex library in other parts of the game where the requirements aren't as strict. So we ended up with this solution.
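The sizing problem with the mem_block_node workaround, and the dedicated block type Fred suggests instead, can be sketched as follows. Here block_size stands in for BOOST_REGEX_BLOCKSIZE (the value 4096 is an assumption), and raw_block, shortfall, get_block and put_block are hypothetical names, not Boost.Regex code. Requesting block_size / node_size nodes comes up short whenever the division leaves a remainder, whereas rebinding the user's allocator to a type that is exactly one block wide always requests the full block with a single allocate(1) call.

```cpp
#include <cstddef>
#include <memory>

// Stand-in for BOOST_REGEX_BLOCKSIZE (the value here is an assumption).
constexpr std::size_t block_size = 4096;

// Bytes lost when a block is requested as (block / node_size) nodes:
// integer division drops the remainder.
constexpr std::size_t shortfall(std::size_t block, std::size_t node_size) {
    return block - (block / node_size) * node_size;
}

// A type that is exactly one block, aligned for any scalar type.
struct raw_block {
    alignas(std::max_align_t) unsigned char bytes[block_size];
};
static_assert(sizeof(raw_block) == block_size, "raw_block must be exactly one block");

// Obtain one whole block through whatever allocator the user supplied,
// by rebinding it to raw_block.
template <class Alloc>
void* get_block(const Alloc& a) {
    using block_alloc = typename std::allocator_traits<Alloc>::template rebind_alloc<raw_block>;
    block_alloc ba(a);
    return std::allocator_traits<block_alloc>::allocate(ba, 1);
}

// Return a block obtained from get_block() with an equal allocator.
template <class Alloc>
void put_block(const Alloc& a, void* p) {
    using block_alloc = typename std::allocator_traits<Alloc>::template rebind_alloc<raw_block>;
    block_alloc ba(a);
    std::allocator_traits<block_alloc>::deallocate(ba, static_cast<raw_block*>(p), 1);
}
```

For example, with 24-byte nodes, 4096 / 24 = 170 nodes cover only 4080 bytes, 16 short of a full block; with 8-byte nodes the division is exact.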
We also found another problem with allocation: the match_results::format functions only take string_type as their input, and string_type isn't parameterized on the caller's allocator. This made it impossible to call them with strings using different allocators, so we added overloads of those functions for different allocators and traits. There are probably more functions like that, but since our code doesn't use them there was no need to change anything.
The current trunk allows any string, container or character type, plus function objects, to be used as arguments to regex_format/regex_replace etc., which should avoid that problem.
That's great to hear.
We recently had to use the regular expression library in conditions where there was no available physical memory, for reporting resource consumption when our console game runs out of memory.
I'm not sure I understand the issue here: is it that operator new doesn't report out-of-memory conditions? What does your allocator do that's different from operator new, and why not replace global new and delete with calls to your custom allocator? Just trying to get a handle on the issue...
I'll try to explain the situation a bit better. Ours is a console game, which has a physical limit on the amount of memory it can use. We have overloaded the global new operators for memory tracking and other things. During development it's quite common for the artists to make too many or too big textures, and then we run out of memory. At that point we want to dump statistics on all loaded textures to a file for them to analyze, and Boost.Regex is used while generating that report. Because the game is already out of memory at this point, we obviously can't allocate even a single byte during the generation. Our custom allocator reserves some memory on the stack, which hopefully is available. Although that isn't guaranteed in all cases, it's good enough since this is strictly a debug feature. The allocator then hands out memory from this stack pool.
This is of course an extreme case, but we game developers like to track and handle all memory allocations, in normal cases too. Here is one concrete example. During level loading, regular expressions can be used, for example, to generate the data to be loaded. These regular expressions are used strictly during loading, and are therefore temporary. Due to fragmentation issues it's not good to mix allocations with different lifetimes, so all the temporary allocations they make should go to their own memory pool. When the parsing stage of loading is done, these pools should be freed to make room for the more important data, which becomes an issue if you let the library handle all memory allocations internally. It wouldn't be a big problem if it were done the way most third-party libraries used in game development do it: they expose some allocation callback or configuration.
Boost, however, doesn't provide even this, so we have to trust that it does proper pooling internally and doesn't trash our memory. We also have to hope that it doesn't leave too much memory behind, as the block_cache system potentially could, at least on embedded systems where memory is really tight. I know that the amount can be configured, but that would make it slower at runtime. Additionally, if there's just a global memory pool for the whole library, it's impossible to use the library in different ways for different tasks, which makes allocators superior to an internal global allocation scheme. Note that allocators don't stop the library from doing clever memory pooling internally; it would then just ask the allocator for bigger chunks of memory instead of making one allocation per element.
I don't like the standard C++ allocators that much, especially not "All instances of a given allocator type are required to be interchangeable and always compare equal to each other. (20.1.5)" (which is why our implementation doesn't assume that). So for me the customization wouldn't necessarily need to be a new allocator template parameter. It could just as well be a constructor taking a pointer to some regex_memory object that the application is free to override. The critical thing is that all memory allocations go through it; not a single allocation should go outside.
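The regex_memory idea can be sketched as an abstract interface passed by pointer, with one implementation that hands out memory from a caller-supplied (for example, stack) buffer and throws instead of falling back to the heap. Everything here is hypothetical: regex_memory, stack_regex_memory and token_list are illustrative names, and token_list merely stands in for a library object that routes all of its allocations through the interface. (C++17 later standardised a similar interface, std::pmr::memory_resource.)

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical interface an application would override.
class regex_memory {
public:
    virtual ~regex_memory() = default;
    virtual void* allocate(std::size_t bytes, std::size_t align) = 0;
    virtual void deallocate(void* p, std::size_t bytes) noexcept = 0;
};

// Bump allocator over a caller-owned buffer (e.g. reserved on the stack).
// Individual deallocations are no-ops; reset() releases everything at once.
// When the buffer is exhausted it throws std::bad_alloc instead of using the heap.
class stack_regex_memory : public regex_memory {
public:
    stack_regex_memory(void* buffer, std::size_t size) noexcept
        : base_(static_cast<unsigned char*>(buffer)), size_(size), used_(0) {}

    void* allocate(std::size_t bytes, std::size_t align) override {
        void* p = base_ + used_;
        std::size_t space = size_ - used_;
        if (!std::align(align, bytes, p, space)) throw std::bad_alloc();
        used_ = static_cast<std::size_t>(static_cast<unsigned char*>(p) - base_) + bytes;
        return p;
    }
    void deallocate(void*, std::size_t) noexcept override {}
    void reset() noexcept { used_ = 0; }
    std::size_t used() const noexcept { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};

// Stand-in for a library object that takes a regex_memory* in its constructor
// and never calls new/malloc itself.
class token_list {
public:
    explicit token_list(regex_memory* mem) : mem_(mem) {}
    ~token_list() { if (data_) mem_->deallocate(data_, cap_ * sizeof(int)); }
    token_list(const token_list&) = delete;
    token_list& operator=(const token_list&) = delete;

    void push_back(int v) {
        if (size_ == cap_) grow();
        ::new (data_ + size_) int(v);
        ++size_;
    }
    int operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    void grow() {
        std::size_t new_cap = cap_ ? cap_ * 2 : 4;
        int* fresh = static_cast<int*>(mem_->allocate(new_cap * sizeof(int), alignof(int)));
        for (std::size_t i = 0; i < size_; ++i) ::new (fresh + i) int(data_[i]);
        if (data_) mem_->deallocate(data_, cap_ * sizeof(int));
        data_ = fresh;
        cap_ = new_cap;
    }
    regex_memory* mem_;
    int* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t cap_ = 0;
};
```

Because the memory source is a runtime pointer rather than a template parameter, the same library type could use a stack pool for the out-of-memory report and a level-loading pool elsewhere, and two memory objects of the same type need not be interchangeable, avoiding the requirement Fred quotes from 20.1.5.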
So my question is: what are the chances that these kinds of modifications get implemented in the actual library, and possibly in the next C++ standard library as well?
As noted above, I was asked to remove this feature during review, so I'm not *that* keen to put it back in! As far as the standard is concerned, it's basically a done deal at this stage in the process: too late for such a big change. WRT Boost, I'd really like to get a better handle on the issue before making a judgment; you seem to be one of very few people actually using custom allocators ;-)
Regards, John.
I hope my explanations clear things up a bit. I know that I speak for just a few people, but things like these are exactly what makes game developers in particular stay away from Boost, so I hope you will at least look into this further.
Fred

I hope my explanations clear things up a bit. I know that I speak for just a few people, but things like these are exactly what makes game developers in particular stay away from Boost, so I hope you will at least look into this further.
Thanks for the very lucid explanation: that helps a lot, looks like you're in a bind :-( Adding allocator support to regex in a way that can be tested and proved to prevent *any* other allocations would be hard to do and maintain I suspect... so I'd like to hear whether this is a more general issue for folks before jumping in. Cheers, John.

PS: can you file a feature request for this at svn.boost.org, along with your explanation? That way it won't get lost.
Cheers, John.

John Maddock <john <at> johnmaddock.co.uk> writes:
I hope my explanations clear things up a bit. I know that I speak for just a few people, but things like these are exactly what makes game developers in particular stay away from Boost, so I hope you will at least look into this further.
Thanks for the very lucid explanation: that helps a lot, looks like you're in a bind
Adding allocator support to regex in a way that can be tested and proved to prevent *any* other allocations would be hard to do and maintain I suspect... so I'd like to hear whether this is a more general issue for folks before jumping in.
Since I just independently created another thread on the need for regex allocators (topic: "interprocess allocators") and now see that this thread is also active the same week, I suspect this issue arises not infrequently. For the reasons given in the "interprocess allocators" thread, the justification for including allocators in regexes is IMO every bit as strong as for strings and containers.

<snip>
I hope my explanations clear things up a bit. I know that I speak for just a few people, but things like these are exactly what makes game developers in particular stay away from Boost, so I hope you will at least look into this further.
I don't think you speak for just a "few" people - I think this problem is encountered in a lot of development environments. I work on carrier-grade VOIP servers and we use allocators for almost everything. And it is *SO* frustrating to encounter a library that you really like but can't use because of no or sub-optimal allocator support.... Sigh... Andy.
Participants (5): Andrew Venikov, Fred Sundvik, Jeffrey Hellrung, John Maddock, Mike Spertus