[Boost String Algorithms library] ireplace_all bug with character arrays
Hi,
I've encountered a bug with ireplace_all from the Boost String Algorithms library for version 1.35. If I pass a character array for the Format argument, the function ignores the null terminator and instead uses the entire array minus the last character to replace the occurrences of the Search string in the Input parameter. However, if I provide a pointer to a character array, the string replacement is done as expected.
When I stepped into the code, I came across the following lines (excerpt from /trunk/boost/range/as_literal.hpp):
template< class Char, std::size_t sz >
inline iterator_range
Richard Turrin wrote:
Hi,
I've encountered a bug with ireplace_all from the Boost String Algorithms library for version 1.35. If I pass a character array for the Format argument, the function ignores the null terminator and instead uses the entire array minus the last character to replace the occurrences of the Search string in the Input parameter. However, if I provide a pointer to a character array, the string replacement is done as expected.
When I stepped into the code, I came across the following lines (excerpt from /trunk/boost/range/as_literal.hpp):
template< class Char, std::size_t sz > inline iterator_range
as_literal( const Char (&arr)[sz] ) { return boost::make_iterator_range( arr, arr + sz - 1 ); } There is no check for a null terminator. Nor did I find one after this function returned. I think the author here assumed that a caller would always pass in a character array of the right size.
Character array seems to have three views. 1. array: SIZE("ab\0c") == 5; 2. literal: SIZE("ab\0c") == 4; 3. null-terminated: SIZE("ab\0c") == 2; Both 2. and 3. is forced into boost::as_array. "One name, two facilities" usually makes trouble. IMO, `as_c_str` should have been added for 3. Regards, -- Shunsuke Sogame
shunsuke skrev:
Richard Turrin wrote:
Hi,
I've encountered a bug with ireplace_all from the Boost String Algorithms library for version 1.35. If I pass a character array for the Format argument, the function ignores the null terminator and instead uses the entire array minus the last character to replace the occurrences of the Search string in the Input parameter. However, if I provide a pointer to a character array, the string replacement is done as expected.
When I stepped into the code, I came across the following lines (excerpt from /trunk/boost/range/as_literal.hpp):
template< class Char, std::size_t sz > inline iterator_range
as_literal( const Char (&arr)[sz] ) { return boost::make_iterator_range( arr, arr + sz - 1 ); } There is no check for a null terminator. Nor did I find one after this function returned. I think the author here assumed that a caller would always pass in a character array of the right size.
Character array seems to have three views.
1. array: SIZE("ab\0c") == 5; 2. literal: SIZE("ab\0c") == 4; 3. null-terminated: SIZE("ab\0c") == 2;
Both 2. and 3. is forced into boost::as_array. "One name, two facilities" usually makes trouble. IMO, `as_c_str` should have been added for 3.
I agree we could add a new function. Can we come up with a better name than as_c_str()? Or should we simply make as_literal() scan the array? -Thorsten
Hi, Thorsten Ottosen wrote:
shunsuke skrev:
Richard Turrin wrote:
Hi,
I've encountered a bug with ireplace_all from the Boost String Algorithms library for version 1.35. If I pass a character array for the Format argument, the function ignores the null terminator and instead uses the entire array minus the last character to replace the occurrences of the Search string in the Input parameter. However, if I provide a pointer to a character array, the string replacement is done as expected.
When I stepped into the code, I came across the following lines (excerpt from /trunk/boost/range/as_literal.hpp):
template< class Char, std::size_t sz > inline iterator_range
as_literal( const Char (&arr)[sz] ) { return boost::make_iterator_range( arr, arr + sz - 1 ); } There is no check for a null terminator. Nor did I find one after this function returned. I think the author here assumed that a caller would always pass in a character array of the right size.
Character array seems to have three views.
1. array: SIZE("ab\0c") == 5; 2. literal: SIZE("ab\0c") == 4; 3. null-terminated: SIZE("ab\0c") == 2;
Both 2. and 3. is forced into boost::as_array. "One name, two facilities" usually makes trouble. IMO, `as_c_str` should have been added for 3.
I agree we could add a new function. Can we come up with a better name than as_c_str()?
Or should we simply make as_literal() scan the array?
I think, that as_literal should behave the same way for character arrays and for c-string. Different behaviour is unexpexted here. The only difference that could be beneficial is that for arrays, the scanning for nulls can be bounded to protect agains the buffer overruns. Regards, Pavol.
Pavol Droba skrev:
Hi,
Thorsten Ottosen wrote:
I agree we could add a new function. Can we come up with a better name than as_c_str()?
Or should we simply make as_literal() scan the array?
I think, that as_literal should behave the same way for character arrays and for c-string. Different behaviour is unexpexted here.
The only difference that could be beneficial is that for arrays, the scanning for nulls can be bounded to protect agains the buffer overruns.
By an assertion, I pressume. Would you commit a fix to trunk then? -Thorsten
Hi, Thorsten Ottosen wrote:
Pavol Droba skrev:
Hi,
Thorsten Ottosen wrote:
I agree we could add a new function. Can we come up with a better name than as_c_str()?
Or should we simply make as_literal() scan the array?
I think, that as_literal should behave the same way for character arrays and for c-string. Different behaviour is unexpexted here.
The only difference that could be beneficial is that for arrays, the scanning for nulls can be bounded to protect agains the buffer overruns.
By an assertion, I pressume.
Not necessary. Why couldn't we take the whole array as a string when no null character is present? I don't see any problem with this approach.
Would you commit a fix to trunk then?
I can have a look at it. Regards, Pavol.
Hi, shunsuke wrote:
Pavol Droba wrote:
Not necessary. Why couldn't we take the whole array as a string when no null character is present?
It means as_literal will not have constant complexity?
as_literal doesn't have contant complexity even now, since it works with c-strings. It has constant complexity only in case of arrays and that's only because the implementation is wrong. Regards, Pavol.
Pavol Droba wrote:
Hi,
shunsuke wrote:
Pavol Droba wrote:
Not necessary. Why couldn't we take the whole array as a string when no null character is present? It means as_literal will not have constant complexity?
as_literal doesn't have contant complexity even now, since it works with c-strings. It has constant complexity only in case of arrays and that's only because the implementation is wrong.
If so, as_literal should have been named as "as_c_str", IMO. Regards, -- Shunsuke Sogame
shunsuke wrote:
Pavol Droba wrote:
Hi,
shunsuke wrote:
Pavol Droba wrote:
Not necessary. Why couldn't we take the whole array as a string when no null character is present? It means as_literal will not have constant complexity?
as_literal doesn't have contant complexity even now, since it works with c-strings. It has constant complexity only in case of arrays and that's only because the implementation is wrong.
If so, as_literal should have been named as "as_c_str", IMO.
Since as_literal is used quite some time already, I don't see any compelling reason to rename it. Regards, Pavol.
Pavol Droba wrote:
shunsuke wrote:
Pavol Droba wrote:
Hi,
shunsuke wrote:
Pavol Droba wrote:
Not necessary. Why couldn't we take the whole array as a string when no null character is present? It means as_literal will not have constant complexity?
as_literal doesn't have contant complexity even now, since it works with c-strings. It has constant complexity only in case of arrays and that's only because the implementation is wrong. If so, as_literal should have been named as "as_c_str", IMO.
Since as_literal is used quite some time already, I don't see any compelling reason to rename it.
I agree as_literal can't be renamed. Now we have to look for a new name? `SIZE(as_???("ab\0c")) == 4` as_??? returns in constant time. I don't find the name yet. :-) Regards, -- Shunsuke Sogame
shunsuke wrote:
Pavol Droba wrote:
shunsuke wrote:
Pavol Droba wrote:
Hi,
shunsuke wrote:
Pavol Droba wrote:
Not necessary. Why couldn't we take the whole array as a string when no null character is present? It means as_literal will not have constant complexity?
as_literal doesn't have contant complexity even now, since it works with c-strings. It has constant complexity only in case of arrays and that's only because the implementation is wrong. If so, as_literal should have been named as "as_c_str", IMO.
Since as_literal is used quite some time already, I don't see any compelling reason to rename it.
I agree as_literal can't be renamed. Now we have to look for a new name? `SIZE(as_???("ab\0c")) == 4`
as_??? returns in constant time. I don't find the name yet. :-)
I'm not sure if this case is important enough to get special as_XXX threatment. IMHO if you need nulls inside a string, you should use std::string or just a plain array. literal with null is dangerous at least. Regards, Pavol.
Pavol Droba wrote:
Since as_literal is used quite some time already, I don't see any compelling reason to rename it. I agree as_literal can't be renamed. Now we have to look for a new name? `SIZE(as_???("ab\0c")) == 4`
as_??? returns in constant time. I don't find the name yet. :-)
I'm not sure if this case is important enough to get special as_XXX threatment.
IMHO if you need nulls inside a string, you should use std::string or just a plain array. literal with null is dangerous at least.
\0 is not so important in this case.
iterator_range
shunsuke wrote:
Pavol Droba wrote:
Since as_literal is used quite some time already, I don't see any compelling reason to rename it. I agree as_literal can't be renamed. Now we have to look for a new name? `SIZE(as_???("ab\0c")) == 4`
as_??? returns in constant time. I don't find the name yet. :-)
I'm not sure if this case is important enough to get special as_XXX threatment.
IMHO if you need nulls inside a string, you should use std::string or just a plain array. literal with null is dangerous at least.
\0 is not so important in this case. iterator_range
r = as_literal("million characters which contains no null..."); incurs significant overhead even if you know the string contains no null.
Well, if the performance is the problem, you can always construct the range manually. In this case it could be done in compile time. This is a special case and I don't think, that it needs to be handled on the generic library level. But at the end, as_literal is a part of range library, so it is up to Thorsten to decide. Pavol.
Pavol Droba skrev:
shunsuke wrote:
Pavol Droba wrote:
Since as_literal is used quite some time already, I don't see any compelling reason to rename it. I agree as_literal can't be renamed. Now we have to look for a new name? `SIZE(as_???("ab\0c")) == 4`
as_??? returns in constant time. I don't find the name yet. :-)
I'm not sure if this case is important enough to get special as_XXX threatment.
IMHO if you need nulls inside a string, you should use std::string or just a plain array. literal with null is dangerous at least. \0 is not so important in this case. iterator_range
r = as_literal("million characters which contains no null..."); incurs significant overhead even if you know the string contains no null. Well, if the performance is the problem, you can always construct the range manually. In this case it could be done in compile time.
This is a special case and I don't think, that it needs to be handled on the generic library level.
But at the end, as_literal is a part of range library, so it is up to Thorsten to decide.
I don't have good opinion on this, since I haven't used embedded null's very frequently (never actually). I think I'm leaning towards the "just make it work" camp, even if it produces non-optimal code. The question remains then if we should add another function to get the current behavior. I don't really care yet. -Thorsten
participants (4)
-
Pavol Droba
-
Richard Turrin
-
shunsuke
-
Thorsten Ottosen