tokenize string delimiter
Hi,
Is it possible to tokenize a string based on a string delimiter?
Thanks, Lloyd
______________________________________
Scanned and protected by Email scanner
Yes, see the string algorithms library.
On Tue, Feb 9, 2010 at 4:30 PM, Lloyd <lloyd@cdactvm.in> wrote:
Hi, Is it possible to tokenize a string based on a string delimiter?
Thanks, Lloyd
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
On Tue, Feb 09, 2010 at 02:00:10PM +0530, Lloyd wrote:
Is it possible to tokenize a string based on a string delimiter?
Let str be the string to split and delim your delimiter. Then,

    std::vector<std::string> result;
    boost::iter_split(result, str, boost::first_finder(delim));

Or case insensitive:

    std::vector<std::string> result;
    boost::iter_split(result, str, boost::first_finder(delim, boost::is_iequal()));

HTH,
Matthias
-- 
Matthias Vallentin
vallentin@icsi.berkeley.edu
http://www.icir.org/matthias
Thanks, it worked... ----- Original Message ----- From: "Matthias Vallentin" <vallentin@icsi.berkeley.edu> To: <boost-users@lists.boost.org> Sent: Wednesday, February 10, 2010 7:28 AM Subject: Re: [Boost-users] tokenize string delimiter
On Feb 9, 2010, at 5:58 PM, Matthias Vallentin wrote:
On Tue, Feb 09, 2010 at 02:00:10PM +0530, Lloyd wrote:
Is it possible to tokenize a string based on a string delimiter?
Let str be the string to split and delim your delimiter. Then,
std::vector<std::string> result; boost::iter_split(result, str, boost::first_finder(delim));
Or case insensitive:
std::vector<std::string> result; boost::iter_split(result, str, boost::first_finder(delim, boost::is_iequal()));
Can anyone tell me why iter_split (and split, etc.) requires a vector<string> to hold the results (or, more generally, a container)? Is there some technical reason why it doesn't take an output iterator as a parameter, so it could be called like this?

    std::vector<std::string> result;
    boost::iter_split(std::back_inserter(result), str, boost::first_finder(delim));

or even:

    boost::iter_split(std::ostream_iterator<std::string>(std::cout, ", "), str, boost::first_finder(delim));

(thereby removing the fixed link between iter_split and std::vector<std::string>)

-- Marshall
On Wed, Feb 10, 2010 at 04:57:41PM -0800, Marshall Clow wrote:
Can anyone tell me why iter_split (and split, etc) requires a vector<string> to hold the results? (or, more generally, a container)
In general, the split function receives a string as input and returns an array. (BTW, this is consistent with the majority of scripting languages, such as Ruby, Python, etc.) Because the split function does not know a priori into how many parts it will chop the input string, and it is likely to be more than one, a vector fits naturally for this task.
Is there some technical reason why it doesn't take an output iterator as a parameter, so it could be called like this?
    std::vector<std::string> result;
    boost::iter_split(std::back_inserter(result), str, boost::first_finder(delim));

or even:

    boost::iter_split(std::ostream_iterator<std::string>(std::cout, ", "), str, boost::first_finder(delim));
What you describe above is isomorphic to removing a delimiter from the string and is a separate problem. There are separate functions for this. Matthias -- Matthias Vallentin vallentin@icsi.berkeley.edu http://www.icir.org/matthias
Matthias Vallentin wrote:
On Wed, Feb 10, 2010 at 04:57:41PM -0800, Marshall Clow wrote:
Can anyone tell me why iter_split (and split, etc) requires a vector<string> to hold the results? (or, more generally, a container)
In general, the split function receives a string as input and returns an array. (BTW, this is consistent with the majority of scripting languages, such as Ruby, Python, etc.) Because the split function does not know a priori into how many parts it will chop the input string, and it is likely to be more than one, a vector fits naturally for this task.
I think what Marshall is saying is that an output iterator for example as provided by std::back_inserter is a more natural, less limiting fit. If the final destination is not a vector, why copy to a vector, just to then copy from the vector to the final destination. Jeff
On Thu, Feb 11, 2010 at 2:09 PM, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
I think what Marshall is saying is that an output iterator for example as provided by std::back_inserter is a more natural, less limiting fit. If the final destination is not a vector, why copy to a vector, just to then copy from the vector to the final destination.
You can do that off the bat using Boost.Spirit 2.1+ like this:

Let str be the string to split and delim your delimiter. Then,

    // Pick any of these result types; they all work, along with anything
    // else that supports push_back and a few other things
    std::vector<std::string> result;
    std::string result;
    myCustomClassThatSupports_push_back result;
    parse(str.begin(), str.end(), raw[+~char_(delim)] % lit(delim), result);

Or case insensitive:

    parse(str.begin(), str.end(), no_case[raw[+~char_(delim)] % lit(delim)], result);

And there is *SO* *MUCH* more you can do with Spirit too, and it executes faster than the tokenizer and such as well, while working on input iterators, etc...
On 02/11/2010 03:22 PM, OvermindDL1 wrote:
You can do that off the bat using Boost.Spirit2.1+ like this: Or case insensitive:
parse(str.begin(), str.end(), no_case[raw[+~char_(delim)] % lit(delim)], result);
And there is *SO* *MUCH* more you can do with Spirit too, and it executes faster than the tokenizer and such as well, while working on input iterators, etc...
Spirit seems to have the same operator-overloaded "write-only code" quality that Perl suffers from. The implementation may be the greatest thing ever, but the end result is a bit worrisome. The string algorithm library's syntax is far more reasonable IMO. Rob
On Thu, Feb 11, 2010 at 10:53 PM, Matthias Vallentin <vallentin@icsi.berkeley.edu> wrote:
On Thu, Feb 11, 2010 at 03:22:53PM -0700, OvermindDL1 wrote:
std::string result; parse(str.begin(),str.end(), raw[+~char_(delim)]%lit(delim), result);
This fails in the first argument of qi::parse:
    using boost::spirit::qi::lit;
    using boost::spirit::qi::raw;
    using boost::spirit::qi::string;

    std::string str("foo---bar---baz");
    std::string delim("---");
    std::string result;
    boost::spirit::qi::parse(str.begin(), str.end(), raw[+~string(delim)] % lit(delim), result);
Any ideas why?
If you are using GCC, do this for the parse line instead (GCC is actually behaving correctly here; the code above happens to compile on MSVC, which is not quite correct, but this version works everywhere):

    std::string::const_iterator iter = str.begin(), iterEnd = str.end();
    boost::spirit::qi::parse(iter, iterEnd, raw[+~string(delim)] % lit(delim), result);

On Thu, Feb 11, 2010 at 8:17 PM, Rob Riggs <rob@pangalactic.org> wrote:
On 02/11/2010 03:22 PM, OvermindDL1 wrote:
You can do that off the bat using Boost.Spirit2.1+ like this: Or case insensitive:
parse(str.begin(), str.end(), no_case[raw[+~char_(delim)] % lit(delim)], result);
And there is *SO* *MUCH* more you can do with Spirit too, and it executes faster than the tokenizer and such as well, while working on input iterators, etc...
Spirit seems to have the same operator-overloaded "write-only code" quality that Perl suffers from. The implementation may be the greatest thing ever, but the end result is a bit worrisome. The string algorithm library's syntax is far more reasonable IMO.
Not sure of what you speak; I cannot think of any better way to implement anything as powerful as Spirit within the confines of the C++ language without it becoming *exceedingly* long and impossible to read... How is the string algorithm library's syntax better? It is slower and cannot do anywhere near as much (Spirit is a full PEG parser).
Hi all,

First problem:
--------------
I get this warning: "...... boost/mpl/print.hpp:55: warning: comparison between signed and unsigned integer expressions"

Second problem:
---------------
I want to serialize a derived template class like this:

    ////////////////////////////
    class Base {
    private:
        friend class boost::serialization::access;
        template<class Archive>
        void serialize(Archive &, const unsigned int ) { }
    };

    template<typename T>
    class Child: public Base {
    private:
        friend class boost::serialization::access;
        template<class Archive>
        void serialize(Archive &ar, const unsigned int ) {
            ar & BOOST_SERIALIZATION_BASE_OBJECT_NVP(Base);
        }
    };

    BOOST_CLASS_EXPORT_GUID(Child<int>, "Child") // doesn't work
    ///////////////////////////

The macro BOOST_CLASS_EXPORT_GUID doesn't work with a template class. Is there a way to do this?

Thank you,
Regards, Damien.
dada@lamef.bordeaux.ensam.fr wrote:
Hi all,
First problem: -------------- I get this warning: "...... boost/mpl/print.hpp:55: warning: comparison between signed and unsigned integer expressions"
The serialization library uses this to indicate usage of the library which is permitted, though probably not a good idea. Check the compile error listing to find what calls this. It should show some place where BOOST_SERIALIZATION_STATIC_WARNING is invoked. There you should find an explanation of why such a warning was emitted.
Second problem: --------------- I want to serialize a derived template class like this:

    ////////////////////////////
    class Base {
    private:
        friend class boost::serialization::access;
        template<class Archive>
        void serialize(Archive &, const unsigned int ) { }
    };

    template<typename T>
    class Child: public Base {
    private:
        friend class boost::serialization::access;
        template<class Archive>
        void serialize(Archive &ar, const unsigned int ) {
            ar & BOOST_SERIALIZATION_BASE_OBJECT_NVP(Base);
        }
    };

    BOOST_CLASS_EXPORT_GUID(Child<int>, "Child") // doesn't work
    ///////////////////////////

The macro BOOST_CLASS_EXPORT_GUID doesn't work with a template class. Is there a way to do this?
I would guess that the <> characters are some sort of problem. Try

    typedef Child<int> Child_int;
    BOOST_CLASS_EXPORT_GUID(Child_int, "ChildInt") // remember the name has to be unique!!

Also there is a Trac item with a suggestion on how to implement the equivalent of:

    BOOST_CLASS_EXPORT(Child<int>)

Robert Ramey
Thank you, Regards, Damien.
Thanks for your reply, all is OK now. But I need some explanation to understand static_warning. The warnings are:
------------------------------
../boost/mpl/print.hpp: In instantiation of 'boost::mpl::print<boost::serialization::STATIC_WARNING_LINE<98> >':
../boost/serialization/static_warning.hpp:92:   instantiated from 'boost::serialization::static_warning_test<false, 98>'
../boost/archive/detail/check.hpp:98:   instantiated from 'void boost::archive::detail::check_object_tracking() [with ...
../boost/archive/detail/oserializer.hpp:295:   instantiated from 'static void boost::archive::detail::save_non_pointer_type<Archive>::invoke(Archive&, T&) [with T =...
../boost/archive/detail/oserializer.hpp:507:   instantiated from 'void boost::archive::save(Archive&, T&) [with Archive = boost::archive::text_oarchive, T = ...
../boost/archive/detail/common_oarchive.hpp:62:   instantiated from 'void boost::archive::detail::common_oarchive<Archive>::save_override(T&, int) [with T = ..
../boost/archive/basic_text_oarchive.hpp:75:   instantiated from 'void boost::archive::basic_text_oarchive<Archive>::save_override(T&, int) [with T = ...
../boost/archive/detail/interface_oarchive.hpp:64:   instantiated from 'Archive& boost::archive::detail::interface_oarchive<Archive>::operator<<(T&) [with T = ...
../testEXE/test.cpp:118:   instantiated from here
../boost/mpl/print.hpp:55: warning: comparison between signed and unsigned integer expressions
------------------------------
In fact, I think these warnings are a consequence of serializing non-const objects. But I'm not sure...

Best regards,
Damien.
dada@lamef.bordeaux.ensam.fr wrote:
Thanks for your reply, all is OK now. But I need some explanation to understand static_warning.
The warnings are:
------------------------------
../boost/mpl/print.hpp: In instantiation of 'boost::mpl::print<boost::serialization::STATIC_WARNING_LINE<98> >':
../boost/serialization/static_warning.hpp:92:   instantiated from 'boost::serialization::static_warning_test<false, 98>'
../boost/archive/detail/check.hpp:98:   instantiated from 'void boost::archive::detail::check_object_tracking() [with ...
Here is the code at line 98 of check.hpp:

    template<class T>
    inline void check_object_tracking(){
        // presume it has already been determined that
        // T is not a const
        BOOST_STATIC_ASSERT(! boost::is_const<T>::value);
        typedef BOOST_DEDUCED_TYPENAME mpl::equal_to<
            serialization::tracking_level<T>,
            mpl::int_<serialization::track_never>
        >::type typex;
        // saving a non-const object of a type not marked "track_never"
        // may be an indicator of an error usage of the
        // serialization library and should be double checked.
        // See documentation on object tracking.  Also, see the
        // "rationale" section of the documentation
        // for motivation for this checking.
        BOOST_STATIC_WARNING(typex::value);
    }
Did you read this? This points you to the "rationale" section of the documentation. Did you read that? This should shed some light on things.
In fact, I think these warnings are a consequence of serializing non-const objects. But I'm not sure...
You're on the right track, keep digging. Robert Ramey
Robert Ramey <ramey@rrsd.com> wrote:
Did you read this? This points you to the "rationale" section of the documentation. Did you read that? This should shed some light on things.
No, I didn't; I'm confused...
In fact, I think these warnings are a consequence of serializing non-const objects. But I'm not sure...
You're on the right track, keep digging.
Yes, I think so! Thanks to your help, I will put Boost to good use. Best regards, Damien.
On Thu, Feb 11, 2010 at 03:22:53PM -0700, OvermindDL1 wrote:
std::string result; parse(str.begin(),str.end(), raw[+~char_(delim)]%lit(delim), result);
This fails in the first argument of qi::parse:

    using boost::spirit::qi::lit;
    using boost::spirit::qi::raw;
    using boost::spirit::qi::string;

    std::string str("foo---bar---baz");
    std::string delim("---");
    std::string result;
    boost::spirit::qi::parse(str.begin(), str.end(), raw[+~string(delim)] % lit(delim), result);

Any ideas why?

Matthias
-- 
Matthias Vallentin
vallentin@icsi.berkeley.edu
http://www.icir.org/matthias
The first argument to qi::parse is a non-const reference (which cannot bind to the temporary you're passing in). Try:

    std::string::iterator ii = str.begin();
    boost::spirit::qi::parse(ii, str.end(), raw[+~string(delim)] % lit(delim), result);

Spirit uses the first parameter to return how far it got in parsing your string.

hth,
Brian
On Fri, Feb 12, 2010 at 08:57:53AM +0000, Brian O'Kennedy wrote:
The first argument to qi::parse is a non-const reference (which fails on the temporary you're passing in).
That makes sense (and fixed it).
Try:
std::string::iterator ii = str.begin(); boost::spirit::qi::parse(ii, str.end(), raw[+~string(delim)] % lit(delim), result);
This grammar definition does not work for me; I get an assertion failure with subject_is_not_negatable, which makes sense, as the complement of a string is not really defined. When we parse into std::vector<std::string>, this seems to be a straightforward solution:

    using namespace boost::spirit::qi;
    using namespace boost::spirit::karma;

    std::string str("foo---bar---baz");
    std::string delim("---");
    std::vector<std::string> result;

    std::string::iterator i = str.begin();
    parse(i, str.end(), +alpha % lit(delim), result);

    std::cout << format(stream % ", ", result) << std::endl;

How do we have to modify the above when result is of type std::string? The example below merely puts "fbb" in the result string:

    std::string::iterator i = str.begin();
    parse(i, str.end(), raw[+alpha] % lit(delim), result);

According to the attribute composition rules, I would assume the grammar attribute is equivalent to vector<iterator_range<I>>. How does qi::parse determine which function to use for the result? Is it merely using push_back, so that only the first character is appended?
Spirit uses the first parameter to return how far it got in the parsing of your string.
Why is a const_iterator insufficient? Shouldn't it also work to report the position of the parser? Matthias -- Matthias Vallentin vallentin@icsi.berkeley.edu http://www.icir.org/matthias
On Fri, Feb 12, 2010 at 5:01 PM, Matthias Vallentin <vallentin@icsi.berkeley.edu> wrote:
On Fri, Feb 12, 2010 at 08:57:53AM +0000, Brian O'Kennedy wrote:
The first argument to qi::parse is a non-const reference (which fails on the temporary you're passing in).
That makes sense (and fixed it).
Try:
std::string::iterator ii = str.begin(); boost::spirit::qi::parse(ii, str.end(), raw[+~string(delim)] % lit(delim), result);
This grammar definition does not work for me, I get an assertion failing with: subject_is_not_negatable, which makes sense as the complement of string is not really defined.
Ah, that's because string is not what I originally wrote. I used char_; someone else changed it to string and I just copied and pasted what they had. You want char_ there, not string.

On Fri, Feb 12, 2010 at 5:01 PM, Matthias Vallentin <vallentin@icsi.berkeley.edu> wrote:
When we parse into std::vector<std::string>, this seems to be a straight-forward solution:
using namespace boost::spirit::qi; using namespace boost::spirit::karma;
std::string str("foo---bar---baz"); std::string delim("---"); std::vector<std::string> result;
std::string::iterator i = str.begin(); parse(i, str.end(), +alpha % lit(delim), result);
std::cout << format(stream % ", ", result) << std::endl;
How do we have to modify the above when result is of type std::string? The example below merely puts "fbb" in the result string:
std::string::iterator i = str.begin(); parse(i, str.end(), raw[+alpha] % lit(delim), result);
According to attribute composition rules, I would assume the grammar attribute is equivalent to vector<iterator_range<I>>. How does qi::parse determine which function to use for the result? Is it merely using push_back, and hence only the first character is appended?
Do note that using alpha (instead of my original solution) will make it fail on characters outside a-zA-Z. If you want to support any kind of string (the way the tokenizer does), you should use my original solution.

On Fri, Feb 12, 2010 at 5:01 PM, Matthias Vallentin <vallentin@icsi.berkeley.edu> wrote:
Spirit uses the first parameter to return how far it got in the parsing of your string.
Why is a const_iterator insufficient? Shouldn't it also work to report the position of the parser?
const_iterator is sufficient; the parser never changes the values. It is just that with this:

    std::string::iterator ii = str.begin();
    boost::spirit::qi::parse(ii, str.end(), raw[+~char_(delim)] % lit(delim), result);

the entire string was consumed if ii == str.end(); if they are not equal, the parser hit something it cannot parse (which will not happen with the grammar given above).
Hi, On 11. 2. 2010 22:09, Jeff Flinn wrote:
I think what Marshall is saying is that an output iterator for example as provided by std::back_inserter is a more natural, less limiting fit. If the final destination is not a vector, why copy to a vector, just to then copy from the vector to the final destination.
If you would like iterator-based tokenization, you can use find_iterator and split_iterator, which are the underlying engine beneath iter_split. There are examples in the documentation. In addition, you can use the reference trick, vector<iterator_range>, which does not copy the string but just stores references to the found matches. In summary, it is up to the user to decide what is more appropriate: vector<string> is more convenient and easier to use, while find/split_iterator has less overhead.

Best Regards,
Pavol.
participants (11)
- Brian O'Kennedy
- dada@lamef.bordeaux.ensam.fr
- Diederick C. Niehorster
- Jeff Flinn
- Lloyd
- Marshall Clow
- Matthias Vallentin
- OvermindDL1
- Pavol Droba
- Rob Riggs
- Robert Ramey