[string_algo] split suggestion

Vladimir Prus

8 Sep 2004

6:59 a.m.

Hello,

suppose I have a string "module.foo1.port" and want to get the second dot-separated element. I think I can use the 'split' algorithm, but it does not look very convenient: I need to declare a container, then call split, and then obtain the result.

In Qt, there's the QString::section method, which allows doing this in one line. Here's an example from the docs:

QString csv( "forename,middlename,surname,phone" );
QString s = csv.section( ',', 2, 2 ); // s == "surname"

And the complete docs are at:

http://doc.trolltech.com/3.3/qstring.html#section

Maybe something like this can be added?

- Volodya

Pavol Droba

8 Sep

7:41 a.m.

Hi,

On Wed, Sep 08, 2004 at 10:59:32AM +0400, Vladimir Prus wrote:

...

Hello, suppose I have a string "module.foo1.port" and want to get the second dot-separated element. I think I can use the 'split' algorithm, but it does not look very convenient, I need to declare container, then call split and then obtain the result.

You can also use split_iterator.

#include <string>
#include <boost/algorithm/string/classification.hpp> // is_any_of
#include <boost/algorithm/string/find_iterator.hpp>
#include <boost/algorithm/string/finder.hpp>

using namespace boost;

std::string str="module.foo1.port";
typedef split_iterator<std::string::iterator> string_split;

string_split it(str, token_finder(is_any_of(".")));
// *it="module"
++it; // *it="foo1"
++it; // *it="port"
++it; // it.eof()==true, it==string_split()

*it is an iterator_range pointing into the input. You can easily convert it to a string:

std::string match=copy_iterator_range<std::string>(*it);

...

In Qt, there's QString::section method, which allows to do this in one line. Here's example from the docs:

QString csv( "forename,middlename,surname,phone" ); QString s = csv.section( ',', 2, 2 ); // s == "surname"

And the complete docs are at:

http://doc.trolltech.com/3.3/qstring.html#section

Maybe, something like this can be added?

Seems useful. I will see how it can be added. Thanks for the idea.

Regards,
Pavol.
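To make the request concrete, here is a minimal sketch of what a one-call "section" helper could look like. It uses only the standard library and is not part of Boost or Qt: the name section, its signature, and its zero-based inclusive indexing are assumptions chosen to resemble the QString::section example quoted above. Negative indexes and the flags that Qt supports are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, loosely modelled on QString::section.
// Returns fields start..end (zero-based, inclusive), still joined by sep.
// Returns an empty string if the input has no field number 'start'.
std::string section(std::string_view input, char sep,
                    std::size_t start, std::size_t end)
{
    assert(start <= end);  // precondition of this sketch

    const std::size_t npos = std::string_view::npos;
    std::size_t field = 0;     // index of the field beginning at pos
    std::size_t pos = 0;       // offset where the current field begins
    std::size_t first = npos;  // offset where field 'start' begins

    while (true) {
        if (field == start)
            first = pos;

        const std::size_t next = input.find(sep, pos);
        if (next == npos) {
            // Current field is the last one: return what was collected.
            if (first == npos)
                return std::string();
            return std::string(input.substr(first));
        }
        if (field == end)
            return std::string(input.substr(first, next - first));

        pos = next + 1;  // step over the separator
        ++field;
    }
}
```

With this sketch, section("module.foo1.port", '.', 1, 1) yields "foo1", and section("forename,middlename,surname,phone", ',', 2, 2) yields "surname", so the lookup fits in one line at the call site.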

Vladimir Prus

8:04 a.m.

Pavol Droba wrote:

...

...
suppose I have a string "module.foo1.port" and want to get the second dot-separated element. I think I can use the 'split' algorithm, but it does not look very convenient, I need to declare container, then call split and then obtain the result.

You can also use split_iterator.

#include <boost/algorithm/string/find_iterator.hpp> #include <boost/algorithm/string/finder.hpp>

using namespace boost;

std::string str="module.foo1.port";
typedef split_iterator<std::string::iterator> string_split;

string_split it(str, token_finder(is_any_of(".")));
// *it="module"
++it; // *it="foo1"
++it; // *it="port"
++it; // it.eof()==true, it==string_split()

*it is an iterator_range, pointing to the input. You can easily convert it to a string

std::string match=copy_iterator_range<std::string>(*it);

Ok, noted. Though what I have now:

vector<string> parts;
split(parts, p.options[i].string_key, is_any_of("."));
if (parts.size() > 2)
    modules.insert(parts[1]);

has roughly the same size.

BTW, looking at

http://www.boost.org/regression-logs/cs-win32_metacomm/doc/html/class.boost....

I don't see any explanation of what FinderT is. Maybe the phrase

Split iterator encapsulates a Finder

should include a link to the definition of the 'Finder' concept?

Also, the name 'token_finder' is a bit misleading. I associate 'token' with an item returned by a lexer, which can have several characters. From the docs it seems that 'token_finder' searches for a single character, so maybe it should be 'char_finder'? It looks like the word 'token' is used in just a couple of places.

- Volodya

Pavol Droba

9:31 a.m.

Hello,

Wednesday, September 8, 2004, 10:04:24 AM, you wrote:

...

Pavol Droba wrote:

[snip]

...

Ok, noted. Though what I have now:

...

vector<string> parts;
split(parts, p.options[i].string_key, is_any_of("."));
if (parts.size() > 2)
    modules.insert(parts[1]);

...

has roughly the same size.

split has a slightly bigger overhead, since it must make a copy of each part. With the iterator, you are traversing the input and it is up to you what you do with it. Anyway, for your example, it probably does not matter too much.
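The difference between copying each part and traversing the input can be illustrated with a small standard-library sketch. This is not Boost's split_iterator: FieldCursor and its members current, advance, and eof are hypothetical names that only imitate the traversal shown earlier in the thread. Each field is handed out as a std::string_view into the caller's string, so no characters are copied unless the caller asks for a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical cursor over sep-delimited fields of a string.
// It never copies characters: current() is a view into the input.
class FieldCursor {
public:
    FieldCursor(std::string_view input, char sep)
        : input_(input), sep_(sep)
    {
        find_end();
    }

    // True once advance() has moved past the last field.
    bool eof() const { return eof_; }

    // View of the current field; valid only while the input is alive.
    std::string_view current() const
    {
        assert(!eof_);
        return input_.substr(begin_, end_ - begin_);
    }

    // Move to the next field, or to the end state after the last one.
    void advance()
    {
        if (end_ == input_.size()) {
            eof_ = true;
            return;
        }
        begin_ = end_ + 1;  // step over the separator
        find_end();
    }

private:
    // Set end_ to the next separator, or to the end of the input.
    void find_end()
    {
        const std::size_t next = input_.find(sep_, begin_);
        end_ = (next == std::string_view::npos) ? input_.size() : next;
    }

    std::string_view input_;
    char sep_;
    std::size_t begin_ = 0;
    std::size_t end_ = 0;
    bool eof_ = false;
};
```

A caller that wants only the second field advances once and copies just that field, for example std::string second(cursor.current()); the other fields are never materialized.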

...

BTW, looking at

...

http://www.boost.org/regression-logs/cs-win32_metacomm/doc/html/class.boost....

...

I don't see any explanation what's FinderT. Maybe the phrase

...

Split iterator encapsulates a Finder

I see. It seems like a good idea; unfortunately, I don't know how to do it. The documentation is generated by doxygen and I don't know how to make a link from the reference to the main docs :(

...

should include a link to the definition of the 'Finder' concept? Also, the name 'token_finder' is a bit misleading. I associate it with item returned by the lexer, which can have several characters. From the docs it seems that the 'token_finder' searches for a single character, so maybe it should be 'char_finder'? It looks like the word 'token' is used in just a couple of places.

It does not always search for a single character. If token_compress_mode is enabled, adjacent characters are combined into a single match. I think you are right that token_finder is not the best name for this entity. However, I'm not sure char_finder is the best fit either. I will think about it for a while.

Regards,
Pavol.
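The effect of combining adjacent delimiter characters into a single match can be sketched as follows. This is a standard-library illustration of the idea only, not Boost's implementation: split_any and its compress parameter are hypothetical names, and the handling of edge cases such as leading or trailing delimiters is this sketch's own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical splitter: cuts input at any character found in delims.
// With compress == true, a run of adjacent delimiter characters is
// treated as one match, so it produces no empty fields in between.
std::vector<std::string_view> split_any(std::string_view input,
                                        std::string_view delims,
                                        bool compress)
{
    assert(!delims.empty());  // an empty delimiter set is not meaningful here

    const std::size_t npos = std::string_view::npos;
    std::vector<std::string_view> fields;
    std::size_t begin = 0;  // offset where the current field begins

    while (true) {
        const std::size_t match = input.find_first_of(delims, begin);
        if (match == npos) {
            fields.push_back(input.substr(begin));  // last field
            return fields;
        }
        fields.push_back(input.substr(begin, match - begin));

        std::size_t match_end = match + 1;  // single-character match
        if (compress) {
            // Extend the match over the whole run of delimiters.
            match_end = input.find_first_not_of(delims, match);
            if (match_end == npos)
                match_end = input.size();
        }
        begin = match_end;
    }
}
```

For the input "a..b" with the delimiter set ".", the sketch returns three fields (the middle one empty) when compress is false and two fields when it is true, which is why the match is not always a single character.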
