Hello.

Is there any "right way" to parse strings already split with a tokenizer? I wrote an iterator that can work with these split strings, but I also need some "skip parser" to recognize token boundaries as whitespace, and I can't design one.

-- Best regards, Andrey mailto:blaze@lists.infosec.ru
Andrey Sverdlichenko <blaze@lists.infosec.ru> wrote:
Hello.
Is there any "right way" to parse strings already split with a tokenizer? I wrote an iterator that can work with these split strings, but I also need some "skip parser" to recognize token boundaries as whitespace, and I can't design one.
Could you please be more specific?

-- Joel de Guzman
joel at boost-consulting.com
http://www.boost-consulting.com
http://spirit.sf.net
Friday, July 25, 2003, 3:47:41 AM, you wrote:
Hello.
Is there any "right way" to parse strings already split with a tokenizer? I wrote an iterator that can work with these split strings, but I also need some "skip parser" to recognize token boundaries as whitespace, and I can't design one.
Could you please be more specific?
Here is some sample code. It parses both numbers in the data string as one single number, and I need to separate them.

#include <boost/tokenizer.hpp>
#include <boost/spirit/core.hpp>
#include <list>
#include <iostream>

typedef boost::char_separator<std::string::value_type> Separator;
typedef boost::tokenizer<Separator> Tokenizer;

class tok_iterator
    : public std::iterator<std::forward_iterator_tag,
                           const std::string::value_type>
{
public:
    explicit tok_iterator(const Tokenizer::iterator &curr)
        : token(curr), offset(0) {}

    bool operator==(const tok_iterator &other) const
    { return token == other.token && offset == other.offset; }

    bool operator!=(const tok_iterator &other) const
    { return !(*this == other); }

    reference operator*() const { return (*token)[offset]; }

    tok_iterator &operator++();

private:
    Tokenizer::iterator token;
    size_t offset;
};

inline tok_iterator &
tok_iterator::operator++()
{
    if (++offset >= token->size()) {
        offset = 0;
        ++token;
    }
    return *this;
}

int main()
{
    using namespace boost::spirit;

    std::string data("55 99");
    Separator sep;
    Tokenizer tok(data, sep);
    Tokenizer::iterator token = tok.begin();
    std::list<unsigned int> numbers;

    parse_info<tok_iterator> info =
        parse(tok_iterator(token), tok_iterator(tok.end()),
              uint_p[append(numbers)]);

    std::copy(numbers.begin(), numbers.end(),
              std::ostream_iterator<unsigned int>(std::cout, "\n"));
    return 0;
}

-- Best regards, Andrey mailto:blaze@lists.infosec.ru
Hi,

Is there any advantage in using both tokenizer and Spirit? I'm not a tokenizer expert, but it seems that what you are trying to achieve can be done by Spirit alone:

    parse(first, last, uint_p[append(numbers)], space_p);

-- Joel de Guzman
joel at boost-consulting.com
http://www.boost-consulting.com
http://spirit.sf.net

Andrey Sverdlichenko <blaze@lists.infosec.ru> wrote:
Friday, July 25, 2003, 3:47:41 AM, you wrote:
Hello.
Is there any "right way" to parse strings already split with a tokenizer? I wrote an iterator that can work with these split strings, but I also need some "skip parser" to recognize token boundaries as whitespace, and I can't design one.
Could you please be more specific?
Here is some sample code. It parses both numbers in the data string as one single number, and I need to separate them.

#include <boost/tokenizer.hpp>
#include <boost/spirit/core.hpp>
#include <list>
#include <iostream>

typedef boost::char_separator<std::string::value_type> Separator;
typedef boost::tokenizer<Separator> Tokenizer;

class tok_iterator
    : public std::iterator<std::forward_iterator_tag,
                           const std::string::value_type>
{
public:
    explicit tok_iterator(const Tokenizer::iterator &curr)
        : token(curr), offset(0) {}

    bool operator==(const tok_iterator &other) const
    { return token == other.token && offset == other.offset; }

    bool operator!=(const tok_iterator &other) const
    { return !(*this == other); }

    reference operator*() const { return (*token)[offset]; }

    tok_iterator &operator++();

private:
    Tokenizer::iterator token;
    size_t offset;
};

inline tok_iterator &
tok_iterator::operator++()
{
    if (++offset >= token->size()) {
        offset = 0;
        ++token;
    }
    return *this;
}

int main()
{
    using namespace boost::spirit;

    std::string data("55 99");
    Separator sep;
    Tokenizer tok(data, sep);
    Tokenizer::iterator token = tok.begin();
    std::list<unsigned int> numbers;

    parse_info<tok_iterator> info =
        parse(tok_iterator(token), tok_iterator(tok.end()),
              uint_p[append(numbers)]);

    std::copy(numbers.begin(), numbers.end(),
              std::ostream_iterator<unsigned int>(std::cout, "\n"));
    return 0;
}
Saturday, July 26, 2003, 4:08:52 AM, you wrote:

JdG> Is there any advantage in using both tokenizer and spirit? I'm not a
JdG> tokenizer expert, but it seems that what you are trying to achieve can
JdG> be done by spirit alone:
JdG> parse(first, last, uint_p[append(numbers)], space_p);

It was just a sample. Before these numbers there is a command line that I prefer to parse with the tokenizer (I really have reasons for this), and after I split this string there is no way to convert a tokenizer::iterator to a string::iterator.

-- Best regards, Andrey mailto:blaze@lists.infosec.ru
Andrey Sverdlichenko <blaze@lists.infosec.ru> wrote:
Saturday, July 26, 2003, 4:08:52 AM, you wrote:
JdG> Is there any advantage in using both tokenizer and spirit? I'm not a JdG> tokenizer expert, but it seems that what you are trying to achieve can JdG> be done by spirit alone:
JdG> parse(first, last, uint_p[append(numbers)], space_p);
It was just a sample. Before these numbers there is a command line that I prefer to parse with the tokenizer (I really have reasons for this), and after I split this string there is no way to convert a tokenizer::iterator to a string::iterator.
Hi,

Can we move this discussion over to Spirit's ML?

Spirit-general mailing list
Spirit-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spirit-general

As I said, I am not a tokenizer expert. Anyway, perhaps what you want is to make the tokenizer output your *lexer*. Please check out the C++ lexer (written by JCAB). It provides a really good example of how to use a lexer with Spirit. The idea is to pass a *token stream* to Spirit instead of the raw character stream. For that to work, you'll probably have to do some work with the tokenizer output to assign individual tokens to the parsed input.

For instance, say we have the input:

    "123 456 a big brown fox"

Your token stream might look like:

    struct token {
        char ID; // int, string... etc.
        Iter first;
        Iter last;
    };

You'll probably need Spirit to parse the individual lexical tokens from the tokenizer output. Pseudocode:

    vector<token> v;
    for each tokenizer output
        if parse(tokenizer[i], lex_rule).full
            v.push_back(token(tokenizer[i]))
        else
            lexical error!!!

Then pass your vector of tokens to Spirit. Be sure to have ==, != and < operators to/from your token::ID so you can write, for example:

    char const INT_TOK = 1;
    char const STR_TOK = 2;

    r = ch_p(INT_TOK) | STR_TOK;
    start_ = *r;

HTH. If you have further questions, let's continue the discussion in Spirit's ML.

Regards,
-- Joel de Guzman
joel at boost-consulting.com
http://www.boost-consulting.com
http://spirit.sf.net
Hi all,

I found in the Boost Yahoo group a file containing a proposed version of a boost::serialization lib, but it seems it was rejected, according to what I read in the mailing list archive. I was not able to find any newer information about the status of this library, so I'd like to ask whether any of you have heard news about it.

Thanks a lot
Alexis
"Alexis" <alexismajordomo@yahoo.co.uk> wrote in message news:AC971E8C-BEA3-11D7-BB78-000393AF93B6@yahoo.co.uk...
Hi all,
I found in the Boost Yahoo group a file containing a proposed version of a boost::serialization lib, but it seems it was rejected, according to what I read in the mailing list archive. I was not able to find any newer information about the status of this library, so I'd like to ask whether any of you have heard news about it.
It is on the way. http://aspn.activestate.com/ASPN/Mail/Message/boost/1741081 /Pavel
participants (4)
- Alexis
- Andrey Sverdlichenko
- Joel de Guzman
- Pavel Vozenilek