
Greetings,
I have been experimenting with the tokenizer class(es) in Boost and have
come to a bit of an impasse.
What I am attempting to do is write an email client. Communication is
okay, though what I am stuck on is breaking apart the header of the
email into meaningful information. According to RFC822, each header
field is separated by \r\n (CRLF), so I want to use this as a delimiter
for the email message.
Unfortunately each field in the header can be spread over multiple lines
~ and when I use \r\n as a delimiter with boost it will split the
string wherever there is a \r or a \n. This is not what I want to
happen.
What I would like to do is separate on \r\n and only on that string, and
if there is a "\n " (\n then a space) ignore it (or preferably get rid
of the \n and join the two strings).
The code that I am using to create the tokenizer is as follows.
boost::char_separator<char> deliminator(crlf, "", boost::drop_empty_tokens);
boost::tokenizer

Matthew Delves wrote:
... each header field is separated by \r\n (CRLF) I want to use this as a delimiter for the email message. Unfortunately each field in the header can be spread over multiple lines ~ and when I use \r\n as a delimiter with boost it will split the string wherever there is a \r or a \n. This is not what I want to happen.

I can think of two approaches:
1. Split by \r\n as you're doing, then post-process them in a for loop and glue together strings that start with a space.
2. Use Spirit to parse them (or "hapy", or maybe regex).

I think if I was just aiming for 2-3 certain headers I'd use spirit and then I could hard-code the header name and have the parser call directly e.g. the "subject" callback.

Darren
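
A minimal sketch of approach 1, using only the standard library rather than boost::tokenizer. The helper names split_crlf and unfold_headers are made up for illustration and are not part of Boost. It assumes the whole header is already in one std::string, that a continuation line starts with a space or a tab, and that an empty line marks the end of the header.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split on the exact two-character sequence "\r\n",
// not on '\r' or '\n' individually.
std::vector<std::string> split_crlf(const std::string& text) {
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find("\r\n", start);
        if (end == std::string::npos) {
            // Keep any trailing text that has no terminating CRLF.
            if (start < text.size()) {
                lines.push_back(text.substr(start));
            }
            break;
        }
        lines.push_back(text.substr(start, end - start));
        start = end + 2;  // skip over the CRLF itself
    }
    return lines;
}

// Hypothetical helper: glue continuation lines (those starting with a
// space or a tab) onto the preceding header field.
std::vector<std::string> unfold_headers(const std::string& header) {
    std::vector<std::string> fields;
    for (const std::string& line : split_crlf(header)) {
        if (line.empty()) {
            break;  // assumption: a blank line ends the header block
        }
        const bool continuation = (line[0] == ' ' || line[0] == '\t');
        if (continuation && !fields.empty()) {
            fields.back() += line;  // leading whitespace is kept as the joiner
        } else {
            fields.push_back(line);
        }
    }
    return fields;
}
```

With this sketch, unfold_headers("Subject: a\r\n b\r\nTo: x@y\r\n\r\nbody") should give two fields, "Subject: a b" and "To: x@y". Searching for the substring "\r\n" avoids the split on a lone \r or \n described in the original question.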

Thanks,
Will look into those options.
Later,
Matthew Delves
http://www.webmastermattd.net

Darren Cook wrote:
2. Use Spirit to parse them (or "hapy", or maybe regex).

There is a package "rfc821" by Martijn W. van der Lee in the Spirit applications repository at http://tinyurl.com/29mcn
From the blurb:
"This is an example using Spirit to verify RFC821-compliant e-mail addresses."

Regards,
-- Angus
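
For the "maybe regex" option quoted above, here is a rough sketch using std::regex from the C++11 standard library instead of Spirit. The function name unfold_with_regex is hypothetical. It assumes the header is held in a single std::string and that a folded line is a CRLF followed directly by a space or a tab. It only removes the folds; splitting the result into fields on "\r\n" is still a separate step.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove every CRLF that is immediately followed by
// a space or a tab, keeping that whitespace character so the two halves
// of the folded field stay separated.
std::string unfold_with_regex(const std::string& header) {
    static const std::regex fold("\r\n([ \t])");
    return std::regex_replace(header, fold, "$1");
}
```

After this pass, every remaining "\r\n" in the header terminates a complete field, so a plain substring search for "\r\n" is enough to separate the fields.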
Participants (3): Angus Leeming, Darren Cook, Matthew Delves