
Jose wrote:
I am currently testing only the read_xml parsing, and although it is only meant for very simple XML files, I find its XML support very, very sketchy. <snip>
1. Parsing the artima.com spotlight feed
Result: FAILED
The path is rdf:RDF.item.title and I get an "invalid character entity" error. I think the parser should support the colon within the tag name: in many cases the config files might be generated by real XML programs that use namespaces, and it should be able to read them even if it does not support saving them.
The problem here is not the colon. The problem is the &quot; entity, which is a required part of XML but is not supported by boost::property_tree::xml_parser::decode_char_entities() in detail/xml_parser_utils.hpp, lines 62-87. The &apos; entity, which XML also requires, is not supported either. Definitely a bug in PropTree.
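For illustration, here is a minimal sketch of an entity decoder that accepts all five entities predefined by XML 1.0 (&lt;, &gt;, &amp;, &quot; and &apos;). The function name decode_xml_entities is hypothetical; this is not PropTree's decode_char_entities() and not the attached patch, and it deliberately ignores numeric character references such as &#34;.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not PropTree's decode_char_entities():
// replaces the five entities predefined by XML 1.0 with their characters.
// Numeric character references (&#...;) are NOT handled in this sketch.
std::string decode_xml_entities(const std::string& s)
{
    std::string result;
    result.reserve(s.size());
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '&') {
            result += s[i++];  // ordinary character, copy as-is
            continue;
        }
        std::size_t semi = s.find(';', i);
        if (semi == std::string::npos)
            throw std::runtime_error("unterminated character entity");
        std::string name = s.substr(i + 1, semi - i - 1);
        if (name == "lt")        result += '<';
        else if (name == "gt")   result += '>';
        else if (name == "amp")  result += '&';
        else if (name == "quot") result += '"';   // the entity missing in the report
        else if (name == "apos") result += '\'';  // also required by XML
        else throw std::runtime_error("invalid character entity: &" + name + ";");
        i = semi + 1;  // continue after the ';'
    }
    return result;
}
```

With this, a title such as `Say &quot;hi&quot; &amp; wave` decodes to `Say "hi" & wave` instead of raising "invalid character entity".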
2. Parsing the MSDN Visual C++ feed
Result: FAILED
The path is rss.channel.item.title and I get an "xml parse error". Is it possible to get more meaningful errors?
Do you have a URL?
3. Parsing the main CNN feed
Result: FAILED
The path is rss.channel.item.title. This query fails with no error, but if the path is shortened to rss.channel.item, it dumps all the values within item, even though there is no value at that level (only nested tags).
You misunderstand your own program. A node has only one value. What your loop does is retrieve all the children of the node you select with the path and print their values. So for the path rss.channel.item.title, you get the title element of the first item element in the channel. This element has no children, so the loop is never entered. In your second test you specify rss.channel.item, so you get the item element. This element has four children: the title, link, description and pubDate elements. For each of these children, the value (content) is printed. The test succeeded.
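The behaviour described above can be reproduced with a deliberately simplified model. The Node struct, find_path() and child_values() below are hypothetical stand-ins, not boost::property_tree::ptree: each node holds exactly one value plus ordered children, a dotted path selects the first matching child at each step, and the loop prints the values of the selected node's children. A key such as "rdf:RDF" is just an ordinary string here; the colon needs no special handling because only '.' separates path components.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a tree node (not boost::property_tree::ptree):
// one key, exactly one value, and an ordered list of children.
struct Node {
    std::string key;
    std::string value;
    std::vector<Node> children;
};

// Follows a dot-separated path such as "rss.channel.item.title", taking the
// first child with a matching key at each step. Returns nullptr if a step is missing.
const Node* find_path(const Node& root, const std::string& path)
{
    const Node* current = &root;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t dot = path.find('.', start);
        if (dot == std::string::npos) dot = path.size();
        std::string key = path.substr(start, dot - start);
        const Node* next = nullptr;
        for (const Node& child : current->children) {
            if (child.key == key) { next = &child; break; }
        }
        if (!next) return nullptr;
        current = next;
        start = dot + 1;
    }
    return current;
}

// Mirrors the loop in the test program: collects the values of the
// *children* of the selected node, not the value of the node itself.
std::vector<std::string> child_values(const Node& node)
{
    std::vector<std::string> values;
    for (const Node& child : node.children) values.push_back(child.value);
    return values;
}
```

For rss.channel.item.title, child_values() returns an empty list (the title element's text is its own value, which the loop never reads), while for rss.channel.item it returns the four values of title, link, description and pubDate.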
4. Parsing the Google News RSS feed
Result: FAILED
The path is rss.channel.item.title. I get an "Invalid character entity" error. A more meaningful error, including the position in the file where the entity occurs, should be possible.
Again, the problem seems to be the &quot; entity.
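On the request for error positions: one way a parser could report where an error occurred is to turn the character offset at which it failed into a line and column. The sketch below assumes the parser knows that offset; TextPosition, position_of() and error_at() are hypothetical names and not part of PropTree's error types.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of PropTree): a 1-based line/column pair.
struct TextPosition {
    std::size_t line;
    std::size_t column;
};

// Converts a character offset in the input into a 1-based line and column.
// Columns count bytes, so multi-byte UTF-8 characters count more than once.
TextPosition position_of(const std::string& text, std::size_t offset)
{
    TextPosition pos{1, 1};
    for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
        if (text[i] == '\n') { ++pos.line; pos.column = 1; }
        else { ++pos.column; }
    }
    return pos;
}

// Builds a message such as "invalid character entity at line 2, column 4".
std::string error_at(const std::string& what, const std::string& text, std::size_t offset)
{
    TextPosition pos = position_of(text, offset);
    return what + " at line " + std::to_string(pos.line) +
           ", column " + std::to_string(pos.column);
}
```

For the input `<a>\n<b>&foo;</b>\n</a>`, the '&' sits at offset 7, which error_at() reports as line 2, column 4.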
5. Parsing the Google News Atom feed
Result: FAILED
The path is feed.entry.title. I get an "Invalid character entity" error.
Same. Attached is a patch that fixes the bug.

Sebastian Redl