
Hi All,

I'm finishing a preliminary version of a class that behaves much like the Java Digester. With this class the developer defines triggers which are fired when specific tags are found in an XML file.

Let me explain with an example. A simple approach is to use the Digester as follows to set up the parsing rules, and then process an input file containing the document shown further down:

    digester = new boost::xml_digester::xml_digester((std::istream*)&st);
    digester->setValidating( false );
    digester->addObjectCreate<classFoo>("foo");
    digester->addObjectCreate<classBar>("foo/bar");
    digester->addCallMethod<classBar>("foo/bar/prop1", &classBar::setProp1);
    digester->addSetProperty<classBar>("foo/bar/prop2", &classBar::prop2);
    digester->addSetNext<classFoo>("foo/bar", &classFoo::addBar);
    digester->parse();
    ....
    class classBar {
        std::string prop1;
    public:
        std::string prop2;
        void setProp1(std::string value) { prop1 = value; }
    };
    class classFoo {
        std::vector<classBar*> obj_bars;
    public:
        void addBar(void *instance) { obj_bars.push_back((classBar*)instance); }
    };

XML:

    <?xml version='1.0' encoding='UTF-8'?>
    <foo>
      <bar>
        <prop1>xxxxxx</prop1>
        <prop2>yyyyyy</prop2>
      </bar>
      <bar>
        <prop1>zzzzzz</prop1>
        <prop2>kkkkkk</prop2>
      </bar>
    </foo>

In order, these rules do the following tasks:

1. When the outermost <foo> element is encountered, create a new instance of "classFoo" and push it onto the object stack. At the end of the <foo> element, this object is popped off the stack.
2. When a nested <bar> element is encountered, create a new instance of "classBar" and push it onto the object stack. At the end of the <bar> element, this object is popped off the stack (i.e. after the remaining rules matching foo/bar are processed).
3. Cause the method setProp1 of class "classBar" to be called each time "prop1" is loaded. The element value is passed as a std::string parameter.
4. Cause the property "prop2" of class "classBar" to be set each time "prop2" is loaded. The property must have public scope and its type must be std::string.
5. Cause the "addBar" method of class "classFoo" to be called, passing the instance of "classBar" created in step 2. This rule allows a parent/child relationship to be established between "classFoo" and "classBar".

I think that with these features a developer can do everything needed to parse XML files into objects in 90% of cases. Of course the Java Digester is more powerful and complex, and we can improve on this in later releases.

Regards,
-- Themis Vassiliadis
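The object-stack behavior described in the five rules above can be sketched in a few dozen lines. This is only an illustration under stated assumptions, not the proposed boost::xml_digester implementation: MiniDigester, Foo, Bar, add_set_text and digest_example are hypothetical names, shared pointers replace the raw pointers of the post, rules 3 and 4 are collapsed into a single member-assignment rule, and the parser events are fed by hand instead of coming from a real XML parser.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for classBar / classFoo from the post.
struct Bar { std::string prop1, prop2; };
struct Foo { std::vector<std::shared_ptr<Bar>> bars; };

// Illustration only: rules keyed by element path, objects kept on a stack.
class MiniDigester {
public:
    using Object = std::shared_ptr<void>;

    // Rules 1 and 2: create a T when the element at `path` opens.
    template <class T>
    void add_object_create(const std::string& path) {
        create_[path] = []() -> Object { return std::make_shared<T>(); };
    }

    // Rules 3 and 4 (simplified): store the element text in a member of the
    // object on top of the stack. No type check is done; the caller must
    // register rules so that the top object really is a T.
    template <class T>
    void add_set_text(const std::string& path, std::string T::*member) {
        text_[path] = [member](const Object& top, const std::string& value) {
            static_cast<T*>(top.get())->*member = value;
        };
    }

    // Rule 5: when the element closes, hand the finished child to its parent.
    template <class Parent, class Child>
    void add_set_next(const std::string& path,
                      std::function<void(Parent&, std::shared_ptr<Child>)> fn) {
        next_[path] = [fn](const Object& parent, const Object& child) {
            fn(*static_cast<Parent*>(parent.get()),
               std::static_pointer_cast<Child>(child));
        };
    }

    // The three events a SAX-style parser would deliver.
    void start_element(const std::string& name) {
        path_ = path_.empty() ? name : path_ + "/" + name;
        auto it = create_.find(path_);
        if (it != create_.end()) stack_.push_back(it->second());
    }
    void characters(const std::string& text) {
        auto it = text_.find(path_);
        if (it != text_.end() && !stack_.empty()) it->second(stack_.back(), text);
    }
    void end_element() {
        if (create_.count(path_) && !stack_.empty()) {
            Object top = stack_.back();
            stack_.pop_back();                      // pop the finished object
            auto it = next_.find(path_);
            if (it != next_.end() && !stack_.empty()) it->second(stack_.back(), top);
            if (stack_.empty()) root_ = top;        // outermost object is the result
        }
        auto slash = path_.rfind('/');
        path_ = (slash == std::string::npos) ? std::string() : path_.substr(0, slash);
    }
    Object root() const { return root_; }

private:
    std::string path_;
    std::vector<Object> stack_;
    Object root_;
    std::map<std::string, std::function<Object()>> create_;
    std::map<std::string, std::function<void(const Object&, const std::string&)>> text_;
    std::map<std::string, std::function<void(const Object&, const Object&)>> next_;
};

// Registers the rules from the post and replays the events of its XML sample.
std::shared_ptr<Foo> digest_example() {
    MiniDigester d;
    d.add_object_create<Foo>("foo");
    d.add_object_create<Bar>("foo/bar");
    d.add_set_text<Bar>("foo/bar/prop1", &Bar::prop1);
    d.add_set_text<Bar>("foo/bar/prop2", &Bar::prop2);
    d.add_set_next<Foo, Bar>("foo/bar",
        [](Foo& foo, std::shared_ptr<Bar> bar) { foo.bars.push_back(bar); });

    const char* rows[2][2] = {{"xxxxxx", "yyyyyy"}, {"zzzzzz", "kkkkkk"}};
    d.start_element("foo");
    for (auto& row : rows) {
        d.start_element("bar");
        d.start_element("prop1"); d.characters(row[0]); d.end_element();
        d.start_element("prop2"); d.characters(row[1]); d.end_element();
        d.end_element();
    }
    d.end_element();
    return std::static_pointer_cast<Foo>(d.root());
}
```

Keying the rules by full path keeps the event handlers trivial: each event is one map lookup, and the stack alone tracks which object is being filled in.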

Themis,

I have a couple of questions / comments that I think already came up when you reached out for comments a couple of months ago:

I believe what you suggest is a useful feature to have. However, I don't think the fact that the file format that you read in is using XML has any particular impact on the features. In other words, you may provide an API that is agnostic to the file format, and then implement it using different 'backends', one of which is based on XML.

In particular, couldn't you implement the whole library in terms of boost.serialization? What about the boost.xml sandbox project (of which I'm the author)?

Thanks,
Stefan

-- 
...ich hab' noch einen Koffer in Berlin...
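One possible reading of the "format-agnostic API with backends" suggestion is sketched below. Everything here is an assumption made for illustration: Handler, Backend, KeyValueBackend, Collect and read_example are hypothetical names, and the line-based "path = text" format is invented purely to show a non-XML backend driving the same handler. It is not taken from boost.serialization or the boost.xml sandbox.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Receives format-independent events; a digester-style rule engine
// would sit behind this interface.
struct Handler {
    virtual ~Handler() = default;
    virtual void value(const std::string& path, const std::string& text) = 0;
};

// A backend turns one concrete file format into Handler events.
struct Backend {
    virtual ~Backend() = default;
    virtual void run(std::istream& in, Handler& out) = 0;
};

// Hypothetical non-XML backend: one "path = text" pair per line.
struct KeyValueBackend : Backend {
    void run(std::istream& in, Handler& out) override {
        std::string line;
        while (std::getline(in, line)) {
            auto eq = line.find('=');
            if (eq == std::string::npos) continue;  // skip malformed lines
            out.value(trim(line.substr(0, eq)), trim(line.substr(eq + 1)));
        }
    }
    // Strips leading and trailing blanks, tabs and carriage returns.
    static std::string trim(const std::string& s) {
        auto b = s.find_first_not_of(" \t\r");
        if (b == std::string::npos) return std::string();
        auto e = s.find_last_not_of(" \t\r");
        return s.substr(b, e - b + 1);
    }
};

// Trivial handler that just records what it was told.
struct Collect : Handler {
    std::map<std::string, std::string> seen;
    void value(const std::string& path, const std::string& text) override {
        seen[path] = text;
    }
};

// Runs the hypothetical backend over two lines of sample input.
Collect read_example() {
    std::istringstream in("foo/bar/prop1 = xxxxxx\nfoo/bar/prop2 = yyyyyy\n");
    KeyValueBackend backend;
    Collect handler;
    backend.run(in, handler);
    return handler;
}
```

An XML backend would implement the same run() signature, so the code that maps paths to objects would not need to know which format it is reading.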

Hi Themis,

Here is approximately how I would do that using RapidXML:

    xml_document<char> doc;
    // (I'll skip the file access stuff. I tend to use mmap(). That just
    // complicates things here. Once the text is in memory, doc.parse<0>(text)
    // builds the tree.)
    // RapidXML hands out nodes as pointers; a null pointer means "no such node".
    for (xml_node<char>* foo_node = doc.first_node("foo"); foo_node;
         foo_node = foo_node->next_sibling("foo")) {
        classFoo foo;
        for (xml_node<char>* bar_node = foo_node->first_node("bar"); bar_node;
             bar_node = bar_node->next_sibling("bar")) {
            classBar* bar_p = new classBar;
            xml_node<char>* prop1_node = bar_node->first_node("prop1");
            if (prop1_node) { bar_p->setProp1(prop1_node->value()); }
            xml_node<char>* prop2_node = bar_node->first_node("prop2");
            if (prop2_node) { bar_p->prop2 = prop2_node->value(); }
            foo.addBar(bar_p);
        }
    }

It's true that your code is more concise, but it's not *much* more concise; on the other hand, it's another layer of stuff to learn and it is inevitably less flexible than doing it "by hand". Perhaps the problem is that your example is too trivial to demonstrate the real advantage of the approach.

I think that it would be worth investigating how Spirit or something Spirit-like could be applied to this problem (PSEUDO-CODE):

    rule_t prop1 = element("prop1");
    rule_t prop2 = element("prop2");
    rule_t bar = element("bar")(*(prop1|prop2));
    rule_t foo = element("foo")(*bar);
    rule_t doc = *foo;
    doc.parse(input);

That's missing the semantic actions, which I have always considered Spirit's weak point; I believe Spirit2 does better but I haven't investigated. Perhaps a domain-specific language for writing DTDs is possible?

Cheers, Phil.
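To make the pseudo-code above concrete, here is a rough sketch of rule objects that also carry semantic actions. It is not Boost.Spirit and uses none of its API: Node, Rule, child, on_match, leaf and parse_example are all hypothetical names, the rules match an already-built in-memory tree and not a character stream, and Rule("x").child(a).child(b) is assumed to stand for element("x")(*(a|b)).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory element tree standing in for parsed XML.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

class Rule {
public:
    using Action = std::function<void(const Node&)>;

    explicit Rule(std::string name) : name_(std::move(name)) {}

    // Adds an alternative that child elements may match, zero or more times.
    Rule& child(Rule r) { children_.push_back(std::move(r)); return *this; }

    // Semantic action, run after the element and all its children matched.
    Rule& on_match(Action a) { action_ = std::move(a); return *this; }

    // True if the node has this rule's name and every child is accepted by
    // one of the alternatives. Actions of children run before the parent's
    // and are not undone if a later sibling fails to match.
    bool match(const Node& n) const {
        if (n.name != name_) return false;
        for (const Node& c : n.children) {
            bool accepted = false;
            for (const Rule& r : children_) {
                if (r.match(c)) { accepted = true; break; }
            }
            if (!accepted) return false;
        }
        if (action_) action_(n);
        return true;
    }

private:
    std::string name_;
    std::vector<Rule> children_;
    Action action_;
};

// Builds a childless element.
Node leaf(std::string name, std::string text) {
    return Node{std::move(name), std::move(text), {}};
}

// Matches the sample document from the thread and logs each property seen.
std::pair<bool, std::vector<std::string>> parse_example() {
    std::vector<std::string> log;
    auto record = [&log](const Node& n) { log.push_back(n.name + "=" + n.text); };

    Rule prop1 = Rule("prop1").on_match(record);
    Rule prop2 = Rule("prop2").on_match(record);
    Rule bar = Rule("bar").child(prop1).child(prop2);
    Rule foo = Rule("foo").child(bar);

    Node bar1{"bar", "", {leaf("prop1", "xxxxxx"), leaf("prop2", "yyyyyy")}};
    Node bar2{"bar", "", {leaf("prop1", "zzzzzz"), leaf("prop2", "kkkkkk")}};
    Node doc{"foo", "", {bar1, bar2}};

    bool ok = foo.match(doc);
    return std::make_pair(ok, log);
}
```

Because the actions fire as soon as a child matches, a real design would have to decide what to do with side effects when a later sibling is rejected.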

Many libraries for many languages read XML configuration files. There are several ways of doing this, and the XML Digester library is designed to provide a common implementation that can be used in many different projects.

The main purpose is to provide an easy and friendly tool that allows developers to parse XML structures into objects just by mapping tags. To do so, the developer only needs to know the path of the tag where the information is stored. It has nothing to do with serialization, because the purpose is just an XML reader focused on mapped tags.

A common mistake is to think of Digester as an inflexible way to read XML. Suppose you have an application like DBDesign (a MySQL tool for database modeling) that stores its configuration in XML and also reads configurations provided by third parties. The structure of these XML files might differ (some tags can be omitted or added, modifying the original structure), and in that case you will find it hard to write code that reads them. You need something flexible enough to read, and even validate, any structure that may be provided; when you reach this point, you will find that Digester can do it easily with just a couple of lines.

Spirit is widely applicable to many things, whereas XML Digester is just for reading XML files quickly and easily, with less memory usage.

-- Themis Vassiliadis

On Mon, Jan 5, 2009 at 9:08 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
[snip]
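The flexibility argument in the reply above (tags omitted or added by third parties) can be illustrated with a path-keyed rule table. This is a minimal sketch under assumptions, not the library's code: Settings, Event, Rules, apply_rules and make_rules are hypothetical names, and the config/db/... element paths are invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical target object; the defaults apply when a tag is omitted.
struct Settings {
    std::string host = "localhost";
    std::string port = "3306";
};

// (element path, element text), as a parser would report them.
using Event = std::pair<std::string, std::string>;
using Rules = std::map<std::string,
                       std::function<void(Settings&, const std::string&)>>;

// Applies every event that has a rule; paths without a rule are skipped,
// so unknown tags added by a third party do no harm.
Settings apply_rules(const std::vector<Event>& events, const Rules& rules) {
    Settings s;
    for (const Event& e : events) {
        auto it = rules.find(e.first);
        if (it != rules.end()) it->second(s, e.second);
    }
    return s;
}

// One rule per mapped tag; nothing else about the document is assumed.
Rules make_rules() {
    return {
        {"config/db/host", [](Settings& s, const std::string& v) { s.host = v; }},
        {"config/db/port", [](Settings& s, const std::string& v) { s.port = v; }},
    };
}
```

A document with an extra config/db/comment element and a document with no host element at all both go through the same two rules unchanged; only the mapped paths matter.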
Participants (3):
- Phil Endecott
- Stefan Seefeld
- Themis Vassiliadis