Re: [Boost-users] Boost Serialization make_binary & XML/ASCII
Terence,

We use Spirit in a similar way to SAX: we read binary data from a serial port, assemble it into discrete units (messages), and push them through a parser generated by Spirit, which in turn calls a function upon finding a match. Some of our messages are multi-part, so we need to keep some form of state that allows us to assemble the final message. I would imagine that there is a way to break out of the parsing early.

-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Terence Wilson Sent: Saturday, December 09, 2006 2:55 PM To: boost-users@lists.boost.org Subject: Re: [Boost-users] Boost Serialization make_binary & XML/ASCII

Robert,

XML is normally parsed using a DOM or SAX parser. DOM reads the whole file into memory; SAX behaves like a recursive descent parser with callbacks to the client application. By placing the data block at the start of the file I should be able to get good performance from SAX or Spirit. Both would be good choices; however, I want to write some 'reference' code using standard tools since my work will be part of an SDK.

Regards,
Terence
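As an illustration of the SAX-style approach discussed above (callbacks to the client, with a way to break out of the parse once the wanted data has been seen), here is a minimal sketch in standard C++. It is not the serialization library's parser and does not use Spirit or a real SAX library; `scan_elements`, `ElementHandler`, `Extracted`, and `extract_first` are hypothetical names invented for this example, and the scanner only understands plain `<name>text</name>` runs (no attribute parsing, entities, CDATA, or comments).

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical callback type: called with an element name and the text that
// preceded its closing tag. Returning false stops the scan early.
using ElementHandler =
    std::function<bool(const std::string& name, const std::string& text)>;

// Reads the stream one character at a time and invokes the handler at each
// closing tag. Returns the number of characters consumed, so a caller can
// see how much of the input was actually read.
std::size_t scan_elements(std::istream& in, const ElementHandler& on_element) {
    std::size_t consumed = 0;
    std::string text;  // characters seen since the most recent tag
    char c;
    while (in.get(c)) {
        ++consumed;
        if (c != '<') {
            text += c;
            continue;
        }
        // Collect everything up to the matching '>'.
        std::string tag;
        while (in.get(c)) {
            ++consumed;
            if (c == '>') break;
            tag += c;
        }
        // Closing tag: report the element; stop if the handler says so.
        if (!tag.empty() && tag[0] == '/') {
            if (!on_element(tag.substr(1), text)) return consumed;
        }
        text.clear();
    }
    return consumed;
}

// Result of looking for the first element with a given name.
struct Extracted {
    bool found = false;
    std::string text;
    std::size_t consumed = 0;  // characters read before the scan stopped
};

// Scans an in-memory document and stops at the first matching element.
Extracted extract_first(const std::string& xml, const std::string& wanted) {
    std::istringstream in(xml);
    Extracted result;
    result.consumed = scan_elements(
        in, [&](const std::string& name, const std::string& text) {
            if (name != wanted) return true;  // not ours, keep scanning
            result.found = true;
            result.text = text;
            return false;  // we have what we need, stop here
        });
    return result;
}
```

With the wanted element near the start of the input, `extract_first(doc, "payload")` stops right after the closing `</payload>` tag and leaves the rest of the stream unread, which is the behaviour that makes placing the data block at the start of the file worthwhile.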
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: Saturday, December 09, 2006 2:10 PM To: boost-users@lists.boost.org Subject: Re: [Boost-users] Boost Serialization make_binary & XML/ASCII
Well, now I'm out of my depth. Some have commented that the spirit parser is slower than other xml parsers. I don't know. I would have hoped that since spirit does a lot of the heavy lifting at compile time, it would be pretty fast. I haven't seen much data on this, so I really don't know. Any parser has to scan every character in the file, so it's not clear to me that a SAX parser or any other can be known a priori to be faster than any other one.
My reason for using spirit was
a) it was already part of boost
b) it was - after some learning curve - a good fit with what I wanted to do.
c) well documented.
d) customizable - serialization only uses a portion of the full xml so it seemed the most efficient.
e) all done at compile time so it wouldn't include dead code.
f) portability to all compilers boost supports.
g) By exercising a little care in code organization I was able to arrange things so that the module containing the parsing didn't depend on the rest of the program. So the long compile time is not an issue. It is in the library and is only recompiled when the grammar changes.
It is the last feature that suggests that you can easily use this to attach your own actions when parsing the serialization library's xml output.
After some initial pain figuring out how to use it, I have to say I have been extremely pleased with this application of spirit. I never wanted to do xml serialization as I felt it was a pain in the neck and of relatively little utility in my view. I had anticipated a maintenance nightmare as more and more obscure corners of xml syntax were touched. I'm pleased to say this thing has been fantastic as far as I'm concerned. After the initial one-time pain, I haven't had to touch it since 2002, and this (through spirit 1.6x - still available) is still compatible with Borland 5.51. And all the hacks required to make this so portable are only compiled into the platforms that need them.
This has been one of the most significant implementations in making the serialization library possible. (the other one would probably be mpl).
So if this were my problem I would:
a) Include the xml grammar and parser from the serialization library and add my own actions.
b) Finish my code. Really, I would expect this to be about 100 lines.
c) If it's too slow - and if profiling suggests that the spirit parser is the bottleneck - then I would look at tweaking the grammar to speed up parsing or replacing the spirit parser with a faster one.

This is my rule: "First make it work ASAP - then make it faster if necessary"
But I already am somewhat familiar with spirit so it might not be an interesting option for you. But then you might be able to use the current parser unchanged. Of course this would bring the huge benefit that if the xml_archive parser is tweaked for some reason (there are a couple of issues with special characters), you would automatically inherit these changes and still be in sync.
I made the choice to invest the effort to figure out spirit rather than write my 10,000th file parser. Of course that was my decision and
may not be everyone's preference.
Good Luck
Terence Wilson wrote:
Robert,
The utility I am writing needs to be able to extract a small portion
from a large XML file generated by your library. Since it is performance sensitive I chose to use a SAX parser in order to avoid reading the whole file. Would it be much work to do this with the Spirit parser?
As always, thanks for the super-fast response.
Best regards,
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (1)
-
Javier Estrada