
Daryle Walker wrote:
But you got other problems, here's the entire class in <boost/archive/impl/basic_xml_oarchive.ipp>
//======================================================================== //... namespace boost { namespace archive {
namespace detail { template<class CharType> struct XML_name { typedef bool result_type; void operator()(CharType t) const{ unsigned char lookup_table[] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0, // -. 1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0, // 0-9 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // A- 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1, // -Z _ 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // a- 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0, // -z 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; if((unsigned)t > 127) return; if(0 == lookup_table[(unsigned)t]) boost::throw_exception( xml_archive_exception( xml_archive_exception::xml_archive_tag_name_error ) ); } };
//... //========================================================================
A static-const table would have to be created for each version of XML_name, but (in the code later in the file) you only use a "const char" version of the template. So why bother making this a template at all? Believe it or not, Boost code doesn't _have_ to be made of templates.
When I wrote this I didn't know that it woudl be used for only one kind of character.
* Why do you use "const char" instead of just "char" in your instantiations?
off hand I don't remember. But it might ot might not have had something to do with what happens when one passes a const argument to an argument taking a non-const. This will generate a compile error for class types. I don't remember if compilers complain for primitive types or not.
* result_type doesn't match what operator() returns.
It seems that that is vestige of some earlier incarnation.
* If the table is just 0's and 1's, why not just use "bool"? Is it because "bool" can be bigger than a byte?
yes
* The code assumes that CharType is compatible with "unsigned". What if it isn't? This suggests to be that you should make a non-template function that optimizes the code for "char" and make the class template call that function (with appropriate conversions).
* The code assumes that CharType's code points are ASCII compatible.
an oversight on my part.
* XML is supposed to be Unicode-friendly. What about non-ASCII characters that could be used in XML names?
The code assumes that CharType can be converted to an unsigned integer. This is true for the instantiation used.
* As another poster said, what's wrong with using functions like "isalnum"? If you're afraid it may include non-ASCII letters and numbers, then just use a giant switch statement. Hopefully compiler makers haven't forgotten how to optimize those (usually into a table).
isalnum didn't include certain punctuation that is legal in xml tags so it whould been isalmum || ... . In fact, what is really needed is is_xxml_tag_char - so I made this.
* Why do bad values outside the range simply return while ones inside the range throw?
I don't remember why I did this now. There is a whole layer going on here that isn't really apparent. When an XML archive is read xml tags are converted to string variables for internal usage withing the archive implementation. This presents a problem. There are currently two cases xml_?archive for normal 8 bit characters and _w?archive for wide characters. xml_archive just reads in the characters into the string. Whether the xml is in ascii, the local multi-byte character set or utf8 is not taken into account. That is the codecvt facet doesn't do anything. xml_warchive presumes that the xml is coded in utf16. variables to be placed in strings - such as xml tags are tranlated into multi-byte characters in the current locale. Then they are checked here for invalid xml_characters. So what's really missing here is a good ix_xml_tag function. Given the concerns raised in this email, it would seem that it should consider the posibility of EBCDIC characters as well. Should anyone have an implementation of such a function I would be pleased to consider it. Robert Ramey