
Anthony Williams wrote:
Assume I know the encoding and character type I wish to use as input. In order to specialize converter<> for my string type, I need to know what encoding and character type the library is using. If the encoding and character type are not specified in the API, but are instead open to the whims of the backend, I cannot write my conversion code.
Ah, I think I understand what you mean by 'character type'. Yes, you are right. The code as I posted it to the vault is missing the bits that enable users to write converters without knowing backend-specific details. However, some 'dom::char_trait' should be enough, right?
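To make that concrete, here is a rough sketch of what I have in mind; none of these names exist in the vault yet, and the UTF-8 choice merely mirrors what libxml2 does internally:

    namespace dom
    {
      // Hypothetical: the encodings a backend could store text in.
      enum encoding { utf8, utf16, utf32 };

      // Hypothetical trait describing the backend's internal
      // character type and storage encoding, so users can write
      // converters without knowing backend-specific details.
      struct char_trait
      {
        typedef unsigned char char_type;       // e.g. libxml2's xmlChar
        static encoding const storage = utf8;  // compile-time constant
      };

      // Users specialize this for their own string type.
      template <typename String> struct converter;
    }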
I would suggest that the API accepts input in UTF-8, UTF-16 and UTF-32. The user then has to supply a conversion function from their encoding to one of these, and the library converts internally if the one they choose is not the "correct" one.
It already does. libxml2 provides conversion functions; I need to hook them into such an 'xml char trait'.
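For reference, the relevant libxml2 entry point (from libxml/encoding.h) looks like this; the missing piece is wrapping it into the trait:

    #include <libxml/encoding.h>

    // libxml2 stores text as UTF-8 internally; a handler's input
    // function converts from the named encoding to UTF-8, and its
    // output function converts back.
    xmlCharEncodingHandlerPtr handler =
        xmlFindCharEncodingHandler("ISO-8859-1");
    // handler->input / handler->output are the conversion callbacks
    // an 'xml char trait' would need to expose.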
I don't understand how your response ties in with my comment, so I'll try again.
I was suggesting that we have overloads like:
    node::append_element(utf8_string_type);
    node::append_element(utf16_string_type);
    node::append_element(utf32_string_type);
With two of them (but unspecified which two) converting to the correct internal encoding.
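That is, roughly the following; the string typedefs and conversion helpers are only illustrative, and which encoding is native is deliberately left unspecified:

    class node
    {
    public:
      // Suppose UTF-8 happened to be the internal encoding; then
      // this overload would store directly...
      void append_element(utf8_string_type const &name);
      // ...and the other two would convert first, then delegate.
      void append_element(utf16_string_type const &name)
      { append_element(utf16_to_utf8(name)); }
      void append_element(utf32_string_type const &name)
      { append_element(utf32_to_utf8(name)); }
    };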
Oh, but that multiplies quite a chunk of the API by four! Typically, a unicode library provides converter functions, so what advantage would such a rich interface have over asking the user to do the conversion before calling into the xml library? If the internal storage encoding is a compile-time constant that can be queried from the proposed dom::char_trait, it should be simple for users to decide how to write the converter, and in particular how to pass strings in the most efficient way. [...]
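For instance, assuming the char_trait sketched above publishes its storage encoding as a compile-time constant, a user could select the conversion path statically; user_string, to_utf8, and to_utf16 stand in for whatever the user's unicode library provides:

    #include <string>

    // User-side sketch: one specialization per storage encoding,
    // selected at compile time via dom::char_trait::storage.
    template <dom::encoding E> struct make_internal;
    template <> struct make_internal<dom::utf8>
    {
      static std::string apply(user_string const &s)
      { return to_utf8(s); }
    };
    template <> struct make_internal<dom::utf16>
    {
      static std::string apply(user_string const &s)
      { return to_utf16(s); }
    };

    // Convert once, as cheaply as the storage encoding allows:
    std::string bytes = make_internal<dom::char_trait::storage>::apply(s);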
Imagine, for example, a web browser or XML editor. The XML comes in as a byte stream with an encoding tag such as a charset field (if you're lucky). You then have to read this and convert it from whatever encoding is specified to the DOM library's internal encoding, do some processing, and then output to the screen in the user's chosen encoding.
Right.
If I specify the conversions to use directly on the input and output, then I can cleanly separate my application into three layers --- process input and build the DOM in the internal encoding; process the DOM as necessary; display the result to the user.
If the string type and encoding are inherently part of the DOM types, this is not so simple.
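Concretely, something like the following, where the conversions are chosen per call; the function names are placeholders for the point, not a proposed API:

    // Layer 1: decode the byte stream from whatever charset the
    // input declares, building the DOM in the internal encoding.
    dom::document doc = parse(bytes, convert_from(input_charset));

    // Layer 2: process the DOM purely in the internal encoding.
    transform(doc);

    // Layer 3: render in the user's chosen encoding only at output.
    display(doc, convert_to(user_charset));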
I still don't understand what you have in mind: are you thinking of using two separate unicode libraries / string types for input and output? Again, unicode libraries should provide encoding conversion, if all you want is to use distinct encodings. I may not understand the details well enough, but asking the API to integrate the string conversions, as you seem to be doing, sounds exactly like what you accused me of doing: premature optimization. ;-)
I'm not sure I understand your requirement. Do you really want to plug in multiple unicode libraries / string types? Or do you want to use multiple encodings?
Multiple encodings, generally. However, your converter<> template doesn't allow for that --- it only allows one encoding per string type.
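One way to lift that restriction would be to key the converter on the encoding as well as the string type; this is only a sketch, going beyond anything in the vault:

    #include <string>

    // Sketch: a (string type, encoding) pair selects a conversion,
    // so one string type can carry several external encodings.
    template <typename String, dom::encoding E>
    struct converter
    {
      static std::string in(String const &);   // external -> internal bytes
      static String out(std::string const &);  // internal bytes -> external
    };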
Ah, well, the converter is not even half-finished; in its current form it is tied to the string type. It certainly requires some substantial design work to be of any practical use.

Regards,
Stefan