
Anthony Williams wrote:
Assume I know the encoding and character type I wish to use as input. In order to specialize converter<> for my string type, I need to know what encoding and character type the library is using. If the encoding and character type are not specified in the API, but are instead open to the whims of the backend, I cannot write my conversion code.
Ah, I think I understand what you mean by 'character type'. Yes, you are right. The code as I posted it to the vault is missing the bits that enable users to write converters without knowing backend-specific details. However, some 'dom::char_trait' should be enough, right?
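To make that concrete, here is a rough sketch of what I have in mind; none of these names exist in the vault yet, and the UTF-8 choice merely mirrors what libxml2 does internally:

    namespace dom
    {
      // Hypothetical: the encodings a backend could store text in.
      enum encoding { utf8, utf16, utf32 };

      // Hypothetical trait describing the backend's internal
      // character type and storage encoding, so users can write
      // converters without knowing backend-specific details.
      struct char_trait
      {
        typedef unsigned char char_type;       // e.g. libxml2's xmlChar
        static encoding const storage = utf8;  // compile-time constant
      };

      // Users specialize this for their own string type.
      template <typename String> struct converter;
    }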
I would suggest that the API accepts input in UTF-8, UTF-16 and UTF-32. The user then has to supply a conversion function from their encoding to one of these, and the library converts internally if the one they choose is not the "correct" one.
It already does. libxml2 provides conversion functions; I need to hook them into such an 'xml char trait'.
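For reference, the relevant libxml2 entry point (from libxml/encoding.h) looks like this; the missing piece is wrapping it into the trait:

    #include <libxml/encoding.h>

    // libxml2 stores text as UTF-8 internally; a handler's input
    // function converts from the named encoding to UTF-8, and its
    // output function converts back.
    xmlCharEncodingHandlerPtr handler =
        xmlFindCharEncodingHandler("ISO-8859-1");
    // handler->input / handler->output are the conversion callbacks
    // an 'xml char trait' would need to expose.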
I don't understand how your response ties in with my comment, so I'll try again.
I was suggesting that we have overloads like:
    node::append_element(utf8_string_type);
    node::append_element(utf16_string_type);
    node::append_element(utf32_string_type);
With two of them (but unspecified which two) converting to the correct internal encoding.
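That is, roughly the following; the string typedefs and conversion helpers are only illustrative, and which encoding is native is deliberately left unspecified:

    class node
    {
    public:
      // Suppose UTF-8 happened to be the internal encoding; then
      // this overload would store directly...
      void append_element(utf8_string_type const &name);
      // ...and the other two would convert first, then delegate.
      void append_element(utf16_string_type const &name)
      { append_element(utf16_to_utf8(name)); }
      void append_element(utf32_string_type const &name)
      { append_element(utf32_to_utf8(name)); }
    };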
Oh, but that multiplies quite a chunk of the API by four! Typically, a unicode library provides converter functions, so what advantage would such a rich interface have over asking the user to do the conversion before calling into the xml library? If the internal storage encoding is a compile-time constant that can be queried from the proposed dom::char_trait, it should be simple for users to decide how to write the converter, and in particular how to pass strings in the most efficient way. [...]
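For instance, assuming the char_trait sketched above publishes its storage encoding as a compile-time constant, a user could select the conversion path statically; user_string, to_utf8, and to_utf16 stand in for whatever the user's unicode library provides:

    #include <string>

    // User-side sketch: one specialization per storage encoding,
    // selected at compile time via dom::char_trait::storage.
    template <dom::encoding E> struct make_internal;
    template <> struct make_internal<dom::utf8>
    {
      static std::string apply(user_string const &s)
      { return to_utf8(s); }
    };
    template <> struct make_internal<dom::utf16>
    {
      static std::string apply(user_string const &s)
      { return to_utf16(s); }
    };

    // Convert once, as cheaply as the storage encoding allows:
    std::string bytes = make_internal<dom::char_trait::storage>::apply(s);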
Imagine, for example, a web browser or XML editor. The XML comes in as a byte stream with an encoding tag such as a charset field (if you're lucky). You then have to read this and convert it from whatever encoding is specified to the DOM library's internal encoding, do some processing, and then output to the screen in the user's chosen encoding.
Right.
If I specify the conversions to use directly on the input and output, then I can cleanly separate my application into three layers --- process input and build the DOM in the internal encoding; process the DOM as necessary; display the result to the user.
If the string type and encoding are inherently part of the DOM types, this is not so simple.
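Concretely, something like the following, where the conversions are chosen per call; the function names are placeholders for the point, not a proposed API:

    // Layer 1: decode the byte stream from whatever charset the
    // input declares, building the DOM in the internal encoding.
    dom::document doc = parse(bytes, convert_from(input_charset));

    // Layer 2: process the DOM purely in the internal encoding.
    transform(doc);

    // Layer 3: render in the user's chosen encoding only at output.
    display(doc, convert_to(user_charset));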
I still don't understand what you have in mind: are you thinking of using two separate unicode libraries / string types for input and output? Again, unicode libraries should provide encoding conversion, if all you want is to use distinct encodings. I may not understand the details well enough, but asking the API to integrate the string conversions, as you seem to be doing, sounds exactly like what you accused me of doing: premature optimization. ;-)
I'm not sure I understand your requirement. Do you really want to plug in multiple unicode libraries / string types? Or do you want to use multiple encodings?
Multiple encodings, generally. However, your converter<> template doesn't allow for that --- it only allows one encoding per string type.
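One way to lift that restriction would be to key the converter on the encoding as well as the string type; this is only a sketch, going beyond anything in the vault:

    #include <string>

    // Sketch: a (string type, encoding) pair selects a conversion,
    // so one string type can carry several external encodings.
    template <typename String, dom::encoding E>
    struct converter
    {
      static std::string in(String const &);   // external -> internal bytes
      static String out(std::string const &);  // internal bytes -> external
    };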
Ah, well, the converter is not even half-finished; in its current form it is tied to the string type. It certainly requires some substantial design work to be of any practical use.

Regards,
Stefan