
Miro Jurisic <macdev@meeroh.org> writes:
> In article <87isg3skr6.fsf@jbms.ath.cx>, Jeremy Maitin-Shepard <jbms@attbi.com> wrote:
> [snip]
> > - For the purpose of string construction, the Unicode specification explicitly states that any sequence of code points is well formed, and so this provides the smallest unit by which guaranteed-well-formed strings can be formed.
> Can you refer me to a specific point in the spec where this is stated?
In Unicode 4.0.1, Section 3.9:

  D30a Well-formed: A Unicode code unit sequence that purports to be in a Unicode encoding form is called well-formed if and only if it does follow the specification of that Unicode encoding form.

  - A Unicode code unit sequence that consists entirely of a sequence of well-formed Unicode code unit sequences (all of the same Unicode encoding form) is itself a well-formed Unicode code unit sequence.

Thus, since any code unit sequence representing a single Unicode scalar value is itself well-formed, any sequence of encoded code points (that is, Unicode scalar values, all in the same encoding form) is well-formed.
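To make the concatenation argument concrete, here is a minimal Python sketch (not part of the original discussion). It assumes that Python's built-in codecs, used with errors="strict", reject exactly the ill-formed code unit sequences, so a successful decode is treated as "well-formed". The helper names is_well_formed and encode_each are hypothetical. The sketch encodes each scalar value on its own, concatenates the pieces, and checks that the result is well-formed. It also shows the two things that break the guarantee: cutting inside one scalar value's encoding, and trying to encode a surrogate code point.

```python
# Sketch only: relies on Python's strict codecs as the well-formedness check.

def is_well_formed(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `data` decodes strictly in `encoding`."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def encode_each(scalars, encoding: str) -> bytes:
    """Hypothetical helper: encode every scalar value separately, then concatenate.

    Endian-specific codecs ("utf-16-le", "utf-32-le") are used so that no
    byte order mark is prepended to each piece.
    """
    return b"".join(chr(cp).encode(encoding) for cp in scalars)


# Includes a BMP character and a supplementary-plane character (U+1F600).
scalars = [0x41, 0xE9, 0x20AC, 0x1F600]

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    joined = encode_each(scalars, enc)
    # Concatenating well-formed pieces yields a well-formed sequence (D30a).
    print(enc, is_well_formed(joined, enc))

# Splitting at code-unit granularity can cut through a scalar value's encoding.
euro = chr(0x20AC).encode("utf-8")  # three UTF-8 code units
print(is_well_formed(euro[:2], "utf-8"))  # False

# Surrogate code points are not scalar values and have no well-formed encoding.
try:
    chr(0xD800).encode("utf-8")
except UnicodeEncodeError:
    print("U+D800 cannot be encoded as well-formed UTF-8")
```

The design point is that the scalar value, not the code unit, is the safe building block. Any concatenation of whole encoded scalar values stays well-formed, while slicing at code-unit boundaries offers no such guarantee.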
[snip]
-- Jeremy Maitin-Shepard