The traits mechanism in Xpressive appears to be geared toward individual characters, so I assume Xpressive is not directly usable with UTF-8 encoded text. Is that correct? It might work to make the character type a 32-bit integer and use iterator adapters that expose the byte sequence as UCS-4 code points (the sequence is, after all, just an encoding of them). But that leads to my next question: diacritics. For example, é in decomposed Unicode is two code points (e followed by a combining acute accent, U+0301), so even when the sequence is iterated as UCS-4 code points, a regex of “.” will match only the e, not the whole rendered character.

Since I could not find any discussion of this while searching for Xpressive, I am curious whether any thought has gone into these issues.
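
To make the concern concrete, here is a minimal sketch in plain C++ with no Xpressive involved. The helper names `decode_utf8`, `is_combining_mark` and `count_clusters` are my own, hypothetical ones, not part of any library. The decoder assumes well-formed input, and the combining-mark test only covers the basic Combining Diacritical Marks block (U+0300 to U+036F), so this is an illustration of the problem rather than a real grapheme segmenter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UCS-4 code points.
// Sketch only: assumes well-formed input; it does not reject overlong
// forms or surrogates, and an invalid lead byte becomes U+FFFD.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;      extra = 0; }
        ++i;
        // Fold in the low 6 bits of each continuation byte.
        for (; extra > 0 && i < bytes.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Assumption: only U+0300..U+036F is treated as "combining" here.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Count rendered characters naively: every code point that is not a
// combining mark starts a new one (a leading mark also counts as one).
std::size_t count_clusters(const std::u32string& cps) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        if (i == 0 || !is_combining_mark(cps[i])) ++n;
    }
    return n;
}

// Usage: the decomposed and precomposed spellings of é.
void example() {
    const std::u32string decomposed  = decode_utf8("e\xCC\x81");  // e + U+0301
    const std::u32string precomposed = decode_utf8("\xC3\xA9");   // U+00E9
    assert(decomposed.size() == 2);   // two code points for one rendered character
    assert(precomposed.size() == 1);
    assert(count_clusters(decomposed) == count_clusters(precomposed));
}
```

A matcher whose “.” consumes exactly one element of the decoded sequence would therefore stop after the e in the decomposed form. Matching the rendered character would need either normalization beforehand or a notion of clusters along the lines of `count_clusters`.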