
On Sat Sep 01 2007, Chris Lattner <clattner-AT-apple.com> wrote:
for example, and efficient buffer management (at least in our context) means that the input to the lexer isn't useful as an iterator interface.
Well, the kind of input sequence is exactly one thing I would templatize.
To what benefit?
So people don't have to pay the price of copying their sequence into a null-terminated memory buffer.
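As a rough illustration of what templatizing on the input sequence could look like, here is a minimal sketch. The name `scanIdentifier` and its ASCII-only identifier rule are assumptions made for this example; nothing here is taken from clang or from any Boost library.

```cpp
#include <cassert>
#include <cctype>
#include <list>

// Hypothetical generic scanner: advances past identifier characters in any
// forward range [first, last). No contiguous or nul-terminated buffer is
// assumed, so the caller never has to copy the input.
template <typename Iter>
Iter scanIdentifier(Iter first, Iter last) {
    while (first != last &&
           (std::isalnum(static_cast<unsigned char>(*first)) || *first == '_'))
        ++first;
    return first;
}

// Usage sketch: the same function runs directly on a non-contiguous container.
inline bool exampleOnList() {
    std::list<char> chars = {'f', 'o', 'o', '+', '1'};
    auto it = scanIdentifier(chars.begin(), chars.end());
    assert(it != chars.end());  // the scan stopped before the end of the input
    return *it == '+';
}
```

The cost of this generality is the `first != last` test on every step, which is the check a nul-terminated buffer makes unnecessary.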
In practice, clang requires its input to come from a nul-terminated memory buffer (yes, we do correctly handle embedded nuls in the input buffer as whitespace). Here are the pros and cons:
Pros: clang is designed for what we perceive to be the common case. In particular, mmap'ing in files almost always implicitly nul-terminates the buffer (if a file's size is not an exact multiple of the page size, most major OSes zero-fill to the end of the page), so we get this invariant for free in most cases. Memory buffers and many others are also easy to handle in this scheme.
Further, knowing that we have a sequential memory buffer as an input makes various optimizations really trivial: for example, our block comment skipper is vectorized on hosts that support SSE or Altivec. Having the nul terminator at the end of the file means that the lexer doesn't have to check for an "end of buffer" condition in *many* highly performance-sensitive lexing loops (e.g. lexing identifiers, which cannot have a nul in them).
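The sentinel idea can be sketched as follows. This is a minimal illustration, not clang's actual lexer code; the function names and the ASCII-only identifier rule are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical helper: true for ASCII identifier characters.
// A nul byte is never an identifier character.
static bool isIdentifierBody(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// Sentinel version. Precondition (assumed): the buffer is nul-terminated.
// The terminator fails isIdentifierBody, so the loop stops by itself and
// no separate end-of-buffer test is needed.
const char *skipIdentifierSentinel(const char *p) {
    while (isIdentifierBody(static_cast<unsigned char>(*p)))
        ++p;
    return p;
}

// Bounds-checked version for comparison: one extra test per character.
const char *skipIdentifierChecked(const char *p, const char *end) {
    assert(p <= end);
    while (p != end && isIdentifierBody(static_cast<unsigned char>(*p)))
        ++p;
    return p;
}
```

Both functions return the same position on a nul-terminated buffer; the sentinel version simply drops the `p != end` comparison from the inner loop.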
The ability to provide specialized algorithm implementations that take advantage of special knowledge of the data structure is a strength of generic programming.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com