Thank you, Caleb and Hartmut, for your replies. You both seem to think regex is a bad way to go, so let me explain more clearly what I want to write.
I want to write a tool (probably a CLI) that I can point at a large folder full of code and tell it to go and parse it. I will store the results in XML format somewhere; then I can run, say, "<myapp> class someclass" and the program will use its database to find where that class is declared or defined, saving me a headache.
So I thought I could use one of the C++ expat wrappers, and boost regex looked powerful enough to do the parsing if only I were handy enough with regular expression syntax.
Anyway, I don't know if that better explanation will make any difference to your recommendations. I look forward to reading your opinions.
Oh, and the example I looked at is here, Hartmut: http://boost.org/libs/regex/example/snippets/regex_search_example.cpp - that is what got me thinking I might actually be able to take on this challenge.
It depends on what you want to do. If you want to use a "real" C++ parser, then you will also have to preprocess the code (including the includes) before parsing it. In theory this gives you a "perfect" result, but only if you know what include paths to use and what predefined macros should be set (think about conditional code blocks). Regexes, on the other hand, don't require you to preprocess the code, but can get confused by macros and the like. So you have to choose the way that best meets your expectations, and live with the defects either way ;-)
To solve your problem, BTW, why not scan through the file for line starts (keeping count, obviously!), and at each line start see whether it's also the start of the regex you are interested in (one that matches a class definition, for example)? If you do this, don't forget to either prefix your expression with \A or pass the match_continuous flag to regex_search. Either will anchor the search at the start of the line you are checking and prevent the whole text from being searched.
John.
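For what it's worth, the line-by-line scan described above might look roughly like the sketch below. It is only an illustration under a few assumptions: it uses std::regex as a stand-in for Boost.Regex (both provide regex_search and a match_continuous flag), it takes the match_continuous route rather than \A, the function name find_class_declarations is made up, and the pattern is deliberately naive (it will also pick up forward declarations and knows nothing about macros or comments).

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (1-based line number, name) pairs for every
// line that starts with something looking like a class/struct declaration.
std::vector<std::pair<std::size_t, std::string>>
find_class_declarations(const std::string& text)
{
    // Illustrative pattern only: optional indentation, "class" or "struct",
    // then an identifier. Real code needs something more careful.
    static const std::regex decl(
        R"([ \t]*(?:class|struct)[ \t]+([A-Za-z_][A-Za-z0-9_]*))");

    std::vector<std::pair<std::size_t, std::string>> hits;
    std::size_t line = 1;
    auto pos = text.cbegin();
    const auto end = text.cend();

    while (pos != end) {
        std::smatch m;
        // match_continuous: the match must begin exactly at pos (the start
        // of the current line), so the rest of the text is not searched.
        if (std::regex_search(pos, end, m, decl,
                              std::regex_constants::match_continuous)) {
            hits.emplace_back(line, m[1].str());
        }

        // Move to the start of the next line, keeping count as we go.
        pos = std::find(pos, end, '\n');
        if (pos != end) {
            ++pos;
            ++line;
        }
    }
    return hits;
}
```

Calling find_class_declarations on the contents of a file gives back the line numbers and names that could then be written out to the XML database. With Boost.Regex the shape should be much the same, swapping in the boost:: types and boost::match_continuous.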