boost::program_options INI file parsing

The INI file parser module of program_options needs to be changed. For some reason, the comment delimiter has been changed from ';' to '#'. The bigger problem is that this has been taken beyond the first character of the line, and there is no escaping ability!

The more common INI file comment behavior is:
  '#' is a comment ONLY if it is the first character of the line
  ';' is a comment from that point of the line on, unless preceded by a '\' (or more accurately, an odd number of '\' characters, since you could theoretically do \\\;, to translate to \; when made a string).

I have written my own INI parser for my own purposes, but it is not associated with program_options, which makes its usefulness in that context limited (without a lot of extra coding, etc).

You can view the source here:
http://www.neuromancy.net/viewcvs/Mantra-I/src/file/inifile.cpp?root=mantra&rev=1.10&view=auto

My own implementation does differ from the standard INI file parsing convention in allowing line continuation, i.e. allowing:
  my_option = some text \
  some more text
and it is interpreted as 'some text some more text'.

One big selling point of my implementation is that it uses boost::spirit (the lexical parser) to do the parsing - so the logic for parsing is encoded in the lexical parsing language, not in code. Theoretically, this means you should be able to use my implementation directly, and adapt the code from IniFile::Load() that uses it for program_options relatively easily, and not have to worry about the parsing logic itself.

Otherwise (if you don't want to use a lexical-parser based implementation, or my version specifically), could we please stick to the standards, and allow '#' as a comment ONLY from the beginning of the line, and then ';' from any point in the line (as '#' is treated in the current version), with the 'escaping' rules.

Thanks,
--
PreZ :)
Founder. The Neuromancy Society (http://www.neuromancy.net)
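The comment rules proposed in the message above can be sketched in a few lines of standard C++. This is only an illustration of those rules, not the behavior of program_options; the function name `strip_comment` is hypothetical, and the sketch assumes that escape sequences are left in place for a later unescaping pass.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper illustrating the proposed rules:
//   - '#' starts a comment only when it is the first character of the line;
//   - ';' starts a comment anywhere, unless it is preceded by an odd
//     number of '\' characters.
// Backslashes are not removed here; unescaping would be a separate step.
std::string strip_comment(const std::string& line)
{
    if (!line.empty() && line[0] == '#')
        return std::string();          // the whole line is a comment

    std::size_t backslashes = 0;       // length of the current run of '\'
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '\\') {
            ++backslashes;
            continue;
        }
        if (line[i] == ';' && backslashes % 2 == 0)
            return line.substr(0, i);  // unescaped ';' ends the value
        backslashes = 0;               // any other character ends the run
    }
    return line;
}
```

With this sketch, a '#' in the middle of a line is kept as part of the value, and `\;` keeps the semicolon while `\\;` starts a comment.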

Preston A. Elder wrote:
The INI file parser module of program_options needs to be changed. For some reason, the comment delimiter has been changed from ';' to '#'. The bigger problem is this has been taken beyond the first character of the line and there is no escaping ability!
The more common INI file comment behavior is: '#' is a comment ONLY if it is the first character of the line; ';' is a comment from that point of the line on, unless preceded by a '\' (or more accurately, an odd number of '\' characters, since you could theoretically do \\\;, to translate to \; when made a string).
This is the first time I have heard of ';' being used as a comment delimiter. Can you point me to some docs that document the INI format as described above? Note also that I never said that the format of config files is exactly the one used by Windows .ini files.
I have written my own INI parser for my own purposes, but it is not associated with program_options, which makes its usefulness in that context limited (without a lot of extra coding, etc).
You can view the source here:
http://www.neuromancy.net/viewcvs/Mantra-I/src/file/inifile.cpp?root=mantra&rev=1.10&view=auto
I'll take a look.
My own implementation does differ from the standard INI file parsing convention in allowing line continuation, i.e. allowing:
  my_option = some text \
  some more text
and it is interpreted as 'some text some more text'.
I intend to add this functionality too.
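As a rough illustration of the line continuation being discussed, here is a minimal sketch in standard C++. The name `join_continuations` is hypothetical, and two details are assumptions rather than anything stated in the thread: leading whitespace on a continued line is dropped (so the example yields 'some text some more text'), and a trailing backslash is always treated as a continuation, with no escaping.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins physical lines that end in '\' with the
// line that follows, returning the resulting logical lines.
std::vector<std::string> join_continuations(const std::string& text)
{
    std::vector<std::string> logical_lines;
    std::istringstream in(text);
    std::string physical, logical;
    bool continuing = false;  // true while inside a continued line

    while (std::getline(in, physical)) {
        if (continuing)       // assumption: drop indentation of continued text
            physical.erase(0, physical.find_first_not_of(" \t"));

        if (!physical.empty() && physical.back() == '\\') {
            physical.pop_back();          // remove the continuation marker
            logical += physical;
            continuing = true;
        } else {
            logical += physical;
            logical_lines.push_back(logical);
            logical.clear();
            continuing = false;
        }
    }
    if (continuing)           // input ended right after a trailing '\'
        logical_lines.push_back(logical);
    return logical_lines;
}
```

Joining lines before any comment or key/value handling keeps the rest of a parser line-oriented, which is one reason this step is usually done first.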
One big selling point of my implementation is that it uses boost::spirit (the lexical parser) to do the parsing - so the logic for parsing is encoded in the lexical parsing language, not in code. Theoretically, this means you should be able to use my implementation directly, and adapt the code from IniFile::Load() that uses it for program_options relatively easily, and not have to worry about the parsing logic itself.
Sorry, this is not a selling point, it's a drawback. For a simple task like parsing ini files, Spirit is overkill. To be specific:
1. Spirit does not work with borland, while program_options does. I don't want to do as serialization does, and require an older Spirit for borland -- that's a maintenance nightmare.
2. IIRC, there were reports of Spirit problems on darwin and on VC 8.0, and I don't want to be affected by those.
Otherwise (if you don't want to use a lexical-parser based implementation, or my version specifically), could we please stick to the standards,
I'd be happy to stick to standards provided you give a URL to those standards ;-)
- Volodya

On Sat, 14 May 2005 09:50:58 +0400, Vladimir Prus wrote:
Sorry, this is not a selling point, it's a drawback. For a simple task like parsing ini files, Spirit is overkill. To be specific:
1. Spirit does not work with borland, while program_options does. I don't want to do as serialization does, and require an older Spirit for borland -- that's a maintenance nightmare.
2. IIRC, there were reports of Spirit problems on darwin and on VC 8.0, and I don't want to be affected by those.
Fair enough :)
To be completely honest, my INI parser also gave me a chance to learn how to use a lexical parser too :P
I'd be happy to stick to standards provided you give an URL to those standards ;-)
The quickest google search ('INI file format') showed up:
http://cloanto.com/specs/ini.html
http://www.lisp-p.org/pil/
These are not authoritative (it's difficult to find authoritative sources since most search hits are talking about the syntax of a specific INI file). They do illustrate what I was saying (though they exclude '# comments only on its own line', but this could be an innovation of unix people using INI files). As anecdotal evidence, all microsoft INI files follow this scheme, and the vi syntax highlighting also recognizes this scheme ;)
In any case, whether '#' is or is not recognized as a comment, it's certainly never recognized as a comment mid-way through a line in an INI file, and a ';' is always recognized as a comment in an INI file.
--
PreZ :)
Founder. The Neuromancy Society (http://www.neuromancy.net)

Vladimir Prus wrote:
2. IIRC, there were reports of Spirit problems on darwin and on VC 8.0, and I don't want to be affected by those.
Which problems? Links please?
Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net

Joel de Guzman wrote:
2. IIRC, there were reports of Spirit problems on darwin and on VC 8.0, and I don't want to be affected by those.
Which problems? Links please?
As the latest regression tests show (http://tinyurl.com/dxdb3), these are fixed now.
Regards
Hartmut

Joel de Guzman wrote:
Vladimir Prus wrote:
2. IIRC, there were reports of Spirit problems on darwin and on VC 8.0, and I don't want to be affected by those.
Which problems? Links please?
http://thread.gmane.org/gmane.comp.lib.boost.devel/115200
http://thread.gmane.org/gmane.comp.lib.boost.testing/1024
The latter is fixed, according to Hartmut. Not sure about the first.
- Volodya

Preston A. Elder wrote:
The INI file parser module of program_options needs to be changed. For
The more common INI file comment behavior is: '#' is a comment ONLY if it is the first character of the line; ';' is a comment from that point of the line on, unless preceded by a '\' (or more accurately, an odd number of '\' characters, since you could theoretically do \\\;, to translate to \; when made a string).
These rules would misparse many INI files. On DOS and Windows, backslashes are used in file paths, and semicolons are commonly used as separators in path and file lists. Such lists are often stored as value strings. The box character (#) is generally not used to start comments in Windows ini files, though certainly some programs allow it.
Vladimir Prus wrote:
This is the first time I have heard of ';' being used as a comment delimiter. Can you point me to some docs that document the INI format as described above? Note also that I never said that the format of config files is exactly the one used by Windows .ini files.
From Windows 3.1 Resource Kit WIN.INI Section Settings (KB item 83433)
http://support.microsoft.com/default.aspx?scid=kb;en-us;83433

Format of the WIN.INI File
--------------------------

The WIN.INI file contains several sections, each of which consists of a group of related settings. The sections and settings are listed in the WIN.INI file in the following format:

  [section name]
  keyname=value

In this example, [section name] is the name of a section. The enclosing brackets ([]) are required, and the left bracket must be in the leftmost column on the screen.

The keyname=value statement defines the value of each setting. A keyname is the name of a setting. It can consist of any combination of letters and digits, and must be followed immediately by an equal sign (=). The value can be an integer, a string, or a quoted string, depending on the setting.

You can include comments in initialization files. You must begin each line of a comment with a semicolon (;).

=== end quote ====================================

Actually one shouldn't even try to write a parser on Windows; the API functions Get/SetPrivateProfileString are the preferred way of interacting with INI files. (And the Registry is of course preferred over ini files since win 95.) The docs for GetPrivateProfileStringA state that any byte values >= hex 20 are valid in keyname and value strings.

If program_options chooses to support Win32 INI files, it should IMO not try to parse the value strings; the programmer has to decide what to do with them. She may want to unescape characters in some way, or to discard everything following a ; or # or C.

rasmus
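A minimal sketch of a reader for the format quoted above, in standard C++. It is an assumption-laden illustration, not an emulation of GetPrivateProfileString and not program_options code: the names `IniData` and `parse_ini` are hypothetical, only lines that begin with ';' are treated as comments, and value strings are stored verbatim so that the caller decides how to interpret them.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// section name -> (keyname -> raw value string)
using IniData = std::map<std::string, std::map<std::string, std::string>>;

// Hypothetical reader for "[section name]" / "keyname=value" lines.
IniData parse_ini(const std::string& text)
{
    IniData data;
    std::string section;                  // keys before any header use ""
    std::istringstream in(text);
    std::string line;

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();              // tolerate CRLF line endings
        if (line.empty() || line[0] == ';')
            continue;                     // blank line or comment line
        if (line[0] == '[') {             // bracket in the leftmost column
            std::size_t close = line.find(']');
            if (close != std::string::npos)
                section = line.substr(1, close - 1);
            continue;
        }
        std::size_t eq = line.find('=');
        if (eq == std::string::npos)
            continue;                     // not a keyname=value line
        // The value is kept as-is: no unescaping, no mid-line comments.
        data[section][line.substr(0, eq)] = line.substr(eq + 1);
    }
    return data;
}
```

Because the value is untouched, a Windows-style list such as `search=C:\bin;D:\tools` survives intact, which is the case the preceding message warns about.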

Actually one shouldn't even try to write a parser on Windows, the API functions Get/SetPrivateProfileString are the preferred way of interacting with INI files. (And the Registry is of course preferred over ini files since win 95.)
I think that they officially deprecate the registry now. I believe that .Net pushes people toward config files in the APPDATA or the COMMON_APPDATA location as returned by SHGetFolderPath. These usually resolve to the "c:/Documents and Settings/username/Application Data" and "c:/Documents and Settings/All Users/Application Data" directories respectively.
I don't know if there is any way to abstract these locations in the program_options library. It would be kind of a nice insulation layer. Something like:
  boost::filesystem::path get_user_config_path();
  boost::filesystem::path get_system_config_path();
And make them call SHGetFolderPath on Windows or return "~/" or "/etc/" on posix systems. I can't speak for other kinds of systems though. Anyway, just a thought.
Jason
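The two functions suggested above might look roughly like the following sketch. It departs from the proposal in ways that are assumptions made only to keep the example free of Boost and Windows headers: it returns std::string rather than boost::filesystem::path, and on Windows it reads the APPDATA and PROGRAMDATA environment variables as a stand-in for SHGetFolderPath. The helper `env_or_empty` is hypothetical.

```cpp
#include <cstdlib>
#include <string>

namespace {
// Hypothetical helper: value of an environment variable, or "" if unset.
std::string env_or_empty(const char* name)
{
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}
}  // namespace

// Per-user configuration directory.
std::string get_user_config_path()
{
#ifdef _WIN32
    return env_or_empty("APPDATA");      // stand-in for SHGetFolderPath
#else
    return env_or_empty("HOME");         // the "~/" of the proposal
#endif
}

// Machine-wide configuration directory.
std::string get_system_config_path()
{
#ifdef _WIN32
    return env_or_empty("PROGRAMDATA");  // stand-in for SHGetFolderPath
#else
    return "/etc";
#endif
}
```

An empty result means the location could not be determined, so a caller would need to fall back to some default of its own.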
participants (6)
- Hartmut Kaiser
- Jason Stewart
- Joel de Guzman
- Preston A. Elder
- re
- Vladimir Prus