Re: [boost] [RFC] Preferred API for a CGI library proposal

Darren Garvey wrote:
This is an initial probe for ideas about an API for [any] upcoming CGI library submission.
Hi Darren,

I have some GPL code that does this sort of thing; you're welcome to look at it, and I'm not fussy about the license if you want to re-use any of it for a Boost submission. This code has evolved to meet my needs, and isn't the sort of thing that you would write if you were starting from scratch. But do have a look at:

http://svn.chezphil.org/libpbe/trunk/include/CgiParams.hh
http://svn.chezphil.org/libpbe/trunk/src/CgiParams.cc

This declares CgiParams: public map<string,string> which seems to me to be the best interface to present for accessing the CGI data. It can initialise itself from GET and POST data, including the MIME-like format used for file uploads. There is also

http://svn.chezphil.org/libpbe/trunk/include/CgiVars.hh
http://svn.chezphil.org/libpbe/trunk/src/CgiVars.cc

which provide access to all the non-parameter variables that the CGI spec defines.

There is another version of some of this that I wrote for another project in

http://svn.chezphil.org/anyterm/trunk/common/CgiParams.hh
http://svn.chezphil.org/anyterm/trunk/common/UrlEncodedCgiParams.cc

I think this is more "boostified"! For example, it has a get_as<T>(name) method that will lexical_cast the parameter to the required type.

I have also used the Apache module API, and have written standalone HTTP servers. If I was doing all this from scratch I'd try to do something that would be equally applicable in any of these situations (and also things like FCGI as someone else has suggested). I.e. you want to define an 'HTTP Request' object, which has ways of accessing the form data, not an explicitly 'CGI' object. (There is actually an HttpRequest class in http://svn.chezphil.org/libpbe/trunk/include/HttpRequest.hh, but I haven't used it in combination with form data. There is also a Spirit parser for HTTP requests in http://svn.chezphil.org/libpbe/trunk/src/Request.cc.)
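To make the get_as<T>(name) idea concrete, here is a minimal sketch of a map-derived parameter class with a typed accessor. It is not the code from the repositories linked above: the class body is a hypothetical stand-in, and the conversion uses std::istringstream in place of boost::lexical_cast so that the example has no Boost dependency.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a CgiParams-style class (not the libpbe code).
class CgiParams : public std::map<std::string, std::string> {
public:
    // Convert the named parameter to T.
    // Throws std::out_of_range if the name is absent and
    // std::invalid_argument if the value does not convert cleanly.
    template <typename T>
    T get_as(const std::string& name) const {
        const_iterator it = find(name);
        if (it == end())
            throw std::out_of_range("no such CGI parameter: " + name);
        std::istringstream in(it->second);
        T value;
        // After the value, only whitespace may remain; "42abc" is rejected.
        if (!(in >> value) || !(in >> std::ws).eof())
            throw std::invalid_argument("bad value for CGI parameter: " + name);
        return value;
    }
};
```

With this sketch, params.get_as<int>("count") yields an int for "42" and throws for "42abc". Because stream extraction stops at whitespace, string values containing spaces are better read through the map interface directly.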
Of particular interest: *should GET/POST variables be parsed by default?
So the issue is can you be more efficient in the case when the variable is not used by not parsing it? Well, if you're concerned about efficiency in that case then you would be better off not sending the thing in the first place. So I suggest parsing everything immediately, or at least on the first use of any variable.
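The "parse on the first use of any variable" option could look roughly like the sketch below. The class name LazyParams and its members are hypothetical, and percent-decoding is deliberately left out to keep the example short.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical class: keeps the raw query string and splits it into
// name/value pairs only once, on the first lookup.
class LazyParams {
public:
    explicit LazyParams(const std::string& raw) : raw_(raw), parsed_(false) {}

    // Return the value for name, or an empty string if it is absent.
    std::string get(const std::string& name) {
        parse_if_needed();
        std::map<std::string, std::string>::const_iterator it = params_.find(name);
        return it == params_.end() ? std::string() : it->second;
    }

    // True once the raw string has been split.
    bool parsed() const { return parsed_; }

private:
    void parse_if_needed() {
        if (parsed_) return;
        parsed_ = true;
        std::size_t pos = 0;
        while (pos <= raw_.size()) {
            std::size_t amp = raw_.find('&', pos);
            if (amp == std::string::npos) amp = raw_.size();
            std::string pair = raw_.substr(pos, amp - pos);
            if (!pair.empty()) {
                std::size_t eq = pair.find('=');
                if (eq == std::string::npos)
                    params_[pair] = "";  // bare name with no '='
                else
                    params_[pair.substr(0, eq)] = pair.substr(eq + 1);
            }
            pos = amp + 1;
        }
    }

    std::string raw_;
    bool parsed_;
    std::map<std::string, std::string> params_;
};
```

Constructing the object costs only a string copy; the splitting work happens on the first call to get() and is not repeated afterwards.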
*how should GET/POST variables be accessed?
As a map<string,string>, or similar.
*should cookie variables be accessible just like GET/POST vars, or separately?
Separately, but again in a map-like name/value thing, e.g. struct HttpRequest { map<string,string> cgi_vars; map<string,string> cookies; map<string,string> http_headers; ... }
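A sketch of how the cookies map in such a struct might be filled. In a CGI program the Cookie request header normally arrives in the HTTP_COOKIE environment variable; the helper names trim and parse_cookie_header are hypothetical, and the parsing is simplified (no quoted values, no attribute handling).

```cpp
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> NameValueMap;

// The struct from the text, with the three separate maps.
struct HttpRequest {
    NameValueMap cgi_vars;
    NameValueMap cookies;
    NameValueMap http_headers;
};

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split a Cookie header value such as "a=1; b=2" into name/value pairs.
// Segments without '=' or with an empty name are skipped.
NameValueMap parse_cookie_header(const std::string& header) {
    NameValueMap out;
    std::size_t pos = 0;
    while (pos < header.size()) {
        std::size_t semi = header.find(';', pos);
        if (semi == std::string::npos) semi = header.size();
        std::string pair = header.substr(pos, semi - pos);
        std::size_t eq = pair.find('=');
        if (eq != std::string::npos) {
            std::string name = trim(pair.substr(0, eq));
            if (!name.empty()) out[name] = trim(pair.substr(eq + 1));
        }
        pos = semi + 1;
    }
    return out;
}
```

Usage would be along the lines of req.cookies = parse_cookie_header(value_of_HTTP_COOKIE), which keeps cookie names from colliding with form variables of the same name.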
*should the CGI environment variables each have explicit functions for their access, or should (eg.) a generic cgi::get_env() function be used?
I listed them all explicitly in my CgiVars implementation, rather than adding another map<string,string> to the HttpRequest. I think that my motivation was to get a compile-time error if I mis-remembered the variable name (i.e. vars.remote_host vs. vars["REMOTE_HOST"]).
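The compile-time check comes from using named members instead of string keys. A reduced sketch, assuming a hypothetical four-member struct (the real CgiVars in libpbe may differ) and a loader that takes any name-to-C-string lookup so it can be exercised without a web server:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical reduced CgiVars-style struct; the member set is illustrative.
struct CgiVars {
    std::string request_method;
    std::string query_string;
    std::string remote_host;
    std::string content_type;
};

// Turn a possibly-null C string into a std::string.
std::string or_empty(const char* value) {
    return value ? std::string(value) : std::string();
}

// Fill the struct using any callable that maps a variable name to a
// C string or a null pointer.
template <typename Lookup>
CgiVars load_cgi_vars(Lookup lookup) {
    CgiVars vars;
    vars.request_method = or_empty(lookup("REQUEST_METHOD"));
    vars.query_string   = or_empty(lookup("QUERY_STRING"));
    vars.remote_host    = or_empty(lookup("REMOTE_HOST"));
    vars.content_type   = or_empty(lookup("CONTENT_TYPE"));
    return vars;
}

// Lookup that reads the real process environment.
const char* lookup_environment(const char* name) {
    return std::getenv(name);
}

// Convenience wrapper for an actual CGI process.
CgiVars load_cgi_vars_from_environment() {
    return load_cgi_vars(lookup_environment);
}
```

A typo such as vars.remote_hots fails to compile, whereas vars["REMOTE_HOTS"] on a map would silently return an empty string at run time.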
*url decoding functions are needed for GET/POST variables.
Internal to your code, of course. I don't want to see them.
Should url encoding functions also be provided?
If you want, but in a separate header file. (And be sure you know exactly what encoding you're doing...)
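As an illustration of "know exactly what encoding you're doing", the sketch below handles the application/x-www-form-urlencoded flavour, where a space is written as '+'. Plain URI percent-encoding (RFC 3986) does not treat '+' specially, so the two must not be mixed. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode form-urlencoded text: '+' becomes a space and "%XX" becomes the
// byte with hex value XX. Malformed escapes are copied through unchanged.
std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size()
                   && hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
            out += static_cast<char>(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// ASCII letters, digits and "-_.~" are left unescaped.
bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
}

// Encode a form value: unreserved characters pass through, a space becomes
// '+', and every other byte becomes "%XX".
std::string url_encode(const std::string& in) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (is_unreserved(c)) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += digits[c >> 4];
            out += digits[c & 15];
        }
    }
    return out;
}
```

Both functions work on bytes and say nothing about the character set those bytes are in; that question is separate from the escaping itself.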
*how transparent should user code be to internationalization/different character sets?
I think that in the case of url-encoded data it's hard to be certain of the character set in use. In the MIME case that data is explicitly available, and you should make it accessible to the user. I think you can also get content-type information. I'm not sure how best to fit that into the map<string,string> scheme. We really need:

class string_with_charset { string s; charset_t charset; };

and then your CGI parameters can be map<string,string_with_charset>. Or something more complex to handle content-types as well.

Is there a Boost wrapper for iconv yet? What about MIME handling? I don't think either has been done; maybe you'd like to do those too.

Regards, Phil.

Phil Endecott, you said, "I have also used the Apache module API, and have written standalone HTTP servers." How does one go about doing that? I have tried using POCO, but it's a little buggy and I can't seem to get any support. So if I could learn to do the same using Apache, I would be ecstatic! Could you provide a tutorial or example showing how this could be done?

Hi Phil, On 06/04/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
I have some GPL code that does this sort of thing; you're welcome to look at it, and I'm not fussy about the license if you want to re-use any of it for a Boost submission. This code has evolved to meet my needs, and isn't the sort of thing that you would write if you were starting from scratch.
Thank you, sir! I'm aiming for a middle-ground, so seeing your code is very helpful. [snip]
I think this is more "boostified"! For example, it has a
get_as<T>(name) method that will lexical_cast the parameter to the required type.
I suppose it could be argued that a wrapper for lexical_cast is out-of-scope, but it's likely I'll include one - unless there's strong criticism - due to the amount of use it'd probably get.
I have also used the Apache module API, and have written standalone HTTP servers. If I was doing all this from scratch I'd try to do something that would be equally applicable in any of these situations (and also things like FCGI as someone else has suggested). I.e. you want to define an 'HTTP Request' object, which has ways of accessing the form data, not an explicitly 'CGI' object. (There is actually an HttpRequest class in http://svn.chezphil.org/libpbe/trunk/include/HttpRequest.hh, but I haven't used it in combination with form data. There is also a Spirit parser for HTTP requests in http://svn.chezphil.org/libpbe/trunk/src/Request.cc.)
I completely agree here. I think the library should really be separated into (for example) a cgi::service - which handles the protocol specifics - and cgi::request's. I haven't a proof-of-concept, but I have high hopes that a good cgi::service template would allow the library to be extended to handle arbitrary cgi-based protocols, including 'standalone HTTP', almost transparently to user code.
Of particular interest:
*should GET/POST variables be parsed by default?
So the issue is can you be more efficient in the case when the variable is not used by not parsing it? Well, if you're concerned about efficiency in that case then you would be better off not sending the thing in the first place. So I suggest parsing everything immediately, or at least on the first use of any variable.
I'd agree in theory, but automatic parsing would make it easy for a malicious user to cripple the server just by POSTing huge files, wouldn't it? There are also situations where a cgi program accepts large files and possibly parses them on the fly, or encrypts them, or sends them in chunks to a database. As a real-world example, if you attach an exe to a gmail message, you have to wait for the whole file to be sent before the server replies that it's an invalid file type. I may be missing something, but this seems like a significant problem, despite it largely being ignored (I think).
*how should GET/POST variables be accessed?
As a map<string,string>, or similar.
Noted. I'm curious if unordered_map would be more efficient, but that's an implementation detail. I'll have to see.
*should cookie variables be accessible just like GET/POST vars, or
separately?
Separately, but again in a map-like name/value thing, e.g.
struct HttpRequest { map<string,string> cgi_vars; map<string,string> cookies; map<string,string> http_headers; ... }
Ok. Although I think direct access is important, I'm tempted to include a helper function like: cgi::param( /*name*/ ) // returns 'value' That would iterate over the GET/POST vars _as well as_ the cookie vars. I'll keep my eye open for objections to the idea.
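A rough sketch of what such a helper might look like. The request struct, its member names, and the search order (GET, then POST, then cookies) are all assumptions made for illustration; the message above only says that the helper would cover all three.

```cpp
#include <map>
#include <string>

namespace cgi {

typedef std::map<std::string, std::string> name_value_map;

// Hypothetical request type with the three sources kept separate.
struct request {
    name_value_map get_vars;
    name_value_map post_vars;
    name_value_map cookies;
};

// Look name up in GET, then POST, then cookie variables.
// Returns a pointer to the first match, or a null pointer if none exists.
const std::string* param(const request& req, const std::string& name) {
    const name_value_map* sources[] = { &req.get_vars, &req.post_vars, &req.cookies };
    for (int i = 0; i < 3; ++i) {
        name_value_map::const_iterator it = sources[i]->find(name);
        if (it != sources[i]->end()) return &it->second;
    }
    return 0;
}

} // namespace cgi
```

One point to weigh with a combined lookup is that a form variable can shadow a cookie of the same name, so code that must know where a value came from would still use the individual maps.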
*should the CGI environment variables each have explicit functions for their
access, or should (eg.) a generic cgi::get_env() function be used?
I listed them all explicitly in my CgiVars implementation, rather than adding another map<string,string> to the HttpRequest. I think that my motivation was to get a compile-time error if I mis-remembered the variable name (i.e. vars.remote_host vs. vars["REMOTE_HOST"]).
That's the way I'm leaning too.
*url decoding functions are needed for GET/POST variables.
Internal to your code, of course. I don't want to see them.
Of course. ;)
Should url encoding functions also be provided?
If you want, but in a separate header file. (And be sure you know exactly what encoding you're doing...)
Noted. This is tricky but I suppose it's a non-vital component. Adding it sounds like fun, unfortunately...
*how transparent should user code be to internationalization/different
character sets?
I think that in the case of url-encoded data it's hard to be certain of the character set in use. In the MIME case that data is explicitly available, and you should make it accessible to the user. I think you can also get content-type information. I'm not sure how best to fit that into the map<string,string> scheme. We really need:
class string_with_charset { string s; charset_t charset; };
Sounds about right. I think an awareness of content-types would be very useful too. I'll have to be careful to not stray out of this library's scope (which should really be quite tight, imo), but awareness of the issue should be included at least.
and then your CGI parameters can be map<string,string_with_charset>. Or something more complex to handle content-types as well. Is there a Boost wrapper for iconv yet? What about MIME handling? I don't think either has been done; maybe you'd like to do those too.
I don't think either of those have been 'boosted' yet. Having access to them might make my life easier later on, but we'll see. :) Thanks for the input, Darren
participants (3): Darren Garvey, Jarrad Waterloo, Phil Endecott