
Darren wrote:
> I think the library should really be separated into (for example) a cgi::service - which handles the protocol specifics - and cgi::requests.
I think I agree, except that 'cgi' is the wrong name; it's an HTTP request, which could be a CGI request or something else.
> I have high hopes that a good cgi::service template would allow the library to be extended to handle arbitrary CGI-based protocols, including 'standalone HTTP'.
Yes, except again you need to swap that around; "standalone HTTP" is not a "CGI-based protocol", but the converse: CGI is an HTTP-based protocol.
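The service/request split being discussed could be sketched as a policy-based design. This is purely illustrative — `basic_request`, `get_meta_var`, and the protocol structs are hypothetical names, not any existing API — but it shows how the protocol-specific part (where request metadata comes from) can be isolated behind a template parameter:

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical sketch: the Protocol policy supplies request metadata;
// basic_request itself stays protocol-agnostic.
template <class Protocol>
class basic_request {
public:
    explicit basic_request(Protocol p = Protocol()) : proto_(p) {}
    std::string method() const { return proto_.get_meta_var("REQUEST_METHOD"); }
    std::string query_string() const { return proto_.get_meta_var("QUERY_STRING"); }
private:
    Protocol proto_;
};

// One possible protocol: plain CGI, which reads the process environment.
struct cgi_protocol {
    std::string get_meta_var(const std::string& name) const {
        const char* v = std::getenv(name.c_str());
        return v ? std::string(v) : std::string();
    }
};

// A table-backed protocol, usable by a standalone HTTP front end that
// fills the table from parsed request headers (or by unit tests).
struct map_protocol {
    std::map<std::string, std::string> vars;
    std::string get_meta_var(const std::string& name) const {
        auto it = vars.find(name);
        return it == vars.end() ? std::string() : it->second;
    }
};
```

Under this scheme "standalone HTTP" is just another Protocol, which matches the point that CGI is one transport for HTTP requests rather than the other way round.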
>>> Of particular interest: * should GET/POST variables be parsed by default?
>> So the issue is: can you be more efficient, when a variable is not used, by not parsing it? Well, if you're concerned about efficiency in that case then you would be better off not sending the thing in the first place. So I suggest parsing everything immediately, or at least on the first use of any variable.
> I'd agree in theory, but wouldn't automatic parsing make it easy for a malicious user to cripple the server just by POSTing huge files?
A DoS attack of X million uploads of a file of size S is in most ways equivalent to 10*X million uploads of a file of size S/10, or 100*X million uploads of a file of size S/100. Where do you draw the line? The place to address this sort of concern is with bandwidth throttling in the front end of the web server.
> There are also situations where a CGI program accepts large files and possibly parses them on the fly, encrypts them, or sends them in chunks to a database. As a real-world example, if you attach an .exe to a Gmail message, you have to wait for the whole file to be uploaded before the server replies that it's an invalid file type.
I think it's hard to avoid parsing the whole stream in order to know which variables are present, and that it's syntactically correct, before continuing. And I don't think you can control the order in which the browser sends the variables. But if you can devise a scheme that allows lazy parsing of the data, great! As long as it doesn't add any syntactic complexity in the common case of a few small variables.
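One such scheme — "parse on first use" — can be sketched under the assumption that the raw query string (or an already-buffered POST body) is held as a string. `get_vars` is an illustrative name, and URL-decoding of `%XX` escapes is omitted for brevity:

```cpp
#include <map>
#include <string>
#include <utility>

// Store the raw data; split it into name/value pairs only the first
// time any variable is asked for. The caller's syntax stays the same
// whether parsing is eager or lazy.
class get_vars {
public:
    explicit get_vars(std::string raw) : raw_(std::move(raw)) {}

    std::string operator[](const std::string& name) {
        if (!parsed_) { parse(); parsed_ = true; }
        auto it = vars_.find(name);
        return it == vars_.end() ? std::string() : it->second;
    }

private:
    void parse() {
        std::string::size_type pos = 0;
        while (pos < raw_.size()) {
            auto amp = raw_.find('&', pos);
            if (amp == std::string::npos) amp = raw_.size();
            auto eq = raw_.find('=', pos);
            if (eq != std::string::npos && eq < amp)
                vars_[raw_.substr(pos, eq - pos)] =
                    raw_.substr(eq + 1, amp - eq - 1);
            pos = amp + 1;
        }
    }

    std::string raw_;
    bool parsed_ = false;
    std::map<std::string, std::string> vars_;
};
```

Note that this still parses the whole buffered string on first access, in line with the point above that you cannot rely on the order in which the browser sends the variables.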
>>> * should cookie variables be accessible just like GET/POST vars, or separately?
>> Separately.
> Ok. Although I think direct access is important, I'm tempted to
> include a helper function like:
>     cgi::param( /*name*/ )  // returns 'value'
> That would iterate over the GET/POST vars _as well as_ the cookie
> vars. I'll keep my eye open for objections to the idea.
I think that the recent fuss about "JavaScript Hijacking" has emphasised the fact that programmers need to be aware of whether they are dealing with cookies, GET (URL) variables, or POST data. Cookies set by example.com are returned to example.com even when the request comes from a script element on a page served by bad.com. In contrast, the bad.com page's script cannot see the GET or POST data that example.com's page is sending.

Phil.
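For concreteness, the proposed cgi::param convenience could be sketched as a lookup that falls through the sources in a fixed order. `request_data` and `param` here are hypothetical stand-ins, not the library's actual types; given the hijacking concern above, a variant that also reports *which* source matched might be the safer interface:

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical container for the three variable sources.
struct request_data {
    std::map<std::string, std::string> get_vars;
    std::map<std::string, std::string> post_vars;
    std::map<std::string, std::string> cookie_vars;
};

// Search GET, then POST, then cookies; return the first match,
// or an empty string if the name appears in none of them.
std::string param(const request_data& req, const std::string& name) {
    for (const auto* m : { &req.get_vars, &req.post_vars, &req.cookie_vars }) {
        auto it = m->find(name);
        if (it != m->end()) return it->second;
    }
    return std::string();
}
```

The fixed search order at least makes the merging predictable, but it also illustrates the objection: the caller cannot tell a cookie value from a GET value without going back to the separate accessors.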