
Darren wrote:
> I think the library should really be separated into (for example) a cgi::service - which handles the protocol specifics - and cgi::requests.
I think I agree, except that 'cgi' is the wrong name; it's an HTTP request, which could be a CGI request or something else.
> I have high hopes that a good cgi::service template would allow the library to be extended to handle arbitrary CGI-based protocols, including 'standalone HTTP'.
Yes, except again you need to swap that around; "standalone HTTP" is not a "CGI-based protocol", but the converse: CGI is an HTTP-based protocol.
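The service/request split being discussed could be sketched as a policy-based design. This is purely illustrative — `basic_request`, `get_meta_var`, and the protocol structs are hypothetical names, not any existing API — but it shows how the protocol-specific part (where request metadata comes from) can be isolated behind a template parameter:

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical sketch: the Protocol policy supplies request metadata;
// basic_request itself stays protocol-agnostic.
template <class Protocol>
class basic_request {
public:
    explicit basic_request(Protocol p = Protocol()) : proto_(p) {}
    std::string method() const { return proto_.get_meta_var("REQUEST_METHOD"); }
    std::string query_string() const { return proto_.get_meta_var("QUERY_STRING"); }
private:
    Protocol proto_;
};

// One possible protocol: plain CGI, which reads the process environment.
struct cgi_protocol {
    std::string get_meta_var(const std::string& name) const {
        const char* v = std::getenv(name.c_str());
        return v ? std::string(v) : std::string();
    }
};

// A table-backed protocol, usable by a standalone HTTP front end that
// fills the table from parsed request headers (or by unit tests).
struct map_protocol {
    std::map<std::string, std::string> vars;
    std::string get_meta_var(const std::string& name) const {
        auto it = vars.find(name);
        return it == vars.end() ? std::string() : it->second;
    }
};
```

Under this scheme "standalone HTTP" is just another Protocol, which matches the point that CGI is one transport for HTTP requests rather than the other way round.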
>>> Of particular interest: * should GET/POST variables be parsed by default?
>> So the issue is: can you be more efficient, when a variable is not used, by not parsing it? Well, if you're concerned about efficiency in that case then you would be better off not sending the thing in the first place. So I suggest parsing everything immediately, or at least on the first use of any variable.
> I'd agree in theory, but wouldn't automatic parsing make it easy for a malicious user to cripple the server just by POSTing huge files?
A DoS attack of X million uploads of a file of size S is in most ways equivalent to 10*X million uploads of a file of size S/10, or 100*X million uploads of a file of size S/100. Where do you draw the line? The place to address this sort of concern is with bandwidth throttling in the front end of the web server.
> There are also situations where a CGI program accepts large files and possibly parses them on the fly, encrypts them, or sends them in chunks to a database. As a real-world example, if you attach an .exe to a Gmail message, you have to wait for the whole file to be uploaded before the server replies that it's an invalid file type.
I think it's hard to avoid parsing the whole stream in order to know which variables are present, and that it's syntactically correct, before continuing. And I don't think you can control the order in which the browser sends the variables. But if you can devise a scheme that allows lazy parsing of the data, great! As long as it doesn't add any syntactic complexity in the common case of a few small variables.
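One such scheme — "parse on first use" — can be sketched under the assumption that the raw query string (or an already-buffered POST body) is held as a string. `get_vars` is an illustrative name, and URL-decoding of `%XX` escapes is omitted for brevity:

```cpp
#include <map>
#include <string>
#include <utility>

// Store the raw data; split it into name/value pairs only the first
// time any variable is asked for. The caller's syntax stays the same
// whether parsing is eager or lazy.
class get_vars {
public:
    explicit get_vars(std::string raw) : raw_(std::move(raw)) {}

    std::string operator[](const std::string& name) {
        if (!parsed_) { parse(); parsed_ = true; }
        auto it = vars_.find(name);
        return it == vars_.end() ? std::string() : it->second;
    }

private:
    void parse() {
        std::string::size_type pos = 0;
        while (pos < raw_.size()) {
            auto amp = raw_.find('&', pos);
            if (amp == std::string::npos) amp = raw_.size();
            auto eq = raw_.find('=', pos);
            if (eq != std::string::npos && eq < amp)
                vars_[raw_.substr(pos, eq - pos)] =
                    raw_.substr(eq + 1, amp - eq - 1);
            pos = amp + 1;
        }
    }

    std::string raw_;
    bool parsed_ = false;
    std::map<std::string, std::string> vars_;
};
```

Note that this still parses the whole buffered string on first access, in line with the point above that you cannot rely on the order in which the browser sends the variables.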
>>> * should cookie variables be accessible just like GET/POST vars, or separately?
>> Separately.
> Ok. Although I think direct access is important, I'm tempted to
> include a helper function like:
>     cgi::param( /*name*/ )  // returns 'value'
> That would iterate over the GET/POST vars _as well as_ the cookie
> vars. I'll keep my eye open for objections to the idea.
I think that the recent fuss about "JavaScript Hijacking" has emphasised the fact that programmers need to be aware of whether they are dealing with cookies, GET (URL) variables, or POST data. Cookies set by example.com are returned to example.com even when the request comes from a script element on a page served by bad.com. In contrast, the bad.com page's script cannot see the GET or POST data that example.com's page is sending.

Phil.
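For concreteness, the proposed cgi::param convenience could be sketched as a lookup that falls through the sources in a fixed order. `request_data` and `param` here are hypothetical stand-ins, not the library's actual types; given the hijacking concern above, a variant that also reports *which* source matched might be the safer interface:

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical container for the three variable sources.
struct request_data {
    std::map<std::string, std::string> get_vars;
    std::map<std::string, std::string> post_vars;
    std::map<std::string, std::string> cookie_vars;
};

// Search GET, then POST, then cookies; return the first match,
// or an empty string if the name appears in none of them.
std::string param(const request_data& req, const std::string& name) {
    for (const auto* m : { &req.get_vars, &req.post_vars, &req.cookie_vars }) {
        auto it = m->find(name);
        if (it != m->end()) return it->second;
    }
    return std::string();
}
```

The fixed search order at least makes the merging predictable, but it also illustrates the objection: the caller cannot tell a cookie value from a GET value without going back to the separate accessors.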