Fwd: [Boost-users][RFC] Preferred API for a CGI library proposal

[cc'ed from boost-users list]

Hello all,

This may seem off-topic, but I assure you it isn't. ;)

This is an initial probe for ideas about an API for [any] upcoming CGI library submission. Of particular interest:

*should GET/POST variables be parsed by default? (not _strictly_ API, but relevant)
*how should GET/POST variables be accessed? Like CGI.pm (perl) or python's cgi module?
*should cookie variables be accessible just like GET/POST vars, or separately? Should they be lumped together by default with more direct access when needed? Would it be confusing to allow user code to determine behaviour?
*should the CGI environment variables each have explicit functions for their access, or should (eg.) a generic cgi::get_env() function be used?
*url decoding functions are needed for GET/POST variables. Should url encoding functions also be provided?
*how transparent should user code be to internationalization/different character sets?

The reason I'm asking such an open-ended question is that I'm assuming lots of people here will have experience using CGI libraries (possibly in other languages, possibly your own) and may have specific ideas about what should or shouldn't move from them into a Boost.CGI library, if one were accepted.

Whether or not you see the point of a C++ CGI library, your comments on APIs provided by other languages would still be very much appreciated.

Cheers,
Darren

On 4/5/07, Darren Garvey <lists.drrngrvy@googlemail.com> wrote:
[cc'ed from boost-users list]
Hello all,
This may seem off-topic, but I assure you it isn't. ;)
This is an initial probe for ideas about an API for [any] upcoming CGI library submission. Of particular interest: *should GET/POST variables be parsed by default? (not _strictly_ API, but relevant)
Perhaps parsing them on first use might give a perf boost, but I don't think it would make much difference.
*how should GET/POST variables be accessed? Like CGI.pm (perl) or python's cgi module?
I haven't used Perl or Python, but I think if you have a class cgi_context, it should have get and post members. There is no limit to post size (iirc) though, so a way to stream it might be useful to keep memory usage down.
*should cookie variables be accessible just like GET/POST vars, or separately? Should they be lumped together by default with more direct access when needed? Would it be confusing to allow user code to determine behaviour?
I think being able to access them like above for get vars would be a plus.
*should the CGI environment variables each have explicit functions for their access, or should (eg.) a generic cgi::get_env() function be used?
Don't really have an opinion here.
*url decoding functions are needed for GET/POST variables. Should url encoding functions also be provided?
Yes. Though not strictly CGI, I think htmlencode/htmldecode might be useful utility functions to include too.
*how transparent should user code be to internationalization/different character sets?
You aren't always outputting text in CGI, so I don't think it should be transparent. A way to access it as a text stream, perhaps with a way to specify encoding, would be nice though.
The reason I'm asking such an open-ended question is that I'm assuming lots of people here will have experience using CGI libraries (possibly in other languages, possibly your own) and may have specific ideas about what should or shouldn't move from them into a Boost.CGI library, if one were accepted.
I think a CGI library should also include transparent FastCGI support. Something I would also like to see is async support, with an API similar to asio.
Whether or not you see the point of a C++ CGI library, your comments on APIs provided by other languages would still be very much appreciated.
Cheers, Darren _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Cory Nelson http://www.int64.org
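
For illustration, the URL encoding/decoding and HTML-escaping helpers mentioned in this message might look roughly like the sketch below. The names url_encode, url_decode and html_encode are assumptions made for this example; they are not an API from any proposed or existing Boost library, and the code uses only the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper names, chosen for this sketch only.

// True for RFC 3986 "unreserved" characters, which never need escaping.
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
}

// Percent-encode every byte that is not unreserved.
inline std::string url_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (is_unreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX escapes and treat '+' as a space (HTML form convention).
// Malformed escapes are copied through unchanged.
inline std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size() &&
                   hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
            out += static_cast<char>(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Escape the characters that are special in HTML text and attribute values.
inline std::string html_encode(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#39;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

With these definitions, url_decode(url_encode(s)) returns s for any byte string, since the encoder never emits a bare '+'. Character-set handling is deliberately left out; the functions operate on raw bytes.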

Hi Cory,
On 05/04/07, Cory Nelson <phrosty@gmail.com> wrote:
On 4/5/07, Darren Garvey <lists.drrngrvy@googlemail.com> wrote:
[cc'ed from boost-users list]
Hello all,
This may seem off-topic, but I assure you it isn't. ;)
This is an initial probe for ideas about an API for [any] upcoming CGI library submission. Of particular interest: *should GET/POST variables be parsed by default? (not _strictly_ API, but relevant)
Perhaps parsing them on first use might give a perf boost, but I don't think it would make much difference.
The other part to this question is with regard to the stdin stream (i.e. POST variables). This can be huge and there may be no need to parse it at all. So, do you parse GET vars and not POST vars by default (possibly breaking a program if there is a change of a form from GET to POST); parse nothing (making code that bit uglier); or parse everything (potentially dangerous)? Try a more complex way?
[noted, agreed, snipped]
The reason I'm asking such an open-ended question is that I'm assuming lots of people here will have experience using CGI libraries (possibly in other languages, possibly your own) and may have specific ideas about what should or shouldn't move from them into a Boost.CGI library, if one were accepted.
I think a CGI library should also include transparent FastCGI support.
I agree that shipping FastCGI support is a must - I'm not sure making it completely transparent is the best option - but I agree in principle. This discussion is for a separate thread though, perhaps (I've been thinking of this for the summer of code (or just the summer...), so input is more than welcome)?
Something I would also like to see is async support, with an API similar to asio.
By 'async support' you mean for input/output? Thanks for the comments, Darren
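
As a rough illustration of the "parse on first use" option raised above, the sketch below keeps the raw query string and only splits it into name/value pairs when a variable is first requested. The class name lazy_query and its members are hypothetical, invented for this example. Percent-decoding is omitted to keep it short, and a repeated name simply keeps its last value.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical class: stores the raw QUERY_STRING and parses it lazily.
class lazy_query {
public:
    explicit lazy_query(std::string raw) : raw_(std::move(raw)), parsed_(false) {}

    // Returns the value for `name`, or an empty string if it is absent.
    std::string get(const std::string& name) {
        parse_if_needed();
        std::map<std::string, std::string>::const_iterator it = vars_.find(name);
        return it == vars_.end() ? std::string() : it->second;
    }

    // True once the raw string has been split into variables.
    bool parsed() const { return parsed_; }

private:
    // Splits "a=1&b=2" into pairs; runs at most once.
    void parse_if_needed() {
        if (parsed_) return;
        parsed_ = true;
        std::size_t pos = 0;
        while (pos < raw_.size()) {
            std::size_t amp = raw_.find('&', pos);
            if (amp == std::string::npos) amp = raw_.size();
            const std::string pair = raw_.substr(pos, amp - pos);
            if (!pair.empty()) {
                const std::size_t eq = pair.find('=');
                if (eq == std::string::npos) {
                    vars_[pair] = "";  // a bare name with no '=' gets an empty value
                } else {
                    vars_[pair.substr(0, eq)] = pair.substr(eq + 1);
                }
            }
            pos = amp + 1;
        }
    }

    std::string raw_;
    bool parsed_;
    std::map<std::string, std::string> vars_;
};
```

A program that never calls get() pays nothing beyond storing the string. The same shape would not suit a large POST body, which is the concern raised above about stdin: that data would still have to be read into memory before it could be parsed.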

On 4/5/07, Darren Garvey <lists.drrngrvy@googlemail.com> wrote:
Hi Cory,
On 05/04/07, Cory Nelson <phrosty@gmail.com> wrote:
On 4/5/07, Darren Garvey <lists.drrngrvy@googlemail.com> wrote:
[cc'ed from boost-users list]
Hello all,
This may seem off-topic, but I assure you it isn't. ;)
This is an initial probe for ideas about an API for [any] upcoming CGI library submission. Of particular interest: *should GET/POST variables be parsed by default? (not _strictly_ API, but relevant)
Perhaps parsing them on first use might give a perf boost, but I don't think it would make much difference.
The other part to this question is with regards to the stdin stream (ie. POST variables). This can be huge and there may be no need to parse it at all. So, do you parse GET vars and not POST vars by default (possibly breaking a program if there is a change of a form from GET to POST); parse nothing (making code that bit uglier) or parse everything (potentially dangerous)? Try a more complex way?
I think GET should be parsed immediately, and POST should be read in by the user. Maybe have an adapter class that can decode POST data if the user wants it like that?
[noted, agreed, snipped]
The reason I'm asking such an open-ended question is that I'm assuming lots of people here will have experience using CGI libraries (possibly in other languages, possibly your own) and may have specific ideas about what should or shouldn't move from them into a Boost.CGI library, if one were accepted.
I think a CGI library should also include transparent FastCGI support.
I agree that shipping FastCGI support is a must - I'm not sure making it completely transparent is the best option - but I agree in principle. This discussion is for a separate thread though, perhaps (I've been thinking of this for the summer of code (or just the summer...), so input is more than welcome)?
Well, normal CGI is already phased out by scripting langs; I imagine FastCGI will be the primary use of this library. Innards can be kept to other threads, but I think not taking it into consideration while discussing the basics of the library would be a mistake. I do think it could be mostly transparent: the user should be able to set up a request handler and either give it to a FastCGI listener or attach it to stdin/stdout without code changes.
Something I would also like to see is async support, with an API similar to asio.
By 'async support' you mean for input/output?
Yes, like an asio socket class, with begin_read/begin_write etc. This is one thing I always see lacking in current web development kits: so many of them spend time accessing databases etc, things that block and should be done in an async manner, and I bet suffer performance because of it. It would be nice for a change to have an API that allows such scalability when it is wanted and ease of use when it doesn't matter. This is not so important for regular CGI but in FastCGI it could be a great boon.
Thanks for the comments, Darren
-- Cory Nelson http://www.int64.org
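
The "same handler, different transport" idea in this message can be sketched as follows. Everything here is an assumption made for illustration: request_handler, serve_stdio and serve_buffer are hypothetical names, and serve_buffer is only a stand-in for a FastCGI listener (it does no FastCGI record parsing at all). The point is that the handler sees two streams and nothing else.

```cpp
#include <functional>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical handler type: it only sees abstract streams, never the transport.
typedef std::function<void(std::istream& in, std::ostream& out)> request_handler;

// Plain CGI: the web server has already connected the request to stdin/stdout.
inline void serve_stdio(const request_handler& handler) {
    handler(std::cin, std::cout);
}

// Stand-in for a FastCGI-style listener. A real one would decode the protocol
// and then call the same handler; here the "transport" is just a string.
inline std::string serve_buffer(const request_handler& handler, const std::string& body) {
    std::istringstream in(body);
    std::ostringstream out;
    handler(in, out);
    return out.str();
}

// Example handler: echoes the request body back as plain text.
inline void echo_handler(std::istream& in, std::ostream& out) {
    const std::string body((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    out << "Content-Type: text/plain\r\n\r\n" << body;
}
```

The handler is passed unchanged to either entry point, e.g. serve_stdio(echo_handler) in a CGI executable or serve_buffer(echo_handler, body) from a listener loop. Because the handler reads from a stream rather than a pre-parsed map, it can also consume a large POST body incrementally.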

On 06/04/07, Cory Nelson <phrosty@gmail.com> wrote:
I think GET should be parsed immediately, and POST should be read in by the user. Maybe have an adapter class that can decode POST data if the user wants it like that?
Such an adapter class could be very useful. I was thinking that a boost::iostreams::filtering_streambuf might be ideal for this. That way, it should be relatively simple for a library user to add in their own filter if they want to do further decoding.
I agree that shipping FastCGI support is a must - I'm not sure making it completely transparent is the best option - but I agree in principle. This discussion is for a separate thread though, perhaps (I've been thinking of this for the summer of code (or just the summer...), so input is more than welcome)?
Well, normal CGI is already phased out by scripting langs, I imagine FastCGI will be the primary use of this library. Innards can be kept to other threads but I think not taking it into consideration while discussing the basics of the library would be a mistake.
I didn't mean to imply it shouldn't be discussed, just that I think this topic is a complex one and probably best discussed in detail. You're right though: it would be a mistake to ignore the issue this early on...
I do think it could be mostly transparent: the user should be able to set up a request handler and either give it to a FastCGI listener or attach it to stdin/stdout without code changes.
This idea is one I've just come around to (but not implemented yet). I think the best solution might be passing a request handler to the FastCGI listener's constructor, like you mentioned. If the request handler can be unaware of the protocol that the request came from, that means the request handler can be used as-is with any 'protocol listener' (I'm calling them 'services', after asio's io_service, for now...) provided by the library or users of the library.
By 'async support' you mean for input/output?
Yes, like an asio socket class, with begin_read/begin_write etc.
This is one thing I always see lacking in current web development kits: so many of them spend time accessing databases etc, things that block and should be done in an async manner, and I bet suffer performance because of it. It would be nice for a change to have an API that allows such scalability when it is wanted and ease of use when it doesn't matter.
This is not so important for regular CGI but in FastCGI it could be a great boon.
I'm still not sure I see exactly what you mean. As far as my own prototype goes, asio is at the core of the FastCGI interpreter. It seemed natural for input and output to be done asynchronously to the thread(s) servicing each request. In other words, requests are received, parsed and enqueued when ready, all in one or more background threads; requests are then handled by other thread(s). These use a streambuf for output, which is sent to the server when required. Filling the streambuf should be synchronous (to the thread handling the request); wrapping it in headers and sending it to the server should be asynchronous.
Is this what you mean, or am I miles off? Do you think the library should provide more than this?
Regards, Darren
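
The hand-off described above, where background threads enqueue parsed requests and other threads handle them, could be sketched with a small blocking queue. This is not the prototype mentioned in the message: it uses only the standard library rather than asio, and the request and request_queue types are hypothetical.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical request type; a real one would carry headers, params, etc.
struct request {
    int id;
    std::string body;
};

// Queue shared between the thread(s) that receive requests and the
// thread(s) that handle them.
class request_queue {
public:
    request_queue() : closed_(false) {}

    // Called by the receiving side once a request is fully parsed.
    void push(request r) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(r));
        }
        ready_.notify_one();
    }

    // Called by a handler thread. Blocks until a request arrives or the
    // queue is closed; returns false only when closed and fully drained.
    bool pop(request& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Signals that no more requests will arrive and wakes all waiters.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<request> queue_;
    bool closed_;
};
```

A handler thread would typically loop on pop() and fill an output buffer synchronously for each request, leaving the actual send to the I/O side. Whether that send should itself be queued back to a background thread is the open question in the message above.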

Cory Nelson wrote:
This is one thing I always see lacking in current web development kits: so many of them spend time accessing databases etc, things that block and should be done in an async manner, and I bet suffer performance because of it. It would be nice for a change to have an API that allows such scalability when it is wanted and ease of use when it doesn't matter.
FYI, some databases have an async API:
http://www.postgresql.org/docs/current/static/libpq-async.html
http://manuals.sybase.com/onlinebooks/group-cnarc/cng1110e/dblib/@Generic__B...
It could be a good project for asio.
-- Alexander Nasonov http://nasonov.blogspot.com
Dignity does not consist in possessing honors, but in deserving them. -- Aristotle
-- This quote is generated by:
/usr/pkg/bin/curl -L http://tinyurl.com/veusy \
| sed -e 's/^document\.write(.//' -e 's/.);$/ --/' \
-e 's/<[^>]*>//g' -e 's/^More quotes from //' \
| fmt | tee ~/.signature-quote
participants (3)
- Alexander Nasonov
- Cory Nelson
- Darren Garvey