[GSoC][cgi] Status update.


Darren Garvey

16 Aug 2007

10:44 p.m.

Hello all,

[Note: please only reply to boost@lists.boost.org, don't 'reply all'. Cheers.]

As you may be aware, we have been working on a Boost-style CGI library for the Google Summer of Code. There is still a lot of work to do on the code, but it's 'taking shape' now so I'm inviting any interested parties for their comments. The structure of the library has changed quite significantly since the GSoC program started, for the better we believe, but you may disagree. ;) We'd like to know.

Docs are located at: http://cgi.sf.net
Source is in the Boost sandbox, here: http://tinyurl.com/2ldvk3

The internal links in the docs are mostly not working at the moment, sorry, but the TOC can be used to get you around.

Regards,
Darren


Phil Endecott

17 Aug

8:48 p.m.

Darren Garvey wrote:

...

As you may be aware, we have been working on a Boost-style CGI library for the Google Summer of Code. There is still a lot of work to do on the code, but it's 'taking shape' now so I'm inviting any interested parties for their comments. The structure of the library has changed quite significantly since the GSoC program started, for the better we believe, but you may disagree. ;) We'd like to know.

Docs are located at: http://cgi.sf.net Source is in the Boost sandbox, here: http://tinyurl.com/2ldvk3

The internal links in the docs are mostly not working at the moment, sorry, but the TOC can be used to get you around.

Hi Darren,

I'd like to look at the tutorial or quickstart sections, but the links from the introduction page don't work (as expected), and they're not listed in the TOC. Can you post a link?

From what I can see so far -

- You've chosen some too-short identifiers; abbreviating "request" to "req" is fine, but abbreviating "sync" to "s" is not. So I'd vote for "sync_req", rather than "srequest".

- The documentation is all very "in at the deep end", starting by describing the differences between the different protocols that you've implemented. It would be better to describe the common aspects, i.e. how to access form or other data. As far as I can see you don't ever describe this, always just commenting "use the request here", or something like that.

- Err, actually maybe you do describe the functions for accessing the form data, in the "request meta-data" section at the end of http://cgi.sourceforge.net/html/cgi/ug.html. But why is this _meta_ data, not just "data"?

I will have another look when some more tutorial material is available.

Regards, Phil.

Darren Garvey

18 Aug

6:33 p.m.

Hi Phil, On 17/08/07, Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:

...

I'd like to look at the tutorial or quickstart sections, but the links from the introduction page don't work (as expected), and they're not listed in the TOC. Can you post a link?

I've restructured the docs a bit. Now the tutorial is linked from the first page: http://cgi.sf.net

...

From what I can see so far -

- You've chosen some too-short identifiers; abbreviating "request" to "req" is fine, but abbreviating "sync" to "s" is not. So I'd vote for "sync_req", rather than "srequest".

I based that on things like xpressive::sregex (static regex). I agree it's not particularly clear though, and it's probably a good idea to remove the difference and just follow the conventions described here: http://tinyurl.com/35cnrg.

...

- The documentation is all very "in at the deep end", starting by describing the differences between the different protocols that you've implemented. It would be better to describe the common aspects, i.e. how to access form or other data. As far as I can see you don't ever describe this, always just commenting "use the request here", or something like that.

Hopefully it's a bit clearer now, with the quickstart put up. Two main sections which still need to be uploaded - on `cgi::reply` and `basic_protocol_service<>`s - will have to wait as this computer can't do what I need. Still, it should be relatively guessable how to write a basic CGI program (but I could be mistaken).

...

- Err, actually maybe you do describe the functions for accessing the form data, in the "request meta-data" section at the end of http://cgi.sourceforge.net/html/cgi/ug.html . But why is this _meta_ data, not just "data"?

That's because form/environment data are 'meta-variables', according to the CGI specification. Perhaps it's a bit liberal/confusing to swap 'variables' for 'data', since this is C++ after all? I've tried to clear this up, so as to not be misleading.

...

I will have another look when some more tutorial material is available.

Thanks again for the interest. Sorry the internal links are still broken!

Regards,
Darren

Eric Niebler

7:28 p.m.

Darren Garvey wrote:

...

On 17/08/07, Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:

...
From what I can see so far - - You've chosen some too-short identifiers; abbreviating "request" to "req" is fine, but abbreviating "sync" to "s" is not. So I'd vote for "sync_req", rather than "srequest".

I based that on things like xpressive::sregex (static regex).

FYI- the "s" in xpressive::sregex stands for "string", not "static". It follows TR1 regex, which uses for example "smatch" and "cmatch" as typedefs for match_results<string::const_iterator> and match_results<char const *> respectively.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
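The same "s"/"c" prefixes survive in the standard `<regex>` header, which adopted the TR1 design. The following is a minimal sketch, assuming a C++11 compiler; `first_number` is a hypothetical helper invented purely for illustration and is not part of xpressive or the CGI library.

```cpp
#include <regex>
#include <string>
#include <type_traits>

// The "s" and "c" prefixes name the kind of character sequence being searched.
static_assert(std::is_same<std::smatch,
                           std::match_results<std::string::const_iterator>>::value,
              "smatch holds matches over std::string iterators");
static_assert(std::is_same<std::cmatch,
                           std::match_results<const char*>>::value,
              "cmatch holds matches over C strings");

// Hypothetical helper: return the first run of digits found in a std::string.
std::string first_number(const std::string& text) {
    static const std::regex digits("[0-9]+");
    std::smatch m;  // match results over a std::string
    return std::regex_search(text, m, digits) ? m.str() : std::string();
}

// The same search over a C string uses cmatch instead.
std::string first_number(const char* text) {
    static const std::regex digits("[0-9]+");
    std::cmatch m;  // match results over const char*
    return std::regex_search(text, m, digits) ? m.str() : std::string();
}
```

Read this way, a name like "srequest" would suggest "string request" to anyone familiar with the regex typedefs, which is one more argument against using "s" for "sync".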

Phil Endecott

19 Aug

11:57 a.m.

Darren Garvey wrote:

...

On 17/08/07, Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:

...
- Err, actually maybe you do describe the functions for accessing the form data, in the "request meta-data" section at the end of http://cgi.sourceforge.net/html/cgi/ug.html . But why is this _meta_ data, not just "data"?

That's because form/environment data are 'meta-variables', according to the CGI specification.

Hmm, you mean rfc3875 section 4.1 "Request Meta-Variables"? Most of the stuff that that section is describing really is meta-data, i.e. data-about-the-data; e.g. the IP address of the HTTP client and the id of the user. Section 4.2 describes the contents of the HTML form where it's called simply "request data". At least, that's the case for POST data. For GET data, it's true that the form data is in QUERY_STRING which the spec groups with the meta-data, and according to section 4.3.1 "The GET method indicates that the script should produce a document based on the meta-variable values." But you're hiding the difference between GET and POST form data, and I suggest that you apply common sense rather than just copying the spec: call the meta-data meta-data, and call the data data.

...

...
I will have another look when some more tutorial material is available.

Thanks for the tutorial. Some more thoughts:

- I think it's important, especially for a Boost library, that you supply standard-library-like interfaces where possible. For example, I would like to see the form variables having a std::map-like interface. (This would be most easily achieved by actually making them a std::map, but if you choose some other implementation structure then you can use Boost.Iterator to help with the interface). This would allow me to iterate through them. At present, I don't think you have a way to get the names of all the form variables.

- I don't see a way to use the form data parsers independently. For example, if I write my own code for an Apache module or a stand-alone HTTP server, I would like to be able to invoke your parser to decipher the urlencoded or multipart/form-data from a string that I supply. You could provide this as an independent header file.

- In the '10 minute intro', you have:

    std::string user_name( req.meta_cookie("user_name") );
    if (!user_name.empty()) { ....

  So is it impossible to distinguish between an empty string and a form or cookie value not being set?

- I'm not sure about the idea of sending the response to the request object. I would think of the request object as a const thing. When I see

    req << "something..."

  I think that that's something that should only happen when the request object is being created, e.g. to set the POST data. In my apps I have always had a handler function that returns a response:

    HTTP::Response handle ( const HTTP::Request& req ) { ... }

  but this doesn't fit as well into your arrangement. Perhaps you need a "Message" object to contain a request and the corresponding response? ("Message" is the RFC2616 term for a request/response pair.)

- The term "Response" is normally used rather than "Reply" (e.g. in the HTTP spec).

- For multipart/form-data, each form variable has some associated metadata; e.g. for a file upload, the filename and possibly the mime type are supplied by the browser, and the charset may be indicated for other variables. I don't see any way to get at this.

Regards, Phil.
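To make the "independent parser" request concrete, here is a minimal sketch of what a stand-alone urlencoded parser might look like. Everything in it is an assumption for illustration: `url_decode` and `parse_urlencoded` are hypothetical names, not functions from the library under discussion, and multipart/form-data is not handled at all.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <utility>

// Decode %XX escapes and '+' (space) in one urlencoded token.
std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size()
                   && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
                   && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical stand-alone parser: split "a=1&b=two" into name/value pairs.
// A multimap is used so that repeated names are all kept.
std::multimap<std::string, std::string> parse_urlencoded(const std::string& data) {
    std::multimap<std::string, std::string> vars;
    std::size_t pos = 0;
    while (pos <= data.size()) {
        std::size_t end = data.find('&', pos);
        if (end == std::string::npos) end = data.size();
        const std::string pair = data.substr(pos, end - pos);
        if (!pair.empty()) {
            const std::size_t eq = pair.find('=');
            const std::string name = url_decode(pair.substr(0, eq));
            const std::string value = eq == std::string::npos
                ? std::string()
                : url_decode(pair.substr(eq + 1));
            vars.insert(std::make_pair(name, value));
        }
        pos = end + 1;
    }
    return vars;
}
```

Because the result is an ordinary standard container, iterating over it also answers the "names of all the form variables" question without any library-specific calls.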

Darren Garvey

20 Aug

1:07 a.m.

Hi Phil, On 19/08/07, Phil Endecott <spam_from_boost_dev@chezphil.org > wrote:

...

...
That's because form/environment data are 'meta-variables', according to the CGI specification.

Hmm, you mean rfc3875 section 4.1 "Request Meta-Variables"?

Yep.

...

Most of the stuff that that section is describing really is meta-data, i.e. data-about-the-data; e.g. the IP address of the HTTP client and the id of the user. Section 4.2 describes the contents of the HTML form where it's called simply "request data".

At least, that's the case for POST data. For GET data, it's true that the form data is in QUERY_STRING which the spec groups with the meta-data, and according to section 4.3.1 "The GET method indicates that the script should produce a document based on the meta-variable values." But you're hiding the difference between GET and POST form data, and I suggest that you apply common sense rather than just copying the spec: call the meta-data meta-data, and call the data data.

What you say makes perfect sense, unfortunately... To be honest, the reason for referring to the data as 'meta-data' is because I thought having getter functions called 'meta_*' was the least ambiguous way to do it (and obviously I thought the term was accurate). The 'get' and 'post' keywords seem to make most alternatives sound like they're doing something they're not. The previous iteration used:

    var<GET>()
    var<POST>()
    ...

The change of heart came because that gets very ugly when you need to use qualified names. Still, if calling them all meta_* is plain wrong, then it has to change.

...

- I think it's important, especially for a Boost library, that you supply standard-library-like interfaces where possible. For example, I would like to see the form variables having a std::map-like interface. (This would be most easily achieved by actually making them a std::map, but if you choose some other implementation structure then you can use Boost.Iterator to help with the interface). This would allow me to iterate through them. At present, I don't think you have a way to get the names of all the form variables.

Oops, forgot to document this! There is a way to do this actually:

    cgi::request req;
    cgi::map& form_map = req.meta_form();

cgi::map is a typedef for map<string,string> for now, but this should probably be a multimap. I've made a note about where this might be heading in the 'future development' section of the docs (linked from the first page).

Note that this can't currently be used for environment variables: this will probably change, but with the warning that it's going to be much slower for standard CGI.

...

- I don't see a way to use the form data parsers independently. For example, if I write my own code for an Apache module or a stand-alone HTTP server, I would like to be able to invoke your parser to decipher the urlencoded or multipart/form-data from a string that I supply. You could provide this as an independent header file.

Currently it's an implementation detail. It would be nice to make this visible to library users, but I'm not sure what we're using now is generic enough to make that sensible. We'll see, I suppose. :)

...

- In the '10 minute intro', you have: std::string user_name( req.meta_cookie("user_name") ); if (!user_name.empty()) { .... So is it impossible to distinguish between an empty string and a form or cookie value not being set?

Not really... I was thinking about this earlier and it could be done easily enough (I won't go into it, if you don't mind), but it would be a bit ugly, say something like:

    std::string name( req.meta_get("name") );
    if (req.is_unset(name)) // var was not set

There are a couple of other possibilities, one of which is to use a `cgi::param` instead of `std::string`s. Then you could do:

    cgi::param& name = req.meta_form("name");
    if (name.empty()) // `name` contains no data
    if (name.unset()) // "name" is not a form variable

Another option is to set the error_code value to something like error::unset in the case where the error-checked version of meta_* is used:

    boost::system::error_code ec;
    std::string name( req.meta_post("name", ec) );
    if (name.empty()) // `name` contains no data
    if (ec == error::unset) // "name" is not a POST variable

Thoughts?
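The `cgi::param` option is only described in outline in this thread, so the following is a sketch of one way such a type could separate "unset" from "empty", using only the standard library. The class `param` and the function `lookup` here are hypothetical stand-ins, not the library's actual types.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the `cgi::param` idea:
// a value plus a flag recording whether the variable was sent at all.
class param {
public:
    param() : set_(false) {}
    explicit param(const std::string& v) : value_(v), set_(true) {}

    bool unset() const { return !set_; }           // the name was never sent
    bool empty() const { return value_.empty(); }  // not sent, or sent with no data
    const std::string& str() const { return value_; }

private:
    std::string value_;
    bool set_;
};

// Hypothetical lookup over an already-parsed variable map.
param lookup(const std::map<std::string, std::string>& vars,
             const std::string& name) {
    const std::map<std::string, std::string>::const_iterator it = vars.find(name);
    return it == vars.end() ? param() : param(it->second);
}
```

With this shape, a variable submitted as `name=` reports `empty()` but not `unset()`, while a variable that was never submitted reports both.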

...

- I'm not sure about the idea of sending the response to the request object. I would think of the request object as a const thing. When I see req << "something..." I think that that's something that should only happen when the request object is being created e.g. to set the POST data. In my apps I have always had a handler function that returns a response: HTTP::Response handle ( const HTTP::Request& req ) { ... } but this doesn't fit as well into your arrangement.

I'm not sure I follow. The request object doesn't support operator<<. You write to a request by using something like:

    cgi::request req;
    std::string response("Content-type: text/plain\r\n\r\nHello");

    // variation 1
    cgi::write(req, cgi::buffer(response));

    // variation 2
    cgi::async_write(req, cgi::buffer(response));

    // variation 3
    cgi::reply rep;
    rep<< response;
    rep.send(req);

Perhaps the docs are misleading?

...

Perhaps you need a "Message" object to contain a request and the corresponding response? ("Message" is the RFC2616 term for a request/response pair.)

As it develops, it seems more sensible to drop the 'reply' object entirely and provide a single iostream interface on top of the request, much like there is an iostream above a tcp::socket in Boost.Asio. That design has been approved by Boost members already and seems to work quite nicely.

...

- The term "Response" is normally used rather than "Reply" (e.g. in the HTTP spec).

Noted, thanks.

...

- For multipart/form-data, each form variable has some associated metadata; e.g. for a file upload, the filename and possibly the mime type are supplied by the browser, and the charset may be indicated for other variables. I don't see any way to get at this.

That's because there isn't. ;) You can expect this to be available eventually. Thanks for the pointers, Darren

Phil Endecott

10:54 a.m.

Darren Garvey wrote:

...

On 19/08/07, Phil Endecott <spam_from_boost_dev@chezphil.org > wrote:

...
- I think it's important, especially for a Boost library, that you supply standard-library-like interfaces where possible. For example, I would like to see the form variables having a std::map-like interface. (This would be most easily achieved by actually making them a std::map, but if you choose some other implementation structure then you can use Boost.Iterator to help with the interface). This would allow me to iterate through them. At present, I don't think you have a way to get the names of all the form variables.

Oops, forgot to document this! There is a way to do this actually:

cgi::request req; cgi::map& form_map = req.meta_form();

Good. If I were you, I would advertise this as the primary way to access the data, i.e.

    string frob = req.form["frob"];

rather than

    string frob = req.meta_form("frob");

because.....

...

...
- In the '10 minute intro', you have: std::string user_name( req.meta_cookie("user_name") ); if (!user_name.empty()) { .... So is it impossible to distinguish between an empty string and a form or cookie value not being set?

Not really... I was thinking about this earlier and it could be done easily enough (I won't go into it, if you don't mind), but it would be a bit ugly,

...we all know how to do this with a std::map:

    cgi::map::const_iterator i = req.form.find("frob");
    if (i==req.form.end()) {
      ....not set....
    } else {
      string frob = i->second;
      .......
    }

(OK, you might think that's ugly too, but it's "standard ugly". The main point is that a user already knows how to do this, and doesn't need to refer to your docs to discover your is_unset() function or whatever.)
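As a self-contained version of the "standard ugly" idiom, the sketch below assumes that `cgi::map` is a plain `std::map<std::string, std::string>`, as stated earlier in the thread. The typedef `form_map` and the function `describe` are hypothetical names used only for this illustration.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> form_map;  // assumed shape of cgi::map

// Report the three states a form variable can be in, using only std::map calls.
std::string describe(const form_map& form, const std::string& name) {
    const form_map::const_iterator i = form.find(name);
    if (i == form.end()) return "not set";
    if (i->second.empty()) return "set but empty";
    return "set to '" + i->second + "'";
}
```

One caveat with the shorter `form["frob"]` spelling: on a non-const `std::map`, `operator[]` inserts an empty entry when the key is missing, so after such a lookup the variable can no longer be told apart from one that was sent empty. `find()` has no such side effect.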

...

Note that [the data map] can't currently be used for environment variables: this will probably change, but with the warning that it's going to be much slower for standard CGI.

Why? I don't think getenv is particularly slow. Actually a small library that implements a std::map-like interface to the environment variables would be useful in itself. I encourage you to take every opportunity to make components that can be used independently, as this could.

...

...
- I'm not sure about the idea of sending the response to the request object. I would think of the request object as a const thing. When I see req << "something..." I think that that's something that should only happen when the request object is being created e.g. to set the POST data. In my apps I have always had a handler function that returns a response: HTTP::Response handle ( const HTTP::Request& req ) { ... } but this doesn't fit as well into your arrangement.

I'm not sure I follow. The request object doesn't support operator<<.

Err OK. I was probably confused by "rep" and "req" not differing by many pixels, and this line from the Quickstart: "You can write to the request object directly, but for now we're going to just use the reply, which is simpler. Writing to a reply is buffered - whereas writing to the request directly isn't" Regards, Phil.

Darren Garvey

25 Aug

2:31 a.m.

Hi Phil, sorry for the late response. On 20/08/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:

...

...
cgi::request req; cgi::map& form_map = req.meta_form();

Good. If I were you, I would advertise this as the primary way to access the data, i.e.

string frob = req.form ["frob"];



...we all know how to do this with a std::map:

cgi::map::const_iterator i = req.form.find("frob"); if (i==req.form.end()) { ....not set.... } else { string frob = i->second; ....... }

(OK, you might think that's ugly too, but it's "standard ugly". The main point is that a user already knows how to do this, and doesn't need to refer to your docs to discover your is_unset() function or whatever.)

I agree that the 'standard' way should be encouraged more. At the same time, the member function version has the advantage that there is no requirement to parse all of the variables unless it's needed, which is a bonus in the case of POST data. Plus, I'm not too keen on allowing direct access to the variables. What I put in for now seems (to me) like a reasonable compromise:

    request req;
    cgi::map& form_map = req.form_(); // returns the map of the form variables
    std::string name = req.form_()["name"]; // direct access like this
    name = req.form_()["name"]->first; // or even like this!
    BOOST_ASSERT( name == req.form_("name")); // shortcut to the above (obviously)

The reasoning behind the trailing underscore is to avoid req.get() and req.post(), which I think are misleading (as is req.get_var()). This is about the fifth iteration for these functions' names, but I quite like this one. Any better suggestions?

...

Note that [the data map] can't currently be used for environment variables: this will

...
probably change, but with the warning that it's going to be much slower for standard CGI.

Why? I don't think getenv is particularly slow. Actually a small library that implements a std::map-like interface to the environment variables would be useful in itself. I encourage you to take every opportunity to make components that can be used independently, as this could.

I finally discovered `extern char ** environ;`. Exactly what's needed; I've added this so now access is uniform across variable types. :) (if you're interested: *http://tinyurl.com/2delnd* - it's not a 'map-like interface' though, it just copies the environment data into a map<string,string>. It probably needs profiling and tweaking too).

...

Err OK. I was probably confused by "rep" and "req" not differing by many pixels, and this line from the Quickstart: "You can write to the request object directly, but for now we're going to just use the reply, which is simpler. Writing to a reply is buffered - whereas writing to the request directly isn't"

No problem. I'll fix this, thanks. Regards, Darren

Darren Garvey

2:38 a.m.

On 25/08/07, Darren Garvey <lists.drrngrvy@googlemail.com> wrote:

...

name = req.form_()["name"]->first; // or even like this!

This should obviously be:

    name = req.form_().find("name")->second; // or even like this!

Sorry for the rather pointless noise.
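For readers following the accessor discussion, here is a small sketch of how the `form_()` overloads could sit on top of a `std::map`. The `request` class below is a hypothetical stand-in written for this illustration, and it assumes `cgi::map` is `std::map<std::string, std::string>` as stated earlier in the thread. With a `std::map`, `operator[]` returns the mapped string itself, while `find()` returns an iterator to a key/value pair in which `->first` is the key and `->second` is the value.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> form_map;  // assumed shape of cgi::map

// Hypothetical stand-in for the request class, just enough to show the accessors.
class request {
public:
    // Whole-map access, as in `req.form_()`.
    form_map& form_() { return vars_; }

    // Shortcut for a single value, as in `req.form_("name")`;
    // returns an empty string when the variable is not present.
    std::string form_(const std::string& name) const {
        const form_map::const_iterator it = vars_.find(name);
        return it == vars_.end() ? std::string() : it->second;
    }

private:
    form_map vars_;
};
```

Note that dereferencing the result of `find()` directly is only safe when the key is known to exist; otherwise it has to be compared against `end()` first.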

Phil Endecott

1:02 p.m.

Hi Darren, Darren Garvey wrote:

...

On 20/08/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:

...


...
cgi::request req; cgi::map& form_map = req.meta_form();

Good. If I were you, I would advertise this as the primary way to access the data, i.e.

string frob = req.form ["frob"];



...we all know how to do this with a std::map:

cgi::map::const_iterator i = req.form.find("frob"); if (i==req.form.end()) { ....not set.... } else { string frob = i->second; ....... }

(OK, you might think that's ugly too, but it's "standard ugly". The main point is that a user already knows how to do this, and doesn't need to refer to your docs to discover your is_unset() function or whatever.)

I agree that the 'standard' way should be encouraged more. At the same time, the member function version has the advantage that there is no requirement to parse all of the variables unless it's needed, which is a bonus in the case of POST data.

I think you could implement a map-like interface that parses the input lazily. Presumably you have a parser for each form-data format that provides some sort of iterator-adapter, which takes a forward iterator over the character data and returns an iterator over name-value string pairs. I think I've already suggested that these parsers should be accessible independent of the rest of the library.
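A rough sketch of the lazy idea follows. It is deliberately simpler than the iterator-adapter design suggested above: a hypothetical `lazy_form` class parses `name=value` pairs only as far as a lookup requires and caches what it has seen. The class name and its members are assumptions for illustration, and URL decoding is omitted to keep the example short.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical lazy form: parses "a=1&b=2" input only as far as a lookup needs.
class lazy_form {
public:
    explicit lazy_form(const std::string& raw) : raw_(raw), pos_(0) {}

    // Return a pointer to the value, or a null pointer if `name` was never sent.
    const std::string* find(const std::string& name) {
        std::map<std::string, std::string>::const_iterator it = parsed_.find(name);
        while (it == parsed_.end() && parse_next()) {
            it = parsed_.find(name);
        }
        return it == parsed_.end() ? nullptr : &it->second;
    }

    // Number of distinct variables parsed so far (useful to observe the laziness).
    std::size_t parsed_count() const { return parsed_.size(); }

private:
    // Parse one more name=value pair; return false once the input is exhausted.
    bool parse_next() {
        if (pos_ >= raw_.size()) return false;
        std::size_t end = raw_.find('&', pos_);
        if (end == std::string::npos) end = raw_.size();
        const std::string pair = raw_.substr(pos_, end - pos_);
        pos_ = end + 1;
        if (!pair.empty()) {
            const std::size_t eq = pair.find('=');
            parsed_.insert(std::make_pair(
                pair.substr(0, eq),
                eq == std::string::npos ? std::string() : pair.substr(eq + 1)));
        }
        return true;
    }

    std::string raw_;
    std::size_t pos_;
    std::map<std::string, std::string> parsed_;
};
```

Looking up the first variable in a long POST body touches only the first pair, whereas a lookup for a missing name necessarily consumes the whole input.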

...

Plus, I'm not too keen on allowing direct access to the variables.

Why?

...

What I put in for now seems (to me) like a reasonable compromise:

request req; cgi::map& form_map = req.form_(); // returns the map of the form variables std::string name = req.form_()["name"]; // direct access like this name = req.form_()["name"]->first; // or even like this! BOOST_ASSERT( name == req.form_("name")); // shortcut to the above (obviously)

The reasoning behind the trailing underscore, is to avoid req.get() and req.post(), which I think are misleading (as is req.get_var()). This is about the fifth iteration for these functions' names, but I quite like this one. Any better suggestions?

req.form_data

...

...
Note that [the data map] can't currently be used for environment variables: this will

...
probably change, but with the warning that it's going to be much slower for standard CGI.

Why? I don't think getenv is particularly slow. Actually a small library that implements a std::map-like interface to the environment variables would be useful in itself. I encourage you to take every opportunity to make components that can be used independently, as this could.

I finally discovered `extern char ** environ;`. Exactly what's needed; I've added this so now access is uniform across variable types. :) (if you're interested: *http://tinyurl.com/2delnd* - it's not a 'map-like interface' though, it just copies the environment data into a map<string,string>. It probably needs profiling and tweaking too).

Well a const map<string,const char*> would probably be about as good as you can get, i.e. don't copy the values of the environment variables. Regards, Phil.
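A minimal sketch of the non-copying variant, assuming a POSIX system where `environ` is available. `index_environment` is a hypothetical name; the variable names are still copied into the keys, but each mapped value points straight into the process's environment block.

```cpp
#include <cstring>
#include <map>
#include <string>

extern "C" char** environ;  // POSIX: null-terminated array of "NAME=value" strings

// Index the environment without copying the values.
std::map<std::string, const char*> index_environment() {
    std::map<std::string, const char*> env;
    for (char** entry = environ; entry != 0 && *entry != 0; ++entry) {
        const char* kv = *entry;
        const char* eq = std::strchr(kv, '=');
        if (eq == 0) continue;  // skip malformed entries without '='
        env[std::string(kv, static_cast<std::size_t>(eq - kv))] = eq + 1;
    }
    return env;
}
```

The trade-off is lifetime: the stored pointers are only as stable as the environment itself, so they should be treated as invalid if the program later modifies its environment (for example with `setenv` or `putenv`). The copying `map<string,string>` approach avoids that at the cost of duplicating every value.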

Jean-Christophe Roux

17 Aug

9:24 p.m.

Darren Garvey wrote:

...

Hello all,

Hello Darren, Here is basic use case. I have this ajaxian website with a button that, on a click, calls a php script taking a couple of POST parameters and returning a string. I suspect that C++ would nicely speed up the process. What would be the best use of your library? Of course, I worry about the time to start the process. Regards Jean-Christophe

Darren Garvey

18 Aug

6:50 p.m.

Hi Jean-Christophe, On 17/08/07, Jean-Christophe Roux <jcxxr@yahoo.com> wrote:

...

Darren Garvey wrote:

...
Hello all,

Hello Darren,

Here is basic use case. I have this ajaxian website with a button that, on a click, calls a php script taking a couple of POST parameters and returning a string. I suspect that C++ would nicely speed up the process. What would be the best use of your library? Of course, I worry about the time to start the process.

Well, FastCGI is ideal for AJAX functions. It was supposed to be implemented by now, but unfortunately you'll have to wait a little bit longer. :(

The structure of the library has been mostly worked out, so I'm '85% sure' that the following code will work, eventually. Note that the following program is basically a backend daemon and will handle an arbitrary number of requests (you'll probably limit this with your server though).

    #include <boost/cgi/fcgi.hpp>
    #include <boost/thread.hpp>
    #include <boost/bind.hpp>

    int sub_main(cgi::fcgi_request& req)
    {
      cgi::reply rep; // use this to write to the client

      // First, check the client didn't send us too much data
      if (req.content_length() > 2048)
      {
        return req.close(cgi::http::bad_request, 1);
      }

      rep<< "This will be output to the client"
         << "the POST variable 'post_var_1' has a value"
         << req.meta_post("post_var_1", true);
      // the 'true' above tells the request to read and parse the client's post data until
      // either 'post_var_1' is found or the end of POST data is reached.

      rep.send(req); // return the response to the client.
      return req.close(cgi::http::ok, 0);
    }

    int main(int,char**)
    {
      // This is in the examples: http://tinyurl.com/27vnnx
      cgi::fcgi_threadpool_server server;
      boost::thread t1(boost::bind(&cgi::fcgi_threadpool_server::run, server));
      t1.join();
      return 0;
    }

Regards,
Darren

Darren Garvey

6:53 p.m.

...

<snip> // This is in the examples: http://tinyurl.com/27vnnx cgi::fcgi_threadpool_server server; boost::thread t1(boost::bind(&cgi::fcgi_threadpool_server::run, server); t1.join();

Oops. This should of course be:

    // Run a server with a threadpool of 15 threads, calling `sub_main` every time a request is
    // ready to be run
    cgi::fcgi_threadpool_server server(15, &sub_main);
    server.run();

Regards,
Darren

Mathias Gaunard

19 Aug

7:38 p.m.

Jean-Christophe Roux wrote:

...

Here is basic use case. I have this ajaxian website with a button that, on a click, calls a php script taking a couple of POST parameters and returning a string. I suspect that C++ would nicely speed up the process. What would be the best use of your library? Of course, I worry about the time to start the process.

The time to start the process is an issue that always exists with CGI, and can be reduced by using FastCGI or SCGI. Anyway, if you use PHP as an apache module, this is not only highly unsafe but also quite annoying, permissions-wise and stuff. The only right way to do such a thing is using CGI, FastCGI or SCGI.

Peter Foley

11:14 a.m.

Hi Darren,

I am glad to see that you're making progress! I have limited time this week to look at it but I will see what I can do as I am interested in this library. I have had a quick look through the documentation and have a couple of questions (they could go into an FAQ).

1. I know you have a section on "Server Support" (which seems to be a placeholder atm) but which web servers have you tested this with? Have you only tried this with apache? Is the library code portable?

2. Somewhat related to the above, if the library code is portable, will the FASTCGI support you are building in only support pipes or will it also support TCP sockets? The reason I am asking is that IIS7 will be providing support for the FASTCGI protocol (see http://www.iis.net/default.aspx?tabid=1000051 for more info).

Thanks, Peter.

Darren Garvey

7:38 p.m.

Hi Peter, On 19/08/07, Peter Foley <peter@ifoley.id.au> wrote:

...

<snip> 1. I know you have a section on "Server Support" (which seems to be a placeholder atm) but which web servers have you tested this with? Have you only tried this with apache? Is the library code portable?

I have access to apache 1.3.*/2.* and lighttpd 1.4.*/1.5.* on windows, cygwin and linux, but I haven't been testing on all of them yet since there are more fundamental things that need to be done first. :-(

...

2. Somewhat related to the above if the library code is portable will the FASTCGI support you are building in only support pipes or will it also support TCP sockets? The reason I am asking is that IIS7 will be providing support for the FASTCGI protocol (see http://www.iis.net/default.aspx?tabid=1000051 for more info).

Currently Boost.Asio doesn't support pipes, so this library won't either, so it uses TCP sockets. It's undecided whether, when pipe support comes about, the use of the two will be a run-time or compile-time choice. Over the summer the leaning has changed from run- to compile-time, but this is debatable... Regards, Darren
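The run-time versus compile-time question can be illustrated without any networking code. In the sketch below every name (`tcp_transport`, `pipe_transport`, `static_acceptor`, `transport_base`, `transport_adapter`, `make_transport`) is hypothetical and unrelated to the real Boost.Asio or CGI library types; the transports only report their names.

```cpp
#include <memory>
#include <string>

// Two hypothetical transports with the same interface.
struct tcp_transport  { std::string name() const { return "tcp"; } };
struct pipe_transport { std::string name() const { return "pipe"; } };

// Compile-time choice: the transport is a template parameter, fixed per type.
template <typename Transport>
class static_acceptor {
public:
    std::string transport_name() const { return transport_.name(); }
private:
    Transport transport_;
};

// Run-time choice: the transport hides behind a virtual interface.
struct transport_base {
    virtual ~transport_base() {}
    virtual std::string name() const = 0;
};

template <typename Transport>
struct transport_adapter : transport_base {
    std::string name() const { return impl.name(); }
    Transport impl;
};

// Pick a transport from a run-time flag, e.g. a configuration setting.
std::unique_ptr<transport_base> make_transport(bool use_pipe) {
    if (use_pipe) {
        return std::unique_ptr<transport_base>(new transport_adapter<pipe_transport>());
    }
    return std::unique_ptr<transport_base>(new transport_adapter<tcp_transport>());
}
```

The compile-time form avoids virtual dispatch and lets unsupported transports be left out of a build entirely; the run-time form lets one binary serve either kind of server configuration.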

Martin Wille

2:32 p.m.

Darren Garvey wrote:

...

Docs are located at: http://cgi.sf.net

I had a quick glance at the docs. On the "supported protocols" page you list when which of the supported protocols could be used. An important reason - maybe the most important one - to use FCGI or SCGI is missing: privilege separation. With S/FCGI you can run the server process under a different user id, with different rights to access certain resources, even with a different view on the system the software runs on (e.g. chroot), or on a separate machine. Regards, m

Peter Dimov

3:25 p.m.

Martin Wille wrote:

...

Darren Garvey wrote:

...
Docs are located at: http://cgi.sf.net

I had a quick glance at the docs.

So did I. Sorry about replying to this post instead of the original, I've deleted it. I'm not sure I like the cgi::xcgi_service scheme and the cgi::service typedef'ing depending on what headers are included. How about just providing xcgi::service? Then the appropriate protocol can be chosen via 'using namespace xcgi' or 'namespace cgi = xcgi'.
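The namespace-per-protocol suggestion might look roughly like this in use. It is a sketch under assumptions: the namespaces `fcgi` and `scgi` and their `service` types are hypothetical stand-ins for whatever the library would actually provide, reduced to a single function so the example is self-contained.

```cpp
#include <string>

// Hypothetical per-protocol namespaces, each exposing the same names.
namespace fcgi {
struct service {
    static std::string protocol() { return "fcgi"; }
};
}

namespace scgi {
struct service {
    static std::string protocol() { return "scgi"; }
};
}

// The program picks its protocol once, explicitly, rather than through
// whichever header happened to be included.
namespace cgi = fcgi;

// Code written against `cgi::` follows the alias.
std::string active_protocol() { return cgi::service::protocol(); }
```

Switching protocols then means changing the one alias line, and since a namespace alias cannot be redefined to name a different namespace in the same scope, two conflicting choices in one translation unit would be rejected by the compiler rather than silently picking one.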

Darren Garvey

7:27 p.m.

On 19/08/07, Peter Dimov <pdimov@pdimov.com> wrote:

...

I'm not sure I like the cgi::xcgi_service scheme and the cgi::service typedef'ing depending on what headers are included. How about just providing xcgi::service? Then the appropriate protocol can be chosen via 'using namespace xcgi' or 'namespace cgi = xcgi'.

This is something I've been meaning to get feedback on, actually. The problem with giving things their own namespace is IMO it gets a bit ugly, for example: boost::cgi::fcgi::service service; The thing I've been wondering is about a single library dumping more than one namespace into the boost namespace, so you'd have boost::cgi, boost::fcgi and boost::scgi. I guessed that idea would be shot down in flames though. Why do you not like the typedef-header scheme? Too fickle? Regards, Darren

Martin Wille

7:36 p.m.

Darren Garvey wrote:

...

Why do you not like the typedef-header scheme? Too fickle?

It's fragile at the user's site. E.g. if a user uses two libraries that happen to #include two different Boost.CGI (tentative name :) headers then things break horribly. Of course, you could blame those libraries for their leaky abstractions, but the root cause is that you needlessly gave people a gun to shoot themselves in their feet. Regards, m

Darren Garvey

7:43 p.m.

On 19/08/07, Martin Wille <mw8329@yahoo.com.au> wrote:

...

Darren Garvey wrote:

...
Why do you not like the typedef-header scheme? Too fickle?

It's fragile at the user's site. E.g. if a user uses two libraries that happen to #include two different Boost.CGI (tentative name :) headers then things break horribly. Of course, you could blame those libraries for their leaky abstractions, but the root cause is that you needlessly gave people a gun to shoot themselves in their feet.

Well two protocols can't ever be used in the same program, so I don't think the situation you describe could ever arise. Also, if a library/program is designed for only one protocol, a macro like 'BOOST_CGI_EXPLICIT_XCGI' could be defined, which would mean that library can only be used with a particular protocol. My lack of experience might be showing here though: this could for all I know be a generally recognised timebomb... Regards, Darren

Martin Wille

7:53 p.m.

Darren Garvey wrote:

...

On 19/08/07, Martin Wille wrote:

...

Well two protocols can't ever be used in the same program,

Why is that? Can't I have a thread answering fcgi requests while another one answers scgi requests? Or even a single thread that can answer either sort of request? Regards, m

Darren Garvey

8:26 p.m.

On 19/08/07, Martin Wille <mw8329@yahoo.com.au> wrote:

...

Darren Garvey wrote:

...
On 19/08/07, Martin Wille wrote:

...
Well two protocols can't ever be used in the same program,

Why is that? Can't I have a thread answering fcgi requests while another one answers scgi requests? Or even a single thread that can answer either sort of request?

Hmm, well the program has to bind to port 0, where connections are connected to. I have looked into this and as far as I can tell, there is no efficient way to differentiate between an SCGI request and a FastCGI one: it is possible, but you would have a noticeable overhead to do this. The only situation I can think of where you would have a program accepting with both protocols is when you have a remote SCGI/FastCGI daemon handling requests from different HTTP servers. It doesn't seem sensible to use both protocols from a single server, as choosing one over the other is, IIUC, a configuration issue. In the remote daemon case, why would you not just recompile the program twice, one daemon for each protocol? Regards, Darren
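For reference, a rough sketch of how the two wire formats could be told apart on a shared connection, based on their published formats: a FastCGI record starts with a version byte (1 for FCGI_VERSION_1), while an SCGI request starts with a netstring length written in ASCII decimal digits. The names `wire_protocol` and `sniff_protocol` are made up for this example, and it says nothing about what the overhead would be inside the library.

```cpp
#include <cstddef>

enum class wire_protocol { fastcgi, scgi, unknown };

// Guess the protocol from the first byte received on a connection.
// FastCGI: first byte is the record version (binary 1).
// SCGI: first byte is an ASCII digit, the start of the netstring length.
wire_protocol sniff_protocol(const unsigned char* data, std::size_t size) {
    if (size == 0) return wire_protocol::unknown;
    if (data[0] == 1) return wire_protocol::fastcgi;
    if (data[0] >= '0' && data[0] <= '9') return wire_protocol::scgi;
    return wire_protocol::unknown;
}
```

This only inspects one byte that has to be read anyway; a real implementation would still need to validate the rest of the header before trusting the guess.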

Martin Wille

9:06 p.m.

Darren Garvey wrote:

...

On 19/08/07, Martin Wille wrote:

...
Darren Garvey wrote:

...
On 19/08/07, Martin Wille wrote:

Well two protocols can't ever be used in the same program,

Why is that? Can't I have a thread answering fcgi requests while another one answers scgi requests? Or even a single thread that can answer either sort of request?

Hmm, well the program has to bind to port 0, where connections are connected to.

What? Port 0?

...

I have looked into this and as far as I can tell, there is no efficient way to differentiate between an SCGI request and a FastCGI one: it is possible, but you would have a noticeable overhead to do this.

A program can listen() to and accept() connections from more than one port. Offering SCGI and FCGI over the same port would indeed not make a lot of sense.
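To illustrate listening on more than one port, here is a rough POSIX sketch. It assumes a Unix-like system; `open_listener`, `bound_port` and `wait_for_listener` are hypothetical helper names, and nothing here uses the CGI library or Boost.Asio. One listener could serve FastCGI and the other SCGI.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Open a TCP listening socket on the loopback interface.
// Passing 0 as the port asks the OS to pick any free port. Returns -1 on failure.
int open_listener(unsigned short port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        ::listen(fd, 16) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Report the port a listener was actually bound to (0 on failure).
unsigned short bound_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof addr;
    if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) return 0;
    return ntohs(addr.sin_port);
}

// Wait until one of the two listeners has a pending connection.
// Returns 0 if the first is ready, 1 if the second is, -1 on timeout or error.
int wait_for_listener(int first_fd, int second_fd, int timeout_ms) {
    pollfd fds[2] = {{first_fd, POLLIN, 0}, {second_fd, POLLIN, 0}};
    if (::poll(fds, 2, timeout_ms) <= 0) return -1;
    if (fds[0].revents & POLLIN) return 0;
    if (fds[1].revents & POLLIN) return 1;
    return -1;
}
```

The caller would `accept()` on whichever listener is reported ready and hand the connection to the matching protocol handler.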

...

The only situation I can think of where you would have a program accepting with both protocols is when you have a remote SCGI/FastCGI daemon handling requests from different HTTP servers.

It can be a single HTTP server that mounts different SCGI/FCGI server addresses at different points in its document tree. Not a very likely setup, of course. Another scenario: imagine a closed-source FCGI/SCGI hybrid server. That server can't predict whether FCGI or SCGI will be supported better on the HTTP server that gets used by a customer. The scenario you mentioned isn't uncommon, either. E.g. consider an FCGI/SCGI hybrid in a DMZ. That hybrid could get accessed from an HTTP server inside the DMZ and from another HTTP server from the intranet of a company. One could also imagine SCGI or FCGI being used by programs that do not happen to be a web server. (Supporting *CGI from the other side would be a nice addition to your library ;)

...

It doesn't seem sensible to use both protocols from a single server, as choosing one over the other is, IIUC, a configuration issue. In the remote daemon case, why would you not just recompile the program twice, one daemon for each protocol?

E.g. because it can make sense to implement locks inside a single application instead of using some additional IPC mechanism for them. Regards, m

Darren Garvey

20 Aug 20 Aug

3:55 a.m.

Martin,

On 19/08/07, Martin Wille <mw8329@yahoo.com.au> wrote:

...

...
Hmm, well the program has to bind to port 0, where connections are connected to.

What? Port 0?

Hmm... OK, they both bind to the stdin file descriptor (0). I've made this mistake before and obviously 'informed' myself with sketchy info. I'm very sorry for the (rotten) noise, but I'm glad my head's out of the sand (again, for now). Nasty. Anyway, a single program *can* indeed work with both protocols. This requires only a trivial addition to the interface and only minor changes internally. However, it does make the selective-header/implicit-typedef strategy seem much more fickle. <snip>

...

Another scenario: imagine a closed-source FCGI/SCGI hybrid server. That server can't predict whether FCGI or SCGI will be supported better on the HTTP server that gets used by a customer.

Noted. Providing two different port numbers is much simpler than providing two versions of the program.

...

One could also imagine SCGI or FCGI being used by programs that do not happen to be a web server. (Supporting *CGI from the other side would be a nice addition to your library ;)

That's a large pool of use-cases that needs to be catered for, no doubt. Using a FastCGI daemon as an intermediate server/filter/load-balancer/etc. to other back-end processes is a very reasonable use-case, for example. Thanks for your persistence, Martin, that was no small brain-wrong you pointed out. :) Regards, Darren

Peter Dimov

19 Aug 19 Aug

7:57 p.m.

Darren Garvey wrote:

...

The thing I've been wondering is about a single library dumping more than one namespace into the boost namespace, so you'd have boost::cgi, boost::fcgi and boost::scgi. I guessed that idea would be shot down in flames though.

I see nothing wrong with that.

Martin Wille

8:21 p.m.

Darren Garvey wrote:

...

This is something I've been meaning to get feedback on, actually. The problem with giving things their own namespace is IMO it gets a bit ugly, for example:

boost::cgi::fcgi::service service;

People can easily use typedefs or namespace aliases if they want to avoid the long names. Regards, m

Darren Garvey

8:34 p.m.

On 19/08/07, Martin Wille <mw8329@yahoo.com.au> wrote:

...

Darren Garvey wrote:

...
Docs are located at: http://cgi.sf.net

I had a quick glance at the docs. On the "supported protocols" page you list when which of the supported protocols could be used. An important reason - maybe the most important one - to use FCGI or SCGI is missing: privilege separation. With S/FCGI you can run the server process under a different user id, with different rights to access certain resources, even with a different view on the system the software runs on (e.g. chroot), or on a separate machine.

Good point. You reminded me of another advantage not explicitly mentioned: since S/FastCGI daemons connect to the server via TCP sockets, they can be in remote locations. FastCGI offers some security support too. Thanks for that! Darren

27 comments

8 participants

Darren Garvey
Eric Niebler
Jean-Christophe Roux
Martin Wille
Mathias Gaunard
Peter Dimov
Peter Foley
Phil Endecott