[beast] Chunking example
How would an example of receiving a chunked transfer using the synchronous API look? In pseudo-code, I want to do as follows:

establish connection
while not closed
    read chunk
    print chunk
On Sun, Jul 2, 2017 at 5:41 AM, Bjorn Reese via Boost
How would an example of receiving a chunked transfer using the synchronous API look? In pseudo-code, I want to do as follows:
establish connection
while not closed
    read chunk
    print chunk
That's going to be quite involved: you would need to subclass
beast::http::basic_parser and implement on_chunk to remember the chunk
size. It is not something that I recommend, nor is it a common
use-case. HTTP applications are not supposed to care about the
boundaries between chunks, since intermediates like proxies are allowed
to re-frame chunked message payloads. However, some applications may
wish to decode the chunk-extension; Beast handles that, but you
have to subclass beast::http::basic_parser for it.
It's possible that what you are really asking is how to read a message
payload incrementally. One of the examples performs a similar
operation:
http://vinniefalco.github.io/beast/beast/more_examples/http_relay.html
Something like this should achieve your goal (note, untested):
/* This function reads a message using a fixed size buffer to hold
   portions of the body, and prints the body contents to a `std::ostream`.
*/
template<
    bool isRequest,
    class SyncReadStream,
    class DynamicBuffer>
void
read_and_print_body(
    std::ostream& os,
    SyncReadStream& stream,
    DynamicBuffer& buffer,
    error_code& ec)
{
    parser<isRequest, buffer_body> p;
    read_header(stream, buffer, p, ec);
    if(ec)
        return;
    while(! p.is_done())
    {
        // Provide a small buffer to receive the next piece of the body
        char buf[512];
        p.get().body.data = buf;
        p.get().body.size = sizeof(buf);
        read(stream, buffer, p, ec);
        // need_buffer just means the buffer was filled; not an error
        if(ec == error::need_buffer)
            ec = {};
        if(ec)
            return;
        // Print whatever portion of the body arrived
        os.write(buf, sizeof(buf) - p.get().body.size);
    }
}
Vinnie Falco wrote:
It's possible that what you are really asking is how to read a message payload incrementally. One of the examples performs a similar operation: http://vinniefalco.github.io/beast/beast/more_examples/http_relay.html
Something like this should achieve your goal (note, untested):
This preempts one of my questions; I was going to ask for precisely that example, as it's referenced in the documentation as a use case here: http://vinniefalco.github.io/stage/beast/review/beast/using_http/parser_stre... as "* Receive a large body using a fixed-size buffer."
On Sun, Jul 2, 2017 at 6:52 AM, Bjorn Reese via Boost
Can I differentiate between headers and trailers, or are they mixed together?
They are mixed together. You can look at the "Trailer" field after receiving the header to know which trailers have been promised, and then, when the message is complete, inspect the headers again looking for those specific fields. If you absolutely need to distinguish them (for example, to perform more robust error checking, or to reject trailers which were not promised or which are disallowed per rfc7230), then you can subclass basic_parser and provide a suitable implementation of on_field. In that case you could store them in a separate container if desired.
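A rough sketch of the first approach (note, untested; the promised trailer name "X-Checksum" is made up for illustration):

/* Read a message, then print a promised trailer after completion. */
template<class SyncReadStream, class DynamicBuffer>
void
print_promised_trailers(
    std::ostream& os,
    SyncReadStream& stream,
    DynamicBuffer& buffer,
    error_code& ec)
{
    parser<false, string_body> p;
    read_header(stream, buffer, p, ec);
    if(ec)
        return;
    // The "Trailer" field names the trailers the sender has promised
    os << "promised: " << p.get()[field::trailer] << "\n";
    read(stream, buffer, p, ec); // read the rest of the message
    if(ec)
        return;
    // The promised fields are now mixed in with the other fields
    os << "X-Checksum: " << p.get()["X-Checksum"] << "\n";
}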
On 07/02/2017 03:09 PM, Vinnie Falco via Boost wrote:
to re-frame chunked message payloads. However, some applications may wish to decode the chunk-extension; Beast handles that, but you have to subclass beast::http::basic_parser for it.
So chunk-ext fields are not passed to the user unless they write a parser to extract them?
On Sun, Jul 2, 2017 at 7:34 AM, Bjorn Reese via Boost
So chunk-ext fields are not passed to the user unless they write a parser to extract them?
If you want the chunk extensions you have to subclass basic_parser
(not the same as writing a parser). Here's an example of what a
user-defined subclass could start with:
template<bool isRequest>
class chunk_parser
    : public basic_parser<isRequest, chunk_parser<isRequest>>
{
    friend class basic_parser<isRequest, chunk_parser>;

    // Called after each chunk header is parsed; `size` is the
    // size of the upcoming chunk body and `ext` holds the
    // chunk-extension, if any
    void
    on_chunk(std::uint64_t size, string_view ext, error_code& ec)
    {
        // remember the size, decode the extension, ...
    }

    // ... plus the other on_* callbacks required by basic_parser
};
On 07/02/2017 04:44 PM, Vinnie Falco via Boost wrote:
If you want the chunk extensions you have to subclass basic_parser (not the same as writing a parser). Here's an example of what a user-defined subclass could start with:
Chunking is (abstractly speaking) similar to WebSocket frames, so I am surprised at how different these solutions are.
On Sun, Jul 2, 2017 at 9:18 AM, Bjorn Reese via Boost
On 07/02/2017 04:44 PM, Vinnie Falco via Boost wrote: Chunking is (abstractly speaking) similar to WebSocket frames, so I am surprised at how different these solutions are.
Actually Beast treats them very similarly. It's not generally possible in Beast to see the WebSocket frame boundaries either (especially if you have compression on). It might look like you can see frame boundaries because of the function `websocket::stream::read_frame`, but that function can cross boundaries or return less than a full frame. As per rfc6455, WebSocket interfaces presented to the application (`beast::websocket::stream` in this case) are free to reframe messages.
On 07/02/2017 06:19 PM, Vinnie Falco via Boost wrote:
Actually Beast treats them very similarly. It's not generally possible
But their APIs are quite different.
in Beast to see the WebSocket frame boundaries either (especially if you have compression on). It might look like you can see frame boundaries because of the function `websocket::stream::read_frame`, but that function can cross boundaries or return less than a full frame.
As per rfc6455, WebSocket interfaces presented to the application (`beast::websocket::stream` in this case) are free to reframe messages.
Does Beast reframe WebSocket frames or HTTP chunks?
On Sun, Jul 2, 2017 at 9:49 AM, Bjorn Reese via Boost
But their APIs are quite different.
I'm not sure I agree that the claimed abstract similarity of the concept warrants efforts to make the interfaces "consistent." The APIs were not planned ahead of time; they were arrived at after months of reworking to handle all of the use cases. Some use-cases were made very easy, at the expense of some other less common use-cases requiring more effort (such as extracting chunk extensions).
Does Beast reframe WebSocket frames...
Yes
or HTTP chunks?
Chunk-encoding is removed from incoming messages before presentation to the application layer. For outgoing messages, Beast is in control of how pieces are chunked. However, the Body::reader type has influence but not complete control (the specification is intentionally vague). Users can set chunk extensions and trailers by using a ChunkDecorator: http://vinniefalco.github.io/beast/beast/using_http/serializer_stream_operat... Thanks
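Roughly, a decorator is a function object with two overloads; this is a guess at the shape based on the documentation link above (untested, and the returned strings are made up for illustration):

struct my_decorator
{
    // Called with the buffers of each chunk about to be written;
    // the returned string is emitted as that chunk's chunk-extension
    template<class ConstBufferSequence>
    string_view
    operator()(ConstBufferSequence const&) const
    {
        return ";x-seq=42";
    }

    // Called with null_buffers at the end of the message;
    // the returned string is emitted as the trailer
    string_view
    operator()(boost::asio::null_buffers) const
    {
        return "X-Checksum: f00d\r\n";
    }
};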
On 07/02/2017 07:16 PM, Vinnie Falco via Boost wrote:
On Sun, Jul 2, 2017 at 9:49 AM, Bjorn Reese via Boost wrote:
Does Beast reframe WebSocket frames...
Yes
Under what conditions? The reason I ask is that Beast claims to provide a low-level API, so I would assume that users can have complete control over this.
or HTTP chunks?
Chunk-encoding is removed from incoming messages before presentation to the application layer. For outgoing messages, Beast is in
Does Beast concatenate chunks before passing them to the application layer, or does the application layer get the chunks in the same way as they were received?
control of how pieces are chunked. However, the Body::reader type has influence but not complete control (the specification is intentionally vague).
Why does the user not have complete control? Under what conditions will Beast reframe outgoing chunks?
On Sun, Jul 2, 2017 at 10:51 AM, Bjorn Reese via Boost
Does Beast reframe WebSocket frames... ... Yes ... Under what conditions?
There are stream settings for auto-fragmentation of messages into frames of prescribed sizes. Also, when the permessage-deflate algorithm is enabled, Beast reframes messages to fit in the fixed compression and decompression buffers it uses.
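For example (untested sketch, assuming the option names from the Beast websocket docs):

boost::asio::io_service ios;
websocket::stream<boost::asio::ip::tcp::socket> ws{ios};
ws.auto_fragment(true);     // split outgoing messages into frames
ws.write_buffer_size(4096); // of a size governed by the write buffer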
The reason I ask is that Beast claims to provide a low-level API, so I would assume that users can have complete control over this.
I don't think that's a good assumption. From https://tools.ietf.org/html/rfc6455#section-1.2 "The WebSocket message does not necessarily correspond to a particular network layer framing, as a fragmented message may be coalesced or split by an intermediary." beast::websocket::stream is considered the first intermediary, and it exercises its privilege to reframe message octets. This is established practice: Autobahn websocket implementations also do this; they have a similar auto-fragment feature (which I copied).
or HTTP chunks?
The "low-level" claims refer to the implementation of the HTTP protocol. It doesn't imply that callers will have control over chunks. Intermediaries along the HTTP path are allowed to rechunk message payloads. Perhaps I have overlooked something. What is the use-case for application level control over chunking?
Does Beast concatenate chunks before passing them to the application layer, or does the application layer get the chunks in the same way as they were received?
The application sees the message body only after the chunked Transfer-Encoding has been removed.
Why does the user not have complete control? Under what conditions will Beast reframe outgoing chunks?
Beast's implementation does not currently reframe serialized chunks, but there is no specification for how Beast will perform the chunked encoding. This could be changed in the future, if there was a compelling use-case. Thanks
On 07/02/2017 08:04 PM, Vinnie Falco via Boost wrote:
Perhaps I have overlooked something. What is the use-case for application level control over chunking?
The publish-subscribe pattern. Consider a device (e.g. a sensor), that publishes events (e.g. measurements) that are distributed unevenly and unpredictably over time. Each chunk contains a single event, so preserving the chunk boundary is important.
The application sees the message body only after the chunked Transfer-Encoding has been removed.
Can you elaborate with an example?
Beast's implementation does not currently reframe serialized chunks, but there is no specification for how Beast will perform the chunked encoding. This could be changed in the future, if there was a compelling use-case.
Beast should not break the end-to-end principle unless explicitly allowed by the application.
On Tue, Jul 4, 2017 at 10:49 AM, Bjorn Reese via Boost
That was scary reading, because HTTPbis seemed to be unaware that such deprecation would break some of IETF's own standards, e.g. RFC 3507.
I agree :) That's why I made sure that Beast is capable of sending and receiving chunk-extensions.
Another use-case that is used in practice is for in-band meta-data. ... chunk extensions are used to carry the meta-data.
That makes sense, and that's a supported use-case. You can send the extensions and you can receive them.
Consider a device (e.g. a sensor), that publishes events (e.g. measurements) that are distributed unevenly and unpredictably over time. Each chunk contains a single event, so preserving the chunk boundary is important.
You're talking about a "perpetual message" of some kind, where the end of the message never arrives. I think that's kind of an abuse of HTTP. Isn't WebSocket more suited for that? It supports message boundaries which are not rewritten by intermediates (unlike HTTP).

With respect to detecting chunk boundaries on input, this is possible, but... I can't guarantee it on output. In fact I'm not even sure how you would convince Beast to keep a message in "perpetually sending" mode, since that is a use-case not anticipated by rfc7230.

I think the sane way to do what you want is to use Beast for sending the header, which works great, and then take responsibility for outputting the body chunks yourself. True, you'll have to get your hands dirty with the protocol details of HTTP, but that's already assumed because you 1. want to know about chunks, 2. want to control their output, 3. want to send an infinitely long message. I'm not losing sleep that Beast doesn't make this use-case ridiculously easy.
The application sees the message body only after the chunked Transfer-Encoding has been removed. ... Can you elaborate with an example?
Let me rephrase that: beast::http::parser appends chunk bodies to the body container stored in the message (a std::string, for example).
From the caller's perspective, they just see that the message body is growing. There's no guarantee that each increment of growth corresponds to exactly one chunk.
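For instance, if the peer sends

HTTP/1.1 200 OK
Transfer-Encoding: chunked

5
Hello
7
, world
0

then after a successful read with parser<false, string_body>, the body held in the message is simply "Hello, world"; the boundary between the 5-byte and 7-byte chunks is not visible to the caller.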
However, if you want to see chunks, that is possible by subclassing
basic_parser and handling it in the on_chunk callback, which is called
after the chunk header is received and gets passed the size of the
upcoming body as well as the chunk-extension.
I suppose I could easily add an optional user-facing callback to
beast::http::parser with the signature
Beast should not break the end-to-end principle unless explicitly allowed by the application.
I think if users demand control over outgoing chunks, its reasonable to ask that they manually output the chunked message body; Beast will take care of the header.
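For reference, producing a chunk by hand is just the rfc7230 framing: the payload size in hex, CRLF, the payload, CRLF, with a final "0\r\n\r\n" ending the body. A sketch (untested; needs <array>, <cstdio>, and Boost.Asio):

template<class SyncWriteStream>
void
write_one_chunk(SyncWriteStream& stream,
    boost::asio::const_buffer payload, error_code& ec)
{
    // chunk-size in hex, followed by CRLF
    char header[20];
    auto const n = std::snprintf(header, sizeof(header),
        "%zx\r\n", boost::asio::buffer_size(payload));
    // chunk header, chunk data, closing CRLF
    std::array<boost::asio::const_buffer, 3> chunk{{
        boost::asio::buffer(header, static_cast<std::size_t>(n)),
        payload,
        boost::asio::buffer("\r\n", 2)}};
    boost::asio::write(stream, chunk, ec);
}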
Is it possible to provide an HTTP 200 response alias for Beast? ... This is needed for SHOUTcast internet radio servers.
There are no provisions for this. Beast's parser is strict. Responses which don't start with a valid HTTP-version generate an error. That said, I recognize that sometimes in the real world, if you want to get into the club you have to pay the cover. I can add a simple customization point to basic_parser, giving the derived class the opportunity to parse the status-line if it wants, to handle this case. However, it won't be for free: I'll kindly ask that the stakeholders (the people who intend to use the feature) participate in the design and code review so that the result suits their needs. It's not my area of knowledge, so if we want to get this right, those with more experience should weigh in. And by the way, thanks for pointing this out! This is exactly the sort of thing that Beast needs in order to mature into something great.
The chunk decorator uses boost::asio::null_buffers to overload the field trailer.
Given your desire to adapt Beast to Networking TS, do notice that null_buffers are not part of N4656 (they have been replaced by dedicated wait functions.)
Good point!! That will have to be addressed when Beast is ported. Right now the chunk decorator is pretty raw. Again, this is a case where a feature has been created with no stakeholders to vet the design. I wrote the minimum amount of interface necessary to make sure Beast supports the feature. With feedback from people actually using it in the field, it can be improved. Thanks
On 5/07/2017 06:18, Vinnie Falco wrote:
You're talking about a "perpetual message" of some kind, where the end of the message never arrives. I think that's kind of an abuse of HTTP. Isn't Websocket more suited for that? It supports message boundaries which are not rewritten by intermediates (unlike HTTP).
With respect to detecting chunk boundaries on input, this is possible, but... I can't guarantee it on output. In fact I'm not even sure how you would convince Beast to keep a message in "perpetually sending" mode since that is a use-case not anticipated by rfc7230.
Perpetual documents are relatively common in server push techniques, even before chunked transfer was a thing. (multipart/x-mixed-replace somewhat resembles chunked transfer, except with the behaviour that the new content completely replaces the old content instead of being appended to it.) The other common pattern (Comet style long poll) is to accept a request and parse the headers but then just keep the connection open indefinitely before finally sending a response. Granted these are both old technologies now and SSE and WebSockets are "better", but there are still environments where the older ones work and the newer ones don't, and examples still exist in the wild.
On Tue, Jul 4, 2017 at 4:31 PM, Gavin Lambert via Boost
The other common pattern (Comet style long poll) is to accept a request and parse the headers but then just keep the connection open indefinitely before finally sending a response.
That's a supported use case: call `write_header` or `async_write_header` with the `serializer`, and then at your leisure call `write` or `async_write` with the same `serializer` to finish.
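An untested sketch of that pattern (member names may differ slightly between Beast versions):

response<string_body> res;
res.result(status::ok);
res.set(field::server, "Beast");
res.body = "it happened\r\n";
// set Content-Length or Transfer-Encoding as appropriate
serializer<false, string_body, fields> sr{res};
write_header(stream, sr, ec);  // the header goes out immediately
if(ec)
    return;
// ... wait for the event, at your leisure ...
write(stream, sr, ec);         // then finish the message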
On 07/04/2017 08:18 PM, Vinnie Falco via Boost wrote:
With respect to detecting chunk boundaries on input, this is possible, but... I can't guarantee it on output. In fact I'm not even sure how
Why not? When the user passes a buffer to Beast in chunked mode, serialization can convert it directly into a chunk. That ought to be very easy to do.
you would convince Beast to keep a message in "perpetually sending" mode since that is a use-case not anticipated by rfc7230.
Are you implying that Beast has an upper limit for persistent connections? If not, what are you trying to tell me? The use cases I have mentioned are way older than RFC 7230 (and RFC 6455 to answer your "why not WebSocket?" question.)
I suppose I could easily add an optional user-facing callback to beast::http::parser with the signature
to allow users to easily detect chunk boundaries during parsing. It would be a handful of lines of code
A simpler model is to send a buffer when it is passed to Beast, and likewise pass a chunk to the application as it is parsed. This preserves the chunk boundaries without any special facilities.
I think if users demand control over outgoing chunks, its reasonable to ask that they manually output the chunked message body; Beast will take care of the header.
Does this mean that the user has to manually insert chunk-size etc?
On Thu, Jul 6, 2017 at 12:41 PM, Bjorn Reese via Boost
When the user passes a buffer to Beast in chunked mode, serialization can convert it directly into a chunk. That ought to be very easy to do.
Something like that could be made to work if the caller is using `buffer_body`. That type of body represents caller-supplied buffers; there could be a guarantee that each caller-provided buffer will be output as a single chunk: https://github.com/vinniefalco/Beast/blob/89c416cde64de3265299ced216a6eef942...
you would convince Beast to keep a message in "perpetually sending" mode since that is a use-case not anticipated by rfc7230.
Are you implying that Beast has an upper limit for persistent connections? If not, what are you trying to tell me?
There's no upper limit, but I have not given consideration to a mode where a BodyReader never runs out of buffers. Perhaps it already works without change. But there are no tests for it and I have not tried it, so I cannot make a claim that it works.
...and likewise pass a chunk to the application as it is parsed. This preserves the chunk boundaries without any special facilities.
Beast does not buffer the entire chunk in memory, so the application would only see pieces of it. The callback-based solution I provided is the closest to what you're asking ("pass a chunk to the application").
I think if users demand control over outgoing chunks, its reasonable to ask that they manually output the chunked message body; Beast will take care of the header. ... Does this mean that the user has to manually insert chunk-size etc?
In the worst case, yes, but if there is a need for an easier way for users to manually produce chunked output, a simple adapter can be provided. Beast had one which I removed after I made significant improvements to the serialization interfaces; it can be brought back. This function takes a caller-provided ConstBufferSequence and lazily transforms it into a new ConstBufferSequence (non-allocating) which performs a chunk-encoding: https://github.com/vinniefalco/Beast/blob/1e3543f63ed46bb1b2e698939f3cf0b132... Thanks
On Thu, Jul 6, 2017 at 12:41 PM, Bjorn Reese via Boost
...
I've thought about your feedback and I think this is an issue that is worth addressing. Some design work needs to be done to make it fit into the greater whole, I've opened an issue. Feel free to subscribe and add to it, the more feedback the better the result: https://github.com/vinniefalco/Beast/issues/614 Thanks!
On 07/02/2017 03:09 PM, Vinnie Falco via Boost wrote:
parser<isRequest, buffer_body> p;
read_header(stream, buffer, p, ec);
if(ec)
    return;
while(! p.is_done())
{
    char buf[512];
    p.get().body.data = buf;
    p.get().body.size = sizeof(buf);
    read(stream, buffer, p, ec);
If the above is used to read a chunked transfer, what happens to chunk-ext fields? Are they inserted as header fields or are they lost? Given that this is an example of incremental reading, why does it use read() rather than read_some()?
On Sun, Jul 2, 2017 at 10:54 AM, Bjorn Reese via Boost
If the above is used to read a chunked transfer, what happens to chunk-ext fields? Are they inserted as header fields or are they lost?
Chunk extensions are not valid HTTP headers. `beast::http::parser` does not store them in the `basic_fields`. It doesn't store them at all; they are simply discarded.
From https://tools.ietf.org/html/rfc7230#section-4.1.1 "The chunked encoding is specific to each connection and is likely to be removed or recoded by each recipient (including intermediaries) before any higher-level application would have a chance to inspect the extensions. Hence, use of chunk extensions is generally limited to specialized HTTP services such as "long polling" (where client and server can have shared expectations regarding the use of chunk extensions) or for padding within an end-to-end secured connection."
To my understanding, chunk-extensions are a rare niche use-case with meaning only to applications using a custom interpretation at each end of the connection. In fact 5 years ago the IETF almost deprecated them: https://trac.ietf.org/trac/httpbis/ticket/343 Beast doesn't go out of its way to help you get at the extensions, but it also doesn't make it impossible. On the other hand, I do not have significant expertise with HTTP servers; if a compelling use-case presents itself this is an aspect of the library which may be improved, in a backward-compatible way.
Given that this is an example of incremental reading, why does it use read() rather than read_some()?
The contracts for those functions are as follows:

`read` continues until there's an error or the message is complete.
`read_some` continues until it gets at least one byte, an error occurs, or the message is complete.

In the example I posted the behaviors are very similar since the buffer has a 512-byte limit. Either would work.
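For comparison (untested), the loop from the earlier example could call `read_some` directly:

while(! p.is_done())
{
    char buf[512];
    p.get().body.data = buf;
    p.get().body.size = sizeof(buf);
    read_some(stream, buffer, p, ec);
    if(ec == error::need_buffer)
        ec = {};
    if(ec)
        return;
    os.write(buf, sizeof(buf) - p.get().body.size);
}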
On 07/02/2017 08:11 PM, Vinnie Falco via Boost wrote:
To my understanding, chunk-extensions are a rare niche use-case with meaning only to applications using a custom interpretation at each end of the connection. In fact 5 years ago the IETF almost deprecated them: https://trac.ietf.org/trac/httpbis/ticket/343
That was scary reading, because HTTPbis seemed to be unaware that such deprecation would break some of IETF's own standards, e.g. RFC 3507.
On the other hand, I do not have significant expertise with HTTP servers; if a compelling use-case presents itself this is an aspect of the library which may be improved, in a backward-compatible way.
Chunk extensions were originally designed for per-chunk signatures. I do not know how extensively this is used. Another use-case that is used in practice is for in-band meta-data. Consider an Internet radio station that sends a constant stream of audio. When a new track is played this will be signaled by meta-data telling the track title, artist name, etc. Some audio codec formats embed this meta-data into the stream itself (e.g. MP3 ID3 tags), while others do not. In the latter case, chunk extensions are used to carry the meta-data.
The chunk decorator uses boost::asio::null_buffers to overload the field trailer. Given your desire to adapt Beast to Networking TS, do notice that null_buffers are not part of N4656 (they have been replaced by dedicated wait functions.)
participants (4)

- Bjorn Reese
- Gavin Lambert
- Peter Dimov
- Vinnie Falco