Serialization and distinction between eof and end of archive

Hi,

The boost.serialization library seems to use end-of-stream in the underlying istream object to denote the end-of-archive. This equivalence might make sense with files, where the stream is open for a short time and really associated with a single archive, but it is cumbersome with streams that are supposed to be long-lived (network sessions?) and used for transmission of many separate archives.

The problem arises between two applications that want to use the serialization library for data exchange "on the fly", using network sockets. Small tests have shown that it's not enough for the sender to flush its output stream (although that does result in the archive's data arriving at the destination side). For the archive to be read correctly, the sender needs to entirely close the connection. This indicates that the end-of-stream condition is used to denote end-of-archive in the serialization sense.

Taking into account the interface of the serialization library (where the readers are created from streams), where the stream object is syntactically supposed to live longer than the archive, treating eof as eoa is counterintuitive. I really expect this to work for the receiver:

    std::istream &is = ...; // some input stream, possibly long-lived
    while (...)
    {
        boost::archive::text_iarchive ar(is);
        // ...
    }

with a similar structure on the sender side.

Any thoughts?

--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/

Maciej Sobczak wrote:
Hi,
The boost.serialization library seems to use end-of-stream in the underlying istream object to denote the end-of-archive.
Why does it seem that way? It would certainly be contrary to my intention.
This equivalence might make sense with files where the stream is open for a short time and really associated with a single archive,
but seems to be cumbersome when used with streams that are supposed to be long-lived (network sessions?) and used for transmission of many separate archives.
I don't see that this would be a problem. What is the matter with the following?

    ?ostream os("pipename or whatever");

    // first archive
    {
        ?_oarchive oa(os);
        oa << ...;
    } // archive is destroyed here - stream remains open and available

    // second archive
    {
        ?_oarchive oa(os);
        oa << ...;
    } // archive is destroyed here - stream remains open and available

    os.close();
The problem arises between two applications that want to use the serialization library for data exchange "on the fly", using network sockets. Small tests have shown that it's not enough for the sender to flush its output stream (although that does result in the archive's data arriving at the destination side). For the archive to be read correctly, the sender needs to entirely close the connection. This indicates that the end-of-stream condition is used to denote end-of-archive in the serialization sense.
I don't believe the conclusion follows. The archive has to be constructed and destroyed - but the stream doesn't have to be.
Taking into account the interface of the serialization library (where the readers are created from streams), where the stream object is syntactically supposed to live longer than the archive, treating eof as eoa is counterintuitive. I really expect this to work for the receiver:
std::istream &is = ...; // some input stream, possibly long-lived
while (...) { boost::archive::text_iarchive ar(is); // ... }
with similar structure on the sender-side.
Any thoughts?
I also expect this to work Robert Ramey

Hi, Robert Ramey wrote:
The boost.serialization library seems to use end-of-stream in the underlying istream object to denote the end-of-archive.
Why does it seem that way? It would certainly be contrary to my intention.
It seems that way, because this is how it's shown by the example program provided by Seweryn on the "users" list: http://lists.boost.org/boost-users/2006/01/16646.php

I have experimented a bit with this program and I've found that when the sender (the server in this case) flushes the stream, it's enough for the data to arrive at the destination, and it can be retrieved by a regular stream read (the data is then identical to what would result if the same archive was written to cout in the first place). But when the text_iarchive object is used to read it, it blocks. The only way to make it continue is to close the stream on the server side. So it looks like the text_iarchive is really waiting for eof (or for something else in the stream).
I don't see that this would be a problem. What is the matter with the following?
?ostream os("pipename or whatever");
    // first archive
    {
        ?_oarchive oa(os);
        oa << ...;
    } // archive is destroyed here - stream remains open and available

    // second archive
    {
        ?_oarchive oa(os);
        oa << ...;
    } // archive is destroyed here - stream remains open and available
os.close();
The matter is that there might be hours of pause between these two blocks above, and the receiver might not want to wait that long. The archive (the first one) should be successfully read on the other end of the wire as soon as the bytes make their way to the receiver. It does not seem to be the case.
I don't believe the conclusion follows. The archive has to be constructed and destroyed - but the stream doesn't have to be.
Yes, but what about the reading part? Is it possible for the reader to successfully read the first archive *before* the next archive arrives (which can happen hours later)?

I hope to be mistaken, but my initial experiments with the OP's code led me to the above considerations.

Regards,

Maciej Sobczak wrote:
Hi,
Robert Ramey wrote:
The boost.serialization library seems to use end-of-stream in the underlying istream object to denote the end-of-archive.
Why does it seem that way? It would certainly be contrary to my intention.
It seems that way, because this is how it's shown by the example program provided by Seweryn on the "users" list:
That example shows something entirely different. It does not show that the serialization code relies on eof to denote end of archive. A cursory examination of the library source should also convince anyone that serialization does not depend on end of stream in any way.
I have experimented a bit with this program and I've found that when the sender (the server in this case) flushes the stream, it's enough for the data to arrive at the destination, and it can be retrieved by regular stream read (the data is then identical as if the same archive was written to cout in the first place). But when the text_iarchive object is used to read it, it blocks. The only way to make it continue is to close the stream on the server side. So it looks like the text_iarchive is really waiting for eof (or for something else in the stream).
It may look that way, but that's not what's happening. I recommend you investigate the management (or lack thereof) of the flushing of the underlying stream.
I don't see that this would be a problem. What is the matter with the following?
?ostream os("pipename or whatever");
    // first archive
    {
        ?_oarchive oa(os);
        oa << ...;
    } // archive is destroyed here - stream remains open and available

    // second archive
    {
        ?_oarchive oa(os);
        oa << ...;
    } // archive is destroyed here - stream remains open and available
os.close();
The matter is that there might be hours of pause between these two blocks above and the receiver might not want to wait that long.
If that's the case, then the ?ostream streambuf implementation needs enhancement. It is outside the scope of the serialization library.

The archive (the first one) should be successfully read on the other end of the wire as soon as the bytes make their way to the receiver. It does not seem to be the case.
It may not be - but it is not something that can be fixed from within the serialization library.
I don't believe the conclusion follows. The archive has to be constructed and destroyed - but the stream doesn't have to be.
Yes, but what about the reading part? Is it possible for the reader to successfully read the first archive *before* the next archive arrives (which can happen hours later)?
This would depend on the streambuf implementation used by the underlying stream. The serialization library requests all characters required - no more, no less.
I hope to be mistaken, but my initial experiments with the OP's code led me to the above considerations.
I think you are mistaken. Robert Ramey
Regards,

Robert Ramey wrote:
I have experimented a bit with this program and I've found that when the sender (the server in this case) flushes the stream, it's enough for the data to arrive at the destination, and it can be retrieved by regular stream read (the data is then identical as if the same archive was written to cout in the first place). But when the text_iarchive object is used to read it, it blocks. The only way to make it continue is to close the stream on the server side. So it looks like the text_iarchive is really waiting for eof (or for something else in the stream).
It may look that way, but that's not what's happening. I recommend you investigate the management (or lack thereof) of the flushing of the underlying stream.
OK, after further investigation it appears that the archive reader is in fact sensitive to one of these two:

- end of line
- end of stream

It is possible to reuse a long-lived connection for sending many archives (and receiving them without unnecessary waits), provided one of these two happens. So, the sender might look like this:

    ?ostream outstream(...);
    while (...)
    {
        {
            boost::archive::text_oarchive ar(outstream);
            ar << myObject;
        }
        // this:
        outstream << std::endl; // or '\n' followed by .flush()
    }

With this additional newline+flush the receiver has no problems with de-serializing the data as soon as it arrives.

Thank you for helping to solve this.

Regards,

Actually this does reveal the true issue - and one that can and should be addressed from within the serialization library. Text archives output each value preceded by a space to separate the tokens. The space is used to delimit the data value. The last "is >> t" is waiting for a space to return - it doesn't find it until the next archive. So text archives should be terminated with a space or newline to prevent this from happening. I will add this to the text_oarchive destructor. This should address the problem.

Good work gentlemen,

Robert Ramey

Maciej Sobczak wrote:
Robert Ramey wrote:
I have experimented a bit with this program and I've found that when the sender (the server in this case) flushes the stream, it's enough for the data to arrive at the destination, and it can be retrieved by regular stream read (the data is then identical as if the same archive was written to cout in the first place). But when the text_iarchive object is used to read it, it blocks. The only way to make it continue is to close the stream on the server side. So it looks like the text_iarchive is really waiting for eof (or for something else in the stream).
It may look that way, but that's not what's happening. I recomend you investigate the management (or lack there of, flushing of the underlying stream.
OK, after further investigation it appears that the archive reader is in fact sensitive to one of these two:
- end of line
- end of stream
It is possible to reuse a long-lived connection for sending many archives (and receiving them without unnecessary waits), provided one of these two happens. So, the sender might look like this:
    ?ostream outstream(...);
    while (...)
    {
        {
            boost::archive::text_oarchive ar(outstream);
            ar << myObject;
        }
        // this:
        outstream << std::endl; // or '\n' followed by .flush()
    }
With this additional newline+flush the receiver has no problems with de-serializing the data as soon as it arrives.
Thank you for helping to solve this. Regards,
participants (2)
- Maciej Sobczak
- Robert Ramey