
Neal Becker wrote:
I have 1 more question. What about an archive that uses mmap? This should be very efficient for saving/restoring large structures in binary format. Is this a good idea?
I did all the STL collections as I considered these "fundamental". I did a version of shared_ptr myself as a test/proof of concept. 1 more question? Hmmm. So far I have received submissions for serialization of optional, scoped_ptr, and the date_time library, among others. These are from people who needed to serialize other boost objects that they use. I've stopped incorporating them as the library is getting too big for me to manage. If this library is ever accepted, serialization of boost components will have to be a responsibility separate from the serialization library itself. After all, they are really applications of the serialization library rather than components of the library. So feel free to make a serializer for any kind of data structure you find useful. All we have to do is to find a place to put it. Robert Ramey

Robert Ramey wrote:
Neal Becker wrote:
I have 1 more question. What about an archive that uses mmap? This should be very efficient for saving/restoring large structures in binary format. Is this a good idea?
I did all the STL collections as I considered these "fundamental". I did a version of shared_ptr myself as a test/proof of concept.
1 more question? Hmmm. So far I have received submissions for serialization of optional, scoped_ptr, and the date_time library, among others. These are from people who needed to serialize other boost objects that they use. I've stopped incorporating them as the library is getting too big for me to manage. If this library is ever accepted, serialization of boost components will have to be a responsibility separate from the serialization library itself. After all, they are really applications of the serialization library rather than components of the library.
So feel free to make a serializer for any kind of data structure you find useful. All we have to do is to find a place to put it.
Thanks. Actually, though, what I'm talking about is a different archive type, not a data structure. Right now binary data is saved/restored using read/write. mmap is potentially much more efficient for large data structures. OTOH, there is one big problem for it, which is that mmap can only be applied to one contiguous memory region. I believe you would have to serialize one object to one file, and if you want more objects you would need multiple files to map them to. Still, might be interesting.
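
To make the idea of an mmap-backed save/restore concrete, here is a minimal sketch. It assumes a POSIX system (`open`, `ftruncate`, `mmap`, `msync`, `munmap`) and uses a `std::vector<double>` as a stand-in for "one contiguous memory region". The helpers `mmap_save` and `mmap_load` are hypothetical names invented for this illustration; they are not part of the serialization library and do not show how an archive class would be written.

```cpp
// Sketch only: plain POSIX mmap, not an archive type of the serialization library.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: save one contiguous block of doubles by mapping the
// file and copying into the mapping. Returns false on any system-call failure.
bool mmap_save(const std::string& path, const std::vector<double>& data) {
    const std::size_t bytes = data.size() * sizeof(double);
    assert(bytes > 0);  // mmap rejects zero-length mappings
    int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    // The file must have its final size before it is mapped.
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) { close(fd); return false; }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;
    std::memcpy(p, data.data(), bytes);
    const bool ok = msync(p, bytes, MS_SYNC) == 0;  // push dirty pages to the file
    munmap(p, bytes);
    return ok;
}

// Hypothetical helper: restore the block. Pages are faulted in on first touch.
bool mmap_load(const std::string& path, std::vector<double>& out) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return false; }
    const std::size_t bytes = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return false;
    const double* first = static_cast<const double*>(p);
    out.assign(first, first + bytes / sizeof(double));  // copy out of the mapping
    munmap(p, bytes);
    return true;
}
```

Calling `mmap_save(path, v)` followed by `mmap_load(path, w)` should leave `w` equal to `v`. The sketch copies the data out again on load, which gives up much of the benefit discussed here; a real design would presumably hand out pointers into the mapping instead, and that is where the one-region-per-file restriction starts to matter.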

First off, I vote for bumping serialization to the top of the review queue. I've been using the library successfully in a development project for several months now. While I have not exercised the more advanced facets as some have, it works well with a complex mix of MFC, ATL, and boost libraries including bind, crc, function, iterators, signals and spirit. I've been able to serialize these to/from disk/clipboard/registry keys, etc. The only difficulty left (I haven't yet tried serialization17) is xml string output being rather verbose.
I have 1 more question. What about an archive that uses mmap? This should be very efficient for saving/restoring large structures in binary format. Is this a good idea?
This topic has come up on the spirit list. Originally I was a proponent of the mmap approach, based on experiencing major performance improvements with earlier versions of Windows, particularly Windows CE. Today I've seen much less difference in performance. I've been told that Windows 2K/XP using NTFS memory-maps files under the covers, and unless you're actively addressing into the memory range there's little to be gained, at least on Windows. Perhaps other OSes do this as well?
I did all the STL collections as I considered these "fundamental". I did a version of shared_ptr myself as a test/proof of concept.
1 more question? Hmmm. So far I have received submissions for serialization of optional, scoped_ptr, and the date_time library, among others. These are from people who needed to serialize other boost objects that they use. I've stopped incorporating them as the library is getting too big for me to manage. If this library is ever accepted, serialization of boost components will have to be a responsibility separate from the serialization library itself. After all, they are really applications of the serialization library rather than components of the library.
Any thoughts on future serialization of boost::function even being possible? My use case is that I have a std::map of std::string to boost::function that provides user-configurable filters that determine what gets displayed in a Table/List View. These currently are not being saved between sessions. It'd be great to do ar & mFilterMap; Robert, thanks for the great job!!! Jeff
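
A function wrapper erases the type of whatever it holds, so there is no generic way to write its target to an archive. One possible workaround for the filter-map use case, offered only as a sketch and not as something proposed in this thread, is to save a stable name for each filter and rebuild the callable from a registry on load. Everything below is an assumption made for illustration: `std::function` stands in for `boost::function` so the example builds without Boost, the filter signature `bool(int)` is invented, and `filter_registry`, `save_filters` and `load_filters` are hypothetical names. The text format is a plain string rather than a serialization archive.

```cpp
// Sketch only: persist filter *names*, not the callables themselves.
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>

using Filter = std::function<bool(int)>;  // stand-in for boost::function

// Hypothetical registry: every filter the application can construct,
// keyed by a stable name that is safe to write to disk.
const std::map<std::string, Filter>& filter_registry() {
    static const std::map<std::string, Filter> registry = {
        {"even",     [](int v) { return v % 2 == 0; }},
        {"positive", [](int v) { return v > 0; }},
    };
    return registry;
}

// Save: one "display_key registry_name" pair per line.
// Keys and names must not contain whitespace in this simple format.
std::string save_filters(const std::map<std::string, std::string>& active) {
    std::ostringstream os;
    for (const auto& kv : active) {
        assert(filter_registry().count(kv.second) == 1);  // only registered names can be restored
        os << kv.first << ' ' << kv.second << '\n';
    }
    return os.str();
}

// Restore: look each name up in the registry and rebuild the callable map.
// Unknown names are skipped silently.
std::map<std::string, Filter> load_filters(const std::string& text) {
    std::map<std::string, Filter> result;
    std::istringstream is(text);
    std::string key, name;
    while (is >> key >> name) {
        auto it = filter_registry().find(name);
        if (it != filter_registry().end()) result[key] = it->second;
    }
    return result;
}
```

The trade-off is that only filters known to the registry survive a round trip; a filter with bound state (for example a threshold chosen by the user) would need that state saved alongside its name.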

Jeff Flinn wrote: [...]
I have 1 more question. What about an archive that uses mmap? This should be very efficient for saving/restoring large structures in binary format. Is this a good idea?
This topic has come up on the spirit list. Originally I was a proponent of the mmap approach, based on experiencing major performance improvements with earlier versions of Windows, particularly Windows CE. Today I've seen much less difference in performance. I've been told that Windows 2K/XP using NTFS memory-maps files under the covers, and unless you're actively addressing into the memory range there's little to be gained, at least on Windows. Perhaps other OSes do this as well?
I'm interested in Linux myself. On restore, a big potential speed gain is possible if you don't actually use all the data: in that case the unused part is never actually read from the disk. For save, I'm speculating that it is possible for the OS to optimize saving of unmodified pages using sparse files, but that may be a bit far-fetched.

"Jeff Flinn" <TriumphSprint2000@hotmail.com> wrote in message news:c22in1$god$1@sea.gmane.org...
This topic has come up on the spirit list. Originally I was a proponent of the mmap approach, based on experiencing major performance improvements with earlier versions of Windows, particularly Windows CE. Today I've seen much less difference in performance. I've been told that Windows 2K/XP using NTFS memory-maps files under the covers, and unless you're actively addressing into the memory range there's little to be gained, at least on Windows. Perhaps other OSes do this as well?
This is an issue that I have researched before. I have compared the performance of the Win32 memory mapped file API and the synchronous file I/O API, along with iostreams and C-style buffered I/O (stdio.h). I still have the benchmark program, but it uses a few library classes of my own, so if I tried to post enough of the source here for it to be built and run, it would be rather unwieldy. It used fairly large I/O buffers (16K) to maximize throughput and tested both sequential and random reading and writing performance. Actually, I even had a provision to test file creation as opposed to writing to an already allocated space since it might have put mapped files at a disadvantage. The summary of multiple runs is on the attached chart. The chart is in logarithmic scale so actual numbers should be irrelevant. To provide a reference point, the test was run on a Xeon 1.7G x 2 machine with 2G RDRAM under Win.2003 Server and allocated the file on an NTFS-formatted 7200 RPM IDE drive connected to a non-RAID (and, as far as I can tell, non-cached) onboard IDE adapter. Typical run times for a 256M test file are 0.3-0.5 sec using memory mapped files, 0.5-0.6 sec using file I/O (I'm not even gonna mention iostreams and stdio here, you'll see how atrocious they are from the chart). Short version - memory mapped files are SO much faster, it is not even funny. I did not go the extra mile and test the asynchronous I/O (scatter/gather reads and writes); my expectation is that they should beat memory mapping but not by much. There apparently is a good reason to listen to the old Microsoft's recommendation and use memory mapping of files where possible. ...Max...
[attachment: IOBenchmark.pdf (uuencoded benchmark chart omitted)]

I forgot to mention that the benchmarking code randomly changed the order of tests to minimize the disparity in cache hit/miss ratios. Also, the big difference in writing/creation times between memory-mapped and file-based I/O may mean that memory mappings implement cached writes whereas file writes are somehow write-through, though that doesn't seem to be the case according to the documentation. Perhaps CloseHandle() waits until all dirty cache pages are written to the device but does not wait for the dirty pages that belonged to the memory mapping. It is also possible that file-based writes and memory-mapped writes are logged differently in the NTFS journal. None of this should apply to reads, though. ...Max...
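
One way to separate cached writes from writes that have actually reached the device is to time the same write loop with and without an explicit flush inside the timed region. The sketch below is an assumption-laden illustration, not the benchmark described in this message: it uses POSIX `write` and `fsync` instead of the Win32 API (where `FlushFileBuffers` and, for mapped views, `FlushViewOfFile` play the corresponding role), and `timed_write` is a hypothetical helper name. The 16K chunk size mentioned above is passed in by the caller.

```cpp
// Sketch only: POSIX stand-in for the write-back vs. write-through question.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: write `total` bytes in `chunk`-sized pieces and return
// the elapsed wall-clock time in seconds, or -1.0 on failure. When `flush` is
// true, the timed region also covers fsync(), so data still sitting in the
// OS cache is counted against the write.
double timed_write(const std::string& path, std::size_t total,
                   std::size_t chunk, bool flush) {
    assert(chunk > 0 && total % chunk == 0);
    const std::vector<char> buf(chunk, 'x');
    const auto start = std::chrono::steady_clock::now();
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1.0;
    for (std::size_t done = 0; done < total; done += chunk) {
        if (write(fd, buf.data(), chunk) != static_cast<ssize_t>(chunk)) {
            close(fd);
            return -1.0;
        }
    }
    if (flush && fsync(fd) != 0) { close(fd); return -1.0; }
    close(fd);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing `timed_write(path, size, 16384, false)` against `timed_write(path, size, 16384, true)` gives a rough idea of how much of a measured write time is cache behavior; a large gap between the two would suggest the unflushed figure is not measuring the device.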

At 02:17 PM 3/2/2004, Max Motovilov wrote:
The summary of multiple runs is on the attached chart. The chart is in logarithmic scale so actual numbers should be irrelevant. To provide a reference point, the test was run on a Xeon 1.7G x 2 machine with 2G RDRAM under Win.2003 Server and allocated the file on an NTFS-formatted 7200 RPM IDE drive connected to a non-RAID (and, as far as I can tell, non-cached) onboard IDE adapter. Typical run times for a 256M test file are 0.3-0.5 sec using memory mapped files, 0.5-0.6 sec using file I/O (I'm not even gonna mention iostreams and stdio here, you'll see how atrocious they are from the chart).
Short version - memory mapped files are SO much faster, it is not even funny. I did not go the extra mile and test the asynchronous I/O (scatter/gather reads and writes); my expectation is that they should beat memory mapping but not by much. There apparently is a good reason to listen to the old Microsoft's recommendation and use memory mapping of files where possible.
Hum... The chart only shows highly significant differences for "Create sequential, Win32 Map" and "Write sequential, Win32 Map" compared to the similar operations for other approaches. What was being measured? CPU time or wall clock time? But for those two operations the timing differences are so great that they make me wonder if the data was actually written to disk on those tests. Did you verify the data was actually written, and that your timing actually covered the period during which writing occurred? Remember that with some operating systems writing can under some conditions be deferred past the point where the program which creates the data terminates. Thus timings generated by the program itself can be bogus. Did you compute apparent disk-transfer rates to verify timings were reasonable? Just curious, --Beman
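
The apparent-transfer-rate check asked about here is simple arithmetic on the figures quoted from the benchmark (a 256M file, 0.3-0.5 sec for mapped files, 0.5-0.6 sec for file I/O). A minimal sketch follows; `apparent_rate_mb_per_s` is a hypothetical helper name, and "256M" is assumed to mean 256 megabytes.

```cpp
// Sketch only: apparent throughput from a file size and an elapsed time.
#include <cassert>

// Returns megabytes per second for `megabytes` transferred in `seconds`.
double apparent_rate_mb_per_s(double megabytes, double seconds) {
    assert(seconds > 0.0);  // a zero or negative timing is a measurement error
    return megabytes / seconds;
}
```

With the quoted timings this gives roughly 512 to 853 MB/s for memory mapped files (256/0.5 and 256/0.3) and roughly 427 to 512 MB/s for file I/O (256/0.6 and 256/0.5). Whether a single 7200 RPM IDE drive could sustain such rates is exactly the kind of sanity check being suggested; rates far above what the device can deliver would point to the file cache rather than the disk.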

"Beman Dawes" <bdawes@acm.org> wrote in message
Hum... The chart only shows highly significant differences for "Create sequential, Win32 Map" and "Write sequential, Win32 Map" compared to the similar operations for other approaches.
Almost 2X for reading performance - it may not be immediately apparent because Y axis is logarithmic.
What was being measured? CPU time or wall clock time?
System time, which, I imagine, would be wall clock time in this parlance.
But for those two operations the timing differences are so great that they make me wonder if the data was actually written to disk on those tests.
I've theorized on that a bit in my follow-up message. Note that writing and reading performances are rather close for file mappings and far apart for regular file I/O. There's gotta be a write-through vs. write-back issue here somewhere.
Remember that with some operating systems writing can under some conditions be deferred past the point where the program which creates the data terminates. Thus timings generated by the program itself can be bogus.
I agree. However the data were collected over multiple runs of the same process as well as over multiple invocations of the test pass within one run. Neither guarantees measuring the final write-through performance of course, but IMHO this mode of testing approaches the natural behavior of an application running under this specific OS. After all, an application developer would probably be interested in observable performance numbers, even if optimizations built into the OS may result in indefinite delays before data actually end up on the disk.
Did you compute apparent disk-transfer rates to verify timings were reasonable?
Again, with the read-ahead and write-back behavior of the file cache, reasonability does not necessarily imply that you measure the low-level I/O operation. If this test really measures the overhead of different caching mechanisms within the OS it is fine by me, as long as it approximates what a real application would experience. Though I agree that running it on a system with less RAM and using larger files (I believe I did try 1G files and got very similar results but neglected to collect enough data with this setting) would give a different, no less interesting, insight. ...Max...

Robert, [snip]
to serialize these to/from disk/clipboard/registry keys, etc. The only difficulty left (I haven't yet tried serialization17) is xml string output being rather verbose.
FYI, This is no longer an issue with serialization17. std::strings are now output as expected. Jeff F

On Tue, 2 Mar 2004 08:28:25 -0800, Robert Ramey wrote
1 more question? Hmmm. So far I have received submissions for serialization of optional, scoped_ptr, and the date_time library, among others. These are from people who needed to serialize other boost objects that they use. I've stopped incorporating them as the library is getting too big for me to manage. If this library is ever accepted, serialization of boost components will have to be a responsibility separate from the serialization library itself. After all, they are really applications of the serialization library rather than components of the library.
I agree that the serialization for items like date-time should be part of date-time and not serialization. I was planning on working on the changes for date-time as part of my 'evaluation' of the new serialization. Jeff
participants (6)
- Beman Dawes
- Jeff Flinn
- Jeff Garland
- Max Motovilov
- Neal D. Becker
- Robert Ramey