xml_iarchive blows out the stack

When restoring a string from an xml_iarchive, the process stack can grow surprisingly large. A demonstration is appended. Restoring a string of 10000 '<' characters grows the stack to roughly 3.7MB (3760 kB in the runs below). The demo also shows that neither the letter 'x' nor the character '.' is problematic.

It seems that stack growth only comes from entity references in the xml archive, i.e., sequences that match the 'Reference' pattern in basic_xml_grammar.ipp: &gt;, &amp;, &lt;, &apos;, &quot;. My guess is that the parser used to unescape the references pushes the stack every time it sees one of the Reference patterns.

This isn't just a theoretical problem. It can arise in practice if one tries to restore a string containing XML or C++ source (all those '<', '>' and '&'). In fact, I found it by investigating why my stack grew to more than 1MB when I switched from a text_archive to an xml archive in a real application.

I haven't taken a close look at the grammar, and I have no experience at all with spirit. Is this likely to be something easily fixable, or is it something one just has to live with?

Cheers,
John Salmon

---------------cut here serstack.cpp --------
// Demonstration that xml_iarchive blows out the
// stack when handed a string with lots of xml
// entity references: &amp;, &lt;, &gt;, &apos;, &quot;. This demo
// does not explore what happens with unicode char refs,
// e.g., &#NNNN; and &#xXXXX;
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/serialization/string.hpp>
#include <sstream>
#include <cassert>
#include <iostream>
#include <cstdio>       // printf, snprintf
#include <cstdlib>      // system
#include <sys/types.h>  // pid_t
#include <unistd.h>     // getpid
using namespace std;
using namespace boost;

// Figuring out how much the stack has grown is *very*
// system dependent. This works on at least one
// version of Linux.
void checkStk(const char *txt){
    pid_t pid = getpid();
    char command[512];
    printf("%s", txt);
    fflush(stdout);  // make sure txt appears before grep's output
    snprintf(command, sizeof command, "grep VmStk /proc/%d/status", (int)pid);
    system(command);
}

void archive_string(const char c){
    string bigstring(10000, c);
    stringstream ss;
    archive::xml_oarchive oa(ss);
    oa << BOOST_SERIALIZATION_NVP(bigstring);
    cout << "Archiving a string of '" << c << "'\n";

    string copy_of_bigstring;
    archive::xml_iarchive ia(ss);
    checkStk("Before ia >> copy\n");
    ia >> BOOST_SERIALIZATION_NVP(copy_of_bigstring);
    checkStk("After ia >> copy\n");
    assert( bigstring == copy_of_bigstring );
}

int main(int argc, char **argv){
    const char *letters;
    if(argc == 2)
        letters = argv[1];
    else
        letters = "x>";
    // Note that once the stack is 'blown', you don't
    // learn much by testing other letters. I.e.,
    //    serstack "xyz&>"
    // tells you about & but not much about >
    for(const char *p = letters; *p; ++p)
        archive_string(*p);
    return 0;
}
-----------------

# The stack grows to about 3.7MB (3760 kB) when we archive strings
# with entity references:
salmonj@drda0026.nyc$ serstack '<'
Archiving a string of '<'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
salmonj@drda0026.nyc$ serstack '&'
Archiving a string of '&'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
salmonj@drda0026.nyc$ serstack '"'
Archiving a string of '"'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
salmonj@drda0026.nyc$ serstack "'"
Archiving a string of '''
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB

# But if we archive strings containing 'plain'
# characters, even punctuation, the stack remains
# a svelte 12kB.
salmonj@drda0026.nyc$ serstack 'abcdef.!@123*()789'
Archiving a string of 'a'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'b'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'c'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'd'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'e'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'f'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '.'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '!'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '@'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '1'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '2'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '3'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '*'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '('
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of ')'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '7'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '8'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '9'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
salmonj@drda0026.nyc$
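
As an aside, the VmStk figure can probably be read in-process rather than by shelling out to grep, which makes it easier to compare the before and after values numerically. The following is only a sketch, assuming a Linux /proc filesystem; parseVmStkKb and currentVmStkKb are hypothetical helper names, not part of Boost or of the demo above.

```cpp
// Hypothetical helpers for reading VmStk without system()/grep.
// Assumes the Linux /proc/<pid>/status format ("VmStk:   12 kB").
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Scan status-file text and return the VmStk value in kB,
// or -1 if no well-formed VmStk line is found.
long parseVmStkKb(std::istream &status) {
    std::string line;
    while (std::getline(status, line)) {
        // Skip every line that does not start with the VmStk key.
        if (line.compare(0, 6, "VmStk:") != 0)
            continue;
        // The number follows the key, padded with whitespace.
        std::istringstream fields(line.substr(6));
        long kb = -1;
        if (fields >> kb)
            return kb;
        return -1;
    }
    return -1;
}

// Stack size of the calling process in kB (Linux only), or -1 on failure.
long currentVmStkKb() {
    std::ifstream status("/proc/self/status");
    if (!status)
        return -1;
    return parseVmStkKb(status);
}
```

With something like this, archive_string could record currentVmStkKb() before and after the `ia >>` call and print the difference. Keeping the parsing separate from the file access is just a convenience so the parser can be exercised on canned text.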