Review of a safer memory management approach for C++

A few responses ...
4. Re: Review of a safer memory management approach for C++ (Mathias Gaunard)
5. Re: Review of a safer memory management approach for C++ (Roland Bock)
6. Re: Review of a safer memory management approach for C++ (Fernando Cacciola)
------------------------------
Message: 4
Date: Mon, 07 Jun 2010 18:07:55 +0100
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
To: boost@lists.boost.org
Subject: Re: [boost] Review of a safer memory management approach for C++
Message-ID: <huj91v$cem$1@dough.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Bartlett, Roscoe A wrote:
Tools like valgrind and purify are very helpful but are not nearly sufficient as described in Section 3.2 (and other sections) in:
http://www.cs.sandia.gov/~rabartl/TeuchosMemoryManagementSAND.pdf
The limitations you give amount to expecting valgrind not just to tell you about memory errors, but also about contract violations in any library. That's not something a generic tool can do.
[Bartlett, Roscoe A] Exactly my point.
Contracts are specified by each library, and can be optionally checked in a special debug mode the library may provide.
[Bartlett, Roscoe A] Exactly. However, rather than doing this haphazardly, why not have a consistent, built-in approach in your own software to catch such mistakes automatically in a debug-mode build? Enter the Teuchos MM classes and the idioms described in: http://www.cs.sandia.gov/~rabartl/TeuchosMemoryManagementSAND.pdf
All that valgrind can do (as far as my usage goes) is tell you if you access some unallocated memory (relative to the default global allocator) or if you read an uninitialized object.
[Bartlett, Roscoe A] Valgrind and purify are very useful but they will often only flag a problem long after the original error occurred. As an example, a few months ago I was using std::multimap for the first time in a GCC implementation. The documentation I found for std::multimap on the web was not very detailed and did not really explain the behavior in a few important cases (I have had a hard time finding decent standard C++ library documentation). I ran the code and it behaved in strange ways, segfaulted, etc. I turned on the checked STL implementation with -D_GLIBCXX_DEBUG but it did not complain about anything. I ran valgrind on it and it complained about a problem in a place in the code that made no sense at all. I knew the only "unsafe" code that I had written (code that did not exclusively use the Teuchos MM classes and idioms) was the std::multimap code. After more experimentation I figured out that I had guessed the behavior of std::multimap incorrectly. Once I figured out what the behavior really was, the program ran fine. Here was a case where the checked STL implementation did not catch a basic user error and valgrind was worthless. I just wish I had saved the state of this code in a branch or something so that I could show it to other people.
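Since I did not save that code, here is a hypothetical illustration (not my original program) of one kind of misuse that -D_GLIBCXX_DEBUG cannot flag, because the checked STL tracks iterators, not raw pointers into the container:

    #include <iostream>
    #include <map>
    #include <utility>

    int main()
    {
      std::multimap<int, int> m;
      m.insert(std::make_pair(1, 10));
      m.insert(std::make_pair(1, 20));

      // Keep a raw pointer into the first node (not a checked iterator).
      int *p = &(m.begin()->second);

      // Erasing the node deallocates it, so p now dangles.  No iterator
      // is involved, so the checked STL implementation stays silent.
      m.erase(m.begin());

      std::cout << *p << "\n";  // undefined behavior
      return 0;
    }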
------------------------------
Message: 5
Date: Mon, 07 Jun 2010 19:14:33 +0200
From: Roland Bock <rbock@eudoxos.de>
To: boost@lists.boost.org
Subject: Re: [boost] Review of a safer memory management approach for C++
Message-ID: <4C0D28F9.1060301@eudoxos.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Bartlett, Roscoe A wrote:
Come on, at least some people on this mailing list must have had similar nightmare experiences in trying to track down and diagnose hard memory misuse errors in C++ that took days of effort to resolve (even with the help of tools like valgrind and purify).
Hi,
yes, I had such experiences, but that was years in the past, when I did not know about shared pointers. With shared pointers, only very few memory issues ever occurred.
[Bartlett, Roscoe A] I have to admit that, w.r.t. single objects, after I started using smart reference-counted pointers, I experienced very few memory errors. However, I was still having some errors, and I was writing a lot of paranoid manual error-checking code for all of the raw pointers that remained (and yes, if all you have is an RCP class, you will still need to use raw pointers in many cases). After I developed Teuchos::Ptr, the raw pointers to single objects went away and I ripped out a bunch of manual error-checking code (a process that I will likely never finish because of the large amount of code I have written over the years). There are people in my domain who still today refuse to use a smart pointer class and insist on manipulating raw memory, even in brand-new code. The cycle of undefined behavior, segfault, etc. will continue ...
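For those who have not looked at the classes, the core idea behind Teuchos::Ptr is roughly the following (a minimal sketch only, not the real implementation, which does considerably more, and with a hypothetical debug flag):

    #include <stdexcept>

    // A non-owning smart pointer for single objects: it replaces a raw
    // pointer and centralizes the null check that would otherwise be
    // hand-written at every use site.  The check compiles away when the
    // debug flag is off.
    template<class T>
    class PtrSketch {
    public:
      PtrSketch() : ptr_(0) {}
      explicit PtrSketch(T *p) : ptr_(p) {}
      T& operator*() const { assertNotNull(); return *ptr_; }
      T* operator->() const { assertNotNull(); return ptr_; }
      T* get() const { return ptr_; }
    private:
      void assertNotNull() const {
    #ifdef PROJECT_DEBUG  // stand-in for a project-wide debug-mode flag
        if (!ptr_) throw std::logic_error("PtrSketch: null dereference!");
    #endif
      }
      T *ptr_;
    };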
These days, our memory issues are of a different kind: memory fragmentation due to malloc's strategies in multithreading scenarios. These are even worse, because formally there is no problem in the code. Writing your own allocator and then suddenly not being able to use valgrind anymore: that's the memory fun today.
[Bartlett, Roscoe A] Do people have experience with library allocators from MPI and TBB? These are supposed to place memory more carefully but they mean that you can't use the allocator embedded in std::vector anymore.
------------------------------
Message: 6
Date: Mon, 07 Jun 2010 14:32:15 -0300
From: Fernando Cacciola <fernando.cacciola@gmail.com>
To: boost@lists.boost.org
Subject: Re: [boost] Review of a safer memory management approach for C++
Message-ID: <hujad9$ic7$1@dough.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Bartlett,
Just for the record, and since your statements below are about general experiences with large-scale C++ projects, let me put my own experience in context: I have been architecting, designing, and implementing large-scale projects since the early 90's, the largest of which accounts for 160K lines of C++ code, which in fact I wrote almost entirely myself. Again, this is just to put the comments based on my experience in perspective.
[Bartlett, Roscoe A] The real problem comes when you can't architect all of the code yourself with one consistent style and instead have to glue together code from lots of different sources that had very inconsistent ideas about design and managing memory. Early experience with this type of work led me to put the extra_data hack into Teuchos::RCP. It was not pretty, but it worked to glue all kinds of software together effectively.
These types of experiences have led many C++ teams to code C++ with a (justified but unfortunate) paranoia about the use of memory in C++ that has all kinds of bad consequences.
I think that those memory problems are fundamentally rooted in a design issue.
[Bartlett, Roscoe A] The problems are rooted in the raw manipulation of memory: raw pointers, raw calls to new and delete, and raw use of arrays through raw pointers. In my opinion, this is the design problem. Others will likely disagree.
In the mid 90's I used to develop best practices and utilities for sane memory management (and other sanity requirements), in the same spirit as the paper you presented (I did read it, btw). I even implemented custom allocators, based on class-specific memory pools, and mandated that every object obey a strict allocation-deallocation protocol.
Simply defining a new class in my system required the use of a macro-based DSL, something like "DEFINE_DERIVED_OBJECT(Foo, Base)". Likewise, object graphs had to be very carefully spelled out with my DSL, as in "INCLUDE_SUBOBJECT(Bar)", and so on.
This forced everyone in the team, year after year, to learn a language on top of C++.
[Bartlett, Roscoe A] Note that the STL absolutely defined a new language in C++, as Scott Meyers points out in Item 1 of "Effective C++, 3rd Edition". The question is whether the new language provides enough benefit to justify having to learn it. I believe the STL has been hugely worth it, but it is only a container, algorithm, and data-structure library; it does not solve the most fundamental problem with C++ coding, the usage of raw pointers, which the STL alone did not eliminate the need for in many programs (others may disagree).
In the end, however, I realized I was just overengineering the problem way too much, putting a big burden on the team and making it difficult for newcomers.
[Bartlett, Roscoe A] That was my opinion in the late 90s as well, as I was going down the same road, and I stopped around 1999; but I have since come to regret that view, for the reasons described in Section 3.2 of: http://www.cs.sandia.gov/~rabartl/TeuchosMemoryManagementSAND.pdf As argued in Sections 5.8 and 6.2 of the above document, the Teuchos MM classes and the associated idioms create a fairly thin language above raw pointers that increases the self-documenting nature of the code in a way you can't achieve with a language like Java or Python.
As C++ evolved I found new, much simpler ways to solve the same problems, and in particular memory problems: using smart pointers (even long before boost::shared_ptr came along). Once I started using smart pointers I never looked back, and I never again, ever, had to spend a single minute on a memory leak.
[Bartlett, Roscoe A] Smart pointers solve leaks very well, but the real problem is other invalid usages of memory that create undefined behavior. In CSE we have to use lots of arrays, and we can't mandate that all memory be allocated through std::vector or STL allocators. The main holes that the Teuchos MM classes filled were better, safer handling of arrays without mandating (not even at compile time) how the memory is allocated or deallocated, all within one consistent system.
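A rough sketch of the array idea (in the spirit of Teuchos::ArrayView, but not the real class, and again with a hypothetical debug flag): a non-owning view that adds debug-mode range checking while staying agnostic about where the memory came from:

    #include <cstddef>
    #include <stdexcept>

    template<class T>
    class ArrayViewSketch {
    public:
      ArrayViewSketch(T *data, std::size_t size) : data_(data), size_(size) {}
      T& operator[](std::size_t i) const {
    #ifdef PROJECT_DEBUG  // stand-in for a project-wide debug-mode flag
        if (i >= size_)
          throw std::range_error("ArrayViewSketch: index out of range!");
    #endif
        return data_[i];
      }
      std::size_t size() const { return size_; }
    private:
      T *data_;            // non-owning: malloc'ed, MPI-allocated, pooled, ...
      std::size_t size_;
    };

Because the view never allocates or deallocates anything itself, it can wrap memory from any source without dictating the allocation policy.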
Come on, at least some people on this mailing list must have had similar nightmare experiences in trying to track down and diagnose hard memory misuse errors in C++ that took days of effort to resolve (even with the help of tools like valgrind and purify).
Before I started using smart pointers, yes. After that, no... never again.
[Bartlett, Roscoe A] Yes, but what currently exists in C++0x and Boost for arrays is not sufficient to provide the guarantees. Read the paper, look at the classes, and make up your mind for yourself. Also, what did you do in cases where you did not need machinery for persisting associations and could not afford the overhead of reference-counting classes? Did you just use a raw pointer? Did you use a raw C++ reference? Do all of your objects use value semantics (deep or shallow copy), so that you just copied objects? That is the hole the Teuchos::Ptr class was designed to fill, and it still provides full referential checking in a debug-mode build.
And again, tools like valgrind and purify will *never* catch semantic misuse of memory (e.g., allocating a big chunk of memory and then breaking it up to construct different objects and arrays of objects). The Teuchos MM classes will catch most semantic misuse of memory in a way that no tool like valgrind or purify ever can (because those tools don't know the context of your program; they only see that reads and writes within a big block of memory that you allocated look okay). I think this is a big deal in catching hard-to-find defects that are not (technically speaking) memory misuse defects but are program defects nonetheless.
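A concrete (hypothetical) example of what I mean; every byte touched below lies inside one legitimately allocated block, so valgrind sees nothing wrong, yet the program is defective:

    #include <cstdlib>

    int main()
    {
      // One big allocation, carved up by hand into two "arrays".
      char *block = static_cast<char*>(std::malloc(1024));
      double *a = reinterpret_cast<double*>(block);       // elements 0..9
      double *b = reinterpret_cast<double*>(block + 80);  // elements 0..9

      a[10] = 42.0;  // logically overruns array 'a' into array 'b', but the
                     // write stays inside the 1024-byte block, so valgrind
                     // reports nothing
      (void)b;
      std::free(block);
      return 0;
    }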
I totally fail to see why these design mistakes (wrong allocation patterns) should be detected by a framework. These are design issues and should be dealt with at that stage. Surely any team can be trained not to make such mistakes.
[Bartlett, Roscoe A] Telling people to simply stop making mistakes is not a solution to every problem. W. Edwards Deming stated that most mistakes people make are due to a faulty process or faulty support tools. If people keep making mistakes, then we need to first look at the processes and tools and not just blame the people. We should be able to create tools so that basic errors in memory usage are automatically detected, even in C++. The Teuchos MM classes and idioms, paired with a static analysis tool (if we could find and configure one) to help enforce the idioms, would largely solve this problem in C++. Otherwise, most people who currently write CSE software would be better off writing code in a language like C#, but that is just not viable for many reasons.

Bartlett, Roscoe A wrote:
Valgrind and purify are very useful but they will often only flag a problem long after the original error occurred. As an example, a few months ago I was using std::multimap for the first time in a GCC implementation. The documentation I found for std::multimap on the web was not very detailed (I have had a hard time finding decent standard C++ library documentation). [snip] Here was a case where the checked STL implementation did not catch a basic user error and valgrind was worthless.
I like http://www.sgi.com/tech/stl/, but it is for STLport, so you will find some differences from the GNU STL. I never use multimap and prefer map of vectors. You can turn off the STL memory pool on the compile line and force it to use the system allocator to help valgrind find errors.
There are people in my domain who still today refuse to use a smart pointer class and insist on manipulating raw memory, even in brand-new code. The cycle of undefined behavior, segfault, etc. will continue ...
They are writing C code and compiling it with the C++ compiler. Everything in the C++ language that is not C is there to help with this problem, but it only helps if people use it. If they refuse to follow coding conventions that prevent bugs, why do you expect them to use your memory-checking pointers?
Do people have experience with library allocators from MPI and TBB? These are supposed to place memory more carefully but they mean that you can't use the allocator embedded in std::vector anymore.
You pass the allocator at the end of the template parameter list. These allocators place memory carefully to prevent performance problems, not to prevent bugs. A buffer overrun by one element may become benign in the majority of cases when you pad your allocations out to the cache line, but I wouldn't count on it.
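For example (assuming TBB is installed; tbb::scalable_allocator is TBB's drop-in STL allocator):

    #include <vector>
    #include "tbb/scalable_allocator.h"

    // The allocator is the second, defaulted template parameter of
    // std::vector, so a thread-friendly allocator drops in without giving
    // up the container.
    typedef std::vector<double, tbb::scalable_allocator<double> > DoubleVec;

    int main()
    {
      DoubleVec v(1000, 0.0);  // element storage now comes from the TBB
      v.push_back(3.14);       // scalable allocator, not ::operator new
      return 0;
    }

Regards, Luke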

Simonson, Lucanus J wrote:
I never use multimap and prefer map of vectors.
Hm. Interesting. I assume that your reason is that it implies fewer nodes in the multimap:
a) easier balancing/fewer buckets
b) fewer memory allocations, tighter memory
c) less memory footprint because the key is only stored once
Am I right?
OTOH, iteration is slightly more complicated. I guess this is a good example of where segmented iterators could be useful.
regards
-Thorsten

On Wed, Jun 9, 2010 at 4:39 PM, Thorsten Ottosen <nesotto@cs.aau.dk> wrote:
Simonson, Lucanus J wrote:
I never use multimap and prefer map of vectors.
Hm. Interesting. I assume that your reason is that it implies fewer nodes in the multimap:
a) easier balancing/fewer buckets
b) fewer memory allocations, tighter memory
c) less memory footprint because the key is only stored once
Am I right?
OTOH, iteration is slightly more complicated. I guess this is a good example of where segmented iterators could be useful.
Sorry for the ignorance, but what are segmented iterators?
Regards, -- Felipe Magno de Almeida

On Jun 9, 2010, at 2:06 PM, Felipe Magno de Almeida wrote:
On Wed, Jun 9, 2010 at 4:39 PM, Thorsten Ottosen <nesotto@cs.aau.dk> wrote:
[snip]
Sorry for the ignorance, but what are segmented iterators?

On Wed, Jun 9, 2010 at 5:16 PM, Belcourt, Kenneth <kbelco@sandia.gov> wrote:
On Jun 9, 2010, at 2:06 PM, Felipe Magno de Almeida wrote:
On Wed, Jun 9, 2010 at 4:39 PM, Thorsten Ottosen <nesotto@cs.aau.dk> wrote:
[snip]
Am I right?
OTOH, iteration is slightly more complicated. I guess this is a good example of where segmented iterators could be useful.
Sorry for the ignorance, but what are segmented iterators?
Wow! This fits so well with what I needed! I have a recursive concept whose models usually are just representations of XML. I needed a way to have algorithms that work depth-first with input iterators, for parsing as we go, and others where segmentation would make more sense, e.g. DOM. Now I can say that it is a flattened sequence with a non-segmented iterator that works depth-first, or I can have a segmented iterator that works depth-first but also doesn't hide the segmentation property of the structure, so that I can write specialized algorithms for it.
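To check my understanding, here is a rough sketch of the idea applied to a map of vectors (illustrative names only, not from any real library):

    #include <map>
    #include <vector>

    typedef std::map<int, std::vector<double> > MapOfVec;

    // A "hierarchical" algorithm in the segmented-iterator spirit: the
    // outer loop walks the segments (the map nodes) and the inner loop
    // uses a local iterator within each segment, so per-segment work can
    // be hoisted out of the inner loop.
    template<class F>
    void for_each_segmented(MapOfVec &m, F f)
    {
      for (MapOfVec::iterator seg = m.begin(); seg != m.end(); ++seg) {
        std::vector<double> &s = seg->second;
        for (std::vector<double>::iterator it = s.begin(); it != s.end(); ++it)
          f(*it);
      }
    }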
Thanks! -- Felipe Magno de Almeida

Thorsten Ottosen wrote:
Simonson, Lucanus J wrote:
I never use multimap and prefer map of vectors.
Hm. Interesting. I assume that your reason is that it implies fewer nodes in the multimap:
a) easier balancing/fewer buckets
b) fewer memory allocations, tighter memory
c) less memory footprint because the key is only stored once
Am I right?
OTOH, iteration is slightly more complicated. I guess this is a good example of where segmented iterators could be useful.
There are actually cases where a map of vectors would be less efficient than a multimap, when the number of elements with non-unique keys is small, but you are right that there are cases where a map of vectors will be more efficient. The real reason is more of a defensive programming practice.

Multimaps are somewhat confusing to work with because the order of insertion for elements with the same key isn't (to my knowledge) well defined. If you insert an element and it returns an iterator, for example, is that iterator pointing to the first, last, or some middle element in a group with the same key value? It is error prone to work in a mode where your code works for multimaps whose elements have unique keys but breaks when multiple elements start sharing a key, because that is easy to forget about and no compile-time error catches the mistake. With a map of vectors, the compiler forces you to handle the case where keys are equal.

With a multimap, things become confused when you insert an element and then follow up with a loop that decrements the iterator returned by insert until a condition is met. With a map of vectors it is much more explicit and self-documenting what the code is doing and why. It also becomes easier to think about the data structure as a map of vectors rather than as a flat multimap. For these reasons I actually would prefer not to use segmented iterators on a map of vectors in most cases, since they abstract away the clarity of the code I'm trying to achieve.
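To make the contrast concrete (an illustrative sketch, not from any particular codebase):

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    int main()
    {
      std::multimap<std::string, int> mm;
      mm.insert(std::make_pair(std::string("k"), 1));
      mm.insert(std::make_pair(std::string("k"), 2));
      // Where does this element land relative to the other "k" entries:
      // first, last, somewhere in the middle?  It is easy to guess wrong,
      // and nothing breaks until keys actually collide at runtime.
      mm.insert(std::make_pair(std::string("k"), 3));

      std::map<std::string, std::vector<int> > mv;
      mv["k"].push_back(1);  // the grouping is explicit: duplicates go to
      mv["k"].push_back(2);  // a well-defined position (the back of the
      mv["k"].push_back(3);  // vector), and the code says so
      return 0;
    }

Regards, Luke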
participants (5)
- Bartlett, Roscoe A
- Belcourt, Kenneth
- Felipe Magno de Almeida
- Simonson, Lucanus J
- Thorsten Ottosen