
On Fri, Jan 21, 2011 at 9:23 PM, Eric Niebler <eric@boostpro.com> wrote:
On 1/21/2011 7:45 PM, Dean Michael Berris wrote:
On Fri, Jan 21, 2011 at 8:37 PM, Eric Niebler <eric@boostpro.com> wrote:
Uh, if the string is immutable, then two strings can transparently share the same data. There is no "write" in "copy-on-write". That's the definition of immutable. :-)
Ha! You're right. :D Now if you can make this work without reference counting, that'd be perfect. :D
Why? Immutable data is exactly the situation (and the only situation IMO) when you'd want to share data. The potential savings, both in time and space, are huge. What's wrong with a ref-counted string impl?
Good question. Here are some reasons off the top of my head for why a ref-counted string implementation might not be desirable: 1. The overhead of maintaining that table of reference counts may be significant if you have a lot of strings created in your application. The only way I know how this would be implemented is with maybe static intrusive container of these string chunks. Notice I say "created" not copied around because the cost of copying whole strings around in memory would dwarf the cost of having that static intrusive container for string chunks. 2. In the case that you do have strings copied around, consider the case in multi-threaded applications where you have different threads running on different cores. Although there won't be any mutations, updating a reference count that's shared across cores will cause some "unnecessary" cache ping-pong across cores. You won't have this if each thread was just operating on a copy of the data and not needing to update reference counts. 3. The reference count almost always will be implemented with atomic counters to be efficient. In cases where the hardware doesn't support these (in embedded environments) the reference count would be implemented either with a spin-lock if the platform supports threading, OS-level semaphores, etc. -- none of which would be necessary if you just copied the string to begin with. For the majority of cases an interning strategy for strings to be able to share the same storage would be a good thing. Maybe turning off the reference counting "default" might be an option as well. Of course I'm just listing the reasons why a ref-counted implementation wouldn't be desirable -- in the majority of the cases, I'd say reference counting may be a good thing. ;) -- Dean Michael Berris about.me/deanberris