Interest request for pointer+bit compression optimization

vicente.botet

7 Sep 2009 7 Sep '09

5 p.m.

Hi, I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits.ht...) I was wondering if it would be interesting to provide some generic template classes making easier this kind of optimization. We can define two templates (See below) template <typename E> struct bit_ref; template <typename T> struct pointer_ref; So instead of having enum color_type{red=false,black=true}; class rb_node { private: color_type color_; regular_rb_node_header* parent; color_type& color(){return color_;}? color_type color()const{return color_;}? rb_node*& parent(){return parent_;}? rb_node* parent() const{return parent_;} } we can compress the colot_type and the rb_node* as follows class rb_node { private: uintptr_type parentcolor_; public: bit_ref<color_type> color(){return bit_ref<color_type>(&parentcolor_);} color_type color()const{return bit_ref<color_type>(&parentcolor_);} pointer_ref<rb_node> parent(){return pointer_ref<rb_node>(&parentcolor_);} rb_node* parent() const {return pointer_ref<rb_node>(&parentcolor_);} }; Next follows the definition of the template classes template <typename E> struct bit_ref { bit_ref(uintptr_type* r_):r(r_){} operator E() const { return E(*r&uintptr_type(1)); } bit_ref& operator=(E c) { *r&=~uintptr_type(1); *r|=uintptr_type(c); return *this; } bit_ref& operator=(const bit_ref& x) { return operator=(x.operator T()); } private: uintptr_type* r; }; template <typename T> struct pointer_ref { pointer_ref(uintptr_type* r_):r(r_){} operator T*() const { return (T*)(void*)(*r&~uintptr_type(1)); } pointer_ref& operator=(T* p) { *r=((uintptr_type)(void*)p)|(*r&uintptr_type(1)); return *this; } pointer_ref& operator=(const pointer_ref& x) { return operator=(x.operator T*()); } pointer_ref operator->()const { return operator T*(); } private: uintptr_type* r; }; Thanks Joaquin for sharing simple things like this one, stable_vector and others in your Bannalia blog. _____________________ Vicente Juan Botet Escribá

Show replies by date

Edouard A.

7 Sep 7 Sep

5:33 p.m.

New subject: Interest request for pointer+bit compression optimization

...

I was wondering if it would be interesting to provide some generic template classes making >easier this kind of optimization.

Yes, at least for me. You actually have more than one bit available as the OS generally reserves a huge area of virtual memory for the kernel. Typical example, Windows 32, maximum user pointer value is 0x80000000. Don't remember the values by heart for the other oses. On 64-bit OSes you really can use the large pointers to store data and therefore have very compact structures in memory. Correct me if I'm wrong but currently most OSes only allow several gigabytes of virtual memory for any given user process (something like 100-200 GB), that means there's room for expression! :D In addition, embedding a tag within a pointer is a technique to reduce the ABA problem occurrence on lockfree containers. It would be a nice to have IMHO, but there is a huge work of testing to make it safe and reliable. -- EA __________ Information from ESET NOD32 Antivirus, version of virus signature database 4403 (20090907) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com

Neil Groves

5:42 p.m.

New subject: Interest request for pointer+bit compression optimization

On Mon, Sep 7, 2009 at 6:33 PM, Edouard A. <edouard@fausse.info> wrote:

...

...
I was wondering if it would be interesting to provide some generic template classes making >easier this kind of optimization.

Yes, at least for me.

You actually have more than one bit available as the OS generally reserves a huge area of virtual memory for the kernel. Typical example, Windows 32, maximum user pointer value is 0x80000000. Don't remember the values by heart for the other oses.

This is incorrect. The pointer limit is only limited to under 0x80000000 if you are linking without the /LARGEADDRESSAWARE flag. It is quite common to use this flag to enable the best performance of a 32-bit application on a 64-bit version of Windows. It can also have some benefit when using the boot.ini /3GB tuning. http://msdn.microsoft.com/en-us/library/wz223b1z%28VS.80%29.aspx

...

On 64-bit OSes you really can use the large pointers to store data and therefore have very compact structures in memory. Correct me if I'm wrong but currently most OSes only allow several gigabytes of virtual memory for any given user process (something like 100-200 GB), that means there's room for expression! :D

The limit is as high as 8TB on some Windows 64-bit Operating Systems. http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx

...

In addition, embedding a tag within a pointer is a technique to reduce the ABA problem occurrence on lockfree containers.

It would be a nice to have IMHO, but there is a huge work of testing to make it safe and reliable.

Agreed.

...

--

EA

Regards, Neil Groves

Edouard A.

8:16 p.m.

New subject: Interest request for pointer+bit compression optimization

...

This is incorrect. The pointer limit is only limited to under 0x80000000 if you are linking without the /LARGEADDRESSAWARE flag. It is quite common to use this flag to enable the best performance of a 32-bit application on a 64-bit version of Windows. It can also have some benefit when using the boot.ini /3GB tuning.

http://msdn.microsoft.com/en-us/library/wz223b1z%28VS.80%29.aspx

Yes there is the possibility to expand the user space to 3GB at the cost of global system performance and system instability (hope your drivers aren't assuming anything about the address of the buffers in the irps they'll get ;) although if I remember correctly to get the WHQL certification you need to run in 3GB so if you have WHQL drivers you should be fine) on Windows 32-bit (the infamous /3GB switch you mentioned). I forgot that Windows 64 32-bit VM automatically expanded the user space to 3GB when the switch was on. That's safer in this context as the drivers deal with 64-bit address and all requests go through the VM. Anyway, I don't mean to argue about the ideal size of the user space in Windows 32, you're right the library should be able to be detect whether or not a 32-bit application has got 3GB or 2GB space.

...

On 64-bit OSes you really can use the large pointers to store data and therefore have very compact structures in memory. Correct me if I'm wrong but currently most OSes only allow several gigabytes of virtual memory for any given user process (something like 100-200 GB), that means there's room for expression! :D

The limit is as high as 8TB on some Windows 64-bit Operating Systems. http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx

You're right! I was mixing up virtual and physical memory limits. That's the amount of physical memory that is limited to 128 GB (mmm I see you can even load up 2 TB on the datacenter edition, having your whole movie collection in memory is some sort of major feature methinks :p). That leaves (in theory) around 21 bits to store information. -- EA __________ Information from ESET NOD32 Antivirus, version of virus signature database 4403 (20090907) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com

Neil Groves

8:38 p.m.

New subject: Interest request for pointer+bit compression optimization

On Mon, Sep 7, 2009 at 9:16 PM, Edouard A. <edouard@fausse.info> wrote:

...

...
This is incorrect. The pointer limit is only limited to under 0x80000000 if you are linking without the /LARGEADDRESSAWARE flag. It is quite common to use this flag to enable the best performance of a 32-bit application on a 64-bit version of Windows. It can also have some benefit when using the boot.ini /3GB tuning.

http://msdn.microsoft.com/en-us/library/wz223b1z%28VS.80%29.aspx

Yes there is the possibility to expand the user space to 3GB at the cost of global system performance and system instability (hope your drivers aren't assuming anything about the address of the buffers in the irps they'll get ;) although if I remember correctly to get the WHQL certification you need to run in 3GB so if you have WHQL drivers you should be fine) on Windows 32-bit (the infamous /3GB switch you mentioned).

I forgot that Windows 64 32-bit VM automatically expanded the user space to 3GB when the switch was on. That's safer in this context as the drivers deal with 64-bit address and all requests go through the VM.

What you say about driver stability is very valid. However linking with /LARGEADDRESSAWARE does not force this behaviour, it only signifies to the OS that your application is capable of using the extra address space safely. For many applications this is useful, for a great many others it merely leads to confusion.

...

Anyway, I don't mean to argue about the ideal size of the user space in Windows 32, you're right the library should be able to be detect whether or not a 32-bit application has got 3GB or 2GB space.

...
On 64-bit OSes you really can use the large pointers to store data and therefore have very compact structures in memory. Correct me if I'm wrong but currently most OSes only allow several gigabytes of virtual memory for any given user process (something like 100-200 GB), that means there's room for expression! :D

The limit is as high as 8TB on some Windows 64-bit Operating Systems. http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx

You're right! I was mixing up virtual and physical memory limits. That's the amount of physical memory that is limited to 128 GB (mmm I see you can even load up 2 TB on the datacenter edition, having your whole movie collection in memory is some sort of major feature methinks :p).

That leaves (in theory) around 21 bits to store information.

You appear to be assuming that the OS guarantees that it uses the lowest available virtual addresses. Would you please provide a link to the documentation? I was under the impression that the 8TB limit is merely a practical one since this was the maximum hardware specification available for testing. I prefer using the least-significant bits when the alignment allows such an optimisation. This seems less error-prone and better documented as a safe idiom.

...

--

EA

Regards, Neil Groves

Edouard A.

9:19 p.m.

New subject: Interest request for pointer+bit compression optimization

...

You appear to be assuming that the OS guarantees that it uses the lowest available virtual addresses. Would you please provide a link to the documentation? I was under the impression that the 8TB limit is merely a practical one since this was the maximum hardware specification available for testing.

The OS isn't floating around in virtual memory space! The layout is strict and doesn't change and of it's precisely and correctly documented in the driver developer's manual. There is some randomness added for security, but the whole user space process is very strictly bounded. That's true for all OSes I've programmed drivers for, which is, I agree far from being exhaustive. For Windows x86 anything between 0x0 and 0x7fffffff is for user mode, with the first pages made inaccessible to help debug null pointers errors. I also remember that there's some clutter near the 2GB boundary (not sure about it, look it up if you're interested). Of course that goes up a bit if you extend the user space to 3GB. For Windows 64 running on IA64 anything between 0 and 0x6fbfffeffff is for user mode. It goes up to 0x7fffffeffff for AMD64. Again the first pages are inaccessible for the same reasons. Must something like a 32 kb locked range.

...

I prefer using the least-significant bits when the alignment allows such an optimisation. This seems less error-prone and better documented as a safe idiom.

I believe in freedom. -- EA __________ Information from ESET NOD32 Antivirus, version of virus signature database 4403 (20090907) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com

Neil Groves

8 Sep 8 Sep

7:47 a.m.

New subject: Interest request for pointer+bit compression optimization

On Mon, Sep 7, 2009 at 10:19 PM, Edouard A. <edouard@fausse.info> wrote:

...

...
You appear to be assuming that the OS guarantees that it uses the lowest available virtual addresses. Would you please provide a link to the documentation? I was under the impression that the 8TB limit is merely a practical one since this was the maximum hardware specification available for testing.

The OS isn't floating around in virtual memory space! The layout is strict and doesn't change and of it's precisely and correctly documented in the driver developer's manual.

I would be alarmed if it was floating around. My concern was the perceived lack of documented guarantees. I was ignorant of any documentation with respect to the pointer ranges in the 64-bit Windows Operating Systems. My motivation was to clarify the safe useable ranges and ensure we weren't going to use undocumented implementation details by default.

...

There is some randomness added for security, but the whole user space process is very strictly bounded. That's true for all OSes I've programmed drivers for, which is, I agree far from being exhaustive.

For Windows x86 anything between 0x0 and 0x7fffffff is for user mode, with the first pages made inaccessible to help debug null pointers errors. I also remember that there's some clutter near the 2GB boundary (not sure about it, look it up if you're interested). Of course that goes up a bit if you extend the user space to 3GB.

For Windows 64 running on IA64 anything between 0 and 0x6fbfffeffff is for user mode. It goes up to 0x7fffffeffff for AMD64.

Again the first pages are inaccessible for the same reasons. Must something like a 32 kb locked range.

I was unaware of these exact address ranges, but you are correct. I have found documentation to this effect here: http://www.amd64.ru/download.php?uid=24504 The documentation for these ranges is not nearly as easy to find as the safe ranges for the 32-bit platforms.

...

I prefer using the least-significant bits when the alignment allows such an

...
optimisation. This seems less error-prone and better documented as a safe idiom.

I believe in freedom.

The fact that these are safe documented user ranges alters my opinion. Does anyone know of a way to determine the linker flag setting /LARGEADDRESSAWARE, or is it better to implement a pessimisation? I agree with Dave that looking at existing implementations is a good idea. Does anyone know which files are of interest in the CLang code, or know of some search hints?

I would like to see this generalised and optimally implemented too.

...

--

EA

Regards, Neil Groves

Edouard A.

8:02 a.m.

New subject: Interest request for pointer+bit compression optimization

...

I would be alarmed if it was floating around. My concern was the

On Tue, 8 Sep 2009 08:47:15 +0100, Neil Groves <neil@grovescomputing.com> wrote: perceived

...

lack of documented guarantees. I was ignorant of any documentation with respect to the pointer ranges in the 64-bit Windows Operating Systems. My motivation was to clarify the safe useable ranges and ensure we weren't going to use undocumented implementation details by default.

They are safe and clearly documented. What could happen is an extension of the user range, if someday 8 TB of memory shows to be too limited for user processes (it will happen some day, don't worry ;) ). You don't care too much about the extensions of the paged pool and non paged pool from an user process point of view. That may change earlier. Anyway, we have time before this happens and it will be clearly explained and documented. Failure to do so would result in the breakage of many system components.

...

I was unaware of these exact address ranges, but you are correct. I have found documentation to this effect here: http://www.amd64.ru/download.php?uid=24504

The documentation for these ranges is not nearly as easy to find as the safe ranges for the 32-bit platforms.

This is because the 64-bit platform is (it saddens me) not yet widely adopted, I guess.

...

...
The fact that these are safe documented user ranges alters my opinion. Does anyone know of a way to determine the linker flag setting /LARGEADDRESSAWARE, or is it better to implement a pessimisation?

Yes, GlobalMemoryStatusEx will give you the total amount of virtual memory available, but we're talking about half of a bit. :p Regards. -- EA

Cory Nelson

9 Sep 9 Sep

12:36 a.m.

New subject: Interest request for pointer+bit compression optimization

On Tue, Sep 8, 2009 at 12:47 AM, Neil Groves <neil@grovescomputing.com> wrote:

...

On Mon, Sep 7, 2009 at 10:19 PM, Edouard A. <edouard@fausse.info> wrote:

...
...
You appear to be assuming that the OS guarantees that it uses the lowest available virtual addresses. Would you please provide a link to the documentation? I was under the impression that the 8TB limit is merely a practical one since this was the maximum hardware specification available for testing.

The OS isn't floating around in virtual memory space! The layout is strict and doesn't change and of it's precisely and correctly documented in the driver developer's manual.

I would be alarmed if it was floating around. My concern was the perceived lack of documented guarantees. I was ignorant of any documentation with respect to the pointer ranges in the 64-bit Windows Operating Systems. My motivation was to clarify the safe useable ranges and ensure we weren't going to use undocumented implementation details by default.

The lock-free stack implementation in XP and Vista uses the high 21 bits of a 64-bit pointer for the ABA tag. Undocumented implementation detail, but at least it's not a complete mystery as to what should be safe for use :) -- Cory Nelson http://int64.org

Stefan Strasser

7 Sep 7 Sep

9:01 p.m.

New subject: Interest request for pointer+bit compression optimization

Am Monday 07 September 2009 19:42:57 schrieb Neil Groves:

...

...
It would be a nice to have IMHO, but there is a huge work of testing to make it safe and reliable.

Agreed.

if you wanted to exploit system implementation specific free bits. detecting the alignment of pointers seems to be possible reliably and statically, otherwise boost intrusive and STL implementations wouldn't use it. that's 2-3 free bits on aligned 32/64 bit platforms.

vicente.botet

8:21 p.m.

New subject: Interest request for pointer+bit compressionoptimization

Hi ----- Original Message ----- From: "Edouard A." <edouard@fausse.info> To: <boost@lists.boost.org> Sent: Monday, September 07, 2009 7:33 PM Subject: Re: [boost] Interest request for pointer+bit compressionoptimization

...

...
I was wondering if it would be interesting to provide some generic template classes making >easier this kind of optimization.

Yes, at least for me.

You actually have more than one bit available as the OS generally reserves a huge area of virtual memory for the kernel. Typical example, Windows 32, maximum user pointer value is 0x80000000. Don't remember the values by heart for the other oses.

The trick is not because the OS don't use the whole addresable memory but on the fact memory is aligned.

...

From the blog "Alignment considerations dictate that many types must lie in memory addresses that are multiple of some small integer greater than 1; in particular, it is extremely usual that the alignment of a type T is even (i.e. a multiple of 2), in which case the least significant bit of the representation of pointers to T is always zero. And this is precisely the situation in which we can reuse this bit to embed the color information:"

...

On 64-bit OSes you really can use the large pointers to store data and therefore have very compact structures in memory. Correct me if I'm wrong but currently most OSes only allow several gigabytes of virtual memory for any given user process (something like 100-200 GB), that means there's room for expression! :D

In addition, embedding a tag within a pointer is a technique to reduce the ABA problem occurrence on lockfree containers.

It would be a nice to have IMHO, but there is a huge work of testing to make it safe and reliable.

Well there is already the fact that boost::aligned_of is not portable, but with the new C++1x we will have this is a portable way. Other than that there is not too much to do. Thanks, Vicente

Edouard A.

8:42 p.m.

New subject: Interest request for pointer+bit compressionoptimization

...

You actually have more than one bit available as the OS generally reserves a huge area of virtual memory for the kernel. Typical example, Windows 32, maximum user pointer value is 0x80000000. Don't remember the values by heart for the other oses.

...

...
The trick is not because the OS don't use the whole addresable memory but on the fact >memory is aligned. From the blog "Alignment considerations dictate that many types must lie in memory addresses that are >multiple of some small integer greater than 1; in particular, it is extremely usual that >the alignment of a type T is even (i.e. a multiple of 2), in which case the least >significant bit of the representation of pointers to T is always zero. And this is >precisely the situation in which we can reuse this bit to embed the color information:"

I know, that doesn't mean it cannot be pushed further. 1 bit is not much. You can even go beyond that and use your own memory allocator that's going to make sure the pointer have space where you expect it to bit. Let's say you can salvage 4 bit on the low part and 4 on the high part (for 32 bit pointers), that's 8 bit ! :) -- EA __________ Information from ESET NOD32 Antivirus, version of virus signature database 4403 (20090907) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com

Daniel Hulme

5:46 p.m.

New subject: Interest request for pointer+bit compression optimization

On Mon, Sep 07, 2009 at 07:00:14PM +0200, vicente.botet wrote:

...

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits.ht...)

One of my colleagues wrote something similar, with get and set methods instead of proxy classes, and we use it extensively at work. On AMD64 (and whatever the Intel name for the same thing is) pointers are always divisible by 4, so we actually store 2 bits in each pointer. I've also written a boost::variant-like class that uses the 2 bits to store the which() value for up to four types of pointer (one of which can instead be boost::blank), in order to reduce the size of my small- buffer-optimization class. We don't use this right now because code elsewhere in the system needs to get a reference to the pointer stored in it, and templating the whole lot so we can pass a proxy object around is too horrific for words. -- "I think you look like the Mona Lisa. You always seem to be at a window admiring the landscape that is actually behind you." Herve Le Tellier http://surreal.istic.org/ It's like a DEATH CIRCUS!

vicente.botet

8:25 p.m.

New subject: Interest request for pointer+bitcompression optimization

----- Original Message ----- From: "Daniel Hulme" <st@istic.org> To: <boost@lists.boost.org> Sent: Monday, September 07, 2009 7:46 PM Subject: Re: [boost] Interest request for pointer+bitcompression optimization On Mon, Sep 07, 2009 at 07:00:14PM +0200, vicente.botet wrote:

...

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits.ht...)

...

Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi Daniel, Could you share the code of the variant-like class? Vicente

Daniel Hulme

8 Sep 8 Sep

7:34 a.m.

New subject: Interest request for pointer+bitcompression optimization

On Mon, Sep 07, 2009 at 10:25:38PM +0200, vicente.botet wrote:

...

Hi Daniel, Could you share the code of the variant-like class?

Probably not, for two reasons: 1) As I wrote this for work, it's their code, and there's no compelling reason for them to share it. 2) As we can't use it for the intended use (for the reason I explained), I haven't finished it off: it's more of a POC than production-ready code. So, don't get your hopes up, but I'll see what I can do. Even so, starting with the pointer-stuffing class and my container class's unit tests, it took maybe a day or two to get it to the testable state it's in now, so starting an open-source implementation from scratch wouldn't be much duplicated work. -- Sufficiently advanced humour is indistinguishable from tedium. corollary: Humour distinguishable from tedium is insufficiently advanced. http://surreal.istic.org/ Yellow: it's the new khaki.

Stefan Strasser

7 Sep 7 Sep

8:32 p.m.

New subject: Interest request for pointer+bit compression optimization

Am Monday 07 September 2009 19:00:14 schrieb vicente.botet:

...

Hi,

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits. html)

I was wondering if it would be interesting to provide some generic template classes making easier this kind of optimization.

I know that boost.intrusive's rb-tree implements that, and below I quoted its way to decide if the pointer is aligned in a way to use the lower bits of the pointer. if you wanted to implement that as a generic template I guess the best way is to hide it behind a class that models the concept of a std::pair, or a boost::tuple. e.g. optimized_pair<void *,bool> p(0,false); assert(sizeof(p) == sizeof(void *)); boost.intrusive: template <typename T> struct alignment_of; template <typename T> struct alignment_of_hack { char c; T t; alignment_of_hack(); }; template <unsigned A, unsigned S> struct alignment_logic { static const std::size_t value = A < S ? A : S; }; template< typename T > struct alignment_of { static const std::size_t value = alignment_logic < sizeof(alignment_of_hack<T>) - sizeof(T) , sizeof(T) >::value; }; //Inherit from the detail::link_dispatch depending on the embedding capabilities template<class VoidPointer, bool OptimizeSize = false> struct rbtree_node_traits : public rbtree_node_traits_dispatch < VoidPointer , OptimizeSize && (max_pointer_plus_bits < VoidPointer , detail::alignment_of<compact_rbtree_node<VoidPointer> >::value >::value >= 1) > {};

vicente.botet

8:32 p.m.

New subject: Interest request for pointer+bit compressionoptimization

Hi Stephan, ----- Original Message ----- From: "Stefan Strasser" <strasser@uni-bremen.de> To: <boost@lists.boost.org> Sent: Monday, September 07, 2009 10:32 PM Subject: Re: [boost] Interest request for pointer+bit compressionoptimization

...

Am Monday 07 September 2009 19:00:14 schrieb vicente.botet:

...
Hi,

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits. html)

I was wondering if it would be interesting to provide some generic template classes making easier this kind of optimization.

I know that boost.intrusive's rb-tree implements that, and below I quoted its way to decide if the pointer is aligned in a way to use the lower bits of the pointer.

Thanks for the pointer. So there is already an example of a potential Boost user.

...

if you wanted to implement that as a generic template I guess the best way is to hide it behind a class that models the concept of a std::pair, or a boost::tuple.

e.g. optimized_pair<void *,bool> p(0,false); assert(sizeof(p) == sizeof(void *));

Yes, this could be a valid alternative. I would try to implement it. Thanks, Vicente

vicente.botet

10:40 p.m.

New subject: Interest request for pointer+bit compressionoptimization

----- Original Message ----- From: "vicente.botet" <vicente.botet@wanadoo.fr> To: <boost@lists.boost.org> Sent: Monday, September 07, 2009 10:32 PM Subject: Re: [boost] Interest request for pointer+bit compressionoptimization

...

Hi Stephan, ----- Original Message ----- From: "Stefan Strasser" <strasser@uni-bremen.de> To: <boost@lists.boost.org> Sent: Monday, September 07, 2009 10:32 PM Subject: Re: [boost] Interest request for pointer+bit compressionoptimization

...
Am Monday 07 September 2009 19:00:14 schrieb vicente.botet:

...
Hi,

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits. html)

I was wondering if it would be interesting to provide some generic template classes making easier this kind of optimization.

if you wanted to implement that as a generic template I guess the best way is to hide it behind a class that models the concept of a std::pair, or a boost::tuple.

e.g. optimized_pair<void *,bool> p(0,false); assert(sizeof(p) == sizeof(void *));

Yes, this could be a valid alternative. I would try to implement it.

I have not compiled the code but here it is a first trial: // Selects between a compressed and a regular class depending on whether // * the use wants the compression, // *the compilers has a unsigned integer type with the size of the a pointer and // *the alignement of the structure let an unsued bit template <typename Compressed, typename Regular, bool Compression=true> struct best_of { typedef boost::mpl::if_c< Compression&& has_uintptr_type::value&& (boost::alignment_of<Compressed>::value%2==0), Compressed, Regular

...

::type type; };

// compressed version template <typename Bit, typename Pointee> class compressed_bit_pointer { uintptr_type ui_; public: regular _bit_pointer (): ui_(0); regular _bit_pointer (Bit b, Pointee* p); pointer_ref<Pointee> pointer() {return pointer_ref(ui_);} Pointee* pointer() const {return pointer_ref(ui_);} bit_ref<Bit> bit() {return bit_ref(ui_);} Bit bit() const {return bit_ref(ui_);} }; // regular version template <typename Bit, typename Pointee> class regular_bit_pointer { Bit bit_; Pointee pointer_; public: regular _bit_pointer ():pointer_(0),bit_(0); regular _bit_pointer (Bit b, Pointee* p); pointer_ref<Pointee> pointer() {return pointer_;} Pointee* pointer() const {return pointer_;} bit_ref<Bit> bit() {return bit_;} Bit bit() const {return bit_;} }; // selects the best of the compressed or the regular version template <typename Pointee, typename Bit, bool Compression=true> struct optimized_pair : typename best_of<compressed_bit_pointer<Bit, Pointee>, regular_bit_pointer<Bit, Pointee>, Compression>::type {}; The use is a little bit more simple, no need to declare two classes compressed and regular optimized_pair<color_type, rb_node> color_parent_; pointer_ref<rb_node> color() {return color_parent_.pointer();} rb_node* parent() const {return color_parent_.pointer();} color_type color()const{return color_parent_.bit();} bit_ref<color_type> color(){return color_parent_.bit();} I think that I will put altogther in a package. Any sugestion on where this utility could be placed? A new library? Do you have suggestion to improve the namming and a namespace? Best, Vicente

David Abrahams

8 Sep 8 Sep

12:39 a.m.

New subject: Interest request for pointer+bit compressionoptimization

on Mon Sep 07 2009, "vicente.botet" <vicente.botet-AT-wanadoo.fr> wrote:

...

I think that I will put altogther in a package. Any sugestion on where this utility could be placed? A new library? Do you have suggestion to improve the namming and a namespace?

Two points: 1. This optimization is currently (as of 1.40) being used in Boost.Function 2. The Clang project is full of this sort of thing, IIUC. You might look at what they're doing before you decide on an interface. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Vicente Botet Escriba

7:37 a.m.

New subject: Interest request for pointer+bit compressionoptimization

David Abrahams-3 wrote:

...

on Mon Sep 07 2009, "vicente.botet" <vicente.botet-AT-wanadoo.fr> wrote:

...
I think that I will put altogther in a package. Any sugestion on where this utility could be placed? A new library? Do you have suggestion to improve the namming and a namespace?

Two points:

1. This optimization is currently (as of 1.40) being used in Boost.Function

2. The Clang project is full of this sort of thing, IIUC. You might look at what they're doing before you decide on an interface.

-- Dave Abrahams BoostPro Computing http://www.boostpro.com

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi, Do you have a specific pointer for CLang related to this usage? Thanks for your advice, Vicente -- View this message in context: http://www.nabble.com/Interest-request-for-pointer%2Bbit-compression-optimiz... Sent from the Boost - Dev mailing list archive at Nabble.com.

David Abrahams

7:54 a.m.

New subject: Interest request for pointer+bit compressionoptimization

on Tue Sep 08 2009, Vicente Botet Escriba <vicente.botet-AT-wanadoo.fr> wrote:

...

Do you have a specific pointer for CLang related to this usage?

Nope, this is hearsay. Doug? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

troy d. straszheim

10:51 a.m.

New subject: Interest request for pointer+bit compressionoptimization

David Abrahams wrote:

...

on Tue Sep 08 2009, Vicente Botet Escriba <vicente.botet-AT-wanadoo.fr> wrote:

...
Do you have a specific pointer for CLang related to this usage?

Nope, this is hearsay. Doug?

If I remember Doug's explanation correctly, one of the optimizations inside clang is: There is a global table of 'plain' types, e.g. int, bool, float, std::string. Clang internally represents CV qualified types by e.g. creating a pointer to the type 'int', then using the small bits of the pointer to tag const and volatile. I don't recall whether they used more than two bits (for, say reference, static, pointer, etc.) Sebastian Redl gave a great intro to clang @ boostcon, I'd bet he knows the details as well. Aside from that, another use case: Mark-and-sweep garbage collection. This came up when as an exercise I implemented a lisp interpreter with spirit, variants, visitors, and intrusive_ptr. Currently this thing just leaks pointer loops, and a taggable_ptr combined with some kind of memory pool would seem a nice way to start working on it... -t

Doug Gregor

3:22 p.m.

New subject: Interest request for pointer+bit compressionoptimization

On Tue, Sep 8, 2009 at 3:51 AM, troy d. straszheim<troy@resophonic.com> wrote:

...

David Abrahams wrote:

...
on Tue Sep 08 2009, Vicente Botet Escriba <vicente.botet-AT-wanadoo.fr> wrote:

...
Do you have a specific pointer for CLang related to this usage?

Nope, this is hearsay. Doug?

If I remember Doug's explanation correctly, one of the optimizations inside clang is: There is a global table of 'plain' types, e.g. int, bool, float, std::string. Clang internally represents CV qualified types by e.g. creating a pointer to the type 'int', then using the small bits of the pointer to tag const and volatile. I don't recall whether they used more than two bits (for, say reference, static, pointer, etc.) Sebastian Redl gave a great intro to clang @ boostcon, I'd bet he knows the details as well.

vicente.botet

7:57 p.m.

New subject: Interest request for pointer+bit compressionoptimization

Hi, ----- Original Message ----- From: "Doug Gregor" <doug.gregor@gmail.com> To: <boost@lists.boost.org> Sent: Tuesday, September 08, 2009 5:22 PM Subject: Re: [boost] Interest request for pointer+bit compressionoptimization On Tue, Sep 8, 2009 at 3:51 AM, troy d. straszheim<troy@resophonic.com> wrote:

...

David Abrahams wrote:

...
on Tue Sep 08 2009, Vicente Botet Escriba <vicente.botet-AT-wanadoo.fr> wrote:

...
Do you have a specific pointer for CLang related to this usage?

Nope, this is hearsay. Doug?

If I remember Doug's explanation correctly, one of the optimizations inside clang is: There is a global table of 'plain' types, e.g. int, bool, float, std::string. Clang internally represents CV qualified types by e.g. creating a pointer to the type 'int', then using the small bits of the pointer to tag const and volatile. I don't recall whether they used more than two bits (for, say reference, static, pointer, etc.) Sebastian Redl gave a great intro to clang @ boostcon, I'd bet he knows the details as well.

Clang's QualType smart pointer uses three bits, for const, volatile, and restrict, so we have to force 8-byte alignment when we allocate Type structures. There's a tiny bit of explanation of this part of the type system here: http://clang.llvm.org/docs/InternalsManual.html#QualType The actual QualType class is here: https://llvm.org/svn/llvm-project/cfe/trunk/include/clang/AST/Type.h We use this bit-mangling trick a *lot* in Clang. The general infrastructure is "PointerIntPair", which is what it says it is: a pointer and an integer mangled together, with accessors for each. The code is here: https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/ADT/PointerIntPair... We build most of our bit-mangled structures on top of PointerIntPair or PointerUnion, a union of two (distinct) pointer types: https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/ADT/PointerUnion.h - Doug _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost The PointerIntPair provides exactly what optimized_pair class. The name is much more explicit. Thanks for the pointer. Extracted from https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/ADT/PointerIntPair... " /// PointerIntPair - This class implements a pair of a pointer and small /// integer. It is designed to represent this in the space required by one /// pointer by bitmangling the integer into the low part of the pointer. This /// can only be done for small integers: typically up to 3 bits, but it depends /// on the number of bits available according to PointerLikeTypeTraits for the /// type. /// /// Note that PointerIntPair always puts the Int part in the highest bits /// possible. For example, PointerIntPair<void*, 1, bool> will put the bit for /// the bool into bit #2, not bit #0, which allows the low two bits to be used /// for something else. For example, this allows: /// PointerIntPair<PointerIntPair<void*, 1, bool>, 1, bool> /// ... and the two bools will land in different bits. /// template <typename PointerTy, unsigned IntBits, typename IntType=unsigned, typename PtrTraits = PointerLikeTypeTraits<PointerTy> > class PointerIntPair; " I like the way it manage to store several bits using a PointerLikeTypeTraits Thanks to all to pointing to this simple class. Are you interested on the inclusion of this class to boost? Vicente

Doug Gregor

9 Sep 9 Sep

12:08 a.m.

New subject: Interest request for pointer+bit compressionoptimization

On Tue, Sep 8, 2009 at 12:57 PM, vicente.botet<vicente.botet@wanadoo.fr> wrote:

...

template <typename PointerTy, unsigned IntBits, typename IntType=unsigned, typename PtrTraits = PointerLikeTypeTraits<PointerTy> > class PointerIntPair; "

I like the way it manage to store several bits using a PointerLikeTypeTraits

Thanks to all to pointing to this simple class.

Are you interested on the inclusion of this class to boost?

As I mentioned before, we've found it very useful in Clang to keep various data structures small. I imagine that Boost libraries and users would want to do the same, so it would make a good addition to Boost. - Doug

Sebastian Redl

11:47 a.m.

New subject: Interest request for pointer+bit compressionoptimization

Doug Gregor wrote:

...

On Tue, Sep 8, 2009 at 12:57 PM, vicente.botet<vicente.botet@wanadoo.fr> wrote:

...
template <typename PointerTy, unsigned IntBits, typename IntType=unsigned, typename PtrTraits = PointerLikeTypeTraits<PointerTy> > class PointerIntPair; "

I like the way it manage to store several bits using a PointerLikeTypeTraits

Thanks to all to pointing to this simple class.

Are you interested on the inclusion of this class to boost?

As I mentioned before, we've found it very useful in Clang to keep various data structures small. I imagine that Boost libraries and users would want to do the same, so it would make a good addition to Boost.

I would also be interested. I've already implemented my own version in a library I'm developing (which I might one day submit to Boost), so I would really like one in Boost. Sebastian

Ion Gaztañaga

8 Sep 8 Sep

9:04 a.m.

New subject: Interest request for pointer+bit compression optimization

vicente.botet escribió:

...

Hi,

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits.ht...)

I was wondering if it would be interesting to provide some generic template classes making easier this kind of optimization.

Take a look at http://www.boost.org/boost/boost/intrusive/pointer_plus_bits.hpp for some helper classes used in Intrusive that might be useful: The method is being used to embed also N bits in offset_ptr and other smart pointers, see: http://www.boost.org/boost/intrusive/pointer_plus_bits.hpp http://www.boost.org/boost/interprocess/offset_ptr.hpp then the container select the correct node depending on the alignment of the generic pointer type: //Inherit from the detail::link_dispatch depending on the embedding capabilities template<class VoidPointer, bool OptimizeSize = false> struct rbtree_node_traits : public rbtree_node_traits_dispatch < VoidPointer , OptimizeSize && (max_pointer_plus_bits < VoidPointer , detail::alignment_of<compact_rbtree_node<VoidPointer>

...

::value >::value >= 1) > {};

the same happens with AVL trees but in this case we need 2 bits. Best, Ion

vicente.botet

7:25 p.m.

New subject: Interest request for pointer+bit compressionoptimization

Hi Ion, ----- Original Message ----- From: "Ion Gaztañaga" <igaztanaga@gmail.com> To: <boost@lists.boost.org> Sent: Tuesday, September 08, 2009 11:04 AM Subject: Re: [boost] Interest request for pointer+bit compressionoptimization vicente.botet escribió:

...

Hi,

I have recently found in the blog of Joaquin the following optimization consisting in embed a bit into the representation of a pointer. Please read it, it is not too long (http://bannalia.blogspot.com/2008/11/optimizing-red-black-tree-color-bits.ht...)

I was wondering if it would be interesting to provide some generic template classes making easier this kind of optimization.

...

::value >::value >= 1) > {};

the same happens with AVL trees but in this case we need 2 bits. Best, Ion _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Thanks Ion, I was unaware this trick was included from the begining on Boost.Intrusive. With the optimized_pair class (or a variant taking care of the number of bits), instead of defining compressed and compact classes rbtree_node and rbtree_node_traits you could just do the following template<class VoidPointer, bool OptimizeSize = false> struct rbtree_node { typedef typename pointer_to_other <VoidPointer ,compact_rbtree_node<VoidPointer> >::type node_ptr; enum color { red_t, black_t }; node_ptr left_, right_; optimized_pair<node_ptr, color, 1, OptimizeSize> parent_plus_color_; // (1) }; template<class VoidPointer, bool OptimizeSize = false> struct rbtree_node_traits { typedef rbtree_node<VoidPointer, OptimizeSize> node; typedef typename boost::pointer_to_other <VoidPointer, node>::type node_ptr; typedef typename boost::pointer_to_other <VoidPointer, const node>::type const_node_ptr; typedef typename node::color color; static node_ptr get_parent(const_node_ptr n) { return n->parent_plus_color_.get_pointer(); } // (2) static void set_parent(node_ptr n, node_ptr p) { n->parent_plus_color_.set_pointer(p); } // (2) static node_ptr get_left(const_node_ptr n) { return n->left_; } static void set_left(node_ptr n, node_ptr l) { n->left_ = l; } static node_ptr get_right(const_node_ptr n) { return n->right_; } static void set_right(node_ptr n, node_ptr r) { n->right_ = r; } static color get_color(const_node_ptr n) { return n->parent_plus_color_.get_bits(); } // (2) static void set_color(node_ptr n, color c) { n->parent_plus_color_.set_bits(c); } // (2) static color black() { return node::black_t; } static color red() { return node::red_t; } }; The main advantages are A. Avoids duplication of common parts between the regular and the compressed implementations. B. the user of the optimized_pair would not care on the condition OptimizeSize && (max_pointer_plus_bits < VoidPointer , detail::alignment_of<compact_rbtree_node<VoidPointer> C. The single differences with a regular class are minimal (1) the declaration and (2) the getters and setters. I think that we can rename optimized_pair by pointer_plus_bits which i more specific. Do you think that it is worth include this class in Boost? Best, Vicente

5779

Age (days ago)

5781

Last active (days ago)

List overview

Download

27 comments

12 participants

participants (12)

Cory Nelson
Daniel Hulme
David Abrahams
Doug Gregor
Edouard A.
Ion Gaztañaga
Neil Groves
Sebastian Redl
Stefan Strasser
troy d. straszheim
Vicente Botet Escriba
vicente.botet