[serialization] boost::serialization adds huge amounts of exports to resultant Windows PE file

Hey guys,
I am using boost::serialization from 1.44.0. One thing that I noticed
is that linking statically to the serialization libs will add several
hundred exports in the final exe file that I get. Using `dumpbin
/exports my_program.exe`
Here's a brief illustration:
Let's say we have a typical Hello World program (code example at the
end of this message) that uses `iostream`. Here's what we get when we
run `dumpbin /exports my_program.exe`.
Dump of file my_program.exe
File Type: EXECUTABLE IMAGE
Summary
4000 .data
2000 .pdata
7000 .rdata
1000 .rsrc
18000 .text
However, if we just add 6 lines of code to include
boost::serialization (code example at the end of this message), we
would get tons of exports.
Dump of file my_program.exe
File Type: EXECUTABLE IMAGE
Section contains the following exports for my_program.exe
00000000 characteristics
4CA376FA time date stamp Thu Sep 30 01:27:22 2010
0.00 version
1 ordinal base
14 number of functions
14 number of names
ordinal hint RVA name
1 0 00029480
??_B?1??get_instance@?$singleton@V?$map@Vtext_oarchive@archive@boost@@@?A0xca82ee40@detail@archive@boost@@@serialization@boost@@CAAEAV?$map@Vtext_oarchive@archive@boost@@@?A0xca82ee40@detail@archive@3@XZ@51
= ??_B?1??get_instance@?$singleton@V?$map@Vtext_oarchive@archive@boost@@@?A0xca82ee40@detail@archive@boost@@@serialization@boost@@CAAEAV?$map@Vtext_oarchive@archive@boost@@@?A0xca82ee40@detail@archive@3@XZ@51
(`private: static class boost::archive::detail::`anonymous
namespace'::map<class boost::archive::text_oarchive> & __cdecl
boost::serialization::singleton

Chris Yuen wrote:
Hey guys,
I am using boost::serialization from 1.44.0. One thing that I noticed is that linking statically to the serialization libs will add several hundred exports in the final exe file that I get. Using `dumpbin /exports my_program.exe`
These functions are not explicitly called from the library, but they ARE called as part of the serialization process. It's just that MSVC doesn't see them. So when you compile for release, the MSVC linker strips them out and the program won't work anymore. To work around this, these functions are explicitly exported, which prevents MSVC from stripping them out. For more information, see force_include.hpp.
Robert Ramey
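A minimal sketch of the workaround described above, not Boost's actual code (the real mechanism is in force_include.hpp). The macro `FORCE_KEEP`, the function template `register_type`, the helper `registered_names` and the type `my_class` are all hypothetical names chosen for illustration. On MSVC the macro expands to `__declspec(dllexport)`, so the instantiation should land in the export table and survive /OPT:REF; on other compilers it expands to nothing.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical macro: on MSVC, mark the function as exported so the
// linker does not discard it; elsewhere it expands to nothing.
#if defined(_MSC_VER)
#  define FORCE_KEEP __declspec(dllexport)
#else
#  define FORCE_KEEP
#endif

// Hypothetical stand-in for the library's internal bookkeeping.
inline std::vector<std::string>& registered_names() {
    static std::vector<std::string> names;
    return names;
}

// User code never calls this by name, so without the export marker an
// optimizing linker could treat its instantiations as dead code.
template <class T>
FORCE_KEEP void register_type() {
    registered_names().push_back(typeid(T).name());
}

struct my_class {};

// Explicit instantiation: emits register_type<my_class> into this
// translation unit even though nothing calls it here.
template void register_type<my_class>();
```

Built with MSVC, the instantiation would be expected to show up in `dumpbin /exports` under its decorated name, which is the effect the original post observed.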

On Wed, Sep 29, 2010 at 06:48:55PM -0800, Robert Ramey wrote:
Chris Yuen wrote:
Hey guys,
I am using boost::serialization from 1.44.0. One thing that I noticed is that linking statically to the serialization libs will add several hundred exports in the final exe file that I get. Using `dumpbin /exports my_program.exe`
These functions are not explicitly called from the library, but they ARE called as part of the serialization process. It's just that MSVC doesn't see them. So when you compile for release, the MSVC linker strips them out and the program won't work anymore. To work around this, these functions are explicitly exported, which prevents MSVC from stripping them out. For more information, see force_include.hpp.
These exported symbols are excellent for provoking bugs in software that makes assumptions about the maximum reasonable length a symbol should be able to have. I had a quite fun hair-tearing experience with a task manager replacement that overran some buffer due to Boost.S11n, resulting in instability, bogus output and program crashes.
They are a bit annoying though, as they tend to show up in any crash reports I get, as they're the only symbols exported that the crash helper can locate. Took me a couple of post-mortem debugs to look at the offsets and realize that the names were red herrings. Now I finally know why they are in my modules in the first place, heh.
-- Lars Viklund | zao@acc.umu.se

On 10/02/2010 12:56 AM, Lars Viklund wrote:
On Wed, Sep 29, 2010 at 06:48:55PM -0800, Robert Ramey wrote:
Chris Yuen wrote:
Hey guys,
I am using boost::serialization from 1.44.0. One thing that I noticed is that linking statically to the serialization libs will add several hundred exports in the final exe file that I get. Using `dumpbin /exports my_program.exe`
These functions are not explicitly called from the library, but they ARE called as part of the serialization process. It's just that MSVC doesn't see them. So when you compile for release, the MSVC linker strips them out and the program won't work anymore. To work around this, these functions are explicitly exported, which prevents MSVC from stripping them out. For more information, see force_include.hpp.
The problem with exporting is that it adds a semantic meaning which is unwanted.
I think the /include linker option is supposed to do what you want:
http://msdn.microsoft.com/en-US/library/2s3hwbhs%28v=VS.80%29.aspx
"Specifying a symbol with this option overrides the removal of that symbol by /OPT:REF."
You can even specify it in the source:
#pragma comment(linker, "/include:__mySymbol")
http://msdn.microsoft.com/en-us/library/7f0aews7%28VS.80%29.aspx
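A small sketch of the /include suggestion, assuming a hypothetical variable named `keep_me_marker`. It is given C linkage so that its linker-level name is predictable; on 32-bit x86 that name carries a leading underscore, on x64 it does not. The pragma is guarded so that other compilers simply skip it.

```cpp
#include <cassert>

// Hypothetical symbol that no other code refers to by name.
// C linkage keeps the linker-level name free of C++ decoration.
extern "C" {
    int keep_me_marker = 42;
}

#if defined(_MSC_VER)
// Ask the MSVC linker to treat the symbol as referenced, so /OPT:REF
// leaves it in place without adding it to the export table.
#  if defined(_M_IX86)
#    pragma comment(linker, "/include:_keep_me_marker")
#  else
#    pragma comment(linker, "/include:keep_me_marker")
#  endif
#endif
```

This only works when the decorated name can be written out as a string literal ahead of time, which is straightforward for a fixed C symbol like this one.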
These exported symbols are excellent for provoking bugs in software that makes assumptions about the maximum reasonable length a symbol should be able to have.
I had a quite fun hair-tearing experience with a task manager replacement that overran some buffer due to Boost.S11n, resulting in instability, bogus output and program crashes.
Sounds like that app may have an exploitable security hole. There are a variety of techniques to load modules remotely into Windows, especially if the attack doesn't need to be executed directly.
- Marsh

Marsh Ray wrote:
On 10/02/2010 12:56 AM, Lars Viklund wrote:
On Wed, Sep 29, 2010 at 06:48:55PM -0800, Robert Ramey wrote:
Chris Yuen wrote:
Hey guys,
I am using boost::serialization from 1.44.0. One thing that I noticed is that linking statically to the serialization libs will add several hundred exports in the final exe file that I get. Using `dumpbin /exports my_program.exe`
These functions are not explicitly called from the library, but they ARE called as part of the serialization process. It's just that MSVC doesn't see them. So when you compile for release, the MSVC linker strips them out and the program won't work anymore. To work around this, these functions are explicitly exported, which prevents MSVC from stripping them out. For more information, see force_include.hpp.
The problem with exporting is that it adds a semantic meaning which is unwanted.
I'm not seeing this.
I think the /include linker option is supposed to do what you want:
http://msdn.microsoft.com/en-US/library/2s3hwbhs%28v=VS.80%29.aspx "Specifying a symbol with this option overrides the removal of that symbol by /OPT:REF."
You can even specify it in the source:
#pragma comment(linker, "/include:__mySymbol")
http://msdn.microsoft.com/en-us/library/7f0aews7%28VS.80%29.aspx
I don't see how one could pass a symbol generated by a template to a linker switch or to a pragma.
These exported symbols are excellent for provoking bugs in software that makes assumptions about the maximum reasonable length a symbol should be able to have.
I'm not seeing this either. I don't see how this symbol information is accessible. Even if it is, I don't see how it could be used.
I had a quite fun hair-tearing experience with a task manager replacement that overran some buffer due to Boost.S11n, resulting in instability, bogus output and program crashes.
I don't see how exported symbols could cause this.
Sounds like that app may have an exploitable security hole. There are a variety of techniques to load modules remotely into Windows, especially if the attack doesn't need to be executed directly.
I don't see this either.
Someone else has reported that msvc no longer strips these symbols but has been unable to test this. I don't see how it's possible for the compiler/linker to know which unreferenced symbols can be stripped and which cannot be. GCC has some attributes which one can attach to flag functions which shouldn't be stripped, so it's not a problem there. The IBM compiler recently added attributes to handle this as well. As I noted above, I don't see any way that linker switches or pragmas can be generated with template code.
Robert Ramey

On Sat, Oct 02, 2010 at 10:09:18PM -0800, Robert Ramey wrote:
Marsh Ray wrote:
On 10/02/2010 12:56 AM, Lars Viklund wrote:
These exported symbols are excellent for provoking bugs in software that makes assumptions about the maximum reasonable length a symbol should be able to have.
I'm not seeing this either. I don't see how this symbol information is accessible. Even if it is, I don't see how it could be used.
The symbols exported by a module can be enumerated together with debug information to resolve a module and offset into a function name and offset, through use of dbghelp.dll's API.
I had a quite fun hair-tearing experience with a task manager replacement that overran some buffer due to Boost.S11n, resulting in instability, bogus output and program crashes.
I don't see how exported symbols could cause this.
In this case, the application had a UI feature that displayed, for a given process, a list of its threads along with the resolved function names and offsets where the threads were currently executing.
Sounds like that app may have an exploitable security hole. There are a variety of techniques to load modules remotely into Windows, especially if the attack doesn't need to be executed directly.
I don't see this either.
There might very well have been a security hole. I reported it upstream and a fixed version was released shortly after. I leave threat analysis to people who are competent in the field. The risk would be somewhat low in this particular case as it relies on explicit UI actions to view the data. And of course, it's triggerable completely without the help of S11n, as all you needed was an overly long symbol name.
-- Lars Viklund | zao@acc.umu.se
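To illustrate the class of bug discussed here (the task manager's actual code is not shown in this thread), a hypothetical `format_frame` helper can build the "name+offset" label with `std::string`, so that a decorated name several hundred characters long cannot overrun a fixed buffer. The `max_name` cap is an assumption and only shortens the name for display.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "symbol+0xOFFSET" for display.
// The symbol name is held in a std::string, so its length is never
// assumed; names longer than max_name are truncated and marked "...".
std::string format_frame(const std::string& symbol, unsigned long offset,
                         std::size_t max_name = 64) {
    std::string name = symbol;
    if (name.size() > max_name) {
        name.resize(max_name);
        name += "...";
    }
    // Only the offset goes through a fixed buffer; "+0x" plus at most
    // 16 hex digits plus the terminator fits comfortably in 32 bytes.
    char off[32];
    std::snprintf(off, sizeof off, "+0x%lx", offset);
    return name + off;
}
```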

Lars Viklund wrote:
On Sat, Oct 02, 2010 at 10:09:18PM -0800, Robert Ramey wrote:
Marsh Ray wrote:
On 10/02/2010 12:56 AM, Lars Viklund wrote:
These exported symbols are excellent for provoking bugs in software that makes assumptions about the maximum reasonable length a symbol should be able to have.
I'm not seeing this either. I don't see how this symbol information is accessible. Even if it is, I don't see how it could be used.
The symbols exported by a module can be enumerated together with debug information to resolve a module and offset into a function name and offset, through use of dbghelp.dll's API.
For what it's worth, I would expect most Windows apps to be distributed in release mode, which means (among other things) no debug symbols. But in any case, I've found no other way to prevent the linker from eliminating instantiations of function templates which are not explicitly referenced. If you've got another way of doing this I would be pleased to hear it.
Also note that this occurs only for classes marked with the BOOST_CLASS_EXPORT macro. This can be avoided by using the alternative explicit registration method. That has a different set of inconveniences. These are the only alternatives I know of. I realize that the decision isn't easy, but then, that's why we make the big bucks.
Robert Ramey

On 10/03/2010 01:09 AM, Robert Ramey wrote:
Marsh Ray wrote:
On 10/02/2010 12:56 AM, Lars Viklund wrote:
On Wed, Sep 29, 2010 at 06:48:55PM -0800, Robert Ramey wrote:
Chris Yuen wrote:
Hey guys,
I am using boost::serialization from 1.44.0. One thing that I noticed is that linking statically to the serialization libs will add several hundred exports in the final exe file that I get. Using `dumpbin /exports my_program.exe`
These functions are not explicitly called from the library, but they ARE called as part of the serialization process. It's just that MSVC doesn't see them. So when you compile for release, the MSVC linker strips them out and the program won't work anymore. To work around this, these functions are explicitly exported, which prevents MSVC from stripping them out. For more information, see force_include.hpp.
The problem with exporting is that it adds a semantic meaning which is unwanted.
I'm not seeing this.
Adding an entry to the export table says "separately linked code can call me via this exported name/ordinal". I don't think this is the meaning that was intended.
I think the /include linker option is supposed to do what you want:
http://msdn.microsoft.com/en-US/library/2s3hwbhs%28v=VS.80%29.aspx "Specifying a symbol with this option overrides the removal of that symbol by /OPT:REF."
You can even specify it in the source:
#pragma comment(linker, "/include:__mySymbol")
http://msdn.microsoft.com/en-us/library/7f0aews7%28VS.80%29.aspx
I don't see how one could pass a symbol generated by a template to a linker switch or to a pragma.
Yeah good point. Perhaps template instantiation could generate a reference to the symbol from within some other object that does have a fixed name? I'm sure I've seen this done somewhere but don't recall exactly how. Maybe something involving the template generating a static data member with a constructor that passes the object's address to some function the linker cannot eliminate, and there had to be at least one function actually called in that translation unit in order to prevent the linker from throwing it out entirely. Not saying it's necessarily a great solution, but...
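One possible reading of the idea above, as a hedged sketch rather than a tested replacement for the export trick. All names (`keep`, `kept_objects`, `anchor`, `registrar`, `Widget`) are hypothetical. Each instantiation of `anchor<T>` owns a static member whose constructor hands its own address to an ordinary, fixed-name function. Whether a given linker then keeps everything reachable from that object is an assumption that would need checking with /OPT:REF.

```cpp
#include <cassert>
#include <vector>

// Fixed-name, non-template bookkeeping that the rest of the program uses.
std::vector<const void*>& kept_objects() {
    static std::vector<const void*> v;
    return v;
}

// Fixed-name function that receives the addresses to keep alive.
void keep(const void* p) { kept_objects().push_back(p); }

template <class T>
struct anchor {
    struct registrar {
        // Runs during static initialization and creates a reference
        // from the instantiated object to the fixed-name function.
        registrar() { keep(this); }
    };
    static registrar instance;
};

// Definition of the static member for every instantiation.
template <class T>
typename anchor<T>::registrar anchor<T>::instance;

struct Widget {};

// Explicit instantiation also instantiates the static data member,
// so its constructor runs before main() in this translation unit.
template struct anchor<Widget>;
```

The explicit instantiation is the step that makes the static member exist at all; a static member of a class template is otherwise only instantiated when something uses it.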
These exported symbols are excellent for provoking bugs in software that makes assumptions about the maximum reasonable length a symbol should be able to have.
I'm not seeing this either. I don't see how this symbol information is accessible. Even if it is, I don't see how it could be used.
It goes into the PE32 "export table", usually by name. IIRC, the export table points to NUL-terminated strings with a length limitation in some obscure document somewhere. These export table names are visible without debug symbols and are to be compared case insensitively.
Someone else has reported that msvc no longer strips these symbols but has been unable to test this. I don't see how it's possible for the compiler/linker to know which unreferenced symbols can be stripped and which cannot be.
Generally, unless it's exported, or called or address taken by something which is not removed, then it will be removed by /OPT:REF.
GCC has some attributes which one can attach to flag functions which shouldn't be stripped, so it's not a problem there. The IBM compiler recently added attributes to handle this as well.
That sounds like a better design for C++ than the linker option. - Marsh
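For comparison, a short sketch of the GCC-style attribute mentioned above. The macro `KEEP_USED`, the function `unreferenced_hook` and the counter `hook_calls` are hypothetical names. `__attribute__((used))` tells GCC and Clang to emit the function even when nothing in the translation unit appears to reference it; whether a later link-time section garbage collection pass also keeps it depends on the toolchain and flags, so treat that part as an assumption.

```cpp
#include <cassert>

// Hypothetical macro: expands to the GCC/Clang "used" attribute where
// available and to nothing on other compilers.
#if defined(__GNUC__)
#  define KEEP_USED __attribute__((used))
#else
#  define KEEP_USED
#endif

namespace {

int hook_calls = 0;

// Internal linkage and no callers in this file: without the attribute
// the compiler would be free to drop the function entirely.
KEEP_USED void unreferenced_hook() { ++hook_calls; }

}  // namespace
```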
participants (4)
- Chris Yuen
- Lars Viklund
- Marsh Ray
- Robert Ramey