Hi again,
I'm the original poster that started this thread. WOW! Thanks for all of the
great responses. I apologize for posting this message and then getting called
away on a business trip. It is only just now that I'm getting back to see what
kind of response I got, and I'm thrilled. I'm happy to see that a number of
folks involved with Boost see this issue as a significant problem, if only to
certain types of companies.
APOLOGY: I must apologize for a small mistake in my numbers that might be
somewhat important to someone. I managed to reverse the counts for the "Smart
Ptr" and "String Algo" libraries. I remember thinking it kinda strange that one
referenced more modules but the other referenced more lines. So it's really
true that "String Algo" causes 382 files to be read, while "Smart Ptr" causes
only 180 to be read. Sorry about that.
I've taken a first pass through all the responses, and rather than respond to
each of them individually, I'll offer some more information here and attempt to
address some of the questions that have been pointed back to me.
1) How did I get these numbers? Give some examples.
Here's one of the places I'd love to be shown to be wrong. If my numbers are
inflated, my sales job to my boss will be that much easier. So by all means,
someone correct me if my approach is unsound.
What I did was very simple. All I did was compile a very simple program and
have g++ give me a list of all of the headers it read during the compilation,
excluding system headers. This is done using the following line from my test
Makefiles:
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< 2> /dev/null -MM > headers.lst
Here's the test for SmartPtr:
#include <iostream>
#include <boost/smart_ptr.hpp>
using namespace std;
int main(int argc, char* argv[])
{
    return 0;
}
This simple test produces a file named headers.lst with 180 unique header paths
in it, all starting with "boost/".
Discovering the modules used by each module took a few hours of fairly tedious
labor, where I sorted and then grouped each list of headers, where each group
consisted of headers coming from the same module.
2) Here are the specific module dependencies:
Any: "base", Config, Exception, MPL, Preprocessor, Static Assert, Type Traits,
Utility
Filesystem: "base", Config, Exception, Functional, Integer, Iterators, MPL,
Preprocessor, Smart Ptr, Static Assert, Type Traits, Utility
SmartPtr: "base", Config, Exception, MPL, Preprocessor, Static Assert, Type
Traits, Utility
StringAlgo: "base", Bind, Compatibility, Concept Check, Config, Exception,
Function, Integer, Iterators, MPL, Preprocessor, Range, Static Assert, String
Algo, Type Traits, Utility
3) In response to the suggestion to not use the convenience headers, like say
"smart_ptr.hpp", and to use the header for an individual pointer type instead.
It's bad enough to tell my programmers they can only use certain Boost modules.
To tell them that they can only use certain parts of certain Boost modules just
gets to be too much. Plus, I can see eventually using most, if not all of the
functionality of the SmartPtr module. The same can be said for the other
modules I'm interested in. If I have to run to my boss every time I want to use
one new particular feature from a Boost module, it's not worth the effort. Nor
would it be worth the overhead of figuring out how to police such a level of
code use.
So for better or worse, my consideration of the use of Boost has to be on a
Module by Module basis.
4) In response to "the license says that it's free to use, and the copyright
holders have agreed to that license, so everything is fine".
That's not true in the legal world. Neither the license nor any statements made
by the person stating a copyright mean anything if that person somehow, whether
intentionally or unintentionally, included some bit of someone else's code in
what they are calling their own. If the original writer of the code can prove
original authorship of the code, nothing done without THAT PERSON'S GRANT OF
LICENSE means anything. That original author owns all rights to the use of that
code, and can dictate how it can and cannot be used. It is this issue that
concerns companies like mine.
5) In response to "Who cares how much code there is? How does one "vet" a piece
of code, regardless of how much of it there is?"
It is not hard to look at 100 or 1000 lines of code in a few files and say
"there's nothing novel here". If the code is all written to do one basic thing
or set of things in a direct way, it's pretty easy to believe that a single or a
few individuals wrote the code. And, if the claimed authorship is invalid, real
damages would be easy to justify as being minimal, given the very limited scope of
what the code is capable of doing.
It's also much easier to feel comfortable in the fact that many other developers
are using these 1000 lines of code in their commercial products and haven't yet
been sued over the use of some portion of it. And if/when one wants to upgrade
to the next version of a module consisting of 1000 lines of code, it's pretty
easy to see what was added/removed.
But in the case of boost, with hundreds and possibly thousands (with fuller
adoption) of individual files involved, consisting of tens to hundreds of
thousands of lines of code, you can't have any idea what you've got. In fact,
you can feel fairly confident that all of those lines of code are NOT NECESSARY
in the basic sense to the benefit you wish to gain from the module in question.
So you have to ask yourself "what more does all this code do?", and you
certainly can't read and understand the purpose of every line of such a quantity
of code to answer that question. And the fact that there's so much of it leads
one to wonder "what novel things might be going on in that code to require so
much of it"? I mean, 180 header files being read for a Smart Ptr library is
pretty darn "novel" in and of itself.
Finally, there's mere statistics involved. If 1000 lines of code opens a
company to a certain amount of negative exposure, 100,000 lines of code, one
might argue, opens the company to 100 times as much exposure.
6) And...I'm not sure this question was asked specifically, but I'll ask it
myself..."what are you so worried about?"
Here's an example of what we're worried about....
Say we develop a tool for Disney to use on one of its feature length films. A
month before the premiere date of the film, someone takes Disney to court and
claims that one of their production tools, the one we wrote, contains code that
was stolen from them. Disney asks us to come to court to defend our use of that
code.
In the case of 1000 lines, we can say exactly what we did to vet the use of the
code, and state exactly what that code does not just for us but for anyone who
might use it, pointing out that each of those users has a very clear idea of
what the code does, what it's worth to them, and why they considered the
copyright given by the supposed author to be valid.
For 100,000 lines of code we say, well the 1% of the code we use kinda/sorta
works by doing this, but it does that by going off and using bits and pieces of
all these other files, and frankly, we couldn't take the time to understand what
all that code is for, and therefore cannot possibly have understood that the
code contained something novel that might have been misrepresented as to its
authorship for reason of personal gain on the part of the offending copyright
grantor.
In the first case, maybe the judge puts some value on the 1000 lines of code,
and because it's Disney, that number gets multiplied by 10X. It's still a small
amount of money for Disney, so they pay the money and just decide never to do
business with us again.
In the second case, the judge says "wow, there's a lot of code here. This is
going to take a lot of time to work out the ramifications of, and to put a
dollar amount on" and files an injunction against Disney releasing their film.
This costs Disney many millions of dollars on everything they've set in motion
in order to release the film, that will now all be wasted money. Disney sues us
for all of that money. We, as a very small software company, talk to our
lawyer, who tells us our best bet is to fold the company and go find jobs
working for Google.
7) Use a more modern C++
Some of our customers are in the Operating System Stone Age. For example, I
often develop on Fedora, but my code has to be able to compile and run on Red
Hat 5. AND, we are often told exactly what compiler to use, and that compiler
is sometimes not open source, and in a few cases no longer supported. So solving
these issues with newer compilers is not an option.
8) Conclusion
So, it DOES MATTER, IN A BIG WAY how much code I have to bring into my project's
codebase to get SmartPtr capabilities. And even if it turned out that it didn't,
my boss doesn't consider it worth the risk to make that call. He'd rather hire
another programmer just to write a SmartPtr library, so that our project can
stay on schedule and he can sleep at night, knowing his company isn't going to
some day go "poof" due to a relaxed approach to using Open Source.
Some of our customers don't allow their engineering departments any access to
code on the internet for this very reason. There are firewalls designed solely
to look for and disallow anything that looks like significant code or other data
from coming into the company walls. We have to justify our use of each specific
piece of Open Source to EACH OF THESE COMPANIES before we can begin to supply
them with anything. So another big issue for us is that as soon as we say "We
use Boost", we are dismissed from consideration for a project. I bet this
happens all the time.
Thanks All for all the interesting and valuable discourse! Take care!
Steve
PS) My company DOES already use the Boost Smart Ptr library. However, it uses a
much earlier version of the library, one that depends on just a dozen or so
headers. So I guess at one point individual Boost modules were more separable.
Or maybe it was just generally smaller back when that module was adopted. So
we DO already have and use Boost Smart Ptrs...we just don't have all the nifty
new features in the latest and greatest, the most important of which is the
ability to not require that the pointed-to class be defined wherever a Smart Ptr
is instantiated. I'm dying for that feature.
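To make that last feature concrete, here is a small sketch of what "not requiring the class definition" looks like in practice. It uses std::shared_ptr purely as a stand-in so that it builds without a Boost install, which of course requires a newer compiler than the ones discussed above. Current boost::shared_ptr (from <boost/shared_ptr.hpp>) allows the same pattern. The names Widget, Holder, make_widget, widget_value, live_widgets, and value_via_copy are all invented for the example.

```cpp
#include <memory>

class Widget;  // forward declaration only: Widget is incomplete here

// Declared here, defined further down once Widget is complete.
std::shared_ptr<Widget> make_widget(int value);
int widget_value(const std::shared_ptr<Widget>& w);
int live_widgets();

// Holding the pointer as a member does not need the class definition.
struct Holder {
    std::shared_ptr<Widget> widget;
};

// Copying and destroying the pointer also work with Widget incomplete.
int value_via_copy(const Holder& h)
{
    Holder copy = h;                  // copy made without the definition
    return widget_value(copy.widget);
}                                     // copy destroyed without the definition

// ---- everything below would normally live in a separate .cpp file ----

class Widget {
public:
    explicit Widget(int v) : value(v) { ++count; }
    ~Widget() { --count; }
    int value;
    static int count;                 // number of Widgets currently alive
};
int Widget::count = 0;

std::shared_ptr<Widget> make_widget(int value)
{
    // The deleter is captured here, where Widget is complete, which is
    // why code elsewhere can destroy the pointer without the definition.
    return std::shared_ptr<Widget>(new Widget(value));
}

int widget_value(const std::shared_ptr<Widget>& w) { return w->value; }

int live_widgets() { return Widget::count; }
```

The point of the pattern is that only the file that creates the object needs the full class definition. Headers that merely pass the pointer around can get by with a forward declaration.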