
Hi Janek, "Janek Kozicki" wrote
Hello,
I'm posting this review a bit after the deadline, and I'm very sorry about that. This is caused by the fact that I didn't know that all mailing list members can write a review. When I learned about that, I had only about 2 days left, but still I wanted to perform a deep analysis of what is in pqs.
--------------------------------------------------- * Do you think the library should be accepted as a Boost library?
Yes, I think it should. The scientific community, and industry are in desperate need of such library, rejecting it just because it's not perfect would be a very bad decision. It's not uncommon that boost libraries evolve once included, I believe that this will be the case here. Moreover including it will greatly increase the amount of feedback received and in result the library will reach the "perfect" state much faster. After performing checks (described below) I'm sure that the library is ready for inclusion.
Great. Thanks for the yes vote.
Note: below I often refer to yade ( http://yade.berlios.de/ ) which is a framework for performing scientific simulations. Yade is my current project, and a part of my PhD.
Yes, I know yade or at least watched some sample videos. Its a great project I have done tests of using pqs inside yade, performed
speed benchmarks and unit tests.
OK. I havent done anything but very minor performance tests on pqs, so that will be interesting.
--------------------------------------------------- * What is your evaluation of the documentation?
Before trying pqs I've read almost whole documentation (skipped few pages). So those comments and suggestions below were written before I actually tried using pqs:
As stated by other reviewers the introduction is a bit offtopic. Summary given by review manager fits here best. In "Definition of terms" an UML diagram would be nice, or at least some of those diagrams that are automatically generated by doxygen. Later I discovered that in html version there is a diagram, but not in pdf version. I wonder if any other reviewer would complain less if he saw this diagram in pdf. Downloading pdf was much easier than unpacking the library and opening index.html from it ;)
Yes it was a bit too easy!
I'd prefer a different name for coherent_unit and incoherent_unit, for example si_unit and non_si_unit.
OK.
Several terms ( 'dimensional analysis', 'dimensionally_equivalent', 'dimensional_math', 'dimensionless', 'dimensionful', 'physical_quantity', 'physical_quantity_system', 'rational_number', 'unit_output_symbol', 'unit_symbol' ) do not appear on the Entity Relationship Diagram. Therefore their meaning is for me unclear: classes, methods or concepts? If they are concepts then underscore ("_") shouldn't be used in their name. Method names can be written in the ERD diagram inside rectangle of a given class, separated by horizontal line (UML notation), so that it would be possible to look up methods in the diagram.
OK. I need to study UML for diagrams
Also a reference-style listing of all classes and methods in the library would be nice (this kind of documentation is usually auto generated by doxygen). This listing would have references to correct places in synopsis. It would be also *very* useful if at the end of the doc, there was a referenced list of all examples in the manual, and also a list of all pictures.
OK.
In Rationale:Clarity section the example with PQ_Box should be written full 'ready to compile' (because it is the first longer example that appears in the manual).
OK.
Also IMHO Rationale section should be put before Definition of terms. Usually definition of terms is put at the beginning, but usually it is much shorter. A causal reader who doesn't know what he should read, will read from top to bottom - and in this respect it's more important that he reads Rationale first. Because if he reads Definitions first - he may lose his enthusiasm due to fact that definitions are _always_ boring (no matter how hard you try to make them interesting).
IMO the optimal layout for documentation is a hierachical (or star) structure rather than a serial one. In using Quickbook I traded convenience for flexibility. OTOH I guess the serial style works for pdf's.
I prefer "per": m_per_s, etc...
OK. I think the majority do too.
In pdf page 3, missing space: "The keyword typeofis used".
In pdf page 5, missing space: "distinguish between different but dimensionally_equivalentabstract_quantities."
QuickBook 1.1? is to blame for that one AFAICS by concatenating adjacent links. (Since then I have discovered how to add explicit spaces)
In pdf page 18 there is a reference to "Example 3a", but all examples are unlabelled, so I can only detect from the context of the sentence which example is actually referred. It would be useful to label all examples. (and make a full list of all examples at the end).
That shows the vintage of the example. Its copy pasted from Example 3a in the pqs 2 series docs. Yes. All examples should be numbered. In fact stuff such as sections, paragraphs, diagrams etc should be numbered.
In pdf page 23 comment is too long to fit on the page.
In pdf page 28 - mistaken underline
In pdf page 32 - I prefer synopsis written in valid C++, for example simple copy'n'paste from file t1_quantity.hpp would work. In fact I had difficulty in understanding line "if IsDimensionless<abstract_quantity>", and lines following it, until I guessed that it's some kind of Cpp pseudocode.
I could make it clear and maybe try to follow conventions too. Maybe I should make like GIL: http://opensource.adobe.com/structPoint2DConcept.html
In pdf page 51 - mistaken underline.
I think that might be an problem in the Quickbook --> pdf chain.
Word "useage" hurts my eyes. Heh, and vim's spellchecker underlines it as a bad word ;)
OK. bad spelling.
--------------------------------------------------- * What is your evaluation of the implementation?
More constants, including mathematical ones, would be nice.
The selection included is just for demonstration. All the constants in SP961 should be included: http://physics.nist.gov/cuu/pdf/chart1.pdf
I don't like the separation of "three_d" and "two_d" - where quaternions should be put then - "four_d" ?
My rationale is that quaternions exist in a 3D space. OTOH complex would be in two_d because it exists in a 2D space.
I've gone through similar problems in my own design (just browse older revisions of yade in subversion repository http://tinyurl.com/lgk8t ), and finally I've found it most convenient to put vector (both 2d and 3d), quaternion and matrix inside "math". It was simply too cumbersome to type everywhere all those lenghty namespace names (and I used various variants, including 2d / 3d distinction). Therefore I strongly suggest to use simply namespace "math" and put inside vector2d, vector3d, quaternion and matrix. Extra namespaces increase fragmentation.
OK.
Also please kindly consider following exampls, what name is more descriptive and shorter?
three_d::vect math::vector3d
OK, my favourite name is one letter longer, but still - "math" says a lot more that "three_d", so this name is more descriptive, IMHO.
Maybe even lose math:: ?
Please be verbose, please rename vect to vector3d and vector2d, when putting them into single namespace. Also I prefer name quaternion, over quat.
OK.
I'm sorry about elaborating so much about this topic - that's because it's close to my heart. In fact I can adapt to any library layout which Andy choses, but what others think?
IMO it makes sense to put 3D entities in a 3D space and 2D entities in a 2D space, but its not critical to me either way
I believe that in matrices there is no need to support both row-column and column-row notation explicitly, but rather to keep internal compatibility thorough whole library. Based on that I'd strongly suggest to rename rc_matrix into matrix, and never implement cr_matrix. There is an interesting reasoning about this problem inside panda3d software manual (given time I'll find it later).
Its certainly less work sticking to one type of matrix so I dont object to that. There may be performance resaons to choose on or other type?
It's nice to see timer included. (but missing from docs?)
It was a bit of fun for intended for simple performance timing, but should be documented yes.
It would be nice if t1_quantity was renamed to some descriptive word. Perhaps a discussion on the mailing list would help to discover a better word.
Perhaps just quantity1, quantity2, quantity3 or quantitya, quantityb, quantityc.
I did not delve as much as I would want into the implementation of t1_quantity, due to lack of time. I noticed that there are two similar files, length.hpp and length_.hpp, but there is no force_.hpp, or any other file with _ at the end. This puzzles me a bit.
Oops...length_.hpp should have been removed from the package.
--------------------------------------------------- * What is your evaluation of the design?
The design as shown on the Entity Relationship Diagram ( /libs/pqs/doc/html/pqs/pqs_erm.html ) is so clear, that I can hardly imagine making a better design. It must be noted however that I've never written my own units library.
I need to redo it usin the correct symbols maybe and see if it can be used for other systems than SI or not.
I'm not sure if fixing number of dimensions to exactly 7 is good decision, perhaps it could be changed at later date, after the library is included to boost. Those extra dimensions would be non-si of course, but could be useful for some people (storage capacity of a harddrive comes to mind, or unit of information measured with libraries of congress ;).
I think the number of dimensions could be configurable. This is dependent on PQS becoming superceded too though, because PQS is only useful in SI IMO.
It's good that Andy plans to implement vector, matrix and quaternions operations, that will perform necessary unit checking during computation (all at compile time!). My concern however is that Finite Element Method uses large matrices (for example with 1000 columns), and I'm sure that users will prefer to stick with ublas and all functionality provided by libraries they are already using. Instead of more general (and perhaps impossible) solution that would allow use of pqs with any other math library, I recommend to investigate tight integration with boost::numeric::ublas. I'm sure that people responsible for that library will gladly accept the idea, and accept any patches related to this. Basically this will make boost::ublas a ,,sort of'' part of pqs. Other solution would be to reimplement such huge matrices in pqs, but how far Andy could go?
I dont know if its possible. Another option is to cast the quantities to numerics. IOW exit the type checking. I will have to ask on the ublas list.
--------------------------------------------------- * With what compiler did you try the library?
gcc version 3.3.5 (Debian 1:3.3.5-13)
I have compared compilation times with and without pqs. Whole yade consists of over 20000 lines of C++ and it compiles in 10minutes on my AMD 4400+ (when I'm too lazy to use distcc which can reduce this time to 3 minutes - ideal when I'm in hurry). Because pqs is just in one small module (about 100 lines of code) I measured the time for compiling only this module. I took 5 measures, and computed average:
without pqs: 3.8 seconds with pqs : 4.8 seconds
That gives about 21% increase. It would mean that if I used pqs in whole yade 10 minutes could possibly become 12 (or 4 minutes with distcc). not much difference I think. Full recompiles do not happen often, anyways.
It is interesting to see the comparison. I spent some time trying to keep compile times down.
--------------------------------------------------- * Did you try to use the library? Did you have any problems?
First I've compiled some of the examples without any problems at all. When working with the first example I discovered that it would be nice if include path for headers was shorter, like following:
#include <boost/units/length.hpp> // to get length #include <boost/units/io/length.hpp> // or sth similar to get IO for length. (input.hpp is OK too)
I think that is better than the current arrangement.
When working with the examples I decided to add quaternions to pqs, because in yade quaternion calculations play a crucial role. I used three_d::vect as a base source file. I was really pleasently surprised how easy adding them turned out to be! I was afraid that some unexpected obstacles will stop me on my way (which will cause this review to be send after the deadline of 9.june (oops it's after the deadline now!)). I've sent this quaternion implementation to Andy privately, I hope that it will be included.
Yes Thanks! I am currently digesting it!
It lacks many commonly used methods, but the base foundation for it is prepared, which I consider a success of pqs design. I should stress again, that I've virtually done nothing in effort that those quaternions will support units - those units were supported right away. Only thing I didn't know (but didn't consider important - as I spent less than 2h to complete the task) - was how to define a return type of squared unit (used by method squared_magnitude() ).
This currently just uses the binary_operation function as elsewhere, but may be simpler with Boost Typeof
Then I started to adapt yade to pqs. I decided to use units only in one module - that one where the actual calculation loop for selected simulation resides (the most crucial part of whole simulation). Yade is capable of modelling various models, so I picked the most causal one - simple DEM simulation: spheres falling onto flat plane below: http://tinyurl.com/hjy9g . Pqs was used only in simulation loop, so I expected extra slowdown due to converting data to/from physical quantities at the start/end of each loop. However it was simpler than converting all the data structures in yade. During the process I needed a unit to express a spring stiffness (hooke's law F=kx, where k is [N/m] and x is [m]). But because Fred Bertsch has just now posted an email that "review ends" I decided not to add a new unit N_div_m, but rather I used Pa that was multiplied during calculations by 1 meter constant (so maybe adding new units should be made simpler). During the process of adapting yade to use pqs I was impressed in overall how easy it did come by. As an indicator: on the first time when my modified code compiled without errors it was running correctly. I was really surprised, and this shows the power of units.
That is good to hear.
Especially when I recall how much time I've spent more than year ago to debug this code! This gives me a clear picture of how much time can be saved with pqs library. How often your code, dear reader, ran correctly after the first compile? ;)
If it could be achieved with no drop in performance then it will be near perfect!
First thing I did after sucessfully compiling was to perform a unit test simulation and compare the simulation state. I did this by running the simulation for 1000 iterations (all the spheres dropped down to the flat plane below and formed a pile), and saving spheres coordinates and their orientation every 50th iteration. I used type double for whole calculation, and boost::lexical_cast<std::string>(double) to save coordinates with 16 decimal places.
So I had 20 files from each run, which I could compare between pqs and non-pqs version. Comparing them showed that in 100th iteration about 20 spheres (out of 1000) had coordinates differing on the last decimal place. In 150th iteration all spheres' coordinates differed on average on last 2 decimal places. In 200th it was 3 last decimal places. The difference was growing during time. In 500th it was last 8 decimal places. Due to the chaotic nature of this simulation, every error will accumulate and grow. So I can conclude that in the simulation loop with pqs there appeared a small chance of having different last digit than expected. And this error accumulated over time. I'm not sure if this error is caused by pqs, or maybe by extra assignments and conversions. Personally I think that it's because the order in which components of the equation were added and multiplied was changed.
OK. I don't know what is causing this, except that pqs changes division and multiplication in some cases. I also use Boost Numeric converter. I guess I will need to look into the actual code to determine this. I had better admit that the implementation is not very pretty and should be totally reworked.
After performing above checks (aka. unit test) I did speed test, to benchmark pqs. Results are not in favor of pqs, but then I should remind that they are skewed due to extra constructor calls in pqs version. Those constructor calls would not be here if I converted whole yade's data structure to pqs. So perhaps I should not make those results public? Since I have no time now to consider whether to make them public or not, here they are:
without pqs (three runs): Computation time: 370.939 seconds Computation time: 382.615 seconds Computation time: 360.418 seconds
with pqs - and extra constructor calls, which would be avoided if I converted whole data structure! (three runs): Computation time: 415.724 seconds Computation time: 423.181 seconds Computation time: 421.799 seconds
that makes a difference of roughly 11% - 13%
OK. Bear in mind that I have spent no time in optimising PQS for performance. The use of pqs for dimensional analysis checking and best performance may need to consist in writing code using typedef defined so that quantities/ floating point types are swapped in to do dimensional analysis then swapped out in favour of floats in finished code. I have written some code this way. Within certain rules ( no operations that scales a value between units for example, which could be set up to flag an error) it should be the most effective method. All these useages are possible including the default style, the SI_units style preferred by Jesper Schmidt and the style described above... using another 'view'/ frontend or call it what you will on the underlying type.
Previously when working with this code I've seen a speed difference of ~2% just because of one extra constructor call in this loop. While I performed this benchmark with good intentions I still cannot say whether pqs is slower or not. This question will be answered when I fully convert yade to pqs. It's possible that I will discover something new with kcachegrind profiler - but due to the lack of time (review deadline) I cannot post profiler results now. If anybody wants to see the code I used it's here: http://tinyurl.com/r83tr , while the original code is here: http://tinyurl.com/ouhbc . Quick glance at the code shows 21 extra assignments of double, and 9 extra construtcor calls.
Also I should note here, that during my work with yade, every time I made a design decision which favored clean design over calculation speed I was later rewarded. The reward was, that clean design allowed unexpected optimizations in different pieces of code, which at the end resulted in faster calculation speed. I have reasons to believe that this will be also the case with pqs.
Yes. High performance hasnt had much work done on it. I suspect that some of my attempts at code enhancement might have caused slow downs. I should test different implementations, which I havent done.
During this adaptation of yade to pqs I encountered only one small obstacle: operator *= for vect was missing. I had to write { vect1 = vect1 * some_scalar;};
Also vect has a function magnitude(), please add squared_magnitude(). I tried to add it myself, but got lost in defining a squared return type.
OK. vect is quite new and unfinished, needs a lot of work and tests of course.
--------------------------------------------------- * How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
As much as a human can, given just 48 hours. Sounds a bit funny - that's because earlier I was not aware that any boost mailing list member can perform a review. So maybe it qualifies even as an in-depth study, but I'm not sure.
Maybe. Its a very impressive review all round .
--------------------------------------------------- * Are you knowledgeable about the problem domain?
Instead of claiming good (or bad knowledge) I should rather describe my background, which should let you decide whether I'm knowledgeable:
Physics : I'm working at a technical university, doing a PhD which focuses on modelling behavior of concrete. I'm familiar with Finite Element Method, with Discrete approach, with various material models (viscoelastic, hypoplastic, cohesive, etc). I've had experience with modelling granular materials and cloth. Currently I'm developing (mentioned earlier) a framework for performing physical simulatios - http://yade.berlios.de/ . I have 16 english-language publications (9 of them are just conference materials, 6 of them are in scientific journals, and 1 is in renowed scientific journal "Granular Matter"). No more details, as I feel now like boasting ;)
C++ : I have over 10 years experience with programming in C++ (mostly guided by my brother Bronek who is more active on this mailing list ;). As my achievements in this field I'd like to list serialization library in yade. After thorough testing we decided that boost::serialization does not qualify for our needs - we needed not-on-the-fly serialization (important design decision), and better support for various formats - xml easier to read by human, binary, text, etc. We even developed a backend which allows to serialize classes (their registered components) into QT dialog window (well, currently unfinished, as containers are left out, but soon this will be fixed). Another would be a multimethods library in yade, I made it as a "dreaded excercise left for the reader" - those words Andrei Alexandrescu used at the end of his chapter about multimethods in "Modern C++ Design". In this implementation recursive templates allow theoretically unlimited number of dimensions (at this moment only 3 are used), and up to 15 arguments in multimethod call. This library focuses on speed, so a callback matrix is used. By looking at this source code (in subversion repository of yade http://tinyurl.com/hxbr7 ) you can judge my familiarity with C++. (I'm more proud of multimethods in terms of template programming, so look there first, serialization is still waiting for some cleaning ;)
--------------------------------------------------- * What is your evaluation of the potential usefulness of the library?
For people working in science and engineering this library is extremely useful. There is a bad need for such library in the scientific community. I had a chance to envision myself how pqs reduces the chance of making stupid mistake in calculations. Once my code compiled without errors it was correct! If I recall now how much time I've spent debugging this code more than year ago, I see that pqs allows to save huge amounts of time.
I am glad to hear it. Now all thats needed is the original performance.
--------------------------------------------------- * Other random comment:
I feel there is a need for another discussions how to call this library in boost. Most people do not like "pqs", some people (including me) prefer name "units", but others have strong and reasonable arguments against name "units" (but I'm not sure if anything else was proposed).
It might be good to see what emerges out of the review. Maybe an enhanced units library. If so pqs might keep its name for reference/ comparison with the new library. Thanks for the very comprehensive review. Oh and the vote too! regards Andy Little