Re: [boost] [Review] Shmem (lomg)

9 Feb 2006

      I vote to accept the library into Boost.

I have been following shmem development
for long time and most of the issues
I had were resolved.

It covers important but so far neglected
area. Potential for tools based on shmem
is large.

Ion has created extensive (1MB of sources)
and powerful tool and I have full belief
he will maintain and improve it futher.

For practical reasons I do not recommend
attempt to split the library into
separate parts.

What I do not like is described in details bellow.
Most of my objections fit into two groups:

1. The documentation is /very/ dense
   with information and may easily
   scare a reader.

   Examples to play with and pictures/diagrams
   wold lessen the mental toll on reader.

2. (From someone who is not English native.)

   Currently used names are hard
   to understand and remember and lack
   intuitive meaning.

   Some words like "shared", "named"
   or "object" are overused

   I think naming conventions should
   be discussed as the most important
   issue of the library since shmem
   has real chance to establish de-facto
   standard for C++ IPC and process synchonization.

/Pavel

__________________________________________________________
1. index.html:

"portable synchronization primitives"
==>>
"portable interprocess synchronization primitives"

---------------

Need for glossary: the first page mentions
several terms that are not really well defined:

* dynamic allocation
* segment
* named allocation
* base address (not eveyone may know)

These terms should have link to glossary page
when they are used for the first time, ala:

<a ref="....">dynamic allocation</a> (<a ref="...#...">in glossary</a>)

__________________________________________________________
2. Naming: IMO the naming convention should be reconsidered.

It is rather hard to intuitively feel difference between
  "named_shared_object"
and
  "named_user_object"
for example.

I am not and English speaker so I don't dare
to suggest alternatives but current names are
quite confusing to me.

Anyway, all naming conventions should be
explained in glossary.

I have uneasy feeling from these names:
* named_shared_object (vague for me)
* segment_manager (the word manager is overused)
* segment (too much reminds me the x86 segments)
* mixing of "object" and "class" together

__________________________________________________________
3. quick_guide.html: as it was suggested
   a smart pointer should be used in the first example.

I am against adding new typedef into shmem, seeing

  shared_ptr<void, shmem::deleter>

gives me immediate clue, while

  shmem_ptr_whatever

is something one needs to look up.

-----------------

The code snippets may have link to full
source code of an example next to them.

Especially for first few snippets this
will encourage novices to play and get
immediate feedback.

--------------

Which reminds me: there should be a page
listing all examples with short description
what important and useful is in each example.

It is fast and cheap way for learning
and it is used e.g. in multi-index.

------------------

Code snippet about retrieving named object
from shmem (the fourth): the need for
std::pair when extracting the object
should be explained in a comment.
It is unexpected on the first sight.

------------------
The "offset_ptr" word should be linked
to reference.  Generally, the docs should
be interlinked as much as possible.

------------------
An example with offset_ptr AND named object
may be added to ensure the first time
reader these tools are orthogonal.

-------------------
The title:
  "Creating vectors in shared memory
  with different base addresses"

should be

  "Shmem mapped to different base adresses?
   Standard STL containers cannot be used.
   Solution: shmem's own container set."

Long but self-explaining.

Medieval publishers titled their books in similar
way and it sold well for centuries.

-------------------
Just a pedagogical nit. The snippets show:

int main() {
   ...
   if (!...) {
      return -1;
   }

Depending on OS the value (-1)
would produce strange effects. Use 1.

-------------------

Nits for snippet #6:

* the name ShmemAllocator should be shmem_allocator_t
* alloc_inst  should be "my_allocator" or so
  (the word "inst" means something in Prolog/Mercury)

__________________________________________________________
4. offset_ptr.html:the "offset_1_null_ptr" policy
   is sufficient for all purposes.

   Since I cannot think of any use case where
   an other pointer policy is required I recommend
   to remove them. This should decrease code
   complexity and most importantly mental load
   on the user.

__________________________________________________________
5. limitations.html: more hyperlinks.

-------------
In sentence
  "References will only work if memory is ..."

the word "only" should be bold and red.

------------
Perhaps this page should be divided into two parts:
* shmem is always mapped to the same address
* shmem can be mapped to different addresses

and for each cases limitations listed.

__________________________________________________________
6. rationale.html: the text about containers repeats
   what was already said before.

   While repeating helps one to remember better
   it is for the third time on last three pages
   (quick guide/limitations/rationale)

__________________________________________________________
7. concepts.html: should be named glossary
   and expanded.

Some texts here sound strange:

"...memory algorithm is an object..."

Picture/diagram would help here.

The bolding is overused.

I feel uneasy about the term "front-end"
but have no suggestion.

__________________________________________________________
8. oswrappers.html:

"Base shmem classes" may be better name.

a the classes may have name
  "XYZ_base".

The word "basic" is also used
in two docs pages later and it is
very confusing (to me).

---------------
The functions "create_with_func/open_with_func/..."

I still didn't got need for these (I am slow)
but there should be a leaf page with an example
and typical list of situations where this feature
is necessary.

The code snippet should have link to this
"more" leaf page.

----------------
Perhaps the construction/initialisation
dichotomy for mapped_file is not needed.

The class looks cheap enough to use
constructor/desctructor for them.

---------------
Huge (> 4 GB) files and mapped_file class:

I am not sure what size_t is
on current 64 bit systems.

To be absolutely, positively, safe I would use
fileoff_t everywhere instead of size_t.

-------------------

(Absolutely uninformed guess follows.)

If process A crashes in the middle
of shmem operation (I expect this
to happen during development), will
all the mutexes/shared memories/etc
be destroyed automaticlalyy after
process B is closed?

Or will reboot be needed to cleanup
the system?

In second case, could some API be added
to destroy any named remnants on explicitly?

This may be also useful for self-restarting
applications.

--------------------
Use of term "process-shared" and "named".
It is confusing (for me) and it took
me a while to get it.

I suggest to add short info on the top
of the page describing distinction between
these two approaches in tabular form.

Should I pick names I would use
"XYZ placed inside shmem" and
"XYZ identified by name".

---------------------
Generally, this page (oswrappers.html)
is /extremely/ dense with information.

It covers stuff that is taught
during whole university semester.

At least reader should be warned
on the top of page that lot of time
is needed to grok it.

Links to examples and class diagrams would help here.

__________________________________________________________
9. named_shared_object.html:

The title may be:
  "Named user data places in shmem"
or something what forms full sentence.
I have trouble parsing the current name.

-----------------
Seeing the whole class synopsis is
a bit frightening experience.
(a blob on two pages).

Perhaps a mini-synopsis with just
template parameters and few most
important functions (or groups of functions)
could be shown and only then followed by
the full class.

Other possibility would be SGI STL like
documentation with table and detailed
comments bellow the table.

-----------------
Title:

"Common named shared object classes"

==>

"Default specialisation of class doing named shared memory allocation"

;-)

I do not like the group "object classes" and
do not like that something with "object"
in name is a class.

I think this should be discussed.

------------------

Links to examples and class diagrams would help here.

------------------

Instead of having boost::shmem::anonymous_instance
I would prefere to have function

   construct_unnamed(...)
or
   construct_anonynous(...)

Similarly
   construct_unique(....)

------------------

The discussion of "index types" feels
as something so advanced that it should
be moved futher down in the docs.

Anyway, current documentation too low
on details and actual purpose.

Missing is a /table/ comparing
each type and where it shines.

__________________________________________________________
9. stl_allocators.html:

A very dense page. A table on the top comparing
advantages of each allocator type would help.

A link to header(s) implementing the allocators
should be present so reader could jump there
quickly.

I guess every allocator type is used in at least
one of shmem containers: there could be link
to show this as an example.

__________________________________________________________
10. containers_explained.html:

An overview of Boost libraries compatible
with shmem couled be added (I suspect multi-index
is compatible, while say smart pointer generally isn't).

-------------

The last code snippet may be better if
separated into smaller and focused parts
like "all in shmem", "nothing in shmem",
"whatever inbetween" and so on.

__________________________________________________________
11. customizing_boost_shmem.html:

This should be separated into three parts: algorithms,
custom allocators and indexes.

----------------

The previous pages in the docs should have
hyperlinks like

   "see how to build your own XYZ (very advanced)"

pointing here.

--------------

Generally, interlinking should be used much more.

__________________________________________________________
12. beyond_shared_memory.html:

This name suggests super-advanced functionality
far beyond needs or abilities of anyone.

I think what is presented here are very useful
tools that could be used separately from any IPC
and synchronisation.

Complete examples would be very helpful here,
as well as careful wording that doesn't
suggest need to use whatever "shared"
(in OS sense).

----------
The relevation of memory mapped files feels
too late, at this moment. Too many readers won't
get that far or will skip the "beyond".

The note about file mapping in initial
parts of the docs should have interlink
pointing here.

----------

I remember Pablo Halpern's proposal
for new allocator (on comp.std.c++).

One of the features was ability to 'move' objects
passed into an container under allocator of
this container.

At the moment of writing I have only vague
recollection but I think it could be possible
(via shmem containers):

* create complex data structures in specified
  memory areas (without the need to modify
  too much of user code).

* being able to "(de)serialize" complex data
  structures.

(As I said, these are vague ideas from the past.)

__________________________________________________________
13. streams.html: there could be possibly overlap with
    Boost.Iostreams library.

    At this moment I am not able to comment on it more.

__________________________________________________________
14. shmem_smart_ptr.html:

A teaser in form of miniexample should be on the top
of the page.

-------------

For examples I suggest to always use full qualification
for boost::shmem::scoped_ptr (and the other one as well).

Danger of making mistake is too large.

__________________________________________________________
15. shared_message_queue.html:

The mention of named_user_object should have backlink
and a comment that this means user-buffer-related tool.

------

There's no explicit info what happen if
the buffer-is-too-small error happens:
will it read part of message, will it
stay in queue or will it be discared?

This and info whether message can get fragmented
should be added.

------

The double contruction/initialisation steps
may be merged into one (no strong opinion,
just a feeling).

------
Functions like: how-big-is-the-next-message()
or peek() may be added.

I guess that people who will use shmem will
be the ones who happily use such tricks.

------

The "MyLess" in the example is not used consistently,
should be removed for clarity.

-------

The names like "mg1", "mg2" should be longer.

-----
Information on atomicity of the queue should
be told explicitly inside introduction paragraph.

---------
I wonder whether and how it would be possible to build
'objects' passing queue above shared_message_queue
(subject of necessary limitations).

I know such tool can get easily misused
but it would be first step to implement
Erlang-like message passing framework.

__________________________________________________________
16. architecture.html:

Diagrams picturing the levels of the library and
interaction between the levels would be handy.

----------

Information how lifetime of OS resources is managed
should be here (especially for Posix).

__________________________________________________________
17. performance.html:

Examples would be helpful.

Examples that show a slow way as well as
examples tuned for performance.

__________________________________________________________
18. open_issues.html

Namespaces: I do not like idea of more than one namespace
as it would only complicate already feature rich tool.

If it is possible to implement (transparently
to the user) a "type" checking then add it.

__________________________________________________________
19. future_improvements.html

Security attributes - IMHO adding them is
almost trivial (I know, I know) and they should
be added before the library is thrown to public.

__________________________________________________________
20. All shmem exception should have one common parent.

Name sem_exception is too short.

lock_exception should be deadlock_exception.

--------------

Possibly shmem::bad_alloc may inherit from
both common parent and std::bad_alloc.

This would make sense if release of "normal"
memory could have positive effect on availability
of "shared" memory. Just an idea.

__________________________________________________________
21. Wishes for additional functonality:

* ability to enumerate named from shmem.
  Useful for debugging/troubleshooting.

* debugging support: for example the first
  byte of shmem should be fixed constants
  and assert()s should be everywhere
  to check that user code didn't overwrote
  this.

  Possibly the metadata may have CRC stored
  and /always/ checked against.
  It may sounds as paranoia but multiprocess
  troubleshooting is such a pain that any measure
  that helps to catch a bug is justified.

* in "debug" mode the shared memory areas may
  be surrounded by guard areas (possibly too hard)
  and memory of an deleted object may be filled
  with 0xCC.

* primitive "transaction like" functionality
  for shmem.

  Scenario:
   - shmem segment exists
   - I do copy (clone) of the shmem data
   - many operations with shmem are executed
   - something may fail but ability to revert into
     well-defined state (in application sense,
     not in low level C++ sense) is impossible
     at this moment
   - then the stored copy of shmem will be written
     back into shmem segment, restoring the initial,
     well defined state.

__________________________________________________________
22. Wishes for docs:

* a page named "What I can do with shmem"
  with one liner description and link to
  example or detailed docs.

  This should cover small practical tasks like
  using process wide mutex, adding data
  into shared queue etc.

* the current documentation should visually
  distinguish parts showing problems, mistakes
  and potentially wrong uses of shmem.

  Picture of a bomb is typically used
  for this purpose.

* possibly a visual separation could be used
  for "basic" and "advanced" topics
  or for levels like
     "implementation details",
     "basic functionality" and
     "higher level tools".

* the docs may have section on typical
  (or expected) user mistakes, for example:

  - "what if I retrieve named object but
    use wrong type"

* the docs should say whether it is possible
  to somehow block shmem from working properly.

  E.g. both processes crash and leave
  a named OS resource hanging out,
  restarted processes will fail to
  connect the resource or it will
  be damaged beoynd repair.

  To discover such corner cases later
  in the project may very unpleasant.

* Pictures of inheritance diagrams could be relatively
  easily generated by Doxygen. On many places
  such pictures would come handy.

__________________________________________________________
EOF

Re: [boost] [Review] Shmem (lomg)

Pavel Vozenilek