
Thanks for the two comments (so far). I am happy to provide a more detailed discussion, but I thought it appropriate to keep my original posting short if possible.

For a little backstory: although I have been working on code related to my original posting for a while, I decided to formally submit a "gauging interest" posting after some emails with Barry Smith, the head of the National Center for Ontological Research, who sent me a stack of research papers with new data structures and algorithms relevant to "multiscale" analysis. This gave me a range of functions and algorithms to implement on top of those I was already working on. There's going to be a conference on biomedical ontology in about six weeks' time, so I decided to assign myself the task of completing at least part of a library in preparation for it.

NCOR seems to be working on a "partition theory" which takes a top-down approach to scale. I'll base my comments on "A Theory of Granular Partitions": http://ontology.buffalo.edu/smith/articles/partitions.pdf. Partitions are defined as collections of "cells" which can be nested, including one "maximal cell". For lack of a more elegant term, suppose a collection class is called cell_set, as in

  template<typename CELL_Type> class cell_set;

If this class is to model the theory, then any instance will need a "maximal cell" or "root" in which all others are contained, which could be autogenerated with a member function. Moreover, it needs a "subcell" relation and an "insert" function, which would probably take an existing cell as the parent of any new cell and default to the root. I think a variation on this class should allow multiple parents, but for now I'll assume, along with the above paper, that parenthood is unique.

The paper also defines a "parthood" relation and a "mereological product", such that c * c' represents the set of subcells shared by two cells (which, given unique parenthood, must be either c, c', or null). It defines two notions of summation: one simply groups all subcells of c + c' into a set, and the other looks for the least cell actually defined in the partition that contains all of those subcells. I could mention further structures the paper defines (and which could be member or namespace-scoped functions), but you get the idea.

As a practical tool, this construction seems to be used as a kind of sorting machine. In the simplest case, there is one class of "target objects" which are to be fit into the set of cells (the paper uses the term "projection" for a cell "projecting" onto one or more of those objects). cell_set then takes two type parameters, as in

  template<typename CELL_Type, typename TARGET_Type> class cell_set;

the idea being that any cell_set instance would have a "location" function which takes a TARGET_Type value or collection and "locates" it within a cell of the cell_set (defaulting to the root if nothing else works). Users would have to provide a boolean "fit" function testing whether a TARGET_Type object fits into a CELL_Type object.

Scale can be defined on cell_set instances by assigning the largest scale to the root and stipulating that subcells are smaller-scaled than their parent cells. A default implementation could assign scale 1 to the root, define "depth" as the number of parent cells between a given cell and the root, assign scale 0 to the "deepest" cell, and, if D is the maximum depth, assign scale 1 - depth(c)/D to every other cell c.
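To make this concrete, here is a rough sketch of the kind of interface I have in mind. It is only an illustration under the assumptions above (unique parenthood, a user-supplied boolean "fit" test); the member names (insert, locate, scale), the nested cell struct, and the tree-of-unique_ptr storage are placeholders for this example, not the planned design:

  // Rough sketch only, not a proposed implementation: the member names and
  // the tree-of-unique_ptr storage are placeholders for illustration.
  #include <algorithm>
  #include <cstddef>
  #include <functional>
  #include <iostream>
  #include <memory>
  #include <string>
  #include <vector>

  template <typename CELL_Type, typename TARGET_Type>
  class cell_set
  {
  public:
      struct cell
      {
          CELL_Type value;
          cell* parent;                               // unique parenthood, as in the paper
          std::vector<std::unique_ptr<cell>> children;
          std::size_t depth;
      };

      // User-supplied boolean test: does a TARGET fit into a CELL?
      using fit_function = std::function<bool(const TARGET_Type&, const CELL_Type&)>;

      cell_set(CELL_Type root_value, fit_function fit)
          : root_{std::move(root_value), nullptr, {}, 0}, fit_(std::move(fit)) {}

      // Insert a new cell; the parent defaults to the root (the "maximal cell").
      cell* insert(CELL_Type value, cell* parent = nullptr)
      {
          if (!parent) parent = &root_;
          parent->children.push_back(std::unique_ptr<cell>(
              new cell{std::move(value), parent, {}, parent->depth + 1}));
          return parent->children.back().get();
      }

      // "Location": the deepest cell the target fits into, defaulting to the root.
      cell* locate(const TARGET_Type& target)
      {
          cell* best = &root_;
          descend(root_, target, best);
          return best;
      }

      // Default scale metric: 1 at the root, 0 at the deepest cell,
      // 1 - depth(c)/D in between.
      double scale(const cell& c) const
      {
          std::size_t max_d = max_depth(root_);
          return max_d == 0 ? 1.0 : 1.0 - double(c.depth) / double(max_d);
      }

  private:
      void descend(cell& c, const TARGET_Type& target, cell*& best)
      {
          for (auto& child : c.children)
              if (fit_(target, child->value)) {
                  if (child->depth > best->depth) best = child.get();
                  descend(*child, target, best);
              }
      }

      static std::size_t max_depth(const cell& c)
      {
          std::size_t d = c.depth;
          for (const auto& child : c.children) d = std::max(d, max_depth(*child));
          return d;
      }

      cell root_;
      fit_function fit_;
  };

  // Toy example: category cells, product-name targets.
  int main()
  {
      auto fits = [](const std::string& product, const std::string& category)
      { return product.find(category) != std::string::npos; };

      cell_set<std::string, std::string> inventory("all products", fits);
      auto* tools = inventory.insert("tools");        // subcell of the root
      inventory.insert("power tools", tools);         // nested subcell

      auto* found = inventory.locate("cordless power tools");
      std::cout << found->value << " (scale " << inventory.scale(*found) << ")\n";
  }

The point is just that insertion, location, and a default scale metric all hang off one container-like class parameterized on the cell and target types.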
More nuanced implementations could take into account the number of distinct subcells a cell contains, with the idea that cells with more "inner structure" should be assigned larger scale. Scale can also be used to filter cell_sets into more maximal or more minimal cells, providing a kind of granular query machine. Imagine, for example, that a store is building an inventory database which fits into a collection of nestable categories. Scalar filtering might come into play if the store wants to temporarily "reassign" products to new categories to prevent categories with too few products from being listed on its website, or conversely if the store wants to spotlight categories for some sort of promotion, but only those with no subcategories.

Now, personally, I find that parent uniqueness is too restrictive and does not really fit real-world situations (for the same reasons that single inheritance is problematic...). Easing this restriction makes cell_sets more complex in a number of ways: they end up being graphs rather than trees, with parenthood one edge type among others. I'm not saying that "partition theory" is the last word on multiscale modeling; I think another useful class would be some sort of multi_graph<CELL_Type> which shares many features with cell_set but allows multiple parents and perhaps allows the maximal cell to be undefined. This latter class would be more "bottom up": instead of introducing new subcells as partitions of existing cells, new cells would be constructed as "mergers" of existing cells. In addition, edge types could be classified by whether c <> c' (for some link type <>) implies that c is a subcell of c', or that c and c' are both potential subcells of some parent. These differences give rise to different kinds of "walks", or constraints on walks, e.g. follow only edges to parents, or to children, or to siblings.

Of course, cell_set as I described it could be considered a special case of this more general multi_graph structure, but if multi_graph is to inherit cell_set's features it should support the construction of a "sorting machine" for one or more target types; that is, given a user-defined function "fitting" objects "into" vertices of the multi_graph, it should provide location functions for single objects or collections of objects of the proper type(s). Scalarity and granularity functions would work much as described for cell_sets, although graph-theoretic measures such as vertex degree can be used to refine scale quantification for cells.

I am particularly interested in using multi_graphs so defined for modeling GUIs. I've read some interesting work by RedWhale Software, in Palo Alto, concerning algorithms for matching data elements to UI controls, which they call "interactors". For example, a numeric data field might be mapped to a spinner, list box, edit field, scroller, radio buttons, etc., based on its range of allowed values. Assume that the links between data components and interactors are represented by a multigraph, so that for one edge type <v>, c <v> c' implies that c is a data field and c' is an interactor type which will visually represent c in a GUI. Suppose moreover that c <s> c' implies that c and c' are both interactors but c' is a higher-scale control, such as a notebook or dialog box; and that c <c> c' implies that c and c' are data components with c contained inside c'.
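By way of illustration, here is a minimal sketch of an edge-typed multigraph along these lines, using the GUI example just given. Again, the names (multi_graph, edge_kind, neighbors) and the adjacency-list layout are placeholders for this illustration, not the proposed interface or memory layout:

  // Rough illustration only; names and storage are placeholders.
  #include <cstddef>
  #include <iostream>
  #include <string>
  #include <vector>

  // The three edge types from the GUI example: data -> interactor (<v>),
  // interactor -> higher-scale interactor (<s>), data contained in data (<c>).
  enum class edge_kind { v, s, c };

  template <typename CELL_Type>
  class multi_graph
  {
  public:
      std::size_t add_vertex(CELL_Type value)
      {
          vertices_.push_back(std::move(value));
          adjacency_.emplace_back();
          return vertices_.size() - 1;
      }

      // Nothing enforces unique parenthood: a vertex may have any number of
      // outgoing edges of any type.
      void add_edge(std::size_t from, std::size_t to, edge_kind kind)
      {
          adjacency_[from].push_back({to, kind});
      }

      // A constrained "walk" step: follow only edges of one type out of a vertex.
      std::vector<std::size_t> neighbors(std::size_t from, edge_kind kind) const
      {
          std::vector<std::size_t> result;
          for (const auto& e : adjacency_[from])
              if (e.kind == kind) result.push_back(e.to);
          return result;
      }

      const CELL_Type& value(std::size_t v) const { return vertices_[v]; }

  private:
      struct edge { std::size_t to; edge_kind kind; };
      std::vector<CELL_Type> vertices_;
      std::vector<std::vector<edge>> adjacency_;
  };

  int main()
  {
      multi_graph<std::string> g;
      auto price   = g.add_vertex("price");           // data component
      auto spinner = g.add_vertex("spinner");         // interactor
      auto dialog  = g.add_vertex("options dialog");  // higher-scale interactor

      g.add_edge(price, spinner, edge_kind::v);   // price is displayed by a spinner
      g.add_edge(spinner, dialog, edge_kind::s);  // the spinner sits inside a dialog

      for (auto v : g.neighbors(price, edge_kind::v))
          std::cout << g.value(price) << " -> " << g.value(v) << '\n';
  }

Longer walks, scale filters, and the "sorting machine" described above would all be built on top of constrained neighbor steps like this one.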
The visual clarity of a GUI can be analyzed by studying how these three edge types interact in the graph, and any patterns which emerge could be used to refine algorithms that automatically generate GUIs from data models. This could be extended by also introducing user-generated events and application tasks, such as saving updated data to disk, so that c <t> c' could mean that c is an event indicating that the user wants to perform task c' (e.g., save to disk).

Finally, to respond to a couple of points raised. For argument's sake, suppose I develop a single cell_graph class which incorporates both the partition and multigraph functionality I've described. (In practice I would probably want a small group of related classes rather than one monolith, but, again, this is for argument's sake.) Certainly this could be implemented in terms of existing STL containers, but the choice of which container to use is affected by how the cell_graph itself is used. As far as I can tell, for optimal memory usage cell_graph should work on values rather than pointers, but the values should never be copied, if possible, so that pointers into the cell_graph remain valid indefinitely. On the other hand, there are occasions when it is useful to use cell_graph more as a vector than as a list. I won't go into the precise memory layout I've been using because I have not worked out all the details yet, but I think the choice of something nonstandard is justified. Conversely, even if (say) cell_graph<CELL_Type, PROJECTION_Type> is implemented in terms of standard STL containers behind the scenes, there is enough extra functionality to implement that it makes sense to develop it as a separate library. There are actually several other type parameters which I have not mentioned (although they take defaults), so using a "bare" vector or set would yield very unwieldy code; better to hide as much as possible behind typedefs, partial specializations, or inheritance:

  template <typename VERTEX_Type, typename CONNECTOR_Type,
            typename NODE_WRAP_Type, typename NODE_SERVICER_Type>
  void TierBlend::Route_Matrix_Compiler<VERTEX_Type, CONNECTOR_Type,
      NODE_WRAP_Type, NODE_SERVICER_Type>::compile(NODE_SERVICER_Type* servicer)
  ...

Stuff like this should be library code, not user code.

On the second point: there is no single meaning to "scale" or "granularity". A scale metric can be induced on a partition or "partition graph" based solely on nesting (e.g. subcells inside parent cells), but in real-world situations there is no reason why "sibling" cells should have the same scale. The point is merely that classes representing these structures should provide for a scale measure associated with cells, graph vertices, and/or subgraphs, and scale should also be a possible constraint on, or characterization of, edge types (e.g. it could be stipulated that c <> c' implies c < c' for edge type <> and parthood relation <). Similarly, "granularity" could be associated with a given filter, query, or operation which specifies a range of scales over which the operation is to be performed.

Thanks again for the comments.

----------------------------------------
From: "Edward Diener" <eldiener@tropicsoft.com>
Sent: Sunday, June 05, 2011 8:18 AM
To: boost@lists.boost.org
Subject: Re: [boost] Gauging interest in new library re: multiscale/multigranular containers

On 6/4/2011 4:55 PM, nathaniel@tierblend.org wrote:
Hi: I would like to submit a library which would allow functionality related to scale and granularity to be performed on STL-like containers. Until recently I was in a doctoral program specializing in the philosophy of science, and I am especially interested in multiscale data modeling and its applications to e.g. GUI design, code generation, decision procedures, and DSL implementation. For example, one of my projects involves a DSL implemented through multiscale graphs in lieu of conventional syntax trees.

I used to study at Buffalo, which houses the National Center for Ontological Research -- headed by the quite broad-ranging scholar Barry Smith -- and has done pioneering work in biomedical ontology and also in ontological models related to complex or "vague" aggregative structures. In these structures, collections of objects of some base type T can be defined as aggregates, but with varying degrees of looseness or "individual coherence"; this aggregation can be iterated to produce multiscaled structures which can be queried or represented at different granularities. There are a number of analytic procedures which can be implemented on such structures, some overlapping with graph-theoretic or statistical methods. These structures also possess a lot of internal structure above and beyond their underlying list of values, so they are suited to memory-optimized C++ code rather than higher-level languages like Java. However, I am not aware of generic libraries which really incorporate these kinds of data structures and their analysis in a systematic way, although I have come across non-template libraries which implement some similar features.

What I propose is a library which could be used in a fashion similar to std::vector or std::set but which supports additional functions along the lines of, e.g., building multigraphs out of T-type nodes; partitioning T-sets, with functions to define or measure e.g. intra-partition relevance; defining (or extending) a scale metric on T-objects, T-sets, T-arrays, or other "T-data structures" and using it for rescaling, scale-filtering, and other scale-related transformations; defining iterators or "walks" on T-collections which take scale and granularity into account; etc. In a number of cases the algorithms for these kinds of operations have been described in theoretical form or even already implemented in languages like Scheme.
You generally need to define "scale" and "granularity" in mathematical terms for computer programmers, else you are just interesting scientists in our particular area.

Also, would it not be possible to create a class which consists of an STL container and some other objects which represent "scale" and "granularity", and add the necessary functionality to that class, rather than attempting to re-engineer some STL container?