
Thanks for the two comments (so far). I am happy to provide a more detailed discussion, but I thought it appropriate to keep my original posting short if possible.

For a little backstory: although I have been working on code related to my original posting for a while, I decided to formally submit a "gauging interest" posting after some emails with Barry Smith, the head of the National Center for Ontological Research, who sent me a stack of research papers with new data structures and algorithms relevant to "multiscale" analysis. This gave me a range of functions and algorithms to implement on top of those I was already working on. There's going to be a conference on biomedical ontology in about six weeks' time, so I decided to assign myself the task of completing at least part of a library in preparation for it.

NCOR seems to be working on a "partition theory" which takes a top-down approach to scale. I'll base my comments on "A Theory of Granular Partitions": http://ontology.buffalo.edu/smith/articles/partitions.pdf. Partitions are defined as collections of "cells" which can be nested, including one "maximal cell". For lack of a more elegant term, suppose a collection class is called cell_set, as in

  template<typename CELL_Type> class cell_set;

If this class is to model the theory, then any instance will need a "maximal cell" or "root" in which all others are contained, which could be autogenerated with a member function. Moreover, it needs a "subcell" relation and an "insert" function, which would probably take an existing cell as the parent of any new cell and default to the root. I think a variation on this class should allow multiple parents, but for now I'll assume, along with the above paper, that parenthood is unique.

The paper also defines a "parthood" relation and a "mereological product", such that c * c' represents the set of subcells shared by two cells (which, given unique parenthood, must be either c, c', or null). It defines two notions of summation: one simply groups all subcells of c + c' into a set, and the other looks for the least cell actually defined in the partition that contains all of those subcells. I could mention further structures the paper defines (and which could be member or namespace-scoped functions), but you get the idea.

As a practical tool, this construction seems to be used as a kind of sorting machine. In the simplest case, there is one class of "target objects" which are to be fit into the set of cells (the paper uses the term "projection" for a cell "projecting" onto one or more of those objects). cell_set then takes two type parameters, as in

  template<typename CELL_Type, typename TARGET_Type> class cell_set;

the idea being that any cell_set instance would have a "location" function which takes a TARGET_Type value or collection and "locates" it within a cell of the cell_set (defaulting to the root if nothing else works). Users would have to provide a boolean "fit" function testing whether a TARGET_Type object fits into a CELL_Type object.

Scale can be defined on cell_set instances by assigning the largest scale to the root and stipulating that subcells are smaller-scaled than their parent cells. A default implementation could assign scale 1 to the root, define "depth" as the number of parent cells between a given cell and the root, assign scale 0 to the "deepest" cell, and, if D is the maximum depth, assign scale 1 - depth(c)/D to every other cell c.
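To make this concrete, here is a rough sketch of the kind of interface I have in mind. It is only an illustration under the assumptions above (unique parenthood, a user-supplied boolean "fit" test); the member names (insert, locate, scale), the nested cell struct, and the tree-of-unique_ptr storage are placeholders for this example, not the planned design:

  // Rough sketch only, not a proposed implementation: the member names and
  // the tree-of-unique_ptr storage are placeholders for illustration.
  #include <algorithm>
  #include <cstddef>
  #include <functional>
  #include <iostream>
  #include <memory>
  #include <string>
  #include <vector>

  template <typename CELL_Type, typename TARGET_Type>
  class cell_set
  {
  public:
      struct cell
      {
          CELL_Type value;
          cell* parent;                               // unique parenthood, as in the paper
          std::vector<std::unique_ptr<cell>> children;
          std::size_t depth;
      };

      // User-supplied boolean test: does a TARGET fit into a CELL?
      using fit_function = std::function<bool(const TARGET_Type&, const CELL_Type&)>;

      cell_set(CELL_Type root_value, fit_function fit)
          : root_{std::move(root_value), nullptr, {}, 0}, fit_(std::move(fit)) {}

      // Insert a new cell; the parent defaults to the root (the "maximal cell").
      cell* insert(CELL_Type value, cell* parent = nullptr)
      {
          if (!parent) parent = &root_;
          parent->children.push_back(std::unique_ptr<cell>(
              new cell{std::move(value), parent, {}, parent->depth + 1}));
          return parent->children.back().get();
      }

      // "Location": the deepest cell the target fits into, defaulting to the root.
      cell* locate(const TARGET_Type& target)
      {
          cell* best = &root_;
          descend(root_, target, best);
          return best;
      }

      // Default scale metric: 1 at the root, 0 at the deepest cell,
      // 1 - depth(c)/D in between.
      double scale(const cell& c) const
      {
          std::size_t max_d = max_depth(root_);
          return max_d == 0 ? 1.0 : 1.0 - double(c.depth) / double(max_d);
      }

  private:
      void descend(cell& c, const TARGET_Type& target, cell*& best)
      {
          for (auto& child : c.children)
              if (fit_(target, child->value)) {
                  if (child->depth > best->depth) best = child.get();
                  descend(*child, target, best);
              }
      }

      static std::size_t max_depth(const cell& c)
      {
          std::size_t d = c.depth;
          for (const auto& child : c.children) d = std::max(d, max_depth(*child));
          return d;
      }

      cell root_;
      fit_function fit_;
  };

  // Toy example: category cells, product-name targets.
  int main()
  {
      auto fits = [](const std::string& product, const std::string& category)
      { return product.find(category) != std::string::npos; };

      cell_set<std::string, std::string> inventory("all products", fits);
      auto* tools = inventory.insert("tools");        // subcell of the root
      inventory.insert("power tools", tools);         // nested subcell

      auto* found = inventory.locate("cordless power tools");
      std::cout << found->value << " (scale " << inventory.scale(*found) << ")\n";
  }

The point is just that insertion, location, and a default scale metric all hang off one container-like class parameterized on the cell and target types.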
More nuanced implementations could take into account the number of distinct subcells a cell contains, with the idea that cells with more "inner structure" should be assigned larger scale. Scale can also be used to filter cell_sets into more maximal or more minimal cells, providing a kind of granular query machine. Imagine, for example, that a store is building an inventory database which fits into a collection of nestable categories. Scalar filtering might come into play if the store wants to temporarily "reassign" products to new categories to prevent categories with too few products from being listed on its website, or conversely if the store wants to spotlight categories for some sort of promotion, but only those with no subcategories.

Now, personally, I find that parent uniqueness is too restrictive and does not really fit real-world situations (for the same reasons that single inheritance is problematic...). Easing this restriction makes cell_sets more complex in a number of ways: they end up being graphs rather than trees, with parenthood one edge type among others. I'm not saying that "partition theory" is the last word on multiscale modeling; I think another useful class would be some sort of multi_graph<CELL_Type> which shares many features with cell_set but allows multiple parents and perhaps allows the maximal cell to be undefined. This latter class would be more "bottom up": instead of introducing new subcells as partitions of existing cells, new cells would be constructed as "mergers" of existing cells. In addition, edge types could be classified by whether c <> c' (for some link type <>) implies that c is a subcell of c', or that c and c' are both potential subcells of some parent. These differences give rise to different kinds of "walks", or constraints on walks, e.g. follow only edges to parents, or to children, or to siblings.

Of course, cell_set as I described it could be considered a special case of this more general multi_graph structure, but if multi_graph is to inherit cell_set's features it should support the construction of a "sorting machine" for one or more target types; that is, given a user-defined function "fitting" objects "into" vertices of the multi_graph, it should provide location functions for single objects or collections of objects of the proper type(s). Scalarity and granularity functions would work much as described for cell_sets, although graph-theoretic measures such as vertex degree can be used to refine scale quantification for cells.

I am particularly interested in using multi_graphs so defined for modeling GUIs. I've read some interesting work by RedWhale Software, in Palo Alto, concerning algorithms for matching data elements to UI controls, which they call "interactors". For example, a numeric data field might be mapped to a spinner, list box, edit field, scroller, radio buttons, etc., based on its range of allowed values. Assume that the links between data components and interactors are represented by a multigraph, so that for one edge type <v>, c <v> c' implies that c is a data field and c' is an interactor type which will visually represent c in a GUI. Suppose moreover that c <s> c' implies that c and c' are both interactors but c' is a higher-scale control, such as a notebook or dialog box; and that c <c> c' implies that c and c' are data components with c contained inside c'.
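By way of illustration, here is a minimal sketch of an edge-typed multigraph along these lines, using the GUI example just given. Again, the names (multi_graph, edge_kind, neighbors) and the adjacency-list layout are placeholders for this illustration, not the proposed interface or memory layout:

  // Rough illustration only; names and storage are placeholders.
  #include <cstddef>
  #include <iostream>
  #include <string>
  #include <vector>

  // The three edge types from the GUI example: data -> interactor (<v>),
  // interactor -> higher-scale interactor (<s>), data contained in data (<c>).
  enum class edge_kind { v, s, c };

  template <typename CELL_Type>
  class multi_graph
  {
  public:
      std::size_t add_vertex(CELL_Type value)
      {
          vertices_.push_back(std::move(value));
          adjacency_.emplace_back();
          return vertices_.size() - 1;
      }

      // Nothing enforces unique parenthood: a vertex may have any number of
      // outgoing edges of any type.
      void add_edge(std::size_t from, std::size_t to, edge_kind kind)
      {
          adjacency_[from].push_back({to, kind});
      }

      // A constrained "walk" step: follow only edges of one type out of a vertex.
      std::vector<std::size_t> neighbors(std::size_t from, edge_kind kind) const
      {
          std::vector<std::size_t> result;
          for (const auto& e : adjacency_[from])
              if (e.kind == kind) result.push_back(e.to);
          return result;
      }

      const CELL_Type& value(std::size_t v) const { return vertices_[v]; }

  private:
      struct edge { std::size_t to; edge_kind kind; };
      std::vector<CELL_Type> vertices_;
      std::vector<std::vector<edge>> adjacency_;
  };

  int main()
  {
      multi_graph<std::string> g;
      auto price   = g.add_vertex("price");           // data component
      auto spinner = g.add_vertex("spinner");         // interactor
      auto dialog  = g.add_vertex("options dialog");  // higher-scale interactor

      g.add_edge(price, spinner, edge_kind::v);   // price is displayed by a spinner
      g.add_edge(spinner, dialog, edge_kind::s);  // the spinner sits inside a dialog

      for (auto v : g.neighbors(price, edge_kind::v))
          std::cout << g.value(price) << " -> " << g.value(v) << '\n';
  }

Longer walks, scale filters, and the "sorting machine" described above would all be built on top of constrained neighbor steps like this one.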
The visual clarity of a GUI can be analyzed by studying how these three edge types interact in the graph, and any patterns which emerge could be used to refine algorithms that automatically generate GUIs from data models. This could be extended by also introducing user-generated events and application tasks, such as saving updated data to disk, so that c <t> c' could mean that c is an event indicating that the user wants to perform task c' (e.g., save to disk).

Finally, to respond to a couple of points raised. For argument's sake, suppose I develop a single cell_graph class which incorporates both the partition and multigraph functionality I've described. (In practice I would probably want a small group of related classes rather than one monolith, but, again, this is for argument's sake.) Certainly this could be implemented in terms of existing STL containers, but the choice of which container to use is affected by how the cell_graph itself is used. As far as I can tell, for optimal memory usage cell_graph should work on values rather than pointers, but the values should never be copied, if possible, so that pointers into the cell_graph remain valid indefinitely. On the other hand, there are occasions when it is useful to use cell_graph more as a vector than as a list. I won't go into the precise memory layout I've been using because I have not worked out all the details yet, but I think the choice of something nonstandard is justified. Conversely, even if (say) cell_graph<CELL_Type, PROJECTION_Type> is implemented in terms of standard STL containers behind the scenes, there is enough extra functionality to implement that it makes sense to develop it as a separate library. There are actually several other type parameters which I have not mentioned (although they take defaults), so using a "bare" vector or set would yield very unwieldy code; better to hide as much as possible behind typedefs, partial specializations, or inheritance:

  template <typename VERTEX_Type, typename CONNECTOR_Type,
            typename NODE_WRAP_Type, typename NODE_SERVICER_Type>
  void TierBlend::Route_Matrix_Compiler<VERTEX_Type, CONNECTOR_Type,
      NODE_WRAP_Type, NODE_SERVICER_Type>::compile(NODE_SERVICER_Type* servicer)
  ...

Stuff like this should be library code, not user code.

On the second point: there is no single meaning to "scale" or "granularity". A scale metric can be induced on a partition or "partition graph" based solely on nesting (e.g. subcells inside parent cells), but in real-world situations there is no reason why "sibling" cells should have the same scale. The point is merely that classes representing these structures should provide for a scale measure associated with cells, graph vertices, and/or subgraphs, and scale should also be a possible constraint on, or characterization of, edge types (e.g. it could be stipulated that c <> c' implies c < c' for edge type <> and parthood relation <). Similarly, "granularity" could be associated with a given filter, query, or operation which specifies a range of scales over which the operation is to be performed.

Thanks again for the comments.

----------------------------------------
From: "Edward Diener" <eldiener@tropicsoft.com>
Sent: Sunday, June 05, 2011 8:18 AM
To: boost@lists.boost.org
Subject: Re: [boost] Gauging interest in new library re: multiscale/multigranular containers

On 6/4/2011 4:55 PM, nathaniel@tierblend.org wrote:
Hi: I would like to submit a library which would allow functionality related to scale and granularity to be performed on STL-like containers. Until recently I was in a doctoral program specializing in the philosophy of science, and I am especially interested in multiscale data modeling and its applications to e.g. GUI design, code generation, decision procedures, and DSL implementation. For example, one of my projects involves a DSL implemented through multiscale graphs in lieu of conventional syntax trees.

I used to study at Buffalo, which houses the National Center for Ontological Research -- headed by the quite broad-ranging scholar Barry Smith -- and has done pioneering work in biomedical ontology and also in ontological models related to complex or "vague" aggregative structures. In these structures, collections of objects of some base type T can be defined as aggregates, but with varying degrees of looseness or "individual coherence"; this aggregation can be iterated to produce multiscaled structures which can be queried or represented at different granularities. There are a number of analytic procedures which can be implemented on such structures, some overlapping with graph-theoretic or statistical methods. These structures also possess a lot of internal structure above and beyond their underlying list of values, so they are suited to memory-optimized C++ code rather than higher-level languages like Java. However, I am not aware of generic libraries which really incorporate these kinds of data structures and their analysis in a systematic way, although I have come across non-template libraries which implement some similar features.

What I propose is a library which could be used in a fashion similar to std::vector or std::set but which supports additional functions along the lines of, e.g., building multigraphs out of T-type nodes; partitioning T-sets, with functions to define or measure e.g. intra-partition relevance; defining (or extending) a scale metric on T-objects, T-sets, T-arrays, or other "T-data structures" and using it for rescaling, scale-filtering, and other scale-related transformations; defining iterators or "walks" on T-collections which take scale and granularity into account; etc. In a number of cases the algorithms for these kinds of operations have been described in theoretical form or even already implemented in languages like Scheme.
You generally need to define "scale" and "granularity" in mathematical terms for computer programmers, else you are just interesting scientists in our particular area.

Also, would it not be possible to create a class which consists of an STL container and some other objects which represent "scale" and "granularity", and add the necessary functionality to that class, rather than attempting to re-engineer some STL container?