[library process] should we more actively pursue university thesis projects?

Thorsten Ottosen

6 Nov 2004

10:46 a.m.

Dear all,

Following our discussion of the Unicode library, would it not be a good idea to pursue such efforts more aggressively?

I could imagine it would help bring libraries forward much faster. I think it would be reasonable for the Boost community to provide

1. project descriptions
2. help and guidelines throughout the 6-12 months of the project

If we had short papers explaining potential projects, these could be sent to universities, which could then in turn suggest them to their students.

Off the top of my head, I can think of these projects:

1. C++ database library
2. C++ statistics library
3. exact reals class
4. An XML parser and generator library

I could probably be a co-author and contact person for (2).

Any thoughts?

-Thorsten


Valentin Samko

6 Nov 2004

5:47 p.m.


TO> I could imagine it would help bring forward libraries much faster. I think it
TO> would be reasonable that the boost community provided
TO> 1. project descriptions
TO> 2. help and guidelines throughout the 6-12 months of the project
TO> If we had small papers explaining potential projects, these can be sent to
TO> universities which can then in turn suggest them to their students.
TO> Off the top of my head, I can think of these projects
TO> 4. An XML parser and generator library

I wrote an XML parser/DOM constructor while working on my PhD a few years ago (I needed to parse and process 100 MB to 500 MB XML files as fast as possible), so one of the main issues was to avoid extra copies of any strings. It can also do SAX-style parsing, i.e. process incomplete XML documents and notify the caller about every XML tag. This is somewhat faster if one only needs a few tags from a huge XML document.

So far, my parser is used in both university and commercial environments (I distribute it under the Boost licence), so this is not a university project any more.

It can be compiled by Intel C++ 7, g++ 3, VC++ 6, or higher versions. I have not tried any other compilers.

Although it does some XML validation, this is not a proper validating parser, since it was more important to keep it fast and lightweight. It does not support XML namespaces yet, but I plan to add that.

I do not think it is ready for submission to Boost yet (not enough comments, the code is not clean enough, and I am going to add a few features and change a few things next month), and I do not know whether anyone wants such a parser in Boost.

If anyone is interested, I will clean it up, add some features, produce documentation and prepare it for submission/review (hopefully by Christmas).

Valentin Samko
http://val.samko.info
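The two techniques described in this message, avoiding string copies and notifying the caller about each tag as it is seen, can be illustrated with a small sketch. This is not the parser discussed above; `tag_handler`, `scan_start_tags` and `collect_start_tags` are hypothetical names, and the sketch assumes C++17 `std::string_view`, which was not available when this thread was written. It only recognises start-tag names and ignores attributes, comments, CDATA and entities.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical callback type: each element name is passed as a view into the
// caller's buffer, so nothing is copied while scanning.
using tag_handler = std::function<void(std::string_view name)>;

// Reports the name of every start tag in `xml`, in document order.
// Illustration only: end tags, declarations, comments and processing
// instructions are skipped, and attribute values are not interpreted.
inline void scan_start_tags(std::string_view xml, const tag_handler& on_start_tag)
{
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string_view::npos) {
        const std::size_t name_begin = pos + 1;
        if (name_begin >= xml.size()) break;       // input ends right after '<'
        const char first = xml[name_begin];
        if (first == '/' || first == '?' || first == '!') {
            pos = name_begin;                      // not a start tag, keep scanning
            continue;
        }
        const std::size_t name_end = xml.find_first_of(" \t\r\n/>", name_begin);
        if (name_end == std::string_view::npos) break;  // incomplete tag: stop here
        if (name_end > name_begin) {
            on_start_tag(xml.substr(name_begin, name_end - name_begin));
        }
        pos = name_end;
    }
}

// Usage helper: the only copies are made here, where the caller decides to
// keep the names beyond the lifetime of the input buffer.
inline std::vector<std::string> collect_start_tags(std::string_view xml)
{
    std::vector<std::string> names;
    scan_start_tags(xml, [&names](std::string_view name) { names.emplace_back(name); });
    return names;
}
```

Because the scanner stops quietly at an incomplete tag, it can be fed a truncated document and still report the tags seen so far, which is the property the message attributes to SAX-style processing.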

Thorsten Ottosen

7 Nov 2004

11:23 a.m.

"Valentin Samko" <boost@digiways.com> wrote in message news:1902097042.20041106174726@digiways.com...

| TO> 4. An XML parser and generator library

I mentioned this because it is on the wish list for C++0x. Unfortunately I don't know much about this stuff.

| I have written a XML parser/DOM constructor while working on my PhD a
| few years ago (I needed to parse/process 100Mb-500Mb XML files as fast as
| possible), so one of the main issues was to avoid extra copies of any
| strings.

Yes, there are a lot of situations in which string performance is critical. I believe using views into the bigger strings is the way to go.

| It can also do SAX style parsing, i.e. process incomplete
| XML documents and notify the caller about every XML tag. This is
| somewhat faster if one only needs a few tags from a huge XML document.
|
| So far, my parser is used in both, university and
| commercial environments (I distribute it under boost licence),
| so this is not a university project any more.
|
| It can be compiled by Intel C++ 7, g++ 3, VC++ 6, or higher versions.
| I have not tried any other compilers.
|
| Although it does some XML validation, this is not a proper validating
| parser, since it was more important to keep it fast and lightweight.
| It does not support XML namespaces yet, but I have plans to add that.
|
| I do not think it is ready for submission to boost yet(not enough
| comments, the code is not clean enough, and I am going to add a few
| features/change a few things next month), and I do not know whether
| anyone wants such a parser in boost.
|
| If anyone is interested, I will clean it up, add some features,
| produce documentation and prepare it for submission/review (hopefully by Christmas).

It all sounds very nice. Here is what I think you should do:

1. Wait until the new release is out; then the list will be back to normal.
2. In the meantime, find out about other solutions to this problem and compare their functionality, speed and design; find out whether other people are working on something similar and see if they want to cooperate with you (if you need more manpower).
3. Make a new post explaining your work and provide links to the code; make sure you explain why this can become the ultimate XML library.
4. Ask for people's opinions and what features they want.
5. Then prepare it for a review submission.

If you already have lots of usage experience with your code, I think that is a great benefit; the fact that it has been used commercially is also not bad :-)

best regards

Thorsten

Doug Gregor

8 Nov 2004

6:33 p.m.


On Nov 6, 2004, at 5:46 AM, Thorsten Ottosen wrote:

...

Dear all,

Following our discussion of the unicode library, would it not be a good idea to pursue such efforts more aggressively?

I could imagine it would help bring forward libraries much faster. I think it would be reasonable that the boost community provided

1. project descriptions

This is a good idea regardless of whether we are going to ask universities to write some Boost libraries. We often discuss potential libraries on the mailing list that never come into existence. Someone who picks one of them up later on would greatly benefit from a recount of what was discussed, in the form of a project description.

...

2. help and guidelines throughout the 6-12 months of the project

Again, this is useful for anyone bringing their first library up for review. Granted, it's probably more important in the academic setting (an outside contact person).

...

If we had small papers explaining potential projects, these can be sent to universities which can then in turn suggest them to their students. [snip] Any thoughts?

Well, I have a few comments. The Graph library has benefited greatly from student projects from Generic Programming classes at various universities (the isomorphism, Floyd-Warshall, and A* search algorithms are examples of this), so it can work. Additionally, several good libraries have come from universities. So the work of students at universities can be very good, of course.

On the other hand, the motivations of universities, and especially of students writing thesis projects, are very different from the motivations of the average Boost developer. The emphasis is on minimizing development time and writing papers about the result (new algorithms, new data structures, etc.), not on creating and maintaining high-quality software.

So, here is my intended point: unless there is a shift in perception so that creating and maintaining a Boost library (or software in general) provides the same academic benefit as a conference paper or journal article would, the motivations of universities won't line up with the motivations of the Boost community, so I don't see much benefit in soliciting libraries. Smaller bits of functionality (graph or string algorithms, for instance) might be better suited for class projects, although they would be too small for thesis projects. Interested individuals, whether in academia or industry, will still be able to find Boost regardless.

Doug

Andreas Pokorny

12 Nov 2004

6:46 p.m.


Hi,

Nice! I recently worked on an XML parser and generator library. I had to work on several different XML formats, and writing SAX code for all these formats looked like a stupid, repetitive process. Using a DOM parser did not help either; there was still lots of code which just forwarded the parsed contents of strings to some method or data structure.

So I started working on a way to describe an XML format in C++ code and generate the SAX binding code for that format. For that I also had to figure out how arbitrary objects of a certain type can be filled with the data in a string of the XML element. So I defined a 'value property' type for values and a container property for sequence containers (others might follow). These properties carry the type information and the access path for reading and writing that property. Usually the properties are grouped together in a so-called property map, which maps certain key types to the property type (and object).

With this meta-information tool it was possible to build a format description that is independent of the class interface and that generates a SAX parser, using the expat library, to forward XML items directly to the structures.

I have some example code, which is a reduced version of a real format:

    // two key types used for the property_map:
    struct Data{};
    struct Name{};

    // Node is
    struct Node
    {
    private:
      std::vector<Node*> nodes;
      std::string name, data;
    public:
      typedef property_map<
        mpl::vector<
          con<Node>,               // adds the type con<Node,Node> so Node is
                                   // key and data, so Node can be used to access a
                                   // container property, that reflects a sequence
                                   // container of Nodes
          elem<Name, std::string>, // a std::string value property
          elem<Data, std::string>  // like above but identified using Data
        >,
        Node
      > type_i;
      static type_i const& get_info();
    };

I stripped the code in get_info, which initializes the property_map structure, basically because the init code is pretty dense and needs a lot of improvement.
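The step of filling an object from the strings found in an XML element can be illustrated with a much simpler mechanism than the property map above. The following is a minimal sketch under stated assumptions, not the code from the library described in this message: `attribute_binder`, `from_text` and this simplified `Node` (with public members and an `int` data field) are hypothetical, and the sketch uses C++11 features.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Converts attribute text to a value using stream extraction.
template <typename T>
bool from_text(const std::string& text, T& out)
{
    std::istringstream in(text);
    return static_cast<bool>(in >> out);
}

// Strings are taken verbatim, so embedded spaces are preserved.
inline bool from_text(const std::string& text, std::string& out)
{
    out = text;
    return true;
}

// Hypothetical helper: maps attribute names to typed data members of Object.
template <typename Object>
class attribute_binder {
public:
    // Registers `member` as the target of the attribute called `attr`.
    template <typename T>
    attribute_binder& bind(const std::string& attr, T Object::*member)
    {
        setters_[attr] = [member](Object& obj, const std::string& text) {
            T value{};
            if (!from_text(text, value)) return false;  // leave obj untouched
            obj.*member = value;
            return true;
        };
        return *this;
    }

    // Returns false for unknown attributes or text that cannot be converted.
    bool assign(Object& obj, const std::string& attr, const std::string& text) const
    {
        const auto it = setters_.find(attr);
        return it != setters_.end() && it->second(obj, text);
    }

private:
    std::map<std::string, std::function<bool(Object&, const std::string&)>> setters_;
};

// Simplified stand-in for the Node type in the message (assumption: public
// members and an integer data field, chosen to show a type conversion).
struct Node {
    std::string name;
    int data = 0;
};
```

A SAX start-element handler could then call `assign` once per attribute. The type information lives in the pointer-to-member passed to `bind`, which is loosely the role the message assigns to a value property.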
There is another structure called RootNode which describes a similar structure but without Data. An example file for that format could look like this:

    <?xml version="1.0"?>
    <root_node name="example_tree">
      <node name="empty" data="0" />
      <node name="base_item1" data="124">
        <node name="triple_obj" data="22">
          <node name="hs1" data="9"/>
          <node name="hs2" data="13"/>
          <node name="hs3" data="10"/>
        </node>
        <node name="single" data="-120"/>
      </node>
    </root_node>

With my library the format can be described like this:

    boost::shared_ptr<Receiver> basic_node;
    // Receiver is a base class for all classes which get called by the expat sax code

    // We now define the 'node' tag:
    basic_node = xml::gen_object_node(
        // we have to set the property map, and the tag name
        xml::sub_tag<Node>( Node::get_info(), "node")
          // now we add all attributes
          .attributes(
            xml::attribute.assign<Name>("name")
            | xml::attribute.assign<Data>("data")
          )
          // and a sub tag which points on basic_node
          .sub_tags(
            xml::link_tag<Node>( basic_node, "node" )
          ),
        Node::get_info() // the property map a second time.. :(
      );

    // now the root tag:
    boost::shared_ptr<Receiver> root_node = xml::gen_root_node(
        xml::root_tag( RootNode::get_info(), "root_node")
          .attributes( xml::attribute.assign<Name>("name") )
          .sub_tags(
            xml::link_tag<Node>( // here we link to basic_node
              basic_node, "node" )
          )
      );

    Parser p;
    RootNode obj;
    try {
      // parsing:
      p.parse( root_node, filename, &obj );
      // printing:
      root_node->print( &obj, file_stream );
    } catch ( std::exception &e ) {
      // ...
    }

The XML library was written to handle lots of different formats and to easily handle any changes of the format during the development of the system. It was not intended to become the ultimate XML library and lots of features are missing, but I think it could be a good part of a bigger, more versatile XML library, or be put on top of the raw SAX interface of that XML library.
I have to admit that my personal interests have moved: I am much more interested in the property part, the defining of meta-information. I plan to write my master's (Diplom) thesis about that topic, that is, about defining type information in C++ structures and types and then showing how to use this information to simplify or automate library interfaces. I planned to use the XML library described above and a simple database library as a proof of concept, maybe also a small GUI library based on something like antigrain.

The properties still have to be improved; their usage is still too complicated, and some features are missing. The code is available at

http://svn.berlios.de/viewcvs/kant/trunk/source/src/util/

and

http://svn.berlios.de/viewcvs/kant/trunk/source/src/serialize/

I think about changing the code daily, but I have to finish other work at the university before I can focus on that code again.

Currently the value property consists of a get and a set part, which allow const and non-const access to a value:

    template <typename T, typename Compound = mpl::void_>
    struct value_property
    {
      boost::shared_ptr< setter<T,Compound> > set;
      boost::shared_ptr< getter<T,Compound> > get;
    };

getter and setter are base classes for lots of different kinds of access. The getter, for example, has 6 different implementations that handle access by direct memory access, a method pointer that returns a const reference, a method pointer that returns a value, a method that expects a reference parameter which gets the value assigned ....

I am now thinking about adding a feature to hook functionality into the get or set part of the property, e.g. to lock a mutex, or to check the data passed to the property (for example to ensure a certain string format) and throw on error, or to send a signal every time the value changes ...

Apart from that, the property design needs a bigger change, because the current design of the value_property completely fails when used by multiple threads.
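To make the getter/setter split and the proposed hook more concrete, here is a minimal sketch, assuming C++11 and `std::shared_ptr` in place of `boost::shared_ptr`. Only `value_property`, `getter` and `setter` are names from the message; `member_setter`, `method_getter`, `checked_setter`, `not_empty`, `make_name_property` and this simplified `Node` are hypothetical, and the `mpl::void_` default argument is left out. It shows two of the access kinds mentioned (a data member and a method returning a const reference) and a validating hook on the set path.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Abstract read access to a value of type T stored in a Compound.
template <typename T, typename Compound>
struct getter {
    virtual ~getter() = default;
    virtual T get(const Compound& c) const = 0;
};

// Abstract write access to a value of type T stored in a Compound.
template <typename T, typename Compound>
struct setter {
    virtual ~setter() = default;
    virtual void set(Compound& c, const T& value) const = 0;
};

// Write access through a pointer to data member.
template <typename T, typename Compound>
struct member_setter : setter<T, Compound> {
    T Compound::*member;
    explicit member_setter(T Compound::*m) : member(m) {}
    void set(Compound& c, const T& value) const override { c.*member = value; }
};

// Read access through a const method that returns a const reference.
template <typename T, typename Compound>
struct method_getter : getter<T, Compound> {
    const T& (Compound::*method)() const;
    explicit method_getter(const T& (Compound::*m)() const) : method(m) {}
    T get(const Compound& c) const override { return (c.*method)(); }
};

// Hook on the set path: validates the value, then forwards to another setter.
template <typename T, typename Compound>
struct checked_setter : setter<T, Compound> {
    std::shared_ptr<setter<T, Compound>> inner;
    bool (*is_valid)(const T&);
    checked_setter(std::shared_ptr<setter<T, Compound>> s, bool (*check)(const T&))
        : inner(std::move(s)), is_valid(check) {}
    void set(Compound& c, const T& value) const override
    {
        if (!is_valid(value)) throw std::invalid_argument("value rejected by property hook");
        inner->set(c, value);
    }
};

template <typename T, typename Compound>
struct value_property {
    std::shared_ptr<setter<T, Compound>> set;
    std::shared_ptr<getter<T, Compound>> get;
};

// Simplified stand-in type (assumption: public member for brevity).
struct Node {
    std::string name;
    const std::string& get_name() const { return name; }
};

inline bool not_empty(const std::string& s) { return !s.empty(); }

// Builds a property that reads via get_name() and rejects empty names on write.
inline value_property<std::string, Node> make_name_property()
{
    value_property<std::string, Node> p;
    p.get = std::make_shared<method_getter<std::string, Node>>(&Node::get_name);
    p.set = std::make_shared<checked_setter<std::string, Node>>(
        std::make_shared<member_setter<std::string, Node>>(&Node::name), &not_empty);
    return p;
}
```

Writing the hook as a wrapper around another setter keeps the access strategies unchanged; a mutex-locking or signal-sending hook could be layered the same way. The sketch does nothing about the thread-safety problem mentioned in the message.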
-- I wish I had more time these days --

So I would like to work on a 'property' library or meta-type library, but this functionality could overlap with a possible GUI library, the boost::db ideas which were presented here, and maybe also the boost::python/langbinding libraries. After that I would like to focus on using that library in a database and/or GUI library environment.

Regards

Andreas Pokorny

On Sat, Nov 06, 2004 at 11:46:35AM +0100, Thorsten Ottosen <nesotto@cs.auc.dk> wrote:

...

Dear all,

Following our discussion of the unicode library, would it not be a good idea to pursue such efforts more aggressively?

I could imagine it would help bring forward libraries much faster. I think it would be reasonable that the boost community provided

1. project descriptions
2. help and guidelines throughout the 6-12 months of the project

If we had small papers explaining potential projects, these can be sent to universities which can then in turn suggest them to their students.

Off the top of my head, I can think of these projects

1. C++ database library
2. C++ statistics library
3. exact reals class
4. An XML parser and generator library

I could probably be a co-author and contact person of (2).

Any thoughts?

-Thorsten
