
Hello everyone, My name is Alexandru Damian. I am a second year student at the Polytechnic University of Bucharest, Romania. I am very interested in the Heaps & Queues idea, and with both a good programming and mathematical background, I believe that I can offer valuable ideas to this project. Below is a draft proposal for this. I didn't know what you would expect from such a proposal (talked to grafikrobot on freenode and he redirected me here), so the structures that are to be implemented are rather vague. If the proposal is on track, I will detail them. I have also talked so some guys on the irc channel about a more generic approach to trees and tries. What's your opinion on that (given that tree structures are already implemented in /intrusive)? I hope I'm not too annoying :). ***STUDENT PROPOSAL: HEAPS & QUEUES*** NOTE: My Student Proposal follows the template listed on: https://svn.boost.org/trac/boost/wiki/SoCSubmissionTemplate **PERSONAL DETAILS** NAME: Alexandru Damian UNIVERSITY: Polytechnic University of Bucharest MAJOR: Computer Science and Engineering DEGREE PROGRAM: B.Sc. EMAIL: alexdamian1989@gmail.com AVAILABILITY: According to the official timeline, there will be 11-12 weeks at my disposal: March 24 - August 9 (16). However, I plan to contract the "Community Bonding Period" to 2-3 weeks, so that I may have at least one week advance in coding. This will help me better organize my time during the summer, when I will be unavailable for about 2 weeks (in June) because of my exams. **BACKGROUND INFORMATION:** EDUCATIONAL & WORK BACKGROUND: I have been preparing in the fields of Computer Science and Mathematics since high-school when I have competed in numerous national and international competitions, gaining important experience and always finishing among the best. My interest in personal and professional growth has continued at university as well, with even better results at contests and scientific research. My work background and coursework have supplied me with many skills in theoretical computer science, data structures, algorithms and programming languages. In addition, I have broad experience in Mathematics and problem solving techniques. In the year 2009, I have worked at different projects for the Faculty of Automatic Control and Computers, including the research and development of a way to store and illustrate the temperature ratio in the University server room. The same year, I worked as a volunteer student consultant and I offered support in Mathematics for students taking the admission examinations. EXPRESSION OF INTEREST: Boost C++ Libraries has caught my attention in 2008, when I had my first important encounters with the Open Source Community. A library that offers more advanced data structures and algorithms than STL is needed on a daily basis when optimizing complex code. It is only natural for me to choose the "Heaps & Queues" project, as I have valuable experience in the algorithms that power these structures and their real-world applications. My plans for further support and development of the Boost C++ Libraries will extend even after the GSOC deadline, with plans to implement more algorithms supplementary to the work done during the summer and in other areas (such as the BigInt idea, or AI algorithms: neural networks, genetic algorithms, fuzzy systems etc.). PROGRAMMING SKILLS: Strong background in theoretical computer science, algorithms, data stuctures and optimization . Strong background in advanced mathematics. Programming Languages: C (4/5), Java (4/5), C++ (3/5), C# (3/5), Shell scripting (2/5), ASM (2/5). Other programming languages and their proficency levels are illustrated in the Resume. **PROJECT PROPOSAL** ABSTRACT The purporse of this project is to create a library that involves heap and queue creation and manipulation. This will be very useful in implementing more complex algorithms or data structures that take advantage of these structures (a basic example is Dijkstra's shortest path algorithm). The library will include basic implementations of heap structures (i.e. binary heap) and queues, as well as the more complex (and more relevant) structures. PROJECT CONTENTS The current version of the Libraries has little support for heaps and queues. Therefore, the following will be implemented, with a discussion on every structure's difficulty, speed and memory used. Firstly, a well-known table of complexities for the binary min(max)-heap: findMin(findMax) : O(1) | deleteMin(deleteMax) : O(log N) | insert : O(log N) | merge : O(N) * D-ARY HEAP : a more general implementation of the binary heap (2-ary heap). The main advantage a d-ary heap has is that insertion will be significantly faster, requiring a theoretical O(log_d N) operations. However, finding an element's parent is rather difficult if d is not a power of 2, since we can not use bitwise shifting operations. However, delete-min is slower, because of the d-1 comparisons to find the minimum value, using a standard algorithm (probable fix: use a min-heap). * BINOMIAL HEAP, BINOMIAL TREES : a mergeable heap; although the complexity for findMin(findMax) is O(log N), this heap, along with the ones below (also mergeable), offer a better complexity for merge : O(log N). Since binomial heaps use binomial trees, this data structure will also be implemented. * FIBONACCI HEAP : another mergeable heap, with better amortized time complexity than the binomial heap * RELAXED HEAP, RELAXED FIBONACCI HEAP (an idea at: http://www.google.ro/url?sa=t&source=web&ct=res&cd=1&ved=0CAYQFjAA&url=http%3A%2F%2Fwww.pmg.lcs.mit.edu%2F~chandra%2Fpublications%2Fheap.pdf&ei=YOa5S5rEFIylOK3CtKEL&usg=AFQjCNFVgc1o8BRokpgu5iU0f1gXhkQDAw&sig2=qzEJWD56cflm1CnRf3esbA) * SKEW HEAP : a structure that is easier to merge than a binary heap, but is unbalanced. This could be considered optional, as it is a median option between d-ary heaps and mergeable heaps (binomial, fibonacci). * TREAP : although treaps are already implemented, they should be adapted to use the new heap library. * QUEUE : a basic implementation of the queue structure (possibly use the STL container) * DEQUEUE : although an implementation already exists /fusion/container, that has to be documented. A new implementation using the generic queue structure mentioned above. * PRIORITY QUEUE : implemented with the use of a heap. * BUFFER (CIRCULAR QUEUE) : fixed size, circular queue (if queue is full, the oldest data is overwritten). Such a queue could be used to implement a structure similar to RRDs. * (BONUS) SKIPLISTS : randomized search structure. * (BONUS) B-TREES : extended binary trees with excellent complexity when comparisons are not the most time-consuming operations. * (BONUS) INTERVAL TREES : extended complete binary trees supporting fast segment intersection queries. Some of these structures can use a fixed size array (use /boost/array). However, it would be good to eliminate the constraints of fixed size arrays. This will require using STL's <vector> or, should it be decided not to include a binary, a local implementation of it, i.e. ARRAY LIST. TIMELINE As stated above, I would like to start with an advance of one week (May 17). WEEKS -2, -1, 0 (April 26 - May 16) : Community Bonding Period These three weeks will offer me the possibility to accomodate with the code, the coding conventions and the community. During these 3 weeks, I will also discuss with the organization whether binary libraries should be included or proprietary structures should be implemented (variable sized arrays, basic queue, basic priority queue (heap)). Modifications in the timeline (reordering, additions, deletions) can also occur. Finally, I will decide if I clean up the implementations from /pending (i.e. queue, fibonacci_heap, relaxed_heap, etc.) or create new structures, based on the newly implemented structures. Implement some simple structures so as to better get used to the code (probably ARRAY LIST or BINARY HEAP). WEEK 1 (May 17 - May 23) : Implement D-ARY HEAP, BINARY HEAP (2-ARY HEAP). Decide whether to first implement the d-ary heap and then the binary heap as a particular case, or implement the binary heap first, and after that the d-ary heap, associating a binary min(max)-heap to each array of children; solve the excess memory. WEEKS 2, 3 (May 24 - June 6) : Implement BINOMIAL TREES, BINOMIAL HEAPS, FIBONACCI HEAPS (don't adapt, usage for relaxed fibonacci heaps), RELAXED HEAPS and RELAXED FIBONACCI HEAPS. Official start of the coding phase. All of these heaps follow the same basic ideas, therefore progress should be relatively easier after the binomial heaps are implemented. Initial algorithms will probably be those specified (and variations) in Cormen or Knuth, as well as the idea suggested in the above mentioned link. WEEKS 4, 5 (June 7 - June 20) : University Exams WEEKS 6, 7 (June 21 - July 4) : Fix any problems with previous heaps. Implement SKEW HEAPS. Adapt TREAPS. WEEKS 8, 9 (July 5 - July 18/16) : Add mutable behaviour to the heaps. Apply optimizations. Create some automated tests. Week 8 will mark the mid-term evaluation deadline. The objective is to complete HEAP structures. The timeline offers roughly 2 weeks for bug fixing, so that should be enough. WEEK 10 (July 19 - July 25) : Implement QUEUE, DEQUEUE. WEEK 11 (July 26 - August 1) : Implement CIRCULAR QUEUE, PRIORITY QUEUE. WEEK 12 (August 2 - August 8) : Fix bugs related to the QUEUE structures. Add mutable behaviour to the queues. Apply optimizations and custom algorithms for fixed-length data types. Create automated tests. WEEK 13 (August 9 - August 16): Finalize code and documentation. NOTES: 1. The Exams period is, actually, 4 weeks (June 7 - July 5), comprising 5 exams. However, I should be able to compress the total time to only 2 weeks by moving some of my studying in May. 2. Weeks 6 and 8 are reserved for fixing heap bugs. Should there be less problems than expected, I can gain one bonus week. 3. If we choose to implement the structures using STL binaries, I could gain one more bonus week from weeks 1, 10 and 11. 4. With (2) and (3) in mind, I could implement any of the bonuses written above (at the end, if everything else is complete). a) SKIPLISTS : a randomized search structure with very good practical complexity. b) INTERVAL TREES : given the already implemented Interval structure (/numeric/interval), it would be nice to have such a structure, for easy geometric intersection queries. c) B-TREES : adaptation of the Balanced Binary Trees, typically used in situations where the comparison operation is not the most time consuming one (disk seeking). WORK FLOW I plan to organize my time and coding in such a manner that I am able to post changes on a forum / blog every 4 days (except the period with exams). That means roughly 2 times per week, as I will have no problem working in weekends. TODO Cheers, -- Alexandru Damian Polytechnic University of Bucharest Faculty of Automatic Control and Computers