
Hello all, for my thesis, I'll be developing a self-contained framework for algorithms used in bioinformatics. This will include algorithms such as Hamming distance, Levenshtein distance or Longest common subsequence algorithms, gene prediction algorithms, 2D and 3D scoring matrices, alignment problems, etc. Would there be, in the near future, any interest in such a library? Thanks, R. Goldwein

Robert Goldwein wrote:
Hello all,
for my thesis, I'll be developing a self-contained framework for algorithms used in bioinformatics. This will include algorithms such as Hamming distance, Levenshtein distance or Longest common subsequence algorithms, gene prediction algorithms, 2D and 3D scoring matrices, alignment problems, etc.
Would there be, in the near future, any interest in such a library?
Thanks, R. Goldwein _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Dear Robert, I'm working in bioinformatics, so I would be very interested in your library. A good and free bioinformatics library should be very useful. It sounds like a great initiative. However, I also think it may be a bit too domain-specific to belong to Boost. --Johan

On 12/5/06, Johan Råde <rade@maths.lth.se> wrote:
I'm working in bioinformatics, so I would be very interested in your library. A good and free bioinformatics library should be very useful. It sounds like a great initiative.
However, I also think it may be a bit too domain-specific to belong to Boost.
Actually even if I do not work in bioinformatics, I would use a good LCS implementation. Also some of the distances could be useful. -- Giovanni P. Deretta
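For readers unfamiliar with the LCS problem mentioned here, the following is a minimal sketch of the textbook dynamic-programming solution for the length of a longest common subsequence. The function name `lcs_length` is hypothetical and is not taken from Boost or from Robert's planned library; a library version would more likely accept iterator ranges rather than `std::string`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of a and b.
// Textbook dynamic programme: O(|a| * |b|) time, O(|b|) memory,
// because only the previous and the current row of the table are kept.
std::size_t lcs_length(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1, 0);
    std::vector<std::size_t> curr(b.size() + 1, 0);

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1])
                curr[j] = prev[j - 1] + 1;                  // extend a common subsequence
            else
                curr[j] = std::max(prev[j], curr[j - 1]);   // skip a character of a or of b
        }
        prev.swap(curr);   // the row just computed becomes the previous row
    }
    return prev[b.size()];
}
```

For example, `lcs_length("AGGTAB", "GXTXAYB")` yields 4, corresponding to the common subsequence "GTAB".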

Giovanni Piero Deretta wrote:
Actually even if I do not work in bioinformatics, I would use a good LCS implementation. Also some of the distances could be useful.
Perhaps some of the algorithms, like Levenshtein distance, could be extracted and integrated into the StringAlgo library? Sebastian Redl
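To illustrate what a general-purpose Levenshtein distance of the kind suggested here could look like, below is a small sketch written as a template over any sequence type that offers `size()` and `operator[]`. The name `levenshtein_distance` and its signature are assumptions made for illustration only; nothing in this thread says that StringAlgo provides such a function.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Unit-cost edit distance (insert, delete, substitute) between a and b.
// Works for std::string, std::vector<T> and similar random-access sequences.
// Uses a single row of the dynamic-programming table: O(|b|) memory.
template <typename Sequence>
std::size_t levenshtein_distance(const Sequence& a, const Sequence& b)
{
    std::vector<std::size_t> row(b.size() + 1);
    // Distance from the empty prefix of a to each prefix of b: 0, 1, 2, ...
    std::iota(row.begin(), row.end(), std::size_t(0));

    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];   // value of cell (i-1, 0)
        row[0] = i;                  // deleting i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t above = row[j];   // value of cell (i-1, j)
            const std::size_t subst = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
            row[j] = std::min({above + 1, row[j - 1] + 1, subst});
            diag = above;
        }
    }
    return row[b.size()];
}
```

Because the template deduces one `Sequence` type for both arguments, callers pass two objects of the same type, for example `levenshtein_distance(std::string("kitten"), std::string("sitting"))`, which gives 3.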

Hi Sebastian & Giovanni, yes, separating the more general algorithms from the more bioinformatics-specific ones is one of the ways I could proceed; it should be clearer in about a month. But you're right, algorithms as general as Levenshtein distance or even LCS would be useful in String Algorithms. I'll think about it. Thanks for your response, Robert
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Sebastian Redl
Sent: Tuesday, December 05, 2006 16:40
To: boost@lists.boost.org
Subject: Re: [boost] Bioinformatics algorithms in boost?
Perhaps some of the algorithms, like Levenshtein distance, could be extracted and integrated into the StringAlgo library? Sebastian Redl

Hi, Robert Goldwein wrote:
Hi Sebastian & Giovanni,
yes, separating the more general algorithms from the more bioinformatics-specific ones is one of the ways I could proceed; it should be clearer in about a month. But you're right, algorithms as general as Levenshtein distance or even LCS would be useful in String Algorithms. I'll think about it.
If you are willing to contribute some algorithms to the StringAlgo library, just drop me a private mail and we can work out the details. The algorithms you mentioned are useful and generally accepted. Correct me if I'm wrong, but AFAIK there is no need for a review to add something like this. Regards, Pavol

Dear Johan, thanks for your reply. You surely know about the bioinformatics frameworks for Perl or Python, and I miss something good in C++ (I found several bioinformatics projects, and FASTA and BLAST implementations, but my idea is a general multipurpose library). You are right, problems like gene prediction or algorithms like BLAST are quite specific, but they all use general string algorithms. Maybe it would be wiser to divide it into an advanced string library and a specialized bioinformatics library, but on the other hand, those string algorithms can be optimized for our specific conditions (triplets, limited set of characters, etc.)... In about a month I should have a sketch, so I'll let you know. Thanks for the encouraging words, Robert
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Johan Råde
Sent: Tuesday, December 05, 2006 15:28
To: boost@lists.boost.org
Subject: Re: [boost] Bioinformatics algorithms in boost?
Dear Robert, I'm working in bioinformatics, so I would be very interested in your library. A good and free bioinformatics library should be very useful. It sounds like a great initiative. However, I also think it may be a bit too domain-specific to belong to Boost. --Johan
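Robert's remark above about optimizing string algorithms for a limited set of characters can be made concrete with a small sketch. Assuming sequences over the four-letter alphabet A, C, G, T, each base fits in 2 bits, so 32 bases can be compared with one XOR on a 64-bit word. The names `encode_base`, `pack_dna` and `dna_hamming` are hypothetical, and this is only one possible encoding, not something proposed in the thread.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Map a nucleotide to a 2-bit code; anything outside ACGT is rejected.
inline std::uint64_t encode_base(char c)
{
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  throw std::invalid_argument("expected one of A, C, G, T");
    }
}

// Pack a DNA string into 64-bit words, 32 bases per word (unused bits stay 0).
std::vector<std::uint64_t> pack_dna(const std::string& s)
{
    std::vector<std::uint64_t> words((s.size() + 31) / 32, 0);
    for (std::size_t i = 0; i < s.size(); ++i)
        words[i / 32] |= encode_base(s[i]) << (2 * (i % 32));
    return words;
}

// Hamming distance between two DNA strings of equal length.
std::size_t dna_hamming(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        throw std::invalid_argument("sequences must have equal length");

    // In real use the sequences would be packed once and reused.
    const std::vector<std::uint64_t> pa = pack_dna(a);
    const std::vector<std::uint64_t> pb = pack_dna(b);

    const std::uint64_t low_bits = 0x5555555555555555ULL;  // low bit of every 2-bit pair
    std::size_t distance = 0;
    for (std::size_t w = 0; w < pa.size(); ++w) {
        const std::uint64_t x = pa[w] ^ pb[w];             // non-zero pair = mismatch
        const std::uint64_t mismatches = (x | (x >> 1)) & low_bits;
        distance += std::bitset<64>(mismatches).count();   // one set bit per mismatching base
    }
    return distance;
}
```

For example, `dna_hamming("GATTACA", "GACTATA")` is 2. The same distance computed character by character would need one comparison per base; whether the packed form pays off in practice depends on how often each sequence is reused.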

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Goldwein Sent: 05 December 2006 12:34 To: boost@lists.boost.org Subject: [boost] Bioinformatics algorithms in boost?
for my thesis, I'll be developing a self-contained framework for algorithms used in bioinformatics. This will include algorithms such as Hamming distance, Levenshtein distance or Longest common subsequence algorithms, gene prediction algorithms, 2D and 3D scoring matrices, alignment problems, etc.
Would there be, in the near future, any interest in such a library?
I think this is certainly potential material for Boost. IMO this IS the sort of area in which Boost should be offering algorithms, not just the more 'programmy' things. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Robert Goldwein wrote:
Hello all,
for my thesis, I'll be developing a self-contained framework for algorithms used in bioinformatics. This will include algorithms such as Hamming distance, Levenshtein distance or Longest common subsequence algorithms, gene prediction algorithms, 2D and 3D scoring matrices, alignment problems, etc.
Would there be, in the near future, any interest in such a library?
Yes, as others have also indicated. The real question is why you think of these as "bioinformatics algorithms" rather than just plain "algorithms"? Have they been restricted in some way that prevents them from being used for general purposes? I've used Levenshtein distance variants a great deal in geographic name processing applications. For real-world applications, there has to be a way to recognize additional distances (i.e. costs) in various cases. I expect the same refinements apply to many problem domains. Wouldn't the same apply to the bioinformatics domain? --Beman
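Beman's point about recognizing additional distances (costs) can be sketched as an edit distance parameterized by a cost policy. Everything named here (`unit_costs`, `vowel_tolerant_costs`, `weighted_edit_distance`) is hypothetical, and the vowel rule is an invented example of a domain-specific cost, not a description of the geographic name processing Beman mentions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein costs: every edit costs 1.
struct unit_costs {
    double insert(char) const { return 1.0; }
    double erase(char) const { return 1.0; }
    double substitute(char a, char b) const { return a == b ? 0.0 : 1.0; }
};

// Invented example policy: replacing one vowel by another costs only 0.5.
struct vowel_tolerant_costs {
    static bool is_vowel(char c)
    {
        return std::string("aeiou").find(c) != std::string::npos;
    }
    double insert(char) const { return 1.0; }
    double erase(char) const { return 1.0; }
    double substitute(char a, char b) const
    {
        if (a == b) return 0.0;
        return (is_vowel(a) && is_vowel(b)) ? 0.5 : 1.0;
    }
};

// Edit distance where the cost of each operation is supplied by the caller.
template <typename Costs>
double weighted_edit_distance(const std::string& a, const std::string& b,
                              const Costs& costs)
{
    std::vector<double> prev(b.size() + 1, 0.0);
    std::vector<double> curr(b.size() + 1, 0.0);

    // First row: build each prefix of b from the empty string by insertions.
    for (std::size_t j = 1; j <= b.size(); ++j)
        prev[j] = prev[j - 1] + costs.insert(b[j - 1]);

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = prev[0] + costs.erase(a[i - 1]);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            curr[j] = std::min({
                prev[j] + costs.erase(a[i - 1]),                    // delete from a
                curr[j - 1] + costs.insert(b[j - 1]),               // insert into a
                prev[j - 1] + costs.substitute(a[i - 1], b[j - 1])  // substitute or match
            });
        }
        prev.swap(curr);
    }
    return prev[b.size()];
}
```

With `unit_costs` the function reduces to ordinary Levenshtein distance; with `vowel_tolerant_costs` the pair "kitten" / "sitting" scores 2.5 instead of 3, because the e-to-i substitution is discounted. A bioinformatics scoring matrix could plug in the same way, as another cost policy.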

Well, I'm sorry for the confusion; you're naturally right. The problem is that any bioinformatics textbook (e.g., basic, but one of the very best, is http://www.bioalgorithms.info/) introduces these algorithms, so as time goes by, my perspective becomes somewhat limited ;-) The best way would really be to extend the appropriate libraries (string & math algorithms), and create the bioinf library with really bioinf-specific content. Thank you all for your responses in this matter, this is the help I was hoping for. Robert
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Beman Dawes
Sent: Tuesday, December 05, 2006 19:52
To: boost@lists.boost.org
Subject: Re: [boost] Bioinformatics algorithms in boost?
Yes, as others have also indicated. The real question is why you think of these as "bioinformatics algorithms" rather than just plain "algorithms"? Have they been restricted in some way that prevents them from being used for general purposes? I've used Levenshtein distance variants a great deal in geographic name processing applications. For real-world applications, there has to be a way to recognize additional distances (i.e. costs) in various cases. I expect the same refinements apply to many problem domains. Wouldn't the same apply to the bioinformatics domain? --Beman

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Goldwein Sent: 07 December 2006 03:03 To: boost@lists.boost.org Subject: Re: [boost] Bioinformatics algorithms in boost?
Well, I'm sorry for the confusion; you're naturally right. The problem is that any bioinformatics textbook (e.g., basic, but one of the very best, is http://www.bioalgorithms.info/) introduces these algorithms, so as time goes by, my perspective becomes somewhat limited ;-) The best way would really be to extend the appropriate libraries (string & math algorithms), and create the bioinf library with really bioinf-specific content.
And of course these same or similar algorithms are called chemometrics by chemists, statistics by statisticians and mathematics by mathematicians... So it is useful to have examples from several spheres (and to give multiple names for things where, as all too often, each group calls the same thing something different). Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

On Tue, Dec 05, 2006 at 01:33:56PM +0100, Robert Goldwein <robert.goldwein@dtptools.com> wrote:
Hello all,
for my thesis, I'll be developing a self-contained framework for algorithms used in bioinformatics. This will include algorithms such as Hamming distance, Levenshtein distance or Longest common subsequence algorithms, gene prediction algorithms, 2D and 3D scoring matrices, alignment problems, etc.
I once worked for the bioinformatics department in Würzburg. I boostified (Spirit-based parsers)
profdist http://www.biozentrum.uni-wuerzburg.de/profdist.html
and cbcanalyzer http://www.biozentrum.uni-wuerzburg.de/cbcanalyzer.html
Before I left I was working on more stable sequence distance correction algorithms, and on more generic data structures to represent plain sequence information and sequence files with structure information. In general I think you should also look at algorithms used in statistical methods. Regards, Andreas Pokorny
participants (8)
- Andreas Pokorny
- Beman Dawes
- Giovanni Piero Deretta
- Johan Råde
- Paul A Bristow
- Pavol Droba
- Robert Goldwein
- Sebastian Redl