
Hello! I am a student at the Moscow Institute of Physics and Technology, and I would like to participate in Google Summer of Code. Boost C++ Libraries development looks really interesting to me, so I decided to ask some questions about the details. The list of ideas includes a "Checks & Hashes" project. I have some experience in this area: I wrote a small library for a university project, but it was mostly a simple wrapper around the RFC implementations of hashing algorithms such as MD5 and SHA-1. As I understand it, this library is meant to be a collection of different algorithms, and it is meant to be scalable, right? What are the requirements for me to be able to work on this project? Do I need to write some code to start with?
--
Best wishes, Alexandr mailto:daywalker@mail333.com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Александр Sent: Friday, March 25, 2011 7:10 AM To: boost@lists.boost.org Subject: [boost] gsoc project
Hello! I am a student at the Moscow Institute of Physics and Technology, and I would like to participate in Google Summer of Code. Boost C++ Libraries development looks really interesting to me, so I decided to ask some questions about the details. The list of ideas includes a "Checks & Hashes" project. I have some experience in this area: I wrote a small library for a university project, but it was mostly a simple wrapper around the RFC implementations of hashing algorithms such as MD5 and SHA-1. As I understand it, this library is meant to be a collection of different algorithms, and it is meant to be scalable, right?
I didn't envisage much 'scalability' because many algorithms are individual, but several 'hash' algorithms are used for more than one actual 'check' application. So the user sees just the check function, like ISBN("0457695474x") but the 'hash' algorithm is 'under the hood'. But it's your project, so you can define it as you think best.
What are the requirements for me to be able to work on this project?
You need to be able to use some current C++ compiler and have access to Boost files via SVN.
Do I need to write some code to start with?
There are already five people looking at this project, but you might like to look at https://svn.boost.org/svn/boost/sandbox/SOC/2011/checks/
You might like to see if you can rebuild the existing skeleton code (Boost uses a rather bizarre build language called bjam, for portability).
You will need to download the Boost 1.46 library and use TortoiseSVN to get boost-sandbox/ (you only need the top-level folder structure; be very careful to clear the box that says "include subfolder" or you will get the whole of the sandbox, which is big!).
Then download (update) the /soc/2011/checks folder, *including all the sub-folders this time*.
If you can, you could then try coding one of the simple suggested checksums using some algorithm from the literature (a checksum of byte or int arrays modulo 256, perhaps?).
You will find that the ISBN and ISSN examples are already done.
You should also be able to add some skeleton documentation to what is already there using Doxygen.
You won't have write access to the sandbox, but you could work on a local copy and email me a zip of your folder.
Good luck and have fun!
Paul
--- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
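For readers following the thread, the "checksum of byte arrays modulo 256" suggested above might look roughly like the sketch below. It is only an illustration and not code from the sandbox: the name `checksum_mod256` and its signature are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the sandbox code): adds up all bytes
// and keeps the running total reduced modulo 256.
// Marked inline on the assumption that it would live in a header file.
inline unsigned char checksum_mod256(const unsigned char* data, std::size_t size)
{
    assert(data != 0 || size == 0);  // a null pointer is only allowed for an empty array

    unsigned int sum = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        sum = (sum + data[i]) % 256u;
    }
    return static_cast<unsigned char>(sum);
}
```

For the three bytes 0x01, 0x02 and 0xFF the total is 258, so the function returns 2. An empty array gives 0.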

On 25/03/2011 11:07, Paul A. Bristow wrote:
There are already five people looking at this project, but you might like to look at
Why are you using std::string rather than an arbitrary range? Ranges would, in particular, allow the algorithm to be applied to data contained in a file. Also, that code obviously lacks the "inline" specifier, which is necessary when putting non-template free functions in header files...
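To make the range suggestion concrete, here is a minimal sketch of a checksum written against an iterator pair instead of std::string. Both function names (`range_checksum_mod256`, `file_checksum_mod256`) are hypothetical, and the modulo-256 sum merely stands in for whatever algorithm the library would really use.

```cpp
#include <fstream>
#include <iterator>

// Hypothetical name. A function template may be defined in a header
// without the inline specifier.
template <typename InputIt>
unsigned char range_checksum_mod256(InputIt first, InputIt last)
{
    unsigned int sum = 0;
    for (; first != last; ++first)
    {
        // Go through unsigned char so that negative char values do not skew the sum.
        sum = (sum + static_cast<unsigned char>(*first)) % 256u;
    }
    return static_cast<unsigned char>(sum);
}

// Hypothetical convenience wrapper: a non-template free function in a
// header, so it needs inline. Returns false if the file cannot be opened.
inline bool file_checksum_mod256(const char* path, unsigned char& result)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        return false;
    }
    result = range_checksum_mod256(std::istreambuf_iterator<char>(in),
                                   std::istreambuf_iterator<char>());
    return true;
}
```

The same template accepts `s.begin(), s.end()` for a std::string, plain pointers into an array, or stream iterators over a file, which is the flexibility the question above is asking for.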

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Mathias Gaunard Sent: Friday, March 25, 2011 10:58 AM To: boost@lists.boost.org Subject: Re: [boost] gsoc project
On 25/03/2011 11:07, Paul A. Bristow wrote:
There are already five people looking at this project, but you might like to look at
Why are you using std::string rather than an arbitrary range?
Just KISS for a naïve example.
Ranges would, in particular, allow the algorithm to be applied to data contained in a file.
Also, that code obviously lacks the "inline" specifier, which is necessary when putting non-template free functions in header files...
These are details that the GSoC implementer will need to grapple with, including the even more difficult task of reconciling Boosters' views! I'm sure you will have some views ;-)
Paul
--- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

On 25 March 2011 10:07, Paul A. Bristow <pbristow@hetp.u-net.com> wrote:
There are already five people looking at this project, but you might like to look at
Have you seen Scott McMurray's hash library? http://svn.boost.org/svn/boost/sandbox/hash/

On Fri, Mar 25, 2011 at 04:31, Daniel James <dnljms@gmail.com> wrote:
On 25 March 2011 10:07, Paul A. Bristow <pbristow@hetp.u-net.com> wrote:
There are already five people looking at this project, but you might like to look at
Have you seen Scott McMurray's hash library?
I saw the checks and hashes as somewhat different projects. It would be silly, for example, to calculate the check digit for a VISA card using the same interface as a SHA-512 hash.
I'd encourage a potential GSOC participant to focus just on check digits. As the wiki mentions, there aren't any libraries out there for it, and it'd be a very useful thing for all sorts of scenarios. All those things like "member numbers" or "catalog item numbers" just cry out for them to make the UX less terrible.
And pragmatically, a check digits library probably won't have to prove competitive performance with Crypto++, Botan, OpenSSL, etc. in order to be accepted. It'd probably be mostly an interface-driven review.
That said, if anyone does want to work on hashes, I'd be happy to help. (At least as much as I can under the Work for Hire that's prevented me from working on it lately.)
~ Scott
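As an illustration of how small a check-digit interface can be compared with a hash interface, here is a sketch of the Luhn check, the algorithm used for payment card numbers such as VISA. The function name `luhn_is_valid` and the choice to skip spaces and hyphens are assumptions made for this example, not anything proposed in the thread.

```cpp
#include <string>

// Hypothetical free function: returns true if `number` passes the Luhn check.
// Spaces and hyphens are skipped; any other non-digit makes the input invalid.
inline bool luhn_is_valid(const std::string& number)
{
    int sum = 0;
    int digits = 0;
    bool double_it = false;  // the rightmost digit is not doubled

    // Walk from the last character to the first.
    for (std::string::const_reverse_iterator it = number.rbegin();
         it != number.rend(); ++it)
    {
        const char c = *it;
        if (c == ' ' || c == '-')
        {
            continue;
        }
        if (c < '0' || c > '9')
        {
            return false;
        }
        int d = c - '0';
        if (double_it)
        {
            d *= 2;
            if (d > 9)
            {
                d -= 9;  // same as adding the two digits of the product
            }
        }
        sum += d;
        double_it = !double_it;
        ++digits;
    }
    // Require at least two digits: a payload and a check digit.
    return digits > 1 && sum % 10 == 0;
}
```

The widely used test number "4111 1111 1111 1111" passes this check, while changing its last digit to 2 makes it fail. The whole input is one short string, so there is no need for the block-at-a-time interface a hash library would offer.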

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Scott McMurray Sent: Saturday, March 26, 2011 2:17 AM To: boost@lists.boost.org Subject: Re: [boost] gsoc project
On Fri, Mar 25, 2011 at 04:31, Daniel James <dnljms@gmail.com> wrote:
On 25 March 2011 10:07, Paul A. Bristow <pbristow@hetp.u-net.com> wrote:
There are already five people looking at this project, but you might like to look at
Have you seen Scott McMurray's hash library?
I saw the checks and hashes as somewhat different projects. It would be silly, for example, to calculate the check digit for a VISA card using the same interface as a SHA-512 hash.
I'd encourage a potential GSOC participant to focus just on check digits. As the wiki mentions, there aren't any libraries out there for it, and it'd be a very useful thing for all sorts of scenarios.
All those things like "member numbers" or "catalog item numbers" just cry out for them to make the UX less terrible.
I agree - that was my original intention. But there are *some* similarities between the two. (Often you feed a string and you get a 'check' be it a single 4 bit digit, or a loadsabits digest?).
And pragmatically, a check digits library probably won't have to prove competitive performance with Crypto++, Botan, OpenSSL, etc in order to be accepted. It'd probably be mostly an interface-driven review.
And there are plenty of checks to do before getting too ambitious. I hope I've made it clear that I place great importance on finishing each type of check (including tests and docs) before doing another one.
That said, if anyone does want to work on hashes, I'd be happy to help. (At least as much as I can under the Work for Hire that's prevented me from working on it lately.)
It would be silly to reinvent your wheels.
Thanks
Paul
--- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

On Wed, Mar 30, 2011 at 09:57, Paul A. Bristow <pbristow@hetp.u-net.com> wrote:
But there are *some* similarities between the two. (Often you feed a string and you get a 'check' be it a single 4 bit digit, or a loadsabits digest?).
Conceptually, I agree. But since the point of check digits is to find human-entry errors, the trade-offs and practical issues are totally different.
For example, they only make sense for things short enough that a human would be willing to type, so having an interface that encourages sequentially providing 4-KiB blocks of unsigned chars makes no sense.
Similarly, a human has provided it manually, so passing it as a wstring is a reasonable, performance-mostly-unimportant thing to do, compared to how long it took the user to type it. Especially since the checker might need to consider whether the ISBN-10 was entered with 'X', 'x', 'х' (Cyrillic ha), 'Ｘ' (fullwidth), or 'ｘ' (fullwidth).
~ Scott
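A rough sketch of what such a wstring-based ISBN-10 check could look like follows. The names `isbn10_value` and `isbn10_is_valid` are hypothetical, and the decisions to skip hyphens and spaces and to accept the capital Cyrillic ha as well are assumptions of this example rather than anything settled in the thread.

```cpp
#include <string>

// Hypothetical helper: maps one character to its ISBN-10 value (0..10),
// or returns -1 if the character is not acceptable at that position.
inline int isbn10_value(wchar_t c, bool is_last)
{
    if (c >= L'0' && c <= L'9')
    {
        return c - L'0';
    }
    if (is_last)  // only the check character may stand for 10
    {
        switch (c)
        {
        case L'X': case L'x':            // ASCII
        case L'\u0425': case L'\u0445':  // Cyrillic ha, capital and small
        case L'\uFF38': case L'\uFF58':  // fullwidth X and x
            return 10;
        }
    }
    return -1;
}

// Hypothetical check function: true if `input` is a valid ISBN-10.
inline bool isbn10_is_valid(const std::wstring& input)
{
    int sum = 0;
    int count = 0;  // number of ISBN characters consumed so far

    for (std::wstring::size_type i = 0; i < input.size(); ++i)
    {
        const wchar_t c = input[i];
        if (c == L'-' || c == L' ')
        {
            continue;  // separators carry no information
        }
        if (count == 10)
        {
            return false;  // more than ten characters
        }
        const int v = isbn10_value(c, count == 9);
        if (v < 0)
        {
            return false;
        }
        sum += (10 - count) * v;  // weights 10, 9, ..., 1
        ++count;
    }
    return count == 10 && sum % 11 == 0;
}
```

With this sketch, "0-306-40615-2" is accepted, and "0-8044-2957-X" is accepted whichever of the listed X variants the user typed. All the normalisation happens on one short string, which is why the cost of using wstring hardly matters here.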

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Scott McMurray Sent: Thursday, March 31, 2011 1:34 AM To: boost@lists.boost.org Subject: Re: [boost] gsoc project
On Wed, Mar 30, 2011 at 09:57, Paul A. Bristow <pbristow@hetp.u-net.com> wrote:
But there are *some* similarities between the two. (Often you feed a string and you get a 'check' be it a single 4 bit digit, or a loadsabits digest?).
Conceptually, I agree.
But since the point of check digits is to find human-entry errors, the trade-offs and practical issues are totally different.
For example, they only make sense for things short enough that a human would be willing to type, so having an interface that encourages sequentially providing 4-KiB blocks of unsigned chars makes no sense.
Similarly, a human has provided it manually, so passing it as a wstring is a reasonable, performance-mostly-unimportant thing to do, compared to how long it took the user to type it. Especially since the checker might need to consider whether the ISBN-10 was entered with 'X', 'x', 'х' (Cyrillic ha), 'Ｘ' (fullwidth), or 'ｘ' (fullwidth).
No disagreement here (though some will also want to use the check on machine-written files/databases). I'm sure anyone who does this project will be tapping your expertise in this area.
Paul
--- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

Hello, Paul. On 25 March 2011 at 14:07:49, you wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Александр Sent: Friday, March 25, 2011 7:10 AM To: boost@lists.boost.org Subject: [boost] gsoc project
Hello! I am a student at the Moscow Institute of Physics and Technology, and I would like to participate in Google Summer of Code. Boost C++ Libraries development looks really interesting to me, so I decided to ask some questions about the details. The list of ideas includes a "Checks & Hashes" project. I have some experience in this area: I wrote a small library for a university project, but it was mostly a simple wrapper around the RFC implementations of hashing algorithms such as MD5 and SHA-1. As I understand it, this library is meant to be a collection of different algorithms, and it is meant to be scalable, right?
I didn't envisage much 'scalability' because many algorithms are individual, but several 'hash' algorithms are used for more than one actual 'check' application.
So the user sees just the check function, like ISBN("0457695474x") but the 'hash' algorithm is 'under the hood'.
But it's your project, so you can define it as you think best.
What are the requirements for me to be able to work on this project?
You need to be able to use some current C++ compiler and have access to Boost files via SVN.
Do I need to write some code to start with?
There are already five people looking at this project, but you might like to look at
You might like to see if you can rebuild the existing skeleton code (Boost uses a rather bizarre build language called bjam, for portability).
You will need to download the Boost 1.46 library and use TortoiseSVN to get boost-sandbox/
(you only need the top-level folder structure; be very careful to clear the box that says "include subfolder" or you will get the whole of the sandbox, which is big!)
Then download (update) the /soc/2011/checks folder, *including all the sub-folders this time*.
If you can, you could then try coding one of the simple suggested checksums using some algorithm from the literature (a checksum of byte or int arrays modulo 256, perhaps?).
You will find that the ISBN and ISSN examples are already done.
You should also be able to add some skeleton documentation to what is already there using Doxygen.
You won't have write access to the sandbox, but you could work on a local copy and email me a zip of your folder.
Good luck and Have fun!
Paul
--- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
Hello, Paul! I've sent you a message (to pbristow@hetp.u-net.com, is that OK?) with the code included. Did you receive it? I coded a checksum of a byte array modulo 256, and it has been a long time with no answer. I just wanted to know what you think about it, because I'm pretty sure it is not how it is meant to be done, and I would like to read your notes on it. Thank you!
--
Best wishes, Alexandr mailto:daywalker@mail333.com
participants (5)
- Daniel James
- Mathias Gaunard
- Paul A. Bristow
- Scott McMurray
- Александр