Re: [boost] [git] conversion stalled

9 Oct 2013

On 9 Oct 2013 at 22:49, Daniel Pfeifer wrote:
> ...
> ...
> > I see that .gitattributes now contains explicit eol and mime type
> > mappings for every file in the repo. I advised you against that - I
> > think it will make conversion performance slow from all the lookups,
> > never mind poor old git trying to parse that map on every file
> > checkout.
> The commit that added the file is labeled "Add Niall Douglas'
> .gitattributes file".
> Care to repeat what you did advise exactly?

My original .gitattributes file was a set of file extension wildcards
exactly matching the Boost wiki page describing SVN auto-props. This
ensured that text files were checked out with native EOLs or Unix EOLs,
as appropriate for each type of text.
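For anyone who hasn't seen one, here is a minimal Python sketch of how an
auto-props style table maps onto wildcard .gitattributes lines. The
EOL_STYLES table and render_gitattributes() helper are hypothetical
illustrations, not the actual wiki list or Boost2Git code; only the
"text", "eol=lf", "eol=crlf" and "binary" attributes are real
gitattributes syntax.

```python
# Sketch: render an SVN auto-props style extension table as wildcard
# .gitattributes lines. EOL_STYLES is a hypothetical subset for
# illustration, not the actual Boost wiki auto-props list.

EOL_STYLES = {
    "*.cpp": "native",
    "*.hpp": "native",
    "*.sh": "LF",
    "*.png": "binary",
}


def render_gitattributes(styles: dict[str, str]) -> str:
    """Translate svn:eol-style-like values into gitattributes entries."""
    lines = []
    for pattern, style in styles.items():
        if style == "native":
            # 'text' with no eol attribute: stored as LF in the repo and
            # converted on checkout according to core.eol/core.autocrlf.
            lines.append(f"{pattern} text")
        elif style == "LF":
            # Always check out with Unix line endings.
            lines.append(f"{pattern} text eol=lf")
        elif style == "CRLF":
            # Always check out with Windows line endings.
            lines.append(f"{pattern} text eol=crlf")
        elif style == "binary":
            # 'binary' is git's built-in macro for -diff -merge -text.
            lines.append(f"{pattern} binary")
        else:
            raise ValueError(f"unknown eol style {style!r} for {pattern}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gitattributes(EOL_STYLES), end="")
```

A handful of wildcard lines like these cover the whole tree, which is why
a static file of this shape stays small and cheap for git to parse.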

The problem which then emerged is that many files were committed to
Boost years ago with all sorts of weird and unconventional EOLs and UTF
variations. For example, some files in sandbox have intentionally
malformed EOLs as part of their unit tests, and those tests would break
if the files were committed to git as non-binary. There is also quite a
bit of UTF-16 text, especially in ancient revisions from before UTF-8
became popular. Git has no understanding of UTF-16 text: as far as it is
concerned, text is either UTF-8 or it's binary.
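To make the UTF-16 point concrete, here is a rough Python classifier. It
approximates git's own heuristic (a NUL byte near the start of a file
marks it as binary), which is why UTF-16 text, full of NUL bytes, never
looks like text to git. The category names and the classify() function
are hypothetical labels for illustration only.

```python
# Sketch: a rough classifier for the encodings described above.
# The 8000-byte NUL check approximates git's binary heuristic;
# category names are hypothetical labels for illustration.

GIT_SNIFF_LEN = 8000  # git inspects roughly this many leading bytes


def classify(data: bytes) -> str:
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        # UTF-16 with a byte order mark. Git would also call this binary
        # (it contains NULs); we check the BOM first only to label it.
        return "utf-16"
    if b"\x00" in data[:GIT_SNIFF_LEN]:
        # Any NUL byte makes git treat the file as binary, so BOM-less
        # UTF-16 ASCII text also lands here.
        return "binary"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # 8-bit legacy encodings such as Latin1 usually fail UTF-8 decoding.
        return "legacy-8bit"
    return "utf-8"
```

For example, "hi".encode("utf-16-le") comes back as "binary", while the
same string encoded as UTF-8 comes back as "utf-8".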

My original advice was that unintentionally weird text formats needed
fixing up for git correctness, i.e. UTF-8 throughout with EOL 10 (LF). I
therefore sent a patch to Boost2Git which scanned text files for bad
EOLs and repaired them in flight. This seems to have worked, but it also
inadvertently repaired intentionally weird text formats.
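The scan-and-repair step amounts to something like the following Python
sketch. It is hypothetical and not the actual Boost2Git patch; it assumes
8-bit or UTF-8 input, since byte-level replacement would corrupt UTF-16.

```python
# Sketch of in-flight EOL repair; hypothetical, not the actual
# Boost2Git patch. Assumes 8-bit or UTF-8 input (not UTF-16).


def eol_counts(data: bytes) -> dict[str, int]:
    """Count each EOL style present in the data."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # lone CR (old Mac style)
        "lf": data.count(b"\n") - crlf,  # lone LF (Unix style)
    }


def needs_repair(data: bytes) -> bool:
    """True if the file has any EOL other than a bare LF."""
    counts = eol_counts(data)
    return counts["crlf"] > 0 or counts["cr"] > 0


def repair(data: bytes) -> bytes:
    # Order matters: collapse CRLF first so its CR is not turned
    # into an extra LF by the second replacement.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Run blindly over a sandbox test file whose mixed EOLs are deliberate,
repair() rewrites exactly the bytes that test depends on, which is the
failure described above.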

My original advice was also that intentionally weird text files need a
better file extension than .txt, e.g. .bin. I accepted Dave's argument
that renaming them breaks checkouts of ancient revisions, but I would
still argue that if people really need ancient revisions working
properly, they should go use a legacy SVN repo.

Dave, I think, decided that every text file needed listing in
.gitattributes with its EOL style, and that the .gitattributes file
needs regenerating at every commit, because over the history some text
files change from UTF-16 with EOL 13,10 (CRLF) through Latin1 with EOL
13,10 to UTF-8 with EOL 10 (LF) etc., so a static .gitattributes would
still introduce corruption. This is what is needed if you want any
possible past revision checkout to be a perfect representation of the
SVN repo. I'd imagine Boost2Git will also need a map of which text files
are intentionally malformed and must be treated as binary over which
range of SVN revision numbers. Building that map will, I imagine,
involve a lot of human hours.
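Roughly, such a map would let the converter emit a different
.gitattributes for each commit. Here is a minimal Python sketch, assuming
a hand-maintained list of rules keyed by SVN revision range; the Rule
type, gitattributes_at() and every path and revision number below are
hypothetical.

```python
# Sketch: per-revision .gitattributes generation from a hand-built map.
# All paths, revision numbers and the Rule type are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str    # gitattributes path pattern
    first_rev: int  # first SVN revision the rule applies to (inclusive)
    last_rev: int   # last SVN revision the rule applies to (inclusive)
    attrs: str      # attributes to emit, e.g. "binary" or "text eol=lf"


def gitattributes_at(rules: list[Rule], rev: int) -> str:
    """Return the .gitattributes contents for one converted commit."""
    active = [r for r in rules if r.first_rev <= rev <= r.last_rev]
    # In gitattributes a later matching line overrides an earlier one,
    # so keep the map's order rather than sorting.
    return "".join(f"{r.pattern} {r.attrs}\n" for r in active)


RULES = [
    Rule("*.txt", 1, 10**9, "text"),
    Rule("sandbox/foo/test/bad_eols.txt", 1200, 5400, "binary"),
    Rule("libs/bar/doc/readme.txt", 1, 3000, "binary"),  # UTF-16 era
]
```

The code is trivial; the expensive part is filling in RULES by hand for
every affected file and revision range.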

As this list knows, I think all this work being done for free is
excessive. If people really, really want all possible ancient revisions
to work, they should pay the people implementing it a contracting hourly
rate. Expecting all this for free is, in my opinion, daft. A git
conversion of Boost ought to work reasonably well for checkouts from the
past three years; beyond that, it should serve as an illustration of
history only.

Niall

-- 
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/