On 9 Oct 2013 at 22:49, Daniel Pfeifer wrote:
>> I see that .gitattributes now contains explicit eol and mime type mappings for every file in the repo. I advised you against that. I think all the lookups will make conversion slow, never mind poor old git trying to parse that map on every file checkout.
>
> The commit that added the file is labeled "Add Niall Douglas' .gitattributes file". Care to repeat exactly what you did advise?
My original .gitattributes file was a set of file extension wildcards exactly matching the Boost wiki page describing SVN auto-props. This ensured that when checking out text formats, you got native EOLs or Unix EOLs, as is correct for that type of text. The problem which then emerged is that many files were committed to Boost years ago with all sorts of weird and unconventional EOLs and UTF variations. For example, some files in sandbox have intentionally malformed EOLs as part of their unit tests, something which would break if committed to git as non-binary. There is also quite a bit of UTF-16 text, especially in ancient revisions from before UTF-8 became popular. Git has no understanding of UTF-16 text, so to git a file is either UTF-8 text or it is binary.

My original advice was that unintentionally weird text formats needed fixing up for git correctness, i.e. UTF-8 throughout with EOL 10. I therefore sent a patch to Boost2Git which scanned text format files for bad EOLs and repaired them in flight. This seems to have worked, but it inadvertently repaired the intentionally weird text formats too. I also advised that intentionally weird text formats need a better file extension than .txt, e.g. .bin. That said, I accepted Dave's argument that renaming them breaks checkouts of ancient revisions, though I would still argue that if people really need ancient revisions working properly, they should go use a legacy SVN repo.

Dave, I think, decided that every text file needed listing in .gitattributes with its EOL style, and that the .gitattributes file needs regenerating on every commit: as commits pass, some text files change from UTF-16 with EOL 13,10, through Latin1 with EOL 13,10, to UTF-8 with EOL 10 and so on, so a static .gitattributes would still introduce corruption. This is what is needed if you want any possible past revision checkout to be a perfect representation of the SVN repo.
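To make the two strategies above concrete, here is a minimal Python sketch. It is a hypothetical illustration, not Boost2Git's actual code: `classify_eol`, `repair_eol` and `convert_blob` are invented names. `repair_eol` is a blind in-flight repair to EOL 10, which is exactly what clobbers intentionally malformed files. `convert_blob` is a per-path approach: it emits one .gitattributes line per file per revision and falls back to git's built-in `binary` attribute for anything git cannot round-trip. It assumes UTF-16 is detectable by a byte-order mark or NUL bytes, and that paths contain no whitespace (real .gitattributes paths with spaces would need quoting).

```python
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # UTF-16 LE / BE byte-order marks


def classify_eol(data: bytes) -> str:
    """Return 'binary', 'crlf', 'lf', 'cr', 'mixed' or 'none' for a blob."""
    # Assumption: UTF-16 (or any blob with NUL bytes) is treated as binary,
    # since git only handles 8-bit text such as UTF-8.
    if data.startswith(UTF16_BOMS) or b"\x00" in data:
        return "binary"
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    kinds = [name for name, n in (("crlf", crlf), ("cr", lone_cr), ("lf", lone_lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def repair_eol(data: bytes) -> bytes:
    """Blind in-flight repair: rewrite CRLF and lone CR as LF (EOL 10)."""
    if classify_eol(data) == "binary":
        return data  # never touch UTF-16 / NUL-containing blobs
    # Replace CRLF first so its CR is not turned into an extra LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_blob(path: str, data: bytes) -> tuple[bytes, str]:
    """Return (bytes to store in git, .gitattributes line for this path)."""
    kind = classify_eol(data)
    if kind in ("lf", "none"):
        return data, f"{path} text eol=lf"
    if kind == "crlf":
        # Store LF in the repository; git recreates CRLF on checkout.
        return repair_eol(data), f"{path} text eol=crlf"
    # Lone CR, mixed EOLs or UTF-16: keep the bytes exactly as committed.
    return data, f"{path} binary"
```

For a sandbox test file with deliberately mixed EOLs, `repair_eol(b"x\r\ny\n")` silently yields `b"x\ny\n"`, while `convert_blob` stores it verbatim and marks it `binary`. The cost is the one raised in this thread: the attribute lines have to be recomputed for every file at every revision.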
I'd imagine Boost2Git will also need a map of which text files are intentionally malformed and must be treated as binary over which range of SVN revision numbers. Building that map, I suspect, will involve a lot of human hours.

As this list knows, I think all this work being done for free is excessive. If people really, really want all possible ancient revisions to work, they should pay people the contracting hourly rate to implement it. Expecting all this for free is in my opinion daft. A git conversion of Boost ought to work reasonably well for checkouts from the past three years; past that, it should be for illustration of history only.

Niall

--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/