[filesystem] problem: is_regular_file and deduplified files (reparse+sparse)

Hi guys,

I'm suddenly seeing a lot of failures, and it's a serious problem. I imagine some of you may start seeing the same failures very soon. I'm seeing it with Boost 1.58.0.

There is a related bug already entered here:
https://svn.boost.org/trac/boost/ticket/11057
I can't seem to log in and update this ticket, but the problem extends further than that ticket.

Windows Server 2003 support has been dropped by Microsoft, and a lot of IT departments around the world are switching over to the latest Windows Server. With this server comes the new "dedup" feature, which can automatically deduplicate files. This happens on a schedule, e.g. 2am Saturday. So suddenly we are getting reports of software failures from all over the place, due to fs::is_regular_file().

Deduped files have the REPARSE and SPARSE flags set. On the command line, you can run FSUTIL REPARSEPOINT QUERY, and the "Reparse Tag Value" is 0x80000013, which is a relatively new tag known as IO_REPARSE_TAG_DEDUP:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365740%28v=vs.85%...

These files act as normal files: you can fopen and fread them. So I assume they should be treated almost like a symlink by Boost... perhaps not quite a symlink, because I assume the "lstat" link properties are identical to the file's stat properties.

Typically, I iterate over directories and only process files if fs::is_regular_file(filename) is true. I wrote some code to check what the properties were on these files, and it's not any of the possible enums reported by file_status::type().

Ideas?

Best regards,
Paul
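The flag-and-tag check described above can be sketched in portable C++. This is only an illustration, not Boost.Filesystem's implementation: it assumes the attribute bits and the reparse tag have already been read elsewhere (for example from WIN32_FIND_DATA, where dwReserved0 holds the tag when the reparse attribute is set). The names Kind, classify and should_process are hypothetical; the numeric values are the ones defined in the Windows SDK headers, repeated here so the sketch compiles without <windows.h>.

```cpp
#include <cstdint>

// Values as defined in the Windows SDK headers (repeated so that this
// sketch does not need <windows.h>).
constexpr std::uint32_t kFileAttributeDirectory    = 0x00000010u;
constexpr std::uint32_t kFileAttributeSparseFile   = 0x00000200u;
constexpr std::uint32_t kFileAttributeReparsePoint = 0x00000400u;
constexpr std::uint32_t kIoReparseTagMountPoint    = 0xA0000003u;
constexpr std::uint32_t kIoReparseTagSymlink       = 0xA000000Cu;
constexpr std::uint32_t kIoReparseTagDedup         = 0x80000013u;

// Hypothetical classification, for illustration only.
enum class Kind { regular, directory, symlink_like, dedup_file, other_reparse };

// attrs: file attribute bits; tag: reparse tag (ignored unless the
// reparse-point attribute is set).
constexpr Kind classify(std::uint32_t attrs, std::uint32_t tag) {
    return (attrs & kFileAttributeReparsePoint) == 0
        ? ((attrs & kFileAttributeDirectory) != 0 ? Kind::directory
                                                  : Kind::regular)
        : (tag == kIoReparseTagSymlink || tag == kIoReparseTagMountPoint)
            ? Kind::symlink_like
        : (tag == kIoReparseTagDedup) ? Kind::dedup_file
                                      : Kind::other_reparse;
}

// A directory walk that wants "anything I can fopen and fread" could
// accept deduped files alongside plain regular files.
constexpr bool should_process(Kind k) {
    return k == Kind::regular || k == Kind::dedup_file;
}

// A deduped file as described in this thread: REPARSE + SPARSE flags set,
// tag 0x80000013.
static_assert(classify(kFileAttributeReparsePoint | kFileAttributeSparseFile,
                       kIoReparseTagDedup) == Kind::dedup_file,
              "dedup tag should be recognised");
static_assert(should_process(Kind::dedup_file),
              "deduped files are processed like regular files");
```

Keeping dedup_file as its own kind leaves the caller free to decide whether such files count as regular files or as something symlink-like.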

On 23 Jul 2015 at 8:56, Paul Harris wrote:
Proposed Boost.AFIO doesn't support IO_REPARSE_TAG_DEDUP because I have no access to any system to test the support upon.

However, if AFIO were to support IO_REPARSE_TAG_DEDUP, it would treat it identically to a symlink/junction point. I'd suggest Boost.Filesystem do the same, and treat pseudo-symlinks as symlinks. That probably means adding full symlink support for Filesystem on Windows.

Here are some links to example implementation code:

Reading a symlink target:
https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L511

Writing a symlink:
https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L848

Obviously it's best not to allow rewriting a pseudo-symlink like IO_REPARSE_TAG_DEDUP; make it read only.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
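One possible reading of that suggestion, as a small C++ sketch: map each reparse tag to a policy that says whether it is reported as a symlink and whether its target may be rewritten. LinkPolicy and policy_for_tag are hypothetical names made up for this illustration; neither AFIO nor Boost.Filesystem is known to expose anything like them. The tag values are the ones from the Windows SDK headers.

```cpp
#include <cstdint>

// Reparse tag values as defined in the Windows SDK headers.
constexpr std::uint32_t kIoReparseTagMountPoint = 0xA0000003u;
constexpr std::uint32_t kIoReparseTagSymlink    = 0xA000000Cu;
constexpr std::uint32_t kIoReparseTagDedup      = 0x80000013u;

// Hypothetical policy record, for illustration only.
struct LinkPolicy {
    bool treat_as_symlink;  // report the entry as a symlink
    bool allow_rewrite;     // permit changing the link target
};

constexpr LinkPolicy policy_for_tag(std::uint32_t tag) {
    return (tag == kIoReparseTagSymlink || tag == kIoReparseTagMountPoint)
        ? LinkPolicy{true, true}     // real symlinks and junctions
        : (tag == kIoReparseTagDedup)
            ? LinkPolicy{true, false}    // pseudo-symlink: read only
            : LinkPolicy{false, false};  // unknown tags: leave alone
}

static_assert(policy_for_tag(kIoReparseTagDedup).treat_as_symlink,
              "dedup is reported like a symlink");
static_assert(!policy_for_tag(kIoReparseTagDedup).allow_rewrite,
              "dedup must not be rewritten");
```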

Hi Niall, you can use Azure to test this sort of thing... I think. I'm
trying it out now.
http://blogs.technet.com/b/tommypatterson/p/azureservertrial.aspx
On 23 July 2015 at 09:54, Niall Douglas

FYI, I followed the blog article,
then once the machine was "running" I clicked Connect at the bottom.
That gave me an .rdp file which in theory I could use with rdesktop, but it
uses a DNS name that had only just been created, so that didn't work.
When you click the name of the server in the list, it shows the public IP
and the port on the right.
Then you can do this:
$ rdesktop that.ip.addr:port
But only if you have the latest rdesktop AND you have set up Kerberos
something-something.
Instead I found a Windows computer and used Remote Desktop from there.
---
Once inside,
in the "Server Manager --> Dashboard" window on the screen, click "Add
Roles"
then go next next until "Server Roles"
expand "File and Storage services" , "File and iSCSI" , and tick "Data
Deduplication"
Then next next etc and Install.
Wait a bit... and it's done.
http://www.techrepublic.com/blog/data-center/configuring-windows-server-8-de...
---
Continuing on that webpage...
Time to enable dedup. There is a temp disk D: so let's enable it there.
Method 1... I did this and then went to method 2... Start PowerShell, type:
"Enable-DedupVolume D:"
Method 2... in that same Dashboard, hit the 4th button (File and Storage
Services)
Then Volumes --> Disks
click Volume 1 at the top, and then right click D: at the bottom -->
Configure Dedup.
To try and accelerate this puppy, I set the "age to dedup" to 0 days.
http://www.techrepublic.com/blog/data-center/windows-server-2012-deduplicati...
---
Time to make something to dedup. We'll just duplicate the warning.txt file
that exists on D:
In powershell:
PS> D:
PS> $file = Get-Content DATALOSS_WARNING_README.txt
Then run these 2 commands a bunch of times until "big.txt" gets to, say, 6MB:
PS> Add-Content big.txt $file
PS> $file = Get-Content big.txt
Then use Windows Explorer (or other) to make a dozen copies of big.txt.
To give it more to dedup, copy c:\windows\explorer.exe to D:,
then go to D: and copy-paste explorer.exe a dozen times.
In PowerShell, type:
PS> Update-DedupStatus -Volume D:
PS> Start-DedupJob -Type Optimization -Volume D:
and then wait for it to finish.
you can track its progress with:
PS> Get-DedupJob
PS> Get-DedupStatus -Volume D:
---
So, once it's deduped, you check.
PS> FSUTIL REPARSEPOINT QUERY big.txt
You should see that it's a reparse point with that 0x80000013 tag.
Copy-paste big.txt to big2.txt and check it with the query, and it should
tell you big2.txt is NOT a reparse point.
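If the manual check is ever scripted, the tag can be pulled out of the captured FSUTIL output with a few lines of C++. This is a hedged sketch: the only thing taken from this thread is that the output contains the label "Reparse Tag Value" followed by a hex number; the exact line layout used in demo() is an assumption, and parse_reparse_tag and demo are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Extracts the hex number that follows "Reparse Tag Value" on the same line.
// Returns false if the label or a valid 32-bit hex number is not found.
bool parse_reparse_tag(const std::string& text, std::uint32_t& tag_out) {
    const std::string label = "Reparse Tag Value";
    const std::string::size_type label_pos = text.find(label);
    if (label_pos == std::string::npos) return false;

    const std::string::size_type line_end = text.find('\n', label_pos);
    const std::string::size_type hex_pos =
        text.find("0x", label_pos + label.size());
    if (hex_pos == std::string::npos) return false;
    // The number must be on the same line as the label.
    if (line_end != std::string::npos && hex_pos > line_end) return false;

    std::uint32_t value = 0;
    int digits = 0;
    for (std::string::size_type i = hex_pos + 2; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (!std::isxdigit(c)) break;
        if (++digits > 8) return false;  // would not fit in 32 bits
        const int nibble =
            std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        value = (value << 4) | static_cast<std::uint32_t>(nibble);
    }
    if (digits == 0) return false;
    tag_out = value;
    return true;
}

// Small self-check on an assumed output layout.
void demo() {
    std::uint32_t tag = 0;
    assert(parse_reparse_tag("Reparse Tag Value : 0x80000013\n", tag));
    assert(tag == 0x80000013u);  // IO_REPARSE_TAG_DEDUP
}
```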
NOW TO TEST !
----
On 23 July 2015 at 13:57, Paul Harris

Note, I've reposted this in a different form onto the boost-dev list,
which I assume is the proper forum for the next step (fixing the bug)
On 23 July 2015 at 15:11, Paul Harris

On 23 Jul 2015 at 15:11, Paul Harris wrote:
I appreciate the instructions, but by "testing" I mean a regular continuous integration pass, not a manual one-off.

I have tagged this issue at https://github.com/BoostGSoC13/boost.afio/issues/83 and I'll see if I can set up a Jenkins install of Win Server at some point.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/