[filesystem] problem: is_regular_file and deduplified files (reparse+sparse)
Hi guys,
I'm suddenly seeing a lot of failures, and it's a serious problem. I imagine some of you may start seeing the same failures very soon. I'm seeing it with Boost 1.58.0.
There is a related bug already entered here: https://svn.boost.org/trac/boost/ticket/11057
I can't seem to log on and update this ticket, but the problem extends further than that ticket.
Windows Server 2003 support has been dropped by Microsoft, and a lot of IT departments around the world are switching over to the latest Windows Server. With this server comes the new "dedup" feature, which can automatically deduplicate files. This happens on a schedule, e.g. 2am Saturday. So suddenly we are getting reports of software failing all over the place because of fs::is_regular_file().
Deduped files have the REPARSE and SPARSE flags set. On the command line, you can run FSUTIL REPARSEPOINT QUERY, and the "Reparse Tag Value" is 0x80000013, which is a relatively new tag known as IO_REPARSE_TAG_DEDUP: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365740%28v=vs.85%...
These files act as normal files: you can fopen and fread them. So I assume Boost should treat them almost like symlinks... perhaps not quite as symlinks, because I assume the "lstat" link properties are identical to the file's stat properties.
Typically, I iterate over directories and only process files if fs::is_regular_file(filename) is true.
I wrote some code to check what the properties were on these files, and the result is not any of the possible enums reported by file_status::type().
Ideas?
Best regards,
Paul
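The filtering pattern described above can be sketched roughly as follows. This is only an illustration, not Paul's actual code: it assumes C++17 `std::filesystem` as a stand-in for `boost::filesystem` (Boost spells the enumerators differently, e.g. `regular_file`), and the helper names `type_name` and `survey` are hypothetical. Instead of silently skipping everything that is not a regular file, it tallies the type reported for each entry, so unexpected entries become visible.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: human-readable name for a file_type value.
inline std::string type_name(fs::file_type t) {
    switch (t) {
        case fs::file_type::regular:   return "regular";
        case fs::file_type::directory: return "directory";
        case fs::file_type::symlink:   return "symlink";
        case fs::file_type::not_found: return "not_found";
        case fs::file_type::unknown:   return "unknown";
        default:                       return "other";
    }
}

// Hypothetical helper: count the entries of `dir` by reported type.
// symlink_status() does not follow links, so the entry itself is classified.
inline std::map<std::string, int> survey(const fs::path& dir) {
    assert(fs::is_directory(dir));
    std::map<std::string, int> counts;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        const fs::file_status st = fs::symlink_status(entry.path());
        ++counts[type_name(st.type())];
    }
    return counts;
}
```

Running `survey()` over a deduplicated volume and printing the map would show which bucket the deduped files land in on a given library version; no particular result is claimed here.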
On 23 Jul 2015 at 8:56, Paul Harris wrote:
With this server comes the new "dedup" feature, which can automatically deduplicate files. This happens on a schedule, e.g. 2am Saturday. So suddenly we are getting reports of software failing all over the place because of fs::is_regular_file().
Deduped files have the REPARSE and SPARSE flags set. On the command line, you can run FSUTIL REPARSEPOINT QUERY, and the "Reparse Tag Value" is 0x80000013, which is a relatively new tag known as IO_REPARSE_TAG_DEDUP: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365740%28v=vs.85%...
These files act as normal files: you can fopen and fread them. So I assume Boost should treat them almost like symlinks... perhaps not quite as symlinks, because I assume the "lstat" link properties are identical to the file's stat properties.
Typically, I iterate over directories and only process files if fs::is_regular_file(filename) is true.
I wrote some code to check what the properties were on these files, and the result is not any of the possible enums reported by file_status::type().
ideas?
Proposed Boost.AFIO doesn't support IO_REPARSE_TAG_DEDUP because I have no access to any system on which to test the support.
However, if AFIO were to support IO_REPARSE_TAG_DEDUP, it would treat it identically to a symlink/junction point.
I'd suggest Boost.Filesystem do the same, and treat pseudo-symlinks as symlinks. That probably means adding full symlink support for Filesystem on Windows. Here are some links to example implementation code:
Reading a symlink target: https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L511
Writing a symlink: https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L848
Obviously it is best not to allow rewriting a pseudo-symlink like IO_REPARSE_TAG_DEDUP; make it read-only.
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
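A rough sketch of the classification Niall suggests, kept portable so it compiles without `<windows.h>`. The numeric values are the documented Windows SDK constants, redeclared under hypothetical `k...` names; `entry_kind`, `classify` and `may_rewrite_target` are likewise made up for illustration and are not AFIO's or Boost.Filesystem's API. In real code the attributes and tag would have to be obtained from the Windows API; here they are simply passed in.

```cpp
#include <cassert>
#include <cstdint>

// Documented Windows SDK values, redeclared under hypothetical names.
constexpr std::uint32_t kAttrDirectory    = 0x00000010;  // FILE_ATTRIBUTE_DIRECTORY
constexpr std::uint32_t kAttrSparseFile   = 0x00000200;  // FILE_ATTRIBUTE_SPARSE_FILE
constexpr std::uint32_t kAttrReparsePoint = 0x00000400;  // FILE_ATTRIBUTE_REPARSE_POINT
constexpr std::uint32_t kTagMountPoint    = 0xA0000003;  // IO_REPARSE_TAG_MOUNT_POINT
constexpr std::uint32_t kTagSymlink       = 0xA000000C;  // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup         = 0x80000013;  // IO_REPARSE_TAG_DEDUP

// Hypothetical result type for this sketch.
enum class entry_kind { regular, directory, symlink, pseudo_symlink, other_reparse };

// Classify an entry from its attribute bits and (if any) its reparse tag.
constexpr entry_kind classify(std::uint32_t attributes, std::uint32_t reparse_tag) {
    if ((attributes & kAttrReparsePoint) == 0) {
        return (attributes & kAttrDirectory) != 0 ? entry_kind::directory
                                                  : entry_kind::regular;
    }
    switch (reparse_tag) {
        case kTagSymlink:
        case kTagMountPoint: return entry_kind::symlink;         // real links/junctions
        case kTagDedup:      return entry_kind::pseudo_symlink;  // readable like a file
        default:             return entry_kind::other_reparse;   // left undecided here
    }
}

// Only real links may be retargeted; pseudo-symlinks are treated as read-only.
constexpr bool may_rewrite_target(entry_kind k) {
    return k == entry_kind::symlink;
}
```

For example, `classify(kAttrReparsePoint | kAttrSparseFile, kTagDedup)` yields `entry_kind::pseudo_symlink`, for which `may_rewrite_target` returns false. Whether a library should then report such an entry as a symlink or as a regular file is exactly the open question in this thread.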
Hi Niall, you can use Azure to test this sort of thing... I think. I'm trying it out now.
http://blogs.technet.com/b/tommypatterson/p/azureservertrial.aspx
On 23 July 2015 at 09:54, Niall Douglas wrote:
Proposed Boost.AFIO doesn't support IO_REPARSE_TAG_DEDUP because I have no access to any system to test the support upon.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
FYI, I followed the blog article, then once the machine was "running" I clicked Connect at the bottom.
That gave me an .rdp file which in theory I could use with rdesktop, but it uses a DNS name that had only just been created, so that didn't work.
When you click the name of the server in the list, it shows the public IP and the port on the right. Then you can do this:
$ rdesktop that.ip.addr:port
But only if you have the latest rdesktop AND you have set up Kerberos something-something.
Instead I found a Windows computer and used Remote Desktop from there.
---
Once inside, in the "Server Manager --> Dashboard" window on the screen, click "Add Roles".
Then click Next until you reach "Server Roles".
Expand "File and Storage Services", then "File and iSCSI", and tick "Data Deduplication".
Then click Next through the remaining pages and Install.
Wait a bit... and it's done.
http://www.techrepublic.com/blog/data-center/configuring-windows-server-8-de...
---
Continuing on that webpage...
Time to enable dedup. There is a temp disk D:, so let's enable it there.
Method 1 (I did this and then went on to method 2): start PowerShell and type:
PS> Enable-DedupVolume D:
Method 2: in that same Dashboard, hit the 4th button (File and Storage Services), then Volumes --> Disks. Click Volume 1 at the top, and then right-click D: at the bottom --> Configure Dedup.
To try and accelerate this puppy, I set the "age to dedup" to 0 days.
http://www.techrepublic.com/blog/data-center/windows-server-2012-deduplicati...
---
Time to make something to dedup. We'll just duplicate the warning text file that exists on D:.
In PowerShell:
PS> D:
PS> $file = Get-Content DATALOSS_WARNING_README.txt
Then run these 2 commands a bunch of times until big.txt gets to, say, 6MB (the file roughly doubles on each pass):
PS> Add-Content big.txt $file
PS> $file = Get-Content big.txt
Then use Windows Explorer (or similar) to make a dozen copies of big.txt.
To give it something else to dedup, copy c:\windows\explorer.exe to D:, then go to D: and copy-paste explorer.exe a dozen times.
In PowerShell, type:
PS> Update-DedupStatus -Volume D:
PS> Start-DedupJob -Type Optimization -Volume D:
and then wait for it to finish.
You can track its progress with:
PS> Get-DedupJob
PS> Get-DedupStatus -Volume D:
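For a repeatable setup, the file-creation steps above (grow one file by doubling, then copy it several times) could be scripted. The following is a minimal sketch under assumptions: it uses C++17 `std::filesystem`, the helper names `grow_by_doubling` and `make_copies` and the `big1.txt`, `big2.txt`, ... naming scheme are hypothetical, and it doubles raw bytes rather than reproducing the line-based behaviour of `Get-Content`/`Add-Content` exactly. Enabling dedup and starting the optimization job would still be done with the PowerShell cmdlets.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: fill `file` with `seed` repeated by doubling until it
// holds at least `target_bytes` bytes. Returns the final size on disk.
inline std::uintmax_t grow_by_doubling(const fs::path& file,
                                       const std::string& seed,
                                       std::uintmax_t target_bytes) {
    assert(!seed.empty());  // an empty seed would never reach the target
    std::string content = seed;
    while (content.size() < target_bytes) {
        content += std::string(content);  // double the buffer
    }
    std::ofstream out(file, std::ios::binary | std::ios::trunc);
    out.write(content.data(), static_cast<std::streamsize>(content.size()));
    out.close();
    return fs::file_size(file);
}

// Hypothetical helper: make `copies` identical copies next to `file`,
// named <stem>1<ext>, <stem>2<ext>, ... Returns the paths created.
inline std::vector<fs::path> make_copies(const fs::path& file, int copies) {
    std::vector<fs::path> made;
    for (int i = 1; i <= copies; ++i) {
        const fs::path dst = file.parent_path() /
            (file.stem().string() + std::to_string(i) + file.extension().string());
        fs::copy_file(file, dst, fs::copy_options::overwrite_existing);
        made.push_back(dst);
    }
    return made;
}
```

Calling `grow_by_doubling("D:/big.txt", text, 6 * 1024 * 1024)` followed by `make_copies("D:/big.txt", 12)` would approximate the manual steps; the path and the counts are placeholders.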
---
So, once it's deduped, you check:
PS> FSUTIL REPARSEPOINT QUERY big.txt
You should see that it's a reparse point with that 0x80000013 tag.
Copy-paste big.txt to big2.txt and check the copy with the same query; it should tell you big2.txt is NOT a reparse point.
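The tag value reported by the query can be picked apart by hand. The sketch below assumes the documented reparse tag layout (bit 31 marks a Microsoft-owned tag, bit 29 marks a "name surrogate", i.e. an entry that stands in for another named entity, and the low 16 bits carry the tag value); the struct and function names are hypothetical. It only parses the hexadecimal string; it does not query the filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical holder for the decoded fields of a reparse tag.
struct reparse_tag_info {
    std::uint32_t raw;    // full 32-bit tag
    bool microsoft;       // bit 31: tag is owned by Microsoft
    bool name_surrogate;  // bit 29: entry represents another named entity
    std::uint16_t value;  // low 16 bits: tag value
};

// Decode a tag given as a hexadecimal string such as "0x80000013".
inline reparse_tag_info decode_reparse_tag(const std::string& hex) {
    std::size_t used = 0;
    const unsigned long parsed = std::stoul(hex, &used, 16);
    assert(used == hex.size());  // the whole string should be one hex number
    const auto raw = static_cast<std::uint32_t>(parsed);
    return { raw,
             (raw & 0x80000000u) != 0,
             (raw & 0x20000000u) != 0,
             static_cast<std::uint16_t>(raw & 0xFFFFu) };
}
```

Decoding "0x80000013" gives a Microsoft tag with value 0x13 and the name-surrogate bit clear, whereas the symlink tag 0xA000000C has that bit set. That difference fits the observation that deduped files behave like ordinary files rather than like links, though it does not by itself settle how a library should classify them.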
NOW TO TEST !
----
On 23 July 2015 at 13:57, Paul Harris wrote:
Hi Niall, you can use Azure to test this sort of thing... I think. I'm trying it out now.
http://blogs.technet.com/b/tommypatterson/p/azureservertrial.aspx
Note: I've reposted this in a different form on the boost-dev list, which I assume is the proper forum for the next step (fixing the bug).
On 23 Jul 2015 at 15:11, Paul Harris wrote:
FYI, I followed the blog article, then once the machine was "running" I clicked Connect at the bottom. That gave me an .rdp file which in theory I could use with rdesktop, but it uses a DNS name that was only just created, so that didn't work.
I appreciate the instructions, but by "testing" I mean a regular continuous integration pass, not a manual one-off. I have tagged this issue at https://github.com/BoostGSoC13/boost.afio/issues/83 and I'll see if I can set up a Jenkins install of Win Server at some point.
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
participants (2)
- Niall Douglas
- Paul Harris