diff mbox series

[3/3] compat/mingw: support POSIX semantics for atomic renames

Message ID d17ca1c7ce38e12fe113f15b078c12bc92e8f0aa.1729695349.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series compat/mingw: implement POSIX-style atomic renames | expand

Commit Message

Patrick Steinhardt Oct. 23, 2024, 3:05 p.m. UTC
By default, Windows restricts access to files when those files have been
opened by another process. As explained in the preceding commits, these
restrictions can be loosened such that reads, writes and/or deletes of
files with open handles _are_ allowed.

While we set up those sharing flags in most relevant code paths now, we
still don't properly handle POSIX-style atomic renames in case the
target path is open. This is failure demonstrated by t0610, where one of
our tests spawns concurrent writes in a reftable-enabled repository and
expects all of them to succeed. This test fails most of the time because
the process that has acquired the "tables.list" lock is unable to rename
it into place while other processes are busy reading that file.

Windows 10 has introduced the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag
that allows us to fix this usecase [1]. When set, it is possible to
rename a file over a preexisting file even when the target file still
has handles open. Those handles must have been opened with the
`FILE_SHARE_DELETE` flag, which we have ensured in the preceding
commits.

Careful readers might have noticed that [1] does not mention the above
flag, but instead mentions `FILE_RENAME_POSIX_SEMANICS`. This flag is
not for use with `SetFileInformationByHandle()` though, which is what we
use. And while the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag exists, it is
not documented on [2] or anywhere else as far as I can tell.

Unfortuntaly, we still support Windows systems older than Windows 10
that do not yet have this new flag. Our `_WIN32_WINNT` SDK version still
targets 0x0600, which is Windows Vista and later. And even though that
Windows version is out-of-support, bumping the SDK version all the way
to 0x0A00, which is Windows 10 and later, is not an option as it would
make it impossible to compile on Windows 8.1, which is still supported.
Instead, we have to manually declare the relevant infrastructure to make
this feature available and have fallback logic in place in case we run
on a Windows version that does not yet have this flag.

On another note: `mingw_rename()` has a retry loop that is used in case
deleting a file failed because it's still open in another process. One
might be pressed to not use this loop anymore when we can use POSIX
semantics. But unfortuntaley, we have to keep it around due to our
dependence on the `FILE_SHARE_DELETE` flag. While we know to set that
sharing flag now, other applications may not do so and may thus still
cause sharing violations when we try to rename a file.

This fixes concurrent writes in the reftable backend as demonstrated in
t0610, but may also end up fixing other usecases where Git wants to
perform renames.

[1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_rename_information
[2]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_rename_info

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 compat/mingw.c             | 87 ++++++++++++++++++++++++++++++++++++--
 t/t0610-reftable-basics.sh |  8 ++--
 2 files changed, 88 insertions(+), 7 deletions(-)

Comments

Kristoffer Haugsbakk Oct. 23, 2024, 4:19 p.m. UTC | #1
On Wed, Oct 23, 2024, at 17:05, Patrick Steinhardt wrote:
> […]
> Careful readers might have noticed that [1] does not mention the above
> flag, but instead mentions `FILE_RENAME_POSIX_SEMANICS`. This flag is

s/FILE_RENAME_POSIX_SEMANICS/FILE_RENAME_FLAG_POSIX_SEMANTICS/

> not for use with `SetFileInformationByHandle()` though, which is what we
> use. And while the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag exists, it is
> not documented on [2] or anywhere else as far as I can tell.
>
> Unfortuntaly, we still support Windows systems older than Windows 10

s/Unfortuntaly/Unfortunately/

> […]
>
> On another note: `mingw_rename()` has a retry loop that is used in case
> deleting a file failed because it's still open in another process. One
> might be pressed to not use this loop anymore when we can use POSIX
> semantics. But unfortuntaley, we have to keep it around due to our

s/unfortuntaley/unfortunately/

> [1]:
> https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_rename_information
> [2]:
> https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_rename_info
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  compat/mingw.c             | 87 ++++++++++++++++++++++++++++++++++++--
>  t/t0610-reftable-basics.sh |  8 ++--
>  2 files changed, 88 insertions(+), 7 deletions(-)
> […]
> +
>  	/* TODO: translate more errors */

It seems that `Documentation` doesn’t mention the difference between
`TODO` and `NEEDSWORK`.  What is it?

> […]
> --
> 2.47.0.118.gfd3785337b.dirty
Taylor Blau Oct. 23, 2024, 6:30 p.m. UTC | #2
On Wed, Oct 23, 2024 at 05:05:03PM +0200, Patrick Steinhardt wrote:
> ---
>  compat/mingw.c             | 87 ++++++++++++++++++++++++++++++++++++--
>  t/t0610-reftable-basics.sh |  8 ++--
>  2 files changed, 88 insertions(+), 7 deletions(-)

All well reasoned and explained as usual. I'm glossing over the
Windows-specific parts, though they look correct to me if I squint and
pretend that I have even a passing familiarity with the platform ;-).

This looks good, and I am glad to hear that it was tested on Windows 8.1
by Johannes, and on Windows 10 by you. Do we have any reason to believe
it would break on Vista? If so, should we test there as well?

Otherwise, this is looking good, modulo the handful of typos that was
noticed earlier in the thread on this patch. Thanks to you and Johannes
for working on this together!

Thanks,
Taylor
Patrick Steinhardt Oct. 24, 2024, 6:30 a.m. UTC | #3
On Wed, Oct 23, 2024 at 06:19:43PM +0200, Kristoffer Haugsbakk wrote:
> On Wed, Oct 23, 2024, at 17:05, Patrick Steinhardt wrote:
> > […]
> > Careful readers might have noticed that [1] does not mention the above
> > flag, but instead mentions `FILE_RENAME_POSIX_SEMANICS`. This flag is
> 
> s/FILE_RENAME_POSIX_SEMANICS/FILE_RENAME_FLAG_POSIX_SEMANTICS/

No, this is actually intentional. There are two different flags:

  - FILE_RENAME_POSIX_SEMANTICS is what ntifs.h exposes and what I've
    been linking for documentation. This flag cannot be used though with
    `FileRenameInfoEx`.

  - FILE_RENAME_FLAG_POSIX_SEMANTICS is what needs to be used with
    `FileRenameInfoEx`, but it is undocumented.

> >  	/* TODO: translate more errors */
> 
> It seems that `Documentation` doesn’t mention the difference between
> `TODO` and `NEEDSWORK`.  What is it?

No idea, to be honest.

Patrick
Kristoffer Haugsbakk Oct. 24, 2024, 7:18 a.m. UTC | #4
On Thu, Oct 24, 2024, at 08:30, Patrick Steinhardt wrote:
>> s/FILE_RENAME_POSIX_SEMANICS/FILE_RENAME_FLAG_POSIX_SEMANTICS/
>
> No, this is actually intentional. There are two different flags:
>
>   - FILE_RENAME_POSIX_SEMANTICS is what ntifs.h exposes and what I've
>     been linking for documentation. This flag cannot be used though with
>     `FileRenameInfoEx`.
>
>   - FILE_RENAME_FLAG_POSIX_SEMANTICS is what needs to be used with
>     `FileRenameInfoEx`, but it is undocumented.
>

I made a copy–paste mistake. :P I should have just pointed out this
(missing T):

    _SEMANICS

So you wrote it the correct-looking way in your first bullet
point here.
Patrick Steinhardt Oct. 24, 2024, 7:20 a.m. UTC | #5
On Thu, Oct 24, 2024 at 09:18:18AM +0200, Kristoffer Haugsbakk wrote:
> On Thu, Oct 24, 2024, at 08:30, Patrick Steinhardt wrote:
> >> s/FILE_RENAME_POSIX_SEMANICS/FILE_RENAME_FLAG_POSIX_SEMANTICS/
> >
> > No, this is actually intentional. There are two different flags:
> >
> >   - FILE_RENAME_POSIX_SEMANTICS is what ntifs.h exposes and what I've
> >     been linking for documentation. This flag cannot be used though with
> >     `FileRenameInfoEx`.
> >
> >   - FILE_RENAME_FLAG_POSIX_SEMANTICS is what needs to be used with
> >     `FileRenameInfoEx`, but it is undocumented.
> >
> 
> I made a copy–paste mistake. :P I should have just pointed out this
> (missing T):
> 
>     _SEMANICS
> 
> So you wrote it the correct-looking way in your first bullet
> point here.

Oh, indeed! Thanks.

Patrick
diff mbox series

Patch

diff --git a/compat/mingw.c b/compat/mingw.c
index ce95f3a5968..df78f61f7f9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2206,10 +2206,16 @@  int mingw_accept(int sockfd1, struct sockaddr *sa, socklen_t *sz)
 #undef rename
 int mingw_rename(const char *pold, const char *pnew)
 {
+	static int supports_file_rename_info_ex = 1;
 	DWORD attrs, gle;
 	int tries = 0;
 	wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
-	if (xutftowcs_path(wpold, pold) < 0 || xutftowcs_path(wpnew, pnew) < 0)
+	int wpnew_len;
+
+	if (xutftowcs_path(wpold, pold) < 0)
+		return -1;
+	wpnew_len = xutftowcs_path(wpnew, pnew);
+	if (wpnew_len < 0)
 		return -1;
 
 	/*
@@ -2220,11 +2226,84 @@  int mingw_rename(const char *pold, const char *pnew)
 		return 0;
 	if (errno != EEXIST)
 		return -1;
+
 repeat:
-	if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
-		return 0;
+	if (supports_file_rename_info_ex) {
+		/*
+		 * Our minimum required Windows version is still set to Windows
+		 * Vista. We thus have to declare required infrastructure for
+		 * FileRenameInfoEx ourselves until we bump _WIN32_WINNT to
+		 * 0x0A00. Furthermore, we have to handle cases where the
+		 * FileRenameInfoEx call isn't supported yet.
+		 */
+#define FILE_RENAME_FLAG_REPLACE_IF_EXISTS                  0x00000001
+#define FILE_RENAME_FLAG_POSIX_SEMANTICS                    0x00000002
+		FILE_INFO_BY_HANDLE_CLASS FileRenameInfoEx = 22;
+		struct {
+			/*
+			 * This is usually an unnamed union, but that is not
+			 * part of ISO C99. We thus inline the field, as we
+			 * really only care for the Flags field anyway.
+			 */
+			DWORD Flags;
+			HANDLE RootDirectory;
+			DWORD FileNameLength;
+			/*
+			 * The actual structure is defined with a single-character
+			 * flex array so that the structure has to be allocated on
+			 * the heap. As we declare this structure ourselves though
+			 * we can avoid the allocation and define FileName to have
+			 * MAX_PATH bytes.
+			 */
+			WCHAR FileName[MAX_PATH];
+		} rename_info = { 0 };
+		HANDLE old_handle = INVALID_HANDLE_VALUE;
+		BOOL success;
+
+		old_handle = CreateFileW(wpold, DELETE,
+					 FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE,
+					 NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
+		if (old_handle == INVALID_HANDLE_VALUE) {
+			errno = err_win_to_posix(GetLastError());
+			return -1;
+		}
+
+		rename_info.Flags = FILE_RENAME_FLAG_REPLACE_IF_EXISTS |
+				    FILE_RENAME_FLAG_POSIX_SEMANTICS;
+		rename_info.FileNameLength = wpnew_len * sizeof(WCHAR);
+		memcpy(rename_info.FileName, wpnew, wpnew_len * sizeof(WCHAR));
+
+		success = SetFileInformationByHandle(old_handle, FileRenameInfoEx,
+						     &rename_info, sizeof(rename_info));
+		gle = GetLastError();
+		CloseHandle(old_handle);
+		if (success)
+			return 0;
+
+		/*
+		 * When we see ERROR_INVALID_PARAMETER we can assume that the
+		 * current system doesn't support FileRenameInfoEx. Keep us
+		 * from using it in future calls and retry.
+		 */
+		if (gle == ERROR_INVALID_PARAMETER) {
+			supports_file_rename_info_ex = 0;
+			goto repeat;
+		}
+
+		/*
+		 * In theory, we shouldn't get ERROR_ACCESS_DENIED because we
+		 * always open files with FILE_SHARE_DELETE But in practice we
+		 * cannot assume that Git is the only one accessing files, and
+		 * other applications may not set FILE_SHARE_DELETE. So we have
+		 * to retry.
+		 */
+	} else {
+		if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
+			return 0;
+		gle = GetLastError();
+	}
+
 	/* TODO: translate more errors */
-	gle = GetLastError();
 	if (gle == ERROR_ACCESS_DENIED &&
 	    (attrs = GetFileAttributesW(wpnew)) != INVALID_FILE_ATTRIBUTES) {
 		if (attrs & FILE_ATTRIBUTE_DIRECTORY) {
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index babec7993e3..eaf6fab6d29 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,10 +450,12 @@  test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
-# This test fails most of the time on Windows systems. The root cause is
+# This test fails most of the time on Cygwin systems. The root cause is
 # that Windows does not allow us to rename the "tables.list.lock" file into
-# place when "tables.list" is open for reading by a concurrent process.
-test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
+# place when "tables.list" is open for reading by a concurrent process. We have
+# worked around that in our MinGW-based rename emulation, but the Cygwin
+# emulation seems to be insufficient.
+test_expect_success !CYGWIN 'ref transaction: many concurrent writers' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(