diff mbox series

reftable: fix tests being broken by NFS' delete-after-close semantics

Message ID 8ac5e94a3930cdd2aee9ea86acda3155674b635c.1711035529.git.ps@pks.im (mailing list archive)
State New
Headers show
Series reftable: fix tests being broken by NFS' delete-after-close semantics | expand

Commit Message

Patrick Steinhardt March 21, 2024, 3:39 p.m. UTC
It was reported that the reftable unit tests in t0032 fail with the
following assertion when running on top of NFS:

    running test_reftable_stack_compaction_concurrent_clean
    reftable/stack_test.c: 1063: failed assertion count_dir_entries(dir) == 2
    Aborted

Setting a breakpoint immediately before the assertion in fact shows the
following list of files:

    ./stack_test-1027.QJBpnd
    ./stack_test-1027.QJBpnd/0x000000000001-0x000000000003-dad7ac80.ref
    ./stack_test-1027.QJBpnd/.nfs000000000001729f00001e11
    ./stack_test-1027.QJBpnd/tables.list

Note the weird ".nfs*" file? This file is maintained by NFS clients in
order to emulate delete-after-last-close semantics that we rely on in
the reftable code [1]. Instead of unlinking the file right away and
keeping it open in the client, the NFS client will rename it to ".nfs*"
and then delete that temporary file when the last reference to it gets
dropped. Quoting the NFS FAQ:

    > D2. What is a "silly rename"? Why do these .nfsXXXXX files keep
    > showing up?
    >
    > A. Unix applications often open a scratch file and then unlink it.
    > They do this so that the file is not visible in the file system name
    > space to any other applications, and so that the system will
    > automatically clean up (delete) the file when the application exits.
    > This is known as "delete on last close", and is a tradition among
    > Unix applications.
    >
    > Because of the design of the NFS protocol, there is no way for a
    > file to be deleted from the name space but still remain in use by an
    > application. Thus NFS clients have to emulate this using what
    > already exists in the protocol. If an open file is unlinked, an NFS
    > client renames it to a special name that looks like ".nfsXXXXX".
    > This "hides" the file while it remains in use. This is known as a
    > "silly rename." Note that NFS servers have nothing to do with this
    > behavior.

This of course throws off the assertion that we got exactly two files in
that directory.

The test in question triggers this behaviour by holding two open file
descriptors to the "tables.list" file. One of the references is because
we are about to append to the stack, whereas the other reference is
because we want to compact it. As the compaction has just finished we
already rewrote "tables.list" to point to the new contents, but the
other file descriptor pointing to the old version is still open. Thus we
trigger the delete-after-last-close emulation.

Furthermore, it was reported that this behaviour only triggers with
4f36b8597c (reftable/stack: fix race in up-to-date check, 2024-01-18).
This is expected as well because it is the first point in time where we
actually keep the "tables.list" file descriptor open for the stat cache.

Fix this bug by skipping over any files that start with a leading dot
when counting files. While we could explicitly check for a prefix of
".nfs", other network file systems like SMB for example do the same
trickery but with a ".smb" prefix. In any case though, this loosening of
the assertion should be fine given that the reftable library would never
write files with leading dots by itself.

[1]: https://nfs.sourceforge.net/#faq_d2

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack_test.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Junio C Hamano March 21, 2024, 5:35 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

>  	while ((d = readdir(dir))) {
> -		if (!strcmp(d->d_name, "..") || !strcmp(d->d_name, "."))
> +		/*
> +		 * Besides skipping over "." and "..", we also need to
> +		 * skip over other files that have a leading ".". This
> +		 * is due to behaviour of NFS, which will rename files
> +		 * to ".nfs*" to emulate delete-on-last-close.
> +		 *
> +		 * In any case this should be fine as the reftable
> +		 * library will never write files with leading dots
> +		 * anyway.
> +		 */
> +		if (starts_with(d->d_name, "."))
>  			continue;

Sounds good.  We should count what we positively consider what we
wrote, not random cruft somebody else might have dropped in the
directory, and this is a good first thing to do.

Will queue.  Thanks.
Toon claes March 22, 2024, 3:15 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> Fix this bug by skipping over any files that start with a leading dot
> when counting files. While we could explicitly check for a prefix of
> ".nfs", other network file systems like SMB for example do the same
> trickery but with a ".smb" prefix. In any case though, this loosening of
> the assertion should be fine given that the reftable library would never
> write files with leading dots by itself.

I'm fully supportive of this, as this will also fix any issues possibly
caused by .DS_Store files created by Finder on macOS, although it's very
unlikely they will be created in these tests.

--
Toon
diff mbox series

Patch

diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 509f486623..621578fda8 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -38,7 +38,17 @@  static int count_dir_entries(const char *dirname)
 		return 0;
 
 	while ((d = readdir(dir))) {
-		if (!strcmp(d->d_name, "..") || !strcmp(d->d_name, "."))
+		/*
+		 * Besides skipping over "." and "..", we also need to
+		 * skip over other files that have a leading ".". This
+		 * is due to behaviour of NFS, which will rename files
+		 * to ".nfs*" to emulate delete-on-last-close.
+		 *
+		 * In any case this should be fine as the reftable
+		 * library will never write files with leading dots
+		 * anyway.
+		 */
+		if (starts_with(d->d_name, "."))
 			continue;
 		len++;
 	}