diff mbox series

[2/3] generic/465: only complain about stale disk contents when racing directio

Message ID 169687551965.3948976.15125603449708923383.stgit@frogsfrogsfrogs (mailing list archive)
State New, archived
Headers show
Series fstests: random fixes for v2023.10.08 | expand

Commit Message

Darrick J. Wong Oct. 9, 2023, 6:18 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

This test does a strange thing with directio -- it races a reader thread
with an appending aio writer thread and checks that the reader thread
only ever sees a (probably short) buffer containing the same contents
that are being read.

However, this has never worked correctly on XFS, which supports
concurrent readers and writers for directio.  Say you already have a
file with a single written mapping A:

AAAAAAAAAA
0        EOF

Then one thread initiates an aligned appending write:

AAAAAAAAAA---------
0        EOF      new_EOF

However, the free space is fragmented, so the file range maps to
multiple extents b and c (lowercase means unwritten here):

AAAAAAAAAAbbbbccccc
0        EOF      new_EOF

This implies separate bios for b and c.  Both bios are issued, but c
completes first.  The ioend for c will extend i_size all the way to
new_EOF.  Extent b is still marked unwritten because it hasn't completed
yet.

Next, the test reader slips in and tries to read the range between the
old EOF and the new EOF.  The file looks like this now:

AAAAAAAAAAbbbbCCCCC
0        EOF      new_EOF

So the reader sees "bbbbCCCCC" in the mapping, and the buffer returned
contains a range of zeroes followed by whatever was written to C.

For pagecache IO I would say that i_size should not be extended until
the extending write is fully complete, but the pagecache also
coordinates access so that reads and writes cannot conflict.

However, this is directio.  Reads and writes to the storage device can
be issued and acknowledged in any order.  I asked Ted and Jan about this
point, and they echoed that for directio it's expected that application
software must coordinate access themselves.

In other words, the only thing that the reader can check here is that
the filesystem is not returning stale disk contents.  Amend the test so
that null bytes in the reader buffer are acceptable.

Cc: tytso@mit.edu, jack@suse.cz
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../aio-dio-append-write-read-race.c               |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig Oct. 10, 2023, 7:03 a.m. UTC | #1
Yes, this looks sensible.  I've actually seen sproadic failures from
this test for a while and had this on my list to investigate.

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
index 911f27230b..d9f8982f00 100644
--- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
+++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
@@ -191,7 +191,7 @@  int main(int argc, char *argv[])
 		}
 
 		for (j = 0; j < rdata.read_sz; j++) {
-			if (rdata.buf[j] != 'a') {
+			if (rdata.buf[j] != 'a' && rdata.buf[j] != 0) {
 				fail("encounter an error: "
 					"block %d offset %d, content %x\n",
 					i, j, rbuf[j]);