diff mbox

test race when checking i_size on direct i/o read

Message ID 799f63d9-de03-eab3-1e6e-cd747aa04c36@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Sandeen Aug. 18, 2017, 8:35 p.m. UTC
From: Zheng Liu <wenqing.lz@taobao.com>

In this commit a new test case is added to test that i_size races don't
occur under dio reads/writes.  We add a program in /src dir, which
has a writer to issue some append dio writes.  Meanwhile it has a reader
in this test to do some dio reads.  As we expect, reader should read
nothing or data with 'a'.  But it might read some data with '0'.

The bug can be reproduced by this test case [1].

1.  http://patchwork.ozlabs.org/patch/311761/

This ostensibly tests commit:
9fe55eea7 Fix race when checking i_size on direct i/o read

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Rich Johnston <rjohnston@sgi.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
[sandeen: update to recent xfstests, update commitlog]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

This test was originally titled:

 xfstests: add a new test case to test i_size updated properly under dio

but I think the issue is more of when and how it's tested, not how
it's updated.

Note, this passes on xfs on 4.10, but fails on 4.12.
ext4 on 4.10 passes as well but is very slow.

iomap dio maybe?  Not sure yet.

changelog v3:
 * rebase against latest xfstests/master branch
 * update commit log

changelog v2:
 * add '-lpthread' into LLDLIBS


--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eryu Guan Aug. 28, 2017, 10:35 a.m. UTC | #1
On Fri, Aug 18, 2017 at 03:35:02PM -0500, Eric Sandeen wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> In this commit a new test case is added to test that i_size races don't
> occur under dio reads/writes.  We add a program in /src dir, which
> has a writer to issue some append dio writes.  Meanwhile it has a reader
> in this test to do some dio reads.  As we expect, reader should read
> nothing or data with 'a'.  But it might read some data with '0'.
> 
> The bug can be reproduced by this test case [1].
> 
> 1.  http://patchwork.ozlabs.org/patch/311761/
> 
> This ostensibly tests commit:
> 9fe55eea7 Fix race when checking i_size on direct i/o read
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Rich Johnston <rjohnston@sgi.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> [sandeen: update to recent xfstests, update commitlog]
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> This test was originally titled:
> 
>  xfstests: add a new test case to test i_size updated properly under dio
> 
> but I think the issue is more of when and how it's tested, not how
> it's updated.
> 
> Note, this passes on xfs on 4.10, but fails on 4.12.
> ext4 on 4.10 passes as well but is very slow.
> 
> iomap dio maybe?  Not sure yet.

It failed for me with XFS and btrfs, ext4 passed test, kernel is
4.13-rc5. But I found that the alignment for write buffer matters too
for reproducing the btrfs failure, see below.

> 
> changelog v3:
>  * rebase against latest xfstests/master branch
>  * update commit log

Thanks for picking up the test again! And sorry that I got to it late..

> 
> changelog v2:
>  * add '-lpthread' into LLDLIBS
> 
> diff --git a/configure.ac b/configure.ac
> index 57092f1..4663004 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -59,6 +59,7 @@ AC_PACKAGE_NEED_GETXATTR_LIBATTR
>  AC_PACKAGE_NEED_SYS_ACL_H
>  AC_PACKAGE_NEED_ACL_LIBACL_H
>  AC_PACKAGE_NEED_ACLINIT_LIBACL
> +AC_PACKAGE_NEED_PTHREADMUTEXINIT
>  
>  AC_PACKAGE_WANT_GDBM
>  AC_PACKAGE_WANT_AIO
> diff --git a/include/builddefs.in b/include/builddefs.in
> index cb52b99..fcc8b90 100644
> --- a/include/builddefs.in
> +++ b/include/builddefs.in
> @@ -25,6 +25,7 @@ LIBGDBM = @libgdbm@
>  LIBUUID = @libuuid@
>  LIBHANDLE = @libhdl@
>  LIBDM = @libdm@
> +LIBPTHREAD = @libpthread@
>  LIBTEST = $(TOPDIR)/lib/libtest.la
>  prefix = @prefix@
>  
> diff --git a/src/Makefile b/src/Makefile
> index b8aff49..e9419bd 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -23,7 +23,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
>  	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
>  	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
>  	attr-list-by-handle-cursor-test listxattr dio-interleaved t_dir_type \
> -	dio-invalidate-cache stat_test t_encrypted_d_revalidate
> +	dio-invalidate-cache stat_test t_encrypted_d_revalidate diotest

I think a more specific name would be better than just "diotest" :)

>  
>  SUBDIRS =
>  
> diff --git a/src/diotest.c b/src/diotest.c
> new file mode 100644
> index 0000000..7d2378f
> --- /dev/null
> +++ b/src/diotest.c
> @@ -0,0 +1,166 @@
> +/*
> + * Copyright (c) 2013 Alibaba Group.
> + * All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> + */
> +
> +/*
> + * This is a normal case that we do some append dio writes and meanwhile
> + * we do some dio reads.  Currently in vfs we don't ensure that i_size
> + * is updated properly.  Hence the reader will read some data with '0'.
> + * But we expect that the reader should read nothing or data with 'a'.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <errno.h>
> +
> +#include <pthread.h>
> +
> +static char *prog;
> +
> +struct writer_data {
> +	int fd;
> +	size_t blksize;
> +	char *buf;
> +};
> +
> +static void usage(void)
> +{
> +	fprintf(stderr, "usage: %s [FILE]\n", prog);
> +}
> +
> +static void *writer(void *arg)
> +{
> +	struct writer_data *data = (struct writer_data *)arg;
> +	int ret;
> +
> +	ret = write(data->fd, data->buf, data->blksize);
> +	if (ret < 0)
> +		fprintf(stderr, "write file failed: %s\n", strerror(errno));
> +
> +	return NULL;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	pthread_t tid;
> +	struct writer_data wdata;
> +	size_t max_blocks = 128;		/* 128 */
> +	size_t blksize = 1 * 1024 * 1024;	/* 1M */
> +	char *rbuf = NULL, *wbuf = NULL;
> +	int rfd = 0, wfd = 0;
> +	int i, j;
> +	int ret = 0;
> +
> +	prog = basename(argv[0]);
> +
> +	if (argc != 2) {
> +		usage();
> +		exit(1);
> +	}
> +
> +	wfd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
> +	if (wfd < 0) {
> +		fprintf(stderr, "failed to open write file: %s\n",
> +			strerror(errno));
> +		exit(1);
> +	}
> +
> +	rfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
> +	if (wfd < 0) {
> +		fprintf(stderr, "failed to open read file: %s\n",
> +			strerror(errno));
> +		ret = 1;
> +		goto err;
> +	}
> +
> +	/*
> +	 * We set 1024 as an alignment size for write buf.  Feel free to change
> +	 * it with 4096.  But the problem is also hitted.
> +	 */
> +	if (posix_memalign((void **)&wbuf, 1024, blksize)) {

I suspect that a hardcoded 1024 alignment won't work for 4k sector
device, but, as I mentioned above, changing it to 4096 made test on
btrfs pass, even the comment said it didn't matter.

How about passing the alignment requirement as an argument from the
shell script? This can be looked up with _min_dio_alignment.

> +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> +		ret = 1;
> +		goto err;
> +	}
> +
> +	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
> +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> +		ret = 1;
> +		goto err;
> +	}
> +
> +	memset(wbuf, 'a', blksize);
> +	wdata.fd = wfd;
> +	wdata.blksize = blksize;
> +	wdata.buf = wbuf;
> +
> +	for (i = 0; i < max_blocks; i++) {
> +		void *retval;
> +
> +		if (pthread_create(&tid, NULL, writer, &wdata)) {
> +			fprintf(stderr, "create thread failed: %s\n",
> +				strerror(errno));
> +			ret = 1;
> +			goto err;
> +		}
> +
> +		memset(rbuf, 'b', blksize);
> +		do {
> +			ret = pread(rfd, rbuf, blksize, i * blksize);
> +			if (ret < 0)
> +				fprintf(stderr, "read file failed: %s\n",
> +					strerror(errno));
> +		} while (ret <= 0);
> +
> +		if (pthread_join(tid, &retval)) {
> +			fprintf(stderr, " pthread join failed: %s\n",
> +				strerror(errno));
> +			ret = 1;
> +			goto err;
> +		}
> +
> +		if (ret >= 0) {
> +			for (j = 0; j < ret; j ++) {
> +				if (rbuf[j] != 'a') {
> +					fprintf(stderr, "encounter an error: "
> +						"offset %d content %c\n",
> +						i, rbuf[j]);

This prints a binary zero on failure, and makes diff harder to view.

Binary files tests/generic/452.out and /root/workspace/xfstests/results//xfs_4k/generic/452.out.bad differ

I changed it a bit:
-                                               "offset %d content %c\n",
-                                               i, rbuf[j]);
+                                               "block %d offset %d, content 0x%x\n",
+                                               i, j, rbuf[j]);

and the result looked fine to me:
+encounter an error: block 15 offset 0, content 0x0

> +					ret = 1;
> +					goto err;
> +				}
> +			}
> +		}
> +	}
> +
> +err:
> +	if (rfd)
> +		close(rfd);
> +	if (wfd)
> +		close(wfd);
> +	if (rbuf)
> +		free(rbuf);
> +	if (wbuf)
> +		free(wbuf);
> +
> +	return ret;
> +}
> diff --git a/tests/generic/450 b/tests/generic/450
> new file mode 100755
> index 0000000..cfb424c
> --- /dev/null
> +++ b/tests/generic/450
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# FS QA Test No. 450
> +#
> +# Test i_size is updated properly under dio read/write
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2013 Alibaba Group.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +    cd /
> +    rm -f $tmp.* $testfile
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +
> +testfile=$TEST_DIR/$seq.$$
> +
> +[ -x $here/src/diotest ] || _notrun "diotest not built"

_require_test_program "diotest" or a new name :)

And need a _require_odirect too.

> +
> +$here/src/diotest $testfile # > $seqres.full 2>&1 ||
> +	# _fail "i_size isn't update properly!"

All the comments can be removed :)

Thanks,
Eryu

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/450.out b/tests/generic/450.out
> new file mode 100644
> index 0000000..734761a
> --- /dev/null
> +++ b/tests/generic/450.out
> @@ -0,0 +1 @@
> +QA output created by 450
> diff --git a/tests/generic/group b/tests/generic/group
> index b9cd0e8..a555fa0 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -452,3 +452,4 @@
>  447 auto quick clone
>  448 auto quick rw
>  449 auto quick acl enospc
> +450 auto rw quick
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov Aug. 29, 2017, 1:46 p.m. UTC | #2
On 28.08.2017 13:35, Eryu Guan wrote:
> On Fri, Aug 18, 2017 at 03:35:02PM -0500, Eric Sandeen wrote:
>> From: Zheng Liu <wenqing.lz@taobao.com>
>>
>> In this commit a new test case is added to test that i_size races don't
>> occur under dio reads/writes.  We add a program in /src dir, which
>> has a writer to issue some append dio writes.  Meanwhile it has a reader
>> in this test to do some dio reads.  As we expect, reader should read
>> nothing or data with 'a'.  But it might read some data with '0'.
>>
>> The bug can be reproduced by this test case [1].
>>
>> 1.  http://patchwork.ozlabs.org/patch/311761/
>>
>> This ostensibly tests commit:
>> 9fe55eea7 Fix race when checking i_size on direct i/o read
>>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Rich Johnston <rjohnston@sgi.com>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
>> [sandeen: update to recent xfstests, update commitlog]
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> This test was originally titled:
>>
>>  xfstests: add a new test case to test i_size updated properly under dio
>>
>> but I think the issue is more of when and how it's tested, not how
>> it's updated.
>>
>> Note, this passes on xfs on 4.10, but fails on 4.12.
>> ext4 on 4.10 passes as well but is very slow.
>>
>> iomap dio maybe?  Not sure yet.
> 
> It failed for me with XFS and btrfs, ext4 passed test, kernel is
> 4.13-rc5. But I found that the alignment for write buffer matters too
> for reproducing the btrfs failure, see below.

In btrfs, if the buffers are not aligned to pagesize, then we fall back
to buffered write, so that's likely why.
> 
>>
>> changelog v3:
>>  * rebase against latest xfstests/master branch
>>  * update commit log
> 
> Thanks for picking up the test again! And sorry that I got to it late..
> 
>>
>> changelog v2:
>>  * add '-lpthread' into LLDLIBS
>>
>> diff --git a/configure.ac b/configure.ac
>> index 57092f1..4663004 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -59,6 +59,7 @@ AC_PACKAGE_NEED_GETXATTR_LIBATTR
>>  AC_PACKAGE_NEED_SYS_ACL_H
>>  AC_PACKAGE_NEED_ACL_LIBACL_H
>>  AC_PACKAGE_NEED_ACLINIT_LIBACL
>> +AC_PACKAGE_NEED_PTHREADMUTEXINIT
>>  
>>  AC_PACKAGE_WANT_GDBM
>>  AC_PACKAGE_WANT_AIO
>> diff --git a/include/builddefs.in b/include/builddefs.in
>> index cb52b99..fcc8b90 100644
>> --- a/include/builddefs.in
>> +++ b/include/builddefs.in
>> @@ -25,6 +25,7 @@ LIBGDBM = @libgdbm@
>>  LIBUUID = @libuuid@
>>  LIBHANDLE = @libhdl@
>>  LIBDM = @libdm@
>> +LIBPTHREAD = @libpthread@
>>  LIBTEST = $(TOPDIR)/lib/libtest.la
>>  prefix = @prefix@
>>  
>> diff --git a/src/Makefile b/src/Makefile
>> index b8aff49..e9419bd 100644
>> --- a/src/Makefile
>> +++ b/src/Makefile
>> @@ -23,7 +23,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
>>  	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
>>  	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
>>  	attr-list-by-handle-cursor-test listxattr dio-interleaved t_dir_type \
>> -	dio-invalidate-cache stat_test t_encrypted_d_revalidate
>> +	dio-invalidate-cache stat_test t_encrypted_d_revalidate diotest
> 
> I think a more specific name would be better than just "diotest" :)
> 
>>  
>>  SUBDIRS =
>>  
>> diff --git a/src/diotest.c b/src/diotest.c
>> new file mode 100644
>> index 0000000..7d2378f
>> --- /dev/null
>> +++ b/src/diotest.c
>> @@ -0,0 +1,166 @@
>> +/*
>> + * Copyright (c) 2013 Alibaba Group.
>> + * All Rights Reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it would be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write the Free Software Foundation,
>> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> + */
>> +
>> +/*
>> + * This is a normal case that we do some append dio writes and meanwhile
>> + * we do some dio reads.  Currently in vfs we don't ensure that i_size
>> + * is updated properly.  Hence the reader will read some data with '0'.
>> + * But we expect that the reader should read nothing or data with 'a'.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +
>> +#include <unistd.h>
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include <fcntl.h>
>> +#include <errno.h>
>> +
>> +#include <pthread.h>
>> +
>> +static char *prog;
>> +
>> +struct writer_data {
>> +	int fd;
>> +	size_t blksize;
>> +	char *buf;
>> +};
>> +
>> +static void usage(void)
>> +{
>> +	fprintf(stderr, "usage: %s [FILE]\n", prog);
>> +}
>> +
>> +static void *writer(void *arg)
>> +{
>> +	struct writer_data *data = (struct writer_data *)arg;
>> +	int ret;
>> +
>> +	ret = write(data->fd, data->buf, data->blksize);
>> +	if (ret < 0)
>> +		fprintf(stderr, "write file failed: %s\n", strerror(errno));
>> +
>> +	return NULL;
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +	pthread_t tid;
>> +	struct writer_data wdata;
>> +	size_t max_blocks = 128;		/* 128 */
>> +	size_t blksize = 1 * 1024 * 1024;	/* 1M */
>> +	char *rbuf = NULL, *wbuf = NULL;
>> +	int rfd = 0, wfd = 0;
>> +	int i, j;
>> +	int ret = 0;
>> +
>> +	prog = basename(argv[0]);
>> +
>> +	if (argc != 2) {
>> +		usage();
>> +		exit(1);
>> +	}
>> +
>> +	wfd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
>> +	if (wfd < 0) {
>> +		fprintf(stderr, "failed to open write file: %s\n",
>> +			strerror(errno));
>> +		exit(1);
>> +	}
>> +
>> +	rfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
>> +	if (wfd < 0) {
>> +		fprintf(stderr, "failed to open read file: %s\n",
>> +			strerror(errno));
>> +		ret = 1;
>> +		goto err;
>> +	}
>> +
>> +	/*
>> +	 * We set 1024 as an alignment size for write buf.  Feel free to change
>> +	 * it with 4096.  But the problem is also hitted.
>> +	 */
>> +	if (posix_memalign((void **)&wbuf, 1024, blksize)) {
> 
> I suspect that a hardcoded 1024 alignment won't work for 4k sector
> device, but, as I mentioned above, changing it to 4096 made test on
> btrfs pass, even the comment said it didn't matter.
> 
> How about passing the alignment requirement as an argument from the
> shell script? This can be looked up with _min_dio_alignment.
> 
>> +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
>> +		ret = 1;
>> +		goto err;
>> +	}
>> +
>> +	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
>> +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
>> +		ret = 1;
>> +		goto err;
>> +	}
>> +
>> +	memset(wbuf, 'a', blksize);
>> +	wdata.fd = wfd;
>> +	wdata.blksize = blksize;
>> +	wdata.buf = wbuf;
>> +
>> +	for (i = 0; i < max_blocks; i++) {
>> +		void *retval;
>> +
>> +		if (pthread_create(&tid, NULL, writer, &wdata)) {
>> +			fprintf(stderr, "create thread failed: %s\n",
>> +				strerror(errno));
>> +			ret = 1;
>> +			goto err;
>> +		}
>> +
>> +		memset(rbuf, 'b', blksize);
>> +		do {
>> +			ret = pread(rfd, rbuf, blksize, i * blksize);
>> +			if (ret < 0)
>> +				fprintf(stderr, "read file failed: %s\n",
>> +					strerror(errno));
>> +		} while (ret <= 0);
>> +
>> +		if (pthread_join(tid, &retval)) {
>> +			fprintf(stderr, " pthread join failed: %s\n",
>> +				strerror(errno));
>> +			ret = 1;
>> +			goto err;
>> +		}
>> +
>> +		if (ret >= 0) {
>> +			for (j = 0; j < ret; j ++) {
>> +				if (rbuf[j] != 'a') {
>> +					fprintf(stderr, "encounter an error: "
>> +						"offset %d content %c\n",
>> +						i, rbuf[j]);
> 
> This prints a binary zero on failure, and makes diff harder to view.
> 
> Binary files tests/generic/452.out and /root/workspace/xfstests/results//xfs_4k/generic/452.out.bad differ
> 
> I changed it a bit:
> -                                               "offset %d content %c\n",
> -                                               i, rbuf[j]);
> +                                               "block %d offset %d, content 0x%x\n",
> +                                               i, j, rbuf[j]);
> 
> and the result looked fine to me:
> +encounter an error: block 15 offset 0, content 0x0
> 
>> +					ret = 1;
>> +					goto err;
>> +				}
>> +			}
>> +		}
>> +	}
>> +
>> +err:
>> +	if (rfd)
>> +		close(rfd);
>> +	if (wfd)
>> +		close(wfd);
>> +	if (rbuf)
>> +		free(rbuf);
>> +	if (wbuf)
>> +		free(wbuf);
>> +
>> +	return ret;
>> +}
>> diff --git a/tests/generic/450 b/tests/generic/450
>> new file mode 100755
>> index 0000000..cfb424c
>> --- /dev/null
>> +++ b/tests/generic/450
>> @@ -0,0 +1,56 @@
>> +#! /bin/bash
>> +# FS QA Test No. 450
>> +#
>> +# Test i_size is updated properly under dio read/write
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2013 Alibaba Group.  All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +    cd /
>> +    rm -f $tmp.* $testfile
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# real QA test starts here
>> +_supported_fs generic
>> +_supported_os Linux
>> +
>> +testfile=$TEST_DIR/$seq.$$
>> +
>> +[ -x $here/src/diotest ] || _notrun "diotest not built"
> 
> _require_test_program "diotest" or a new name :)
> 
> And need a _require_odirect too.
> 
>> +
>> +$here/src/diotest $testfile # > $seqres.full 2>&1 ||
>> +	# _fail "i_size isn't update properly!"
> 
> All the comments can be removed :)
> 
> Thanks,
> Eryu
> 
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/450.out b/tests/generic/450.out
>> new file mode 100644
>> index 0000000..734761a
>> --- /dev/null
>> +++ b/tests/generic/450.out
>> @@ -0,0 +1 @@
>> +QA output created by 450
>> diff --git a/tests/generic/group b/tests/generic/group
>> index b9cd0e8..a555fa0 100644
>> --- a/tests/generic/group
>> +++ b/tests/generic/group
>> @@ -452,3 +452,4 @@
>>  447 auto quick clone
>>  448 auto quick rw
>>  449 auto quick acl enospc
>> +450 auto rw quick
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan Sept. 19, 2017, 7:36 a.m. UTC | #3
On Fri, Aug 18, 2017 at 03:35:02PM -0500, Eric Sandeen wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> In this commit a new test case is added to test that i_size races don't
> occur under dio reads/writes.  We add a program in /src dir, which
> has a writer to issue some append dio writes.  Meanwhile it has a reader
> in this test to do some dio reads.  As we expect, reader should read
> nothing or data with 'a'.  But it might read some data with '0'.
> 
> The bug can be reproduced by this test case [1].
> 
> 1.  http://patchwork.ozlabs.org/patch/311761/
> 
> This ostensibly tests commit:
> 9fe55eea7 Fix race when checking i_size on direct i/o read
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Rich Johnston <rjohnston@sgi.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> [sandeen: update to recent xfstests, update commitlog]
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> This test was originally titled:
> 
>  xfstests: add a new test case to test i_size updated properly under dio
> 
> but I think the issue is more of when and how it's tested, not how
> it's updated.
> 
> Note, this passes on xfs on 4.10, but fails on 4.12.
> ext4 on 4.10 passes as well but is very slow.
> 
> iomap dio maybe?  Not sure yet.

My test with 4.10 kernel suggested that test still failed there. And I
digged into this test a bit, and found that it was commit d531d91d6990
("xfs: always use unwritten extents for direct I/O writes") introduced
this failure, which is in v3.14 kernel.

This is because we start allocating unwritten extents for direct writes
that can extend i_size, but in xfs_dio_write_end_io() we update in-core
i_size before converting unwritten extents to real allocations. So a
racing direct read could find the not-yet converted unwritten extents
and read zeros instead of actual data.

But I'm not sure what's the best way to fix it. I think taking exclusive
iolock instead of shared lock for direct writes that can extend i_size
could fix the non-aio dio write case, but aio-dio write still fails,
because in the aio-dio write case we defer end_io to a workqueue, which
doesn't take any iolock at all..

ext4 has no such problem because ext4 converts unwritten extents before
updating i_size, and ext4 doesn't support appending aio dio writes.

(Keep the rest of patch untrimmed for reference, as I added linux-xfs to
cc list.)

Thanks,
Eryu

> 
> changelog v3:
>  * rebase against latest xfstests/master branch
>  * update commit log
> 
> changelog v2:
>  * add '-lpthread' into LLDLIBS
> 
> diff --git a/configure.ac b/configure.ac
> index 57092f1..4663004 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -59,6 +59,7 @@ AC_PACKAGE_NEED_GETXATTR_LIBATTR
>  AC_PACKAGE_NEED_SYS_ACL_H
>  AC_PACKAGE_NEED_ACL_LIBACL_H
>  AC_PACKAGE_NEED_ACLINIT_LIBACL
> +AC_PACKAGE_NEED_PTHREADMUTEXINIT
>  
>  AC_PACKAGE_WANT_GDBM
>  AC_PACKAGE_WANT_AIO
> diff --git a/include/builddefs.in b/include/builddefs.in
> index cb52b99..fcc8b90 100644
> --- a/include/builddefs.in
> +++ b/include/builddefs.in
> @@ -25,6 +25,7 @@ LIBGDBM = @libgdbm@
>  LIBUUID = @libuuid@
>  LIBHANDLE = @libhdl@
>  LIBDM = @libdm@
> +LIBPTHREAD = @libpthread@
>  LIBTEST = $(TOPDIR)/lib/libtest.la
>  prefix = @prefix@
>  
> diff --git a/src/Makefile b/src/Makefile
> index b8aff49..e9419bd 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -23,7 +23,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
>  	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
>  	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
>  	attr-list-by-handle-cursor-test listxattr dio-interleaved t_dir_type \
> -	dio-invalidate-cache stat_test t_encrypted_d_revalidate
> +	dio-invalidate-cache stat_test t_encrypted_d_revalidate diotest
>  
>  SUBDIRS =
>  
> diff --git a/src/diotest.c b/src/diotest.c
> new file mode 100644
> index 0000000..7d2378f
> --- /dev/null
> +++ b/src/diotest.c
> @@ -0,0 +1,166 @@
> +/*
> + * Copyright (c) 2013 Alibaba Group.
> + * All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> + */
> +
> +/*
> + * This is a normal case that we do some append dio writes and meanwhile
> + * we do some dio reads.  Currently in vfs we don't ensure that i_size
> + * is updated properly.  Hence the reader will read some data with '0'.
> + * But we expect that the reader should read nothing or data with 'a'.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <errno.h>
> +
> +#include <pthread.h>
> +
> +static char *prog;
> +
> +struct writer_data {
> +	int fd;
> +	size_t blksize;
> +	char *buf;
> +};
> +
> +static void usage(void)
> +{
> +	fprintf(stderr, "usage: %s [FILE]\n", prog);
> +}
> +
> +static void *writer(void *arg)
> +{
> +	struct writer_data *data = (struct writer_data *)arg;
> +	int ret;
> +
> +	ret = write(data->fd, data->buf, data->blksize);
> +	if (ret < 0)
> +		fprintf(stderr, "write file failed: %s\n", strerror(errno));
> +
> +	return NULL;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	pthread_t tid;
> +	struct writer_data wdata;
> +	size_t max_blocks = 128;		/* 128 */
> +	size_t blksize = 1 * 1024 * 1024;	/* 1M */
> +	char *rbuf = NULL, *wbuf = NULL;
> +	int rfd = 0, wfd = 0;
> +	int i, j;
> +	int ret = 0;
> +
> +	prog = basename(argv[0]);
> +
> +	if (argc != 2) {
> +		usage();
> +		exit(1);
> +	}
> +
> +	wfd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
> +	if (wfd < 0) {
> +		fprintf(stderr, "failed to open write file: %s\n",
> +			strerror(errno));
> +		exit(1);
> +	}
> +
> +	rfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
> +	if (wfd < 0) {
> +		fprintf(stderr, "failed to open read file: %s\n",
> +			strerror(errno));
> +		ret = 1;
> +		goto err;
> +	}
> +
> +	/*
> +	 * We set 1024 as an alignment size for write buf.  Feel free to change
> +	 * it with 4096.  But the problem is also hitted.
> +	 */
> +	if (posix_memalign((void **)&wbuf, 1024, blksize)) {
> +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> +		ret = 1;
> +		goto err;
> +	}
> +
> +	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
> +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> +		ret = 1;
> +		goto err;
> +	}
> +
> +	memset(wbuf, 'a', blksize);
> +	wdata.fd = wfd;
> +	wdata.blksize = blksize;
> +	wdata.buf = wbuf;
> +
> +	for (i = 0; i < max_blocks; i++) {
> +		void *retval;
> +
> +		if (pthread_create(&tid, NULL, writer, &wdata)) {
> +			fprintf(stderr, "create thread failed: %s\n",
> +				strerror(errno));
> +			ret = 1;
> +			goto err;
> +		}
> +
> +		memset(rbuf, 'b', blksize);
> +		do {
> +			ret = pread(rfd, rbuf, blksize, i * blksize);
> +			if (ret < 0)
> +				fprintf(stderr, "read file failed: %s\n",
> +					strerror(errno));
> +		} while (ret <= 0);
> +
> +		if (pthread_join(tid, &retval)) {
> +			fprintf(stderr, " pthread join failed: %s\n",
> +				strerror(errno));
> +			ret = 1;
> +			goto err;
> +		}
> +
> +		if (ret >= 0) {
> +			for (j = 0; j < ret; j ++) {
> +				if (rbuf[j] != 'a') {
> +					fprintf(stderr, "encounter an error: "
> +						"offset %d content %c\n",
> +						i, rbuf[j]);
> +					ret = 1;
> +					goto err;
> +				}
> +			}
> +		}
> +	}
> +
> +err:
> +	if (rfd)
> +		close(rfd);
> +	if (wfd)
> +		close(wfd);
> +	if (rbuf)
> +		free(rbuf);
> +	if (wbuf)
> +		free(wbuf);
> +
> +	return ret;
> +}
> diff --git a/tests/generic/450 b/tests/generic/450
> new file mode 100755
> index 0000000..cfb424c
> --- /dev/null
> +++ b/tests/generic/450
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# FS QA Test No. 450
> +#
> +# Test i_size is updated properly under dio read/write
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2013 Alibaba Group.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +    cd /
> +    rm -f $tmp.* $testfile
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +
> +testfile=$TEST_DIR/$seq.$$
> +
> +[ -x $here/src/diotest ] || _notrun "diotest not built"
> +
> +$here/src/diotest $testfile # > $seqres.full 2>&1 ||
> +	# _fail "i_size isn't update properly!"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/450.out b/tests/generic/450.out
> new file mode 100644
> index 0000000..734761a
> --- /dev/null
> +++ b/tests/generic/450.out
> @@ -0,0 +1 @@
> +QA output created by 450
> diff --git a/tests/generic/group b/tests/generic/group
> index b9cd0e8..a555fa0 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -452,3 +452,4 @@
>  447 auto quick clone
>  448 auto quick rw
>  449 auto quick acl enospc
> +450 auto rw quick
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster Sept. 19, 2017, 2:13 p.m. UTC | #4
On Tue, Sep 19, 2017 at 03:36:16PM +0800, Eryu Guan wrote:
> On Fri, Aug 18, 2017 at 03:35:02PM -0500, Eric Sandeen wrote:
> > From: Zheng Liu <wenqing.lz@taobao.com>
> > 
> > In this commit a new test case is added to test that i_size races don't
> > occur under dio reads/writes.  We add a program in /src dir, which
> > has a writer to issue some append dio writes.  Meanwhile it has a reader
> > in this test to do some dio reads.  As we expect, reader should read
> > nothing or data with 'a'.  But it might read some data with '0'.
> > 
> > The bug can be reproduced by this test case [1].
> > 
> > 1.  http://patchwork.ozlabs.org/patch/311761/
> > 
> > This ostensibly tests commit:
> > 9fe55eea7 Fix race when checking i_size on direct i/o read
> > 
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Rich Johnston <rjohnston@sgi.com>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > [sandeen: update to recent xfstests, update commitlog]
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > ---
> > 
> > This test was originally titled:
> > 
> >  xfstests: add a new test case to test i_size updated properly under dio
> > 
> > but I think the issue is more of when and how it's tested, not how
> > it's updated.
> > 
> > Note, this passes on xfs on 4.10, but fails on 4.12.
> > ext4 on 4.10 passes as well but is very slow.
> > 
> > iomap dio maybe?  Not sure yet.
> 
> My test with 4.10 kernel suggested that test still failed there. And I
> digged into this test a bit, and found that it was commit d531d91d6990
> ("xfs: always use unwritten extents for direct I/O writes") introduced
> this failure, which is in v3.14 kernel.
> 
> This is because we start allocating unwritten extents for direct writes
> that can extend i_size, but in xfs_dio_write_end_io() we update in-core
> i_size before converting unwritten extents to real allocations. So a
> racing direct read could find the not-yet converted unwritten extents
> and read zeros instead of actual data.
> 
> But I'm not sure what's the best way to fix it. I think taking exclusive
> iolock instead of shared lock for direct writes that can extend i_size
> could fix the non-aio dio write case, but aio-dio write still fails,
> because in the aio-dio write case we defer end_io to a workqueue, which
> doesn't take any iolock at all..
> 

Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it
update the incore i_size after unwritten extent conversion? Then move
(or remove) the associated update from xfs_dio_write_end_io().

Brian

> ext4 has no such problem because ext4 converts unwritten extents before
> updating i_size, and ext4 doesn't support appending aio dio writes.
> 
> (Keep the rest of patch untrimmed for reference, as I added linux-xfs to
> cc list.)
> 
> Thanks,
> Eryu
> 
> > 
> > changelog v3:
> >  * rebase against latest xfstests/master branch
> >  * update commit log
> > 
> > changelog v2:
> >  * add '-lpthread' into LLDLIBS
> > 
> > diff --git a/configure.ac b/configure.ac
> > index 57092f1..4663004 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -59,6 +59,7 @@ AC_PACKAGE_NEED_GETXATTR_LIBATTR
> >  AC_PACKAGE_NEED_SYS_ACL_H
> >  AC_PACKAGE_NEED_ACL_LIBACL_H
> >  AC_PACKAGE_NEED_ACLINIT_LIBACL
> > +AC_PACKAGE_NEED_PTHREADMUTEXINIT
> >  
> >  AC_PACKAGE_WANT_GDBM
> >  AC_PACKAGE_WANT_AIO
> > diff --git a/include/builddefs.in b/include/builddefs.in
> > index cb52b99..fcc8b90 100644
> > --- a/include/builddefs.in
> > +++ b/include/builddefs.in
> > @@ -25,6 +25,7 @@ LIBGDBM = @libgdbm@
> >  LIBUUID = @libuuid@
> >  LIBHANDLE = @libhdl@
> >  LIBDM = @libdm@
> > +LIBPTHREAD = @libpthread@
> >  LIBTEST = $(TOPDIR)/lib/libtest.la
> >  prefix = @prefix@
> >  
> > diff --git a/src/Makefile b/src/Makefile
> > index b8aff49..e9419bd 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -23,7 +23,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
> >  	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
> >  	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
> >  	attr-list-by-handle-cursor-test listxattr dio-interleaved t_dir_type \
> > -	dio-invalidate-cache stat_test t_encrypted_d_revalidate
> > +	dio-invalidate-cache stat_test t_encrypted_d_revalidate diotest
> >  
> >  SUBDIRS =
> >  
> > diff --git a/src/diotest.c b/src/diotest.c
> > new file mode 100644
> > index 0000000..7d2378f
> > --- /dev/null
> > +++ b/src/diotest.c
> > @@ -0,0 +1,166 @@
> > +/*
> > + * Copyright (c) 2013 Alibaba Group.
> > + * All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > + */
> > +
> > +/*
> > + * This is a normal case that we do some append dio writes and meanwhile
> > + * we do some dio reads.  Currently in vfs we don't ensure that i_size
> > + * is updated properly.  Hence the reader will read some data with '0'.
> > + * But we expect that the reader should read nothing or data with 'a'.
> > + */
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +
> > +#include <unistd.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <fcntl.h>
> > +#include <errno.h>
> > +
> > +#include <pthread.h>
> > +
> > +static char *prog;
> > +
> > +struct writer_data {
> > +	int fd;
> > +	size_t blksize;
> > +	char *buf;
> > +};
> > +
> > +static void usage(void)
> > +{
> > +	fprintf(stderr, "usage: %s [FILE]\n", prog);
> > +}
> > +
> > +static void *writer(void *arg)
> > +{
> > +	struct writer_data *data = (struct writer_data *)arg;
> > +	int ret;
> > +
> > +	ret = write(data->fd, data->buf, data->blksize);
> > +	if (ret < 0)
> > +		fprintf(stderr, "write file failed: %s\n", strerror(errno));
> > +
> > +	return NULL;
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	pthread_t tid;
> > +	struct writer_data wdata;
> > +	size_t max_blocks = 128;		/* 128 */
> > +	size_t blksize = 1 * 1024 * 1024;	/* 1M */
> > +	char *rbuf = NULL, *wbuf = NULL;
> > +	int rfd = 0, wfd = 0;
> > +	int i, j;
> > +	int ret = 0;
> > +
> > +	prog = basename(argv[0]);
> > +
> > +	if (argc != 2) {
> > +		usage();
> > +		exit(1);
> > +	}
> > +
> > +	wfd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
> > +	if (wfd < 0) {
> > +		fprintf(stderr, "failed to open write file: %s\n",
> > +			strerror(errno));
> > +		exit(1);
> > +	}
> > +
> > +	rfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
> > +	if (wfd < 0) {
> > +		fprintf(stderr, "failed to open read file: %s\n",
> > +			strerror(errno));
> > +		ret = 1;
> > +		goto err;
> > +	}
> > +
> > +	/*
> > +	 * We set 1024 as an alignment size for write buf.  Feel free to change
> > +	 * it with 4096.  But the problem is also hitted.
> > +	 */
> > +	if (posix_memalign((void **)&wbuf, 1024, blksize)) {
> > +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> > +		ret = 1;
> > +		goto err;
> > +	}
> > +
> > +	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
> > +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> > +		ret = 1;
> > +		goto err;
> > +	}
> > +
> > +	memset(wbuf, 'a', blksize);
> > +	wdata.fd = wfd;
> > +	wdata.blksize = blksize;
> > +	wdata.buf = wbuf;
> > +
> > +	for (i = 0; i < max_blocks; i++) {
> > +		void *retval;
> > +
> > +		if (pthread_create(&tid, NULL, writer, &wdata)) {
> > +			fprintf(stderr, "create thread failed: %s\n",
> > +				strerror(errno));
> > +			ret = 1;
> > +			goto err;
> > +		}
> > +
> > +		memset(rbuf, 'b', blksize);
> > +		do {
> > +			ret = pread(rfd, rbuf, blksize, i * blksize);
> > +			if (ret < 0)
> > +				fprintf(stderr, "read file failed: %s\n",
> > +					strerror(errno));
> > +		} while (ret <= 0);
> > +
> > +		if (pthread_join(tid, &retval)) {
> > +			fprintf(stderr, " pthread join failed: %s\n",
> > +				strerror(errno));
> > +			ret = 1;
> > +			goto err;
> > +		}
> > +
> > +		if (ret >= 0) {
> > +			for (j = 0; j < ret; j ++) {
> > +				if (rbuf[j] != 'a') {
> > +					fprintf(stderr, "encounter an error: "
> > +						"offset %d content %c\n",
> > +						i, rbuf[j]);
> > +					ret = 1;
> > +					goto err;
> > +				}
> > +			}
> > +		}
> > +	}
> > +
> > +err:
> > +	if (rfd)
> > +		close(rfd);
> > +	if (wfd)
> > +		close(wfd);
> > +	if (rbuf)
> > +		free(rbuf);
> > +	if (wbuf)
> > +		free(wbuf);
> > +
> > +	return ret;
> > +}
> > diff --git a/tests/generic/450 b/tests/generic/450
> > new file mode 100755
> > index 0000000..cfb424c
> > --- /dev/null
> > +++ b/tests/generic/450
> > @@ -0,0 +1,56 @@
> > +#! /bin/bash
> > +# FS QA Test No. 450
> > +#
> > +# Test i_size is updated properly under dio read/write
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2013 Alibaba Group.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#-----------------------------------------------------------------------
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1	# failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +    cd /
> > +    rm -f $tmp.* $testfile
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +
> > +# real QA test starts here
> > +_supported_fs generic
> > +_supported_os Linux
> > +
> > +testfile=$TEST_DIR/$seq.$$
> > +
> > +[ -x $here/src/diotest ] || _notrun "diotest not built"
> > +
> > +$here/src/diotest $testfile # > $seqres.full 2>&1 ||
> > +	# _fail "i_size isn't update properly!"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/450.out b/tests/generic/450.out
> > new file mode 100644
> > index 0000000..734761a
> > --- /dev/null
> > +++ b/tests/generic/450.out
> > @@ -0,0 +1 @@
> > +QA output created by 450
> > diff --git a/tests/generic/group b/tests/generic/group
> > index b9cd0e8..a555fa0 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -452,3 +452,4 @@
> >  447 auto quick clone
> >  448 auto quick rw
> >  449 auto quick acl enospc
> > +450 auto rw quick
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Sept. 19, 2017, 2:34 p.m. UTC | #5
On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote:
> Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it
> update the incore i_size after unwritten extent conversion? Then move
> (or remove) the associated update from xfs_dio_write_end_io().

I don't think we even need a flag - all three callers of
xfs_iomap_write_unwritten want to update the file size.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster Sept. 19, 2017, 2:58 p.m. UTC | #6
On Tue, Sep 19, 2017 at 07:34:06AM -0700, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote:
> > Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it
> > update the incore i_size after unwritten extent conversion? Then move
> > (or remove) the associated update from xfs_dio_write_end_io().
> 
> I don't think we even need a flag - all three callers of
> xfs_iomap_write_unwritten want to update the file size.

Sounds reasonable to me. I wasn't sure if there was anything special
about the pnfs case. isize should probably already be stable in the
writeback case, so perhaps this would be fine as long as we ensure the
in-core size only increases (as we currently do for di_size).

Brian
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/configure.ac b/configure.ac
index 57092f1..4663004 100644
--- a/configure.ac
+++ b/configure.ac
@@ -59,6 +59,7 @@  AC_PACKAGE_NEED_GETXATTR_LIBATTR
 AC_PACKAGE_NEED_SYS_ACL_H
 AC_PACKAGE_NEED_ACL_LIBACL_H
 AC_PACKAGE_NEED_ACLINIT_LIBACL
+AC_PACKAGE_NEED_PTHREADMUTEXINIT
 
 AC_PACKAGE_WANT_GDBM
 AC_PACKAGE_WANT_AIO
diff --git a/include/builddefs.in b/include/builddefs.in
index cb52b99..fcc8b90 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -25,6 +25,7 @@  LIBGDBM = @libgdbm@
 LIBUUID = @libuuid@
 LIBHANDLE = @libhdl@
 LIBDM = @libdm@
+LIBPTHREAD = @libpthread@
 LIBTEST = $(TOPDIR)/lib/libtest.la
 prefix = @prefix@
 
diff --git a/src/Makefile b/src/Makefile
index b8aff49..e9419bd 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,7 +23,7 @@  LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
 	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
 	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
 	attr-list-by-handle-cursor-test listxattr dio-interleaved t_dir_type \
-	dio-invalidate-cache stat_test t_encrypted_d_revalidate
+	dio-invalidate-cache stat_test t_encrypted_d_revalidate diotest
 
 SUBDIRS =
 
diff --git a/src/diotest.c b/src/diotest.c
new file mode 100644
index 0000000..7d2378f
--- /dev/null
+++ b/src/diotest.c
@@ -0,0 +1,166 @@ 
+/*
+ * Copyright (c) 2013 Alibaba Group.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+/*
+ * This is a normal case that we do some append dio writes and meanwhile
+ * we do some dio reads.  Currently in vfs we don't ensure that i_size
+ * is updated properly.  Hence the reader will read some data with '0'.
+ * But we expect that the reader should read nothing or data with 'a'.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <errno.h>
+
+#include <pthread.h>
+
+static char *prog;
+
+struct writer_data {
+	int fd;
+	size_t blksize;
+	char *buf;
+};
+
+static void usage(void)
+{
+	fprintf(stderr, "usage: %s [FILE]\n", prog);
+}
+
+static void *writer(void *arg)
+{
+	struct writer_data *data = (struct writer_data *)arg;
+	int ret;
+
+	ret = write(data->fd, data->buf, data->blksize);
+	if (ret < 0)
+		fprintf(stderr, "write file failed: %s\n", strerror(errno));
+
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	pthread_t tid;
+	struct writer_data wdata;
+	size_t max_blocks = 128;		/* 128 */
+	size_t blksize = 1 * 1024 * 1024;	/* 1M */
+	char *rbuf = NULL, *wbuf = NULL;
+	int rfd = 0, wfd = 0;
+	int i, j;
+	int ret = 0;
+
+	prog = basename(argv[0]);
+
+	if (argc != 2) {
+		usage();
+		exit(1);
+	}
+
+	wfd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
+	if (wfd < 0) {
+		fprintf(stderr, "failed to open write file: %s\n",
+			strerror(errno));
+		exit(1);
+	}
+
+	rfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
+	if (wfd < 0) {
+		fprintf(stderr, "failed to open read file: %s\n",
+			strerror(errno));
+		ret = 1;
+		goto err;
+	}
+
+	/*
+	 * We set 1024 as an alignment size for write buf.  Feel free to change
+	 * it with 4096.  But the problem is also hitted.
+	 */
+	if (posix_memalign((void **)&wbuf, 1024, blksize)) {
+		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
+		ret = 1;
+		goto err;
+	}
+
+	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
+		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
+		ret = 1;
+		goto err;
+	}
+
+	memset(wbuf, 'a', blksize);
+	wdata.fd = wfd;
+	wdata.blksize = blksize;
+	wdata.buf = wbuf;
+
+	for (i = 0; i < max_blocks; i++) {
+		void *retval;
+
+		if (pthread_create(&tid, NULL, writer, &wdata)) {
+			fprintf(stderr, "create thread failed: %s\n",
+				strerror(errno));
+			ret = 1;
+			goto err;
+		}
+
+		memset(rbuf, 'b', blksize);
+		do {
+			ret = pread(rfd, rbuf, blksize, i * blksize);
+			if (ret < 0)
+				fprintf(stderr, "read file failed: %s\n",
+					strerror(errno));
+		} while (ret <= 0);
+
+		if (pthread_join(tid, &retval)) {
+			fprintf(stderr, " pthread join failed: %s\n",
+				strerror(errno));
+			ret = 1;
+			goto err;
+		}
+
+		if (ret >= 0) {
+			for (j = 0; j < ret; j ++) {
+				if (rbuf[j] != 'a') {
+					fprintf(stderr, "encounter an error: "
+						"offset %d content %c\n",
+						i, rbuf[j]);
+					ret = 1;
+					goto err;
+				}
+			}
+		}
+	}
+
+err:
+	if (rfd)
+		close(rfd);
+	if (wfd)
+		close(wfd);
+	if (rbuf)
+		free(rbuf);
+	if (wbuf)
+		free(wbuf);
+
+	return ret;
+}
diff --git a/tests/generic/450 b/tests/generic/450
new file mode 100755
index 0000000..cfb424c
--- /dev/null
+++ b/tests/generic/450
@@ -0,0 +1,56 @@ 
+#! /bin/bash
+# FS QA Test No. 450
+#
+# Test i_size is updated properly under dio read/write
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2013 Alibaba Group.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f $tmp.* $testfile
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+testfile=$TEST_DIR/$seq.$$
+
+[ -x $here/src/diotest ] || _notrun "diotest not built"
+
+$here/src/diotest $testfile # > $seqres.full 2>&1 ||
+	# _fail "i_size isn't update properly!"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/450.out b/tests/generic/450.out
new file mode 100644
index 0000000..734761a
--- /dev/null
+++ b/tests/generic/450.out
@@ -0,0 +1 @@ 
+QA output created by 450
diff --git a/tests/generic/group b/tests/generic/group
index b9cd0e8..a555fa0 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -452,3 +452,4 @@ 
 447 auto quick clone
 448 auto quick rw
 449 auto quick acl enospc
+450 auto rw quick