
netfs: Fix kernel async DIO

Message ID 286638.1736163444@warthog.procyon.org.uk (mailing list archive)
State: New
Series: netfs: Fix kernel async DIO

Commit Message

David Howells Jan. 6, 2025, 11:37 a.m. UTC
Hi Nicolas,

Does the attached fix your problem?

David
---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

        mount //my/cifs/share /foo
        dd if=/dev/zero of=/foo/m0 bs=4K count=1K
        losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
        echo hello >/dev/loop2046

This causes the following to appear in syslog:

        WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
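
For context, the kernel-initiated path that trips this warning looks roughly like the sketch below. It is not taken from the loop driver; the example_* names are hypothetical, but it shows how a kernel caller builds an ITER_BVEC iov_iter and issues an async DIO write with ki_complete set, which is why user_backed_iter() is false for it and why the bio_vec array has to stay alive until completion.

/*
 * Minimal sketch of a kernel caller issuing an async DIO write backed by a
 * bio_vec[] (roughly what the loop driver does with --direct-io=on).  The
 * example_* names are made up; only the iov_iter/kiocb APIs are real.
 */
#include <linux/bvec.h>
#include <linux/fs.h>
#include <linux/printk.h>
#include <linux/uio.h>

static void example_write_done(struct kiocb *iocb, long ret)
{
	/* The caller's bio_vec array must still be valid when this runs. */
	pr_info("async DIO write completed: %ld\n", ret);
}

static ssize_t example_async_bvec_write(struct file *file, struct bio_vec *bv,
					unsigned int nr_segs, size_t len,
					loff_t pos, struct kiocb *iocb)
{
	struct iov_iter iter;

	/* Build a kernel-backed iterator: user_backed_iter() is false here. */
	iov_iter_bvec(&iter, ITER_SOURCE, bv, nr_segs, len);

	iocb->ki_filp     = file;
	iocb->ki_pos      = pos;
	iocb->ki_flags    = IOCB_DIRECT;
	iocb->ki_complete = example_write_done;	/* async: no waiting here */

	/*
	 * With the fix, netfs copies this iterator as-is instead of passing
	 * it to netfs_extract_user_iter(), which only accepts UBUF/IOVEC
	 * iterators.  The caller must keep @bv alive until
	 * example_write_done() runs.
	 */
	return call_write_iter(file, iocb, &iter);
}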

Comments

Nicolas Baranger Jan. 6, 2025, 12:07 p.m. UTC | #1
Hi David

Thanks for the work!
I will build Linux 6.10 and mainline with the provided change, and I'll come
back here as soon as I have test results (CET working hours).

Thanks again for the help with this issue.
Nicolas

Christoph Hellwig Jan. 6, 2025, 3:34 p.m. UTC | #2
On Mon, Jan 06, 2025 at 11:37:24AM +0000, David Howells wrote:
>         mount //my/cifs/share /foo
>         dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>         losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>         echo hello >/dev/loop2046

Can you add a test case to xfstests using losetup --direct-io with a file on
$TEST_DIR, so that we get coverage for ITER_BVEC direct I/O?
Nicolas Baranger Jan. 7, 2025, 8:26 a.m. UTC | #3
Hi David

As your patch was written on top of linux-next, I had to make some small
modifications to get it working on mainline (6.13-rc6).
The following patch works fine for me on mainline, but I think it would be
better to wait for your confirmation/validation (or a new patch) before
applying it in production.

#-------- PATCH --------#

diff --git a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c b/linux-6.13-rc6/fs/netfs/direct_write.c
index 88f2adf..94a1ee8 100644
--- a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
+++ b/linux-6.13-rc6/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		 * allocate a sufficiently large bvec array and may shorten the
 		 * request.
 		 */
-		if (async || user_backed_iter(iter)) {
+		if (user_backed_iter(iter)) {
 			n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
 			if (n < 0) {
 				ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 		} else {
+			/* If this is a kernel-generated async DIO request,
+			 * assume that any resources the iterator points to
+			 * (eg. a bio_vec array) will persist till the end of
+			 * the op.
+			 */
 			wreq->iter = *iter;
 		}

#-------- TESTS --------#

With this patch, Linux 6.13-rc6 builds with no errors and '--direct-io=on'
works:


18:38:47 root@deb12-lab-10d:~# uname -a
Linux deb12-lab-10d.lab.lan 6.13.0-rc6-amd64 #0 SMP PREEMPT_DYNAMIC Mon Jan  6 18:14:07 CET 2025 x86_64 GNU/Linux

18:39:29 root@deb12-lab-10d:~# losetup
NAME          SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                        DIO LOG-SEC
/dev/loop2046         0      0         0  0 /mnt/FBX24T/FS-LAN/bckcrypt2046    1    4096

18:39:32 root@deb12-lab-10d:~# dmsetup ls | grep bckcrypt
bckcrypt    (254:7)

18:39:55 root@deb12-lab-10d:~# cryptsetup status bckcrypt
/dev/mapper/bckcrypt is active and is in use.
   type:    LUKS2
   cipher:  aes-xts-plain64
   keysize: 512 bits
   key location: keyring
   device:  /dev/loop2046
   loop:    /mnt/FBX24T/FS-LAN/bckcrypt2046
   sector size:  512
   offset:  32768 sectors
   size:    8589901824 sectors
   mode:    read/write

18:40:36 root@deb12-lab-10d:~# df -h | egrep 'cifs|bckcrypt'
//10.0.10.100/FBX24T      cifs        22T     13T  9,0T  60% /mnt/FBX24T
/dev/mapper/bckcrypt      btrfs      4,0T    3,3T  779G  82% /mnt/bckcrypt


09:08:44 root@deb12-lab-10d:~# LANG=en_US.UTF-8
09:08:46 root@deb12-lab-10d:~# dd if=/dev/zero of=/mnt/bckcrypt/test/test.dd bs=256M count=16 oflag=direct status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14 s, 302 MB/s
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14.2061 s, 302 MB/s



No write errors using the '--direct-io=on' option of losetup with this patch,
and writing to the back-file is more than 20x faster.
It seems to be OK!

Let me know if something is wrong with this patch or if it can safely be
used in production.

Thanks again to everyone for the help.
Nicolas



Paulo Alcantara Jan. 7, 2025, 12:03 p.m. UTC | #4

LGTM.  Feel free to add:

Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>

Thanks Christoph and Dave!
David Howells Jan. 7, 2025, 2:49 p.m. UTC | #5
Thanks!

I ported the patch to linus/master (see below) and it looks pretty much the
same as yours, give or take tabs getting converted to spaces.

Could I put you down as a Tested-by?

David

---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

        mount //my/cifs/share /foo
        dd if=/dev/zero of=/foo/m0 bs=4K count=1K
        losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
        echo hello >/dev/loop2046

This causes the following to appear in syslog:

        WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 173e8b5e6a93..f9421f3e6d37 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		 * allocate a sufficiently large bvec array and may shorten the
 		 * request.
 		 */
-		if (async || user_backed_iter(iter)) {
+		if (user_backed_iter(iter)) {
 			n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
 			if (n < 0) {
 				ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 		} else {
+			/* If this is a kernel-generated async DIO request,
+			 * assume that any resources the iterator points to
+			 * (eg. a bio_vec array) will persist till the end of
+			 * the op.
+			 */
 			wreq->iter = *iter;
 		}
Nicolas Baranger Jan. 7, 2025, 6:08 p.m. UTC | #6
Hi David

Sure you can!

Please also note that after building linux-next with the first patch you
provided applied, I successfully tested DIO writes (same test process as
before). It works fine too!

I remain available for further testing.

Thanks again for the help (special thanks to Christoph and David).
Nicolas




Patch

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index eded8afaa60b..42ce53cc216e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@  ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		 * allocate a sufficiently large bvec array and may shorten the
 		 * request.
 		 */
-		if (async || user_backed_iter(iter)) {
+		if (user_backed_iter(iter)) {
 			n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
 			if (n < 0) {
 				ret = n;
@@ -77,6 +77,11 @@  ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 		} else {
+			/* If this is a kernel-generated async DIO request,
+			 * assume that any resources the iterator points to
+			 * (eg. a bio_vec array) will persist till the end of
+			 * the op.
+			 */
 			wreq->buffer.iter = *iter;
 		}
 	}
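
To recap why dropping the async check is safe: user_backed_iter() reports true only for userspace-backed iterators (ITER_UBUF and ITER_IOVEC), so kernel-built bio_vec, kvec and folio_queue iterators now take the plain-copy path. A minimal sketch of the distinction the fixed branch relies on (the example_ name is hypothetical):

#include <linux/uio.h>

/*
 * Only user-backed iterators need extraction/pinning via
 * netfs_extract_user_iter().  Kernel-backed iterators (ITER_BVEC,
 * ITER_KVEC, ITER_FOLIOQ) are copied into the request as-is, with the
 * caller keeping the underlying array alive until the op completes.
 */
static bool example_needs_user_extraction(const struct iov_iter *iter)
{
	return user_backed_iter(iter);	/* true only for ITER_UBUF/ITER_IOVEC */
}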