diff mbox series

fs/netfs/read_collect: add to next->prev_donated

Message ID 20250210191118.3444416-1-max.kellermann@ionos.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series fs/netfs/read_collect: add to next->prev_donated | expand

Commit Message

Max Kellermann Feb. 10, 2025, 7:11 p.m. UTC
If multiple subrequests donate data to the same "next" request
(depending on the subrequest completion order), each of them would
overwrite the `prev_donated` field, causing data corruption and a
BUG() crash ("Can't donate prior to front").

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Closes: https://lore.kernel.org/netfs/CAKPOu+_4mUwYgQtRTbXCmi+-k3PGvLysnPadkmHOyB7Gz0iSMA@mail.gmail.com/
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
David, this seems to fix the bug for me.  Please also check if we need
a "donation_changed" check.
---
 fs/netfs/read_collect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Howells Feb. 14, 2025, 12:47 p.m. UTC | #1
Max Kellermann <max.kellermann@ionos.com> wrote:

> David, this seems to fix the bug for me.  Please also check if we need
> a "donation_changed" check.

Shouldn't do since at that point we've been holding rreq->lock since the last
check.

> If multiple subrequests donate data to the same "next" request
> (depending on the subrequest completion order), each of them would
> overwrite the `prev_donated` field, causing data corruption and a
> BUG() crash ("Can't donate prior to front").
> 
> Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
> Closes: https://lore.kernel.org/netfs/CAKPOu+_4mUwYgQtRTbXCmi+-k3PGvLysnPadkmHOyB7Gz0iSMA@mail.gmail.com/
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Max Kellermann Feb. 20, 2025, 1:09 p.m. UTC | #2
On Fri, Feb 14, 2025 at 1:47 PM David Howells <dhowells@redhat.com> wrote:
> Signed-off-by: David Howells <dhowells@redhat.com>

Greg, you merged my other two netfs fixes into 6.13.3, but omitted
this one. Did you miss it, or was there another reason?
Greg KH Feb. 20, 2025, 2:17 p.m. UTC | #3
On Thu, Feb 20, 2025 at 02:09:17PM +0100, Max Kellermann wrote:
> On Fri, Feb 14, 2025 at 1:47 PM David Howells <dhowells@redhat.com> wrote:
> > Signed-off-by: David Howells <dhowells@redhat.com>
> 
> Greg, you merged my other two netfs fixes into 6.13.3, but omitted
> this one. Did you miss it, or was there another reason?

It wasn't sent to me or to the stable list, so how could have I seen it?

confused,

greg k-h
Max Kellermann Feb. 20, 2025, 3 p.m. UTC | #4
On Thu, Feb 20, 2025 at 3:17 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> It wasn't sent to me or to the stable list, so how could have I seen it?

Oh, of course, I forgot to add stable. How shall we proceed? Do you
want me to resend to you with David's Signed-off-by?
Greg KH Feb. 20, 2025, 3:10 p.m. UTC | #5
On Thu, Feb 20, 2025 at 04:00:49PM +0100, Max Kellermann wrote:
> On Thu, Feb 20, 2025 at 3:17 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > It wasn't sent to me or to the stable list, so how could have I seen it?
> 
> Oh, of course, I forgot to add stable. How shall we proceed? Do you
> want me to resend to you with David's Signed-off-by?

Please do, and cc: stable.

thanks,

greg k-h
Salvatore Bonaccorso March 1, 2025, 2:17 p.m. UTC | #6
Hi

(btw side question from a bystander looking a the bug)

On Thu, Feb 20, 2025 at 04:00:49PM +0100, Max Kellermann wrote:
> On Thu, Feb 20, 2025 at 3:17 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > It wasn't sent to me or to the stable list, so how could have I seen it?
> 
> Oh, of course, I forgot to add stable. How shall we proceed? Do you
> want me to resend to you with David's Signed-off-by?

Do I see it correctly that this will be 6.12.y and 6.13.y specific
backports since the code in mainline changed substantially with
e2d46f2ec332 ("netfs: Change the read result collector to only use one
work item") and so your change does not apply there anymore?

I'm asking since we got a bug report in Debian which seems to idicate
it might have the same root cause as you report, and it is at
https://bugs.debian.org/1098698 .

Regards,
Salvatore
Max Kellermann March 1, 2025, 4:51 p.m. UTC | #7
On Sat, Mar 1, 2025 at 3:17 PM Salvatore Bonaccorso <carnil@debian.org> wrote:
> Do I see it correctly that this will be 6.12.y and 6.13.y specific
> backports since the code in mainline changed substantially with
> e2d46f2ec332 ("netfs: Change the read result collector to only use one
> work item") and so your change does not apply there anymore?

Correct.

> I'm asking since we got a bug report in Debian which seems to idicate
> it might have the same root cause as you report, and it is at
> https://bugs.debian.org/1098698 .

This indeed looks similar.

Note that this is one of four netfs crash fixes I posted recently.
Two other fixes have been released with 6.13.4 already:
https://lore.kernel.org/netfs/20250211093432.3524035-1-max.kellermann@ionos.com/
https://lore.kernel.org/netfs/20250210223144.3481766-1-max.kellermann@ionos.com/
The fourth bug (that got no reviews so far) is here (this one affects
master as well):
https://lore.kernel.org/netfs/20250228123602.2140459-1-max.kellermann@ionos.com/

Max
diff mbox series

Patch

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 0d95cdbe5611..681b630b4f06 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -284,7 +284,7 @@  static bool netfs_consume_read_data(struct netfs_io_subrequest *subreq, bool was
 				   netfs_trace_donate_to_deferred_next);
 	} else {
 		next = list_next_entry(subreq, rreq_link);
-		WRITE_ONCE(next->prev_donated, excess);
+		WRITE_ONCE(next->prev_donated, next->prev_donated + excess);
 		trace_netfs_donate(rreq, subreq, next, excess,
 				   netfs_trace_donate_to_next);
 	}