Message ID | 20250210191118.3444416-1-max.kellermann@ionos.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | fs/netfs/read_collect: add to next->prev_donated | expand |
Max Kellermann <max.kellermann@ionos.com> wrote: > David, this seems to fix the bug for me. Please also check if we need > a "donation_changed" check. Shouldn't do since at that point we've been holding rreq->lock since the last check. > If multiple subrequests donate data to the same "next" request > (depending on the subrequest completion order), each of them would > overwrite the `prev_donated` field, causing data corruption and a > BUG() crash ("Can't donate prior to front"). > > Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") > Closes: https://lore.kernel.org/netfs/CAKPOu+_4mUwYgQtRTbXCmi+-k3PGvLysnPadkmHOyB7Gz0iSMA@mail.gmail.com/ > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com>
On Fri, Feb 14, 2025 at 1:47 PM David Howells <dhowells@redhat.com> wrote:
> Signed-off-by: David Howells <dhowells@redhat.com>
Greg, you merged my other two netfs fixes into 6.13.3, but omitted
this one. Did you miss it, or was there another reason?
On Thu, Feb 20, 2025 at 02:09:17PM +0100, Max Kellermann wrote: > On Fri, Feb 14, 2025 at 1:47 PM David Howells <dhowells@redhat.com> wrote: > > Signed-off-by: David Howells <dhowells@redhat.com> > > Greg, you merged my other two netfs fixes into 6.13.3, but omitted > this one. Did you miss it, or was there another reason? It wasn't sent to me or to the stable list, so how could have I seen it? confused, greg k-h
On Thu, Feb 20, 2025 at 3:17 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> It wasn't sent to me or to the stable list, so how could have I seen it?
Oh, of course, I forgot to add stable. How shall we proceed? Do you
want me to resend to you with David's Signed-off-by?
On Thu, Feb 20, 2025 at 04:00:49PM +0100, Max Kellermann wrote: > On Thu, Feb 20, 2025 at 3:17 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > It wasn't sent to me or to the stable list, so how could have I seen it? > > Oh, of course, I forgot to add stable. How shall we proceed? Do you > want me to resend to you with David's Signed-off-by? Please do, and cc: stable. thanks, greg k-h
Hi (btw side question from a bystander looking a the bug) On Thu, Feb 20, 2025 at 04:00:49PM +0100, Max Kellermann wrote: > On Thu, Feb 20, 2025 at 3:17 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > It wasn't sent to me or to the stable list, so how could have I seen it? > > Oh, of course, I forgot to add stable. How shall we proceed? Do you > want me to resend to you with David's Signed-off-by? Do I see it correctly that this will be 6.12.y and 6.13.y specific backports since the code in mainline changed substantially with e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") and so your change does not apply there anymore? I'm asking since we got a bug report in Debian which seems to idicate it might have the same root cause as you report, and it is at https://bugs.debian.org/1098698 . Regards, Salvatore
On Sat, Mar 1, 2025 at 3:17 PM Salvatore Bonaccorso <carnil@debian.org> wrote: > Do I see it correctly that this will be 6.12.y and 6.13.y specific > backports since the code in mainline changed substantially with > e2d46f2ec332 ("netfs: Change the read result collector to only use one > work item") and so your change does not apply there anymore? Correct. > I'm asking since we got a bug report in Debian which seems to idicate > it might have the same root cause as you report, and it is at > https://bugs.debian.org/1098698 . This indeed looks similar. Note that this is one of four netfs crash fixes I posted recently. Two other fixes have been released with 6.13.4 already: https://lore.kernel.org/netfs/20250211093432.3524035-1-max.kellermann@ionos.com/ https://lore.kernel.org/netfs/20250210223144.3481766-1-max.kellermann@ionos.com/ The fourth bug (that got no reviews so far) is here (this one affects master as well): https://lore.kernel.org/netfs/20250228123602.2140459-1-max.kellermann@ionos.com/ Max
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index 0d95cdbe5611..681b630b4f06 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -284,7 +284,7 @@ static bool netfs_consume_read_data(struct netfs_io_subrequest *subreq, bool was netfs_trace_donate_to_deferred_next); } else { next = list_next_entry(subreq, rreq_link); - WRITE_ONCE(next->prev_donated, excess); + WRITE_ONCE(next->prev_donated, next->prev_donated + excess); trace_netfs_donate(rreq, subreq, next, excess, netfs_trace_donate_to_next); }
If multiple subrequests donate data to the same "next" request (depending on the subrequest completion order), each of them would overwrite the `prev_donated` field, causing data corruption and a BUG() crash ("Can't donate prior to front"). Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Closes: https://lore.kernel.org/netfs/CAKPOu+_4mUwYgQtRTbXCmi+-k3PGvLysnPadkmHOyB7Gz0iSMA@mail.gmail.com/ Signed-off-by: Max Kellermann <max.kellermann@ionos.com> --- David, this seems to fix the bug for me. Please also check if we need a "donation_changed" check. --- fs/netfs/read_collect.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)