Message ID | 20240719140907.1598372-2-dhowells@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | cifs: Miscellaneous fixes and a trace point | expand |
On 7/19/2024 10:09 AM, David Howells wrote: > When a subrequest is marked for needing retry, netfs will call > cifs_prepare_write() which will make cifs repick the server for the op > before renegotiating credits; it then calls cifs_issue_write() which > invokes smb2_async_writev() - which re-repicks the server. > > If a different server is then selected, this causes the increment of > server->in_flight to happen against one record and the decrement to happen > against another, leading to misaccounting. > > Fix this by just removing the repick code in smb2_async_writev(). As this > is only called from netfslib-driven code, cifs_prepare_write() should > always have been called first, and so server should never be NULL and the > preparatory step is repeated in the event that we do a retry. > > The problem manifests as a warning looking something like: > > WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] > ... > RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] > ... > smb2_writev_callback+0x334/0x560 [cifs] > cifs_demultiplex_thread+0x77a/0x11b0 [cifs] > kthread+0x187/0x1d0 > ret_from_fork+0x34/0x60 > ret_from_fork_asm+0x1a/0x30 > > Which may be triggered by a number of different xfstests running against an > Azure server in multichannel mode. generic/249 seems the most repeatable, > but generic/215, generic/249 and generic/308 may also show it. Nice fix, and good explanation. So, is this the negative-credits issue we've been looking to fix? Or just one instance? I'm very curious why it manifested when testing with Azure Files. Do connection errors disappear with the fix, or do they still occur but are now recoverable? Feel free to add... Acked-by: Tom Talpey <tom@talpey.com> Tom. > Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") > Reported-by: Steve French <sfrench@samba.org> > Signed-off-by: David Howells <dhowells@redhat.com> > Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> > cc: Jeff Layton <jlayton@kernel.org> > cc: Aurelien Aptel <aaptel@suse.com> > cc: linux-cifs@vger.kernel.org > cc: netfs@lists.linux.dev > cc: linux-fsdevel@vger.kernel.org > --- > fs/smb/client/smb2pdu.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c > index 2ae2dbb6202b..bb84a89e5905 100644 > --- a/fs/smb/client/smb2pdu.c > +++ b/fs/smb/client/smb2pdu.c > @@ -4859,9 +4859,6 @@ smb2_async_writev(struct cifs_io_subrequest *wdata) > struct cifs_io_parms *io_parms = NULL; > int credit_request; > > - if (!wdata->server || test_bit(NETFS_SREQ_RETRYING, &wdata->subreq.flags)) > - server = wdata->server = cifs_pick_channel(tcon->ses); > - > /* > * in future we may get cifs_io_parms passed in from the caller, > * but for now we construct it here... > > >
On Fri, Jul 19, 2024 at 9:27 AM Tom Talpey <tom@talpey.com> wrote: > > On 7/19/2024 10:09 AM, David Howells wrote: > > When a subrequest is marked for needing retry, netfs will call > > cifs_prepare_write() which will make cifs repick the server for the op > > before renegotiating credits; it then calls cifs_issue_write() which > > invokes smb2_async_writev() - which re-repicks the server. > > > > If a different server is then selected, this causes the increment of > > server->in_flight to happen against one record and the decrement to happen > > against another, leading to misaccounting. > > > > Fix this by just removing the repick code in smb2_async_writev(). As this > > is only called from netfslib-driven code, cifs_prepare_write() should > > always have been called first, and so server should never be NULL and the > > preparatory step is repeated in the event that we do a retry. > > > > The problem manifests as a warning looking something like: > > > > WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] > > ... > > RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] > > ... > > smb2_writev_callback+0x334/0x560 [cifs] > > cifs_demultiplex_thread+0x77a/0x11b0 [cifs] > > kthread+0x187/0x1d0 > > ret_from_fork+0x34/0x60 > > ret_from_fork_asm+0x1a/0x30 > > > > Which may be triggered by a number of different xfstests running against an > > Azure server in multichannel mode. generic/249 seems the most repeatable, > > but generic/215, generic/249 and generic/308 may also show it. > > Nice fix, and good explanation. So, is this the negative-credits issue > we've been looking to fix? Or just one instance? Yes it fixes the only recent crediting issues that I had been seeing. > Feel free to add... > > Acked-by: Tom Talpey <tom@talpey.com> Done
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 2ae2dbb6202b..bb84a89e5905 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -4859,9 +4859,6 @@ smb2_async_writev(struct cifs_io_subrequest *wdata) struct cifs_io_parms *io_parms = NULL; int credit_request; - if (!wdata->server || test_bit(NETFS_SREQ_RETRYING, &wdata->subreq.flags)) - server = wdata->server = cifs_pick_channel(tcon->ses); - /* * in future we may get cifs_io_parms passed in from the caller, * but for now we construct it here...