Message ID | 20250212073440.12538-1-sprasad@microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | None | expand |
tentatively merged into cifs-2.6.git for-next pending more reviews and testing On Wed, Feb 12, 2025 at 1:35 AM <nspmangalore@gmail.com> wrote: > > From: Shyam Prasad N <sprasad@microsoft.com> > > Our current approach to select a channel for sending requests is this: > 1. iterate all channels to find the min and max queue depth > 2. if min and max are not the same, pick the channel with min depth > 3. if min and max are same, round robin, as all channels are equally loaded > > The problem with this approach is that there's a lag between selecting > a channel and sending the request (that increases the queue depth on the channel). > While these numbers will eventually catch up, there could be a skew in the > channel usage, depending on the application's I/O parallelism and the server's > speed of handling requests. > > With sufficient parallelism, this lag can artificially increase the queue depth, > thereby impacting the performance negatively. > > This change will change the step 1 above to start the iteration from the last > selected channel. This is to reduce the skew in channel usage even in the presence > of this lag. > > Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests") > Cc: <stable@vger.kernel.org> > Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> > --- > fs/smb/client/transport.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c > index 0dc80959ce48..e2fbf8b18eb2 100644 > --- a/fs/smb/client/transport.c > +++ b/fs/smb/client/transport.c > @@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) > uint index = 0; > unsigned int min_in_flight = UINT_MAX, max_in_flight = 0; > struct TCP_Server_Info *server = NULL; > - int i; > + int i, start, cur; > > if (!ses) > return NULL; > > spin_lock(&ses->chan_lock); > + start = atomic_inc_return(&ses->chan_seq); > for (i = 0; i < ses->chan_count; i++) { > - server = ses->chans[i].server; > + cur = (start + i) % ses->chan_count; > + server = ses->chans[cur].server; > if (!server || server->terminate) > continue; > > @@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) > */ > if (server->in_flight < min_in_flight) { > min_in_flight = server->in_flight; > - index = i; > + index = cur; > } > if (server->in_flight > max_in_flight) > max_in_flight = server->in_flight; > } > > /* if all channels are equally loaded, fall back to round-robin */ > - if (min_in_flight == max_in_flight) { > - index = (uint)atomic_inc_return(&ses->chan_seq); > - index %= ses->chan_count; > - } > + if (min_in_flight == max_in_flight) > + index = (uint)start % ses->chan_count; > > server = ses->chans[index].server; > spin_unlock(&ses->chan_lock); > -- > 2.43.0 >
Shyam, Could the two test failures in: http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/3/builds/386 be related to this patch ie tests generic/133 and generic/471 failing with hundreds of "CIFS: Status code returned 0xc000009a STATUS_INSUFFICIENT_RESOURCES" errors. The only failing tests were multichannel related. This test run was with these six patches: 600ed21fe802 (HEAD -> for-next, origin/for-next, origin/HEAD) netfs: Fix a number of read-retry hangs 9f75ff5536b1 smb: client, common: Avoid multiple -Wflex-array-member-not-at-end warnings fab0eddb9fe7 cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes 69476da76b9c cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point() ef590eae88cf cifs: deal with the channel loading lag while picking channels f1bf10d7e909 cifs: pick channels for individual subrequests Anyone else seeing the same errors with multichannel on these tests? be related On Wed, Feb 12, 2025 at 2:35 PM Steve French <smfrench@gmail.com> wrote: > > tentatively merged into cifs-2.6.git for-next pending more reviews and testing > > On Wed, Feb 12, 2025 at 1:35 AM <nspmangalore@gmail.com> wrote: > > > > From: Shyam Prasad N <sprasad@microsoft.com> > > > > Our current approach to select a channel for sending requests is this: > > 1. iterate all channels to find the min and max queue depth > > 2. if min and max are not the same, pick the channel with min depth > > 3. if min and max are same, round robin, as all channels are equally loaded > > > > The problem with this approach is that there's a lag between selecting > > a channel and sending the request (that increases the queue depth on the channel). > > While these numbers will eventually catch up, there could be a skew in the > > channel usage, depending on the application's I/O parallelism and the server's > > speed of handling requests. > > > > With sufficient parallelism, this lag can artificially increase the queue depth, > > thereby impacting the performance negatively. > > > > This change will change the step 1 above to start the iteration from the last > > selected channel. This is to reduce the skew in channel usage even in the presence > > of this lag. > > > > Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests") > > Cc: <stable@vger.kernel.org> > > Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> > > --- > > fs/smb/client/transport.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c > > index 0dc80959ce48..e2fbf8b18eb2 100644 > > --- a/fs/smb/client/transport.c > > +++ b/fs/smb/client/transport.c > > @@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) > > uint index = 0; > > unsigned int min_in_flight = UINT_MAX, max_in_flight = 0; > > struct TCP_Server_Info *server = NULL; > > - int i; > > + int i, start, cur; > > > > if (!ses) > > return NULL; > > > > spin_lock(&ses->chan_lock); > > + start = atomic_inc_return(&ses->chan_seq); > > for (i = 0; i < ses->chan_count; i++) { > > - server = ses->chans[i].server; > > + cur = (start + i) % ses->chan_count; > > + server = ses->chans[cur].server; > > if (!server || server->terminate) > > continue; > > > > @@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) > > */ > > if (server->in_flight < min_in_flight) { > > min_in_flight = server->in_flight; > > - index = i; > > + index = cur; > > } > > if (server->in_flight > max_in_flight) > > max_in_flight = server->in_flight; > > } > > > > /* if all channels are equally loaded, fall back to round-robin */ > > - if (min_in_flight == max_in_flight) { > > - index = (uint)atomic_inc_return(&ses->chan_seq); > > - index %= ses->chan_count; > > - } > > + if (min_in_flight == max_in_flight) > > + index = (uint)start % ses->chan_count; > > > > server = ses->chans[index].server; > > spin_unlock(&ses->chan_lock); > > -- > > 2.43.0 > > > > > -- > Thanks, > > Steve
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index 0dc80959ce48..e2fbf8b18eb2 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) uint index = 0; unsigned int min_in_flight = UINT_MAX, max_in_flight = 0; struct TCP_Server_Info *server = NULL; - int i; + int i, start, cur; if (!ses) return NULL; spin_lock(&ses->chan_lock); + start = atomic_inc_return(&ses->chan_seq); for (i = 0; i < ses->chan_count; i++) { - server = ses->chans[i].server; + cur = (start + i) % ses->chan_count; + server = ses->chans[cur].server; if (!server || server->terminate) continue; @@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) */ if (server->in_flight < min_in_flight) { min_in_flight = server->in_flight; - index = i; + index = cur; } if (server->in_flight > max_in_flight) max_in_flight = server->in_flight; } /* if all channels are equally loaded, fall back to round-robin */ - if (min_in_flight == max_in_flight) { - index = (uint)atomic_inc_return(&ses->chan_seq); - index %= ses->chan_count; - } + if (min_in_flight == max_in_flight) + index = (uint)start % ses->chan_count; server = ses->chans[index].server; spin_unlock(&ses->chan_lock);