
[08/11] cifs: distribute channels across interfaces based on speed

Message ID 20230310153211.10982-8-sprasad@microsoft.com (mailing list archive)
State New, archived
Series [01/11] cifs: fix tcon status change after tree connect

Commit Message

Shyam Prasad N March 10, 2023, 3:32 p.m. UTC
Today, if the server's interfaces are RSS capable, we simply
choose the fastest interface to set up a channel. This is not
a scalable approach, and it makes little attempt to distribute
the connections.

This change does a weighted distribution of channels across
all the available server interfaces, where the weight is
a function of the advertised interface speed.

Also make sure that we don't mix rdma and non-rdma for channels.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/cifs/cifs_debug.c | 16 +++++++++++
 fs/cifs/cifsglob.h   |  2 ++
 fs/cifs/sess.c       | 67 +++++++++++++++++++++++++++++++++++---------
 3 files changed, 71 insertions(+), 14 deletions(-)
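
For illustration, the per-interface weight computed in the patch below works
out to iface->speed / min_advertised_speed, and an interface stops receiving
new channels once it has been assigned that many in the current round. Here
is a small standalone model of that loop (not kernel code; the interface
names and speeds are made up, and locking/refcounting are omitted):

#include <stdio.h>
#include <stddef.h>

struct iface_model {
	const char *name;
	size_t speed;            /* advertised link speed */
	size_t weight_fulfilled; /* channels assigned in the current round */
};

int main(void)
{
	/* sorted fastest first, as the kernel keeps iface_list */
	struct iface_model ifaces[] = {
		{ "eth0", 2000, 0 },
		{ "eth1", 1000, 0 },
	};
	size_t n = sizeof(ifaces) / sizeof(ifaces[0]);
	size_t min_speed = ifaces[n - 1].speed; /* slowest entry is last */
	int channels_left = 6;

	while (channels_left > 0) {
		int placed = 0;
		size_t i;

		for (i = 0; i < n; i++) {
			size_t weight = ifaces[i].speed / min_speed;

			if (ifaces[i].weight_fulfilled >= weight)
				continue; /* this iface has its share for now */
			ifaces[i].weight_fulfilled++;
			channels_left--;
			placed = 1;
			printf("channel -> %s\n", ifaces[i].name);
			break; /* one channel per pass, like the loop in sess.c */
		}
		/* end of list reached with nothing placed: reset the round */
		if (!placed) {
			for (i = 0; i < n; i++)
				ifaces[i].weight_fulfilled = 0;
		}
	}
	return 0;
}

With the speeds above, eth0 (weight 2) ends up with roughly twice as many
channels as eth1 (weight 1).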

Comments

Jan Čermák Feb. 27, 2024, 11:16 a.m. UTC | #1
Hi Shyam, everyone,

On 10. 03. 23 16:32, Shyam Prasad N wrote:

> +	if (!ses->iface_count) {
> +		spin_unlock(&ses->iface_lock);
> +		cifs_dbg(VFS, "server %s does not advertise interfaces\n", ses->server->hostname);
> +		return 0;
> +	}

May I ask why this is now being logged, and what can be tweaked in the 
case that a server does not advertise interfaces? After updating the 
kernel from 6.1 to 6.6 in Home Assistant OS, we got a report [1] of 
these messages appearing, yet so far only from a single attentive user. 
I wonder if we are going to see them more often, and in that case a 
suggestion for users would come in handy. If there's not a simple 
resolution, could the verbosity be lowered to FYI, maybe?

Thanks,
Jan


[1] https://github.com/home-assistant/operating-system/issues/3201
Shyam Prasad N Feb. 27, 2024, 4:17 p.m. UTC | #2
On Tue, Feb 27, 2024 at 4:46 PM Jan Čermák <sairon@sairon.cz> wrote:
>
> Hi Shyam, everyone,
>
> On 10. 03. 23 16:32, Shyam Prasad N wrote:
>
> > +     if (!ses->iface_count) {
> > +             spin_unlock(&ses->iface_lock);
> > +             cifs_dbg(VFS, "server %s does not advertise interfaces\n", ses->server->hostname);
> > +             return 0;
> > +     }
>
> May I ask why this is now being logged, and what can be tweaked in the
> case that a server does not advertise interfaces? After updating the
> kernel from 6.1 to 6.6 in Home Assistant OS, we got a report [1] of
> these messages appearing, yet so far only from a single attentive user.
> I wonder if we are going to see them more often, and it that case a
> suggestion to users would come handy. If there's not a simple
> resolution, could the verbosity be lowered to FYI, maybe?
>
> Thanks,
> Jan
>
>
> [1] https://github.com/home-assistant/operating-system/issues/3201

Hi Jan,

These messages (in theory) should not show up unless either the
multichannel or the max_channels mount option is specified.
And if multichannel is enabled, then the server should support the
ioctl that returns the server interface list.
These messages are meant to help the user/developer understand why
multiple channels are not getting established to the server, even
after specifying multichannel as a mount option.
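
For reference, multichannel is opt-in at mount time. With a hypothetical
server and share, a mount requesting it would look something like:

# mount -t cifs //server/share /mnt -o username=user,vers=3.1.1,multichannel,max_channels=4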

The repeating nature of these messages also leads me to believe
that there's something fishy going on here.
Either the network health is not good, or there's some bug at play.
Jan Čermák Feb. 28, 2024, 9:22 a.m. UTC | #3
Hi Shyam,

On 27. 02. 24 17:17, Shyam Prasad N wrote:
> These messages (in theory) should not show up if either multichannel
> or max_channels are not specified mount options.

That shouldn't be the case here. I checked with the user and he's not 
doing anything fishy himself (like interfering with the standard mount 
utilities), and the userspace tools creating the mounts should not be 
setting any of these options, which I confirmed by asking for his mounts 
list:

//192.168.1.12/folder on /mnt/data/supervisor/mounts/folder type cifs 
(rw,relatime,vers=default,cache=strict,username=user,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.12,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
//192.168.1.12/folder on /mnt/data/supervisor/media/folder type cifs 
(rw,relatime,vers=default,cache=strict,username=user,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.12,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)


Or am I missing anything here?

> The repeating nature of these messages here leads me to also believe
> that there's something fishy going on here.
> Either the network health is not good, or that there's some bug at play here.

Maybe, though I'm not able to reproduce the above behavior yet. But 
there has so far been one more report of this happening, so it's not a 
single isolated case. I'd appreciate any advice on what to look at further.

Cheers,
Jan
Shyam Prasad N March 5, 2024, 2:56 p.m. UTC | #4
On Wed, Feb 28, 2024 at 2:52 PM Jan Čermák <sairon@sairon.cz> wrote:
>
> Hi Shyam,
>

Hi Jan,
Apologies for the delay.

> On 27. 02. 24 17:17, Shyam Prasad N wrote:
> > These messages (in theory) should not show up if either multichannel
> > or max_channels are not specified mount options.
>
> That shouldn't be the case here, I checked with the user and he's not
> doing anything fishy himself (like interfering with the standard mount
> utilities), and the userspace tools creating the mounts should not be
> setting any of these options, which I confirmed by asking for his mounts
> list:
>
> //192.168.1.12/folder on /mnt/data/supervisor/mounts/folder type cifs
> (rw,relatime,vers=default,cache=strict,username=user,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.12,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
> //192.168.1.12/folder on /mnt/data/supervisor/media/folder type cifs
> (rw,relatime,vers=default,cache=strict,username=user,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.12,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

Hmmm.. That seems like a bug.
Is there any chance that the user would be willing to check whether the
same bug reproduces with the latest mainline kernel?

The other option is for us to try with the 6.6 kernel. But without
steps to reproduce, it'll just be shots in the dark.
Let me try to go through the code and see if I can spot anything here.

>
>
> Or am I missing anything here?
>
> > The repeating nature of these messages here leads me to also believe
> > that there's something fishy going on here.
> > Either the network health is not good, or that there's some bug at play here.
>
> Maybe, however I'm not able to reproduce the above behavior yet. But
> there's been so far one more report of this happening, so it's not a
> single isolated case. I appreciate any advice what to look at further.

If a user has reproduced this issue, the one thing they can send us is
the ftrace output of cifs events when the issue is being seen.
i.e. something like this:
# trace-cmd start -e cifs
# <now wait for the issue to reproduce>
# trace-cmd stop
# trace-cmd extract > /tmp/outputs.txt
# uname -r >> /tmp/outputs.txt
# cat /proc/fs/cifs/DebugData >> /tmp/outputs.txt
# cat /proc/fs/cifs/Stats >> /tmp/outputs.txt

And then provide the outputs.txt file to us.
Going through that capture can help us understand this better.

>
> Cheers,
> Jan
Paulo Alcantara March 6, 2024, 3:43 p.m. UTC | #5
Shyam Prasad N <nspmangalore@gmail.com> writes:

> On Wed, Feb 28, 2024 at 2:52 PM Jan Čermák <sairon@sairon.cz> wrote:
>>
>> Hi Shyam,
>>
>
> Hi Jan,
> Apologies for the delay.
>
>> On 27. 02. 24 17:17, Shyam Prasad N wrote:
>> > These messages (in theory) should not show up if either multichannel
>> > or max_channels are not specified mount options.
>>
>> That shouldn't be the case here, I checked with the user and he's not
>> doing anything fishy himself (like interfering with the standard mount
>> utilities), and the userspace tools creating the mounts should not be
>> setting any of these options, which I confirmed by asking for his mounts
>> list:
>>
>> //192.168.1.12/folder on /mnt/data/supervisor/mounts/folder type cifs
>> (rw,relatime,vers=default,cache=strict,username=user,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.12,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
>> //192.168.1.12/folder on /mnt/data/supervisor/media/folder type cifs
>> (rw,relatime,vers=default,cache=strict,username=user,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.12,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
>
> Hmmm.. That seems like a bug.

Yes.  I see a couple of issues here:

(1) cifs_chan_update_iface() seems to be called on reconnect for all
dialect versions and for servers that do not support
FSCTL_QUERY_NETWORK_INTERFACE_INFO, so @ses->iface_count will be zero in
some cases and then the VFS message will start flooding dmesg.

(2) __cifs_reconnect() is queueing smb2_reconnect_server() even for SMB1
mounts, so smb2_reconnect() ends up being called and then calls
SMB3_request_interfaces(), because SMB2_GLOBAL_CAP_MULTI_CHANNEL is mixed
with CAP_LARGE_FILES.
Jan Čermák March 11, 2024, 10:01 a.m. UTC | #6
Hi Paulo, Shyam,

On 06. 03. 24 16:43, Paulo Alcantara wrote:
> 
> Yes.  I see a couple of issues here:
> 
> (1) cifs_chan_update_iface() seems to be called over reconnect for all
> dialect versions and servers that do not support
> FSCTL_QUERY_NETWORK_INTERFACE_INFO, so @ses->iface_count will be zero in
> some cases and then VFS message will start being flooded on dmesg.
> 
> (2) __cifs_reconnect() is queueing smb2_reconnect_server() even for SMB1
> mounts, so smb2_reconnect() ends up being called and then call
> SMB3_request_interfaces() because SMB2_GLOBAL_CAP_MULTI_CHANNEL is mixed
> with CAP_LARGE_FILES.

thanks for looking into this! Is there anything else you'll need from me,
or anything for the users who are observing the issue to test, or do you
have enough information for a fix?

Regards,
Jan
Shyam Prasad N March 11, 2024, 11:14 a.m. UTC | #7
On Mon, Mar 11, 2024 at 3:31 PM Jan Čermák <sairon@sairon.cz> wrote:
>
> Hi Paulo, Shyam,
>
> On 06. 03. 24 16:43, Paulo Alcantara wrote:
> >
> > Yes.  I see a couple of issues here:
> >

Thanks for the review, Paulo.

> > (1) cifs_chan_update_iface() seems to be called over reconnect for all
> > dialect versions and servers that do not support
> > FSCTL_QUERY_NETWORK_INTERFACE_INFO, so @ses->iface_count will be zero in
> > some cases and then VFS message will start being flooded on dmesg.

Valid point.

> >
> > (2) __cifs_reconnect() is queueing smb2_reconnect_server() even for SMB1
> > mounts, so smb2_reconnect() ends up being called and then call
> > SMB3_request_interfaces() because SMB2_GLOBAL_CAP_MULTI_CHANNEL is mixed
> > with CAP_LARGE_FILES.

Excellent catch. I did not take this into account.
It is very likely that this is what is going on here.

Let me submit a patch to check the exact dialect before calling these
functions to make sure we only do it for SMB3+.
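
A rough sketch of the kind of guard being described (purely illustrative;
the exact fields, constants and call sites in the actual patch may differ):

	/*
	 * Only SMB 3.0+ negotiates multichannel and the
	 * FSCTL_QUERY_NETWORK_INTERFACE_INFO ioctl, so skip the
	 * interface query for older dialects.
	 */
	if (server->dialect < SMB30_PROT_ID)
		return 0;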

>
> thanks for looking into this! Is there anything else you'll need from me
> or to test by the users who are observing the issue, or do you have
> enough information for a fix?
>
> Regards,
> Jan

Thanks for bringing this to our notice, Jan.
Jan Čermák March 12, 2024, 2:20 p.m. UTC | #8
On 11. 03. 24 12:14, Shyam Prasad N wrote:
> Let me submit a patch to check the exact dialect before calling these
> functions to make sure we only do it for SMB3+.
> 

FWIW, most of the reports (except the first one with macOS) are from users 
running their SMB server on QNAP NAS devices (more details are in the GH 
ticket I linked earlier).

Please CC me on the e-mail with the patch as well; I can create a build 
with that patch applied to 6.6.y and ask if someone could check whether it 
resolves the issue in their environment.

Cheers,
Jan
Shyam Prasad N March 13, 2024, 10:45 a.m. UTC | #9
On Tue, Mar 12, 2024 at 7:50 PM Jan Čermák <sairon@sairon.cz> wrote:
>
> On 11. 03. 24 12:14, Shyam Prasad N wrote:
> > Let me submit a patch to check the exact dialect before calling these
> > functions to make sure we only do it for SMB3+.
> >
>
> FWIW, most of the reports (except the first one with macOS) are users
> running their SMB server on QNAP NAS's (more details are in the GH
> ticket I linked earlier).
>
> Please CC me in the e-mail with the patch as well, I can create a build
> with that patch applied to 6.6.y and ask if someone could try if it
> resolves the issue in their environment.
>
> Cheers,
> Jan

Just sent two patches for this.
One patch is just to change the log level for this log from VFS -> FYI.
The other one is for the suggestions made by Paulo.
#1 will fix this anyway. I'm curious to know if #2 alone would fix this problem.
Also, please ask for the output of the following command while testing this out:
# cat /proc/fs/cifs/DebugData
Jan Čermák March 26, 2024, 2:10 p.m. UTC | #10
On 13. 03. 24 11:45, Shyam Prasad N wrote:
> Just sent two patches for this.
> One patch is just to change the log level for this log from VFS -> FYI.
> The other one is for the suggestions made by Paulo.
> #1 will fix this anyway. I'm curious to know if #2 alone would fix this problem.
> Also, please ask for the output of the following command while testing this out:
> # cat /proc/fs/cifs/DebugData

One of the users tried testing on the same system with only the second 
patch applied and reports the issue still persists :( He supplied the 
DebugData output only as a screenshot [1]; I'm not sure what to look for 
there, but the "Server interfaces" section is definitely missing there 
compared to my setup.

Cheers,
Jan

[1] 
https://github.com/home-assistant/operating-system/issues/3201#issuecomment-2019357433

Patch

diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 4391c7aac3cb..cee3af02e2c3 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -219,6 +219,8 @@  static int cifs_debug_data_proc_show(struct seq_file *m, void *v)
 	struct cifs_ses *ses;
 	struct cifs_tcon *tcon;
 	struct cifs_server_iface *iface;
+	size_t iface_weight = 0, iface_min_speed = 0;
+	struct cifs_server_iface *last_iface = NULL;
 	int c, i, j;
 
 	seq_puts(m,
@@ -465,11 +467,25 @@  static int cifs_debug_data_proc_show(struct seq_file *m, void *v)
 					   "\tLast updated: %lu seconds ago",
 					   ses->iface_count,
 					   (jiffies - ses->iface_last_update) / HZ);
+
+			last_iface = list_last_entry(&ses->iface_list,
+						     struct cifs_server_iface,
+						     iface_head);
+			iface_min_speed = last_iface->speed;
+
 			j = 0;
 			list_for_each_entry(iface, &ses->iface_list,
 						 iface_head) {
 				seq_printf(m, "\n\t%d)", ++j);
 				cifs_dump_iface(m, iface);
+
+				iface_weight = iface->speed / iface_min_speed;
+				seq_printf(m, "\t\tWeight (cur,total): (%zu,%zu)"
+					   "\n\t\tAllocated channels: %u\n",
+					   iface->weight_fulfilled,
+					   iface_weight,
+					   iface->num_channels);
+
 				if (is_ses_using_iface(ses, iface))
 					seq_puts(m, "\t\t[CONNECTED]\n");
 			}
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index a11e7b10f607..e3ba5c979832 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -948,6 +948,8 @@  struct cifs_server_iface {
 	struct list_head iface_head;
 	struct kref refcount;
 	size_t speed;
+	size_t weight_fulfilled;
+	unsigned int num_channels;
 	unsigned int rdma_capable : 1;
 	unsigned int rss_capable : 1;
 	unsigned int is_active : 1; /* unset if non existent */
diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
index b8bfebe4498e..78a7cfa75e91 100644
--- a/fs/cifs/sess.c
+++ b/fs/cifs/sess.c
@@ -167,7 +167,9 @@  int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses)
 	int left;
 	int rc = 0;
 	int tries = 0;
+	size_t iface_weight = 0, iface_min_speed = 0;
 	struct cifs_server_iface *iface = NULL, *niface = NULL;
+	struct cifs_server_iface *last_iface = NULL;
 
 	spin_lock(&ses->chan_lock);
 
@@ -196,21 +198,11 @@  int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses)
 	}
 	spin_unlock(&ses->chan_lock);
 
-	/*
-	 * Keep connecting to same, fastest, iface for all channels as
-	 * long as its RSS. Try next fastest one if not RSS or channel
-	 * creation fails.
-	 */
-	spin_lock(&ses->iface_lock);
-	iface = list_first_entry(&ses->iface_list, struct cifs_server_iface,
-				 iface_head);
-	spin_unlock(&ses->iface_lock);
-
 	while (left > 0) {
 
 		tries++;
 		if (tries > 3*ses->chan_max) {
-			cifs_dbg(FYI, "too many channel open attempts (%d channels left to open)\n",
+			cifs_dbg(VFS, "too many channel open attempts (%d channels left to open)\n",
 				 left);
 			break;
 		}
@@ -218,17 +210,34 @@  int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses)
 		spin_lock(&ses->iface_lock);
 		if (!ses->iface_count) {
 			spin_unlock(&ses->iface_lock);
+			cifs_dbg(VFS, "server %s does not advertise interfaces\n", ses->server->hostname);
 			break;
 		}
 
+		if (!iface)
+			iface = list_first_entry(&ses->iface_list, struct cifs_server_iface,
+						 iface_head);
+		last_iface = list_last_entry(&ses->iface_list, struct cifs_server_iface,
+					     iface_head);
+		iface_min_speed = last_iface->speed;
+
 		list_for_each_entry_safe_from(iface, niface, &ses->iface_list,
 				    iface_head) {
+			/* do not mix rdma and non-rdma interfaces */
+			if (iface->rdma_capable != ses->server->rdma)
+				continue;
+
 			/* skip ifaces that are unusable */
 			if (!iface->is_active ||
 			    (is_ses_using_iface(ses, iface) &&
-			     !iface->rss_capable)) {
+			     !iface->rss_capable))
+				continue;
+
+			/* check if we already allocated enough channels */
+			iface_weight = iface->speed / iface_min_speed;
+
+			if (iface->weight_fulfilled >= iface_weight)
 				continue;
-			}
 
 			/* take ref before unlock */
 			kref_get(&iface->refcount);
@@ -245,10 +254,17 @@  int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses)
 				continue;
 			}
 
-			cifs_dbg(FYI, "successfully opened new channel on iface:%pIS\n",
+			iface->num_channels++;
+			iface->weight_fulfilled++;
+			cifs_dbg(VFS, "successfully opened new channel on iface:%pIS\n",
 				 &iface->sockaddr);
 			break;
 		}
+
+		/* reached end of list. reset weight_fulfilled */
+		if (list_entry_is_head(iface, &ses->iface_list, iface_head))
+			list_for_each_entry(iface, &ses->iface_list, iface_head)
+				iface->weight_fulfilled = 0;
 		spin_unlock(&ses->iface_lock);
 
 		left--;
@@ -267,8 +283,10 @@  int
 cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server)
 {
 	unsigned int chan_index;
+	size_t iface_weight = 0, iface_min_speed = 0;
 	struct cifs_server_iface *iface = NULL;
 	struct cifs_server_iface *old_iface = NULL;
+	struct cifs_server_iface *last_iface = NULL;
 	int rc = 0;
 
 	spin_lock(&ses->chan_lock);
@@ -288,13 +306,34 @@  cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server)
 	spin_unlock(&ses->chan_lock);
 
 	spin_lock(&ses->iface_lock);
+	if (!ses->iface_count) {
+		spin_unlock(&ses->iface_lock);
+		cifs_dbg(VFS, "server %s does not advertise interfaces\n", ses->server->hostname);
+		return 0;
+	}
+
+	last_iface = list_last_entry(&ses->iface_list, struct cifs_server_iface,
+				     iface_head);
+	iface_min_speed = last_iface->speed;
+
 	/* then look for a new one */
 	list_for_each_entry(iface, &ses->iface_list, iface_head) {
+		/* do not mix rdma and non-rdma interfaces */
+		if (iface->rdma_capable != server->rdma)
+			continue;
+
 		if (!iface->is_active ||
 		    (is_ses_using_iface(ses, iface) &&
 		     !iface->rss_capable)) {
 			continue;
 		}
+
+		/* check if we already allocated enough channels */
+		iface_weight = iface->speed / iface_min_speed;
+
+		if (iface->weight_fulfilled >= iface_weight)
+			continue;
+
 		kref_get(&iface->refcount);
 		break;
 	}