diff mbox series

[1/2] ksmbd: fix possibly wrong init value for RDMA buffer size

Message ID 20250106033956.27445-1-xw897002528@gmail.com (mailing list archive)
State New, archived
Series [1/2] ksmbd: fix possibly wrong init value for RDMA buffer size

Commit Message

He Wang Jan. 6, 2025, 3:39 a.m. UTC
Field `initiator_depth` is for incoming requests.

According to the man page, `max_qp_rd_atom` is the maximum number of
outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
incoming requests.

Signed-off-by: He Wang <xw897002528@gmail.com>
---
 fs/smb/server/transport_rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Namjae Jeon Jan. 6, 2025, 8:30 a.m. UTC | #1
On Mon, Jan 6, 2025 at 12:40 PM He Wang <xw897002528@gmail.com> wrote:
>
> Field `initiator_depth` is for incoming requests.
>
> According to the man page, `max_qp_rd_atom` is the maximum number of
> outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
> incoming requests.
>
> Signed-off-by: He Wang <xw897002528@gmail.com>
Applied your two patches to #ksmbd-for-next-next.
Thanks!
Tom Talpey Jan. 7, 2025, 9:04 p.m. UTC | #2
On 1/5/2025 10:39 PM, He Wang wrote:
> Field `initiator_depth` is for incoming requests.
> 
> According to the man page, `max_qp_rd_atom` is the maximum number of
> outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
> incoming requests.

I do not believe this is correct; what "man page" are you referring to?
The commit message is definitely wrong. Neither value refers to
generic "maximum packets" or "incoming requests".

The max_qp_rd_atom is the "ORD" or outgoing read/atomic request depth.
The ksmbd server uses this to control RDMA Read requests to fetch data
from the client for certain SMB3_WRITE operations. (SMB Direct does not
use atomics)

The max_qp_init_rd_atom is the "IRD" or incoming read/atomic request
depth. The SMB3 protocol does not allow clients to request data from
servers via RDMA Read. This is absolutely by design, and the server
therefore does not use this value.

In practice, many RDMA providers set the rd_atom and rd_init_atom to
the same value, but this change would appear to break SMB Direct write
functionality when operating over providers that do not.

So, NAK.

Namjae, you should revert your upstream commit.

Tom.

> 
> Signed-off-by: He Wang <xw897002528@gmail.com>
> ---
>   fs/smb/server/transport_rdma.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
> index 0ef3c9f0b..c6dbbbb32 100644
> --- a/fs/smb/server/transport_rdma.c
> +++ b/fs/smb/server/transport_rdma.c
> @@ -1640,7 +1640,7 @@ static int smb_direct_accept_client(struct smb_direct_transport *t)
>   	int ret;
>   
>   	memset(&conn_param, 0, sizeof(conn_param));
> -	conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
> +	conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
>   					   SMB_DIRECT_CM_INITIATOR_DEPTH);
>   	conn_param.responder_resources = 0;
>
Namjae Jeon Jan. 7, 2025, 11:14 p.m. UTC | #3
On Wed, Jan 8, 2025 at 6:04 AM Tom Talpey <tom@talpey.com> wrote:
>
> On 1/5/2025 10:39 PM, He Wang wrote:
> > Field `initiator_depth` is for incoming requests.
> >
> > According to the man page, `max_qp_rd_atom` is the maximum number of
> > outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
> > incoming requests.
>
> I do not believe this is correct; what "man page" are you referring to?
> The commit message is definitely wrong. Neither value refers to
> generic "maximum packets" or "incoming requests".
>
> The max_qp_rd_atom is the "ORD" or outgoing read/atomic request depth.
> The ksmbd server uses this to control RDMA Read requests to fetch data
> from the client for certain SMB3_WRITE operations. (SMB Direct does not
> use atomics)
>
> The max_qp_init_rd_atom is the "IRD" or incoming read/atomic request
> depth. The SMB3 protocol does not allow clients to request data from
> servers via RDMA Read. This is absolutely by design, and the server
> therefore does not use this value.
>
> In practice, many RDMA providers set the rd_atom and rd_init_atom to
> the same value, but this change would appear to break SMB Direct write
> functionality when operating over providers that do not.
>
> So, NAK.
>
> Namjae, you should revert your upstream commit.
Okay, thanks for your review :)
Steve, please revert it in ksmbd-for-next also.

>
> Tom.
>
> >
> > Signed-off-by: He Wang <xw897002528@gmail.com>
> > ---
> >   fs/smb/server/transport_rdma.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
> > index 0ef3c9f0b..c6dbbbb32 100644
> > --- a/fs/smb/server/transport_rdma.c
> > +++ b/fs/smb/server/transport_rdma.c
> > @@ -1640,7 +1640,7 @@ static int smb_direct_accept_client(struct smb_direct_transport *t)
> >       int ret;
> >
> >       memset(&conn_param, 0, sizeof(conn_param));
> > -     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
> > +     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
> >                                          SMB_DIRECT_CM_INITIATOR_DEPTH);
> >       conn_param.responder_resources = 0;
> >
>
Steve French Jan. 7, 2025, 11:32 p.m. UTC | #4
On Tue, Jan 7, 2025 at 5:14 PM Namjae Jeon <linkinjeon@kernel.org> wrote:
>
> On Wed, Jan 8, 2025 at 6:04 AM Tom Talpey <tom@talpey.com> wrote:
> >
> > I do not believe this is correct; what "man page" are you referring to?
> > The commit message is definitely wrong. Neither value refers to
> > generic "maximum packets" or "incoming requests".
> >
> > The max_qp_rd_atom is the "ORD" or outgoing read/atomic request depth.
> > The ksmbd server uses this to control RDMA Read requests to fetch data
> > from the client for certain SMB3_WRITE operations. (SMB Direct does not
> > use atomics)
> >
> > The max_qp_init_rd_atom is the "IRD" or incoming read/atomic request
> > depth. The SMB3 protocol does not allow clients to request data from
> > servers via RDMA Read. This is absolutely by design, and the server
> > therefore does not use this value.
> >
> > In practice, many RDMA providers set the rd_atom and rd_init_atom to
> > the same value, but this change would appear to break SMB Direct write
> > functionality when operating over providers that do not.
> >
> > So, NAK.
> >
> > Namjae, you should revert your upstream commit.
> Okay, thanks for your review :)
> Steve, please revert it in ksmbd-for-next also.


I have removed "ksmbd: fix possibly wrong init value for RDMA buffer
size" so ksmbd-for-next currently has only these four:

37a11d8b2993 (HEAD -> ksmbd-for-next, origin/ksmbd-for-next) ksmbd:
Implement new SMB3 POSIX type
2ac538e40278 ksmbd: fix unexpectedly changed path in ksmbd_vfs_kern_path_locked
c7f3cd1b245d ksmbd: Remove unneeded if check in ksmbd_rdma_capable_netdev()
4c16e1cadcbc ksmbd: fix a missing return value check bug
Tom Talpey Jan. 8, 2025, 1:58 p.m. UTC | #5
On 1/7/2025 10:19 PM, He X wrote:
> Thanks for your review!
> 
> By man page, I mean the rdma_xxx man pages like
> https://linux.die.net/man/3/rdma_connect. I did mean ORD
> or IRD, just with bad wording.

Ok, that's the user verb API, we're in the kernel here. Some things are
similar, but not all.

> In short, RDMA on my setup did not work. While I was digging around, I

Ok, that's important and perhaps this needs more digging. What was
your setup? Was it an iWARP connection, for example? The iWARP protocol
is stricter than IB for IRD, because it does not support "retry" when
there are insufficient resources. This is a Good Thing, by the way,
it avoids silly tail latencies. But it can cause sloppy upper layer
code to break.

If IRD/ORD is the problem, you'll see connections break when write-heavy
workloads are present. Is that what you mean by "did not work"?

> noticed that `initiator_depth` is generally set to `min(xxx,
> max_qp_init_rd_atom)` in the kernel source code. I was not aware that
> ksmbd direct does not use IRD. And many clients set them to the same value.

Again "many"? Please be specific. Clients implement protocols, and
protocols have differing requirements. An SMB3 client should advertise
an ORD == 0, and should offer at least a small IRD > 0.

An SMB3 server will do the converse - an IRD == 0 at all times, and an
ORD > 0 in response to the client's offered IRD. The resulting limits
are exchanged in the SMB Direct negotiation packets. The IRD==0 is what
you see in the very next line after your change:

 >> conn_param.responder_resources = 0;

Other protocols may make different choices. Not this one.

Tom.


> 
> FYI, here is the original discussion on github:
> https://github.com/namjaejeon/ksmbd/issues/497.
> 
> Tom Talpey <tom@talpey.com> wrote on Wednesday, January 8, 2025 at 05:04:
> 
>     On 1/5/2025 10:39 PM, He Wang wrote:
>      > Field `initiator_depth` is for incoming requests.
>      >
>      > According to the man page, `max_qp_rd_atom` is the maximum number of
>      > outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
>      > incoming requests.
> 
>     I do not believe this is correct; what "man page" are you referring to?
>     The commit message is definitely wrong. Neither value refers to
>     generic "maximum packets" or "incoming requests".
> 
>     The max_qp_rd_atom is the "ORD" or outgoing read/atomic request depth.
>     The ksmbd server uses this to control RDMA Read requests to fetch data
>     from the client for certain SMB3_WRITE operations. (SMB Direct does not
>     use atomics)
> 
>     The max_qp_init_rd_atom is the "IRD" or incoming read/atomic request
>     depth. The SMB3 protocol does not allow clients to request data from
>     servers via RDMA Read. This is absolutely by design, and the server
>     therefore does not use this value.
> 
>     In practice, many RDMA providers set the rd_atom and rd_init_atom to
>     the same value, but this change would appear to break SMB Direct write
>     functionality when operating over providers that do not.
> 
>     So, NAK.
> 
>     Namjae, you should revert your upstream commit.
> 
>     Tom.
> 
>      >
>      > Signed-off-by: He Wang <xw897002528@gmail.com>
>      > ---
>      >   fs/smb/server/transport_rdma.c | 2 +-
>      >   1 file changed, 1 insertion(+), 1 deletion(-)
>      >
>      > diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
>      > index 0ef3c9f0b..c6dbbbb32 100644
>      > --- a/fs/smb/server/transport_rdma.c
>      > +++ b/fs/smb/server/transport_rdma.c
>      > @@ -1640,7 +1640,7 @@ static int smb_direct_accept_client(struct smb_direct_transport *t)
>      >       int ret;
>      >
>      >       memset(&conn_param, 0, sizeof(conn_param));
>      > -     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
>      > +     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
>      >                                          SMB_DIRECT_CM_INITIATOR_DEPTH);
>      >       conn_param.responder_resources = 0;
>      >
> 
> 
> 
> -- 
> Best regards,
> xhe
Tom Talpey Jan. 8, 2025, 4:38 p.m. UTC | #6
On 1/8/2025 10:03 AM, He X wrote:
>  > Ok, that's important and perhaps this needs more digging. What was
> your setup? Was it an iWARP connection, for example?
> 
> Direct connection between two mlx5_ib devices on a RoCE network.
> 
>  > If IRD/ORD is the problem, you'll see connections break when write-heavy
> workloads are present. Is that what you mean by "did not work"?
> 
> Yes. It only disconnects when copying large files from clients (cifs) to
> ksmbd. I do see some retrying in the logs, but it is not able to recover.
> 
> I have cleared my testing logs, so I cannot paste them here.

Ok. The interesting item would be the work request completion status
that preceded the connection failure, or the async error upcall event
from the rdma driver if that triggered first. Both client and server
logs are needed. And it can be a higher-level issue too; there were
some signing issues related to the fscache changes, which might be
in kernel 6.12. I tested mostly successfully at SDC in September with
them, anyway.

There may well be something else going on - RoCE can be very tricky
to set up since it depends on link layer flow control. You're not
using RoCEv2?

BTW the code does have some strange-looking defaults between client
and server IRD/ORD queue depths. The server defaults to 8 ORD, while
the client defaults to 32 IRD. This is odd, but not in itself fatal.
After all, other implementations (e.g. Windows) have their own defaults
too. The negotiation at both RDMA and SMB Direct should align them.

>  > Again "many"?
> 
> I mean the quote `In practice, many RDMA providers set the rd_atom and 
> rd_init_atom to the same value`.
> 
>> Other protocols may make different choices. Not this one.
> 
> Got it. I'll do some more tests to see if I can find the problem.
> Thanks for your patience!

Great, looking forward to that.

Tom.

> 
> Tom Talpey <tom@talpey.com> wrote on Wednesday, January 8, 2025 at 21:58:
> 
>     On 1/7/2025 10:19 PM, He X wrote:
>      > Thanks for your review!
>      >
>      > By man page, I mean the rdma_xxx man pages like
>      > https://linux.die.net/man/3/rdma_connect. I did mean ORD
>      > or IRD, just with bad wording.
> 
>     Ok, that's the user verb API, we're in the kernel here. Some things are
>     similar, but not all.
> 
>      > In short, RDMA on my setup did not work. While I was digging around, I
> 
>     Ok, that's important and perhaps this needs more digging. What was
>     your setup? Was it an iWARP connection, for example? The iWARP protocol
>     is stricter than IB for IRD, because it does not support "retry" when
>     there are insufficient resources. This is a Good Thing, by the way,
>     it avoids silly tail latencies. But it can cause sloppy upper layer
>     code to break.
> 
>     If IRD/ORD is the problem, you'll see connections break when write-heavy
>     workloads are present. Is that what you mean by "did not work"?
> 
>      > noticed that `initiator_depth` is generally set to `min(xxx,
>      > max_qp_init_rd_atom)` in the kernel source code. I was not aware that
>      > ksmbd direct does not use IRD. And many clients set them to the same value.
> 
>     Again "many"? Please be specific. Clients implement protocols, and
>     protocols have differing requirements. An SMB3 client should advertise
>     an ORD == 0, and should offer at least a small IRD > 0.
> 
>     An SMB3 server will do the converse - an IRD == 0 at all times, and an
>     ORD > 0 in response to the client's offered IRD. The resulting limits
>     are exchanged in the SMB Direct negotiation packets. The IRD==0 is what
>     you see in the very next line after your change:
> 
>       >> conn_param.responder_resources = 0;
> 
>     Other protocols may make different choices. Not this one.
> 
>     Tom.
> 
> 
>      >
>      > FYI, here is the original discussion on github:
>      > https://github.com/namjaejeon/ksmbd/issues/497.
>      >
>      > Tom Talpey <tom@talpey.com> wrote on Wednesday, January 8, 2025 at 05:04:
>      >
>      >     On 1/5/2025 10:39 PM, He Wang wrote:
>      >      > Field `initiator_depth` is for incoming requests.
>      >      >
>      >      > According to the man page, `max_qp_rd_atom` is the maximum number of
>      >      > outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
>      >      > incoming requests.
>      >
>      >     I do not believe this is correct; what "man page" are you referring to?
>      >     The commit message is definitely wrong. Neither value refers to
>      >     generic "maximum packets" or "incoming requests".
>      >
>      >     The max_qp_rd_atom is the "ORD" or outgoing read/atomic request depth.
>      >     The ksmbd server uses this to control RDMA Read requests to fetch data
>      >     from the client for certain SMB3_WRITE operations. (SMB Direct does not
>      >     use atomics)
>      >
>      >     The max_qp_init_rd_atom is the "IRD" or incoming read/atomic request
>      >     depth. The SMB3 protocol does not allow clients to request data from
>      >     servers via RDMA Read. This is absolutely by design, and the server
>      >     therefore does not use this value.
>      >
>      >     In practice, many RDMA providers set the rd_atom and rd_init_atom to
>      >     the same value, but this change would appear to break SMB Direct write
>      >     functionality when operating over providers that do not.
>      >
>      >     So, NAK.
>      >
>      >     Namjae, you should revert your upstream commit.
>      >
>      >     Tom.
>      >
>      >      >
>      >      > Signed-off-by: He Wang <xw897002528@gmail.com>
>      >      > ---
>      >      >   fs/smb/server/transport_rdma.c | 2 +-
>      >      >   1 file changed, 1 insertion(+), 1 deletion(-)
>      >      >
>      >      > diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
>      >      > index 0ef3c9f0b..c6dbbbb32 100644
>      >      > --- a/fs/smb/server/transport_rdma.c
>      >      > +++ b/fs/smb/server/transport_rdma.c
>      >      > @@ -1640,7 +1640,7 @@ static int smb_direct_accept_client(struct smb_direct_transport *t)
>      >      >       int ret;
>      >      >
>      >      >       memset(&conn_param, 0, sizeof(conn_param));
>      >      > -     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
>      >      > +     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
>      >      >                                          SMB_DIRECT_CM_INITIATOR_DEPTH);
>      >      >       conn_param.responder_resources = 0;
>      >      >
>      >
>      >
>      >
>      > --
>      > Best regards,
>      > xhe
> 
> 
> 
> -- 
> Best regards,
> xhe

Patch

diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
index 0ef3c9f0b..c6dbbbb32 100644
--- a/fs/smb/server/transport_rdma.c
+++ b/fs/smb/server/transport_rdma.c
@@ -1640,7 +1640,7 @@  static int smb_direct_accept_client(struct smb_direct_transport *t)
 	int ret;
 
 	memset(&conn_param, 0, sizeof(conn_param));
-	conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
+	conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
 					   SMB_DIRECT_CM_INITIATOR_DEPTH);
 	conn_param.responder_resources = 0;