
[v3,09/13] sunrpc: add a symlink from rpc-client directory to the xprt_switch

Message ID 20210426171947.99233-10-olga.kornievskaia@gmail.com (mailing list archive)
State New, archived
Series: create sysfs files for changing IP address

Commit Message

Olga Kornievskaia April 26, 2021, 5:19 p.m. UTC
From: Olga Kornievskaia <kolga@netapp.com>

An rpc client uses a transport switch and one or more transports
associated with that switch. Since transports are shared among
rpc clients, create a symlink into the xprt_switch directory
instead of duplicating entries under each rpc client.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 net/sunrpc/clnt.c  |  2 +-
 net/sunrpc/sysfs.c | 25 +++++++++++++++++++++++--
 net/sunrpc/sysfs.h |  6 +++++-
 3 files changed, 29 insertions(+), 4 deletions(-)
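
For orientation, the layout this patch produces would look roughly as
follows. This sketch is illustrative only: the directory names and the
numbering below are assumed for the example rather than quoted from the
patch or from a running kernel.

    /sys/kernel/sunrpc/rpc-clients/clnt-0/switch-0 -> ../../xprt-switches/switch-0
    /sys/kernel/sunrpc/xprt-switches/switch-0/

Each rpc client directory carries one symlink named after the switch id,
while the shared switch state itself lives in a single place under
xprt-switches.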

Comments

Dan Aloni April 27, 2021, 4:42 a.m. UTC | #1
On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> An rpc client uses a transport switch and one or more transports
> associated with that switch. Since transports are shared among
> rpc clients, create a symlink into the xprt_switch directory
> instead of duplicating entries under each rpc client.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>
>..
> @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
>  	struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
>  
>  	if (rpc_client) {
> +		char name[23];
> +
> +		snprintf(name, sizeof(name), "switch-%d",
> +			 rpc_client->xprt_switch->xps_id);
> +		sysfs_remove_link(&rpc_client->kobject, name);

Hi Olga,

If a client can use a single switch, shouldn't the name of the symlink
be just "switch"? This is to be consistent with other symlinks in
`sysfs` such as the ones in block layer, for example in my
`/sys/block/sda`:

    bdi -> ../../../../../../../../../../../virtual/bdi/8:0
    device -> ../../../5:0:0:0
Olga Kornievskaia April 27, 2021, 12:12 p.m. UTC | #2
On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@kernelim.com> wrote:
>
> On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > An rpc client uses a transport switch and one or more transports
> > associated with that switch. Since transports are shared among
> > rpc clients, create a symlink into the xprt_switch directory
> > instead of duplicating entries under each rpc client.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> >
> >..
> > @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
> >       struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
> >
> >       if (rpc_client) {
> > +             char name[23];
> > +
> > +             snprintf(name, sizeof(name), "switch-%d",
> > +                      rpc_client->xprt_switch->xps_id);
> > +             sysfs_remove_link(&rpc_client->kobject, name);
>
> Hi Olga,
>
> If a client can use a single switch, shouldn't the name of the symlink
> be just "switch"? This is to be consistent with other symlinks in
> `sysfs` such as the ones in block layer, for example in my
> `/sys/block/sda`:
>
>     bdi -> ../../../../../../../../../../../virtual/bdi/8:0
>     device -> ../../../5:0:0:0

I think the client is written so that in the future it might have more
than one switch?

>
>
> --
> Dan Aloni
Dan Aloni May 12, 2021, 10:42 a.m. UTC | #3
On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia wrote:
> On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@kernelim.com> wrote:
> >
> > [...]
> > Hi Olga,
> >
> > If a client can use a single switch, shouldn't the name of the symlink
> > be just "switch"? This is to be consistent with other symlinks in
> > `sysfs` such as the ones in block layer, for example in my
> > `/sys/block/sda`:
> >
> >     bdi -> ../../../../../../../../../../../virtual/bdi/8:0
> >     device -> ../../../5:0:0:0
> 
> I think the client is written so that in the future it might have more
> than one switch?

I wonder what would be the use for that, as a switch is already a
collection of xprts. What would determine which switch to use for a new
task's IO?
Olga Kornievskaia May 12, 2021, 1:40 p.m. UTC | #4
On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
>
>
> On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@kernelim.com> wrote:
>>
>> [...]
>> >
>> > I think the client is written so that in the future it might have more
>> > than one switch?
>>
>> I wonder what would be the use for that, as a switch is already a
>> collection of xprts. What would determine which switch to use for a
>> new task's IO?
>
>
> I thought the switch is a collection of xprts of the same type. And if
> you wanted to have an RDMA connection and a TCP connection to the same
> server, then they would be stored under different switches? For instance,
> we round-robin through the transports, but I don't see why we would do so
> between a TCP and an RDMA transport. I do see how a client could switch
> entirely from a TCP-based transport to an RDMA one (or to another set of
> transports, and round-robin among that set). But perhaps I'm wrong in how
> I'm thinking about xprt_switch and multipathing.

<looks like my reply bounced so trying to resend>
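
For reference, the round-robin selection described above amounts to
something like the following simplified sketch. All names here are
invented for illustration; the real code lives in
net/sunrpc/xprtmultipath.c and iterates an RCU-protected list through
struct rpc_xprt_iter:

    /* Minimal round-robin cursor over a fixed array of transports. */
    struct demo_xprt;

    struct demo_switch {
            struct demo_xprt *xprts[8];  /* active transports */
            unsigned int nr_xprts;       /* populated entries */
            unsigned int cursor;         /* next transport to hand out */
    };

    /* Hand out transports in rotation; returns NULL if none are set up. */
    static struct demo_xprt *demo_get_next_xprt(struct demo_switch *sw)
    {
            if (sw->nr_xprts == 0)
                    return NULL;
            return sw->xprts[sw->cursor++ % sw->nr_xprts];
    }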
Dan Aloni May 12, 2021, 2:16 p.m. UTC | #5
On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
> On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia <
> olga.kornievskaia@gmail.com> wrote:
> 
> > [...]
> 
> And more to answer your question: we don't have a method to switch
> between different xprt switches, nor do we have a way to specify a mount
> with multiple types of transports. Perhaps sysfs could be a way to
> switch between the two. During session trunking discovery, the server
> could return a list of IPs and the types of transports. Say all IPs were
> available via both TCP and RDMA; then the client could create one switch
> with RDMA transports and another with TCP transports, and a policy
> engine would decide which one to use to begin with. The sysfs interface
> would then be a way to switch between the two if there are problems.

You raise a good point, also relevant to the ability to dynamically add
new transports on the fly with the sysfs interface - their protocol type
may be different.

Returning to the topic of multiple switches per client: I recall it was
hinted a few times that there is an intention to make the xprtswitch
implementation loadable and pluggable with custom algorithms. If we are
going in that direction, I'd expect the advanced transport management
and request routing to live below the RPC client level, where we have
example uses:

1) Optimizations in request routing that I've previously written about
(based on request data memory).

2) If we lift the restriction against multiple protocol types on the
same xprt switch, then we can also implement 'RDMA-by-default with TCP
failover on standby', similar to what you suggest, but with one switch
maintaining two lists behind the scenes (a sketch of this idea follows
below).
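
In outline, the two-list idea might look like the following. This is
purely an illustrative sketch with invented names; no such structure
exists in the kernel today:

    #include <linux/list.h>

    /* One switch, two internal tiers: prefer RDMA, keep TCP on standby. */
    struct tiered_switch {
            struct list_head rdma_xprts;   /* preferred tier */
            struct list_head tcp_xprts;    /* standby tier */
    };

    /* Pick the tier to schedule on: RDMA while it is healthy and
     * populated, otherwise fall back to the TCP standby list. */
    static struct list_head *tiered_pick_tier(struct tiered_switch *sw,
                                              bool rdma_healthy)
    {
            if (rdma_healthy && !list_empty(&sw->rdma_xprts))
                    return &sw->rdma_xprts;
            return &sw->tcp_xprts;
    }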
Trond Myklebust May 13, 2021, 3:13 p.m. UTC | #6
On Wed, 2021-05-12 at 17:16 +0300, Dan Aloni wrote:
> [...]

I'm not that interested in supporting multiple switches per client, or
any setup that is asymmetric w.r.t. transport characteristics at this
time.

Any such setup is going to need a policy engine in order to decide
which RPC calls can be placed on which set of transports. That again
will end up adding a lot of complexity in the kernel itself. I've yet
to see any compelling justification for doing so.
Chuck Lever May 13, 2021, 3:18 p.m. UTC | #7
> On May 13, 2021, at 11:13 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Wed, 2021-05-12 at 17:16 +0300, Dan Aloni wrote:
>> On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
>>> On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia <
>>> olga.kornievskaia@gmail.com> wrote:
>>> 
>>>> On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
>>>> <olga.kornievskaia@gmail.com> wrote:
>>>>> On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@kernelim.com>
>>>>> wrote:
>>>>>> On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia
>>>>>> wrote:
>>>>>>> On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <
>>>>>>> dan@kernelim.com> wrote:
>>>>>>>> On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga
>>>>>>>> Kornievskaia wrote:
>>>>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> 
>>>>>>>>> An rpc client uses a transport switch and one ore more
>>>>>>>>> transports
>>>>>>>>> associated with that switch. Since transports are
>>>>>>>>> shared among
>>>>>>>>> rpc clients, create a symlink into the xprt_switch
>>>>>>>>> directory
>>>>>>>>> instead of duplicating entries under each rpc client.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> 
>>>>>>>>> ..
>>>>>>>>> @@ -188,6 +204,11 @@ void
>>>>>>>>> rpc_sysfs_client_destroy(struct
>>>> rpc_clnt *clnt)
>>>>>>>>>       struct rpc_sysfs_client *rpc_client = clnt-
>>>>>>>>>> cl_sysfs;
>>>>>>>>> 
>>>>>>>>>       if (rpc_client) {
>>>>>>>>> +             char name[23];
>>>>>>>>> +
>>>>>>>>> +             snprintf(name, sizeof(name), "switch-%d",
>>>>>>>>> +                      rpc_client->xprt_switch-
>>>>>>>>>> xps_id);
>>>>>>>>> +             sysfs_remove_link(&rpc_client->kobject,
>>>>>>>>> name);
>>>>>>>> 
>>>>>>>> Hi Olga,
>>>>>>>> 
>>>>>>>> If a client can use a single switch, shouldn't the name
>>>>>>>> of the
>>>> symlink
>>>>>>>> be just "switch"? This is to be consistent with other
>>>>>>>> symlinks in
>>>>>>>> `sysfs` such as the ones in block layer, for example in
>>>>>>>> my
>>>>>>>> `/sys/block/sda`:
>>>>>>>> 
>>>>>>>>     bdi ->
>>>>>>>> ../../../../../../../../../../../virtual/bdi/8:0
>>>>>>>>     device -> ../../../5:0:0:0
>>>>>>> 
>>>>>>> I think the client is written so that in the future it
>>>>>>> might have more
>>>>>>> than one switch?
>>>>>> 
>>>>>> I wonder what would be the use for that, as a switch is
>>>>>> already
>>>> collection of
>>>>>> xprts. Which would determine the switch to use per a new task
>>>>>> IO?
>>>>> 
>>>>> 
>>>>> I thought the switch is a collection of xprts of the same type.
>>>>> And if
>>>> you wanted to have an RDMA connection and a TCP connection to the
>>>> same
>>>> server, then it would be stored under different switches? For
>>>> instance we
>>>> round-robin thru the transports but I don't see why we would be
>>>> doing so
>>>> between a TCP and an RDMA transport. But I see how a client can
>>>> totally
>>>> switch from an TCP based transport to an RDMA one (or a set of
>>>> transports
>>>> and round-robin among that set). But perhaps I'm wrong in how I'm
>>>> thinking
>>>> about xprt_switch and multipathing.
>>>> 
>>>> <looks like my reply bounced so trying to resend>
>>>> 
>>> 
>>> And more to answer your question, we don't have a method to switch
>>> between
>>> different xprt switches. But we don't have a way to specify how to
>>> mount
>>> with multiple types of transports. Perhaps sysfs could be/would be
>>> a way to
>>> switch between the two. Perhaps during session trunking discovery,
>>> the
>>> server can return back a list of IPs and types of transports. Say
>>> all IPs
>>> would be available via TCP and RDMA, then the client can create a
>>> switch
>>> with RDMA transports and another with TCP transports, then perhaps
>>> there
>>> will be a policy engine that would decide which one to choose to
>>> use to
>>> begin with. And then with sysfs interface would be a way to switch
>>> between
>>> the two if there are problems.
>> 
>> You raise a good point, also relevant to the ability to dynamically
>> add
>> new transports on the fly with the sysfs interface - their protocol
>> type
>> may be different.
>> 
>> Returning to the topic of multiple switches per client, I recall that
>> a
>> few times it was hinted that there is an intention to have the
>> implementation details of xprtswitch be modified to be loadable and
>> pluggable with custom algorithms.  And if we are going in that
>> direction, I'd expect the advanced transport management and request
>> routing can be below the RPC client level, where we have example
>> uses:
>> 
>> 1) Optimizations in request routing that I've previously written
>> about
>> (based on request data memory).
>> 
>> 2) If we lift the restriction of multiple protocol types on the same
>> xprtswitch on one switch, then we can also allow for the
>> implementation
>> 'RDMA-by-default with TCP failover on standby' similar to what you
>> suggest, but with one switch maintaining two lists behind the scenes.
>> 
> 
> I'm not that interested in supporting multiple switches per client, or
> any setup that is asymmetric w.r.t. transport characteristics at this
> time.
> 
> Any such setup is going to need a policy engine in order to decide
> which RPC calls can be placed on which set of transports. That again
> will end up adding a lot of complexity in the kernel itself. I've yet
> to see any compelling justification for doing so.

I agree -- SMB multi-channel allows TCP+RDMA configurations, and it's
tough to decide how to distribute work across connections and NICs
that have such vastly different performance characteristics.

I would like to see us crawling and walking before trying to run.


--
Chuck Lever
Olga Kornievskaia May 13, 2021, 3:53 p.m. UTC | #8
On Thu, May 13, 2021 at 11:18 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> > On May 13, 2021, at 11:13 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> >
> > [...]
> >
> > I'm not that interested in supporting multiple switches per client, or
> > any setup that is asymmetric w.r.t. transport characteristics at this
> > time.
> >
> > Any such setup is going to need a policy engine in order to decide
> > which RPC calls can be placed on which set of transports. That again
> > will end up adding a lot of complexity in the kernel itself. I've yet
> > to see any compelling justification for doing so.
>
> I agree -- SMB multi-channel allows TCP+RDMA configurations, and it's
> tough to decide how to distribute work across connections and NICs
> that have such vastly different performance characteristics.
>
> I would like to see us crawling and walking before trying to run.

Sounds good, folks. I'll remove the multiple switches from the sysfs
infrastructure. v7 coming up.

>
>
> --
> Chuck Lever
Tom Talpey May 13, 2021, 4:42 p.m. UTC | #9
On 5/13/2021 11:18 AM, Chuck Lever III wrote:
> I agree -- SMB multi-channel allows TCP+RDMA configurations, and it's
> tough to decide how to distribute work across connections and NICs
> that have such vastly different performance characteristics.

Purely FYI on this point - SMB does not attempt to perform operations
over links which differ in transport protocol or link speed. It
prefers RDMA over any TCP, and higher link speed over lower. When
multiple tiers exist, it relegates lower-tier connections to standby
status.

Most implementations have various tweaks to prefer or to poison
certain transports or connection associations, but the basics
are always to use homogeneous links.

Tom.
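
The preference ordering Tom describes can be captured in a small
comparator. This is an illustrative sketch with invented names, not
code taken from any SMB implementation:

    #include <linux/types.h>

    /* Rank interfaces: RDMA beats TCP, then higher link speed beats
     * lower; interfaces that compare worse are relegated to standby. */
    struct iface {
            bool rdma_capable;
            u64 link_speed;  /* bits per second */
    };

    /* Returns <0 if a outranks b, >0 if b outranks a, 0 if same tier. */
    static int iface_cmp(const struct iface *a, const struct iface *b)
    {
            if (a->rdma_capable != b->rdma_capable)
                    return a->rdma_capable ? -1 : 1;
            if (a->link_speed != b->link_speed)
                    return a->link_speed > b->link_speed ? -1 : 1;
            return 0;
    }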
Olga Kornievskaia May 13, 2021, 9:18 p.m. UTC | #10
On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@kernelim.com> wrote:
>
> On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
> > [...]
> > @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
> >       struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
> >
> >       if (rpc_client) {
> > +             char name[23];
> > +
> > +             snprintf(name, sizeof(name), "switch-%d",
> > +                      rpc_client->xprt_switch->xps_id);
> > +             sysfs_remove_link(&rpc_client->kobject, name);
>
> Hi Olga,
>
> If a client can use a single switch, shouldn't the name of the symlink
> be just "switch"? This is to be consistent with other symlinks in
> `sysfs` such as the ones in block layer, for example in my
> `/sys/block/sda`:
>
>     bdi -> ../../../../../../../../../../../virtual/bdi/8:0
>     device -> ../../../5:0:0:0
>

Jumping back to this comment because, now that I went back to try to
modify the code, I'm having doubts.

We still need numbering of xprt switches because they are different
for different mounts. So the xprt_switches directory would still have,
say, switch-0 for a mount to server A and switch-1 for a mount to
server B. Yes, I see that a given rpc client making a link into the
xprt_switches directory will only have one link, and yes, the name of
the link could be "switch". But isn't it more informative to keep the
link name the same as the name of the directory under xprt_switches?

>
> --
> Dan Aloni
Benjamin Coddington May 14, 2021, 10:17 a.m. UTC | #11
On 13 May 2021, at 17:18, Olga Kornievskaia wrote:
> On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@kernelim.com> wrote:
>> If a client can use a single switch, shouldn't the name of the symlink
>> be just "switch"? This is to be consistent with other symlinks in
>> `sysfs` such as the ones in block layer, for example in my
>> `/sys/block/sda`:
>>
>>     bdi -> ../../../../../../../../../../../virtual/bdi/8:0
>>     device -> ../../../5:0:0:0
>>
>
> Jumping back to this comment because, now that I went back to try to
> modify the code, I'm having doubts.
>
> We still need numbering of xprt switches because they are different
> for different mounts. So the xprt_switches directory would still have,
> say, switch-0 for a mount to server A and switch-1 for a mount to
> server B. Yes, I see that a given rpc client making a link into the
> xprt_switches directory will only have one link, and yes, the name of
> the link could be "switch". But isn't it more informative to keep the
> link name the same as the name of the directory under xprt_switches?

The information isn't lost, as the symlink points to the specific switch.
Not using a number in the symlink name signals that there will only be
one switch for each client, and makes it more deterministic for users
and software to navigate.

Ben
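
Side by side, the two naming schemes under discussion would look roughly
like this (paths and ids are illustrative):

    clnt-3/switch-0 -> ../../xprt-switches/switch-0   (per-id name, as in this patch)
    clnt-3/switch   -> ../../xprt-switches/switch-0   (fixed name, as suggested above)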
Olga Kornievskaia May 14, 2021, 2:16 p.m. UTC | #12
On Fri, May 14, 2021 at 6:17 AM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 13 May 2021, at 17:18, Olga Kornievskaia wrote:
> > On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@kernelim.com> wrote:
> >> [...]
> >
> > Jumping back to this comment because, now that I went back to try to
> > modify the code, I'm having doubts.
> >
> > We still need numbering of xprt switches because they are different
> > for different mounts. So the xprt_switches directory would still have,
> > say, switch-0 for a mount to server A and switch-1 for a mount to
> > server B. Yes, I see that a given rpc client making a link into the
> > xprt_switches directory will only have one link, and yes, the name of
> > the link could be "switch". But isn't it more informative to keep the
> > link name the same as the name of the directory under xprt_switches?
>
> The information isn't lost, as the symlink points to the specific switch.
> Not using a number in the symlink name signals that there will only be
> one switch for each client, and makes it more deterministic for users
> and software to navigate.

What will be lost is that when you look at the xprt_switches directory
and see the switch-1 ... switch-10 subdirectories, there is no way to
tell which rpc client uses which switch, because each client directory
will only have an entry saying "switch".

Anyway, I submitted the new version, but I think it's not as good as
the original.

>
> Ben
>
Benjamin Coddington May 14, 2021, 3:58 p.m. UTC | #13
On 14 May 2021, at 10:16, Olga Kornievskaia wrote:
> On Fri, May 14, 2021 at 6:17 AM Benjamin Coddington
> <bcodding@redhat.com> wrote:
>> The information isn't lost, as the symlink points to the specific switch.
>> Not using a number in the symlink name signals that there will only be
>> one switch for each client, and makes it more deterministic for users
>> and software to navigate.
>
> What will be lost is that when you look at the xprt_switches directory
> and see the switch-1 ... switch-10 subdirectories, there is no way to
> tell which rpc client uses which switch, because each client directory
> will only have an entry saying "switch".
>
> Anyway, I submitted the new version, but I think it's not as good as
> the original.

Hmm, ok - will we ever need to traverse objects in that direction though?
I'm thinking that operations on xprts/rpcs will always start from a mount
perspective from the admin, but really the root object is struct
nfs_server, or bdi. I'm thinking of something like:

     /sys/fs/nfs/<bdi>/rpc_clnt -> ../../net/sunrpc/clnt-0
     /sys/fs/nfs/<bdi>/volumes
     ...

I suppose though you could have something monitoring the xprts, and upon
finding a particular state would want to navigate back up to see what
client is affected. At that point you'd have to read all the symlinks
for all the rpc_clients.

Ben
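
As an aside on that last point, walking back from switches to clients by
reading every client's symlink is straightforward from userspace. A
minimal sketch, assuming the illustrative paths used earlier in this
thread (the glob covers both the "switch" and "switch-N" naming schemes):

    /* List which rpc client points at which switch via its symlink. */
    #include <stdio.h>
    #include <unistd.h>
    #include <glob.h>

    int main(void)
    {
            glob_t g;
            char target[256];
            ssize_t n;
            size_t i;

            if (glob("/sys/kernel/sunrpc/rpc-clients/clnt-*/switch*",
                     0, NULL, &g) != 0)
                    return 1;
            for (i = 0; i < g.gl_pathc; i++) {
                    n = readlink(g.gl_pathv[i], target, sizeof(target) - 1);
                    if (n < 0)
                            continue;  /* not a symlink; skip */
                    target[n] = '\0';
                    printf("%s -> %s\n", g.gl_pathv[i], target);
            }
            globfree(&g);
            return 0;
    }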
Patch

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index dab1abfef5cd..2e195623c10d 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -301,7 +301,6 @@  static int rpc_client_register(struct rpc_clnt *clnt,
 	int err;
 
 	rpc_clnt_debugfs_register(clnt);
-	rpc_sysfs_client_setup(clnt, net);
 
 	pipefs_sb = rpc_get_sb_net(net);
 	if (pipefs_sb) {
@@ -426,6 +425,7 @@  static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
 	/* save the nodename */
 	rpc_clnt_set_nodename(clnt, nodename);
 
+	rpc_sysfs_client_setup(clnt, xps, rpc_net_ns(clnt));
 	err = rpc_client_register(clnt, args->authflavor, args->client_name);
 	if (err)
 		goto out_no_path;
diff --git a/net/sunrpc/sysfs.c b/net/sunrpc/sysfs.c
index 03ea6d3ace95..871f7c696a93 100644
--- a/net/sunrpc/sysfs.c
+++ b/net/sunrpc/sysfs.c
@@ -150,14 +150,30 @@  rpc_sysfs_xprt_switch_alloc(struct kobject *parent,
 	return NULL;
 }
 
-void rpc_sysfs_client_setup(struct rpc_clnt *clnt, struct net *net)
+void rpc_sysfs_client_setup(struct rpc_clnt *clnt,
+			    struct rpc_xprt_switch *xprt_switch,
+			    struct net *net)
 {
 	struct rpc_sysfs_client *rpc_client;
 
-	rpc_client = rpc_sysfs_client_alloc(rpc_sunrpc_client_kobj, net, clnt->cl_clid);
+	rpc_client = rpc_sysfs_client_alloc(rpc_sunrpc_client_kobj,
+					    net, clnt->cl_clid);
 	if (rpc_client) {
+		char name[23];
+		struct rpc_sysfs_xprt_switch *xswitch =
+			(struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs;
+		int ret;
+
 		clnt->cl_sysfs = rpc_client;
+		rpc_client->clnt = clnt;
+		rpc_client->xprt_switch = xprt_switch;
 		kobject_uevent(&rpc_client->kobject, KOBJ_ADD);
+		snprintf(name, sizeof(name), "switch-%d", xprt_switch->xps_id);
+		ret = sysfs_create_link_nowarn(&rpc_client->kobject,
+					       &xswitch->kobject, name);
+		if (ret)
+			pr_warn("can't create link to %s in sysfs (%d)\n",
+				name, ret);
 	}
 }
 
@@ -188,6 +204,11 @@  void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
 	struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
 
 	if (rpc_client) {
+		char name[23];
+
+		snprintf(name, sizeof(name), "switch-%d",
+			 rpc_client->xprt_switch->xps_id);
+		sysfs_remove_link(&rpc_client->kobject, name);
 		kobject_uevent(&rpc_client->kobject, KOBJ_REMOVE);
 		kobject_del(&rpc_client->kobject);
 		kobject_put(&rpc_client->kobject);
diff --git a/net/sunrpc/sysfs.h b/net/sunrpc/sysfs.h
index 52ec472bd4a9..760f0582aee5 100644
--- a/net/sunrpc/sysfs.h
+++ b/net/sunrpc/sysfs.h
@@ -8,6 +8,8 @@ 
 struct rpc_sysfs_client {
 	struct kobject kobject;
 	struct net *net;
+	struct rpc_clnt *clnt;
+	struct rpc_xprt_switch *xprt_switch;
 };
 
 struct rpc_sysfs_xprt_switch {
@@ -20,7 +22,9 @@  struct rpc_sysfs_xprt_switch {
 int rpc_sysfs_init(void);
 void rpc_sysfs_exit(void);
 
-void rpc_sysfs_client_setup(struct rpc_clnt *clnt, struct net *net);
+void rpc_sysfs_client_setup(struct rpc_clnt *clnt,
+			    struct rpc_xprt_switch *xprt_switch,
+			    struct net *net);
 void rpc_sysfs_client_destroy(struct rpc_clnt *clnt);
 void rpc_sysfs_xprt_switch_setup(struct rpc_xprt_switch *xprt_switch,
 				 struct rpc_xprt *xprt, gfp_t gfp_flags);