Message ID | 1566608656-30836-1-git-send-email-yanjun.zhu@oracle.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | [PATCHv2,1/1] net: rds: add service level support in rds-info |
On 8/23/19 6:04 PM, Zhu Yanjun wrote:
> From IB specification 7.6.5 SERVICE LEVEL, Service Level (SL)
> is used to identify different flows within an IBA subnet.
> It is carried in the local route header of the packet.
>
> Before this commit, run "rds-info -I". The outputs are as
> below:
> "
> RDS IB Connections:
> LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
> 192.2.95.3  192.2.95.1  2    0   fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  1    0   fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  0    0   fe80::21:28:1a:39  fe80::21:28:10:b9
> "
> After this commit, the output is as below:
> "
> RDS IB Connections:
> LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
> 192.2.95.3  192.2.95.1  2    2   fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  1    1   fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  0    0   fe80::21:28:1a:39  fe80::21:28:10:b9
> "
>
> The commit fe3475af3bdf ("net: rds: add per rds connection cache
> statistics") adds cache_allocs in struct rds_info_rdma_connection
> as below:
> struct rds_info_rdma_connection {
> ...
> 	__u32	rdma_mr_max;
> 	__u32	rdma_mr_size;
> 	__u8	tos;
> 	__u32	cache_allocs;
> };
> The peer struct in rds-tools of struct rds_info_rdma_connection is as
> below:
> struct rds_info_rdma_connection {
> ...
> 	uint32_t	rdma_mr_max;
> 	uint32_t	rdma_mr_size;
> 	uint8_t		tos;
> 	uint8_t		sl;
> 	uint32_t	cache_allocs;
> };
> The difference between userspace and kernel is the member variable sl.
> In the kernel struct, the member variable sl is missing. This will
> introduce risks. So it is necessary to use this commit to avoid this risk.
>
> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
> CC: Joe Jin <joe.jin@oracle.com>
> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
> V1->V2: fix typos in commit logs.
> ---

I did ask you when you posted the patch about whether you did
backward compatibility tests for which you said, you did all the
tests and said "So do not worry about backward compatibility. This
commit will work well with older rds-tools2.0.5 and 2.0.6."

https://www.spinics.net/lists/netdev/msg574691.html

I was worried about exactly such issue as described in commit.

Anyways thanks for the fixup patch. Should be applied to stable
as well.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Regards,
Santosh
On 2019/8/24 9:25, santosh.shilimkar@oracle.com wrote:
> On 8/23/19 6:04 PM, Zhu Yanjun wrote:
>> From IB specification 7.6.5 SERVICE LEVEL, Service Level (SL)
>> is used to identify different flows within an IBA subnet.
>> It is carried in the local route header of the packet.
>>
>> Before this commit, run "rds-info -I". The outputs are as
>> below:
>> "
>> RDS IB Connections:
>> LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
>> 192.2.95.3  192.2.95.1  2    0   fe80::21:28:1a:39  fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  1    0   fe80::21:28:1a:39  fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  0    0   fe80::21:28:1a:39  fe80::21:28:10:b9
>> "
>> After this commit, the output is as below:
>> "
>> RDS IB Connections:
>> LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
>> 192.2.95.3  192.2.95.1  2    2   fe80::21:28:1a:39  fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  1    1   fe80::21:28:1a:39  fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  0    0   fe80::21:28:1a:39  fe80::21:28:10:b9
>> "
>>
>> The commit fe3475af3bdf ("net: rds: add per rds connection cache
>> statistics") adds cache_allocs in struct rds_info_rdma_connection
>> as below:
>> struct rds_info_rdma_connection {
>> ...
>> 	__u32	rdma_mr_max;
>> 	__u32	rdma_mr_size;
>> 	__u8	tos;
>> 	__u32	cache_allocs;
>> };
>> The peer struct in rds-tools of struct rds_info_rdma_connection is as
>> below:
>> struct rds_info_rdma_connection {
>> ...
>> 	uint32_t	rdma_mr_max;
>> 	uint32_t	rdma_mr_size;
>> 	uint8_t		tos;
>> 	uint8_t		sl;
>> 	uint32_t	cache_allocs;
>> };
>> The difference between userspace and kernel is the member variable sl.
>> In the kernel struct, the member variable sl is missing. This will
>> introduce risks. So it is necessary to use this commit to avoid this
>> risk.
>>
>> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache
>> statistics")
>> CC: Joe Jin <joe.jin@oracle.com>
>> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
>> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> ---
>> V1->V2: fix typos in commit logs.
>> ---
> I did ask you when you posted the patch about whether you did
> backward compatibility tests for which you said, you did all the
> tests and said "So do not worry about backward compatibility. This
> commit will work well with older rds-tools2.0.5 and 2.0.6."
>
> https://www.spinics.net/lists/netdev/msg574691.html
>
> I was worried about exactly such issue as described in commit.

Sorry. My bad. I will do more work to make RDS robust.

Thanks a lot for your Ack.

Zhu Yanjun

>
> Anyways thanks for the fixup patch. Should be applied to stable
> as well.
>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>
> Regards,
> Santosh
>
From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Fri, 23 Aug 2019 21:04:16 -0400

> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
> index fd6b5f6..cba368e 100644
> --- a/include/uapi/linux/rds.h
> +++ b/include/uapi/linux/rds.h
> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>  	__u32	rdma_mr_max;
>  	__u32	rdma_mr_size;
>  	__u8	tos;
> +	__u8	sl;
>  	__u32	cache_allocs;
>  };

I'm applying this, but I am once again severely disappointed in how
RDS development is being handled.

From the Fixes: commit:

	Since rds.h in rds-tools is not related with the kernel rds.h,
	the change in kernel rds.h does not affect rds-tools.

This is the height of arrogance and shows a lack of understanding of
what user ABI requirements are all about.

It is possible for other userland components to be built by other
people, outside of your controlled eco-system and tools, that use
these interfaces.

And you cannot control that.

Therefore you cannot make arbitrary changes to UABI data structures
just because the tool you use and maintain is not affected by it.

Please stop making these incredibly incompatible user interface
changes in the RDS stack.

I am, from this point forward, going to be extra strict on RDS stack
changes especially in this area.
On 2019/8/25 7:58, David Miller wrote:
> From: Zhu Yanjun <yanjun.zhu@oracle.com>
> Date: Fri, 23 Aug 2019 21:04:16 -0400
>
>> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
>> index fd6b5f6..cba368e 100644
>> --- a/include/uapi/linux/rds.h
>> +++ b/include/uapi/linux/rds.h
>> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>>  	__u32	rdma_mr_max;
>>  	__u32	rdma_mr_size;
>>  	__u8	tos;
>> +	__u8	sl;
>>  	__u32	cache_allocs;
>>  };
> I'm applying this, but I am once again severely disappointed in how
> RDS development is being handled.
>
> From the Fixes: commit:
>
> 	Since rds.h in rds-tools is not related with the kernel rds.h,
> 	the change in kernel rds.h does not affect rds-tools.
>
> This is the height of arrogance and shows a lack of understanding of
> what user ABI requirements are all about.
>
> It is possible for other userland components to be built by other
> people, outside of your controlled eco-system and tools, that use
> these interfaces.
>
> And you cannot control that.
>
> Therefore you cannot make arbitrary changes to UABI data structures
> just because the tool you use and maintain is not affected by it.
>
> Please stop making these incredibly incompatible user interface
> changes in the RDS stack.
>
> I am, from this point forward, going to be extra strict on RDS stack
> changes especially in this area.

OK. It is up to you to decide to merge this commit or not.

Zhu Yanjun

>
>
Hi,

On 8/23/19 8:04 PM, Zhu Yanjun wrote:

[..]

> diff --git a/net/rds/ib.c b/net/rds/ib.c
> index ec05d91..45acab2 100644
> --- a/net/rds/ib.c
> +++ b/net/rds/ib.c
> @@ -291,7 +291,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>  				    void *buffer)
>  {
>  	struct rds_info_rdma_connection *iinfo = buffer;
> -	struct rds_ib_connection *ic;
> +	struct rds_ib_connection *ic = conn->c_transport_data;
>
>  	/* We will only ever look at IB transports */
>  	if (conn->c_trans != &rds_ib_transport)
> @@ -301,15 +301,16 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>
>  	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
>  	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
> -	iinfo->tos = conn->c_tos;
> +	if (ic) {

Is this null-check actually necessary? (see related comments below...)

> +		iinfo->tos = conn->c_tos;
> +		iinfo->sl = ic->i_sl;
> +	}
>
>  	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
>  	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
>  	if (rds_conn_state(conn) == RDS_CONN_UP) {
>  		struct rds_ib_device *rds_ibdev;
>
> -		ic = conn->c_transport_data;
> -
>  		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,

Notice that *ic* is dereferenced here without null-checking it. More
comments below...

>  			       (union ib_gid *)&iinfo->dst_gid);
>
> @@ -329,7 +330,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>  				     void *buffer)
>  {
>  	struct rds6_info_rdma_connection *iinfo6 = buffer;
> -	struct rds_ib_connection *ic;
> +	struct rds_ib_connection *ic = conn->c_transport_data;
>
>  	/* We will only ever look at IB transports */
>  	if (conn->c_trans != &rds_ib_transport)
> @@ -337,6 +338,10 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>
>  	iinfo6->src_addr = conn->c_laddr;
>  	iinfo6->dst_addr = conn->c_faddr;
> +	if (ic) {
> +		iinfo6->tos = conn->c_tos;
> +		iinfo6->sl = ic->i_sl;
> +	}
>
>  	memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
>  	memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
> @@ -344,7 +349,6 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>  	if (rds_conn_state(conn) == RDS_CONN_UP) {
>  		struct rds_ib_device *rds_ibdev;
>
> -		ic = conn->c_transport_data;
>  		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,

Again, *ic* is being dereferenced here without a previous null-check.

>  			       (union ib_gid *)&iinfo6->dst_gid);
>  		rds_ibdev = ic->rds_ibdev;

--
Gustavo
On 2019/9/3 9:58, Gustavo A. R. Silva wrote:
> Hi,
>
> On 8/23/19 8:04 PM, Zhu Yanjun wrote:
>
> [..]
>
>> diff --git a/net/rds/ib.c b/net/rds/ib.c
>> index ec05d91..45acab2 100644
>> --- a/net/rds/ib.c
>> +++ b/net/rds/ib.c
>> @@ -291,7 +291,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>>  				    void *buffer)
>>  {
>>  	struct rds_info_rdma_connection *iinfo = buffer;
>> -	struct rds_ib_connection *ic;
>> +	struct rds_ib_connection *ic = conn->c_transport_data;
>>
>>  	/* We will only ever look at IB transports */
>>  	if (conn->c_trans != &rds_ib_transport)
>> @@ -301,15 +301,16 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>>
>>  	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
>>  	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
>> -	iinfo->tos = conn->c_tos;
>> +	if (ic) {
> Is this null-check actually necessary? (see related comments below...)
>
>> +		iinfo->tos = conn->c_tos;
>> +		iinfo->sl = ic->i_sl;
>> +	}
>>
>>  	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
>>  	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
>>  	if (rds_conn_state(conn) == RDS_CONN_UP) {
>>  		struct rds_ib_device *rds_ibdev;
>>
>> -		ic = conn->c_transport_data;
>> -
>>  		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,
> Notice that *ic* is dereferenced here without null-checking it. More
> comments below...
>
>>  			       (union ib_gid *)&iinfo->dst_gid);
>>
>> @@ -329,7 +330,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>>  				     void *buffer)
>>  {
>>  	struct rds6_info_rdma_connection *iinfo6 = buffer;
>> -	struct rds_ib_connection *ic;
>> +	struct rds_ib_connection *ic = conn->c_transport_data;
>>
>>  	/* We will only ever look at IB transports */
>>  	if (conn->c_trans != &rds_ib_transport)
>> @@ -337,6 +338,10 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>>
>>  	iinfo6->src_addr = conn->c_laddr;
>>  	iinfo6->dst_addr = conn->c_faddr;
>> +	if (ic) {
>> +		iinfo6->tos = conn->c_tos;
>> +		iinfo6->sl = ic->i_sl;
>> +	}
>>
>>  	memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
>>  	memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
>> @@ -344,7 +349,6 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>>  	if (rds_conn_state(conn) == RDS_CONN_UP) {
>>  		struct rds_ib_device *rds_ibdev;
>>
>> -		ic = conn->c_transport_data;
>>  		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,
> Again, *ic* is being dereferenced here without a previous null-check.

Please check when "rds_conn_state(conn) == RDS_CONN_UP" holds.

Thanks a lot.
Zhu Yanjun

>
>>  			       (union ib_gid *)&iinfo6->dst_gid);
>>  		rds_ibdev = ic->rds_ibdev;
>
> --
> Gustavo
>
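The review question in this sub-thread (a pointer that is null-checked in one place and dereferenced unchecked a few lines later) can be illustrated with a small stand-alone sketch. This is not the kernel code: every `demo_*` and `DEMO_*` name below is hypothetical, and the sketch only shows one way to make the handling of a possibly-null transport pointer consistent by testing it once, before any dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the connection state and structures. */
enum demo_state { DEMO_CONN_DOWN, DEMO_CONN_UP };

struct demo_transport {
	uint8_t sl;
	int dev_id;
};

struct demo_conn {
	enum demo_state state;
	uint8_t tos;
	struct demo_transport *transport;	/* may be NULL */
};

struct demo_info {
	uint8_t tos;
	uint8_t sl;
	int dev_id;
};

/*
 * Fill *info from *conn. Returns 1 when transport data was available,
 * 0 when it was not. The transport pointer is tested exactly once, so
 * no later branch can dereference it while it is NULL.
 */
int demo_fill_info(const struct demo_conn *conn, struct demo_info *info)
{
	const struct demo_transport *t;

	assert(conn != NULL && info != NULL);
	t = conn->transport;

	/* Defaults that hold even without transport data. */
	info->tos = conn->tos;
	info->sl = 0;
	info->dev_id = -1;

	if (!t)
		return 0;

	info->sl = t->sl;
	if (conn->state == DEMO_CONN_UP)
		info->dev_id = t->dev_id;
	return 1;
}
```

If the surrounding code can guarantee that the pointer is always valid in the UP state, as the reply above suggests for RDS, the early return is simply never taken in that state; the sketch makes no claim about what the RDS code actually guarantees.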
diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index fd6b5f6..cba368e 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
 	__u32	rdma_mr_max;
 	__u32	rdma_mr_size;
 	__u8	tos;
+	__u8	sl;
 	__u32	cache_allocs;
 };
 
@@ -265,6 +266,7 @@ struct rds6_info_rdma_connection {
 	__u32	rdma_mr_max;
 	__u32	rdma_mr_size;
 	__u8	tos;
+	__u8	sl;
 	__u32	cache_allocs;
 };
 
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ec05d91..45acab2 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -291,7 +291,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
 				    void *buffer)
 {
 	struct rds_info_rdma_connection *iinfo = buffer;
-	struct rds_ib_connection *ic;
+	struct rds_ib_connection *ic = conn->c_transport_data;
 
 	/* We will only ever look at IB transports */
 	if (conn->c_trans != &rds_ib_transport)
@@ -301,15 +301,16 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
 
 	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
 	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
-	iinfo->tos = conn->c_tos;
+	if (ic) {
+		iinfo->tos = conn->c_tos;
+		iinfo->sl = ic->i_sl;
+	}
 
 	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
 	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
 	if (rds_conn_state(conn) == RDS_CONN_UP) {
 		struct rds_ib_device *rds_ibdev;
 
-		ic = conn->c_transport_data;
-
 		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,
 			       (union ib_gid *)&iinfo->dst_gid);
 
@@ -329,7 +330,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 				     void *buffer)
 {
 	struct rds6_info_rdma_connection *iinfo6 = buffer;
-	struct rds_ib_connection *ic;
+	struct rds_ib_connection *ic = conn->c_transport_data;
 
 	/* We will only ever look at IB transports */
 	if (conn->c_trans != &rds_ib_transport)
@@ -337,6 +338,10 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 
 	iinfo6->src_addr = conn->c_laddr;
 	iinfo6->dst_addr = conn->c_faddr;
+	if (ic) {
+		iinfo6->tos = conn->c_tos;
+		iinfo6->sl = ic->i_sl;
+	}
 
 	memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
 	memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
@@ -344,7 +349,6 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 	if (rds_conn_state(conn) == RDS_CONN_UP) {
 		struct rds_ib_device *rds_ibdev;
 
-		ic = conn->c_transport_data;
 		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,
 			       (union ib_gid *)&iinfo6->dst_gid);
 		rds_ibdev = ic->rds_ibdev;
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 303c6ee..f2b558e 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -220,6 +220,7 @@ struct rds_ib_connection {
 	/* Send/Recv vectors */
 	int	i_scq_vector;
 	int	i_rcq_vector;
+	u8	i_sl;
 };
 
 /* This assumes that atomic_t is at least 32 bits */
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index fddaa09..233f136 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -152,6 +152,9 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
 		  RDS_PROTOCOL_MINOR(conn->c_version),
 		  ic->i_flowctl ? ", flow control" : "");
 
+	/* receive sl from the peer */
+	ic->i_sl = ic->i_cm_id->route.path_rec->sl;
+
 	atomic_set(&ic->i_cq_quiesce, 0);
 
 	/* Init rings and fill recv. this needs to wait until protocol
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index ff74c4b..28668ad 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -43,6 +43,9 @@
 static struct rdma_cm_id *rds6_rdma_listen_id;
 #endif
 
+/* Per IB specification 7.7.3, service level is a 4-bit field. */
+#define TOS_TO_SL(tos)		((tos) & 0xF)
+
 static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 					 struct rdma_cm_event *event,
 					 bool isv6)
@@ -97,10 +100,13 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 			struct rds_ib_connection *ibic;
 
 			ibic = conn->c_transport_data;
-			if (ibic && ibic->i_cm_id == cm_id)
+			if (ibic && ibic->i_cm_id == cm_id) {
+				cm_id->route.path_rec[0].sl =
+					TOS_TO_SL(conn->c_tos);
 				ret = trans->cm_initiate_connect(cm_id, isv6);
-			else
+			} else {
 				rds_conn_drop(conn);
+			}
 		}
 		break;
 
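A minimal stand-alone sketch of the TOS-to-SL mapping used in the diff above. Only the `TOS_TO_SL` macro is taken from the patch; `demo_tos_to_sl()` is a hypothetical wrapper added here so the mapping can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Macro as posted in the patch: keep only the low four bits of the TOS. */
#define TOS_TO_SL(tos)		((tos) & 0xF)

/* Compile-time check: bits above the low nibble are discarded. */
static_assert(TOS_TO_SL(0x12) == 0x2, "only the low four bits survive");

/* Hypothetical helper (not in the patch) wrapping the macro. */
uint8_t demo_tos_to_sl(uint8_t tos)
{
	return (uint8_t)TOS_TO_SL(tos);
}
```

For the TOS values 0, 1 and 2 shown in the rds-info output, the SL equals the TOS. Because of the mask, TOS values that differ only above bit 3 (for example 0x02 and 0x12) map to the same SL.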
From IB specification 7.6.5 SERVICE LEVEL, Service Level (SL)
is used to identify different flows within an IBA subnet.
It is carried in the local route header of the packet.

Before this commit, run "rds-info -I". The outputs are as
below:
"
RDS IB Connections:
LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
192.2.95.3  192.2.95.1  2    0   fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1    0   fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0    0   fe80::21:28:1a:39  fe80::21:28:10:b9
"
After this commit, the output is as below:
"
RDS IB Connections:
LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
192.2.95.3  192.2.95.1  2    2   fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1    1   fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0    0   fe80::21:28:1a:39  fe80::21:28:10:b9
"

The commit fe3475af3bdf ("net: rds: add per rds connection cache
statistics") adds cache_allocs in struct rds_info_rdma_connection
as below:
struct rds_info_rdma_connection {
...
	__u32	rdma_mr_max;
	__u32	rdma_mr_size;
	__u8	tos;
	__u32	cache_allocs;
};
The peer struct in rds-tools of struct rds_info_rdma_connection is as
below:
struct rds_info_rdma_connection {
...
	uint32_t	rdma_mr_max;
	uint32_t	rdma_mr_size;
	uint8_t		tos;
	uint8_t		sl;
	uint32_t	cache_allocs;
};
The difference between userspace and kernel is the member variable sl.
In the kernel struct, the member variable sl is missing. This will
introduce risks. So it is necessary to use this commit to avoid this risk.

Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
V1->V2: fix typos in commit logs.
---
 include/uapi/linux/rds.h |  2 ++
 net/rds/ib.c             | 16 ++++++++++------
 net/rds/ib.h             |  1 +
 net/rds/ib_cm.c          |  3 +++
 net/rds/rdma_transport.c | 10 ++++++++--
 5 files changed, 24 insertions(+), 8 deletions(-)
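The layout difference described in the commit message can be checked with a small stand-alone sketch. The two structs below are hypothetical and contain only the members shown in the commit message; the members elided there ("...") are deliberately left out, so the offsets are relative to this trimmed tail and are not the real UAPI offsets. The sketch assumes an ABI that aligns `uint32_t` to 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trimmed tail of the kernel struct before this commit. */
struct tail_without_sl {
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint32_t cache_allocs;
};

/* Same tail as declared in rds-tools, and in the kernel after this commit. */
struct tail_with_sl {
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint8_t  sl;
	uint32_t cache_allocs;
};

/* Offset within the trimmed tail at which the sl member sits. */
size_t sl_offset(void)
{
	return offsetof(struct tail_with_sl, sl);
}

/* Returns 1 when both variants agree on total size and on cache_allocs. */
int layouts_agree(void)
{
	return sizeof(struct tail_without_sl) == sizeof(struct tail_with_sl) &&
	       offsetof(struct tail_without_sl, cache_allocs) ==
	       offsetof(struct tail_with_sl, cache_allocs);
}
```

Under that alignment assumption, `sl` falls into a byte that is padding in the variant without `sl`, so the size and the position of `cache_allocs` are the same in both variants, while a reader of the `sl` byte gets whatever was left in that padding. On an ABI with different alignment or packing the two variants could diverge, which is why the sketch computes the offsets instead of hard-coding them.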