Message ID | 20230119180506.5197-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | RDMA/rxe: Fix parameter errors |

On Thu, Jan 19, 2023 at 12:05:07PM -0600, Bob Pearson wrote:
> Correct errors in rxe_param.h caused by extending the range of
> indices for MRs allowing it to overlap the range for MWs. Since
> the driver determines whether an rkey is for an MR or MW by comparing
> the index part of the rkey with these ranges this can cause an
> MR to be incorrectly determined to be an MW.
>
> Additionally the parameters which determine the size of the index
> ranges for MR, MW, QP and SRQ are incorrect since the actual
> number of integers in the range [min, max] is (max - min + 1) not
> (max - min).
>
> This patch corrects these errors.
>
> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_param.h | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)

This

commit 1aefe5c177c1922119afb4ee443ddd6ac3140b37
Author: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Date:   Tue Dec 20 17:08:48 2022 +0900

    RDMA/rxe: Prevent faulty rkey generation

    If you create MRs more than 0x10000 times after loading the module,
    responder starts to reply NAKs for RDMA/Atomic operations because of rkey
    violation detected in check_rkey(). The root cause is that rkeys are
    incremented each time a new MR is created and the value overflows into the
    range reserved for MWs.

    This commit also increases the value of RXE_MAX_MW that has been limited
    unlike other parameters.

    Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
    Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
    Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
    Tested-by: Li Zhijian <lizhijian@fujitsu.com>
    Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Is already in v6.2-rc and conflicts with this patch, it looks like it
is doing the same thing, can you sort it out please?

Thanks,
Jason

On 1/19/23 13:18, Jason Gunthorpe wrote:
> On Thu, Jan 19, 2023 at 12:05:07PM -0600, Bob Pearson wrote:
>> Correct errors in rxe_param.h caused by extending the range of
>> indices for MRs allowing it to overlap the range for MWs. Since
>> the driver determines whether an rkey is for an MR or MW by comparing
>> the index part of the rkey with these ranges this can cause an
>> MR to be incorrectly determined to be an MW.
>>
>> Additionally the parameters which determine the size of the index
>> ranges for MR, MW, QP and SRQ are incorrect since the actual
>> number of integers in the range [min, max] is (max - min + 1) not
>> (max - min).
>>
>> This patch corrects these errors.
>>
>> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_param.h | 27 +++++++++++++++++++--------
>>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> This
>
> commit 1aefe5c177c1922119afb4ee443ddd6ac3140b37
> Author: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> Date:   Tue Dec 20 17:08:48 2022 +0900
>
>     RDMA/rxe: Prevent faulty rkey generation
>
>     If you create MRs more than 0x10000 times after loading the module,
>     responder starts to reply NAKs for RDMA/Atomic operations because of rkey
>     violation detected in check_rkey(). The root cause is that rkeys are
>     incremented each time a new MR is created and the value overflows into the
>     range reserved for MWs.
>
>     This commit also increases the value of RXE_MAX_MW that has been limited
>     unlike other parameters.
>
>     Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
>     Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
>     Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>     Tested-by: Li Zhijian <lizhijian@fujitsu.com>
>     Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>     Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
>
> Is already in v6.2-rc and conflicts with this patch, it looks like it
> is doing the same thing, can you sort it out please?
>
> Thanks,
> Jason

Missed that one. Yes, they are basically identical except he cut the range in half and gave one
to each and I doubled it.

The other change I made is still a bug but much less important. It reports an incorrect max_xxx
number in hca attributes but has no ill effect. We can leave it the way it is for now.

Bob

On 1/19/23 13:18, Jason Gunthorpe wrote:
> On Thu, Jan 19, 2023 at 12:05:07PM -0600, Bob Pearson wrote:
>> Correct errors in rxe_param.h caused by extending the range of
>> indices for MRs allowing it to overlap the range for MWs. Since
>> the driver determines whether an rkey is for an MR or MW by comparing
>> the index part of the rkey with these ranges this can cause an
>> MR to be incorrectly determined to be an MW.
>>
>> Additionally the parameters which determine the size of the index
>> ranges for MR, MW, QP and SRQ are incorrect since the actual
>> number of integers in the range [min, max] is (max - min + 1) not
>> (max - min).
>>
>> This patch corrects these errors.
>>
>> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_param.h | 27 +++++++++++++++++++--------
>>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> This
>
> commit 1aefe5c177c1922119afb4ee443ddd6ac3140b37
> Author: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> Date:   Tue Dec 20 17:08:48 2022 +0900
>
>     RDMA/rxe: Prevent faulty rkey generation
>
>     If you create MRs more than 0x10000 times after loading the module,
>     responder starts to reply NAKs for RDMA/Atomic operations because of rkey
>     violation detected in check_rkey(). The root cause is that rkeys are
>     incremented each time a new MR is created and the value overflows into the
>     range reserved for MWs.
>
>     This commit also increases the value of RXE_MAX_MW that has been limited
>     unlike other parameters.
>
>     Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
>     Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
>     Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>     Tested-by: Li Zhijian <lizhijian@fujitsu.com>
>     Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>     Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
>
> Is already in v6.2-rc and conflicts with this patch, it looks like it
> is doing the same thing, can you sort it out please?
>
> Thanks,
> Jason

Did this get lost? for-next is now at 6.2-rc3 now and the bug is
still in rxe_param.h.

Bob

On Wed, Mar 01, 2023 at 05:15:07PM -0600, Bob Pearson wrote:
> On 1/19/23 13:18, Jason Gunthorpe wrote:
> > On Thu, Jan 19, 2023 at 12:05:07PM -0600, Bob Pearson wrote:
> >> Correct errors in rxe_param.h caused by extending the range of
> >> indices for MRs allowing it to overlap the range for MWs. Since
> >> the driver determines whether an rkey is for an MR or MW by comparing
> >> the index part of the rkey with these ranges this can cause an
> >> MR to be incorrectly determined to be an MW.
> >>
> >> Additionally the parameters which determine the size of the index
> >> ranges for MR, MW, QP and SRQ are incorrect since the actual
> >> number of integers in the range [min, max] is (max - min + 1) not
> >> (max - min).
> >>
> >> This patch corrects these errors.
> >>
> >> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_param.h | 27 +++++++++++++++++++--------
> >>  1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > This
> >
> > commit 1aefe5c177c1922119afb4ee443ddd6ac3140b37
> > Author: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > Date:   Tue Dec 20 17:08:48 2022 +0900
> >
> >     RDMA/rxe: Prevent faulty rkey generation
> >
> >     If you create MRs more than 0x10000 times after loading the module,
> >     responder starts to reply NAKs for RDMA/Atomic operations because of rkey
> >     violation detected in check_rkey(). The root cause is that rkeys are
> >     incremented each time a new MR is created and the value overflows into the
> >     range reserved for MWs.
> >
> >     This commit also increases the value of RXE_MAX_MW that has been limited
> >     unlike other parameters.
> >
> >     Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
> >     Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
> >     Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> >     Tested-by: Li Zhijian <lizhijian@fujitsu.com>
> >     Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> >     Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> >
> > Is already in v6.2-rc and conflicts with this patch, it looks like it
> > is doing the same thing, can you sort it out please?
> >
> > Thanks,
> > Jason
>
> Did this get lost? for-next is now at 6.2-rc3 now and the bug is
> still in rxe_param.h.

Check again we are at v6.3-rc1 now, if something needs to be fixed
send a new patch..

Jason

On 3/6/23 14:51, Jason Gunthorpe wrote:
> On Wed, Mar 01, 2023 at 05:15:07PM -0600, Bob Pearson wrote:
>> On 1/19/23 13:18, Jason Gunthorpe wrote:
>>> On Thu, Jan 19, 2023 at 12:05:07PM -0600, Bob Pearson wrote:
>>>> Correct errors in rxe_param.h caused by extending the range of
>>>> indices for MRs allowing it to overlap the range for MWs. Since
>>>> the driver determines whether an rkey is for an MR or MW by comparing
>>>> the index part of the rkey with these ranges this can cause an
>>>> MR to be incorrectly determined to be an MW.
>>>>
>>>> Additionally the parameters which determine the size of the index
>>>> ranges for MR, MW, QP and SRQ are incorrect since the actual
>>>> number of integers in the range [min, max] is (max - min + 1) not
>>>> (max - min).
>>>>
>>>> This patch corrects these errors.
>>>>
>>>> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>> ---
>>>>  drivers/infiniband/sw/rxe/rxe_param.h | 27 +++++++++++++++++++--------
>>>>  1 file changed, 19 insertions(+), 8 deletions(-)
>>>
>>> This
>>>
>>> commit 1aefe5c177c1922119afb4ee443ddd6ac3140b37
>>> Author: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>> Date:   Tue Dec 20 17:08:48 2022 +0900
>>>
>>>     RDMA/rxe: Prevent faulty rkey generation
>>>
>>>     If you create MRs more than 0x10000 times after loading the module,
>>>     responder starts to reply NAKs for RDMA/Atomic operations because of rkey
>>>     violation detected in check_rkey(). The root cause is that rkeys are
>>>     incremented each time a new MR is created and the value overflows into the
>>>     range reserved for MWs.
>>>
>>>     This commit also increases the value of RXE_MAX_MW that has been limited
>>>     unlike other parameters.
>>>
>>>     Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
>>>     Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
>>>     Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>>     Tested-by: Li Zhijian <lizhijian@fujitsu.com>
>>>     Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>>>     Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>>>
>>>
>>> Is already in v6.2-rc and conflicts with this patch, it looks like it
>>> is doing the same thing, can you sort it out please?
>>>
>>> Thanks,
>>> Jason
>>
>> Did this get lost? for-next is now at 6.2-rc3 now and the bug is
>> still in rxe_param.h.
>
> Check again we are at v6.3-rc1 now, if something needs to be fixed
> send a new patch..
>
> Jason

Just checked. It now looks good in for-next. Thanks

Bob

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index a754fc902e3d..14baa84d1d9d 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -91,18 +91,29 @@ enum rxe_device_param {
 
 	RXE_MIN_QP_INDEX		= 16,
 	RXE_MAX_QP_INDEX		= DEFAULT_MAX_VALUE,
-	RXE_MAX_QP			= DEFAULT_MAX_VALUE - RXE_MIN_QP_INDEX,
+	RXE_MAX_QP			= RXE_MAX_QP_INDEX
+					- RXE_MIN_QP_INDEX + 1,
 
 	RXE_MIN_SRQ_INDEX		= 0x00020001,
 	RXE_MAX_SRQ_INDEX		= DEFAULT_MAX_VALUE,
-	RXE_MAX_SRQ			= DEFAULT_MAX_VALUE - RXE_MIN_SRQ_INDEX,
-
-	RXE_MIN_MR_INDEX		= 0x00000001,
+	RXE_MAX_SRQ			= RXE_MAX_SRQ_INDEX
+					- RXE_MIN_SRQ_INDEX + 1,
+
+	/*
+	 * MR and MW indices are converted to rkeys by shifting
+	 * left 8 bits and oring in an 8 bit key which either
+	 * belongs to the driver or the user depending on the
+	 * MR type. In order to determine if the rkey is an MR
+	 * or an MW the index ranges below must not overlap.
+	 */
+	RXE_MIN_MR_INDEX		= 1,
 	RXE_MAX_MR_INDEX		= DEFAULT_MAX_VALUE,
-	RXE_MAX_MR			= DEFAULT_MAX_VALUE - RXE_MIN_MR_INDEX,
-	RXE_MIN_MW_INDEX		= 0x00010001,
-	RXE_MAX_MW_INDEX		= 0x00020000,
-	RXE_MAX_MW			= 0x00001000,
+	RXE_MAX_MR			= RXE_MAX_MR_INDEX
+					- RXE_MIN_MR_INDEX + 1,
+	RXE_MIN_MW_INDEX		= DEFAULT_MAX_VALUE + 1,
+	RXE_MAX_MW_INDEX		= 2*DEFAULT_MAX_VALUE,
+	RXE_MAX_MW			= RXE_MAX_MW_INDEX
+					- RXE_MIN_MW_INDEX + 1,
 
 	RXE_MAX_PKT_PER_ACK		= 64,

Correct errors in rxe_param.h caused by extending the range of
indices for MRs allowing it to overlap the range for MWs. Since
the driver determines whether an rkey is for an MR or MW by comparing
the index part of the rkey with these ranges this can cause an
MR to be incorrectly determined to be an MW.

Additionally the parameters which determine the size of the index
ranges for MR, MW, QP and SRQ are incorrect since the actual
number of integers in the range [min, max] is (max - min + 1) not
(max - min).

This patch corrects these errors.

Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_param.h | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)