Message ID: 20231027054154.2935054-1-lizhijian@fujitsu.com (mailing list archive)
State: RFC
Series: [RFC,1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
On 2023/10/27 13:41, Li Zhijian wrote:
> mr->page_list only encodes *page without page offset, when
> page_size != PAGE_SIZE, we cannot restore the address with a wrong
> page_offset.
>
> Note that this patch will break some ULPs that try to register 4K
> MR when PAGE_SIZE is not 4K.
> SRP and nvme over RXE is known to be impacted.

When a ULP uses folios or compound pages, it cannot work well with RXE after this commit is applied.

Perhaps removing the page_size setup in RXE is a good solution, because page_size is set twice: first it is set in infiniband/core, then again in RXE. When folios or compound pages are used in a ULP, it is very possible that the page_size in infiniband/core differs from the page_size in RXE.

Not sure what problem this difference will cause.

Zhu Yanjun

>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..61a136ea1d91 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>  	struct rxe_mr *mr = to_rmr(ibmr);
>  	unsigned int page_size = mr_page_size(mr);
>
> +	if (page_size != PAGE_SIZE) {
> +		rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
> +			    page_size, PAGE_SIZE);
> +		return -EOPNOTSUPP;
> +	}
> +
>  	mr->nbuf = 0;
>  	mr->page_shift = ilog2(page_size);
>  	mr->page_mask = ~((u64)page_size - 1);
On 10/27/23 01:17, Zhu Yanjun wrote:
> When ULP uses folio or compound page, ULP can not work well with RXE
> after this commit is applied.

Isn't it the responsibility of the DMA mapping code to build an sg-list for folios? Drivers like ib_srp see an sg-list, whether that comes from a folio or from something else.

Bart.
On 2023/10/28 5:46, Bart Van Assche wrote:
> On 10/27/23 01:17, Zhu Yanjun wrote:
>> When ULP uses folio or compound page, ULP can not work well with RXE
>> after this commit is applied.
>
> Isn't it the responsibility of the DMA mapping code to build an sg-list
> for folios? Drivers like ib_srp see an sg-list, whether that comes from

A folio is a way of representing a set of physically contiguous base pages. In current folio implementations, it seems that an sg-list is not used: huge pages whose size is not PAGE_SIZE are DMA-mapped into hardware, so the page size of a folio is not equal to PAGE_SIZE. If this commit is applied, it poses potential risks to future folio support.

I have done some folio work for NIC and RDMA drivers. The page size of a folio is possibly not equal to PAGE_SIZE; it can be a multiple of PAGE_SIZE. And when a folio is DMA-mapped to hardware, the page size is a multiple of PAGE_SIZE as well.

In this case, a ULP using folios will not work well with the current RXE after this commit is applied. Removing page_size in RXE seems to be a plan for this problem.

Zhu Yanjun

> a folio or from something else.
>
> Bart.
On 10/27/23 19:48, Zhu Yanjun wrote:
> In this case, ULP with folio will not work well with current RXE after
> this commit is applied.

Why not? RDMA ULPs like the SRP initiator driver use ib_map_mr_sg(). The latter function calls rxe_map_mr_sg() if the RXE driver is used. rxe_map_mr_sg() calls ib_sg_to_pages(). ib_sg_to_pages() translates SG-entries that are larger than a virtual memory page into multiple entries with size mr->page_size.

Thanks,

Bart.
On 2023/10/29 7:07, Bart Van Assche wrote:
> On 10/27/23 19:48, Zhu Yanjun wrote:
>> In this case, ULP with folio will not work well with current RXE after
>> this commit is applied.
>
> Why not? RDMA ULPs like the SRP initiator driver use ib_map_mr_sg(). The
> latter function calls rxe_map_mr_sg() if the RXE driver is used.
> rxe_map_mr_sg() calls ib_sg_to_pages(). ib_sg_to_pages() translates
> SG-entries that are larger than a virtual memory page into multiple
> entries with size mr->page_size.

It seems that we are not on the same page. I was thinking that this fix should also work for future folio support, while your idea is based on the current implementation. Perhaps this is not a good time to discuss folios, so let me focus on the current implementation.

In this commit, if the page size is not equal to PAGE_SIZE, a message is printed and the function bails out. So should "rxe_info_mr" be changed to "rxe_warning_mr"?

In fact, I would like to remove the page size assignment in rxe, that is, partly revert commit 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"). But it seems that Jason did not like this, so I no longer insist on it.

I am not sure whether Jason, Leon, Bob or you have a better solution to this problem. If not, this commit can fix it.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Thanks,
Zhu Yanjun

>
> Thanks,
>
> Bart.
>
On 27/10/2023 13:41, Li Zhijian wrote:
> mr->page_list only encodes *page without page offset, when
> page_size != PAGE_SIZE, we cannot restore the address with a wrong
> page_offset.
>
> Note that this patch will break some ULPs that try to register 4K
> MR when PAGE_SIZE is not 4K.
> SRP and nvme over RXE is known to be impacted.
>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..61a136ea1d91 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>  	struct rxe_mr *mr = to_rmr(ibmr);
>  	unsigned int page_size = mr_page_size(mr);
>
> +	if (page_size != PAGE_SIZE) {

It seems this condition is too strict; it should be:

	if (!IS_ALIGNED(page_size, PAGE_SIZE))

so that a page_size of (N * PAGE_SIZE) can work as before. The offset (mr.iova & page_mask) only gets lost when !IS_ALIGNED(page_size, PAGE_SIZE).

Thanks
Zhijian

> +		rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
> +			    page_size, PAGE_SIZE);
> +		return -EOPNOTSUPP;
> +	}
> +
>  	mr->nbuf = 0;
>  	mr->page_shift = ilog2(page_size);
>  	mr->page_mask = ~((u64)page_size - 1);
On 27/10/2023 16:17, Zhu Yanjun wrote:
> On 2023/10/27 13:41, Li Zhijian wrote:
>> mr->page_list only encodes *page without page offset, when
>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>> page_offset.
>>
>> Note that this patch will break some ULPs that try to register 4K
>> MR when PAGE_SIZE is not 4K.
>> SRP and nvme over RXE is known to be impacted.
>
> When ULP uses folio or compound page, ULP can not work well with RXE after this commit is applied.
>
> Perhaps removing page_size set in RXE is a good solution because page_size is set twice, firstly page_size is set in infiniband/core, secondly it is set in RXE.

Does the RXE one mean rxe_mr_init()? I think rxe_reg_user_mr() requires this.

static void rxe_mr_init(int access, struct rxe_mr *mr)
{
	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);

	/* set ibmr->l/rkey and also copy into private l/rkey
	 * for user MRs these will always be the same
	 * for cases where caller 'owns' the key portion
	 * they may be different until REG_MR WQE is executed.
	 */
	mr->lkey = mr->ibmr.lkey = key;
	mr->rkey = mr->ibmr.rkey = key;

	mr->access = access;
	mr->ibmr.page_size = PAGE_SIZE;
	mr->page_mask = PAGE_MASK;
	mr->page_shift = PAGE_SHIFT;
	mr->state = RXE_MR_STATE_INVALID;
}

Thanks
Zhijian

>
> When folio or compound page is used in ULP, it is very possible that page_size in infiniband/core is different from the page_size in RXE
>
> Not sure what problem this difference will cause.
>
> Zhu Yanjun
>
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index f54042e9aeb2..61a136ea1d91 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>  	struct rxe_mr *mr = to_rmr(ibmr);
>>  	unsigned int page_size = mr_page_size(mr);
>> +	if (page_size != PAGE_SIZE) {
>> +		rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
>> +			    page_size, PAGE_SIZE);
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>>  	mr->nbuf = 0;
>>  	mr->page_shift = ilog2(page_size);
>>  	mr->page_mask = ~((u64)page_size - 1);
>
On 2023/10/30 16:13, Zhijian Li (Fujitsu) wrote:
>
>
> On 27/10/2023 16:17, Zhu Yanjun wrote:
>> On 2023/10/27 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>
>> When ULP uses folio or compound page, ULP can not work well with RXE after this commit is applied.
>>
>> Perhaps removing page_size set in RXE is a good solution because page_size is set twice, firstly page_size is set in infiniband/core, secondly it is set in RXE.
>
> Does the RXE one mean rxe_mr_init()? I think rxe_reg_user_mr() requires this.

Please read the discussions carefully. This problem has already been discussed.

Best Regards,
Zhu Yanjun

>
> static void rxe_mr_init(int access, struct rxe_mr *mr)
> {
> 	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
>
> 	/* set ibmr->l/rkey and also copy into private l/rkey
> 	 * for user MRs these will always be the same
> 	 * for cases where caller 'owns' the key portion
> 	 * they may be different until REG_MR WQE is executed.
> 	 */
> 	mr->lkey = mr->ibmr.lkey = key;
> 	mr->rkey = mr->ibmr.rkey = key;
>
> 	mr->access = access;
> 	mr->ibmr.page_size = PAGE_SIZE;
> 	mr->page_mask = PAGE_MASK;
> 	mr->page_shift = PAGE_SHIFT;
> 	mr->state = RXE_MR_STATE_INVALID;
> }
>
> Thanks
> Zhijian
>
>>
>> When folio or compound page is used in ULP, it is very possible that page_size in infiniband/core is different from the page_size in RXE
>>
>> Not sure what problem this difference will cause.
>>
>> Zhu Yanjun
>>
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>>  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>>  	struct rxe_mr *mr = to_rmr(ibmr);
>>>  	unsigned int page_size = mr_page_size(mr);
>>> +	if (page_size != PAGE_SIZE) {
>>> +		rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
>>> +			    page_size, PAGE_SIZE);
>>> +		return -EOPNOTSUPP;
>>> +	}
>>> +
>>>  	mr->nbuf = 0;
>>>  	mr->page_shift = ilog2(page_size);
>>>  	mr->page_mask = ~((u64)page_size - 1);
On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>
>
> On 27/10/2023 13:41, Li Zhijian wrote:
> > mr->page_list only encodes *page without page offset, when
> > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > page_offset.
> >
> > Note that this patch will break some ULPs that try to register 4K
> > MR when PAGE_SIZE is not 4K.
> > SRP and nvme over RXE is known to be impacted.
> >
> > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > index f54042e9aeb2..61a136ea1d91 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> >  	struct rxe_mr *mr = to_rmr(ibmr);
> >  	unsigned int page_size = mr_page_size(mr);
> >
> > +	if (page_size != PAGE_SIZE) {
>
> It seems this condition is too strict, it should be:
> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>
> So that, page_size with (N * PAGE_SIZE) can work as previously.
> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)

That makes sense

Jason
On 2023/10/30 20:40, Jason Gunthorpe wrote:
> On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2023 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>>  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>>  	struct rxe_mr *mr = to_rmr(ibmr);
>>>  	unsigned int page_size = mr_page_size(mr);
>>>
>>> +	if (page_size != PAGE_SIZE) {
>>
>> It seems this condition is too strict, it should be:
>> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>>
>> So that, page_size with (N * PAGE_SIZE) can work as previously.
>> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
>
> That makes sense

I read all the discussions very carefully.

Thanks, Greg.

RXE only supports PAGE_SIZE: when CONFIG_ARM64_64K_PAGES is enabled, PAGE_SIZE is 64K; when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is 4K. But NVMe calls ib_map_mr_sg() with a fixed size of SZ_4K; when CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in RXE; the problem is in NVMe.

If this problem in NVMe is not fixed, NVMe still cannot use RXE to run tests with blktests when CONFIG_ARM64_64K_PAGES is enabled. This is not a problem in RXE, so it should not be fixed in RXE.

Zhu Yanjun

>
> Jason
On 30/10/2023 20:40, Jason Gunthorpe wrote:
> On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2023 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>>  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>>  	struct rxe_mr *mr = to_rmr(ibmr);
>>>  	unsigned int page_size = mr_page_size(mr);
>>>
>>> +	if (page_size != PAGE_SIZE) {
>>
>> It seems this condition is too strict, it should be:
>> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>>

I have to say I retract this conclusion; it still misses something.

To support PAGE_SIZE-aligned MRs, we have to refactor rxe_map_mr_sg() or rxe_set_page(). Currently, rxe_set_page() is called at page_size steps, which doesn't split N*PAGE_SIZE memory into N pages, so when we restore an iova from the xarray, the array index is wrong as well.

So I'm going to refactor rxe_map_mr_sg() to iterate the sgl by myself, like SIW does. Hopefully this refactor can also help RXE support SZ_4K when PAGE_SIZE != 4K.

Thanks
Zhijian

>> So that, page_size with (N * PAGE_SIZE) can work as previously.
>> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
>
> That makes sense
>
> Jason
On Tue, Oct 31, 2023 at 04:52:23PM +0800, Zhu Yanjun wrote:
> On 2023/10/30 20:40, Jason Gunthorpe wrote:
> > On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
> > >
> > >
> > > On 27/10/2023 13:41, Li Zhijian wrote:
> > > > mr->page_list only encodes *page without page offset, when
> > > > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > > > page_offset.
> > > >
> > > > Note that this patch will break some ULPs that try to register 4K
> > > > MR when PAGE_SIZE is not 4K.
> > > > SRP and nvme over RXE is known to be impacted.
> > > >
> > > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > > > ---
> > > >  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > index f54042e9aeb2..61a136ea1d91 100644
> > > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > > >  	struct rxe_mr *mr = to_rmr(ibmr);
> > > >  	unsigned int page_size = mr_page_size(mr);
> > > > +	if (page_size != PAGE_SIZE) {
> > >
> > > It seems this condition is too strict, it should be:
> > > if (!IS_ALIGNED(page_size, PAGE_SIZE))
> > >
> > > So that, page_size with (N * PAGE_SIZE) can work as previously.
> > > Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
> >
> > That makes sense
>
> I read all the discussions very carefully.
>
> Thanks, Greg.
>
> Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is enabled,
> the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is
> 4K.
>
> But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
> CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in
> RXE. This problem is in NVMe.

Maybe, but no real RDMA devices don't support 4K.
The xarray conversion may need revision to use physical addresses instead of storing struct pages so it can handle this kind of segmentation.

Certainly in the mean time it should be rejected.

Jason
On Tue, Oct 31, 2023 at 9:19 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Oct 31, 2023 at 04:52:23PM +0800, Zhu Yanjun wrote:
> > On 2023/10/30 20:40, Jason Gunthorpe wrote:
> > > On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
> > > >
> > > >
> > > > On 27/10/2023 13:41, Li Zhijian wrote:
> > > > > mr->page_list only encodes *page without page offset, when
> > > > > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > > > > page_offset.
> > > > >
> > > > > Note that this patch will break some ULPs that try to register 4K
> > > > > MR when PAGE_SIZE is not 4K.
> > > > > SRP and nvme over RXE is known to be impacted.
> > > > >
> > > > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > > > > ---
> > > > >  drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > index f54042e9aeb2..61a136ea1d91 100644
> > > > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > > > >  	struct rxe_mr *mr = to_rmr(ibmr);
> > > > >  	unsigned int page_size = mr_page_size(mr);
> > > > > +	if (page_size != PAGE_SIZE) {
> > > >
> > > > It seems this condition is too strict, it should be:
> > > > if (!IS_ALIGNED(page_size, PAGE_SIZE))
> > > >
> > > > So that, page_size with (N * PAGE_SIZE) can work as previously.
> > > > Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
> > >
> > > That makes sense
> >
> > I read all the discussions very carefully.
> >
> > Thanks, Greg.
> >
> > Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is enabled,
> > the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is
> > 4K.
> >
> > But NVMe calls ib_map_mr_sg with a fixed size SZ_4K.
> > When
> > CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in
> > RXE. This problem is in NVMe.
>
> Maybe, but no real RDMA devices don't support 4K.
>
> The xarray conversion may need revision to use physical addresses
> instead of storing struct pages so it can handle this kind of
> segmentation.

This problem cannot be fixed until rxe supports multiple page sizes, including the 4K page size. It is not fixed now; this is an intermediate way.

>
> Certainly in the mean time it should be rejected.
>
> Jason
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..61a136ea1d91 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 	struct rxe_mr *mr = to_rmr(ibmr);
 	unsigned int page_size = mr_page_size(mr);
 
+	if (page_size != PAGE_SIZE) {
+		rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
+			    page_size, PAGE_SIZE);
+		return -EOPNOTSUPP;
+	}
+
 	mr->nbuf = 0;
 	mr->page_shift = ilog2(page_size);
 	mr->page_mask = ~((u64)page_size - 1);
mr->page_list only encodes *page without a page offset; when page_size != PAGE_SIZE, we cannot restore the address because the page_offset is wrong.

Note that this patch will break some ULPs that try to register a 4K MR when PAGE_SIZE is not 4K. SRP and nvme over RXE are known to be impacted.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
 1 file changed, 6 insertions(+)