diff mbox series

[for-next,1/1] RDMA/rxe: sgt_append from ib_umem_get is not highmem

Message ID 20221120012939.539953-1-yanjun.zhu@intel.com (mailing list archive)
State Rejected
Delegated to: Jason Gunthorpe
Headers show
Series [for-next,1/1] RDMA/rxe: sgt_append from ib_umem_get is not highmem | expand

Commit Message

Zhu Yanjun Nov. 20, 2022, 1:29 a.m. UTC
From: Zhu Yanjun <yanjun.zhu@linux.dev>

In ib_umem_get, sgt_append is allocated from the function
sg_alloc_append_table_from_pages. And it is not from highmem.

As such, the return value of page_address will not be NULL.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

Comments

Zhijian Li (Fujitsu) Nov. 19, 2022, 10:30 a.m. UTC | #1
There was a thread that tries to refactor iova_to_vaddr[1][2], where 
page_address will be drop. if so, your changes will be no longer needed.

[1]https://lore.kernel.org/lkml/a7decec2-77d9-db4c-ff66-ff597da8bc71@fujitsu.com/T/
[2]https://www.spinics.net/lists/linux-rdma/msg114053.html

Thanks
Zhijian


On 20/11/2022 09:29, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> In ib_umem_get, sgt_append is allocated from the function
> sg_alloc_append_table_from_pages. And it is not from highmem.
> 
> As such, the return value of page_address will not be NULL.
> 
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>   drivers/infiniband/sw/rxe/rxe_mr.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index b1423000e4bc..3f33a4cdef24 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -119,7 +119,6 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>   	struct ib_umem		*umem;
>   	struct sg_page_iter	sg_iter;
>   	int			num_buf;
> -	void			*vaddr;
>   	int err;
>   
>   	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
> @@ -149,6 +148,8 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>   		buf = map[0]->buf;
>   
>   		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
> +			void *vaddr;
> +
>   			if (num_buf >= RXE_BUF_PER_MAP) {
>   				map++;
>   				buf = map[0]->buf;
> @@ -156,16 +157,10 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>   			}
>   
>   			vaddr = page_address(sg_page_iter_page(&sg_iter));
> -			if (!vaddr) {
> -				rxe_dbg_mr(mr, "Unable to get virtual address\n");
> -				err = -ENOMEM;
> -				goto err_release_umem;
> -			}
>   			buf->addr = (uintptr_t)vaddr;
>   			buf->size = PAGE_SIZE;
>   			num_buf++;
>   			buf++;
> -
>   		}
>   	}
>
Zhu Yanjun Nov. 19, 2022, 1:20 p.m. UTC | #2
在 2022/11/19 18:30, lizhijian@fujitsu.com 写道:
>
> There was a thread that tries to refactor iova_to_vaddr[1][2], where
> page_address will be drop. if so, your changes will be no longer needed.

My commit is to indicate that your commit is not good enough.

And you should make all the related commits in a patch series.

Zhu Yanjun

>
> [1]https://lore.kernel.org/lkml/a7decec2-77d9-db4c-ff66-ff597da8bc71@fujitsu.com/T/
> [2]https://www.spinics.net/lists/linux-rdma/msg114053.html
>
> Thanks
> Zhijian
>
>
> On 20/11/2022 09:29, Zhu Yanjun wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> In ib_umem_get, sgt_append is allocated from the function
>> sg_alloc_append_table_from_pages. And it is not from highmem.
>>
>> As such, the return value of page_address will not be NULL.
>>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>> ---
>>    drivers/infiniband/sw/rxe/rxe_mr.c | 9 ++-------
>>    1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index b1423000e4bc..3f33a4cdef24 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -119,7 +119,6 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>    	struct ib_umem		*umem;
>>    	struct sg_page_iter	sg_iter;
>>    	int			num_buf;
>> -	void			*vaddr;
>>    	int err;
>>    
>>    	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
>> @@ -149,6 +148,8 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>    		buf = map[0]->buf;
>>    
>>    		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
>> +			void *vaddr;
>> +
>>    			if (num_buf >= RXE_BUF_PER_MAP) {
>>    				map++;
>>    				buf = map[0]->buf;
>> @@ -156,16 +157,10 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>    			}
>>    
>>    			vaddr = page_address(sg_page_iter_page(&sg_iter));
>> -			if (!vaddr) {
>> -				rxe_dbg_mr(mr, "Unable to get virtual address\n");
>> -				err = -ENOMEM;
>> -				goto err_release_umem;
>> -			}
>>    			buf->addr = (uintptr_t)vaddr;
>>    			buf->size = PAGE_SIZE;
>>    			num_buf++;
>>    			buf++;
>> -
>>    		}
>>    	}
>>
Jason Gunthorpe Nov. 21, 2022, 3:28 p.m. UTC | #3
On Sat, Nov 19, 2022 at 08:29:38PM -0500, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> In ib_umem_get, sgt_append is allocated from the function
> sg_alloc_append_table_from_pages. And it is not from highmem.

You've confused the allocation of the SGL table itself with the
page_address called on the struct page * stored inside the SGL table.

Think of the SGL as an array of 'struct page *'

The page_address() can return NULL because the 'struct page *' it
contains came from userspace and could very will be highmem.

Jason
Zhu Yanjun Nov. 22, 2022, 4:07 a.m. UTC | #4
在 2022/11/21 23:28, Jason Gunthorpe 写道:
> On Sat, Nov 19, 2022 at 08:29:38PM -0500, Zhu Yanjun wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> In ib_umem_get, sgt_append is allocated from the function
>> sg_alloc_append_table_from_pages. And it is not from highmem.
> 
> You've confused the allocation of the SGL table itself with the
> page_address called on the struct page * stored inside the SGL table.
> 
> Think of the SGL as an array of 'struct page *'
> 
About "The page_address() can return NULL because the 'struct page *' it 
  contains came from userspace and could very will be highmem.",

 From the function ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,size_t size, int access), I agree with you that 
struct page comes from user space.

But from "Understanding the Linux Kernel", third edition - sections 
"8.1.3. Memory Zones" and "8.1.6. Kernel Mappings of High-Memory Page 
Frames".

In the process' virtual address space, the user space occupies the first 
3GB, and the kernel space the 4th GB of this linear address space.

The first 896MB of the kernel space (not only kernel code, but its data 
also) is "directly" mapped to the first 896 MB of the physical memory.

The last 128MB part of the virtual kernel space is where are mapped some 
pieces of the physical "high memory" (> 896MB) : thus it can only map no 
more than 128MB of "high memory" at a time.

It seems that page_address of these "128MB high memory" will return NULL.

But can user space access these high memory? From "Understanding the 
Linux Kernel", third edition, it seems that it is in kernel space.

Thanks and Regards,
Zhu Yanjun

> 
> Jason
Jason Gunthorpe Nov. 22, 2022, 2:06 p.m. UTC | #5
On Tue, Nov 22, 2022 at 12:07:06PM +0800, Yanjun Zhu wrote:

> But can user space access these high memory? From "Understanding the Linux
> Kernel", third edition, it seems that it is in kernel space.

Yes, "highmem" is effecitvely the ability to have a struct page * that
is not mapped into the kernel address space, but is mapped into a
userspace process address sspace.

Jason
Zhu Yanjun Nov. 23, 2022, 1:53 a.m. UTC | #6
在 2022/11/22 22:06, Jason Gunthorpe 写道:
> On Tue, Nov 22, 2022 at 12:07:06PM +0800, Yanjun Zhu wrote:
>
>> But can user space access these high memory? From "Understanding the Linux
>> Kernel", third edition, it seems that it is in kernel space.
> Yes, "highmem" is effecitvely the ability to have a struct page * that
> is not mapped into the kernel address space, but is mapped into a
> userspace process address sspace.
Got it. I am interested in this "highmem is mapped into a

userspace process address space.".


Would you like to share some documents, links or source code about this 
with me?


I want to delve into this "highmem is mapped into a

userspace process address space."

Thanks and Regards,

Zhu Yanjun


>
> Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index b1423000e4bc..3f33a4cdef24 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -119,7 +119,6 @@  int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 	struct ib_umem		*umem;
 	struct sg_page_iter	sg_iter;
 	int			num_buf;
-	void			*vaddr;
 	int err;
 
 	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
@@ -149,6 +148,8 @@  int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		buf = map[0]->buf;
 
 		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
+			void *vaddr;
+
 			if (num_buf >= RXE_BUF_PER_MAP) {
 				map++;
 				buf = map[0]->buf;
@@ -156,16 +157,10 @@  int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 			}
 
 			vaddr = page_address(sg_page_iter_page(&sg_iter));
-			if (!vaddr) {
-				rxe_dbg_mr(mr, "Unable to get virtual address\n");
-				err = -ENOMEM;
-				goto err_release_umem;
-			}
 			buf->addr = (uintptr_t)vaddr;
 			buf->size = PAGE_SIZE;
 			num_buf++;
 			buf++;
-
 		}
 	}