[v13,16/20] IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr

Message ID 1e2824fd77e8eeb351c6c6246f384d0d89fd2d58.1553093421.git.andreyknvl@google.com (mailing list archive)
State New, archived
Series arm64: untag user pointers passed to the kernel

Commit Message

Andrey Konovalov March 20, 2019, 2:51 p.m. UTC
This patch is part of a series that extends the arm64 kernel ABI to allow
tagged user pointers (with the top byte set to something other than 0x00)
to be passed as syscall arguments.

mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can
only be done with untagged pointers.

Untag user pointers in this function.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 drivers/infiniband/hw/mlx4/mr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
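
For readers new to the series, untagging can be modelled in userspace
roughly as below. This is only a sketch: it assumes the arm64 convention of
sign-extending the address from bit 55, and untag_addr() is a made-up name
for illustration, not the kernel's untagged_addr() macro.

```c
#include <stdint.h>

/*
 * Userspace model of untagging (assumption: sign-extend from bit 55).
 * A user address has bit 55 clear, so its top byte becomes 0x00.
 * A kernel-style address has bit 55 set, so its top byte becomes 0xff.
 */
static uint64_t untag_addr(uint64_t addr)
{
	const uint64_t low56 = (UINT64_C(1) << 56) - 1;

	if (addr & (UINT64_C(1) << 55))
		return addr | ~low56;	/* top byte forced to all ones */
	return addr & low56;		/* top byte cleared */
}
```

With this model a pointer carrying tag 0x2a in its top byte and the same
pointer with a zero top byte map to the same address, which is the form a
vma lookup expects.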

Comments

Leon Romanovsky April 29, 2019, 6:09 p.m. UTC | #1
On Wed, Mar 20, 2019 at 03:51:30PM +0100, Andrey Konovalov wrote:
> This patch is part of a series that extends the arm64 kernel ABI to allow
> tagged user pointers (with the top byte set to something other than 0x00)
> to be passed as syscall arguments.
>
> mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can
> only be done with untagged pointers.
>
> Untag user pointers in this function.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  drivers/infiniband/hw/mlx4/mr.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
> index 395379a480cb..9a35ed2c6a6f 100644
> --- a/drivers/infiniband/hw/mlx4/mr.c
> +++ b/drivers/infiniband/hw/mlx4/mr.c
> @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
>  	 * again
>  	 */
>  	if (!ib_access_writable(access_flags)) {
> +		unsigned long untagged_start = untagged_addr(start);
>  		struct vm_area_struct *vma;
>
>  		down_read(&current->mm->mmap_sem);
> @@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
>  		 * cover the memory, but for now it requires a single vma to
>  		 * entirely cover the MR to support RO mappings.
>  		 */
> -		vma = find_vma(current->mm, start);
> -		if (vma && vma->vm_end >= start + length &&
> -		    vma->vm_start <= start) {
> +		vma = find_vma(current->mm, untagged_start);
> +		if (vma && vma->vm_end >= untagged_start + length &&
> +		    vma->vm_start <= untagged_start) {
>  			if (vma->vm_flags & VM_WRITE)
>  				access_flags |= IB_ACCESS_LOCAL_WRITE;
>  		} else {
> --

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

Interesting. The follow-up question is why mlx4 is the only driver in IB that
needs such code in umem_mr. I'll take a look at it.

Thanks
Catalin Marinas April 30, 2019, 11:16 a.m. UTC | #2
(trimmed down the cc list slightly as the message bounces)

On Mon, Apr 29, 2019 at 09:09:15PM +0300, Leon Romanovsky wrote:
> Thanks,
> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

Thanks for the review.

> Interesting. The follow-up question is why mlx4 is the only driver in IB that
> needs such code in umem_mr. I'll take a look at it.

I don't know. Just using the light heuristics of find_vma() shows some
other places. For example, ib_umem_odp_get() gets the umem->address via
ib_umem_start(). This was previously set in ib_umem_get() as called from
mlx4_get_umem_mr(). Should the above patch have just untagged "start" on
entry?

BTW, what's the provenance of such a "start" address here? Is it
something that the user would have malloc()'ed? We try to impose some
restrictions on what is allowed to be tagged in user space so that we don't
have to untag the addresses in the kernel. For example, if it was the
result of an mmap() on the device file, we don't allow tagging.

Thanks.
Leon Romanovsky April 30, 2019, 12:03 p.m. UTC | #3
On Tue, Apr 30, 2019 at 12:16:25PM +0100, Catalin Marinas wrote:
> (trimmed down the cc list slightly as the message bounces)
>
> On Mon, Apr 29, 2019 at 09:09:15PM +0300, Leon Romanovsky wrote:
> > Thanks,
> > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>
> Thanks for the review.
>
> > Interesting. The follow-up question is why mlx4 is the only driver in IB that
> > needs such code in umem_mr. I'll take a look at it.
>
> I don't know. Just using the light heuristics of find_vma() shows some
> other places. For example, ib_umem_odp_get() gets the umem->address via
> ib_umem_start(). This was previously set in ib_umem_get() as called from
> mlx4_get_umem_mr(). Should the above patch have just untagged "start" on
> entry?

ODP flows are not applicable to any driver except mlx5.
According to the commit message of d8f9cc328c88 ("IB/mlx4: Mark user
MR as writable if actual virtual memory is writable"), the code in its
current form is needed to deal with differing mappings between the RDMA
memory requested and the underlying VMA memory.

>
> BTW, what's the provenance of such a "start" address here? Is it
> something that the user would have malloc()'ed? We try to impose some
> restrictions on what is allowed to be tagged in user space so that we don't
> have to untag the addresses in the kernel. For example, if it was the
> result of an mmap() on the device file, we don't allow tagging.

*_reg_user_mr() is called from userspace through the ibv_reg_mr() call [1],
and this is how the "address" and access flags are provided.

Right now, the address should point to memory accessible by
get_user_pages(). However, mmap-ed memory uses remap_pfn_range()
to provide such pages, which makes them unusable for get_user_pages().

I would be glad to see this as a current limitation of the RDMA stack and
not as a final design decision.

[1] https://linux.die.net/man/3/ibv_reg_mr

>
> Thanks.
>
> --
> Catalin
Jason Gunthorpe May 2, 2019, 6:44 p.m. UTC | #4
On Tue, Apr 30, 2019 at 12:16:25PM +0100, Catalin Marinas wrote:
> > Interesting. The follow-up question is why mlx4 is the only driver in IB that
> > needs such code in umem_mr. I'll take a look at it.
> 
> I don't know. Just using the light heuristics of find_vma() shows some
> other places. For example, ib_umem_odp_get() gets the umem->address via
> ib_umem_start(). This was previously set in ib_umem_get() as called from
> mlx4_get_umem_mr(). Should the above patch have just untagged "start" on
> entry?

I have a feeling that there needs to be something for this in the odp
code..

Presumably mmu notifiers and what not also use untagged pointers? Most
likely then the umem should also be storing untagged pointers.

This probably becomes problematic because we do want the tag in cases
talking about the base VA of the MR..

Jason
Catalin Marinas May 3, 2019, 4:28 p.m. UTC | #5
Thanks Jason and Leon for the information.

On Thu, May 02, 2019 at 03:44:42PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2019 at 12:16:25PM +0100, Catalin Marinas wrote:
> > > Interesting. The follow-up question is why mlx4 is the only driver in IB that
> > > needs such code in umem_mr. I'll take a look at it.
> > 
> > I don't know. Just using the light heuristics of find_vma() shows some
> > other places. For example, ib_umem_odp_get() gets the umem->address via
> > ib_umem_start(). This was previously set in ib_umem_get() as called from
> > mlx4_get_umem_mr(). Should the above patch have just untagged "start" on
> > entry?
> 
> I have a feeling that there needs to be something for this in the odp
> code..
> 
> Presumably mmu notifiers and what not also use untagged pointers? Most
> likely then the umem should also be storing untagged pointers.

Yes.

> This probably becomes problematic because we do want the tag in cases
> talking about the base VA of the MR..

It depends on whether the tag is relevant to the kernel or not. The only
useful case so far is for the kernel performing copy_from_user() etc.
accesses so they'd get checked in the presence of hardware memory
tagging (MTE; but it's not mandatory, a 0 tag would do as well).

If we talk about a memory range where the content is relatively opaque
(or irrelevant) to the kernel code, we don't really need the tag. I'm
not familiar with RDMA but I presume it would be a device accessing such
MR but not through the user VA directly. The tag is a property of the
buffer address/pointer when accessed by the CPU at that specific address
range. Any DMA or even kernel accessing it through the linear mapping
(get_user_pages()) would drop such tag.
Jason Gunthorpe May 3, 2019, 11:52 p.m. UTC | #6
On Fri, May 03, 2019 at 05:28:46PM +0100, Catalin Marinas wrote:
> Thanks Jason and Leon for the information.
> 
> On Thu, May 02, 2019 at 03:44:42PM -0300, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2019 at 12:16:25PM +0100, Catalin Marinas wrote:
> > > > Interesting. The follow-up question is why mlx4 is the only driver in IB that
> > > > needs such code in umem_mr. I'll take a look at it.
> > > 
> > > I don't know. Just using the light heuristics of find_vma() shows some
> > > other places. For example, ib_umem_odp_get() gets the umem->address via
> > > ib_umem_start(). This was previously set in ib_umem_get() as called from
> > > mlx4_get_umem_mr(). Should the above patch have just untagged "start" on
> > > entry?
> > 
> > I have a feeling that there needs to be something for this in the odp
> > code..
> > 
> > Presumably mmu notifiers and what not also use untagged pointers? Most
> > likely then the umem should also be storing untagged pointers.
> 
> Yes.
> 
> > This probably becomes problematic because we do want the tag in cases
> > talking about the base VA of the MR..
> 
> It depends on whether the tag is relevant to the kernel or not. The only
> useful case so far is for the kernel performing copy_from_user() etc.
> accesses so they'd get checked in the presence of hardware memory
> tagging (MTE; but it's not mandatory, a 0 tag would do as well).
> 
> If we talk about a memory range where the content is relatively opaque
> (or irrelevant) to the kernel code, we don't really need the tag. I'm
> not familiar with RDMA but I presume it would be a device accessing such
> MR but not through the user VA directly. 

RDMA exposes the user VA directly (the IOVA) as part of the wire
protocol, we must preserve the tag in these cases as that is what the
userspace is using for the pointer.

So the ODP stuff will definitely need some adjusting when it interacts
with the mmu notifiers and get_user_pages().

Jason
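
One way to picture the two requirements discussed above (keep the tag for
the IOVA seen on the wire, drop it for mm-side lookups) is to carry both
forms of the address. The sketch below is purely illustrative: struct
mr_addrs and mr_addrs_init() are hypothetical names, and this is not how
struct ib_umem is laid out or what the thread settled on.

```c
#include <stdint.h>

/* Hypothetical pair of addresses for one MR; not the layout of ib_umem. */
struct mr_addrs {
	uint64_t iova;		/* address as given by userspace, tag kept */
	uint64_t lookup_va;	/* untagged address for mm-side lookups */
};

static struct mr_addrs mr_addrs_init(uint64_t user_start)
{
	struct mr_addrs a;

	a.iova = user_start;
	/* Assumes a user address, so clearing the top byte is enough here. */
	a.lookup_va = user_start & ~(UINT64_C(0xff) << 56);
	return a;
}
```

Anything that talks to the mm (vma lookups, mmu notifiers,
get_user_pages()) would then use lookup_va, while the value reported back
to userspace or placed on the wire would stay iova.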
Patch

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 395379a480cb..9a35ed2c6a6f 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -378,6 +378,7 @@  static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
 	 * again
 	 */
 	if (!ib_access_writable(access_flags)) {
+		unsigned long untagged_start = untagged_addr(start);
 		struct vm_area_struct *vma;
 
 		down_read(&current->mm->mmap_sem);
@@ -386,9 +387,9 @@  static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
 		 * cover the memory, but for now it requires a single vma to
 		 * entirely cover the MR to support RO mappings.
 		 */
-		vma = find_vma(current->mm, start);
-		if (vma && vma->vm_end >= start + length &&
-		    vma->vm_start <= start) {
+		vma = find_vma(current->mm, untagged_start);
+		if (vma && vma->vm_end >= untagged_start + length &&
+		    vma->vm_start <= untagged_start) {
 			if (vma->vm_flags & VM_WRITE)
 				access_flags |= IB_ACCESS_LOCAL_WRITE;
 		} else {
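
To see why the check in the hunk above needs an untagged start, the
condition can be modelled in userspace as below. This is a sketch under
stated assumptions: struct fake_vma and vma_covers() are stand-ins invented
for illustration, and only the comparison itself mirrors the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two vm_area_struct fields the check reads. */
struct fake_vma {
	uint64_t vm_start;	/* first byte of the mapping */
	uint64_t vm_end;	/* one past the last byte of the mapping */
};

/* Same shape as the condition in the patch: one vma must cover the MR. */
static bool vma_covers(const struct fake_vma *vma, uint64_t start,
		       uint64_t length)
{
	return vma && vma->vm_end >= start + length &&
	       vma->vm_start <= start;
}
```

For a mapping spanning 0x00007fffe0000000 to 0x00007fffe0100000, a start of
0x00007fffe0001000 with length 0x2000 passes the check, while the same
start carrying tag 0x2a (0x2a007fffe0001000) fails it, because the tagged
value compares as far above vm_end.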