diff mbox

[v1,1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

Message ID 1422568741.3133.247.camel@opteya.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Yann Droneaud Jan. 29, 2015, 9:59 p.m. UTC
Hi,

Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit :
> On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:
> > As specified in "Extending Verbs API" presentation [1] by Tzahi Oved
> > during OFA International Developer Workshop 2013, the request's
> > comp_mask should describe the request data: it's describe the
> > availability of extended fields in the request.
> > Conversely, the response's comp_mask should describe the presence
> > of extended fields in the response.
> 
> Roland: I agree with Yann, these patches need to go in, or the ODP
> patches reverted.
> 

Reverting all On Demand Paging patches seems overkill:
if something as to be reverted it should be commit 5a77abf9a97a
("IB/core: Add support for extended query device caps") and the part of
commit 860f10a799c8 ("IB/core: Add flags for on demand paging support")
which modify ib_uverbs_ex_query_device().

But I wonder about this part of commit 860f10a799c8:

for some coherency between a subset of the bits (it's not a function
dedicated to check flags provided by userspace):

include/rdma/ib_verbs.h:

   1094 enum ib_access_flags {
   1095         IB_ACCESS_LOCAL_WRITE   = 1,
   1096         IB_ACCESS_REMOTE_WRITE  = (1<<1),
   1097         IB_ACCESS_REMOTE_READ   = (1<<2),
   1098         IB_ACCESS_REMOTE_ATOMIC = (1<<3),
   1099         IB_ACCESS_MW_BIND       = (1<<4),
   1100         IB_ZERO_BASED           = (1<<5),
   1101         IB_ACCESS_ON_DEMAND     = (1<<6),
   1102 };

drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr()

    961         ret = ib_check_mr_access(cmd.access_flags);
    962         if (ret)
    963                 return ret;

include/rdma/ib_verbs.h:

   2643 static inline int ib_check_mr_access(int flags)
   2644 {
   2645         /*
   2646          * Local write permission is required if remote write or
   2647          * remote atomic permission is also requested.
   2648          */
   2649         if (flags & (IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_REMOTE_WRITE) &&
   2650             !(flags & IB_ACCESS_LOCAL_WRITE))
   2651                 return -EINVAL;
   2652 
   2653         return 0;
   2654 }

drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr()

    990         mr = pd->device->reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
    991                                      cmd.access_flags, &udata);

reg_user_mr() functions may call ib_umem_get() and pass access_flags to
it:

drivers/infiniband/core/umem.c: ib_umem_get()

    114         /*
    115          * We ask for writable memory if any of the following
    116          * access flags are set.  "Local write" and "remote write"
    117          * obviously require write access.  "Remote atomic" can do
    118          * things like fetch and add, which will modify memory, and
    119          * "MW bind" can change permissions by binding a window.
    120          */
    121         umem->writable  = !!(access &
    122                 (IB_ACCESS_LOCAL_WRITE   | IB_ACCESS_REMOTE_WRITE |
    123                  IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
    124 
    125         if (access & IB_ACCESS_ON_DEMAND) {
    126                 ret = ib_umem_odp_get(context, umem);
    127                 if (ret) {
    128                         kfree(umem);
    129                         return ERR_PTR(ret);
    130                 }
    131                 return umem;
    132         }


As you can see only a few bits in access_flags are checked in the end,
so it may exist a very unlikely possibility that an existing userspace
program is setting this IB_ACCESS_ON_DEMAND bit without the intention
of enabling on demand paging as this would be unnoticed by older kernel.

In the other hand, a newer program built with on-demand-paging in mind
will set the bit, but when run on older kernel, it will be a no-op,
allowing the program to continue, perhaps thinking on-demand-paging
is available.

That should be avoided as much as possible.

Unfortunately, I think this cannot be fixed as it's was long since 
IB_ZERO_BASED was added by commit 7083e42ee2 ("IB/core: Add "type 2" 
memory windows support").
Anyway there was no check for IB_ACCESS_REMOTE_READ, nor 
IB_ACCESS_MW_BIND in the uverb layer either.

So, just as the second argument of open() syscall (remember O_TMPFILE,
see http://lwn.net/Articles/562294/ ), we will have to live with and be
careful ...

Regards.

Comments

Roland Dreier Jan. 29, 2015, 11:17 p.m. UTC | #1
On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
>> Roland: I agree with Yann, these patches need to go in, or the ODP
>> patches reverted.

> Reverting all On Demand Paging patches seems overkill:
> if something as to be reverted it should be commit 5a77abf9a97a
> ("IB/core: Add support for extended query device caps") and the part of
> commit 860f10a799c8 ("IB/core: Add flags for on demand paging support")
> which modify ib_uverbs_ex_query_device().

Thank you and Jason for taking on this interface.

At this point I feel like I do about the IPoIB changes -- we should
revert the broken stuff and get it right for 3.20.

If we revert the two things you describe above, is everything else OK
to leave in 3.19 with respect to ABI?

Thanks,
  Roland
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud Jan. 30, 2015, 5:26 p.m. UTC | #2
Hi,

Le jeudi 29 janvier 2015 à 15:17 -0800, Roland Dreier a écrit :
> On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
> >> Roland: I agree with Yann, these patches need to go in, or the ODP
> >> patches reverted.
> 
> > Reverting all On Demand Paging patches seems overkill:
> > if something as to be reverted it should be commit 5a77abf9a97a
> > ("IB/core: Add support for extended query device caps") and the part of
> > commit 860f10a799c8 ("IB/core: Add flags for on demand paging support")
> > which modify ib_uverbs_ex_query_device().
> 
> Thank you and Jason for taking on this interface.
> 
> At this point I feel like I do about the IPoIB changes -- we should
> revert the broken stuff and get it right for 3.20.
> 
> If we revert the two things you describe above, is everything else OK
> to leave in 3.19 with respect to ABI?
> 

I've tried to review every changes since v3.18 on drivers/infiniband
include/rdma and include/uapi/rdma with respect to ABI issues.

I've noticed no other issue, but I have to admit I've not well reviewed
the drivers (hw/) internal changes.

If the IB_USER_VERBS_EX_CMD_QUERY_DEVICE and ib_uverbs_ex_query_device
changes are going to be reverted for v3.19, the on-demand-paging
feature will be available (IB_DEVICE_ON_DEMAND_PAGING will be set 
device_cap_flags in response to non extended QUERY_DEVICE for mlx5 HCA
and IB_ACCESS_ON_DEMAND access flag will be effective for REG_MR 
uverbs), but its parameters won't be. I don't know if it's a no-go for 
the usage of on-demand paging by userspace: I have not the chance
of owning HCA with the support for this feature, nor the patches
libibverbs / libmlx5 ... (anyway I would not have the time to test).
I've hoped people from Mellanox would have commented on the revert 
option too.

Regards.
Or Gerlitz Jan. 31, 2015, 8:09 p.m. UTC | #3
On Fri, Jan 30, 2015, Yann Droneaud <ydroneaud@opteya.com> wrote:
> [..] I have not the chance
> of owning HCA with the support for this feature, nor the patches
> libibverbs / libmlx5 ... (anyway I would not have the time to test). [...]

Yann, have you ever run Linux over RDMA capable HW (IB, RoCE, iWARP or alike?)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran Feb. 1, 2015, 8:04 a.m. UTC | #4
On 29/01/2015 23:59, Yann Droneaud wrote:
> But I wonder about this part of commit 860f10a799c8:
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index c7a43624c96b..f9326ccda4b5 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -953,6 +953,18 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
>                 goto err_free;
>         }
>  
> +       if (cmd.access_flags & IB_ACCESS_ON_DEMAND) {
> +               struct ib_device_attr attr;
> +
> +               ret = ib_query_device(pd->device, &attr);
> +               if (ret || !(attr.device_cap_flags &
> +                               IB_DEVICE_ON_DEMAND_PAGING)) {
> +                       pr_debug("ODP support not available\n");
> +                       ret = -EINVAL;
> +                       goto err_put;
> +               }
> +       }
> +
> 
> AFAICT (1 << 6) bit from struct ib_uverbs_reg_mr access_flags field
> was not enforced to be 0 previously, as ib_check_mr_access() only check 
> for some coherency between a subset of the bits (it's not a function
> dedicated to check flags provided by userspace):
> 
> include/rdma/ib_verbs.h:
> 
>    1094 enum ib_access_flags {
>    1095         IB_ACCESS_LOCAL_WRITE   = 1,
>    1096         IB_ACCESS_REMOTE_WRITE  = (1<<1),
>    1097         IB_ACCESS_REMOTE_READ   = (1<<2),
>    1098         IB_ACCESS_REMOTE_ATOMIC = (1<<3),
>    1099         IB_ACCESS_MW_BIND       = (1<<4),
>    1100         IB_ZERO_BASED           = (1<<5),
>    1101         IB_ACCESS_ON_DEMAND     = (1<<6),
>    1102 };
> 
> drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr()
> 
>     961         ret = ib_check_mr_access(cmd.access_flags);
>     962         if (ret)
>     963                 return ret;
> 
> include/rdma/ib_verbs.h:
> 
>    2643 static inline int ib_check_mr_access(int flags)
>    2644 {
>    2645         /*
>    2646          * Local write permission is required if remote write or
>    2647          * remote atomic permission is also requested.
>    2648          */
>    2649         if (flags & (IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_REMOTE_WRITE) &&
>    2650             !(flags & IB_ACCESS_LOCAL_WRITE))
>    2651                 return -EINVAL;
>    2652 
>    2653         return 0;
>    2654 }
> 
> drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr()
> 
>     990         mr = pd->device->reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
>     991                                      cmd.access_flags, &udata);
> 
> reg_user_mr() functions may call ib_umem_get() and pass access_flags to
> it:
> 
> drivers/infiniband/core/umem.c: ib_umem_get()
> 
>     114         /*
>     115          * We ask for writable memory if any of the following
>     116          * access flags are set.  "Local write" and "remote write"
>     117          * obviously require write access.  "Remote atomic" can do
>     118          * things like fetch and add, which will modify memory, and
>     119          * "MW bind" can change permissions by binding a window.
>     120          */
>     121         umem->writable  = !!(access &
>     122                 (IB_ACCESS_LOCAL_WRITE   | IB_ACCESS_REMOTE_WRITE |
>     123                  IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>     124 
>     125         if (access & IB_ACCESS_ON_DEMAND) {
>     126                 ret = ib_umem_odp_get(context, umem);
>     127                 if (ret) {
>     128                         kfree(umem);
>     129                         return ERR_PTR(ret);
>     130                 }
>     131                 return umem;
>     132         }
> 
> 
> As you can see only a few bits in access_flags are checked in the end,
> so it may exist a very unlikely possibility that an existing userspace
> program is setting this IB_ACCESS_ON_DEMAND bit without the intention
> of enabling on demand paging as this would be unnoticed by older kernel.
>
> In the other hand, a newer program built with on-demand-paging in mind
> will set the bit, but when run on older kernel, it will be a no-op,
> allowing the program to continue, perhaps thinking on-demand-paging
> is available.

This is why we added a method for userspace to check the kernel
capabilities both when adding the memory windows support and with ODP.
User-space can still send garbage to the kernel, but if it does so
without checking the kernel supports its request, it's user-space's problem.

> That should be avoided as much as possible.
> 
> Unfortunately, I think this cannot be fixed as it's was long since 
> IB_ZERO_BASED was added by commit 7083e42ee2 ("IB/core: Add "type 2" 
> memory windows support").
> Anyway there was no check for IB_ACCESS_REMOTE_READ, nor 
> IB_ACCESS_MW_BIND in the uverb layer either.

I think it would be perfectly fine to add a check today that validates
the MR access flags input is known. Because the newer feature require
user-space to check capabilities, I don't think it would matter if we
returned an error on newer kernels that do not support these features.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran Feb. 1, 2015, 11:25 a.m. UTC | #5
On 30/01/2015 19:26, Yann Droneaud wrote:
> Hi,
> 
> Le jeudi 29 janvier 2015 à 15:17 -0800, Roland Dreier a écrit :
>> On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
>>>> Roland: I agree with Yann, these patches need to go in, or the ODP
>>>> patches reverted.
>>
>>> Reverting all On Demand Paging patches seems overkill:
>>> if something as to be reverted it should be commit 5a77abf9a97a
>>> ("IB/core: Add support for extended query device caps") and the part of
>>> commit 860f10a799c8 ("IB/core: Add flags for on demand paging support")
>>> which modify ib_uverbs_ex_query_device().
>>
>> Thank you and Jason for taking on this interface.
>>
>> At this point I feel like I do about the IPoIB changes -- we should
>> revert the broken stuff and get it right for 3.20.
>>
>> If we revert the two things you describe above, is everything else OK
>> to leave in 3.19 with respect to ABI?
>>
> 
> I've tried to review every changes since v3.18 on drivers/infiniband
> include/rdma and include/uapi/rdma with respect to ABI issues.
> 
> I've noticed no other issue, but I have to admit I've not well reviewed
> the drivers (hw/) internal changes.
> 
> If the IB_USER_VERBS_EX_CMD_QUERY_DEVICE and ib_uverbs_ex_query_device
> changes are going to be reverted for v3.19, the on-demand-paging
> feature will be available (IB_DEVICE_ON_DEMAND_PAGING will be set 
> device_cap_flags in response to non extended QUERY_DEVICE for mlx5 HCA
> and IB_ACCESS_ON_DEMAND access flag will be effective for REG_MR 
> uverbs), but its parameters won't be. 

For user-space to make use of on demand paging, it should verify the
specific transport and operation is supported. If they don't, they will
encounter errors when a page fault occurs.

> I don't know if it's a no-go for 
> the usage of on-demand paging by userspace: I have not the chance
> of owning HCA with the support for this feature, nor the patches
> libibverbs / libmlx5 ... (anyway I would not have the time to test).
> I've hoped people from Mellanox would have commented on the revert 
> option too.

I would prefer it if Yann's patches are accepted. I understand it is
very late, but they are quite short, and I think they provide the right
semantic for this new verb.

As a second option, in case you prefer to revert the extended query
device patch, I will send a patch shortly to do that.

Regards,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud March 2, 2015, 2:08 p.m. UTC | #6
Hi Or,

Le samedi 31 janvier 2015 à 22:09 +0200, Or Gerlitz a écrit :
> On Fri, Jan 30, 2015, Yann Droneaud <ydroneaud@opteya.com> wrote:
> > [..] I have not the chance
> > of owning HCA with the support for this feature, nor the patches
> > libibverbs / libmlx5 ... (anyway I would not have the time to test). [...]
> 
> Yann, have you ever run Linux over RDMA capable HW (IB, RoCE, iWARP or alike?)

That was a silly question.

So I've took a little time to consider answering it,
perhaps you were gently kidding me.

Anyway, please take the following links as an answer to your question:

"rdma_create_qp() and max_send_wr"
<http://mid.gmane.org/1303404264.2243.19.camel@deela.quest-ce.net>

"ibv_create_qp() and max_inline_data behavior"
<http://mid.gmane.org/d7685e13978f93890b2939c5ac2d5c7f@meuh.org>

"[PATCH librdmacm 0/3] no IBV_SEND_INLINE thus memory registration
required on QLogic/Intel HCA"
<http://mid.gmane.org/cover.1376746185.git.ydroneaud@opteya.com>

"[PATCH 00/22] infiniband: improve userspace input check"
<http://mid.gmane.org/cover.1376847403.git.ydroneaud@opteya.com>

I hope this would be enough to join the gang^Wclub. Where're the
by-laws ?

Regards.
diff mbox

Patch

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index c7a43624c96b..f9326ccda4b5 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -953,6 +953,18 @@  ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
                goto err_free;
        }
 
+       if (cmd.access_flags & IB_ACCESS_ON_DEMAND) {
+               struct ib_device_attr attr;
+
+               ret = ib_query_device(pd->device, &attr);
+               if (ret || !(attr.device_cap_flags &
+                               IB_DEVICE_ON_DEMAND_PAGING)) {
+                       pr_debug("ODP support not available\n");
+                       ret = -EINVAL;
+                       goto err_put;
+               }
+       }
+

AFAICT (1 << 6) bit from struct ib_uverbs_reg_mr access_flags field
was not enforced to be 0 previously, as ib_check_mr_access() only check