diff mbox

[v1,1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

Message ID 24ceb1fc5b2b6563532e5776b1b2320b1ae37543.1422553023.git.ydroneaud@opteya.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Yann Droneaud Jan. 29, 2015, 5:59 p.m. UTC
As specified in "Extending Verbs API" presentation [1] by Tzahi Oved
during OFA International Developer Workshop 2013, the request's
comp_mask should describe the request data: it's describe the
availability of extended fields in the request.
Conversely, the response's comp_mask should describe the presence
of extended fields in the response.

So this patch changes ib_uverbs_ex_query_device() function to always
return the maximum supported features, currently only
IB_USER_VERBS_EX_QUERY_DEVICE_ODP per commit 860f10a799c8 ("IB/core:
Add flags for on demand paging support"), regardless of the value of
request's comp_mask.

[1] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf

Link: http://mid.gmane.org/cover.1422553023.git.ydroneaud@opteya.com
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Eli Cohen <eli@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Comments

Jason Gunthorpe Jan. 29, 2015, 6:28 p.m. UTC | #1
On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:
> As specified in "Extending Verbs API" presentation [1] by Tzahi Oved
> during OFA International Developer Workshop 2013, the request's
> comp_mask should describe the request data: it's describe the
> availability of extended fields in the request.
> Conversely, the response's comp_mask should describe the presence
> of extended fields in the response.

Roland: I agree with Yann, these patches need to go in, or the ODP
patches reverted.

>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> -	if (cmd.comp_mask & IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {

Absolutely, a query output should never depend on the input comp_mask.

> -		resp.odp_caps.general_caps = attr.odp_caps.general_caps;
> -		resp.odp_caps.per_transport_caps.rc_odp_caps =
> -			attr.odp_caps.per_transport_caps.rc_odp_caps;
> -		resp.odp_caps.per_transport_caps.uc_odp_caps =
> -			attr.odp_caps.per_transport_caps.uc_odp_caps;
> -		resp.odp_caps.per_transport_caps.ud_odp_caps =
> -			attr.odp_caps.per_transport_caps.ud_odp_caps;
> -		resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
> -	}
> +	resp.odp_caps.general_caps = attr.odp_caps.general_caps;
> +	resp.odp_caps.per_transport_caps.rc_odp_caps =
> +		attr.odp_caps.per_transport_caps.rc_odp_caps;
> +	resp.odp_caps.per_transport_caps.uc_odp_caps =
> +		attr.odp_caps.per_transport_caps.uc_odp_caps;
> +	resp.odp_caps.per_transport_caps.ud_odp_caps =
> +		attr.odp_caps.per_transport_caps.ud_odp_caps;
>  #endif
> +	resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;

Not sure about this - if 0 is a valid null answer for all the _caps
then it is fine, and the comp_mask bit should just be removed as the
size alone should be enough.

This looks wrong however:

>  	err = ib_copy_to_udata(ucore, &resp, sizeof(resp));
>  	if (err)
>           return err;
>        return 0;

Why isn't this returning the filled structure size to userspace?  This
would seem to be very urgently wrong to me? Am I missing something?

Patch 3 probably should gain:

- return 0;
+ return resp_len;

This patch looks like an improvement to me:

Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud Jan. 29, 2015, 6:43 p.m. UTC | #2
Hi,

Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit :
> On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:
> > As specified in "Extending Verbs API" presentation [1] by Tzahi Oved
> > during OFA International Developer Workshop 2013, the request's
> > comp_mask should describe the request data: it's describe the
> > availability of extended fields in the request.
> > Conversely, the response's comp_mask should describe the presence
> > of extended fields in the response.
> 
> Roland: I agree with Yann, these patches need to go in, or the ODP
> patches reverted.
> 
> >  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> > -	if (cmd.comp_mask & IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
> 
> Absolutely, a query output should never depend on the input comp_mask.
> 

We should keep this in mind then.

> > -		resp.odp_caps.general_caps = attr.odp_caps.general_caps;
> > -		resp.odp_caps.per_transport_caps.rc_odp_caps =
> > -			attr.odp_caps.per_transport_caps.rc_odp_caps;
> > -		resp.odp_caps.per_transport_caps.uc_odp_caps =
> > -			attr.odp_caps.per_transport_caps.uc_odp_caps;
> > -		resp.odp_caps.per_transport_caps.ud_odp_caps =
> > -			attr.odp_caps.per_transport_caps.ud_odp_caps;
> > -		resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
> > -	}
> > +	resp.odp_caps.general_caps = attr.odp_caps.general_caps;
> > +	resp.odp_caps.per_transport_caps.rc_odp_caps =
> > +		attr.odp_caps.per_transport_caps.rc_odp_caps;
> > +	resp.odp_caps.per_transport_caps.uc_odp_caps =
> > +		attr.odp_caps.per_transport_caps.uc_odp_caps;
> > +	resp.odp_caps.per_transport_caps.ud_odp_caps =
> > +		attr.odp_caps.per_transport_caps.ud_odp_caps;
> >  #endif
> > +	resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
> 
> Not sure about this - if 0 is a valid null answer for all the _caps
> then it is fine, and the comp_mask bit should just be removed as the
> size alone should be enough.
> 

That's difficult to say. But I hope 0 are valids answers in this case.

Anyway, the response's comp_mask is needed, as it's the sole way for 
the userspace to know the size of the response. See below.

> This looks wrong however:
> 
> >  	err = ib_copy_to_udata(ucore, &resp, sizeof(resp));
> >  	if (err)
> >           return err;
> >        return 0;
> 
> Why isn't this returning the filled structure size to userspace?  This
> would seem to be very urgently wrong to me? Am I missing something?
> 
> Patch 3 probably should gain:
> 
> - return 0;
> + return resp_len;
> 

The write() syscall must return the size buffer passed to it, or less,
but in such case it would ask for trouble as userspace would be allowed
to write() the remaining bytes.
Returning a size bigger than the one passed to write() is not acceptable
and would break any expectation.

> This patch looks like an improvement to me:
> 
> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> 

Thanks a lot.

Regards.
Jason Gunthorpe Jan. 29, 2015, 7:18 p.m. UTC | #3
On Thu, Jan 29, 2015 at 07:43:29PM +0100, Yann Droneaud wrote:

> The write() syscall must return the size buffer passed to it, or
> less, but in such case it would ask for trouble as userspace would
> be allowed to write() the remaining bytes.  Returning a size bigger
> than the one passed to write() is not acceptable and would break any
> expectation.

By that logic the 0 return is still wrong, and it should be ucore->in_len

But I think we can return less without risking anything breaking, it
already violates the invariant associated with write() - it mutates
the buffer passed in!

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud Jan. 29, 2015, 8:42 p.m. UTC | #4
Le jeudi 29 janvier 2015 à 19:43 +0100, Yann Droneaud a écrit :
> Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit :
> > On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:

> > > -		resp.odp_caps.general_caps = attr.odp_caps.general_caps;
> > > -		resp.odp_caps.per_transport_caps.rc_odp_caps =
> > > -			attr.odp_caps.per_transport_caps.rc_odp_caps;
> > > -		resp.odp_caps.per_transport_caps.uc_odp_caps =
> > > -			attr.odp_caps.per_transport_caps.uc_odp_caps;
> > > -		resp.odp_caps.per_transport_caps.ud_odp_caps =
> > > -			attr.odp_caps.per_transport_caps.ud_odp_caps;
> > > -		resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
> > > -	}
> > > +	resp.odp_caps.general_caps = attr.odp_caps.general_caps;
> > > +	resp.odp_caps.per_transport_caps.rc_odp_caps =
> > > +		attr.odp_caps.per_transport_caps.rc_odp_caps;
> > > +	resp.odp_caps.per_transport_caps.uc_odp_caps =
> > > +		attr.odp_caps.per_transport_caps.uc_odp_caps;
> > > +	resp.odp_caps.per_transport_caps.ud_odp_caps =
> > > +		attr.odp_caps.per_transport_caps.ud_odp_caps;
> > >  #endif
> > > +	resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
> > 
> > Not sure about this - if 0 is a valid null answer for all the _caps
> > then it is fine, and the comp_mask bit should just be removed as the
> > size alone should be enough.
> > 
> 
> That's difficult to say. But I hope 0 are valids answers in this case.
> 

Hum, according to include/rdma/ib_verbs.h, there's a bit set
for ODP support:

    148 enum ib_odp_general_cap_bits {
    149         IB_ODP_SUPPORT = 1 << 0,
    150 };

So it should be safe to set everything to 0 in struct
ib_uverbs_odp_caps.

Regards.
Yann Droneaud Jan. 29, 2015, 8:50 p.m. UTC | #5
Hi,

Le jeudi 29 janvier 2015 à 12:18 -0700, Jason Gunthorpe a écrit :
> On Thu, Jan 29, 2015 at 07:43:29PM +0100, Yann Droneaud wrote:
> 
> > The write() syscall must return the size buffer passed to it, or
> > less, but in such case it would ask for trouble as userspace would
> > be allowed to write() the remaining bytes.  Returning a size bigger
> > than the one passed to write() is not acceptable and would break any
> > expectation.
> 
> By that logic the 0 return is still wrong, and it should be ucore->in_len
> 

This is handled by ib_uverbs_write() in
drivers/infiniband/core/uverbs_main.c:

    709                 if (err)
    710                         return err;
    711 
    712                 return written_count;


> But I think we can return less without risking anything breaking, it
> already violates the invariant associated with write() - it mutates
> the buffer passed in!
> 

I don't think so, as only the response buffer is written to and the
response buffer pointer is provided in the buffer given to write().

AFAIK, no uverbs ever change the content of the input buffer (eg. the
request): I've managed to declare the various input buffers "const" so
it would surprising to find it use for writing to userspace.

Anyway, I recognize that uverb way of abusing write() syscall is 
borderline (at best) regarding other Linux subsystems and Unix paradigm 
in general. But it's not enough to screw it more.

Regards.
Jason Gunthorpe Jan. 29, 2015, 9:12 p.m. UTC | #6
On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:

> Anyway, I recognize that uverb way of abusing write() syscall is 
> borderline (at best) regarding other Linux subsystems and Unix paradigm 
> in general. But it's not enough to screw it more.

Then we must return the correct output size explicitly in the struct.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran Feb. 1, 2015, 7:39 a.m. UTC | #7
On 29/01/2015 20:28, Jason Gunthorpe wrote:
> On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:
>> > -		resp.odp_caps.general_caps = attr.odp_caps.general_caps;
>> > -		resp.odp_caps.per_transport_caps.rc_odp_caps =
>> > -			attr.odp_caps.per_transport_caps.rc_odp_caps;
>> > -		resp.odp_caps.per_transport_caps.uc_odp_caps =
>> > -			attr.odp_caps.per_transport_caps.uc_odp_caps;
>> > -		resp.odp_caps.per_transport_caps.ud_odp_caps =
>> > -			attr.odp_caps.per_transport_caps.ud_odp_caps;
>> > -		resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
>> > -	}
>> > +	resp.odp_caps.general_caps = attr.odp_caps.general_caps;
>> > +	resp.odp_caps.per_transport_caps.rc_odp_caps =
>> > +		attr.odp_caps.per_transport_caps.rc_odp_caps;
>> > +	resp.odp_caps.per_transport_caps.uc_odp_caps =
>> > +		attr.odp_caps.per_transport_caps.uc_odp_caps;
>> > +	resp.odp_caps.per_transport_caps.ud_odp_caps =
>> > +		attr.odp_caps.per_transport_caps.ud_odp_caps;
>> >  #endif
>> > +	resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
> Not sure about this - if 0 is a valid null answer for all the _caps
> then it is fine, and the comp_mask bit should just be removed as the
> size alone should be enough.

Zero is indeed a valid answer. There the IB_ODP_SUPPORT bit in the
general_caps field that says whether or not ODP is supported in general.
The per transport capabilities are also default to not supported.

However, I think we should keep the comp_mask field for future
extensions. The current code doesn't report the size of the response to
user space, and in addition, comp_mask being a bit mask has the
advantage of allowing only part of the structure to be marked valid.

Regards,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny Feb. 5, 2015, 2:54 a.m. UTC | #8
> 
> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
> 
> > Anyway, I recognize that uverb way of abusing write() syscall is
> > borderline (at best) regarding other Linux subsystems and Unix
> > paradigm in general. But it's not enough to screw it more.
> 
> Then we must return the correct output size explicitly in the struct.

I was thinking this very same thing as I read through this thread.

I too would like to avoid the use of comp_masks if at all possible.  The query call seems to be a verb where the structure size is all you really need to know the set of values returned.

As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate "missing".

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran Feb. 5, 2015, 8:26 a.m. UTC | #9
On 05/02/2015 04:54, Weiny, Ira wrote:
>>
>> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
>>
>>> Anyway, I recognize that uverb way of abusing write() syscall is
>>> borderline (at best) regarding other Linux subsystems and Unix
>>> paradigm in general. But it's not enough to screw it more.
>>
>> Then we must return the correct output size explicitly in the struct.
> 
> I was thinking this very same thing as I read through this thread.
> 
> I too would like to avoid the use of comp_masks if at all possible.  The query call seems to be a verb where the structure size is all you really need to know the set of values returned.
> 
> As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate "missing".

Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That
would further abuse the write() syscall by writing to the input buffer.
However, the only other alternative I see is to add it explicitly to
every uverb response struct.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny Feb. 7, 2015, 12:52 a.m. UTC | #10
> 
> On 05/02/2015 04:54, Weiny, Ira wrote:
> >>
> >> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
> >>
> >>> Anyway, I recognize that uverb way of abusing write() syscall is
> >>> borderline (at best) regarding other Linux subsystems and Unix
> >>> paradigm in general. But it's not enough to screw it more.
> >>
> >> Then we must return the correct output size explicitly in the struct.
> >
> > I was thinking this very same thing as I read through this thread.
> >
> > I too would like to avoid the use of comp_masks if at all possible.  The query
> call seems to be a verb where the structure size is all you really need to know
> the set of values returned.
> >
> > As Jason says, other calls may require the comp_mask where 0 is not
> sufficient to indicate "missing".
> 
> Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That would
> further abuse the write() syscall by writing to the input buffer.

I don't think that is such a great idea.
 
> However, the only other alternative I see is to add it explicitly to every uverb
> response struct.
> 

I think this is the best solution.  There is a 32 bit reserved field in ib_uverbs_ex_query_device_resp.  Could we use all or part of that to be the size?

For other extended commands I'm not sure what to do.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran Feb. 8, 2015, 7:27 a.m. UTC | #11
On 07/02/2015 02:52, Weiny, Ira wrote:
>>
>> On 05/02/2015 04:54, Weiny, Ira wrote:
>>>>
>>>> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
>>>>
>>>>> Anyway, I recognize that uverb way of abusing write() syscall is
>>>>> borderline (at best) regarding other Linux subsystems and Unix
>>>>> paradigm in general. But it's not enough to screw it more.
>>>>
>>>> Then we must return the correct output size explicitly in the struct.
>>>
>>> I was thinking this very same thing as I read through this thread.
>>>
>>> I too would like to avoid the use of comp_masks if at all possible.  The query
>> call seems to be a verb where the structure size is all you really need to know
>> the set of values returned.
>>>
>>> As Jason says, other calls may require the comp_mask where 0 is not
>> sufficient to indicate "missing".
>>
...
>  
>> However, the only other alternative I see is to add it explicitly to every uverb
>> response struct.
>>
> 
> I think this is the best solution.  There is a 32 bit reserved field in ib_uverbs_ex_query_device_resp.  Could we use all or part of that to be the size?

Yes, I think 32-bit for the response length are more than enough.

I will send patches for 3.20 to re-introduce ib_uverbs_ex_query_device
with the response size instead of the reserved field, and with Yann's
changes.

Thanks,
Haggai

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 532d8eba8b02..6ef06a9b4362 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3324,17 +3324,15 @@  int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	resp.comp_mask = 0;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-	if (cmd.comp_mask & IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
-		resp.odp_caps.general_caps = attr.odp_caps.general_caps;
-		resp.odp_caps.per_transport_caps.rc_odp_caps =
-			attr.odp_caps.per_transport_caps.rc_odp_caps;
-		resp.odp_caps.per_transport_caps.uc_odp_caps =
-			attr.odp_caps.per_transport_caps.uc_odp_caps;
-		resp.odp_caps.per_transport_caps.ud_odp_caps =
-			attr.odp_caps.per_transport_caps.ud_odp_caps;
-		resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
-	}
+	resp.odp_caps.general_caps = attr.odp_caps.general_caps;
+	resp.odp_caps.per_transport_caps.rc_odp_caps =
+		attr.odp_caps.per_transport_caps.rc_odp_caps;
+	resp.odp_caps.per_transport_caps.uc_odp_caps =
+		attr.odp_caps.per_transport_caps.uc_odp_caps;
+	resp.odp_caps.per_transport_caps.ud_odp_caps =
+		attr.odp_caps.per_transport_caps.ud_odp_caps;
 #endif
+	resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
 
 	err = ib_copy_to_udata(ucore, &resp, sizeof(resp));
 	if (err)