diff mbox series

[for-next,v2] RDMA/nldev: Add copy-on-fork attribute to get sys command

Message ID 20210407101606.80737-1-galpress@amazon.com (mailing list archive)
State Superseded
Headers show
Series [for-next,v2] RDMA/nldev: Add copy-on-fork attribute to get sys command | expand

Commit Message

Gal Pressman April 7, 2021, 10:16 a.m. UTC
The new attribute indicates that the kernel copies DMA pages on fork,
hence libibverbs' fork support through madvise and MADV_DONTFORK is not
needed.

The introduced attribute is always reported as supported since the
kernel has the patch that added the copy-on-fork behavior. This allows
the userspace library to identify older vs newer kernel versions.
Extra care should be taken when backporting this patch as it relies on
the fact that the copy-on-fork patch is merged, hence no check for
support is added.

Don't backport this patch unless you also have the following series:
70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
and 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm").

Copy-on-fork attribute is read-only, trying to change it through the set
sys command will result in an error.

Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
Fixes: 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm")
Signed-off-by: Gal Pressman <galpress@amazon.com>
---
PR was sent:
https://github.com/linux-rdma/rdma-core/pull/975

Changelog -
v1->v2: https://lore.kernel.org/linux-rdma/20210405114722.98904-1-galpress@amazon.com/
* Remove nla_put_u8() return value check
* Add commit hashes to commit message and code comment
---
 drivers/infiniband/core/nldev.c  | 14 +++++++++++++-
 include/uapi/rdma/rdma_netlink.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

Leon Romanovsky April 7, 2021, 12:39 p.m. UTC | #1
On Wed, Apr 07, 2021 at 01:16:05PM +0300, Gal Pressman wrote:
> The new attribute indicates that the kernel copies DMA pages on fork,
> hence libibverbs' fork support through madvise and MADV_DONTFORK is not
> needed.
> 
> The introduced attribute is always reported as supported since the
> kernel has the patch that added the copy-on-fork behavior. This allows
> the userspace library to identify older vs newer kernel versions.
> Extra care should be taken when backporting this patch as it relies on
> the fact that the copy-on-fork patch is merged, hence no check for
> support is added.
> 
> Don't backport this patch unless you also have the following series:
> 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> and 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm").
> 
> Copy-on-fork attribute is read-only, trying to change it through the set
> sys command will result in an error.
> 
> Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> Fixes: 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm")
> Signed-off-by: Gal Pressman <galpress@amazon.com>
> ---
> PR was sent:
> https://github.com/linux-rdma/rdma-core/pull/975
> 
> Changelog -
> v1->v2: https://lore.kernel.org/linux-rdma/20210405114722.98904-1-galpress@amazon.com/
> * Remove nla_put_u8() return value check
> * Add commit hashes to commit message and code comment
> ---
>  drivers/infiniband/core/nldev.c  | 14 +++++++++++++-
>  include/uapi/rdma/rdma_netlink.h |  2 ++
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index b8dc002a2478..6b2235926f74 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -146,6 +146,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
>  	[RDMA_NLDEV_ATTR_UVERBS_DRIVER_ID]	= { .type = NLA_U32 },
>  	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
>  	[RDMA_NLDEV_SYS_ATTR_NETNS_MODE]	= { .type = NLA_U8 },
> +	[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK]	= { .type = NLA_U8 },
>  };
>  
>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
> @@ -1697,6 +1698,16 @@ static int nldev_sys_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>  		nlmsg_free(msg);
>  		return err;
>  	}
> +
> +	/*
> +	 * Copy-on-fork is supported.
> +	 * See commits:
> +	 * 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> +	 * 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm")
> +	 * for more details. Don't backport this without them.
> +	 */
> +	nla_put_u8(msg, RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK, 1);
> +
>  	nlmsg_end(msg, nlh);
>  	return rdma_nl_unicast(sock_net(skb->sk), msg, NETLINK_CB(skb).portid);
>  }
> @@ -1710,7 +1721,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>  			  nldev_policy, extack);
> -	if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
> +	if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
> +	    tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])

Why do we fail if user supplies RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK?

Thanks
Gal Pressman April 7, 2021, 1:14 p.m. UTC | #2
On 07/04/2021 15:39, Leon Romanovsky wrote:
>> @@ -1710,7 +1721,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>>
>>       err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>>                         nldev_policy, extack);
>> -     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
>> +     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
>> +         tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])
> 
> Why do we fail if user supplies RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK?

It's a read-only attribute, if someone tries to set its value I assume it's best
to return an error.
Leon Romanovsky April 7, 2021, 1:23 p.m. UTC | #3
On Wed, Apr 07, 2021 at 04:14:46PM +0300, Gal Pressman wrote:
> On 07/04/2021 15:39, Leon Romanovsky wrote:
> >> @@ -1710,7 +1721,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> >>
> >>       err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
> >>                         nldev_policy, extack);
> >> -     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
> >> +     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
> >> +         tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])
> > 
> > Why do we fail if user supplies RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK?
> 
> It's a read-only attribute, if someone tries to set its value I assume it's best
> to return an error.

Not in netlink world, you need to ignore the parameters that
you don't "know how to handle" and check for must-to-be input only.

Thanks
Gal Pressman April 7, 2021, 1:30 p.m. UTC | #4
On 07/04/2021 16:23, Leon Romanovsky wrote:
> On Wed, Apr 07, 2021 at 04:14:46PM +0300, Gal Pressman wrote:
>> On 07/04/2021 15:39, Leon Romanovsky wrote:
>>>> @@ -1710,7 +1721,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>
>>>>       err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>>>>                         nldev_policy, extack);
>>>> -     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
>>>> +     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
>>>> +         tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])
>>>
>>> Why do we fail if user supplies RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK?
>>
>> It's a read-only attribute, if someone tries to set its value I assume it's best
>> to return an error.
> 
> Not in netlink world, you need to ignore the parameters that
> you don't "know how to handle" and check for must-to-be input only.

Not sure I understand.
So you expect the set function to remain unchanged in this patch? Isn't it bad
that a user can request to change the copy on fork value and get a success
return value although nothing happened?
Leon Romanovsky April 7, 2021, 1:40 p.m. UTC | #5
On Wed, Apr 07, 2021 at 04:30:50PM +0300, Gal Pressman wrote:
> On 07/04/2021 16:23, Leon Romanovsky wrote:
> > On Wed, Apr 07, 2021 at 04:14:46PM +0300, Gal Pressman wrote:
> >> On 07/04/2021 15:39, Leon Romanovsky wrote:
> >>>> @@ -1710,7 +1721,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> >>>>
> >>>>       err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
> >>>>                         nldev_policy, extack);
> >>>> -     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
> >>>> +     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
> >>>> +         tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])
> >>>
> >>> Why do we fail if user supplies RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK?
> >>
> >> It's a read-only attribute, if someone tries to set its value I assume it's best
> >> to return an error.
> > 
> > Not in netlink world, you need to ignore the parameters that
> > you don't "know how to handle" and check for must-to-be input only.
> 
> Not sure I understand.
> So you expect the set function to remain unchanged in this patch? Isn't it bad
> that a user can request to change the copy on fork value and get a success
> return value although nothing happened?

The same goes for all netlink attributes, user can send some *_RES_*
attribute to _set_ and we won't fail. The same goes for rtnetlink too.

Thanks
Gal Pressman April 8, 2021, 6:30 a.m. UTC | #6
On 07/04/2021 16:40, Leon Romanovsky wrote:
> On Wed, Apr 07, 2021 at 04:30:50PM +0300, Gal Pressman wrote:
>> On 07/04/2021 16:23, Leon Romanovsky wrote:
>>> On Wed, Apr 07, 2021 at 04:14:46PM +0300, Gal Pressman wrote:
>>>> On 07/04/2021 15:39, Leon Romanovsky wrote:
>>>>>> @@ -1710,7 +1721,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>>>
>>>>>>       err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>>>>>>                         nldev_policy, extack);
>>>>>> -     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
>>>>>> +     if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
>>>>>> +         tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])
>>>>>
>>>>> Why do we fail if user supplies RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK?
>>>>
>>>> It's a read-only attribute, if someone tries to set its value I assume it's best
>>>> to return an error.
>>>
>>> Not in netlink world, you need to ignore the parameters that
>>> you don't "know how to handle" and check for must-to-be input only.
>>
>> Not sure I understand.
>> So you expect the set function to remain unchanged in this patch? Isn't it bad
>> that a user can request to change the copy on fork value and get a success
>> return value although nothing happened?
> 
> The same goes for all netlink attributes, user can send some *_RES_*
> attribute to _set_ and we won't fail. The same goes for rtnetlink too.

Thanks, will remove the check in nldev_set_sys_set_doit().
diff mbox series

Patch

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index b8dc002a2478..6b2235926f74 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -146,6 +146,7 @@  static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_UVERBS_DRIVER_ID]	= { .type = NLA_U32 },
 	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
 	[RDMA_NLDEV_SYS_ATTR_NETNS_MODE]	= { .type = NLA_U8 },
+	[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK]	= { .type = NLA_U8 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -1697,6 +1698,16 @@  static int nldev_sys_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 		nlmsg_free(msg);
 		return err;
 	}
+
+	/*
+	 * Copy-on-fork is supported.
+	 * See commits:
+	 * 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
+	 * 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm")
+	 * for more details. Don't backport this without them.
+	 */
+	nla_put_u8(msg, RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK, 1);
+
 	nlmsg_end(msg, nlh);
 	return rdma_nl_unicast(sock_net(skb->sk), msg, NETLINK_CB(skb).portid);
 }
@@ -1710,7 +1721,8 @@  static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
 			  nldev_policy, extack);
-	if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE])
+	if (err || !tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE] ||
+	    tb[RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK])
 		return -EINVAL;
 
 	enable = nla_get_u8(tb[RDMA_NLDEV_SYS_ATTR_NETNS_MODE]);
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index d2f5b8396243..342c9db5b3c1 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -533,6 +533,8 @@  enum rdma_nldev_attr {
 
 	RDMA_NLDEV_ATTR_RES_RAW,	/* binary */
 
+	RDMA_NLDEV_SYS_ATTR_COPY_ON_FORK,	/* u8 */
+
 	/*
 	 * Always the end
 	 */