diff mbox series

[net-next,v4,5/9] genetlink: introduce per-sock family private pointer storage

Message ID 20231123181546.521488-6-jiri@resnulli.us (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series devlink: introduce notifications filtering | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/codegen success Generated files up to date
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1182 this patch: 1182
netdev/cc_maintainers warning 3 maintainers not CCed: Liam.Howlett@oracle.com anjali.k.kulkarni@oracle.com fw@strlen.de
netdev/build_clang success Errors and warnings before: 1155 this patch: 1155
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1209 this patch: 1209
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 132 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jiri Pirko Nov. 23, 2023, 6:15 p.m. UTC
From: Jiri Pirko <jiri@nvidia.com>

Introduce a priv pointer into struct netlink_sock. Use it to store a per
socket xarray that contains family->id indexed priv pointer storage.
Note I used xarray instead of suggested linked list as it is more
convenient, without need to have a container struct that would
contain struct list_head item.

Introduce genl_sk_priv_store() to store the priv pointer.
Introduce genl_sk_priv_get() to obtain the priv pointer under RCU
read lock.

Assume that kfree() is good for free of privs for now, as the only user
introduced by the follow-up patch (devlink) will use kzalloc() for the
allocation of the memory of the stored pointer. If later on
this needs to be made custom, a callback is going to be needed.
Until then (if ever), do this in a simple way.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v3->v4:
- new patch
---
 include/net/genetlink.h  |  3 ++
 net/netlink/af_netlink.h |  1 +
 net/netlink/genetlink.c  | 98 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)

Comments

Paolo Abeni Nov. 27, 2023, 11:13 a.m. UTC | #1
On Thu, 2023-11-23 at 19:15 +0100, Jiri Pirko wrote:
[...]
> +/**
> + * genl_sk_priv_store - Store per-socket private pointer for family
> + *
> + * @sk: socket
> + * @family: family
> + * @priv: private pointer
> + *
> + * Store a private pointer per-socket for a specified
> + * Generic netlink family.
> + *
> + * Caller has to make sure this is not called in parallel multiple times
> + * for the same sock and also in parallel to genl_release() for the same sock.
> + *
> + * Returns: previously stored private pointer for the family (could be NULL)
> + * on success, otherwise negative error value encoded by ERR_PTR().
> + */
> +void *genl_sk_priv_store(struct sock *sk, struct genl_family *family,
> +			 void *priv)
> +{
> +	struct genl_sk_ctx *ctx;
> +	void *old_priv;
> +
> +	ctx = rcu_dereference_raw(nlk_sk(sk)->priv);

Minor nit: Looking at the following patch, this will be called under
the rtnl lock. Since a look is needed to ensure the priv ptr
consistency, what about adding the relevant lockdep annotation here?

No need to repost for the above.

Cheers,

Paolo
Jiri Pirko Nov. 27, 2023, noon UTC | #2
Mon, Nov 27, 2023 at 12:13:09PM CET, pabeni@redhat.com wrote:
>On Thu, 2023-11-23 at 19:15 +0100, Jiri Pirko wrote:
>[...]
>> +/**
>> + * genl_sk_priv_store - Store per-socket private pointer for family
>> + *
>> + * @sk: socket
>> + * @family: family
>> + * @priv: private pointer
>> + *
>> + * Store a private pointer per-socket for a specified
>> + * Generic netlink family.
>> + *
>> + * Caller has to make sure this is not called in parallel multiple times
>> + * for the same sock and also in parallel to genl_release() for the same sock.
>> + *
>> + * Returns: previously stored private pointer for the family (could be NULL)
>> + * on success, otherwise negative error value encoded by ERR_PTR().
>> + */
>> +void *genl_sk_priv_store(struct sock *sk, struct genl_family *family,
>> +			 void *priv)
>> +{
>> +	struct genl_sk_ctx *ctx;
>> +	void *old_priv;
>> +
>> +	ctx = rcu_dereference_raw(nlk_sk(sk)->priv);
>
>Minor nit: Looking at the following patch, this will be called under
>the rtnl lock. Since a look is needed to ensure the priv ptr

Well, depends on the user (as the comment says). Devlink does not call
it with rtnl lock. Relies on the socket skb processing and cleanup
flows.


>consistency, what about adding the relevant lockdep annotation here?
>
>No need to repost for the above.
>
>Cheers,
>
>Paolo
>
Jakub Kicinski Nov. 27, 2023, 10:46 p.m. UTC | #3
On Thu, 23 Nov 2023 19:15:42 +0100 Jiri Pirko wrote:
> +	void __rcu		*priv;

How many times did I say "typed struct" in the feedback to the broken
v3 of this series? You complain so much about people not addressing
feedback you've given them, it's really hypocritical.

Put the xarray pointer here directly. Plus a lock to protect the init.

The size of the per-family struct should be in family definition,
allocation should happen on first get automatically. Family definition
should also hold a callback to how the data is going to be freed.
Jiri Pirko Nov. 28, 2023, 8:25 a.m. UTC | #4
Mon, Nov 27, 2023 at 11:46:26PM CET, kuba@kernel.org wrote:
>On Thu, 23 Nov 2023 19:15:42 +0100 Jiri Pirko wrote:
>> +	void __rcu		*priv;
>
>How many times did I say "typed struct" in the feedback to the broken
>v3 of this series? You complain so much about people not addressing
>feedback you've given them, it's really hypocritical.

I did what I thought is best. Since this is struct netlink_sock, it is
not related only to genetlink. So other implementations may in theory
use this pointer for something else. Looked a bit odd to put
genetlink-specific pointer here, but as you wish.

>
>Put the xarray pointer here directly. Plus a lock to protect the init.

Okay, just to make this clear. You want me to have:
	struct xarray __rcu		*family_privs;

in struct netlink_sock, correct?


Why I need a lock? If I read things correctly, skbs are processed in
serial over one sock. Since this is per-sock, should be safe.


>
>The size of the per-family struct should be in family definition,
>allocation should happen on first get automatically. Family definition

Yes, I thought about that. But I decided to do this lockless, allocating
new priv every time the user sets the filter and replace the xarray item
so it could be accessed in rcu read section during notification
processing.

What you suggest requires some lock to protect the memory being changed
during filter set and suring accessing in in notify. But okay,
if you insist.


>should also hold a callback to how the data is going to be freed.

If it is alloceted automatically, why is it needed?





>-- 
>pw-bot: cr
Przemek Kitszel Nov. 28, 2023, 12:30 p.m. UTC | #5
On 11/23/23 19:15, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Introduce a priv pointer into struct netlink_sock. Use it to store a per
> socket xarray that contains family->id indexed priv pointer storage.
> Note I used xarray instead of suggested linked list as it is more
> convenient, without need to have a container struct that would
> contain struct list_head item.
> 
> Introduce genl_sk_priv_store() to store the priv pointer.
> Introduce genl_sk_priv_get() to obtain the priv pointer under RCU
> read lock.
> 
> Assume that kfree() is good for free of privs for now, as the only user
> introduced by the follow-up patch (devlink) will use kzalloc() for the
> allocation of the memory of the stored pointer. If later on
> this needs to be made custom, a callback is going to be needed.
> Until then (if ever), do this in a simple way.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> ---
> v3->v4:
> - new patch
> ---
>   include/net/genetlink.h  |  3 ++
>   net/netlink/af_netlink.h |  1 +
>   net/netlink/genetlink.c  | 98 ++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 102 insertions(+)
> 
> diff --git a/include/net/genetlink.h b/include/net/genetlink.h
> index e18a4c0d69ee..66c1e50415e0 100644
> --- a/include/net/genetlink.h
> +++ b/include/net/genetlink.h
> @@ -300,6 +300,9 @@ int genl_register_family(struct genl_family *family);
>   int genl_unregister_family(const struct genl_family *family);
>   void genl_notify(const struct genl_family *family, struct sk_buff *skb,
>   		 struct genl_info *info, u32 group, gfp_t flags);
> +void *genl_sk_priv_get(struct sock *sk, struct genl_family *family);
> +void *genl_sk_priv_store(struct sock *sk, struct genl_family *family,
> +			 void *priv);
>   
>   void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
>   		  const struct genl_family *family, int flags, u8 cmd);
> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
> index 2145979b9986..5d96135a4cf3 100644
> --- a/net/netlink/af_netlink.h
> +++ b/net/netlink/af_netlink.h
> @@ -51,6 +51,7 @@ struct netlink_sock {
>   	struct rhash_head	node;
>   	struct rcu_head		rcu;
>   	struct work_struct	work;
> +	void __rcu		*priv;
>   };
>   
>   static inline struct netlink_sock *nlk_sk(struct sock *sk)
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 92ef5ed2e7b0..aae5e63fa50b 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -21,6 +21,7 @@
>   #include <linux/idr.h>
>   #include <net/sock.h>
>   #include <net/genetlink.h>
> +#include "af_netlink.h"
>   
>   static DEFINE_MUTEX(genl_mutex); /* serialization of message processing */
>   static DECLARE_RWSEM(cb_lock);
> @@ -1699,12 +1700,109 @@ static int genl_bind(struct net *net, int group)
>   	return ret;
>   }
>   
> +struct genl_sk_ctx {
> +	struct xarray family_privs;
> +};
> +
> +static struct genl_sk_ctx *genl_sk_ctx_alloc(void)
> +{
> +	struct genl_sk_ctx *ctx;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return NULL;
> +	xa_init_flags(&ctx->family_privs, XA_FLAGS_ALLOC);
> +	return ctx;
> +}
> +
> +static void genl_sk_ctx_free(struct genl_sk_ctx *ctx)
> +{
> +	unsigned long family_id;
> +	void *priv;
> +
> +	xa_for_each(&ctx->family_privs, family_id, priv) {
> +		xa_erase(&ctx->family_privs, family_id);
> +		kfree(priv);
> +	}
> +	xa_destroy(&ctx->family_privs);
> +	kfree(ctx);
> +}
> +
> +/**
> + * genl_sk_priv_get - Get per-socket private pointer for family
> + *
> + * @sk: socket
> + * @family: family
> + *
> + * Lookup a private pointer stored per-socket by a specified
> + * Generic netlink family.
> + *
> + * Caller should make sure this is called in RCU read locked section.
> + *
> + * Returns: valid pointer on success, otherwise NULL.

since you are going to post next revision,

kernel-doc requires "Return:" section (singular form)
https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation

for new code we should strive to fulfil the requirement
(or piss-off someone powerful enough to change the requirement ;))



[snip]
Jiri Pirko Nov. 28, 2023, 3:05 p.m. UTC | #6
Tue, Nov 28, 2023 at 01:30:51PM CET, przemyslaw.kitszel@intel.com wrote:
>On 11/23/23 19:15, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> Introduce a priv pointer into struct netlink_sock. Use it to store a per
>> socket xarray that contains family->id indexed priv pointer storage.
>> Note I used xarray instead of suggested linked list as it is more
>> convenient, without need to have a container struct that would
>> contain struct list_head item.
>> 
>> Introduce genl_sk_priv_store() to store the priv pointer.
>> Introduce genl_sk_priv_get() to obtain the priv pointer under RCU
>> read lock.
>> 
>> Assume that kfree() is good for free of privs for now, as the only user
>> introduced by the follow-up patch (devlink) will use kzalloc() for the
>> allocation of the memory of the stored pointer. If later on
>> this needs to be made custom, a callback is going to be needed.
>> Until then (if ever), do this in a simple way.
>> 
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> ---
>> v3->v4:
>> - new patch
>> ---
>>   include/net/genetlink.h  |  3 ++
>>   net/netlink/af_netlink.h |  1 +
>>   net/netlink/genetlink.c  | 98 ++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 102 insertions(+)
>> 
>> diff --git a/include/net/genetlink.h b/include/net/genetlink.h
>> index e18a4c0d69ee..66c1e50415e0 100644
>> --- a/include/net/genetlink.h
>> +++ b/include/net/genetlink.h
>> @@ -300,6 +300,9 @@ int genl_register_family(struct genl_family *family);
>>   int genl_unregister_family(const struct genl_family *family);
>>   void genl_notify(const struct genl_family *family, struct sk_buff *skb,
>>   		 struct genl_info *info, u32 group, gfp_t flags);
>> +void *genl_sk_priv_get(struct sock *sk, struct genl_family *family);
>> +void *genl_sk_priv_store(struct sock *sk, struct genl_family *family,
>> +			 void *priv);
>>   void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
>>   		  const struct genl_family *family, int flags, u8 cmd);
>> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
>> index 2145979b9986..5d96135a4cf3 100644
>> --- a/net/netlink/af_netlink.h
>> +++ b/net/netlink/af_netlink.h
>> @@ -51,6 +51,7 @@ struct netlink_sock {
>>   	struct rhash_head	node;
>>   	struct rcu_head		rcu;
>>   	struct work_struct	work;
>> +	void __rcu		*priv;
>>   };
>>   static inline struct netlink_sock *nlk_sk(struct sock *sk)
>> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
>> index 92ef5ed2e7b0..aae5e63fa50b 100644
>> --- a/net/netlink/genetlink.c
>> +++ b/net/netlink/genetlink.c
>> @@ -21,6 +21,7 @@
>>   #include <linux/idr.h>
>>   #include <net/sock.h>
>>   #include <net/genetlink.h>
>> +#include "af_netlink.h"
>>   static DEFINE_MUTEX(genl_mutex); /* serialization of message processing */
>>   static DECLARE_RWSEM(cb_lock);
>> @@ -1699,12 +1700,109 @@ static int genl_bind(struct net *net, int group)
>>   	return ret;
>>   }
>> +struct genl_sk_ctx {
>> +	struct xarray family_privs;
>> +};
>> +
>> +static struct genl_sk_ctx *genl_sk_ctx_alloc(void)
>> +{
>> +	struct genl_sk_ctx *ctx;
>> +
>> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return NULL;
>> +	xa_init_flags(&ctx->family_privs, XA_FLAGS_ALLOC);
>> +	return ctx;
>> +}
>> +
>> +static void genl_sk_ctx_free(struct genl_sk_ctx *ctx)
>> +{
>> +	unsigned long family_id;
>> +	void *priv;
>> +
>> +	xa_for_each(&ctx->family_privs, family_id, priv) {
>> +		xa_erase(&ctx->family_privs, family_id);
>> +		kfree(priv);
>> +	}
>> +	xa_destroy(&ctx->family_privs);
>> +	kfree(ctx);
>> +}
>> +
>> +/**
>> + * genl_sk_priv_get - Get per-socket private pointer for family
>> + *
>> + * @sk: socket
>> + * @family: family
>> + *
>> + * Lookup a private pointer stored per-socket by a specified
>> + * Generic netlink family.
>> + *
>> + * Caller should make sure this is called in RCU read locked section.
>> + *
>> + * Returns: valid pointer on success, otherwise NULL.
>
>since you are going to post next revision,
>
>kernel-doc requires "Return:" section (singular form)
>https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
>
>for new code we should strive to fulfil the requirement
>(or piss-off someone powerful enough to change the requirement ;))

Okay, will fix. I just thought this is okay when scripts/kernel-doc is
happy.

>
>
>
>[snip]
Jakub Kicinski Nov. 28, 2023, 3:11 p.m. UTC | #7
On Tue, 28 Nov 2023 09:25:21 +0100 Jiri Pirko wrote:
> >Put the xarray pointer here directly. Plus a lock to protect the init.  
> 
> Okay, just to make this clear. You want me to have:
> 	struct xarray __rcu		*family_privs;
> 
> in struct netlink_sock, correct?
> 
> 
> Why I need a lock? If I read things correctly, skbs are processed in
> serial over one sock. Since this is per-sock, should be safe.

Okay, then add an assertion that the socket lock is held, at least.
Also, is the socket lock not held yet when the filtering happens?
Maybe the __rcu annotation isn't necessary then either?

> >The size of the per-family struct should be in family definition,
> >allocation should happen on first get automatically. Family definition  
> 
> Yes, I thought about that. But I decided to do this lockless, allocating
> new priv every time the user sets the filter and replace the xarray item
> so it could be accessed in rcu read section during notification
> processing.
> 
> What you suggest requires some lock to protect the memory being changed
> during filter set and suring accessing in in notify. But okay,
> if you insist.

Not necessarily, you can have a helper which doesn't allocate, too.
What I'm saying is that the common case for ops will be to access
the state and allocate if it doesn't exist.

How about genl_sk_family_priv() and genl_sk_has_family_priv() ?

> >should also hold a callback to how the data is going to be freed.  
> 
> If it is alloceted automatically, why is it needed?

Because priv may be a complex type which has member that need
individual fields to be destroyed (in fullness of time we also
need a constructor which can init things like list_head, but
we can defer that).

I'm guessing in your case the priv will look like this:

struct devlink_sk_priv {
	const char *nft_fltr_instance_name;
};

static void devlink_sk_priv_free(void *ptr)
{
	struct devlink_sk_priv *priv = ptr;

	kfree(priv->nft_fltr_instance_name);
}
Jiri Pirko Nov. 28, 2023, 4:05 p.m. UTC | #8
Tue, Nov 28, 2023 at 04:11:16PM CET, kuba@kernel.org wrote:
>On Tue, 28 Nov 2023 09:25:21 +0100 Jiri Pirko wrote:
>> >Put the xarray pointer here directly. Plus a lock to protect the init.  
>> 
>> Okay, just to make this clear. You want me to have:
>> 	struct xarray __rcu		*family_privs;
>> 
>> in struct netlink_sock, correct?
>> 
>> 
>> Why I need a lock? If I read things correctly, skbs are processed in
>> serial over one sock. Since this is per-sock, should be safe.
>
>Okay, then add an assertion that the socket lock is held, at least.

No socket lock, but I assumed revcmsg could be called only in parallel.
But I guess that with multiple threads, this assumption is broken.
Okay, will add a spin lock.


>Also, is the socket lock not held yet when the filtering happens?

Nope.


>Maybe the __rcu annotation isn't necessary then either?
>
>> >The size of the per-family struct should be in family definition,
>> >allocation should happen on first get automatically. Family definition  
>> 
>> Yes, I thought about that. But I decided to do this lockless, allocating
>> new priv every time the user sets the filter and replace the xarray item
>> so it could be accessed in rcu read section during notification
>> processing.
>> 
>> What you suggest requires some lock to protect the memory being changed
>> during filter set and suring accessing in in notify. But okay,
>> if you insist.
>
>Not necessarily, you can have a helper which doesn't allocate, too.
>What I'm saying is that the common case for ops will be to access
>the state and allocate if it doesn't exist.
>
>How about genl_sk_family_priv() and genl_sk_has_family_priv() ?

My point is, with what you suggest, it will look something like this:

1) user does DEVLINK_CMD_NOTIFY_FILTER_SET
2) devlink calls into genetlink code for a sk_priv and inserts in xa_array
3) genetlink allocates sk_priv and returns back
4) devlink fills-up the sk_priv

5) user does DEVLINK_CMD_NOTIFY_FILTER_SET, again
6) devlink calls into genetlink code for a sk_priv
7) genetlink returns already exising sk_priv found in xa_array
8) devlink fills-up the sk_priv

Now the notification thread, sees sk_priv zeroed between 3) and 4)
and inconsistent during 4) and 8)

I originally solved that by rcu, DEVLINK_CMD_NOTIFY_FILTER_SET
code always allocates and flips the rcu pointer. Notification thread
always sees sk_priv consistent.

If you want to allocate sk_priv in genetlink code once, hard to use
the rcu mechanism and have to protect the sk_priv memory by a lock.

What am I missing?


>
>> >should also hold a callback to how the data is going to be freed.  
>> 
>> If it is alloceted automatically, why is it needed?
>
>Because priv may be a complex type which has member that need
>individual fields to be destroyed (in fullness of time we also
>need a constructor which can init things like list_head, but
>we can defer that).
>
>I'm guessing in your case the priv will look like this:
>
>struct devlink_sk_priv {
>	const char *nft_fltr_instance_name;
>};
>
>static void devlink_sk_priv_free(void *ptr)
>{
>	struct devlink_sk_priv *priv = ptr;
>
>	kfree(priv->nft_fltr_instance_name);
>}

If genetlink code does the allocation, it should know how to free.
Does not make sense to pass destructor to genetlink code to free memory
it actually allocated :/

If devlink does the allocation, this callback makes sense. I was
thinking about having it, but decided kfree is okay for now and
destructor could be always introduced if needed.

Quoting the patch description:
    Assume that kfree() is good for free of privs for now, as the only user
    introduced by the follow-up patch (devlink) will use kzalloc() for the
    allocation of the memory of the stored pointer. If later on
    this needs to be made custom, a callback is going to be needed.
    Until then (if ever), do this in a simple way.
Andy Shevchenko Nov. 28, 2023, 4:18 p.m. UTC | #9
On Tue, Nov 28, 2023 at 01:30:51PM +0100, Przemek Kitszel wrote:
> On 11/23/23 19:15, Jiri Pirko wrote:

...

> > + * Returns: valid pointer on success, otherwise NULL.
> 
> since you are going to post next revision,
> 
> kernel-doc requires "Return:" section (singular form)
> https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
> 
> for new code we should strive to fulfil the requirement
> (or piss-off someone powerful enough to change the requirement ;))

Interestingly that the script accepts plural for a few keywords.
Is it documented somewhere as deprecated?
Jakub Kicinski Nov. 28, 2023, 4:36 p.m. UTC | #10
On Tue, 28 Nov 2023 17:05:48 +0100 Jiri Pirko wrote:
> >Not necessarily, you can have a helper which doesn't allocate, too.
> >What I'm saying is that the common case for ops will be to access
> >the state and allocate if it doesn't exist.
> >
> >How about genl_sk_family_priv() and genl_sk_has_family_priv() ?  
> 
> My point is, with what you suggest, it will look something like this:
> 
> 1) user does DEVLINK_CMD_NOTIFY_FILTER_SET
> 2) devlink calls into genetlink code for a sk_priv and inserts in xa_array
> 3) genetlink allocates sk_priv and returns back
> 4) devlink fills-up the sk_priv
> 
> 5) user does DEVLINK_CMD_NOTIFY_FILTER_SET, again
> 6) devlink calls into genetlink code for a sk_priv
> 7) genetlink returns already exising sk_priv found in xa_array
> 8) devlink fills-up the sk_priv
> 
> Now the notification thread, sees sk_priv zeroed between 3) and 4)
> and inconsistent during 4) and 8)
> 
> I originally solved that by rcu, DEVLINK_CMD_NOTIFY_FILTER_SET
> code always allocates and flips the rcu pointer. Notification thread
> always sees sk_priv consistent.
> 
> If you want to allocate sk_priv in genetlink code once, hard to use
> the rcu mechanism and have to protect the sk_priv memory by a lock.

No, you can do exact same thing, just instead of putting the string
directly into the xarray you put a struct which points to the string.

> What am I missing?

The fact that someone in the future may want to add another devlink
priv field, and if the state is basically a pointer to a string,
with complicated lifetime, they will have to suffer undoing that.

> >> If it is alloceted automatically, why is it needed?  
> >
> >Because priv may be a complex type which has member that need
> >individual fields to be destroyed (in fullness of time we also
> >need a constructor which can init things like list_head, but
> >we can defer that).
> >
> >I'm guessing in your case the priv will look like this:
> >
> >struct devlink_sk_priv {
> >	const char *nft_fltr_instance_name;
> >};
> >
> >static void devlink_sk_priv_free(void *ptr)
> >{
> >	struct devlink_sk_priv *priv = ptr;
> >
> >	kfree(priv->nft_fltr_instance_name);
> >}  
> 
> If genetlink code does the allocation, it should know how to free.
> Does not make sense to pass destructor to genetlink code to free memory
> it actually allocated :/
> 
> If devlink does the allocation, this callback makes sense. I was
> thinking about having it, but decided kfree is okay for now and
> destructor could be always introduced if needed.

Did you read the code snippet above?

Core still does the kfree of the container (struct devlink_sk_priv).
But what's inside the container struct (string pointer) has to be
handled by the destructor.

Feels like you focus on how to prove me wrong more than on
understanding what I'm saying :|
Jacob Keller Nov. 28, 2023, 7:59 p.m. UTC | #11
On 11/28/2023 8:18 AM, Andy Shevchenko wrote:
> On Tue, Nov 28, 2023 at 01:30:51PM +0100, Przemek Kitszel wrote:
>> On 11/23/23 19:15, Jiri Pirko wrote:
> 
> ...
> 
>>> + * Returns: valid pointer on success, otherwise NULL.
>>
>> since you are going to post next revision,
>>
>> kernel-doc requires "Return:" section (singular form)
>> https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
>>
>> for new code we should strive to fulfil the requirement
>> (or piss-off someone powerful enough to change the requirement ;))
> 
> Interestingly that the script accepts plural for a few keywords.
> Is it documented somewhere as deprecated?
> 

I also checked the source:

$git grep --count -h 'Returns:' |  awk '{ sum += $1 } END { print sum }'3646


$git grep --count -h 'Return:' |  awk '{ sum += $1 } END { print sum }'
10907

So there is a big favor towards using 'Return:', but there are still
about 1/3 as many uses of 'Returns:'.

I dug into kernel-doc and it looks like it has accepted both "Return"
and "Returns" since the first time that section headers were limited:
f624adef3d0b ("kernel-doc: limit the "section header:" detection to a
select few")

I don't see any documentation on 'Returns;' being deprecated, but the
documentation does only call out 'Return:'.

Thanks,
Jake
Andy Shevchenko Nov. 28, 2023, 8:06 p.m. UTC | #12
On Tue, Nov 28, 2023 at 11:59:05AM -0800, Jacob Keller wrote:
> On 11/28/2023 8:18 AM, Andy Shevchenko wrote:
> > On Tue, Nov 28, 2023 at 01:30:51PM +0100, Przemek Kitszel wrote:
> >> On 11/23/23 19:15, Jiri Pirko wrote:

...

> >>> + * Returns: valid pointer on success, otherwise NULL.
> >>
> >> since you are going to post next revision,
> >>
> >> kernel-doc requires "Return:" section (singular form)
> >> https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
> >>
> >> for new code we should strive to fulfil the requirement
> >> (or piss-off someone powerful enough to change the requirement ;))
> > 
> > Interestingly that the script accepts plural for a few keywords.
> > Is it documented somewhere as deprecated?
> 
> I also checked the source:
> 
> $git grep --count -h 'Returns:' |  awk '{ sum += $1 } END { print sum }'
> 3646
> $git grep --count -h 'Return:' |  awk '{ sum += $1 } END { print sum }'
> 10907
> 
> So there is a big favor towards using 'Return:', but there are still
> about 1/3 as many uses of 'Returns:'.
> 
> I dug into kernel-doc and it looks like it has accepted both "Return"
> and "Returns" since the first time that section headers were limited:
> f624adef3d0b ("kernel-doc: limit the "section header:" detection to a
> select few")
> 
> I don't see any documentation on 'Returns;' being deprecated, but the
> documentation does only call out 'Return:'.

Then I would amend documentation followed by amending scripts, etc.
Before that it's unclear to me that contributor must use Return:. It
sounds like similar collision to 80 vs. 100 (former in documentation,
latter in the checkpatch).

Of course, there might be sunsystem rules, but again, has to be documented.
Right?
Jiri Pirko Nov. 29, 2023, 1:59 p.m. UTC | #13
Tue, Nov 28, 2023 at 05:36:05PM CET, kuba@kernel.org wrote:
>On Tue, 28 Nov 2023 17:05:48 +0100 Jiri Pirko wrote:
>> >Not necessarily, you can have a helper which doesn't allocate, too.
>> >What I'm saying is that the common case for ops will be to access
>> >the state and allocate if it doesn't exist.
>> >
>> >How about genl_sk_family_priv() and genl_sk_has_family_priv() ?  
>> 
>> My point is, with what you suggest, it will look something like this:
>> 
>> 1) user does DEVLINK_CMD_NOTIFY_FILTER_SET
>> 2) devlink calls into genetlink code for a sk_priv and inserts in xa_array
>> 3) genetlink allocates sk_priv and returns back
>> 4) devlink fills-up the sk_priv
>> 
>> 5) user does DEVLINK_CMD_NOTIFY_FILTER_SET, again
>> 6) devlink calls into genetlink code for a sk_priv
>> 7) genetlink returns already exising sk_priv found in xa_array
>> 8) devlink fills-up the sk_priv
>> 
>> Now the notification thread, sees sk_priv zeroed between 3) and 4)
>> and inconsistent during 4) and 8)
>> 
>> I originally solved that by rcu, DEVLINK_CMD_NOTIFY_FILTER_SET
>> code always allocates and flips the rcu pointer. Notification thread
>> always sees sk_priv consistent.
>> 
>> If you want to allocate sk_priv in genetlink code once, hard to use
>> the rcu mechanism and have to protect the sk_priv memory by a lock.
>
>No, you can do exact same thing, just instead of putting the string
>directly into the xarray you put a struct which points to the string.

I'm lost. What "string" are you talking about exactly? I'm not putting
any string to xarray.

In the existing implementation, I have following struct:
struct devlink_obj_desc {
        struct rcu_head rcu;
        const char *bus_name;
        const char *dev_name;
        unsigned int port_index;
        bool port_index_valid;
        long data[];
};

This is the struct put pointer to into the xarray. Pointer to this
struct is dereferenced under rcu in notification code and the struct
is freed by kfree_rcu().


>
>> What am I missing?
>
>The fact that someone in the future may want to add another devlink
>priv field, and if the state is basically a pointer to a string,

"the state is basically a pointer to a string". I don't follow what you
mean by this :/


>with complicated lifetime, they will have to suffer undoing that.
>
>> >> If it is alloceted automatically, why is it needed?  
>> >
>> >Because priv may be a complex type which has member that need
>> >individual fields to be destroyed (in fullness of time we also
>> >need a constructor which can init things like list_head, but
>> >we can defer that).
>> >
>> >I'm guessing in your case the priv will look like this:
>> >
>> >struct devlink_sk_priv {
>> >	const char *nft_fltr_instance_name;
>> >};
>> >
>> >static void devlink_sk_priv_free(void *ptr)
>> >{
>> >	struct devlink_sk_priv *priv = ptr;
>> >
>> >	kfree(priv->nft_fltr_instance_name);
>> >}  
>> 
>> If genetlink code does the allocation, it should know how to free.
>> Does not make sense to pass destructor to genetlink code to free memory
>> it actually allocated :/
>> 
>> If devlink does the allocation, this callback makes sense. I was
>> thinking about having it, but decided kfree is okay for now and
>> destructor could be always introduced if needed.
>
>Did you read the code snippet above?

Sure.


>
>Core still does the kfree of the container (struct devlink_sk_priv).
>But what's inside the container struct (string pointer) has to be
>handled by the destructor.
>
>Feels like you focus on how to prove me wrong more than on
>understanding what I'm saying :|

Not at all, I have no reason for it. I just want to get my job done
and I am having very hard time to understand what you want exactly.
Jakub Kicinski Nov. 29, 2023, 3:01 p.m. UTC | #14
On Wed, 29 Nov 2023 14:59:31 +0100 Jiri Pirko wrote:
> Tue, Nov 28, 2023 at 05:36:05PM CET, kuba@kernel.org wrote:
> >No, you can do exact same thing, just instead of putting the string
> >directly into the xarray you put a struct which points to the string.  
> 
> I'm lost. What "string" are you talking about exactly? I'm not putting
> any string to xarray.
> 
> In the existing implementation, I have following struct:
> struct devlink_obj_desc {
>         struct rcu_head rcu;
>         const char *bus_name;
>         const char *dev_name;
>         unsigned int port_index;
>         bool port_index_valid;
>         long data[];
> };
> 
> This is the struct put pointer to into the xarray. Pointer to this
> struct is dereferenced under rcu in notification code and the struct
> is freed by kfree_rcu().

Sorry I was looking at patch 8 which has only:

+struct devlink_obj_desc {
+	struct rcu_head rcu;
+	const char *bus_name;
+	const char *dev_name;
+	long data[];
+};

that's basically a string and an rcu_head, that's what I meant.

> >Core still does the kfree of the container (struct devlink_sk_priv).
> >But what's inside the container struct (string pointer) has to be
> >handled by the destructor.
> >
> >Feels like you focus on how to prove me wrong more than on
> >understanding what I'm saying :|  
> 
> Not at all, I have no reason for it. I just want to get my job done
> and I am having very hard time to understand what you want exactly.

Sockets may want to hold state for more than filtering.
Try to look past your singular use case.
Jiri Pirko Nov. 29, 2023, 3:25 p.m. UTC | #15
Wed, Nov 29, 2023 at 04:01:57PM CET, kuba@kernel.org wrote:
>On Wed, 29 Nov 2023 14:59:31 +0100 Jiri Pirko wrote:
>> Tue, Nov 28, 2023 at 05:36:05PM CET, kuba@kernel.org wrote:
>> >No, you can do exact same thing, just instead of putting the string
>> >directly into the xarray you put a struct which points to the string.  
>> 
>> I'm lost. What "string" are you talking about exactly? I'm not putting
>> any string to xarray.
>> 
>> In the existing implementation, I have following struct:
>> struct devlink_obj_desc {
>>         struct rcu_head rcu;
>>         const char *bus_name;
>>         const char *dev_name;
>>         unsigned int port_index;
>>         bool port_index_valid;
>>         long data[];
>> };
>> 
>> This is the struct put pointer to into the xarray. Pointer to this
>> struct is dereferenced under rcu in notification code and the struct
>> is freed by kfree_rcu().
>
>Sorry I was looking at patch 8 which has only:
>
>+struct devlink_obj_desc {
>+	struct rcu_head rcu;
>+	const char *bus_name;
>+	const char *dev_name;
>+	long data[];
>+};
>
>that's basically a string and an rcu_head, that's what I meant.
>
>> >Core still does the kfree of the container (struct devlink_sk_priv).
>> >But what's inside the container struct (string pointer) has to be
>> >handled by the destructor.
>> >
>> >Feels like you focus on how to prove me wrong more than on
>> >understanding what I'm saying :|  
>> 
>> Not at all, I have no reason for it. I just want to get my job done
>> and I am having very hard time to understand what you want exactly.
>
>Sockets may want to hold state for more than filtering.
>Try to look past your singular use case.

Okay, I'll try to make another V. But please don't be mad I
misunderstood something if I do :)
Jacob Keller Nov. 29, 2023, 11:29 p.m. UTC | #16
On 11/28/2023 12:06 PM, Andy Shevchenko wrote:
> On Tue, Nov 28, 2023 at 11:59:05AM -0800, Jacob Keller wrote:
>> On 11/28/2023 8:18 AM, Andy Shevchenko wrote:
>>> On Tue, Nov 28, 2023 at 01:30:51PM +0100, Przemek Kitszel wrote:
>>>> On 11/23/23 19:15, Jiri Pirko wrote:
> 
> ...
> 
>>>>> + * Returns: valid pointer on success, otherwise NULL.
>>>>
>>>> since you are going to post next revision,
>>>>
>>>> kernel-doc requires "Return:" section (singular form)
>>>> https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
>>>>
>>>> for new code we should strive to fulfil the requirement
>>>> (or piss-off someone powerful enough to change the requirement ;))
>>>
>>> Interestingly that the script accepts plural for a few keywords.
>>> Is it documented somewhere as deprecated?
>>
>> I also checked the source:
>>
>> $git grep --count -h 'Returns:' |  awk '{ sum += $1 } END { print sum }'
>> 3646
>> $git grep --count -h 'Return:' |  awk '{ sum += $1 } END { print sum }'
>> 10907
>>
>> So there is a big favor towards using 'Return:', but there are still
>> about 1/3 as many uses of 'Returns:'.
>>
>> I dug into kernel-doc and it looks like it has accepted both "Return"
>> and "Returns" since the first time that section headers were limited:
>> f624adef3d0b ("kernel-doc: limit the "section header:" detection to a
>> select few")
>>
>> I don't see any documentation on 'Returns;' being deprecated, but the
>> documentation does only call out 'Return:'.
> 
> Then I would amend documentation followed by amending scripts, etc.
> Before that it's unclear to me that contributor must use Return:. It
> sounds like similar collision to 80 vs. 100 (former in documentation,
> latter in the checkpatch).
> 
> Of course, there might be sunsystem rules, but again, has to be documented.
> Right?
> 

Documenting it seems reasonable to me.

Thanks,
Jake
diff mbox series

Patch

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index e18a4c0d69ee..66c1e50415e0 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -300,6 +300,9 @@  int genl_register_family(struct genl_family *family);
 int genl_unregister_family(const struct genl_family *family);
 void genl_notify(const struct genl_family *family, struct sk_buff *skb,
 		 struct genl_info *info, u32 group, gfp_t flags);
+void *genl_sk_priv_get(struct sock *sk, struct genl_family *family);
+void *genl_sk_priv_store(struct sock *sk, struct genl_family *family,
+			 void *priv);
 
 void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 		  const struct genl_family *family, int flags, u8 cmd);
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 2145979b9986..5d96135a4cf3 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -51,6 +51,7 @@  struct netlink_sock {
 	struct rhash_head	node;
 	struct rcu_head		rcu;
 	struct work_struct	work;
+	void __rcu		*priv;
 };
 
 static inline struct netlink_sock *nlk_sk(struct sock *sk)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 92ef5ed2e7b0..aae5e63fa50b 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -21,6 +21,7 @@ 
 #include <linux/idr.h>
 #include <net/sock.h>
 #include <net/genetlink.h>
+#include "af_netlink.h"
 
 static DEFINE_MUTEX(genl_mutex); /* serialization of message processing */
 static DECLARE_RWSEM(cb_lock);
@@ -1699,12 +1700,109 @@  static int genl_bind(struct net *net, int group)
 	return ret;
 }
 
+struct genl_sk_ctx {
+	struct xarray family_privs;
+};
+
+static struct genl_sk_ctx *genl_sk_ctx_alloc(void)
+{
+	struct genl_sk_ctx *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+	xa_init_flags(&ctx->family_privs, XA_FLAGS_ALLOC);
+	return ctx;
+}
+
+static void genl_sk_ctx_free(struct genl_sk_ctx *ctx)
+{
+	unsigned long family_id;
+	void *priv;
+
+	xa_for_each(&ctx->family_privs, family_id, priv) {
+		xa_erase(&ctx->family_privs, family_id);
+		kfree(priv);
+	}
+	xa_destroy(&ctx->family_privs);
+	kfree(ctx);
+}
+
+/**
+ * genl_sk_priv_get - Get per-socket private pointer for family
+ *
+ * @sk: socket
+ * @family: family
+ *
+ * Lookup a private pointer stored per-socket by a specified
+ * Generic netlink family.
+ *
+ * Caller should make sure this is called in RCU read locked section.
+ *
+ * Returns: valid pointer on success, otherwise NULL.
+ */
+void *genl_sk_priv_get(struct sock *sk, struct genl_family *family)
+{
+	struct genl_sk_ctx *ctx;
+
+	ctx = rcu_dereference(nlk_sk(sk)->priv);
+	if (!ctx)
+		return NULL;
+	return xa_load(&ctx->family_privs, family->id);
+}
+
+/**
+ * genl_sk_priv_store - Store per-socket private pointer for family
+ *
+ * @sk: socket
+ * @family: family
+ * @priv: private pointer
+ *
+ * Store a private pointer per-socket for a specified
+ * Generic netlink family.
+ *
+ * Caller has to make sure this is not called in parallel multiple times
+ * for the same sock and also in parallel to genl_release() for the same sock.
+ *
+ * Returns: previously stored private pointer for the family (could be NULL)
+ * on success, otherwise negative error value encoded by ERR_PTR().
+ */
+void *genl_sk_priv_store(struct sock *sk, struct genl_family *family,
+			 void *priv)
+{
+	struct genl_sk_ctx *ctx;
+	void *old_priv;
+
+	ctx = rcu_dereference_raw(nlk_sk(sk)->priv);
+	if (!ctx) {
+		ctx = genl_sk_ctx_alloc();
+		if (!ctx)
+			return ERR_PTR(-ENOMEM);
+		rcu_assign_pointer(nlk_sk(sk)->priv, ctx);
+	}
+
+	old_priv = xa_store(&ctx->family_privs, family->id, priv, GFP_KERNEL);
+	if (xa_is_err(old_priv))
+		return ERR_PTR(xa_err(old_priv));
+	return old_priv;
+}
+
+static void genl_release(struct sock *sk, unsigned long *groups)
+{
+	struct genl_sk_ctx *ctx;
+
+	ctx = rcu_dereference_raw(nlk_sk(sk)->priv);
+	if (ctx)
+		genl_sk_ctx_free(ctx);
+}
+
 static int __net_init genl_pernet_init(struct net *net)
 {
 	struct netlink_kernel_cfg cfg = {
 		.input		= genl_rcv,
 		.flags		= NL_CFG_F_NONROOT_RECV,
 		.bind		= genl_bind,
+		.release	= genl_release,
 	};
 
 	/* we'll bump the group number right afterwards */