| Message ID | 20220721155950.747251-1-alexandr.lobakin@intel.com |
|---|---|
| Series | netlink: add 'bitmap' attribute type and API |
On Thu, 21 Jul 2022 17:59:46 +0200 Alexander Lobakin wrote:
> BTW, Ethtool bitsets provide similar functionality, but they operate with u32s (u64 is more convenient and optimal on most platforms), and Netlink bitmaps are a generic interface providing policies and data verification (Ethtool bitsets are declared simply as %NLA_BINARY), generic getters/setters, etc.

Are you saying we don't need the other two features ethtool bitmaps provide? Masking and compact vs named representations?

I think that a straight-up bitmap with a fixed word size is awkward and leads to too much boilerplate code. People will avoid using it. What about implementing a bigint type instead? Needing more than 64b is extremely rare, so in 99% of the cases the code outside of parsing can keep using its u8/u16/u32.
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 21 Jul 2022 11:13:18 -0700

> On Thu, 21 Jul 2022 17:59:46 +0200 Alexander Lobakin wrote:
> > BTW, Ethtool bitsets provide similar functionality, but they operate with u32s (u64 is more convenient and optimal on most platforms), and Netlink bitmaps are a generic interface providing policies and data verification (Ethtool bitsets are declared simply as %NLA_BINARY), generic getters/setters, etc.
>
> Are you saying we don't need the other two features ethtool bitmaps provide? Masking and compact vs named representations?

Nah, I didn't say that. I'm not too familiar with Ethtool bitsets, just know that they're represented as arrays of u32s.

> I think that a straight-up bitmap with a fixed word size is awkward and leads to too much boilerplate code. People will avoid using it. What about implementing a bigint type instead? Needing more than 64b is extremely rare, so in 99% of the cases the code outside of parsing can keep using its u8/u16/u32.

In-kernel code can still use a single unsigned long for some flags if it won't need more than 64 bits in a couple of decades, and not bother with the bitmap API. Same with userspace -- a single u64 is fine for that API, just pass a pointer to it to send it as a bitmap to the kernel.

Re 64b vs extremely rare -- I would have said so 5 years ago, but now more and more bitfields run out of 64 bits. Link modes, netdev features, ...

Re bigint -- do you mean implementing u128 as a union, like

	typedef union __u128 {
		struct {
			u32 b127_96;
			u32 b95_64;
			u32 b63_32;
			u32 b31_0;
		};
		struct {
			u64 b127_64;
			u64 b63_0;
		};
	#ifdef __HAVE_INT128
		__int128 b127_0;
	#endif
	} u128;

? We have a similar feature in one of our internal trees and are planning to present a generic u128 soon, but this doesn't work well for flags, I think. The bitmap API and bitops are widely used and familiar to tons of folks, most platforms define their own machine-optimized bitops implementation, arrays of unsigned longs are native...

Re awkward -- all the u64 <-> bitmap conversion is implemented in the core code in 4/4, and users won't need to do anything besides one get/set. And they can still use the bitmap/bitops API. Userspace, as I said, can use a single __u64 as long as the field fits into 64 bits.

Summarizing, I feel like bigints would lead to much more boilerplate in both kernel and user space and would need a whole new API instead of using the already existing and well-used bitmap one. Continuing to use single objects of fixed size like %NLA_U* or %NLA_BITFIELD_U32 will lead to introducing a new Netlink attr every 32/64 bits (or even 16, like with IP tunnels, which was the initial reason why I started working on those 3 series). As Jake wrote me in a PM earlier,

"I like the concept of an NLA_BITMAP. I could have used this for some of the devlink interfaces we've done, and it definitely feels a bit more natural than being forced to a single u32 bitfield."

Thanks,
Olek
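As a rough userspace illustration of the "just pass a pointer to a __u64" point above, sketched with libmnl; the wrapper name and the attribute-type argument are made up for the example, and the wire format (an array of host-order u64 words) is taken from the description of the series:

	#include <stdint.h>
	#include <libmnl/libmnl.h>
	#include <linux/types.h>

	/*
	 * While the flag field still fits into 64 bits, the application can
	 * keep a plain __u64 and emit it directly as the one-word u64 array
	 * the bitmap attribute carries.  put_flags_attr() is illustrative
	 * only; mnl_attr_put() is the existing libmnl helper.
	 */
	static void put_flags_attr(struct nlmsghdr *nlh, uint16_t type, __u64 flags)
	{
		mnl_attr_put(nlh, type, sizeof(flags), &flags);
	}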
On Fri, 22 Jul 2022 16:55:14 +0200 Alexander Lobakin wrote:
> > I think that a straight-up bitmap with a fixed word size is awkward and leads to too much boilerplate code. People will avoid using it. What about implementing a bigint type instead? Needing more than 64b is extremely rare, so in 99% of the cases the code outside of parsing can keep using its u8/u16/u32.
>
> In-kernel code can still use a single unsigned long for some flags if it won't need more than 64 bits in a couple of decades, and not bother with the bitmap API. Same with userspace -- a single u64 is fine for that API, just pass a pointer to it to send it as a bitmap to the kernel.

Single unsigned long - exactly. What I'm saying is that you need to convince people who think they are "clever" by using u8 or u16 for flags because it "saves space". Happens a lot, I can tell you from reviewing such patches. And to avoid that we should give people a "growable" type, but one starting from smaller sizes than a bitmap.

It's about human psychology as observed, not logic, just to be extra clear.

A logical argument of smaller importance is that ID types have a similar problem, although practically it's even more rare than for flags (I think it did happen once with inodes or some such, tho). So bigint has more uses.

I'd forgo the endianness BTW, it's a distraction at the netlink level. There was more reason for it in fixed-size fields.

> Re 64b vs extremely rare -- I would have said so 5 years ago, but now more and more bitfields run out of 64 bits. Link modes, netdev features, ...

I like how you put the "..." there even tho these are the only two cases either of us can think of, right? :) There are hundreds of flags in the kernel, two needed to grow into a bitmap. And the problem you're working on is with a 16 bit field, or a 32 bit field? Defo not 64b, right?

> Re bigint -- do you mean implementing u128 as a union, like
>
> 	typedef union __u128 {
> 		struct {
> 			u32 b127_96;
> 			u32 b95_64;
> 			u32 b63_32;
> 			u32 b31_0;
> 		};
> 		struct {
> 			u64 b127_64;
> 			u64 b63_0;
> 		};
> 	#ifdef __HAVE_INT128
> 		__int128 b127_0;
> 	#endif
> 	} u128;
>
> ? We have a similar feature in one of our internal trees and are planning to present a generic u128 soon, but this doesn't work well for flags, I think.

Actually, once a field crosses the biggest natural int size I was thinking that the user would go to a bitmap.

So at the netlink level the field is bigint (LE, don't care about BE). Kernel side API is:

	nla_get_b8, nla_get_b16, nla_get_b32, nla_get_b64,
	nla_get_bitmap
	nla_put_b8, nla_put_b16 etc.

	u16 my_flags_are_so_toight;

	my_flags_are_so_toight = nla_get_b16(attr[BLAA_BLA_BLA_FLAGS]);

The point is - the representation can be more compact than u64 and will therefore encourage anyone who doesn't have a strong reason to use fixed size fields to switch to the bigint.

Honestly, if we were redoing netlink from scratch today I'd argue there should be no fixed width integers in host endian.

> The bitmap API and bitops are widely used and familiar to tons of folks, most platforms define their own machine-optimized bitops implementation, arrays of unsigned longs are native...
>
> Re awkward -- all the u64 <-> bitmap conversion is implemented in the core code in 4/4, and users won't need to do anything besides one get/set. And they can still use the bitmap/bitops API. Userspace, as I said, can use a single __u64 as long as the field fits into 64 bits.

Yes, but people will see a bitmap and think "I don't need a bitmap, I just need X bits".
> Summarizing, I feel like bigints would lead to much more boilerplate in both kernel and user space and would need a whole new API instead of using the already existing and well-used bitmap one. Continuing to use single objects of fixed size like %NLA_U* or %NLA_BITFIELD_U32 will lead to introducing a new Netlink attr every 32/64 bits (or even 16, like with IP tunnels, which was the initial reason why I started working on those 3 series). As Jake wrote me in a PM earlier,

Every netlink attribute carries a length. As long as you establish correct expectations for parsing there is no need to create new types.

> "I like the concept of an NLA_BITMAP. I could have used this for some of the devlink interfaces we've done, and it definitely feels a bit more natural than being forced to a single u32 bitfield."

Slightly ironic to say "if I could use a larger type I would" and then not use the largest one available? ;) But the point is about being flexible when it comes to size, I guess, more than bitmap in particular.
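As a rough sketch of what the proposed accessor could look like on the kernel side -- nla_get_b16() and the exact truncation semantics are assumptions based on the proposal above, not existing helpers; nla_len()/nla_data() are the existing netlink attribute helpers:

	#include <net/netlink.h>
	#include <linux/string.h>

	/*
	 * The attribute payload is assumed to be a little-endian "bigint"
	 * of whatever length the sender used; the getter copies as many
	 * bytes as fit into the caller's type and ignores the rest.
	 */
	static inline u16 nla_get_b16(const struct nlattr *nla)
	{
		__le16 tmp = 0;

		memcpy(&tmp, nla_data(nla), min_t(int, nla_len(nla), sizeof(tmp)));

		return le16_to_cpu(tmp);
	}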
On Fri, 22 Jul 2022 11:19:51 -0700 Jakub Kicinski wrote:
> So at the netlink level the field is bigint (LE, don't care about BE). Kernel side API is:
>
> 	nla_get_b8, nla_get_b16, nla_get_b32, nla_get_b64,
> 	nla_get_bitmap
> 	nla_put_b8, nla_put_b16 etc.
>
> 	u16 my_flags_are_so_toight;
>
> 	my_flags_are_so_toight = nla_get_b16(attr[BLAA_BLA_BLA_FLAGS]);
>
> The point is - the representation can be more compact than u64 and will therefore encourage anyone who doesn't have a strong reason to use fixed size fields to switch to the bigint.

IDK if I convinced you yet, but in case you're typing the code for this - the types smaller than 4B don't really have to be represented at the netlink level. Padding will round them up anyway. But we do want the 32b values there, otherwise we need a padding type to align, which bloats the API.
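The 4-byte rounding being referred to can be checked directly against the uapi macros; a small compile-time illustration (userspace, C11):

	#include <assert.h>
	#include <linux/netlink.h>

	/*
	 * Every attribute is a 4-byte header plus a payload padded to a
	 * 4-byte boundary, so 1-, 2- and 4-byte payloads all occupy the
	 * same 8 bytes on the wire -- which is why sub-4-byte sizes need
	 * no netlink-level representation of their own.
	 */
	static_assert(NLA_ALIGN(NLA_HDRLEN + 1) == 8, "1B payload rounds up to the 4B case");
	static_assert(NLA_ALIGN(NLA_HDRLEN + 2) == 8, "2B payload rounds up to the 4B case");
	static_assert(NLA_ALIGN(NLA_HDRLEN + 4) == 8, "4B payload");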
From: Andy Shevchenko <andy.shevchenko@gmail.com>
Date: Fri, 22 Jul 2022 19:02:35 +0200

> On Friday, July 22, 2022, Alexander Lobakin <alexandr.lobakin@intel.com> wrote:
>
> > From: Jakub Kicinski <kuba@kernel.org>
> > Date: Thu, 21 Jul 2022 11:13:18 -0700
> >
> > > On Thu, 21 Jul 2022 17:59:46 +0200 Alexander Lobakin wrote:
> > > > BTW, Ethtool bitsets provide similar functionality, but they operate with u32s (u64 is more convenient and optimal on most platforms), and Netlink bitmaps are a generic interface providing policies and data verification (Ethtool bitsets are declared simply as %NLA_BINARY), generic getters/setters, etc.
> > >
> > > Are you saying we don't need the other two features ethtool bitmaps provide? Masking and compact vs named representations?
> >
> > Nah, I didn't say that. I'm not too familiar with Ethtool bitsets, just know that they're represented as arrays of u32s.
> >
> > > I think that a straight-up bitmap with a fixed word size is awkward and leads to too much boilerplate code. People will avoid using it. What about implementing a bigint type instead? Needing more than 64b is extremely rare, so in 99% of the cases the code outside of parsing can keep using its u8/u16/u32.
> >
> > In-kernel code can still use a single unsigned long for some flags if it won't need more than 64 bits in a couple of decades, and not bother with the bitmap API. Same with userspace -- a single u64 is fine for that API, just pass a pointer to it to send it as a bitmap to the kernel.
> >
> > Re 64b vs extremely rare -- I would have said so 5 years ago, but now more and more bitfields run out of 64 bits. Link modes, netdev features, ...
> >
> > Re bigint -- do you mean implementing u128 as a union, like
> >
> > 	typedef union __u128 {
> > 		struct {
> > 			u32 b127_96;
> > 			u32 b95_64;
> > 			u32 b63_32;
> > 			u32 b31_0;
> > 		};
> > 		struct {
> > 			u64 b127_64;
> > 			u64 b63_0;
> > 		};
> > 	#ifdef __HAVE_INT128
> > 		__int128 b127_0;
> > 	#endif
> > 	} u128;
> >
> > ?
>
> This doesn't look good (besides the union aliasing, have you thought about BE64?).

It was just an example to visualize the thought.

> > We have a similar feature in one of our internal trees and are planning to present a generic u128 soon, but this doesn't work well for flags, I think. The bitmap API and bitops are widely used and familiar to tons of folks, most platforms define their own machine-optimized bitops implementation, arrays of unsigned longs are native...
> >
> > Re awkward -- all the u64 <-> bitmap conversion is implemented in the core code in 4/4, and users won't need to do anything besides one get/set. And they can still use the bitmap/bitops API. Userspace, as I said, can use a single __u64 as long as the field fits into 64 bits.
> >
> > Summarizing, I feel like bigints would lead to much more boilerplate in both kernel and user space and would need a whole new API instead of using the already existing and well-used bitmap one. Continuing to use single objects of fixed size like %NLA_U* or %NLA_BITFIELD_U32 will lead to introducing a new Netlink attr every 32/64 bits (or even 16, like with IP tunnels, which was the initial reason why I started working on those 3 series). As Jake wrote me in a PM earlier,
> >
> > "I like the concept of an NLA_BITMAP. I could have used this for some of the devlink interfaces we've done, and it definitely feels a bit more natural than being forced to a single u32 bitfield."
> Netlink is an ABI, how would you naturally convert an unsigned long from a 64-bit kernel to an unsigned long on a 32-bit user space, esp. on BE architectures?

Uhm, that's what this patchset does. The cover letter says: we can't transfer longs between userspace and the kernel, so in Netlink messages they're represented as arrays of u64s, and then the Netlink core in the kernel takes care of converting them to longs when you call the getter (and vice versa for the setter)... On LE and 64-bit BE architectures this conversion is a no-op (see bitmap_{from,to}_arr64()), which is why u64s were chosen, not u32s like in Ethtool, for example.

> > Thanks,
> > Olek
>
> --
> With Best Regards,
> Andy Shevchenko

Thanks,
Olek
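A sketch of the conversion described above, assuming a wrapper roughly shaped like the one in the series (the real name and signature may differ); bitmap_from_arr64() is the existing helper being referred to:

	#include <linux/bitmap.h>
	#include <linux/errno.h>
	#include <net/netlink.h>

	/*
	 * The attribute payload is an array of u64 words covering nbits
	 * bits; bitmap_from_arr64() turns it into an unsigned long bitmap
	 * (effectively a plain copy on LE and on 64-bit BE).
	 */
	static int bitmap_attr_to_bitmap(unsigned long *bitmap, unsigned int nbits,
					 const struct nlattr *nla)
	{
		if (nla_len(nla) < BITS_TO_U64(nbits) * sizeof(u64))
			return -ERANGE;

		bitmap_from_arr64(bitmap, nla_data(nla), nbits);

		return 0;
	}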
From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 22 Jul 2022 11:19:51 -0700

> On Fri, 22 Jul 2022 16:55:14 +0200 Alexander Lobakin wrote:
> > > I think that a straight-up bitmap with a fixed word size is awkward and leads to too much boilerplate code. People will avoid using it. What about implementing a bigint type instead? Needing more than 64b is extremely rare, so in 99% of the cases the code outside of parsing can keep using its u8/u16/u32.
> >
> > In-kernel code can still use a single unsigned long for some flags if it won't need more than 64 bits in a couple of decades, and not bother with the bitmap API. Same with userspace -- a single u64 is fine for that API, just pass a pointer to it to send it as a bitmap to the kernel.
>
> Single unsigned long - exactly. What I'm saying is that you need to
> convince people who think they are "clever" by using u8 or u16 for
> flags because it "saves space". Happens a lot, I can tell you from
> reviewing such patches. And to avoid that we should give people a
  ^^^^^^^^^^^^^^^^^^^^^^

Oh true!

> "growable" type, but one starting from smaller sizes than a bitmap.
>
> It's about human psychology as observed, not logic, just to be extra clear.
>
> A logical argument of smaller importance is that ID types have a similar problem, although practically it's even more rare than for flags (I think it did happen once with inodes or some such, tho). So bigint has more uses.
>
> I'd forgo the endianness BTW, it's a distraction at the netlink level. There was more reason for it in fixed-size fields.

Hmm yeah, and I don't use explicit-endian arr64s in the IP tunnel changeset. Probably not worth it, I did it only for some... uhm, _flexibility_.

> > Re 64b vs extremely rare -- I would have said so 5 years ago, but now more and more bitfields run out of 64 bits. Link modes, netdev features, ...
>
> I like how you put the "..." there even tho these are the only two cases
> either of us can think of, right? :) There are hundreds of flags in the

Hahaha, you got me!

> kernel, two needed to grow into a bitmap. And the problem you're
> working on is with a 16 bit field, or a 32 bit field? Defo not 64b, right?

It's even worse, IP tunnel flags are __be16, so I have to redo all the use sites either way, whether I change them to u64 or bitmap or even __be64 -- no way to optimize it to a no-op >_<

Initially I converted them to u64, IIRC. Then I looked at the timeline of adding new flags and calculated that in the best case we'll run out of u64 in 20 years. But there's been a spike of ~5 new flags during the last n months, so...

Given that it took a lot of routine, not-so-exciting adjusting of all the use sites (I can't imagine how that guy who's converting netdev features to bitmaps right now managed to finish redoing 120+ drivers under drivers/net/), if/when `u64 flags` comes to its end, there will be a new patch series doing the same job again...

> > Re bigint -- do you mean implementing u128 as a union, like
> >
> > 	typedef union __u128 {
> > 		struct {
> > 			u32 b127_96;
> > 			u32 b95_64;
> > 			u32 b63_32;
> > 			u32 b31_0;
> > 		};
> > 		struct {
> > 			u64 b127_64;
> > 			u64 b63_0;
> > 		};
> > 	#ifdef __HAVE_INT128
> > 		__int128 b127_0;
> > 	#endif
> > 	} u128;
> >
> > ? We have a similar feature in one of our internal trees and are planning to present a generic u128 soon, but this doesn't work well for flags, I think.
>
> Actually, once a field crosses the biggest natural int size I was thinking that the user would go to a bitmap.
> So at the netlink level the field is bigint (LE, don't care about BE). Kernel side API is:
>
> 	nla_get_b8, nla_get_b16, nla_get_b32, nla_get_b64,
> 	nla_get_bitmap
> 	nla_put_b8, nla_put_b16 etc.
>
> 	u16 my_flags_are_so_toight;
>
> 	my_flags_are_so_toight = nla_get_b16(attr[BLAA_BLA_BLA_FLAGS]);
>
> The point is - the representation can be more compact than u64 and will therefore encourage anyone who doesn't have a strong reason to use fixed size fields to switch to the bigint.

Ahh, looks like I got it! So you mean that at the Netlink level we should exchange bigints/u64 arrays, but there should be an option to get/set only 16/32/64 bits from them to simplify (or keep simple) the users? Like, if we have `u16 uuid`, to not do

	unsigned long uuid_bitmap;

	nla_get_bitmap(attr[FLAGS], &uuid_bitmap, BITS_PER_TYPE(u16));
	uuid = (u16)uuid_bitmap;

but instead

	uuid = nla_get_b16(attr[FLAGS]);

?

> Honestly, if we were redoing netlink from scratch today I'd argue there should be no fixed width integers in host endian.
>
> > The bitmap API and bitops are widely used and familiar to tons of folks, most platforms define their own machine-optimized bitops implementation, arrays of unsigned longs are native...
> >
> > Re awkward -- all the u64 <-> bitmap conversion is implemented in the core code in 4/4, and users won't need to do anything besides one get/set. And they can still use the bitmap/bitops API. Userspace, as I said, can use a single __u64 as long as the field fits into 64 bits.
>
> Yes, but people will see a bitmap and think "I don't need a bitmap, I just need X bits".
>
> > Summarizing, I feel like bigints would lead to much more boilerplate in both kernel and user space and would need a whole new API instead of using the already existing and well-used bitmap one. Continuing to use single objects of fixed size like %NLA_U* or %NLA_BITFIELD_U32 will lead to introducing a new Netlink attr every 32/64 bits (or even 16, like with IP tunnels, which was the initial reason why I started working on those 3 series). As Jake wrote me in a PM earlier,
>
> Every netlink attribute carries a length. As long as you establish correct expectations for parsing there is no need to create new types.

Right, but %NLA_BITFIELD_U32 landed +/- recently, IIRC :)

> > "I like the concept of an NLA_BITMAP. I could have used this for some of the devlink interfaces we've done, and it definitely feels a bit more natural than being forced to a single u32 bitfield."
>
> Slightly ironic to say "if I could use a larger type I would" and then not use the largest one available? ;) But the point is

:D Agree

> about being flexible when it comes to size, I guess, more than bitmap in particular.

Probably, but you also said above that for anything bigger than longs you'd go for bitmaps, didn't you? So I guess the series goes in the right direction, it just needs a couple more inlines to be able to get/put u{32, 64; 8, 16 -- not sure about these two after reading your follow-up mail} as easily as nla_{get,put}<size>(), and probably dropping the endianness stuff? Hope I got it right ._.

Thanks,
Olek
On Mon, 25 Jul 2022 15:02:55 +0200 Alexander Lobakin wrote:
> > Actually, once a field crosses the biggest natural int size I was thinking that the user would go to a bitmap.
> >
> > So at the netlink level the field is bigint (LE, don't care about BE). Kernel side API is:
> >
> > 	nla_get_b8, nla_get_b16, nla_get_b32, nla_get_b64,
> > 	nla_get_bitmap
> > 	nla_put_b8, nla_put_b16 etc.
> >
> > 	u16 my_flags_are_so_toight;
> >
> > 	my_flags_are_so_toight = nla_get_b16(attr[BLAA_BLA_BLA_FLAGS]);
> >
> > The point is - the representation can be more compact than u64 and will therefore encourage anyone who doesn't have a strong reason to use fixed size fields to switch to the bigint.
>
> Ahh, looks like I got it! So you mean that at the Netlink level we should exchange bigints/u64 arrays, but there should be an option to get/set only 16/32/64 bits from them to simplify (or keep simple) the users?

Not exactly. I'd prefer if the netlink level was in u32 increments. u64 requires padding (so the nla_put..() calls will have more args). Netlink requires platform alignment and rounds up to 4B, so u32 is much more convenient than u64. Similarly - it doesn't make sense to represent sizes smaller than 4B because of the rounding up, so nla_put_b8() can be a define to nla_put_b32(). Ethtool's choice of u32 is not without merit.

> Like, if we have `u16 uuid`, to not do
>
> 	unsigned long uuid_bitmap;
>
> 	nla_get_bitmap(attr[FLAGS], &uuid_bitmap, BITS_PER_TYPE(u16));
> 	uuid = (u16)uuid_bitmap;
>
> but instead
>
> 	uuid = nla_get_b16(attr[FLAGS]);
>
> ?

Yes.

> > about being flexible when it comes to size, I guess, more than bitmap in particular.
>
> Probably, but you also said above that for anything bigger than longs you'd go for bitmaps, didn't you? So I guess the series goes in the right direction, it just needs a couple more inlines to be able to get/put u{32, 64; 8, 16 -- not sure about these two after reading your follow-up mail} as easily as nla_{get,put}<size>(), and probably dropping the endianness stuff? Hope I got it right ._.

Modulo the fact that I do still want to pack to u32. Especially a single u32 - perhaps once we cross 8B we can switch to requiring 8B increments.

The nla_len is 16-bit, which means that attrs nested inside other attrs are quite tight on space (see the sad story of VF attrs in RTM_GETLINK). If we don't represent u8/u16/u32 in a netlink-level efficient manner, we're back to people arguing for raw u32s rather than using the new type.
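A minimal sketch of the put side under this u32-increment scheme; none of these helpers exist in the tree, and the spelling is only an assumption. nla_put() is the existing kernel helper:

	#include <net/netlink.h>

	/*
	 * The payload here is a single little-endian u32 word (a longer
	 * bigint would simply be more words).  The sub-32-bit variants
	 * alias the 32-bit one, since netlink pads every payload to a
	 * 4-byte boundary anyway.
	 */
	static inline int nla_put_b32(struct sk_buff *skb, int attrtype, u32 value)
	{
		__le32 tmp = cpu_to_le32(value);

		return nla_put(skb, attrtype, sizeof(tmp), &tmp);
	}

	#define nla_put_b8	nla_put_b32
	#define nla_put_b16	nla_put_b32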
From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 25 Jul 2022 11:53:24 -0700

> On Mon, 25 Jul 2022 15:02:55 +0200 Alexander Lobakin wrote:
> > > Actually, once a field crosses the biggest natural int size I was thinking that the user would go to a bitmap.
> > >
> > > So at the netlink level the field is bigint (LE, don't care about BE). Kernel side API is:
> > >
> > > 	nla_get_b8, nla_get_b16, nla_get_b32, nla_get_b64,
> > > 	nla_get_bitmap
> > > 	nla_put_b8, nla_put_b16 etc.
> > >
> > > 	u16 my_flags_are_so_toight;
> > >
> > > 	my_flags_are_so_toight = nla_get_b16(attr[BLAA_BLA_BLA_FLAGS]);
> > >
> > > The point is - the representation can be more compact than u64 and will therefore encourage anyone who doesn't have a strong reason to use fixed size fields to switch to the bigint.
> >
> > Ahh, looks like I got it! So you mean that at the Netlink level we should exchange bigints/u64 arrays, but there should be an option to get/set only 16/32/64 bits from them to simplify (or keep simple) the users?
>
> Not exactly. I'd prefer if the netlink level was in u32 increments. u64 requires padding (so the nla_put..() calls will have more args). Netlink requires platform alignment and rounds up to 4B, so u32 is much more convenient than u64. Similarly - it doesn't make sense to represent sizes smaller than 4B because of the rounding up, so nla_put_b8() can be a define to nla_put_b32(). Ethtool's choice of u32 is not without merit.
>
> > Like, if we have `u16 uuid`, to not do
> >
> > 	unsigned long uuid_bitmap;
> >
> > 	nla_get_bitmap(attr[FLAGS], &uuid_bitmap, BITS_PER_TYPE(u16));
> > 	uuid = (u16)uuid_bitmap;
> >
> > but instead
> >
> > 	uuid = nla_get_b16(attr[FLAGS]);
> >
> > ?
>
> Yes.
>
> > > about being flexible when it comes to size, I guess, more than bitmap in particular.
> >
> > Probably, but you also said above that for anything bigger than longs you'd go for bitmaps, didn't you? So I guess the series goes in the right direction, it just needs a couple more inlines to be able to get/put u{32, 64; 8, 16 -- not sure about these two after reading your follow-up mail} as easily as nla_{get,put}<size>(), and probably dropping the endianness stuff? Hope I got it right ._.
>
> Modulo the fact that I do still want to pack to u32. Especially a single u32 - perhaps once we cross 8B we can switch to requiring 8B increments.
>
> The nla_len is 16-bit, which means that attrs nested inside other attrs are quite tight on space (see the sad story of VF attrs in RTM_GETLINK). If we don't represent u8/u16/u32 in a netlink-level efficient manner, we're back to people arguing for raw u32s rather than using the new type.

Ah, got it. I think you're right. The only thing that bothers me a bit is that we'll be converting arrays of u32s to unsigned longs each time, unlike u64s <-> longs, which are 1:1 on 64-bit and LE systems. But I guess it's still better than wasting Netlink attrs on padding?

So I'd like to summarize for a v2:

* represent as u32s, not u64s;
* present it as "bigints", not bitmaps. It can carry bitmaps as well, but also ->
* add getters/setters for u8, u16, u32, u64. Bitmaps are already here. Probably u32 arrays as well? Just copy them directly. And maybe u64 arrays, too (convert Netlink u32s to target u64s)?
* should I drop the endianness stuff? With it still in place, it would be easier to use this new API to exchange with e.g. __be64.
* hope I didn't miss anything?

Thanks a lot, some great ideas here from your side :)
Olek
On Tue, 26 Jul 2022 12:41:45 +0200 Alexander Lobakin wrote:
> > Modulo the fact that I do still want to pack to u32. Especially a single u32 - perhaps once we cross 8B we can switch to requiring 8B increments.
> >
> > The nla_len is 16-bit, which means that attrs nested inside other attrs are quite tight on space (see the sad story of VF attrs in RTM_GETLINK). If we don't represent u8/u16/u32 in a netlink-level efficient manner, we're back to people arguing for raw u32s rather than using the new type.
>
> Ah, got it. I think you're right. The only thing that bothers me a bit is that we'll be converting arrays of u32s to unsigned longs each time, unlike u64s <-> longs, which are 1:1 on 64-bit and LE systems.

For LE the size of the word doesn't matter, right? Don't trust me on endian questions, but I thought for LE you can just load the data as ulongs as long as the size is divisible by sizeof(ulong), and you're gonna be good. It'd add some #ifdefs and ifs() to the bitmap code, but the ethtool u32 conversions would probably also benefit?

> But I guess it's still better than wasting Netlink attrs on padding?

Yes, AFAIU it's pretty bad. Say you have a nest with objects with 3 big ints inside. With u32 the size of a nest is 4 + 3*(4+4) = 28. With u64 it's 4 + 4+8 + 2*(4+4+8) = 48, so 28 vs 48. The Netlink header being 4B almost always puts us out of alignment.

> So I'd like to summarize for a v2:
>
> * represent as u32s, not u64s;
> * present it as "bigints", not bitmaps. It can carry bitmaps as well, but also ->
> * add getters/setters for u8, u16, u32, u64. Bitmaps are already here. Probably u32 arrays as well? Just copy them directly. And maybe u64 arrays, too (convert Netlink u32s to target u64s)?

I don't think we need the array helpers.

> * should I drop the endianness stuff? With it still in place, it would be easier to use this new API to exchange with e.g. __be64.

If you have a use case, sure. AFAIU the explicit endian types are there for carrying protocol fields. I don't think there are many protocols that we deal with in netlink that have big int fields.

> * hope I didn't miss anything?

I think that's it, but I reserve the right to remember something after you post the code :)

> Thanks a lot, some great ideas here from your side :)
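To visualize the "for LE the word size doesn't matter" point: a sketch of a u32-words-to-bitmap conversion with the little-endian shortcut. The function name is made up; the in-tree helper for this job today is bitmap_from_arr32(), which uses a generic word-merging loop on 64-bit hosts regardless of endianness, which is exactly the part the #ifdefs above would short-circuit:

	#include <linux/bitmap.h>
	#include <linux/bitops.h>
	#include <linux/string.h>

	/*
	 * On LE (and on any 32-bit host) an array of u32 words and an array
	 * of unsigned longs covering the same bits have identical byte
	 * layout, so the "conversion" collapses to a memcpy; only 64-bit
	 * big-endian has to merge pairs of words.
	 */
	static void bigint_u32_to_bitmap(unsigned long *bitmap, const u32 *words,
					 unsigned int nbits)
	{
		unsigned int nwords = BITS_TO_U32(nbits);

		bitmap_zero(bitmap, nbits);

	#if defined(__LITTLE_ENDIAN) || BITS_PER_LONG == 32
		memcpy(bitmap, words, nwords * sizeof(*words));
	#else
		for (unsigned int i = 0; i < nwords; i++)
			bitmap[i / 2] |= (unsigned long)words[i] << (32 * (i % 2));
	#endif
	}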