mbox series

[v2,00/13] ip_tunnel: convert __be16 tunnel flags to bitmaps

Message ID 20231016165247.14212-1-aleksander.lobakin@intel.com (mailing list archive)
Headers show
Series ip_tunnel: convert __be16 tunnel flags to bitmaps | expand

Message

Alexander Lobakin Oct. 16, 2023, 4:52 p.m. UTC
Based on top of "Implement MTE tag compression for swapped pages"[0]
from Alexander Potapenko as it uses its bitmap_{read,write}() functions
to not introduce another pair of similar ones.

Derived from the PFCP support series[1] as this grew bigger (2 -> 13
commits) and involved more core bitmap changes. Only commits 10 and 11
are from the mentioned tree, the rest is new. PFCP itself still depends
on this series.

IP tunnels have their flags defined as `__be16`, including UAPI, and
after GTP was accepted, there are no more free bits left. UAPI (incl.
direct usage of one of the user structs) and explicit Endianness only
complicate things.
Since it would either way end up with hundreds of locs due to all that,
pick bitmaps right from the start to store the flags in the most native
and scalable format with rich API. I don't think it's worth trying to
praise luck and pick smth like u32 only to redo everything in x years :)
More details regarding the IP tunnel flags is in 11 and 13.

The rest is just a good bunch of prereqs and tests: a couple of new
helpers and extensions to the old ones, a few optimizations to partially
mitigate IP tunnel object code growth due to __be16 -> long, and
decouping one UAPI struct used throughout the whole kernel into the
userspace and the kernel space counterparts to eliminate the dependency.

[0] https://lore.kernel.org/lkml/20231011172836.2579017-1-glider@google.com
[1] https://lore.kernel.org/netdev/20230721071532.613888-1-marcin.szycik@linux.intel.com

Alexander Lobakin (13):
  bitops: add missing prototype check
  bitops: make BYTES_TO_BITS() treewide-available
  bitops: let the compiler optimize {__,}assign_bit()
  linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros
  s390/cio: rename bitmap_size() -> idset_bitmap_size()
  fs/ntfs3: add prefix to bitmap_size() and use BITS_TO_U64()
  btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
  bitmap: introduce generic optimized bitmap_size()
  bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()
  ip_tunnel: use a separate struct to store tunnel params in the kernel
  ip_tunnel: convert __be16 tunnel flags to bitmaps
  lib/bitmap: add compile-time test for __assign_bit() optimization
  lib/bitmap: add tests for IP tunnel flags conversion helpers

 drivers/md/dm-clone-metadata.c                |   5 -
 drivers/net/bareudp.c                         |  19 ++-
 .../ethernet/mellanox/mlx5/core/en/tc_tun.h   |   2 +-
 .../mellanox/mlx5/core/en/tc_tun_encap.c      |   6 +-
 .../mellanox/mlx5/core/en/tc_tun_geneve.c     |  12 +-
 .../mellanox/mlx5/core/en/tc_tun_gre.c        |   8 +-
 .../mellanox/mlx5/core/en/tc_tun_vxlan.c      |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  16 +-
 .../ethernet/mellanox/mlxsw/spectrum_ipip.c   |  56 ++++---
 .../ethernet/mellanox/mlxsw/spectrum_ipip.h   |   2 +-
 .../ethernet/mellanox/mlxsw/spectrum_span.c   |  10 +-
 .../ethernet/netronome/nfp/flower/action.c    |  27 +++-
 drivers/net/geneve.c                          |  44 +++---
 drivers/net/vxlan/vxlan_core.c                |  14 +-
 drivers/s390/cio/idset.c                      |  12 +-
 fs/btrfs/free-space-cache.c                   |   8 +-
 fs/ntfs3/bitmap.c                             |   4 +-
 fs/ntfs3/fsntfs.c                             |   2 +-
 fs/ntfs3/index.c                              |  11 +-
 fs/ntfs3/ntfs_fs.h                            |   4 +-
 fs/ntfs3/super.c                              |   2 +-
 include/linux/bitmap.h                        |  46 ++----
 include/linux/bitops.h                        |  23 +--
 include/linux/cpumask.h                       |   2 +-
 include/linux/linkmode.h                      |  27 +---
 include/linux/netdevice.h                     |   7 +-
 include/net/dst_metadata.h                    |  10 +-
 include/net/flow_dissector.h                  |   2 +-
 include/net/gre.h                             |  70 +++++----
 include/net/ip6_tunnel.h                      |   4 +-
 include/net/ip_tunnels.h                      | 136 ++++++++++++++---
 include/net/udp_tunnel.h                      |   4 +-
 include/uapi/linux/if_tunnel.h                |  33 ++++
 kernel/trace/trace_probe.c                    |   2 -
 lib/math/prime_numbers.c                      |   2 -
 lib/test_bitmap.c                             | 123 ++++++++++++++-
 net/bridge/br_vlan_tunnel.c                   |   9 +-
 net/core/filter.c                             |  26 ++--
 net/core/flow_dissector.c                     |  20 ++-
 net/ipv4/fou_bpf.c                            |   2 +-
 net/ipv4/gre_demux.c                          |   2 +-
 net/ipv4/ip_gre.c                             | 144 +++++++++++-------
 net/ipv4/ip_tunnel.c                          | 109 ++++++++-----
 net/ipv4/ip_tunnel_core.c                     |  82 ++++++----
 net/ipv4/ip_vti.c                             |  41 +++--
 net/ipv4/ipip.c                               |  33 ++--
 net/ipv4/ipmr.c                               |   2 +-
 net/ipv4/udp_tunnel_core.c                    |   5 +-
 net/ipv6/addrconf.c                           |   3 +-
 net/ipv6/ip6_gre.c                            |  85 ++++++-----
 net/ipv6/ip6_tunnel.c                         |  14 +-
 net/ipv6/sit.c                                |  38 ++---
 net/netfilter/ipvs/ip_vs_core.c               |   6 +-
 net/netfilter/ipvs/ip_vs_xmit.c               |  20 +--
 net/netfilter/nft_tunnel.c                    |  44 +++---
 net/openvswitch/flow_netlink.c                |  61 +++++---
 net/psample/psample.c                         |  26 ++--
 net/sched/act_tunnel_key.c                    |  36 ++---
 net/sched/cls_flower.c                        |  27 ++--
 tools/include/linux/bitmap.h                  |   8 +-
 tools/include/linux/bitops.h                  |   2 +
 tools/perf/util/probe-finder.c                |   2 -
 62 files changed, 1011 insertions(+), 600 deletions(-)

---
Not sure whether it's fine to have that all in one series, but OTOH
there's not much stuff I could split (like, 3 commits), it either
depends directly (new helpers etc.) or will just generate suboptimal
code w/o some of the commits.

I'm also thinking of which tree this would ideally be taken through.
The main subject is networking, but most of the commits are generic.
My idea is to push this via Yury / bitmaps and then ask the netdev
maintainers to pull his tree before they take PFCP (dependent on this
one).

From v1[2]:
 * 03: convert assign_bit() to a macro as well, saves some bytes and
   looks more consistent (Yury);
 * 03: enclose each argument into own pair of braces (Yury);
 * 06: use generic BITS_TO_U64() while at it (Yury);
 * 07: pick Acked-by (David);
 * 08: Acked-by, use bitmap_size() in the code from 05 as well (Yury);
 * 09: instead of introducing a new pair of functions, use generic
   bitmap_{read,write}() from [0]. bloat-o-meter shows no regressions
   from the switch (Yury, also Andy).

Old pfcp -> bitmap changelog:

As for former commits (now 10 and 11), almost all of the changes were
suggested by Andy, notably: stop violating bitmap API, use
__assign_bit() where appropriate, and add more tests to make sure
everything works as expected. Apart from that, add simple wrappers for
bitmap_*() used in the IP tunnel code to avoid manually specifying
``__IP_TUNNEL_FLAG_NUM`` each time.

[2] https://lore.kernel.org/lkml/20231009151026.66145-1-aleksander.lobakin@intel.com

Comments

Yury Norov Oct. 16, 2023, 5:54 p.m. UTC | #1
On Mon, Oct 16, 2023 at 06:52:34PM +0200, Alexander Lobakin wrote:
> Based on top of "Implement MTE tag compression for swapped pages"[0]
> from Alexander Potapenko as it uses its bitmap_{read,write}() functions
> to not introduce another pair of similar ones.
> 
> Derived from the PFCP support series[1] as this grew bigger (2 -> 13
> commits) and involved more core bitmap changes. Only commits 10 and 11
> are from the mentioned tree, the rest is new. PFCP itself still depends
> on this series.
> 
> IP tunnels have their flags defined as `__be16`, including UAPI, and
> after GTP was accepted, there are no more free bits left. UAPI (incl.
> direct usage of one of the user structs) and explicit Endianness only
> complicate things.
> Since it would either way end up with hundreds of locs due to all that,
> pick bitmaps right from the start to store the flags in the most native
> and scalable format with rich API. I don't think it's worth trying to
> praise luck and pick smth like u32 only to redo everything in x years :)
> More details regarding the IP tunnel flags is in 11 and 13.
> 
> The rest is just a good bunch of prereqs and tests: a couple of new
> helpers and extensions to the old ones, a few optimizations to partially
> mitigate IP tunnel object code growth due to __be16 -> long, and
> decouping one UAPI struct used throughout the whole kernel into the
> userspace and the kernel space counterparts to eliminate the dependency.
> 
> [0] https://lore.kernel.org/lkml/20231011172836.2579017-1-glider@google.com
> [1] https://lore.kernel.org/netdev/20230721071532.613888-1-marcin.szycik@linux.intel.com
> 
> Alexander Lobakin (13):
>   bitops: add missing prototype check
>   bitops: make BYTES_TO_BITS() treewide-available
>   bitops: let the compiler optimize {__,}assign_bit()
>   linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros
>   s390/cio: rename bitmap_size() -> idset_bitmap_size()
>   fs/ntfs3: add prefix to bitmap_size() and use BITS_TO_U64()
>   btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
>   bitmap: introduce generic optimized bitmap_size()
>   bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()
>   ip_tunnel: use a separate struct to store tunnel params in the kernel
>   ip_tunnel: convert __be16 tunnel flags to bitmaps
>   lib/bitmap: add compile-time test for __assign_bit() optimization
>   lib/bitmap: add tests for IP tunnel flags conversion helpers
> 
>  drivers/md/dm-clone-metadata.c                |   5 -
>  drivers/net/bareudp.c                         |  19 ++-
>  .../ethernet/mellanox/mlx5/core/en/tc_tun.h   |   2 +-
>  .../mellanox/mlx5/core/en/tc_tun_encap.c      |   6 +-
>  .../mellanox/mlx5/core/en/tc_tun_geneve.c     |  12 +-
>  .../mellanox/mlx5/core/en/tc_tun_gre.c        |   8 +-
>  .../mellanox/mlx5/core/en/tc_tun_vxlan.c      |   9 +-
>  .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  16 +-
>  .../ethernet/mellanox/mlxsw/spectrum_ipip.c   |  56 ++++---
>  .../ethernet/mellanox/mlxsw/spectrum_ipip.h   |   2 +-
>  .../ethernet/mellanox/mlxsw/spectrum_span.c   |  10 +-
>  .../ethernet/netronome/nfp/flower/action.c    |  27 +++-
>  drivers/net/geneve.c                          |  44 +++---
>  drivers/net/vxlan/vxlan_core.c                |  14 +-
>  drivers/s390/cio/idset.c                      |  12 +-
>  fs/btrfs/free-space-cache.c                   |   8 +-
>  fs/ntfs3/bitmap.c                             |   4 +-
>  fs/ntfs3/fsntfs.c                             |   2 +-
>  fs/ntfs3/index.c                              |  11 +-
>  fs/ntfs3/ntfs_fs.h                            |   4 +-
>  fs/ntfs3/super.c                              |   2 +-
>  include/linux/bitmap.h                        |  46 ++----
>  include/linux/bitops.h                        |  23 +--
>  include/linux/cpumask.h                       |   2 +-
>  include/linux/linkmode.h                      |  27 +---
>  include/linux/netdevice.h                     |   7 +-
>  include/net/dst_metadata.h                    |  10 +-
>  include/net/flow_dissector.h                  |   2 +-
>  include/net/gre.h                             |  70 +++++----
>  include/net/ip6_tunnel.h                      |   4 +-
>  include/net/ip_tunnels.h                      | 136 ++++++++++++++---
>  include/net/udp_tunnel.h                      |   4 +-
>  include/uapi/linux/if_tunnel.h                |  33 ++++
>  kernel/trace/trace_probe.c                    |   2 -
>  lib/math/prime_numbers.c                      |   2 -
>  lib/test_bitmap.c                             | 123 ++++++++++++++-
>  net/bridge/br_vlan_tunnel.c                   |   9 +-
>  net/core/filter.c                             |  26 ++--
>  net/core/flow_dissector.c                     |  20 ++-
>  net/ipv4/fou_bpf.c                            |   2 +-
>  net/ipv4/gre_demux.c                          |   2 +-
>  net/ipv4/ip_gre.c                             | 144 +++++++++++-------
>  net/ipv4/ip_tunnel.c                          | 109 ++++++++-----
>  net/ipv4/ip_tunnel_core.c                     |  82 ++++++----
>  net/ipv4/ip_vti.c                             |  41 +++--
>  net/ipv4/ipip.c                               |  33 ++--
>  net/ipv4/ipmr.c                               |   2 +-
>  net/ipv4/udp_tunnel_core.c                    |   5 +-
>  net/ipv6/addrconf.c                           |   3 +-
>  net/ipv6/ip6_gre.c                            |  85 ++++++-----
>  net/ipv6/ip6_tunnel.c                         |  14 +-
>  net/ipv6/sit.c                                |  38 ++---
>  net/netfilter/ipvs/ip_vs_core.c               |   6 +-
>  net/netfilter/ipvs/ip_vs_xmit.c               |  20 +--
>  net/netfilter/nft_tunnel.c                    |  44 +++---
>  net/openvswitch/flow_netlink.c                |  61 +++++---
>  net/psample/psample.c                         |  26 ++--
>  net/sched/act_tunnel_key.c                    |  36 ++---
>  net/sched/cls_flower.c                        |  27 ++--
>  tools/include/linux/bitmap.h                  |   8 +-
>  tools/include/linux/bitops.h                  |   2 +
>  tools/perf/util/probe-finder.c                |   2 -
>  62 files changed, 1011 insertions(+), 600 deletions(-)
> 
> ---
> Not sure whether it's fine to have that all in one series, but OTOH
> there's not much stuff I could split (like, 3 commits), it either
> depends directly (new helpers etc.) or will just generate suboptimal
> code w/o some of the commits.
> 
> I'm also thinking of which tree this would ideally be taken through.
> The main subject is networking, but most of the commits are generic.
> My idea is to push this via Yury / bitmaps and then ask the netdev
> maintainers to pull his tree before they take PFCP (dependent on this
> one).

Let's wait for more comments, but I'm generally OK with the generic
part, and have nothing against moving it, or the whole series, through
bitmap-for-next.

Thanks,
Yury
 
> >From v1[2]:
>  * 03: convert assign_bit() to a macro as well, saves some bytes and
>    looks more consistent (Yury);
>  * 03: enclose each argument into own pair of braces (Yury);
>  * 06: use generic BITS_TO_U64() while at it (Yury);
>  * 07: pick Acked-by (David);
>  * 08: Acked-by, use bitmap_size() in the code from 05 as well (Yury);
>  * 09: instead of introducing a new pair of functions, use generic
>    bitmap_{read,write}() from [0]. bloat-o-meter shows no regressions
>    from the switch (Yury, also Andy).
> 
> Old pfcp -> bitmap changelog:
> 
> As for former commits (now 10 and 11), almost all of the changes were
> suggested by Andy, notably: stop violating bitmap API, use
> __assign_bit() where appropriate, and add more tests to make sure
> everything works as expected. Apart from that, add simple wrappers for
> bitmap_*() used in the IP tunnel code to avoid manually specifying
> ``__IP_TUNNEL_FLAG_NUM`` each time.
> 
> [2] https://lore.kernel.org/lkml/20231009151026.66145-1-aleksander.lobakin@intel.com
> -- 
> 2.41.0
Yury Norov Oct. 20, 2023, 12:46 p.m. UTC | #2
+ Stephen

On Mon, Oct 16, 2023 at 10:55:02AM -0700, Yury Norov wrote:
> On Mon, Oct 16, 2023 at 06:52:34PM +0200, Alexander Lobakin wrote:
> > Based on top of "Implement MTE tag compression for swapped pages"[0]
> > from Alexander Potapenko as it uses its bitmap_{read,write}() functions
> > to not introduce another pair of similar ones.
> > 
> > Derived from the PFCP support series[1] as this grew bigger (2 -> 13
> > commits) and involved more core bitmap changes. Only commits 10 and 11
> > are from the mentioned tree, the rest is new. PFCP itself still depends
> > on this series.
> > 
> > IP tunnels have their flags defined as `__be16`, including UAPI, and
> > after GTP was accepted, there are no more free bits left. UAPI (incl.
> > direct usage of one of the user structs) and explicit Endianness only
> > complicate things.
> > Since it would either way end up with hundreds of locs due to all that,
> > pick bitmaps right from the start to store the flags in the most native
> > and scalable format with rich API. I don't think it's worth trying to
> > praise luck and pick smth like u32 only to redo everything in x years :)
> > More details regarding the IP tunnel flags is in 11 and 13.
> > 
> > The rest is just a good bunch of prereqs and tests: a couple of new
> > helpers and extensions to the old ones, a few optimizations to partially
> > mitigate IP tunnel object code growth due to __be16 -> long, and
> > decouping one UAPI struct used throughout the whole kernel into the
> > userspace and the kernel space counterparts to eliminate the dependency.
> > 
> > [0] https://lore.kernel.org/lkml/20231011172836.2579017-1-glider@google.com
> > [1] https://lore.kernel.org/netdev/20230721071532.613888-1-marcin.szycik@linux.intel.com

[...]

> > ---
> > Not sure whether it's fine to have that all in one series, but OTOH
> > there's not much stuff I could split (like, 3 commits), it either
> > depends directly (new helpers etc.) or will just generate suboptimal
> > code w/o some of the commits.
> > 
> > I'm also thinking of which tree this would ideally be taken through.
> > The main subject is networking, but most of the commits are generic.
> > My idea is to push this via Yury / bitmaps and then ask the netdev
> > maintainers to pull his tree before they take PFCP (dependent on this
> > one).
> 
> Let's wait for more comments, but I'm generally OK with the generic
> part, and have nothing against moving it, or the whole series, through
> bitmap-for-next.

I put this into bitmap-for-next, and it caused build failures for
Stephen:

https://lore.kernel.org/lkml/20220722191657.1d7282c2@canb.auug.org.au/

I can reproduce them too. So, removing from -next. Alexander, can you
run another round with all found issues fixed?

Thanks,
Yury