diff mbox series

[RFC,03/13] nexthop: Add netlink defines and enumerators for resilient NH groups

Message ID 893e22e2ad6413a98ca76134b332c8962fcd3b6a.1612815058.git.petrm@nvidia.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series nexthop: Resilient next-hop groups | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 11 maintainers not CCed: yoshfuji@linux-ipv6.org eparis@parisplace.org selinux@vger.kernel.org amcohen@nvidia.com henrik.bjoernlund@microchip.com horatiu.vultur@microchip.com nikolay@nvidia.com acassen@gmail.com paul@paul-moore.com stephen.smalley.work@gmail.com vlad@buslov.dev
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 6123 this patch: 6123
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 95 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 6516 this patch: 6516
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Petr Machata Feb. 8, 2021, 8:42 p.m. UTC
From: Ido Schimmel <idosch@nvidia.com>

- RTM_NEWNEXTHOP et.al. that handle resilient groups will have a new nested
  attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.

- RTM_NEWNEXTHOPBUCKET et.al. is a suite of new messages that will
  currently serve only for dumping of individual buckets of resilient next
  hop groups. For nexthop group buckets, these messages will carry a nested
  attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.

  There are several reasons why a new suite of messages is created for
  nexthop buckets instead of overloading the information on the existing
  RTM_{NEW,DEL,GET}NEXTHOP messages.

  First, a nexthop group can contain a large number of nexthop buckets (4k
  is not unheard of). This imposes limits on the amount of information that
  can be encoded for each nexthop bucket given a netlink message is limited
  to 64k bytes.

  Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at
  this point, in the future it can be extended to provide user space with
  control over nexthop buckets configuration.

- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
  adjusted to bounce groups with that type for now.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 include/uapi/linux/nexthop.h   | 43 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/rtnetlink.h |  7 ++++++
 net/ipv4/nexthop.c             |  2 ++
 security/selinux/nlmsgtab.c    |  5 +++-
 4 files changed, 56 insertions(+), 1 deletion(-)

Comments

David Ahern Feb. 13, 2021, 7:16 p.m. UTC | #1
On 2/8/21 1:42 PM, Petr Machata wrote:
> @@ -52,8 +53,50 @@ enum {
>  	NHA_FDB,	/* flag; nexthop belongs to a bridge fdb */
>  	/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
>  
> +	/* nested; resilient nexthop group attributes */
> +	NHA_RES_GROUP,
> +	/* nested; nexthop bucket attributes */
> +	NHA_RES_BUCKET,
> +
>  	__NHA_MAX,
>  };
>  
>  #define NHA_MAX	(__NHA_MAX - 1)
> +
> +enum {
> +	NHA_RES_GROUP_UNSPEC,
> +	/* Pad attribute for 64-bit alignment. */
> +	NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC,
> +
> +	/* u32; number of nexthop buckets in a resilient nexthop group */
> +	NHA_RES_GROUP_BUCKETS,

u32 is overkill; arguably u16 (64k) should be more than enough buckets
for any real use case.
Ido Schimmel Feb. 13, 2021, 8:17 p.m. UTC | #2
On Sat, Feb 13, 2021 at 12:16:45PM -0700, David Ahern wrote:
> On 2/8/21 1:42 PM, Petr Machata wrote:
> > @@ -52,8 +53,50 @@ enum {
> >  	NHA_FDB,	/* flag; nexthop belongs to a bridge fdb */
> >  	/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
> >  
> > +	/* nested; resilient nexthop group attributes */
> > +	NHA_RES_GROUP,
> > +	/* nested; nexthop bucket attributes */
> > +	NHA_RES_BUCKET,
> > +
> >  	__NHA_MAX,
> >  };
> >  
> >  #define NHA_MAX	(__NHA_MAX - 1)
> > +
> > +enum {
> > +	NHA_RES_GROUP_UNSPEC,
> > +	/* Pad attribute for 64-bit alignment. */
> > +	NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC,
> > +
> > +	/* u32; number of nexthop buckets in a resilient nexthop group */
> > +	NHA_RES_GROUP_BUCKETS,
> 
> u32 is overkill; arguably u16 (64k) should be more than enough buckets
> for any real use case.

We wanted to make it future-proof, but I think we can live with 64k. At
least in Spectrum the maximum is 4k. I don't know about other devices,
but I guess it is not more than 64k.
Petr Machata Feb. 15, 2021, 11:43 a.m. UTC | #3
Ido Schimmel <idosch@idosch.org> writes:

> On Sat, Feb 13, 2021 at 12:16:45PM -0700, David Ahern wrote:
>> On 2/8/21 1:42 PM, Petr Machata wrote:
>> > @@ -52,8 +53,50 @@ enum {
>> >  	NHA_FDB,	/* flag; nexthop belongs to a bridge fdb */
>> >  	/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
>> >  
>> > +	/* nested; resilient nexthop group attributes */
>> > +	NHA_RES_GROUP,
>> > +	/* nested; nexthop bucket attributes */
>> > +	NHA_RES_BUCKET,
>> > +
>> >  	__NHA_MAX,
>> >  };
>> >  
>> >  #define NHA_MAX	(__NHA_MAX - 1)
>> > +
>> > +enum {
>> > +	NHA_RES_GROUP_UNSPEC,
>> > +	/* Pad attribute for 64-bit alignment. */
>> > +	NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC,
>> > +
>> > +	/* u32; number of nexthop buckets in a resilient nexthop group */
>> > +	NHA_RES_GROUP_BUCKETS,
>> 
>> u32 is overkill; arguably u16 (64k) should be more than enough buckets
>> for any real use case.
>
> We wanted to make it future-proof, but I think we can live with 64k. At
> least in Spectrum the maximum is 4k. I don't know about other devices,
> but I guess it is not more than 64k.

OK, no problem. I was thinking of keeping the API as u32, and tracking
as u16 internally, but let's not add baggage at this stage already. Push
comes to shove there can be another u32 attribute mutexed with this one.
diff mbox series

Patch

diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index 2d4a1e784cf0..624460bc2d93 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -22,6 +22,7 @@  struct nexthop_grp {
 
 enum {
 	NEXTHOP_GRP_TYPE_MPATH,  /* default type if not specified */
+	NEXTHOP_GRP_TYPE_RES,    /* resilient nexthop group */
 	__NEXTHOP_GRP_TYPE_MAX,
 };
 
@@ -52,8 +53,50 @@  enum {
 	NHA_FDB,	/* flag; nexthop belongs to a bridge fdb */
 	/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
 
+	/* nested; resilient nexthop group attributes */
+	NHA_RES_GROUP,
+	/* nested; nexthop bucket attributes */
+	NHA_RES_BUCKET,
+
 	__NHA_MAX,
 };
 
 #define NHA_MAX	(__NHA_MAX - 1)
+
+enum {
+	NHA_RES_GROUP_UNSPEC,
+	/* Pad attribute for 64-bit alignment. */
+	NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC,
+
+	/* u32; number of nexthop buckets in a resilient nexthop group */
+	NHA_RES_GROUP_BUCKETS,
+	/* clock_t as u32; nexthop bucket idle timer (per-group) */
+	NHA_RES_GROUP_IDLE_TIMER,
+	/* clock_t as u32; nexthop unbalanced timer */
+	NHA_RES_GROUP_UNBALANCED_TIMER,
+	/* clock_t as u64; nexthop unbalanced time */
+	NHA_RES_GROUP_UNBALANCED_TIME,
+
+	__NHA_RES_GROUP_MAX,
+};
+
+#define NHA_RES_GROUP_MAX	(__NHA_RES_GROUP_MAX - 1)
+
+enum {
+	NHA_RES_BUCKET_UNSPEC,
+	/* Pad attribute for 64-bit alignment. */
+	NHA_RES_BUCKET_PAD = NHA_RES_BUCKET_UNSPEC,
+
+	/* u32; nexthop bucket index */
+	NHA_RES_BUCKET_INDEX,
+	/* clock_t as u64; nexthop bucket idle time */
+	NHA_RES_BUCKET_IDLE_TIME,
+	/* u32; nexthop id assigned to the nexthop bucket */
+	NHA_RES_BUCKET_NH_ID,
+
+	__NHA_RES_BUCKET_MAX,
+};
+
+#define NHA_RES_BUCKET_MAX	(__NHA_RES_BUCKET_MAX - 1)
+
 #endif
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 91e4ca064d61..d35953bc7d53 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -178,6 +178,13 @@  enum {
 	RTM_GETVLAN,
 #define RTM_GETVLAN	RTM_GETVLAN
 
+	RTM_NEWNEXTHOPBUCKET = 116,
+#define RTM_NEWNEXTHOPBUCKET	RTM_NEWNEXTHOPBUCKET
+	RTM_DELNEXTHOPBUCKET,
+#define RTM_DELNEXTHOPBUCKET	RTM_DELNEXTHOPBUCKET
+	RTM_GETNEXTHOPBUCKET,
+#define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 7b687bca0b87..5d560d381070 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -1486,6 +1486,8 @@  static struct nexthop *nexthop_create_group(struct net *net,
 
 	if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_MPATH)
 		nhg->mpath = 1;
+	else if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_RES)
+		goto out_no_nh;
 
 	WARN_ON_ONCE(nhg->mpath != 1);
 
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index b69231918686..d59276f48d4f 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -88,6 +88,9 @@  static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWVLAN,		NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELVLAN,		NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETVLAN,		NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -171,7 +174,7 @@  int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWVLAN + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;