[net-next] rtnetlink: add RTNH_REJECT_MASK

Message ID	20211126134311.920808-2-alexander.mikhalitsyn@virtuozzo.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	Return-Path: <netdev-owner@kernel.org> From: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> To: netdev@vger.kernel.org Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>, David Miller <davem@davemloft.net>, David Ahern <dsahern@gmail.com>, Stephen Hemminger <stephen@networkplumber.org>, Ido Schimmel <idosch@nvidia.com>, Jakub Kicinski <kuba@kernel.org>, Roopa Prabhu <roopa@nvidia.com>, Andrei Vagin <avagin@gmail.com>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com> Subject: [PATCH net-next] rtnetlink: add RTNH_REJECT_MASK Date: Fri, 26 Nov 2021 16:43:11 +0300 Message-Id: <20211126134311.920808-2-alexander.mikhalitsyn@virtuozzo.com> In-Reply-To: <20211126134311.920808-1-alexander.mikhalitsyn@virtuozzo.com> References: <20211111160240.739294-1-alexander.mikhalitsyn@virtuozzo.com> <20211126134311.920808-1-alexander.mikhalitsyn@virtuozzo.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[net-next] rtnetlink: add RTNH_REJECT_MASK

Context	Check	Description
netdev/tree_selection	success	Clearly marked for net-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 4095 this patch: 4095
netdev/cc_maintainers	warning	5 maintainers not CCed: amcohen@nvidia.com yoshfuji@linux-ipv6.org dsahern@kernel.org me@cooperlees.com petrm@nvidia.com
netdev/build_clang	success	Errors and warnings before: 811 this patch: 811
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 4228 this patch: 4228
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 25 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Alexander Mikhalitsyn Nov. 26, 2021, 1:43 p.m. UTC

Introduce the RTNH_REJECT_MASK mask, which contains
all rtnh_flags that userspace cannot set
directly.

This mask will be used in the iproute utility
to exclude rtnh_flags that can't be restored
from an "ip route save" image.

This patch doesn't change kernel behavior at all.

Please take a look at
[PATCH iproute2] ip route: save: exclude rtnh_flags which can't be set

Cc: David Miller <davem@davemloft.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Ido Schimmel <idosch@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 include/uapi/linux/rtnetlink.h | 3 +++
 net/ipv4/fib_semantics.c       | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

Ido Schimmel Nov. 28, 2021, 2:01 p.m. UTC | #1

On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote:
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 5888492a5257..9c065e2fdef9 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -417,6 +417,9 @@ struct rtnexthop {
>  #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
>  				 RTNH_F_OFFLOAD | RTNH_F_TRAP)
>  
> +/* these flags can't be set by the userspace */
> +#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
> +
>  /* Macros to handle hexthops */
>  
>  #define RTNH_ALIGNTO	4
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 4c0c33e4710d..805f5e05b56d 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
>  			return -EINVAL;
>  		}
>  
> -		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> +		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
>  			NL_SET_ERR_MSG(extack,
>  				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
>  			return -EINVAL;
> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
>  		goto err_inval;
>  	}
>  
> -	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> +	if (cfg->fc_flags & RTNH_REJECT_MASK) {
>  		NL_SET_ERR_MSG(extack,
>  			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");

Instead of a deny list as in the legacy nexthop code, the new nexthop
code has an allow list (from rtm_to_nh_config()):

```
	if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) {
		NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header");
		goto out;
	}
```

Where:

```
#define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
```

So while the legacy nexthop code allows setting flags such as
RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
case for setting these flags from user space so I don't care if we allow
or deny them, but I believe the legacy and new nexthop code should be
consistent.

WDYT? Should we allow these flags in the new nexthop code as well or
keep denying them?

>  		goto err_inval;
> -- 
> 2.31.1
>
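
As a rough illustration of the deny-list versus allow-list distinction raised in this reply, here is a minimal userspace sketch. The SKETCH_* masks and the two helper functions are hypothetical names chosen for this example; they are not kernel code, and only the RTNH_F_* constants come from the uapi header.

```c
#include <stdbool.h>
#include <linux/rtnetlink.h>

/* Hypothetical masks, for illustration only (not kernel definitions). */
#define SKETCH_DENY_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
#define SKETCH_ALLOW_MASK	(RTNH_F_ONLINK)

/* Deny list: reject only the listed flags, accept anything else. */
static bool deny_list_accepts(unsigned int flags)
{
	return !(flags & SKETCH_DENY_MASK);
}

/* Allow list: reject any flag that is not explicitly listed. */
static bool allow_list_accepts(unsigned int flags)
{
	return !(flags & ~SKETCH_ALLOW_MASK);
}
```

With these definitions, a flag such as RTNH_F_OFFLOAD passes the deny-list check but fails the allow-list check, which is the inconsistency between the legacy and new nexthop code described above.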

David Ahern Nov. 29, 2021, 12:19 a.m. UTC | #2

On 11/28/21 7:01 AM, Ido Schimmel wrote:
> On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote:
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index 5888492a5257..9c065e2fdef9 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -417,6 +417,9 @@ struct rtnexthop {
>>  #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
>>  				 RTNH_F_OFFLOAD | RTNH_F_TRAP)
>>  
>> +/* these flags can't be set by the userspace */
>> +#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
>> +
>>  /* Macros to handle hexthops */
>>  
>>  #define RTNH_ALIGNTO	4
>> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
>> index 4c0c33e4710d..805f5e05b56d 100644
>> --- a/net/ipv4/fib_semantics.c
>> +++ b/net/ipv4/fib_semantics.c
>> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
>>  			return -EINVAL;
>>  		}
>>  
>> -		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
>> +		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
>>  			NL_SET_ERR_MSG(extack,
>>  				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
>>  			return -EINVAL;
>> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
>>  		goto err_inval;
>>  	}
>>  
>> -	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
>> +	if (cfg->fc_flags & RTNH_REJECT_MASK) {
>>  		NL_SET_ERR_MSG(extack,
>>  			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");
> 
> Instead of a deny list as in the legacy nexthop code, the new nexthop
> code has an allow list (from rtm_to_nh_config()):
> 
> ```
> 	if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) {
> 		NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header");
> 		goto out;
> 	}
> ```
> 
> Where:
> 
> ```
> #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
> ```
> 
> So while the legacy nexthop code allows setting flags such as
> RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
> case for setting these flags from user space so I don't care if we allow
> or deny them, but I believe the legacy and new nexthop code should be
> consistent.
> 
> WDYT? Should we allow these flags in the new nexthop code as well or
> keep denying them?
> 
>>  		goto err_inval;

I like the positive naming - RTNH_VALID_USER_FLAGS.

nexthop API should allow the OFFLOAD flag to be consistent; separate
change though.

Ido Schimmel Nov. 30, 2021, 7:59 a.m. UTC | #3

On Sun, Nov 28, 2021 at 05:19:38PM -0700, David Ahern wrote:
> On 11/28/21 7:01 AM, Ido Schimmel wrote:
> > On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote:
> >> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> >> index 5888492a5257..9c065e2fdef9 100644
> >> --- a/include/uapi/linux/rtnetlink.h
> >> +++ b/include/uapi/linux/rtnetlink.h
> >> @@ -417,6 +417,9 @@ struct rtnexthop {
> >>  #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
> >>  				 RTNH_F_OFFLOAD | RTNH_F_TRAP)
> >>  
> >> +/* these flags can't be set by the userspace */
> >> +#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
> >> +
> >>  /* Macros to handle hexthops */
> >>  
> >>  #define RTNH_ALIGNTO	4
> >> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> >> index 4c0c33e4710d..805f5e05b56d 100644
> >> --- a/net/ipv4/fib_semantics.c
> >> +++ b/net/ipv4/fib_semantics.c
> >> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
> >>  			return -EINVAL;
> >>  		}
> >>  
> >> -		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> >> +		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
> >>  			NL_SET_ERR_MSG(extack,
> >>  				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
> >>  			return -EINVAL;
> >> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
> >>  		goto err_inval;
> >>  	}
> >>  
> >> -	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> >> +	if (cfg->fc_flags & RTNH_REJECT_MASK) {
> >>  		NL_SET_ERR_MSG(extack,
> >>  			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");
> > 
> > Instead of a deny list as in the legacy nexthop code, the new nexthop
> > code has an allow list (from rtm_to_nh_config()):
> > 
> > ```
> > 	if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) {
> > 		NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header");
> > 		goto out;
> > 	}
> > ```
> > 
> > Where:
> > 
> > ```
> > #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
> > ```
> > 
> > So while the legacy nexthop code allows setting flags such as
> > RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
> > case for setting these flags from user space so I don't care if we allow
> > or deny them, but I believe the legacy and new nexthop code should be
> > consistent.
> > 
> > WDYT? Should we allow these flags in the new nexthop code as well or
> > keep denying them?
> > 
> >>  		goto err_inval;
> 
> I like the positive naming - RTNH_VALID_USER_FLAGS.

I don't think we can move the legacy code to the same allow list as the
new nexthop code without potentially breaking user space. The legacy
code allows for many more flags to be set in the ancillary header than
the new nexthop code.

Looking at the patch again, what is the motivation to expose
RTNH_REJECT_MASK to user space? iproute2 already knows that it only
makes sense to set RTNH_F_ONLINK. Can't we just do:

diff --git a/ip/iproute.c b/ip/iproute.c
index 1447a5f78f49..0e6dad2b67e5 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
        if (!filter_nlmsg(n, tb, host_len))
                return 0;
 
+       r->rtm_flags &= ~RTNH_F_ONLINK;
+
        ret = write(STDOUT_FILENO, n, n->nlmsg_len);
        if ((ret > 0) && (ret != n->nlmsg_len)) {
                fprintf(stderr, "Short write while saving nlmsg\n");

> 
> nexthop API should allow the OFFLOAD flag to be consistent; separate
> change though.
>

Alexander Mikhalitsyn Nov. 30, 2021, 8:18 a.m. UTC | #4

On Sun, 28 Nov 2021 16:01:27 +0200
Ido Schimmel <idosch@idosch.org> wrote:

> On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote:
> > diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> > index 5888492a5257..9c065e2fdef9 100644
> > --- a/include/uapi/linux/rtnetlink.h
> > +++ b/include/uapi/linux/rtnetlink.h
> > @@ -417,6 +417,9 @@ struct rtnexthop {
> >  #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
> >  				 RTNH_F_OFFLOAD | RTNH_F_TRAP)
> >  
> > +/* these flags can't be set by the userspace */
> > +#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
> > +
> >  /* Macros to handle hexthops */
> >  
> >  #define RTNH_ALIGNTO	4
> > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > index 4c0c33e4710d..805f5e05b56d 100644
> > --- a/net/ipv4/fib_semantics.c
> > +++ b/net/ipv4/fib_semantics.c
> > @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
> >  			return -EINVAL;
> >  		}
> >  
> > -		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> > +		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
> >  			NL_SET_ERR_MSG(extack,
> >  				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
> >  			return -EINVAL;
> > @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
> >  		goto err_inval;
> >  	}
> >  
> > -	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> > +	if (cfg->fc_flags & RTNH_REJECT_MASK) {
> >  		NL_SET_ERR_MSG(extack,
> >  			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");
> 
> Instead of a deny list as in the legacy nexthop code, the new nexthop
> code has an allow list (from rtm_to_nh_config()):
> 
> ```
> 	if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) {
> 		NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header");
> 		goto out;
> 	}
> ```
> 
> Where:
> 
> ```
> #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
> ```
> 
> So while the legacy nexthop code allows setting flags such as
> RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
> case for setting these flags from user space so I don't care if we allow
> or deny them, but I believe the legacy and new nexthop code should be
> consistent.

Dear Ido,

thanks for your attention to the patches and our checkpoint/restore problem.

Yep, I've read the nexthop code too and noticed some inconsistencies, but
unfortunately I'm a newbie here and my first goal is to fix things without
breaking anything; that's why my patch is so trivial and non-invasive :)

We had some discussion about these flags here:
https://lore.kernel.org/netdev/d7c2d8fa-052e-b941-2ef1-830c1ba655c1@gmail.com/#r

I noticed that the current iproute2 code doesn't allow us to set RTNH_F_OFFLOAD
and RTNH_F_TRAP directly, and asked whether we should prohibit setting these
flags from userspace. But, huge thanks to Roopa and David here, it turned out
that some userspace code uses these flags and sets them.

So, let's decide which flags userspace should be allowed to set and which not.
I'm ready to prepare all needed changes for both the kernel and iproute2
side. ;)

> 
> WDYT? Should we allow these flags in the new nexthop code as well or
> keep denying them?

IMHO, we should try to be consistent between the new nexthop code and the legacy one.

Regards,
Alex

> 
> >  		goto err_inval;
> > -- 
> > 2.31.1
> >

Alexander Mikhalitsyn Nov. 30, 2021, 8:35 a.m. UTC | #5

On Tue, 30 Nov 2021 09:59:25 +0200
Ido Schimmel <idosch@idosch.org> wrote:

> On Sun, Nov 28, 2021 at 05:19:38PM -0700, David Ahern wrote:
> > On 11/28/21 7:01 AM, Ido Schimmel wrote:
> > > On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote:
> > >> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> > >> index 5888492a5257..9c065e2fdef9 100644
> > >> --- a/include/uapi/linux/rtnetlink.h
> > >> +++ b/include/uapi/linux/rtnetlink.h
> > >> @@ -417,6 +417,9 @@ struct rtnexthop {
> > >>  #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
> > >>  				 RTNH_F_OFFLOAD | RTNH_F_TRAP)
> > >>  
> > >> +/* these flags can't be set by the userspace */
> > >> +#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
> > >> +
> > >>  /* Macros to handle hexthops */
> > >>  
> > >>  #define RTNH_ALIGNTO	4
> > >> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > >> index 4c0c33e4710d..805f5e05b56d 100644
> > >> --- a/net/ipv4/fib_semantics.c
> > >> +++ b/net/ipv4/fib_semantics.c
> > >> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
> > >>  			return -EINVAL;
> > >>  		}
> > >>  
> > >> -		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> > >> +		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
> > >>  			NL_SET_ERR_MSG(extack,
> > >>  				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
> > >>  			return -EINVAL;
> > >> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
> > >>  		goto err_inval;
> > >>  	}
> > >>  
> > >> -	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> > >> +	if (cfg->fc_flags & RTNH_REJECT_MASK) {
> > >>  		NL_SET_ERR_MSG(extack,
> > >>  			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");
> > > 
> > > Instead of a deny list as in the legacy nexthop code, the new nexthop
> > > code has an allow list (from rtm_to_nh_config()):
> > > 
> > > ```
> > > 	if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) {
> > > 		NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header");
> > > 		goto out;
> > > 	}
> > > ```
> > > 
> > > Where:
> > > 
> > > ```
> > > #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
> > > ```
> > > 
> > > So while the legacy nexthop code allows setting flags such as
> > > RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
> > > case for setting these flags from user space so I don't care if we allow
> > > or deny them, but I believe the legacy and new nexthop code should be
> > > consistent.
> > > 
> > > WDYT? Should we allow these flags in the new nexthop code as well or
> > > keep denying them?
> > > 
> > >>  		goto err_inval;
> > 
> > I like the positive naming - RTNH_VALID_USER_FLAGS.
> 
> I don't think we can move the legacy code to the same allow list as the
> new nexthop code without potentially breaking user space. The legacy
> code allows for many more flags to be set in the ancillary header than
> the new nexthop code.


Hello, Ido

agreed, let's keep this side unchanged

> 
> Looking at the patch again, what is the motivation to expose
> RTNH_REJECT_MASK to user space? iproute2 already knows that it only
> makes sense to set RTNH_F_ONLINK. Can't we just do:

Sorry, but it's not fully clear to me: why should we exclude RTNH_F_ONLINK?
I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because
the kernel doesn't allow setting these flags.

I also thought about another approach: "offload" this flag filtering
problem to the kernel side for better compatibility of iproute dump images.

Currently we dump all routes using a netlink message like this:
	struct {
		struct nlmsghdr nlh;
		struct rtmsg rtm;
		char buf[128];
	} req = {
		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
		.nlh.nlmsg_type = RTM_GETROUTE,
		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
...
	};

But we can introduce some "special" flag like NLM_F_FILTERED_DUMP (or something like that)
	} req = {
		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
		.nlh.nlmsg_type = RTM_GETROUTE,
		.nlh.nlmsg_flags = NLM_F_FILTERED_DUMP | NLM_F_REQUEST,
...
	};

The idea here is that the kernel knows better which flags should be omitted from the dump
(<=> which flags are prohibited from being set directly from the userspace side).

But that change is more "global". WDYT about this?

I'm ready to implement any of the approaches with your kind advice.

Alex

> 
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 1447a5f78f49..0e6dad2b67e5 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
>         if (!filter_nlmsg(n, tb, host_len))
>                 return 0;
>  
> +       r->rtm_flags &= ~RTNH_F_ONLINK;
> +
>         ret = write(STDOUT_FILENO, n, n->nlmsg_len);
>         if ((ret > 0) && (ret != n->nlmsg_len)) {
>                 fprintf(stderr, "Short write while saving nlmsg\n");
> 
> > 
> > nexthop API should allow the OFFLOAD flag to be consistent; separate
> > change though.
> >

Ido Schimmel Nov. 30, 2021, 9:28 a.m. UTC | #6

On Tue, Nov 30, 2021 at 11:35:17AM +0300, Alexander Mikhalitsyn wrote:
> On Tue, 30 Nov 2021 09:59:25 +0200
> Ido Schimmel <idosch@idosch.org> wrote:
> > Looking at the patch again, what is the motivation to expose
> > RTNH_REJECT_MASK to user space? iproute2 already knows that it only
> > makes sense to set RTNH_F_ONLINK. Can't we just do:
> 
> Sorry, but that's not fully clear for me, why we should exclude RTNH_F_ONLINK?
> I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because
> kernel doesn't allow to set these flags.

I don't think we should exclude RTNH_F_ONLINK. I'm saying that it is the
only flag that it makes sense to send to the kernel in the ancillary
header of RTM_NEWROUTE messages. The rest of the RTNH_F_* flags are
either not used by the kernel or are only meant to be sent from the
kernel to user space. Due to omission, they are mistakenly allowed.

Therefore, I think that the only necessary patch is an iproute2 patch
that makes sure that during save/restore you are clearing all the
RTNH_F_* flags but RTNH_F_ONLINK.

BTW, looking at save_route() in iproute2, I think the patch only clears
these flags from the ancillary header, but not from 'struct rtnexthop'
that is nested in RTA_MULTIPATH for multipath routes. See this blog post
for depiction of the message:
http://codecave.cc/multipath-routing-in-linux-part-1.html

> 
> I'd also thought about another approach - "offload" this flags filtering
> problems to the kernel side for better iproute dump images compatibility.
> 
> Now we dump all routes using netlink message like this
> 	struct {
> 		struct nlmsghdr nlh;
> 		struct rtmsg rtm;
> 		char buf[128];
> 	} req = {
> 		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> 		.nlh.nlmsg_type = RTM_GETROUTE,
> 		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
> ...
> 	};
> 
> But we can introduce some "special" flag like NLM_F_FILTERED_DUMP (or something like that)
> 	} req = {
> 		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> 		.nlh.nlmsg_type = RTM_GETROUTE,
> 		.nlh.nlmsg_flags = NLM_F_FILTERED_DUMP | NLM_F_REQUEST,
> ...
> 	};
> 
> The idea here is that the kernel knows better which flags should be omitted from the dump
> (<=> which flags are prohibited from being set directly from the userspace side).
> 
> But that change is more "global". WDYT about this?
> 
> I'm ready to implement any of the approaches with your kind advice.

Having the kernel filter RO flags upon RTM_GETROUTE with a new special
flag / attribute would be easiest to implement in iproute2 (especially
if my comment about RTA_MULTIPATH is correct), but it's quite an invasive
change that requires new uAPI.

Personally, I think that if something can be done in user space, then I
would do it in user space instead of adding new uAPI.

Alexander Mikhalitsyn Nov. 30, 2021, 9:53 a.m. UTC | #7

On Tue, 30 Nov 2021 11:28:32 +0200
Ido Schimmel <idosch@idosch.org> wrote:

> On Tue, Nov 30, 2021 at 11:35:17AM +0300, Alexander Mikhalitsyn wrote:
> > On Tue, 30 Nov 2021 09:59:25 +0200
> > Ido Schimmel <idosch@idosch.org> wrote:
> > > Looking at the patch again, what is the motivation to expose
> > > RTNH_REJECT_MASK to user space? iproute2 already knows that it only
> > > makes sense to set RTNH_F_ONLINK. Can't we just do:
> > 
> > Sorry, but that's not fully clear for me, why we should exclude RTNH_F_ONLINK?
> > I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because
> > kernel doesn't allow to set these flags.
> 
> I don't think we should exclude RTNH_F_ONLINK. I'm saying that it is the
> only flag that it makes sense to send to the kernel in the ancillary
> header of RTM_NEWROUTE messages. The rest of the RTNH_F_* flags are
> either not used by the kernel or are only meant to be sent from the
> kernel to user space. Due to omission, they are mistakenly allowed.

Ah, okay, so, the patch should be like

diff --git a/ip/iproute.c b/ip/iproute.c
index 1447a5f78f49..0e6dad2b67e5 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
        if (!filter_nlmsg(n, tb, host_len))
                return 0;
 
+       r->rtm_flags &= RTNH_F_ONLINK;
+
        ret = write(STDOUT_FILENO, n, n->nlmsg_len);
        if ((ret > 0) && (ret != n->nlmsg_len)) {
                fprintf(stderr, "Short write while saving nlmsg\n");

to filter out all flags *except* RTNH_F_ONLINK.
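
The two one-line variants that appear in this thread differ only by the complement operator, but they do opposite things. A minimal sketch, with hypothetical helper names that are not part of iproute2:

```c
#include <linux/rtnetlink.h>

/* "flags &= ~RTNH_F_ONLINK": clears ONLINK and keeps every other flag. */
static unsigned int sketch_clear_onlink(unsigned int flags)
{
	return flags & ~RTNH_F_ONLINK;
}

/* "flags &= RTNH_F_ONLINK": keeps ONLINK and clears every other flag. */
static unsigned int sketch_keep_onlink(unsigned int flags)
{
	return flags & RTNH_F_ONLINK;
}
```

For a route carrying ONLINK, DEAD and OFFLOAD, the first helper leaves DEAD and OFFLOAD set, while the second leaves only ONLINK, which is the behaviour intended for save/restore here.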

But what about discussion from
https://lore.kernel.org/netdev/ff405eae-21d9-35f4-1397-b6f9a29a57ff@nvidia.com/

As far as I understand Roopa, we have to save at least the RTNH_F_OFFLOAD flag too,
for instance if a user runs Cumulus and wants to dump/restore routes.

I'm sorry if I misunderstood something.

> 
> Therefore, I think that the only necessary patch is an iproute2 patch
> that makes sure that during save/restore you are clearing all the
> RTNH_F_* flags but RTNH_F_ONLINK.
> 
> BTW, looking at save_route() in iproute2, I think the patch only clears
> these flags from the ancillary header, but not from 'struct rtnexthop'
> that is nested in RTA_MULTIPATH for multipath routes. See this blog post
> for depiction of the message:
> http://codecave.cc/multipath-routing-in-linux-part-1.html

Sure, I will handle these nested structures too.

> 
> > 
> > I'd also thought about another approach - "offload" this flags filtering
> > problems to the kernel side for better iproute dump images compatibility.
> > 
> > Now we dump all routes using netlink message like this
> > 	struct {
> > 		struct nlmsghdr nlh;
> > 		struct rtmsg rtm;
> > 		char buf[128];
> > 	} req = {
> > 		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> > 		.nlh.nlmsg_type = RTM_GETROUTE,
> > 		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
> > ...
> > 	};
> > 
> > But we can introduce some "special" flag like NLM_F_FILTERED_DUMP (or something like that)
> > 	} req = {
> > 		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> > 		.nlh.nlmsg_type = RTM_GETROUTE,
> > 		.nlh.nlmsg_flags = NLM_F_FILTERED_DUMP | NLM_F_REQUEST,
> > ...
> > 	};
> > 
> > The idea here is that the kernel knows better which flags should be omitted from the dump
> > (<=> which flags are prohibited from being set directly from the userspace side).
> > 
> > But that change is more "global". WDYT about this?
> > 
> > I'm ready to implement any of the approaches with your kind advice.
> 
> Having the kernel filter RO flags upon RTM_GETROUTE with a new special
> flag / attribute would be easiest to implement in iproute2 (especially
> if my comment about RTA_MULTIPATH is correct), but it's a quite invasive
> change that requires new uAPI.
> 
> Personally, I think that if something can be done in user space, then I
> would do it in user space instead of adding new uAPI.

agreed

Ido Schimmel Nov. 30, 2021, 10:28 a.m. UTC | #8

On Tue, Nov 30, 2021 at 12:53:52PM +0300, Alexander Mikhalitsyn wrote:
> On Tue, 30 Nov 2021 11:28:32 +0200
> Ido Schimmel <idosch@idosch.org> wrote:
> 
> > On Tue, Nov 30, 2021 at 11:35:17AM +0300, Alexander Mikhalitsyn wrote:
> > > On Tue, 30 Nov 2021 09:59:25 +0200
> > > Ido Schimmel <idosch@idosch.org> wrote:
> > > > Looking at the patch again, what is the motivation to expose
> > > > RTNH_REJECT_MASK to user space? iproute2 already knows that it only
> > > > makes sense to set RTNH_F_ONLINK. Can't we just do:
> > > 
> > > Sorry, but that's not fully clear for me, why we should exclude RTNH_F_ONLINK?
> > > I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because
> > > kernel doesn't allow to set these flags.
> > 
> > I don't think we should exclude RTNH_F_ONLINK. I'm saying that it is the
> > only flag that it makes sense to send to the kernel in the ancillary
> > header of RTM_NEWROUTE messages. The rest of the RTNH_F_* flags are
> > either not used by the kernel or are only meant to be sent from the
> > kernel to user space. Due to omission, they are mistakenly allowed.
> 
> Ah, okay, so, the patch should be like
> 
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 1447a5f78f49..0e6dad2b67e5 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
>         if (!filter_nlmsg(n, tb, host_len))
>                 return 0;
>  
> +       r->rtm_flags &= RTNH_F_ONLINK;
> +
>         ret = write(STDOUT_FILENO, n, n->nlmsg_len);
>         if ((ret > 0) && (ret != n->nlmsg_len)) {
>                 fprintf(stderr, "Short write while saving nlmsg\n");
> 
> to filter out all flags *except* RTNH_F_ONLINK.

Yes

> 
> But what about discussion from
> https://lore.kernel.org/netdev/ff405eae-21d9-35f4-1397-b6f9a29a57ff@nvidia.com/
> 
> As far as I understand Roopa, we have to save at least RTNH_F_OFFLOAD flag too,
> for instance, if user uses Cumulus and want to dump/restore routes.
> 
> I'm sorry if I misunderstood something.

Roopa, do you see a problem with the above patch?

David Ahern Nov. 30, 2021, 3:12 p.m. UTC | #9

On 11/30/21 3:28 AM, Ido Schimmel wrote:
>> diff --git a/ip/iproute.c b/ip/iproute.c
>> index 1447a5f78f49..0e6dad2b67e5 100644
>> --- a/ip/iproute.c
>> +++ b/ip/iproute.c
>> @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
>>         if (!filter_nlmsg(n, tb, host_len))
>>                 return 0;
>>  
>> +       r->rtm_flags &= RTNH_F_ONLINK;
>> +
>>         ret = write(STDOUT_FILENO, n, n->nlmsg_len);
>>         if ((ret > 0) && (ret != n->nlmsg_len)) {
>>                 fprintf(stderr, "Short write while saving nlmsg\n");
>>
>> to filter out all flags *except* RTNH_F_ONLINK.
> 
> Yes
> 
>>
>> But what about discussion from
>> https://lore.kernel.org/netdev/ff405eae-21d9-35f4-1397-b6f9a29a57ff@nvidia.com/
>>
>> As far as I understand Roopa, we have to save at least RTNH_F_OFFLOAD flag too,
>> for instance, if user uses Cumulus and want to dump/restore routes.
>>
>> I'm sorry if I misunderstood something.
> 
> Roopa, do you see a problem with the above patch?
> 

The offload flag can be set from userspace, but it seems to me that this
should only be done by the process that talks to hardware. Using iproute2 to
dump routes and then restore them should not set that flag.
