Message ID | 20211126134311.920808-2-alexander.mikhalitsyn@virtuozzo.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] rtnetlink: add RTNH_REJECT_MASK |
On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote:
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 5888492a5257..9c065e2fdef9 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -417,6 +417,9 @@ struct rtnexthop {
>  #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
>  				 RTNH_F_OFFLOAD | RTNH_F_TRAP)
>
> +/* these flags can't be set by the userspace */
> +#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
> +
>  /* Macros to handle hexthops */
>
>  #define RTNH_ALIGNTO	4
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 4c0c33e4710d..805f5e05b56d 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
>  			return -EINVAL;
>  		}
>
> -		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> +		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
>  			NL_SET_ERR_MSG(extack,
>  				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
>  			return -EINVAL;
> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
>  		goto err_inval;
>  	}
>
> -	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
> +	if (cfg->fc_flags & RTNH_REJECT_MASK) {
>  		NL_SET_ERR_MSG(extack,
>  			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");

Instead of a deny list as in the legacy nexthop code, the new nexthop
code has an allow list (from rtm_to_nh_config()):

```
	if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) {
		NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header");
		goto out;
	}
```

Where:

```
#define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
```

So while the legacy nexthop code allows setting flags such as
RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
case for setting these flags from user space so I don't care if we allow
or deny them, but I believe the legacy and new nexthop code should be
consistent.

WDYT? Should we allow these flags in the new nexthop code as well or
keep denying them?

>  		goto err_inval;
> --
> 2.31.1
>
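The difference between the two validation styles can be shown with a minimal user-space sketch. It is not kernel code: the helpers `legacy_accepts()`, `nexthop_accepts()` and `demo_flag_checks()` are hypothetical names, and the flag values are copied from include/uapi/linux/rtnetlink.h so that the snippet builds without kernel headers. The real checks live in fib_get_nhs() / fib_create_info() and rtm_to_nh_config().

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from include/uapi/linux/rtnetlink.h (assumption: repeated
 * here only so the sketch compiles without kernel headers). */
#define RTNH_F_DEAD      1
#define RTNH_F_ONLINK    4
#define RTNH_F_OFFLOAD   8
#define RTNH_F_LINKDOWN  16
#define RTNH_F_TRAP      64

/* Deny list: the mask proposed by the patch for the legacy route code. */
#define RTNH_REJECT_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN)
/* Allow list: the mask used by the new nexthop code. */
#define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK

/* Hypothetical helper: legacy-style check, rejects only the listed flags. */
bool legacy_accepts(unsigned int flags)
{
	return !(flags & RTNH_REJECT_MASK);
}

/* Hypothetical helper: nexthop-style check, rejects anything not listed. */
bool nexthop_accepts(unsigned int flags)
{
	return !(flags & ~NEXTHOP_VALID_USER_FLAGS);
}

/* RTNH_F_OFFLOAD passes the deny list but fails the allow list. */
void demo_flag_checks(void)
{
	assert(legacy_accepts(RTNH_F_OFFLOAD));
	assert(!nexthop_accepts(RTNH_F_OFFLOAD));
	assert(legacy_accepts(RTNH_F_ONLINK) && nexthop_accepts(RTNH_F_ONLINK));
	assert(!legacy_accepts(RTNH_F_DEAD) && !nexthop_accepts(RTNH_F_DEAD));
}
```

With a deny list, any flag added later is accepted by default unless the mask is extended. With an allow list, it is rejected by default. That is the inconsistency raised above.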
On 11/28/21 7:01 AM, Ido Schimmel wrote: > On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote: >> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h >> index 5888492a5257..9c065e2fdef9 100644 >> --- a/include/uapi/linux/rtnetlink.h >> +++ b/include/uapi/linux/rtnetlink.h >> @@ -417,6 +417,9 @@ struct rtnexthop { >> #define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | \ >> RTNH_F_OFFLOAD | RTNH_F_TRAP) >> >> +/* these flags can't be set by the userspace */ >> +#define RTNH_REJECT_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN) >> + >> /* Macros to handle hexthops */ >> >> #define RTNH_ALIGNTO 4 >> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c >> index 4c0c33e4710d..805f5e05b56d 100644 >> --- a/net/ipv4/fib_semantics.c >> +++ b/net/ipv4/fib_semantics.c >> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, >> return -EINVAL; >> } >> >> - if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { >> + if (rtnh->rtnh_flags & RTNH_REJECT_MASK) { >> NL_SET_ERR_MSG(extack, >> "Invalid flags for nexthop - can not contain DEAD or LINKDOWN"); >> return -EINVAL; >> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg, >> goto err_inval; >> } >> >> - if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { >> + if (cfg->fc_flags & RTNH_REJECT_MASK) { >> NL_SET_ERR_MSG(extack, >> "Invalid rtm_flags - can not contain DEAD or LINKDOWN"); > > Instead of a deny list as in the legacy nexthop code, the new nexthop > code has an allow list (from rtm_to_nh_config()): > > ``` > if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) { > NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header"); > goto out; > } > ``` > > Where: > > ``` > #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK > ``` > > So while the legacy nexthop code allows setting flags such as > RTNH_F_OFFLOAD, the new nexthop code denies them. 
I don't have a use > case for setting these flags from user space so I don't care if we allow > or deny them, but I believe the legacy and new nexthop code should be > consistent. > > WDYT? Should we allow these flags in the new nexthop code as well or > keep denying them? > >> goto err_inval; I like the positive naming - RTNH_VALID_USER_FLAGS. nexthop API should allow the OFFLOAD flag to be consistent; separate change though.
On Sun, Nov 28, 2021 at 05:19:38PM -0700, David Ahern wrote: > On 11/28/21 7:01 AM, Ido Schimmel wrote: > > On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote: > >> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h > >> index 5888492a5257..9c065e2fdef9 100644 > >> --- a/include/uapi/linux/rtnetlink.h > >> +++ b/include/uapi/linux/rtnetlink.h > >> @@ -417,6 +417,9 @@ struct rtnexthop { > >> #define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | \ > >> RTNH_F_OFFLOAD | RTNH_F_TRAP) > >> > >> +/* these flags can't be set by the userspace */ > >> +#define RTNH_REJECT_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN) > >> + > >> /* Macros to handle hexthops */ > >> > >> #define RTNH_ALIGNTO 4 > >> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > >> index 4c0c33e4710d..805f5e05b56d 100644 > >> --- a/net/ipv4/fib_semantics.c > >> +++ b/net/ipv4/fib_semantics.c > >> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, > >> return -EINVAL; > >> } > >> > >> - if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { > >> + if (rtnh->rtnh_flags & RTNH_REJECT_MASK) { > >> NL_SET_ERR_MSG(extack, > >> "Invalid flags for nexthop - can not contain DEAD or LINKDOWN"); > >> return -EINVAL; > >> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg, > >> goto err_inval; > >> } > >> > >> - if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { > >> + if (cfg->fc_flags & RTNH_REJECT_MASK) { > >> NL_SET_ERR_MSG(extack, > >> "Invalid rtm_flags - can not contain DEAD or LINKDOWN"); > > > > Instead of a deny list as in the legacy nexthop code, the new nexthop > > code has an allow list (from rtm_to_nh_config()): > > > > ``` > > if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) { > > NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header"); > > goto out; > > } > > ``` > > > > Where: > > > > ``` > > #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK > > ``` > > > > So while 
the legacy nexthop code allows setting flags such as
> > RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
> > case for setting these flags from user space so I don't care if we allow
> > or deny them, but I believe the legacy and new nexthop code should be
> > consistent.
> >
> > WDYT? Should we allow these flags in the new nexthop code as well or
> > keep denying them?
> >
> >> 		goto err_inval;
>
> I like the positive naming - RTNH_VALID_USER_FLAGS.

I don't think we can move the legacy code to the same allow list as the
new nexthop code without potentially breaking user space. The legacy
code allows many more flags to be set in the ancillary header than
the new nexthop code.

Looking at the patch again, what is the motivation to expose
RTNH_REJECT_MASK to user space? iproute2 already knows that it only
makes sense to set RTNH_F_ONLINK. Can't we just do:

diff --git a/ip/iproute.c b/ip/iproute.c
index 1447a5f78f49..0e6dad2b67e5 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
 	if (!filter_nlmsg(n, tb, host_len))
 		return 0;

+	r->rtm_flags &= ~RTNH_F_ONLINK;
+
 	ret = write(STDOUT_FILENO, n, n->nlmsg_len);
 	if ((ret > 0) && (ret != n->nlmsg_len)) {
 		fprintf(stderr, "Short write while saving nlmsg\n");

>
> nexthop API should allow the OFFLOAD flag to be consistent; separate
> change though.
>
On Sun, 28 Nov 2021 16:01:27 +0200 Ido Schimmel <idosch@idosch.org> wrote: > On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote: > > diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h > > index 5888492a5257..9c065e2fdef9 100644 > > --- a/include/uapi/linux/rtnetlink.h > > +++ b/include/uapi/linux/rtnetlink.h > > @@ -417,6 +417,9 @@ struct rtnexthop { > > #define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | \ > > RTNH_F_OFFLOAD | RTNH_F_TRAP) > > > > +/* these flags can't be set by the userspace */ > > +#define RTNH_REJECT_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN) > > + > > /* Macros to handle hexthops */ > > > > #define RTNH_ALIGNTO 4 > > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > > index 4c0c33e4710d..805f5e05b56d 100644 > > --- a/net/ipv4/fib_semantics.c > > +++ b/net/ipv4/fib_semantics.c > > @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, > > return -EINVAL; > > } > > > > - if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { > > + if (rtnh->rtnh_flags & RTNH_REJECT_MASK) { > > NL_SET_ERR_MSG(extack, > > "Invalid flags for nexthop - can not contain DEAD or LINKDOWN"); > > return -EINVAL; > > @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg, > > goto err_inval; > > } > > > > - if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { > > + if (cfg->fc_flags & RTNH_REJECT_MASK) { > > NL_SET_ERR_MSG(extack, > > "Invalid rtm_flags - can not contain DEAD or LINKDOWN"); > > Instead of a deny list as in the legacy nexthop code, the new nexthop > code has an allow list (from rtm_to_nh_config()): > > ``` > if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) { > NL_SET_ERR_MSG(extack, "Invalid nexthop flags in ancillary header"); > goto out; > } > ``` > > Where: > > ``` > #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK > ``` > > So while the legacy nexthop code allows setting flags such as > RTNH_F_OFFLOAD, the new nexthop code denies 
them. I don't have a use
> case for setting these flags from user space so I don't care if we allow
> or deny them, but I believe the legacy and new nexthop code should be
> consistent.

Dear Ido,

thanks for your attention to the patches and our checkpoint/restore problem.

Yep, I've read the nexthop code too and noticed some inconsistencies, but
unfortunately I'm a newbie here and my first goal is to fix things without
breaking anything; that's why my patch is so trivial and non-invasive :)

We had some discussion about these flags here:
https://lore.kernel.org/netdev/d7c2d8fa-052e-b941-2ef1-830c1ba655c1@gmail.com/#r

I noticed that the current iproute2 code does not allow us to set
RTNH_F_OFFLOAD and RTNH_F_TRAP directly, and asked whether we should prohibit
setting these flags from user space. But huge thanks to Roopa and David
here - it turned out that some user-space code uses these flags and sets them.

So, let's decide which flags we should allow user space to set and which not.
I'm ready to prepare all needed changes for both the kernel and iproute2 side. ;)

>
> WDYT? Should we allow these flags in the new nexthop code as well or
> keep denying them?

IMHO, we should try to be consistent between the new nexthop code and the legacy one.

Regards,
Alex

>
> > 		goto err_inval;
> > --
> > 2.31.1
> >
On Tue, 30 Nov 2021 09:59:25 +0200 Ido Schimmel <idosch@idosch.org> wrote: > On Sun, Nov 28, 2021 at 05:19:38PM -0700, David Ahern wrote: > > On 11/28/21 7:01 AM, Ido Schimmel wrote: > > > On Fri, Nov 26, 2021 at 04:43:11PM +0300, Alexander Mikhalitsyn wrote: > > >> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h > > >> index 5888492a5257..9c065e2fdef9 100644 > > >> --- a/include/uapi/linux/rtnetlink.h > > >> +++ b/include/uapi/linux/rtnetlink.h > > >> @@ -417,6 +417,9 @@ struct rtnexthop { > > >> #define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | \ > > >> RTNH_F_OFFLOAD | RTNH_F_TRAP) > > >> > > >> +/* these flags can't be set by the userspace */ > > >> +#define RTNH_REJECT_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN) > > >> + > > >> /* Macros to handle hexthops */ > > >> > > >> #define RTNH_ALIGNTO 4 > > >> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > > >> index 4c0c33e4710d..805f5e05b56d 100644 > > >> --- a/net/ipv4/fib_semantics.c > > >> +++ b/net/ipv4/fib_semantics.c > > >> @@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, > > >> return -EINVAL; > > >> } > > >> > > >> - if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { > > >> + if (rtnh->rtnh_flags & RTNH_REJECT_MASK) { > > >> NL_SET_ERR_MSG(extack, > > >> "Invalid flags for nexthop - can not contain DEAD or LINKDOWN"); > > >> return -EINVAL; > > >> @@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg, > > >> goto err_inval; > > >> } > > >> > > >> - if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) { > > >> + if (cfg->fc_flags & RTNH_REJECT_MASK) { > > >> NL_SET_ERR_MSG(extack, > > >> "Invalid rtm_flags - can not contain DEAD or LINKDOWN"); > > > > > > Instead of a deny list as in the legacy nexthop code, the new nexthop > > > code has an allow list (from rtm_to_nh_config()): > > > > > > ``` > > > if (nhm->nh_flags & ~NEXTHOP_VALID_USER_FLAGS) { > > > NL_SET_ERR_MSG(extack, "Invalid 
nexthop flags in ancillary header");
> > > 		goto out;
> > > 	}
> > > ```
> > >
> > > Where:
> > >
> > > ```
> > > #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK
> > > ```
> > >
> > > So while the legacy nexthop code allows setting flags such as
> > > RTNH_F_OFFLOAD, the new nexthop code denies them. I don't have a use
> > > case for setting these flags from user space so I don't care if we allow
> > > or deny them, but I believe the legacy and new nexthop code should be
> > > consistent.
> > >
> > > WDYT? Should we allow these flags in the new nexthop code as well or
> > > keep denying them?
> > >
> > >> 		goto err_inval;
> >
> > I like the positive naming - RTNH_VALID_USER_FLAGS.
>
> I don't think we can move the legacy code to the same allow list as the
> new nexthop code without potentially breaking user space. The legacy
> code allows many more flags to be set in the ancillary header than
> the new nexthop code.

Hello Ido,

agreed, let's keep this side unchanged.

>
> Looking at the patch again, what is the motivation to expose
> RTNH_REJECT_MASK to user space? iproute2 already knows that it only
> makes sense to set RTNH_F_ONLINK. Can't we just do:

Sorry, but it's not fully clear to me why we should exclude RTNH_F_ONLINK.
I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because
the kernel doesn't allow setting these flags.

I'd also thought about another approach - "offload" this flag filtering
problem to the kernel side for better iproute dump image compatibility.

Now we dump all routes using a netlink message like this:
	struct {
		struct nlmsghdr nlh;
		struct rtmsg rtm;
		char buf[128];
	} req = {
		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
		.nlh.nlmsg_type = RTM_GETROUTE,
		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
		...
	};

But we can introduce some "special" flag like NLM_F_FILTERED_DUMP (or something like that):
	} req = {
		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
		.nlh.nlmsg_type = RTM_GETROUTE,
		.nlh.nlmsg_flags = NLM_F_FILTERED_DUMP | NLM_F_REQUEST,
		...
	};

The idea here is that the kernel knows better which flags should be omitted from the dump
(<=> which flags are prohibited from being set directly from the user-space side).

But that change is more "global". WDYT about this?

I'm ready to implement any of the approaches with your kind advice.

Alex

>
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 1447a5f78f49..0e6dad2b67e5 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
>  	if (!filter_nlmsg(n, tb, host_len))
>  		return 0;
>
> +	r->rtm_flags &= ~RTNH_F_ONLINK;
> +
>  	ret = write(STDOUT_FILENO, n, n->nlmsg_len);
>  	if ((ret > 0) && (ret != n->nlmsg_len)) {
>  		fprintf(stderr, "Short write while saving nlmsg\n");
>
> >
> > nexthop API should allow the OFFLOAD flag to be consistent; separate
> > change though.
> >
On Tue, Nov 30, 2021 at 11:35:17AM +0300, Alexander Mikhalitsyn wrote:
> On Tue, 30 Nov 2021 09:59:25 +0200
> Ido Schimmel <idosch@idosch.org> wrote:
> > Looking at the patch again, what is the motivation to expose
> > RTNH_REJECT_MASK to user space? iproute2 already knows that it only
> > makes sense to set RTNH_F_ONLINK. Can't we just do:
>
> Sorry, but it's not fully clear to me why we should exclude RTNH_F_ONLINK.
> I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because
> the kernel doesn't allow setting these flags.

I don't think we should exclude RTNH_F_ONLINK. I'm saying that it is the
only flag that it makes sense to send to the kernel in the ancillary
header of RTM_NEWROUTE messages. The rest of the RTNH_F_* flags are
either not used by the kernel or are only meant to be sent from the
kernel to user space. Due to an omission, they are mistakenly allowed.

Therefore, I think that the only necessary patch is an iproute2 patch
that makes sure that during save/restore you are clearing all the
RTNH_F_* flags but RTNH_F_ONLINK.

BTW, looking at save_route() in iproute2, I think the patch only clears
these flags from the ancillary header, but not from 'struct rtnexthop'
that is nested in RTA_MULTIPATH for multipath routes. See this blog post
for a depiction of the message:
http://codecave.cc/multipath-routing-in-linux-part-1.html

>
> I'd also thought about another approach - "offload" this flag filtering
> problem to the kernel side for better iproute dump image compatibility.
>
> Now we dump all routes using a netlink message like this:
> 	struct {
> 		struct nlmsghdr nlh;
> 		struct rtmsg rtm;
> 		char buf[128];
> 	} req = {
> 		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> 		.nlh.nlmsg_type = RTM_GETROUTE,
> 		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
> 		...
> 	};
>
> But we can introduce some "special" flag like NLM_F_FILTERED_DUMP (or something like that):
> 	} req = {
> 		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> 		.nlh.nlmsg_type = RTM_GETROUTE,
> 		.nlh.nlmsg_flags = NLM_F_FILTERED_DUMP | NLM_F_REQUEST,
> 		...
> 	};
>
> The idea here is that the kernel knows better which flags should be omitted from the dump
> (<=> which flags are prohibited from being set directly from the user-space side).
>
> But that change is more "global". WDYT about this?
>
> I'm ready to implement any of the approaches with your kind advice.

Having the kernel filter RO flags upon RTM_GETROUTE with a new special
flag / attribute would be easiest to implement in iproute2 (especially
if my comment about RTA_MULTIPATH is correct), but it's quite an invasive
change that requires new uAPI.

Personally, I think that if something can be done in user space, then I
would do it in user space instead of adding new uAPI.
On Tue, 30 Nov 2021 11:28:32 +0200 Ido Schimmel <idosch@idosch.org> wrote: > On Tue, Nov 30, 2021 at 11:35:17AM +0300, Alexander Mikhalitsyn wrote: > > On Tue, 30 Nov 2021 09:59:25 +0200 > > Ido Schimmel <idosch@idosch.org> wrote: > > > Looking at the patch again, what is the motivation to expose > > > RTNH_REJECT_MASK to user space? iproute2 already knows that it only > > > makes sense to set RTNH_F_ONLINK. Can't we just do: > > > > Sorry, but that's not fully clear for me, why we should exclude RTNH_F_ONLINK? > > I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because > > kernel doesn't allow to set these flags. > > I don't think we should exclude RTNH_F_ONLINK. I'm saying that it is the > only flag that it makes sense to send to the kernel in the ancillary > header of RTM_NEWROUTE messages. The rest of the RNTH_F_* flags are > either not used by the kernel or are only meant to be sent from the > kernel to user space. Due to omission, they are mistakenly allowed. Ah, okay, so, the patch should be like diff --git a/ip/iproute.c b/ip/iproute.c index 1447a5f78f49..0e6dad2b67e5 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg) if (!filter_nlmsg(n, tb, host_len)) return 0; + r->rtm_flags &= RTNH_F_ONLINK; + ret = write(STDOUT_FILENO, n, n->nlmsg_len); if ((ret > 0) && (ret != n->nlmsg_len)) { fprintf(stderr, "Short write while saving nlmsg\n"); to filter out all flags *except* RTNH_F_ONLINK. But what about discussion from https://lore.kernel.org/netdev/ff405eae-21d9-35f4-1397-b6f9a29a57ff@nvidia.com/ As far as I understand Roopa, we have to save at least RTNH_F_OFFLOAD flag too, for instance, if user uses Cumulus and want to dump/restore routes. I'm sorry if I misunderstood something. > > Therefore, I think that the only necessary patch is an iproute2 patch > that makes sure that during save/restore you are clearing all the > RTNH_F_* flags but RTNH_F_ONLINK. 
> > BTW, looking at save_route() in iproute2, I think the patch only clears > these flags from the ancillary header, but not from 'struct rtnexthop' > that is nested in RTA_MULTIPATH for multipath routes. See this blog post > for depiction of the message: > http://codecave.cc/multipath-routing-in-linux-part-1.html Sure, I will handle these nested structures too. > > > > > I'd also thought about another approach - "offload" this flags filtering > > problems to the kernel side for better iproute dump images compatibility. > > > > Now we dump all routes using netlink message like this > > struct { > > struct nlmsghdr nlh; > > struct rtmsg rtm; > > char buf[128]; > > } req = { > > .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)), > > .nlh.nlmsg_type = RTM_GETROUTE, > > .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST, > > ... > > }; > > > > But we can introduce some "special" flag like NLM_F_FILTERED_DUMP (or something like that) > > } req = { > > .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)), > > .nlh.nlmsg_type = RTM_GETROUTE, > > .nlh.nlmsg_flags = NLM_F_FILTERED_DUMP | NLM_F_REQUEST, > > ... > > }; > > > > The idea here is that the kernel nows better which flags should be omitted from the dump > > (<=> which flags is prohibited to set directly from the userspace side). > > > > But that change is more "global". WDYT about this? > > > > I'm ready to implement any of the approaches with your kind advice. > > Having the kernel filter RO flags upon RTM_GETROUTE with a new special > flag / attribute would be easiest to implement in iproute2 (especially > if my comment about RTA_MULTIPATH is correct), but it's a quite invasive > change that requires new uAPI. > > Personally, I think that if something can be done in user space, then I > would do it in user space instead of adding new uAPI. agreed
On Tue, Nov 30, 2021 at 12:53:52PM +0300, Alexander Mikhalitsyn wrote: > On Tue, 30 Nov 2021 11:28:32 +0200 > Ido Schimmel <idosch@idosch.org> wrote: > > > On Tue, Nov 30, 2021 at 11:35:17AM +0300, Alexander Mikhalitsyn wrote: > > > On Tue, 30 Nov 2021 09:59:25 +0200 > > > Ido Schimmel <idosch@idosch.org> wrote: > > > > Looking at the patch again, what is the motivation to expose > > > > RTNH_REJECT_MASK to user space? iproute2 already knows that it only > > > > makes sense to set RTNH_F_ONLINK. Can't we just do: > > > > > > Sorry, but that's not fully clear for me, why we should exclude RTNH_F_ONLINK? > > > I thought that we should exclude RTNH_F_DEAD and RTNH_F_LINKDOWN just because > > > kernel doesn't allow to set these flags. > > > > I don't think we should exclude RTNH_F_ONLINK. I'm saying that it is the > > only flag that it makes sense to send to the kernel in the ancillary > > header of RTM_NEWROUTE messages. The rest of the RNTH_F_* flags are > > either not used by the kernel or are only meant to be sent from the > > kernel to user space. Due to omission, they are mistakenly allowed. > > Ah, okay, so, the patch should be like > > diff --git a/ip/iproute.c b/ip/iproute.c > index 1447a5f78f49..0e6dad2b67e5 100644 > --- a/ip/iproute.c > +++ b/ip/iproute.c > @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg) > if (!filter_nlmsg(n, tb, host_len)) > return 0; > > + r->rtm_flags &= RTNH_F_ONLINK; > + > ret = write(STDOUT_FILENO, n, n->nlmsg_len); > if ((ret > 0) && (ret != n->nlmsg_len)) { > fprintf(stderr, "Short write while saving nlmsg\n"); > > to filter out all flags *except* RTNH_F_ONLINK. Yes > > But what about discussion from > https://lore.kernel.org/netdev/ff405eae-21d9-35f4-1397-b6f9a29a57ff@nvidia.com/ > > As far as I understand Roopa, we have to save at least RTNH_F_OFFLOAD flag too, > for instance, if user uses Cumulus and want to dump/restore routes. > > I'm sorry if I misunderstood something. 
Roopa, do you see a problem with the above patch?
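The two one-line variants discussed in this sub-thread differ only by a `~`, but they do opposite things. The sketch below works through the arithmetic on a sample flag word. The helper names are hypothetical and the flag values are copied from include/uapi/linux/rtnetlink.h so that the snippet builds on its own.

```c
#include <assert.h>

/* Values copied from include/uapi/linux/rtnetlink.h (assumption: repeated
 * here only so the sketch compiles without kernel headers). */
#define RTNH_F_ONLINK    4
#define RTNH_F_OFFLOAD   8
#define RTNH_F_LINKDOWN  16

/* First proposal: clears only ONLINK, every other flag is kept. */
unsigned int clear_onlink(unsigned int flags)
{
	return flags & ~RTNH_F_ONLINK;
}

/* Corrected version: keeps only ONLINK, every other flag is cleared. */
unsigned int keep_only_onlink(unsigned int flags)
{
	return flags & RTNH_F_ONLINK;
}

/* Worked example: a dumped route carrying ONLINK, OFFLOAD and LINKDOWN,
 * i.e. 4 | 8 | 16 = 28. */
void demo_masks(void)
{
	unsigned int dumped = RTNH_F_ONLINK | RTNH_F_OFFLOAD | RTNH_F_LINKDOWN;

	/* 28 & ~4 = 24: LINKDOWN survives, which the kernel rejects on restore. */
	assert(clear_onlink(dumped) == (RTNH_F_OFFLOAD | RTNH_F_LINKDOWN));
	/* 28 & 4 = 4: only ONLINK survives. */
	assert(keep_only_onlink(dumped) == RTNH_F_ONLINK);
}
```

Only the second form yields an image that fib_create_info() would accept, because the first one leaves RTNH_F_LINKDOWN (and RTNH_F_DEAD, if present) in place.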
On 11/30/21 3:28 AM, Ido Schimmel wrote:
>> diff --git a/ip/iproute.c b/ip/iproute.c
>> index 1447a5f78f49..0e6dad2b67e5 100644
>> --- a/ip/iproute.c
>> +++ b/ip/iproute.c
>> @@ -1632,6 +1632,8 @@ static int save_route(struct nlmsghdr *n, void *arg)
>>  	if (!filter_nlmsg(n, tb, host_len))
>>  		return 0;
>>
>> +	r->rtm_flags &= RTNH_F_ONLINK;
>> +
>>  	ret = write(STDOUT_FILENO, n, n->nlmsg_len);
>>  	if ((ret > 0) && (ret != n->nlmsg_len)) {
>>  		fprintf(stderr, "Short write while saving nlmsg\n");
>>
>> to filter out all flags *except* RTNH_F_ONLINK.
>
> Yes
>
>>
>> But what about discussion from
>> https://lore.kernel.org/netdev/ff405eae-21d9-35f4-1397-b6f9a29a57ff@nvidia.com/
>>
>> As far as I understand Roopa, we have to save at least RTNH_F_OFFLOAD flag too,
>> for instance, if user uses Cumulus and want to dump/restore routes.
>>
>> I'm sorry if I misunderstood something.
>
> Roopa, do you see a problem with the above patch?
>

The offload flag can be set from userspace but seems to me that should
only be done by the process that talks to hardware. Using iproute2 to
dump routes and then restore them should not set that flag.
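Putting the thread's conclusions together (keep only RTNH_F_ONLINK, and do so both in the ancillary header and in each 'struct rtnexthop' nested in RTA_MULTIPATH), a save-side filter could look roughly like the sketch below. This is not the actual iproute2 patch: `sanitize_route_flags()` and `demo_sanitize()` are hypothetical names, the demo message is synthetic, and the code assumes a well-formed RTM_NEWROUTE message and Linux uAPI headers.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper, not an iproute2 function: keep only RTNH_F_ONLINK
 * in the ancillary header and in every nexthop nested in RTA_MULTIPATH. */
void sanitize_route_flags(struct nlmsghdr *n)
{
	struct rtmsg *r = NLMSG_DATA(n);
	struct rtattr *rta = RTM_RTA(r);
	int len;

	assert(n->nlmsg_len >= NLMSG_LENGTH(sizeof(*r)));
	len = RTM_PAYLOAD(n);

	r->rtm_flags &= RTNH_F_ONLINK;

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		struct rtnexthop *nh;
		int nhlen;

		if (rta->rta_type != RTA_MULTIPATH)
			continue;

		nh = RTA_DATA(rta);
		nhlen = RTA_PAYLOAD(rta);
		/* Walk the array of struct rtnexthop entries. */
		while (nhlen >= (int)sizeof(*nh) && RTNH_OK(nh, nhlen)) {
			nh->rtnh_flags &= RTNH_F_ONLINK;
			nhlen -= RTNH_ALIGN(nh->rtnh_len);
			nh = RTNH_NEXT(nh);
		}
	}
}

/* Builds a synthetic multipath route message and returns 0 if only
 * RTNH_F_ONLINK survives sanitizing, -1 otherwise. */
int demo_sanitize(void)
{
	union {
		struct nlmsghdr hdr;
		char buf[256];
	} msg;
	struct nlmsghdr *n = &msg.hdr;
	struct rtmsg *r;
	struct rtattr *rta;
	struct rtnexthop *nh;

	memset(&msg, 0, sizeof(msg));
	n->nlmsg_type = RTM_NEWROUTE;
	n->nlmsg_len = NLMSG_LENGTH(sizeof(*r));

	r = NLMSG_DATA(n);
	r->rtm_flags = RTNH_F_ONLINK | RTNH_F_OFFLOAD | RTNH_F_LINKDOWN;

	/* Append one RTA_MULTIPATH attribute holding two nexthops. */
	rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
	rta->rta_type = RTA_MULTIPATH;
	rta->rta_len = RTA_LENGTH(2 * sizeof(*nh));
	nh = RTA_DATA(rta);
	nh[0].rtnh_len = sizeof(*nh);
	nh[0].rtnh_flags = RTNH_F_ONLINK | RTNH_F_DEAD;
	nh[1].rtnh_len = sizeof(*nh);
	nh[1].rtnh_flags = RTNH_F_OFFLOAD | RTNH_F_LINKDOWN;
	n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(rta->rta_len);

	sanitize_route_flags(n);

	return (r->rtm_flags == RTNH_F_ONLINK &&
		nh[0].rtnh_flags == RTNH_F_ONLINK &&
		nh[1].rtnh_flags == 0) ? 0 : -1;
}
```

Filtering on the save side keeps the image restorable without any new uAPI, which matches the preference expressed earlier in the thread. Whether a real patch should also filter on restore is left open here.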
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 5888492a5257..9c065e2fdef9 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -417,6 +417,9 @@ struct rtnexthop {
 #define RTNH_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN | \
 				 RTNH_F_OFFLOAD | RTNH_F_TRAP)

+/* these flags can't be set by the userspace */
+#define RTNH_REJECT_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN)
+
 /* Macros to handle hexthops */

 #define RTNH_ALIGNTO	4
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 4c0c33e4710d..805f5e05b56d 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -685,7 +685,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
 			return -EINVAL;
 		}

-		if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
+		if (rtnh->rtnh_flags & RTNH_REJECT_MASK) {
 			NL_SET_ERR_MSG(extack,
 				       "Invalid flags for nexthop - can not contain DEAD or LINKDOWN");
 			return -EINVAL;
@@ -1363,7 +1363,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 		goto err_inval;
 	}

-	if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)) {
+	if (cfg->fc_flags & RTNH_REJECT_MASK) {
 		NL_SET_ERR_MSG(extack,
 			       "Invalid rtm_flags - can not contain DEAD or LINKDOWN");
 		goto err_inval;
Introduce the RTNH_REJECT_MASK mask, which contains all rtnh_flags that
cannot be set directly from user space.

This mask will be used in the iproute utility to exclude rtnh_flags that
cannot be restored from an "ip route save" image.

This patch doesn't change kernel behavior at all.

Please take a look at
[PATCH iproute2] ip route: save: exclude rtnh_flags which can't be set

Cc: David Miller <davem@davemloft.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Ido Schimmel <idosch@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 include/uapi/linux/rtnetlink.h | 3 +++
 net/ipv4/fib_semantics.c       | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)