diff mbox series

[2/2] cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options

Message ID 20240416152913.1527166-3-omosnace@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 928 this patch: 928
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 4 maintainers not CCed: pabeni@redhat.com edumazet@google.com kuba@kernel.org dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 939 this patch: 939
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 123 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 40 this patch: 40
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-16--21-00 (tests: 958)

Commit Message

Ondrej Mosnacek April 16, 2024, 3:29 p.m. UTC
As the comment in this function says, the code currently just clears the
CIPSO part with IPOPT_NOP, rather than removing it completely and
trimming the packet. This is inconsistent with the other
cipso_v4_*_delattr() functions and with CALIPSO (IPv6).

Implement the proper option removal to make it consistent and producing
more optimal IP packets when there are CIPSO options set.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
 1 file changed, 59 insertions(+), 30 deletions(-)

Comments

Paul Moore April 16, 2024, 6:39 p.m. UTC | #1
On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> 
> As the comment in this function says, the code currently just clears the
> CIPSO part with IPOPT_NOP, rather than removing it completely and
> trimming the packet. This is inconsistent with the other
> cipso_v4_*_delattr() functions and with CALIPSO (IPv6).

This sentence above implies an equality in handling between those three
cases that doesn't exist.  IPv6 has a radically different approach to
IP options, comparisions between the two aren't really valid.  Similarly,
how we manage IPv4 options on sockets (req or otherwise) is pretty
different from how we manage them on packets, and that is intentional.

Drop the above sentence, or provide a more detailed explanation of the
three aproaches explaining when they can be compared and when they
shouldn't be compared.

> Implement the proper option removal to make it consistent and producing
> more optimal IP packets when there are CIPSO options set.
>
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
>  1 file changed, 59 insertions(+), 30 deletions(-)

Outside of the SELinux test suite, what testing have you done when you
have a Linux box forwarding between a CIPSO network segment and an
unlabeled segment?  I'm specifically interested in stream based protocols
such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
if the skb changes size in one of the callbacks?  Granted it has been
*years* since this code was written (decades?), but if I recall
correctly, at the time it was a big no-no to change the skb size in a
netfilter callback.

> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 75b5e3c35f9bf..c08c6d0262ba8 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1810,6 +1810,34 @@ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
>  	return CIPSO_V4_HDR_LEN + ret_val;
>  }
>  
> +static int cipso_v4_get_actual_opt_len(const unsigned char *data, int len)
> +{
> +	int iter = 0, optlen = 0;
> +
> +	/* determining the new total option length is tricky because of
> +	 * the padding necessary, the only thing i can think to do at
> +	 * this point is walk the options one-by-one, skipping the
> +	 * padding at the end to determine the actual option size and
> +	 * from there we can determine the new total option length
> +	 */
> +	while (iter < len) {
> +		if (data[iter] == IPOPT_END) {
> +			break;
> +		} else if (data[iter] == IPOPT_NOP) {
> +			iter++;
> +		} else {
> +			if (WARN_ON(data[iter + 1] < 2))
> +				iter += 2;
> +			else
> +				iter += data[iter + 1];
> +			optlen = iter;
> +		}
> +	}
> +	if (WARN_ON(optlen > len))
> +		optlen = len;
> +	return optlen;
> +}

See my comments in patch 1/2; they apply here.

>  /**
>   * cipso_v4_sock_setattr - Add a CIPSO option to a socket
>   * @sk: the socket
> @@ -1985,7 +2013,6 @@ static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
>  		u8 cipso_len;
>  		u8 cipso_off;
>  		unsigned char *cipso_ptr;
> -		int iter;
>  		int optlen_new;
>  
>  		cipso_off = opt->opt.cipso - sizeof(struct iphdr);
> @@ -2005,28 +2032,8 @@ static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
>  		memmove(cipso_ptr, cipso_ptr + cipso_len,
>  			opt->opt.optlen - cipso_off - cipso_len);
>  
> -		/* determining the new total option length is tricky because of
> -		 * the padding necessary, the only thing i can think to do at
> -		 * this point is walk the options one-by-one, skipping the
> -		 * padding at the end to determine the actual option size and
> -		 * from there we can determine the new total option length */
> -		iter = 0;
> -		optlen_new = 0;
> -		while (iter < opt->opt.optlen) {
> -			if (opt->opt.__data[iter] == IPOPT_END) {
> -				break;
> -			} else if (opt->opt.__data[iter] == IPOPT_NOP) {
> -				iter++;
> -			} else {
> -				if (WARN_ON(opt->opt.__data[iter + 1] < 2))
> -					iter += 2;
> -				else
> -					iter += opt->opt.__data[iter + 1];
> -				optlen_new = iter;
> -			}
> -		}
> -		if (WARN_ON(optlen_new > opt->opt.optlen))
> -			optlen_new = opt->opt.optlen;
> +		optlen_new = cipso_v4_get_actual_opt_len(opt->opt.__data,
> +							 opt->opt.optlen);
>  		hdr_delta = opt->opt.optlen;
>  		opt->opt.optlen = (optlen_new + 3) & ~3;
>  		hdr_delta -= opt->opt.optlen;
> @@ -2246,7 +2253,8 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
>   */
>  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
>  {
> -	int ret_val;
> +	int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
> +	    hdr_len_delta;

Please keep line lengths under 80-chars whenever possible.  I know Linus
relaxed that requirement a while ago, but I still find the 80-char limit
to be a positive thing.

>  	struct iphdr *iph;
>  	struct ip_options *opt = &IPCB(skb)->opt;
>  	unsigned char *cipso_ptr;
> @@ -2259,16 +2267,37 @@ int cipso_v4_skbuff_delattr(struct sk_buff *skb)
>  	if (ret_val < 0)
>  		return ret_val;
>  
> -	/* the easiest thing to do is just replace the cipso option with noop
> -	 * options since we don't change the size of the packet, although we
> -	 * still need to recalculate the checksum */

Unless you can guarantee that the length change isn't going to have
any adverse effects (even then I would want to know why you are so
confident), I'd feel a lot more comfortable sticking with a
preserve-the-size-and-fill approach.  If you want to change that from
_NOP to _END, I'd be okay with that.

(and if you are talking to who I think you are talking to, I'm guessing
the _NOP to _END swap would likely solve their problem)

>  	iph = ip_hdr(skb);
>  	cipso_ptr = (unsigned char *)iph + opt->cipso;
> -	memset(cipso_ptr, IPOPT_NOOP, cipso_ptr[1]);
> +	cipso_len = cipso_ptr[1];
> +
> +	hdr_len_actual = sizeof(struct iphdr) +
> +			 cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1),
> +						     opt->optlen);
> +	new_hdr_len_actual = hdr_len_actual - cipso_len;
> +	new_hdr_len = (new_hdr_len_actual + 3) & ~3;
> +	hdr_len_delta = (iph->ihl << 2) - new_hdr_len;
> +
> +	/* 1. shift any options after CIPSO to the left */
> +	memmove(cipso_ptr, cipso_ptr + cipso_len,
> +		new_hdr_len_actual - opt->cipso);
> +	/* 2. move the whole IP header to its new place */
> +	memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual);
> +	/* 3. adjust the skb layout */
> +	skb_pull(skb, hdr_len_delta);
> +	skb_reset_network_header(skb);
> +	iph = ip_hdr(skb);
> +	/* 4. re-fill new padding with IPOPT_END (may now be longer) */
> +	memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END,
> +	       new_hdr_len - new_hdr_len_actual);
> +	opt->optlen -= hdr_len_delta;
>  	opt->cipso = 0;
>  	opt->is_changed = 1;
> -
> +	if (hdr_len_delta != 0) {
> +		iph->ihl = new_hdr_len >> 2;
> +		iph_set_totlen(iph, skb->len);
> +	}
>  	ip_send_check(iph);
>  
>  	return 0;
> -- 
> 2.44.0

--
paul-moore.com
Ondrej Mosnacek April 17, 2024, 1:03 p.m. UTC | #2
On Tue, Apr 16, 2024 at 8:39 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > As the comment in this function says, the code currently just clears the
> > CIPSO part with IPOPT_NOP, rather than removing it completely and
> > trimming the packet. This is inconsistent with the other
> > cipso_v4_*_delattr() functions and with CALIPSO (IPv6).
>
> This sentence above implies an equality in handling between those three
> cases that doesn't exist.  IPv6 has a radically different approach to
> IP options, comparisions between the two aren't really valid.

I don't think it's that radically different. Look at
calipso_skbuff_delattr() and you will see that it already does
something very similar as cipso_v4_skbuff_delattr() after this patch.
If the CALIPSO options are the only thing in the hopopt header, it
removes it altogether. If there are other options, it removes the part
with the CALIPSO options - here it is less pedantic and replaces it
with up to 7-byte padding if the length of the CALIPSO part is not
8-byte aligned, presumably to avoid having to move the subsequent
options (if any) and adjust the padding at the end.

> Similarly,
> how we manage IPv4 options on sockets (req or otherwise) is pretty
> different from how we manage them on packets, and that is intentional.

Perhaps, but that is an implementation detail to the user. The
resulting packet content should ideally be the same regardless of
which method of injecting the options is used, when possible.

> Drop the above sentence, or provide a more detailed explanation of the
> three aproaches explaining when they can be compared and when they
> shouldn't be compared.

I'm not sure why you think they shouldn't be compared, but I can
surely be more detailed in making my case there.

> > Implement the proper option removal to make it consistent and producing
> > more optimal IP packets when there are CIPSO options set.
> >
> > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
> >  1 file changed, 59 insertions(+), 30 deletions(-)
>
> Outside of the SELinux test suite, what testing have you done when you
> have a Linux box forwarding between a CIPSO network segment and an
> unlabeled segment?  I'm specifically interested in stream based protocols
> such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
> if the skb changes size in one of the callbacks?  Granted it has been
> *years* since this code was written (decades?), but if I recall
> correctly, at the time it was a big no-no to change the skb size in a
> netfilter callback.

I didn't test that, TBH. But all of cipso_v4_skbuff_setattr(),
calipso_skbuff_setattr(), and calipso_skbuff_delattr() already do
skb_push()/skb_pull(), so they would all be broken if that is (still?)
true. And this new cipso_v4_skbuff_delattr() doesn't do anything
w.r.t. the skb and the IP header that those wouldn't do already.

[...]
> > @@ -2246,7 +2253,8 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
> >   */
> >  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> >  {
> > -     int ret_val;
> > +     int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
> > +         hdr_len_delta;
>
> Please keep line lengths under 80-chars whenever possible.  I know Linus
> relaxed that requirement a while ago, but I still find the 80-char limit
> to be a positive thing.

I believe the line is exactly 80 characters, so still within the
limit. Or is it < 80 instead of <= 80? Does it really matter?

>
> >       struct iphdr *iph;
> >       struct ip_options *opt = &IPCB(skb)->opt;
> >       unsigned char *cipso_ptr;
> > @@ -2259,16 +2267,37 @@ int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> >       if (ret_val < 0)
> >               return ret_val;
> >
> > -     /* the easiest thing to do is just replace the cipso option with noop
> > -      * options since we don't change the size of the packet, although we
> > -      * still need to recalculate the checksum */
>
> Unless you can guarantee that the length change isn't going to have
> any adverse effects (even then I would want to know why you are so
> confident), I'd feel a lot more comfortable sticking with a
> preserve-the-size-and-fill approach.  If you want to change that from
> _NOP to _END, I'd be okay with that.
>
> (and if you are talking to who I think you are talking to, I'm guessing
> the _NOP to _END swap would likely solve their problem)

So, to reveal all the cards, the issue that has prompted me to send
this patch is that a user may have a router configured to drop packets
containing any IP options [1][2] and may expect a Linux host with
NetLabel configured as unlabeled for a target IP address to
output/forward packets without IP options if CIPSO was the only option
present before NetLabel processing (such that it can then pass through
the strict router(s)).

Padding with IPOPT_END *might* solve this particular case, but I'm not
sure if the routers would really interpret such padding as "no
options"... I'll try to ask the reporter to test such a patch and
we'll see. But still, I don't yet see a convincing reason to not go
all the way and make sure we send optimally-sized packets here.

Side note: We could only clear CIPSO with IPOPT_END if it's the last
option in the header, but that limitation doesn't really matter for
the use case described above.

[1] https://www.stigviewer.com/stig/juniper_router_rtr/2019-09-27/finding/V-90937
[2] https://www.stigviewer.com/stig/cisco_ios_xe_router_rtr/2021-03-26/finding/V-217001

>
> >       iph = ip_hdr(skb);
> >       cipso_ptr = (unsigned char *)iph + opt->cipso;
> > -     memset(cipso_ptr, IPOPT_NOOP, cipso_ptr[1]);
> > +     cipso_len = cipso_ptr[1];
> > +
> > +     hdr_len_actual = sizeof(struct iphdr) +
> > +                      cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1),
> > +                                                  opt->optlen);
> > +     new_hdr_len_actual = hdr_len_actual - cipso_len;
> > +     new_hdr_len = (new_hdr_len_actual + 3) & ~3;
> > +     hdr_len_delta = (iph->ihl << 2) - new_hdr_len;
> > +
> > +     /* 1. shift any options after CIPSO to the left */
> > +     memmove(cipso_ptr, cipso_ptr + cipso_len,
> > +             new_hdr_len_actual - opt->cipso);
> > +     /* 2. move the whole IP header to its new place */
> > +     memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual);
> > +     /* 3. adjust the skb layout */
> > +     skb_pull(skb, hdr_len_delta);
> > +     skb_reset_network_header(skb);
> > +     iph = ip_hdr(skb);
> > +     /* 4. re-fill new padding with IPOPT_END (may now be longer) */
> > +     memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END,
> > +            new_hdr_len - new_hdr_len_actual);
> > +     opt->optlen -= hdr_len_delta;
> >       opt->cipso = 0;
> >       opt->is_changed = 1;
> > -
> > +     if (hdr_len_delta != 0) {
> > +             iph->ihl = new_hdr_len >> 2;
> > +             iph_set_totlen(iph, skb->len);
> > +     }
> >       ip_send_check(iph);
> >
> >       return 0;
> > --
> > 2.44.0
>
> --
> paul-moore.com
>

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.
Paul Moore April 25, 2024, 9:48 p.m. UTC | #3
On Wed, Apr 17, 2024 at 9:03 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Tue, Apr 16, 2024 at 8:39 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > >
> > > As the comment in this function says, the code currently just clears the
> > > CIPSO part with IPOPT_NOP, rather than removing it completely and
> > > trimming the packet. This is inconsistent with the other
> > > cipso_v4_*_delattr() functions and with CALIPSO (IPv6).
> >
> > This sentence above implies an equality in handling between those three
> > cases that doesn't exist.  IPv6 has a radically different approach to
> > IP options, comparisions between the two aren't really valid.
>
> I don't think it's that radically different.

They are very different in my mind.  The IPv4 vs IPv6 option format
and handling should be fairly obvious and I'm sure there are plenty of
things written that describe the differences and motivations in
excruciating detail so I'm not going to bother trying to do that here;
as usual, Google is your friend.  I will admit that the skbuff vs
socket-based labeling differences are a bit more subtle, but I believe
if you look at how the packets are labeled in the two approaches as
well as how they are managed and hooked into the LSMs you will start
to get a better idea.  If that doesn't convince you that these three
cases are significantly different, I'm not sure what else I can say
other than we have a difference of opinion.  Regardless, I stand by my
original comment that I don't like the text you chose and would like
you to remove or change it.

> > > Implement the proper option removal to make it consistent and producing
> > > more optimal IP packets when there are CIPSO options set.
> > >
> > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > ---
> > >  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
> > >  1 file changed, 59 insertions(+), 30 deletions(-)
> >
> > Outside of the SELinux test suite, what testing have you done when you
> > have a Linux box forwarding between a CIPSO network segment and an
> > unlabeled segment?  I'm specifically interested in stream based protocols
> > such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
> > if the skb changes size in one of the callbacks?  Granted it has been
> > *years* since this code was written (decades?), but if I recall
> > correctly, at the time it was a big no-no to change the skb size in a
> > netfilter callback.
>
> I didn't test that, TBH. But all of cipso_v4_skbuff_setattr(),
> calipso_skbuff_setattr(), and calipso_skbuff_delattr() already do
> skb_push()/skb_pull(), so they would all be broken if that is (still?)
> true. And this new cipso_v4_skbuff_delattr() doesn't do anything
> w.r.t. the skb and the IP header that those wouldn't do already.

Fair point on skbuff size changes in netfilter and
cipso_v4_skbuff_setattr(), that wasn't part of the original
NetLabel/CIPSO support and I forgot about that aspect.  On the other
hand, I believe cipso_v4_skbuff_delattr() was part of the original
work and used the NOOP hack both to preserve the packet length in the
netfilter chain and to help ensure a consistent IP header overhead on
both sides of a forwarding CIPSO<->unlabeled labeling/access control
system.  Which brings me around to the reason why I asked about
testing; I think we need to confirm that nothing bad happens to
bidirectional stream-based connections, e.g. TCP, when crossing over a
CIPSO/unlabeled network boundary and the IP overhead changes.  It's
probably okay, but I would like to see that you've tested it with a
couple different client OSes and everything works as expected.

> [...]
> > > @@ -2246,7 +2253,8 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
> > >   */
> > >  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> > >  {
> > > -     int ret_val;
> > > +     int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
> > > +         hdr_len_delta;
> >
> > Please keep line lengths under 80-chars whenever possible.  I know Linus
> > relaxed that requirement a while ago, but I still find the 80-char limit
> > to be a positive thing.
>
> I believe the line is exactly 80 characters, so still within the
> limit. Or is it < 80 instead of <= 80? Does it really matter?

I thought I saw it wrap on my terminal when reviewing the code, maybe
it was just the newlink wrapping that I saw.  As long as it is <= 80
I'm okay with it.

> >
> > >       struct iphdr *iph;
> > >       struct ip_options *opt = &IPCB(skb)->opt;
> > >       unsigned char *cipso_ptr;
> > > @@ -2259,16 +2267,37 @@ int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> > >       if (ret_val < 0)
> > >               return ret_val;
> > >
> > > -     /* the easiest thing to do is just replace the cipso option with noop
> > > -      * options since we don't change the size of the packet, although we
> > > -      * still need to recalculate the checksum */
> >
> > Unless you can guarantee that the length change isn't going to have
> > any adverse effects (even then I would want to know why you are so
> > confident), I'd feel a lot more comfortable sticking with a
> > preserve-the-size-and-fill approach.  If you want to change that from
> > _NOP to _END, I'd be okay with that.
> >
> > (and if you are talking to who I think you are talking to, I'm guessing
> > the _NOP to _END swap would likely solve their problem)
>
> So, to reveal all the cards, the issue that has prompted me to send
> this patch is that a user may have a router configured to drop packets
> containing any IP options [1][2] and may expect a Linux host with
> NetLabel configured as unlabeled for a target IP address to
> output/forward packets without IP options if CIPSO was the only option
> present before NetLabel processing (such that it can then pass through
> the strict router(s)).
>
> Padding with IPOPT_END *might* solve this particular case, but I'm not
> sure if the routers would really interpret such padding as "no
> options"... I'll try to ask the reporter to test such a patch and
> we'll see. But still, I don't yet see a convincing reason to not go
> all the way and make sure we send optimally-sized packets here.

I'm about 99.5% certain we are talking about the same reporter,
although I was missing the detail about an intermediate
switching/routing node; the problem report I saw was that there was
simply a black-box device on the network that was dropping packets due
to the presence of NOOP options.  IMO, the original RFCs are a little
too vague in this area, but it doesn't really matter if an
intermediate node is dropping the packets.

> Side note: We could only clear CIPSO with IPOPT_END if it's the last
> option in the header ...

Obviously, I kinda assumed anyone following along would understand that.

> ... but that limitation doesn't really matter for
> the use case described above.
Ondrej Mosnacek May 14, 2024, 11:29 a.m. UTC | #4
On Thu, Apr 25, 2024 at 11:48 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 17, 2024 at 9:03 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > On Tue, Apr 16, 2024 at 8:39 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > > >
> > > > As the comment in this function says, the code currently just clears the
> > > > CIPSO part with IPOPT_NOP, rather than removing it completely and
> > > > trimming the packet. This is inconsistent with the other
> > > > cipso_v4_*_delattr() functions and with CALIPSO (IPv6).
> > >
> > > This sentence above implies an equality in handling between those three
> > > cases that doesn't exist.  IPv6 has a radically different approach to
> > > IP options, comparisions between the two aren't really valid.
> >
> > I don't think it's that radically different.
>
> They are very different in my mind.  The IPv4 vs IPv6 option format
> and handling should be fairly obvious and I'm sure there are plenty of
> things written that describe the differences and motivations in
> excruciating detail so I'm not going to bother trying to do that here;
> as usual, Google is your friend.  I will admit that the skbuff vs
> socket-based labeling differences are a bit more subtle, but I believe
> if you look at how the packets are labeled in the two approaches as
> well as how they are managed and hooked into the LSMs you will start
> to get a better idea.  If that doesn't convince you that these three
> cases are significantly different, I'm not sure what else I can say
> other than we have a difference of opinion.  Regardless, I stand by my
> original comment that I don't like the text you chose and would like
> you to remove or change it.

Ok, I amended this part for v2 to hopefully better express what I'm
alluding to. I also added a paragraph about the routers dropping
packets with IP options, which explains the motivation better, anyway.

> > > > Implement the proper option removal to make it consistent and producing
> > > > more optimal IP packets when there are CIPSO options set.
> > > >
> > > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > > ---
> > > >  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
> > > >  1 file changed, 59 insertions(+), 30 deletions(-)
> > >
> > > Outside of the SELinux test suite, what testing have you done when you
> > > have a Linux box forwarding between a CIPSO network segment and an
> > > unlabeled segment?  I'm specifically interested in stream based protocols
> > > such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
> > > if the skb changes size in one of the callbacks?  Granted it has been
> > > *years* since this code was written (decades?), but if I recall
> > > correctly, at the time it was a big no-no to change the skb size in a
> > > netfilter callback.
> >
> > I didn't test that, TBH. But all of cipso_v4_skbuff_setattr(),
> > calipso_skbuff_setattr(), and calipso_skbuff_delattr() already do
> > skb_push()/skb_pull(), so they would all be broken if that is (still?)
> > true. And this new cipso_v4_skbuff_delattr() doesn't do anything
> > w.r.t. the skb and the IP header that those wouldn't do already.
>
> Fair point on skbuff size changes in netfilter and
> cipso_v4_skbuff_setattr(), that wasn't part of the original
> NetLabel/CIPSO support and I forgot about that aspect.  On the other
> hand, I believe cipso_v4_skbuff_delattr() was part of the original
> work and used the NOOP hack both to preserve the packet length in the
> netfilter chain and to help ensure a consistent IP header overhead on
> both sides of a forwarding CIPSO<->unlabeled labeling/access control
> system.  Which brings me around to the reason why I asked about
> testing; I think we need to confirm that nothing bad happens to
> bidirectional stream-based connections, e.g. TCP, when crossing over a
> CIPSO/unlabeled network boundary and the IP overhead changes.  It's
> probably okay, but I would like to see that you've tested it with a
> couple different client OSes and everything works as expected.

I tried to test what you describe - hopefully I got close enough:

My test setup has 3 VMs (running Fedora 39 from the stock qcow2 image)
A, B, and R, connected via separate links as A <--> R <--> B, where R
acts as a router between A and B (net.ipv4.ip_forward is set to 1 on
R, and the appropriate routes are set on A, B, R).

The A <--> R link has subnet 10.123.123.0/24, A having address
10.123.123.2 and R having 10.123.123.1.
The B <--> R link has subnet 10.123.124.0/24, B having address
10.123.124.2 and R having 10.123.124.1.

The links are implemented as GRE tunnels over the main network that is
shared between the VMs.

Netlabel configuration on A:
netlabelctl cipsov4 add pass doi:16 tags:5
netlabelctl map del default
netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
netlabelctl map add default address:::/0 protocol:unlbl
netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
netlabelctl map add default address:10.123.124.0/24 protocol:cipsov4,16

Netlabel configuration on R:
netlabelctl cipsov4 add pass doi:16 tags:5
netlabelctl map del default
netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
netlabelctl map add default address:::/0 protocol:unlbl
netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16

B has no netlabel configured.

(I.e. A tries to send CIPSO-labeled packets to B, but R treats the
10.123.124.0/24 network as unlabeled, so should strip/add the CIPSO
label when forwarding between A and B.)

A basic TCP connection worked just fine in both directions with and
without these patches applied (I installed the patched kernel on all
machines, though it should only matter on machine R). I ignored the
actual labels/CIPSO content - i.e. I didn't change the default SELinux
policy and put SELinux into permissive mode on machines A and R.

Capturing the packets on R showed the following IP option content
without the patches:
A -> R: CIPSO
R -> B: NOPs
B -> R: (empty)
R -> A: CIPSO

With the patches this changed to:
A -> R: CIPSO
R -> B: (empty)
B -> R: (empty)
R -> A: CIPSO

Is this convincing enough or do you have different scenarios in mind?
Paul Moore May 17, 2024, 7:49 p.m. UTC | #5
On Tue, May 14, 2024 at 7:29 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Thu, Apr 25, 2024 at 11:48 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Wed, Apr 17, 2024 at 9:03 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > > On Tue, Apr 16, 2024 at 8:39 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > > > >
> > > > > As the comment in this function says, the code currently just clears the
> > > > > CIPSO part with IPOPT_NOP, rather than removing it completely and
> > > > > trimming the packet. This is inconsistent with the other
> > > > > cipso_v4_*_delattr() functions and with CALIPSO (IPv6).
> > > >
> > > > This sentence above implies an equality in handling between those three
> > > > cases that doesn't exist.  IPv6 has a radically different approach to
> > > > IP options, comparisions between the two aren't really valid.
> > >
> > > I don't think it's that radically different.
> >
> > They are very different in my mind.  The IPv4 vs IPv6 option format
> > and handling should be fairly obvious and I'm sure there are plenty of
> > things written that describe the differences and motivations in
> > excruciating detail so I'm not going to bother trying to do that here;
> > as usual, Google is your friend.  I will admit that the skbuff vs
> > socket-based labeling differences are a bit more subtle, but I believe
> > if you look at how the packets are labeled in the two approaches as
> > well as how they are managed and hooked into the LSMs you will start
> > to get a better idea.  If that doesn't convince you that these three
> > cases are significantly different, I'm not sure what else I can say
> > other than we have a difference of opinion.  Regardless, I stand by my
> > original comment that I don't like the text you chose and would like
> > you to remove or change it.
>
> Ok, I amended this part for v2 to hopefully better express what I'm
> alluding to. I also added a paragraph about the routers dropping
> packets with IP options, which explains the motivation better, anyway.

Okay, I'll refrain from further comment until I see the v2 patch.

> I tried to test what you describe - hopefully I got close enough:
>
> My test setup has 3 VMs (running Fedora 39 from the stock qcow2 image)
> A, B, and R, connected via separate links as A <--> R <--> B, where R
> acts as a router between A and B (net.ipv4.ip_forward is set to 1 on
> R, and the appropriate routes are set on A, B, R).
>
> The A <--> R link has subnet 10.123.123.0/24, A having address
> 10.123.123.2 and R having 10.123.123.1.
> The B <--> R link has subnet 10.123.124.0/24, B having address
> 10.123.124.2 and R having 10.123.124.1.
>
> The links are implemented as GRE tunnels over the main network that is
> shared between the VMs.
>
> Netlabel configuration on A:
> netlabelctl cipsov4 add pass doi:16 tags:5
> netlabelctl map del default
> netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
> netlabelctl map add default address:::/0 protocol:unlbl
> netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
> netlabelctl map add default address:10.123.124.0/24 protocol:cipsov4,16
>
> Netlabel configuration on R:
> netlabelctl cipsov4 add pass doi:16 tags:5
> netlabelctl map del default
> netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
> netlabelctl map add default address:::/0 protocol:unlbl
> netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
>
> B has no netlabel configured.
>
> (I.e. A tries to send CIPSO-labeled packets to B, but R treats the
> 10.123.124.0/24 network as unlabeled, so should strip/add the CIPSO
> label when forwarding between A and B.)
>
> A basic TCP connection worked just fine in both directions with and
> without these patches applied (I installed the patched kernel on all
> machines, though it should only matter on machine R). I ignored the
> actual labels/CIPSO content - i.e. I didn't change the default SELinux
> policy and put SELinux into permissive mode on machines A and R.
>
> Capturing the packets on R showed the following IP option content
> without the patches:
> A -> R: CIPSO
> R -> B: NOPs
> B -> R: (empty)
> R -> A: CIPSO
>
> With the patches this changed to:
> A -> R: CIPSO
> R -> B: (empty)
> B -> R: (empty)
> R -> A: CIPSO
>
> Is this convincing enough or do you have different scenarios in mind?

Thanks for verifying your patch, the methodology looks good to me, but
as I mentioned in my previous email I would feel much better if you
verified this with a different client OS/stack.  Do you have access to
Windows/MacOS/BSD/non-Linux system you could use in place of B in your
test above?
Ondrej Mosnacek June 7, 2024, 3:47 p.m. UTC | #6
On Fri, May 17, 2024 at 9:49 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, May 14, 2024 at 7:29 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > I tried to test what you describe - hopefully I got close enough:
> >
> > My test setup has 3 VMs (running Fedora 39 from the stock qcow2 image)
> > A, B, and R, connected via separate links as A <--> R <--> B, where R
> > acts as a router between A and B (net.ipv4.ip_forward is set to 1 on
> > R, and the appropriate routes are set on A, B, R).
> >
> > The A <--> R link has subnet 10.123.123.0/24, A having address
> > 10.123.123.2 and R having 10.123.123.1.
> > The B <--> R link has subnet 10.123.124.0/24, B having address
> > 10.123.124.2 and R having 10.123.124.1.
> >
> > The links are implemented as GRE tunnels over the main network that is
> > shared between the VMs.
> >
> > Netlabel configuration on A:
> > netlabelctl cipsov4 add pass doi:16 tags:5
> > netlabelctl map del default
> > netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
> > netlabelctl map add default address:::/0 protocol:unlbl
> > netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
> > netlabelctl map add default address:10.123.124.0/24 protocol:cipsov4,16
> >
> > Netlabel configuration on R:
> > netlabelctl cipsov4 add pass doi:16 tags:5
> > netlabelctl map del default
> > netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
> > netlabelctl map add default address:::/0 protocol:unlbl
> > netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
> >
> > B has no netlabel configured.
> >
> > (I.e. A tries to send CIPSO-labeled packets to B, but R treats the
> > 10.123.124.0/24 network as unlabeled, so should strip/add the CIPSO
> > label when forwarding between A and B.)
> >
> > A basic TCP connection worked just fine in both directions with and
> > without these patches applied (I installed the patched kernel on all
> > machines, though it should only matter on machine R). I ignored the
> > actual labels/CIPSO content - i.e. I didn't change the default SELinux
> > policy and put SELinux into permissive mode on machines A and R.
> >
> > Capturing the packets on R showed the following IP option content
> > without the patches:
> > A -> R: CIPSO
> > R -> B: NOPs
> > B -> R: (empty)
> > R -> A: CIPSO
> >
> > With the patches this changed to:
> > A -> R: CIPSO
> > R -> B: (empty)
> > B -> R: (empty)
> > R -> A: CIPSO
> >
> > Is this convincing enough or do you have different scenarios in mind?
>
> Thanks for verifying your patch, the methodology looks good to me, but
> as I mentioned in my previous email I would feel much better if you
> verified this with a different client OS/stack.  Do you have access to
> Windows/MacOS/BSD/non-Linux system you could use in place of B in your
> test above?

I don't think I can easily plug that into the framework I used for the
testing (there doesn't seem to be a convenient way to install a
FreeBSD VM without manual interaction and the rest is proprietary).

I still don't quite understand what exactly you expect to break under
that scenario and why - could you elaborate on that? If anything, I'd
expect the IP header growing along the path (which already happens
pretty much by design in the opposite direction) to be more likely to
cause an issue.

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.
Paul Moore June 11, 2024, 9:01 p.m. UTC | #7
On Fri, Jun 7, 2024 at 11:47 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Fri, May 17, 2024 at 9:49 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Tue, May 14, 2024 at 7:29 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > > I tried to test what you describe - hopefully I got close enough:
> > >
> > > My test setup has 3 VMs (running Fedora 39 from the stock qcow2 image)
> > > A, B, and R, connected via separate links as A <--> R <--> B, where R
> > > acts as a router between A and B (net.ipv4.ip_forward is set to 1 on
> > > R, and the appropriate routes are set on A, B, R).
> > >
> > > The A <--> R link has subnet 10.123.123.0/24, A having address
> > > 10.123.123.2 and R having 10.123.123.1.
> > > The B <--> R link has subnet 10.123.124.0/24, B having address
> > > 10.123.124.2 and R having 10.123.124.1.
> > >
> > > The links are implemented as GRE tunnels over the main network that is
> > > shared between the VMs.
> > >
> > > Netlabel configuration on A:
> > > netlabelctl cipsov4 add pass doi:16 tags:5
> > > netlabelctl map del default
> > > netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
> > > netlabelctl map add default address:::/0 protocol:unlbl
> > > netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
> > > netlabelctl map add default address:10.123.124.0/24 protocol:cipsov4,16
> > >
> > > Netlabel configuration on R:
> > > netlabelctl cipsov4 add pass doi:16 tags:5
> > > netlabelctl map del default
> > > netlabelctl map add default address:0.0.0.0/0 protocol:unlbl
> > > netlabelctl map add default address:::/0 protocol:unlbl
> > > netlabelctl map add default address:10.123.123.0/24 protocol:cipsov4,16
> > >
> > > B has no netlabel configured.
> > >
> > > (I.e. A tries to send CIPSO-labeled packets to B, but R treats the
> > > 10.123.124.0/24 network as unlabeled, so should strip/add the CIPSO
> > > label when forwarding between A and B.)
> > >
> > > A basic TCP connection worked just fine in both directions with and
> > > without these patches applied (I installed the patched kernel on all
> > > machines, though it should only matter on machine R). I ignored the
> > > actual labels/CIPSO content - i.e. I didn't change the default SELinux
> > > policy and put SELinux into permissive mode on machines A and R.
> > >
> > > Capturing the packets on R showed the following IP option content
> > > without the patches:
> > > A -> R: CIPSO
> > > R -> B: NOPs
> > > B -> R: (empty)
> > > R -> A: CIPSO
> > >
> > > With the patches this changed to:
> > > A -> R: CIPSO
> > > R -> B: (empty)
> > > B -> R: (empty)
> > > R -> A: CIPSO
> > >
> > > Is this convincing enough or do you have different scenarios in mind?
> >
> > Thanks for verifying your patch, the methodology looks good to me, but
> > as I mentioned in my previous email I would feel much better if you
> > verified this with a different client OS/stack.  Do you have access to
> > Windows/MacOS/BSD/non-Linux system you could use in place of B in your
> > test above?
>
> I don't think I can easily plug that into the framework I used for the
> testing (there doesn't seem to be a convenient way to install a
> FreeBSD VM without manual interaction and the rest is proprietary).

Surely you can perform a manual unit test with some VMs on your local
machine if whatever test automation you are using doesn't support
this.

> I still don't quite understand what exactly you expect to break under
> that scenario and why - could you elaborate on that? If anything, I'd
> expect the IP header growing along the path (which already happens
> pretty much by design in the opposite direction) to be more likely to
> cause an issue.

I'm concerned about potential oddities caused by the changes in IP
header sizes while the packet is in flight.  Every OS's network stack
is a bit different and I don't think it is too much to ask to test at
least one non-Linux network stack as a client.
diff mbox series

Patch

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 75b5e3c35f9bf..c08c6d0262ba8 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1810,6 +1810,34 @@  static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
 	return CIPSO_V4_HDR_LEN + ret_val;
 }
 
+static int cipso_v4_get_actual_opt_len(const unsigned char *data, int len)
+{
+	int iter = 0, optlen = 0;
+
+	/* determining the new total option length is tricky because of
+	 * the padding necessary, the only thing i can think to do at
+	 * this point is walk the options one-by-one, skipping the
+	 * padding at the end to determine the actual option size and
+	 * from there we can determine the new total option length
+	 */
+	while (iter < len) {
+		if (data[iter] == IPOPT_END) {
+			break;
+		} else if (data[iter] == IPOPT_NOP) {
+			iter++;
+		} else {
+			if (WARN_ON(data[iter + 1] < 2))
+				iter += 2;
+			else
+				iter += data[iter + 1];
+			optlen = iter;
+		}
+	}
+	if (WARN_ON(optlen > len))
+		optlen = len;
+	return optlen;
+}
+
 /**
  * cipso_v4_sock_setattr - Add a CIPSO option to a socket
  * @sk: the socket
@@ -1985,7 +2013,6 @@  static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
 		u8 cipso_len;
 		u8 cipso_off;
 		unsigned char *cipso_ptr;
-		int iter;
 		int optlen_new;
 
 		cipso_off = opt->opt.cipso - sizeof(struct iphdr);
@@ -2005,28 +2032,8 @@  static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
 		memmove(cipso_ptr, cipso_ptr + cipso_len,
 			opt->opt.optlen - cipso_off - cipso_len);
 
-		/* determining the new total option length is tricky because of
-		 * the padding necessary, the only thing i can think to do at
-		 * this point is walk the options one-by-one, skipping the
-		 * padding at the end to determine the actual option size and
-		 * from there we can determine the new total option length */
-		iter = 0;
-		optlen_new = 0;
-		while (iter < opt->opt.optlen) {
-			if (opt->opt.__data[iter] == IPOPT_END) {
-				break;
-			} else if (opt->opt.__data[iter] == IPOPT_NOP) {
-				iter++;
-			} else {
-				if (WARN_ON(opt->opt.__data[iter + 1] < 2))
-					iter += 2;
-				else
-					iter += opt->opt.__data[iter + 1];
-				optlen_new = iter;
-			}
-		}
-		if (WARN_ON(optlen_new > opt->opt.optlen))
-			optlen_new = opt->opt.optlen;
+		optlen_new = cipso_v4_get_actual_opt_len(opt->opt.__data,
+							 opt->opt.optlen);
 		hdr_delta = opt->opt.optlen;
 		opt->opt.optlen = (optlen_new + 3) & ~3;
 		hdr_delta -= opt->opt.optlen;
@@ -2246,7 +2253,8 @@  int cipso_v4_skbuff_setattr(struct sk_buff *skb,
  */
 int cipso_v4_skbuff_delattr(struct sk_buff *skb)
 {
-	int ret_val;
+	int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
+	    hdr_len_delta;
 	struct iphdr *iph;
 	struct ip_options *opt = &IPCB(skb)->opt;
 	unsigned char *cipso_ptr;
@@ -2259,16 +2267,37 @@  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
 	if (ret_val < 0)
 		return ret_val;
 
-	/* the easiest thing to do is just replace the cipso option with noop
-	 * options since we don't change the size of the packet, although we
-	 * still need to recalculate the checksum */
-
 	iph = ip_hdr(skb);
 	cipso_ptr = (unsigned char *)iph + opt->cipso;
-	memset(cipso_ptr, IPOPT_NOOP, cipso_ptr[1]);
+	cipso_len = cipso_ptr[1];
+
+	hdr_len_actual = sizeof(struct iphdr) +
+			 cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1),
+						     opt->optlen);
+	new_hdr_len_actual = hdr_len_actual - cipso_len;
+	new_hdr_len = (new_hdr_len_actual + 3) & ~3;
+	hdr_len_delta = (iph->ihl << 2) - new_hdr_len;
+
+	/* 1. shift any options after CIPSO to the left */
+	memmove(cipso_ptr, cipso_ptr + cipso_len,
+		new_hdr_len_actual - opt->cipso);
+	/* 2. move the whole IP header to its new place */
+	memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual);
+	/* 3. adjust the skb layout */
+	skb_pull(skb, hdr_len_delta);
+	skb_reset_network_header(skb);
+	iph = ip_hdr(skb);
+	/* 4. re-fill new padding with IPOPT_END (may now be longer) */
+	memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END,
+	       new_hdr_len - new_hdr_len_actual);
+
+	opt->optlen -= hdr_len_delta;
 	opt->cipso = 0;
 	opt->is_changed = 1;
-
+	if (hdr_len_delta != 0) {
+		iph->ihl = new_hdr_len >> 2;
+		iph_set_totlen(iph, skb->len);
+	}
 	ip_send_check(iph);
 
 	return 0;