diff mbox series

[2/2] cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options

Message ID 20240416152913.1527166-3-omosnace@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 928 this patch: 928
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 4 maintainers not CCed: pabeni@redhat.com edumazet@google.com kuba@kernel.org dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 939 this patch: 939
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 123 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 40 this patch: 40
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-16--21-00 (tests: 958)

Commit Message

Ondrej Mosnacek April 16, 2024, 3:29 p.m. UTC
As the comment in this function says, the code currently just clears the
CIPSO part with IPOPT_NOP, rather than removing it completely and
trimming the packet. This is inconsistent with the other
cipso_v4_*_delattr() functions and with CALIPSO (IPv6).

Implement the proper option removal to make it consistent and producing
more optimal IP packets when there are CIPSO options set.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
 1 file changed, 59 insertions(+), 30 deletions(-)

Comments

Paul Moore April 16, 2024, 6:39 p.m. UTC | #1
On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> 
> As the comment in this function says, the code currently just clears the
> CIPSO part with IPOPT_NOP, rather than removing it completely and
> trimming the packet. This is inconsistent with the other
> cipso_v4_*_delattr() functions and with CALIPSO (IPv6).

This sentence above implies an equality in handling between those three
cases that doesn't exist.  IPv6 has a radically different approach to
IP options, comparisions between the two aren't really valid.  Similarly,
how we manage IPv4 options on sockets (req or otherwise) is pretty
different from how we manage them on packets, and that is intentional.

Drop the above sentence, or provide a more detailed explanation of the
three aproaches explaining when they can be compared and when they
shouldn't be compared.

> Implement the proper option removal to make it consistent and producing
> more optimal IP packets when there are CIPSO options set.
>
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
>  1 file changed, 59 insertions(+), 30 deletions(-)

Outside of the SELinux test suite, what testing have you done when you
have a Linux box forwarding between a CIPSO network segment and an
unlabeled segment?  I'm specifically interested in stream based protocols
such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
if the skb changes size in one of the callbacks?  Granted it has been
*years* since this code was written (decades?), but if I recall
correctly, at the time it was a big no-no to change the skb size in a
netfilter callback.

> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 75b5e3c35f9bf..c08c6d0262ba8 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1810,6 +1810,34 @@ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
>  	return CIPSO_V4_HDR_LEN + ret_val;
>  }
>  
> +static int cipso_v4_get_actual_opt_len(const unsigned char *data, int len)
> +{
> +	int iter = 0, optlen = 0;
> +
> +	/* determining the new total option length is tricky because of
> +	 * the padding necessary, the only thing i can think to do at
> +	 * this point is walk the options one-by-one, skipping the
> +	 * padding at the end to determine the actual option size and
> +	 * from there we can determine the new total option length
> +	 */
> +	while (iter < len) {
> +		if (data[iter] == IPOPT_END) {
> +			break;
> +		} else if (data[iter] == IPOPT_NOP) {
> +			iter++;
> +		} else {
> +			if (WARN_ON(data[iter + 1] < 2))
> +				iter += 2;
> +			else
> +				iter += data[iter + 1];
> +			optlen = iter;
> +		}
> +	}
> +	if (WARN_ON(optlen > len))
> +		optlen = len;
> +	return optlen;
> +}

See my comments in patch 1/2; they apply here.

>  /**
>   * cipso_v4_sock_setattr - Add a CIPSO option to a socket
>   * @sk: the socket
> @@ -1985,7 +2013,6 @@ static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
>  		u8 cipso_len;
>  		u8 cipso_off;
>  		unsigned char *cipso_ptr;
> -		int iter;
>  		int optlen_new;
>  
>  		cipso_off = opt->opt.cipso - sizeof(struct iphdr);
> @@ -2005,28 +2032,8 @@ static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
>  		memmove(cipso_ptr, cipso_ptr + cipso_len,
>  			opt->opt.optlen - cipso_off - cipso_len);
>  
> -		/* determining the new total option length is tricky because of
> -		 * the padding necessary, the only thing i can think to do at
> -		 * this point is walk the options one-by-one, skipping the
> -		 * padding at the end to determine the actual option size and
> -		 * from there we can determine the new total option length */
> -		iter = 0;
> -		optlen_new = 0;
> -		while (iter < opt->opt.optlen) {
> -			if (opt->opt.__data[iter] == IPOPT_END) {
> -				break;
> -			} else if (opt->opt.__data[iter] == IPOPT_NOP) {
> -				iter++;
> -			} else {
> -				if (WARN_ON(opt->opt.__data[iter + 1] < 2))
> -					iter += 2;
> -				else
> -					iter += opt->opt.__data[iter + 1];
> -				optlen_new = iter;
> -			}
> -		}
> -		if (WARN_ON(optlen_new > opt->opt.optlen))
> -			optlen_new = opt->opt.optlen;
> +		optlen_new = cipso_v4_get_actual_opt_len(opt->opt.__data,
> +							 opt->opt.optlen);
>  		hdr_delta = opt->opt.optlen;
>  		opt->opt.optlen = (optlen_new + 3) & ~3;
>  		hdr_delta -= opt->opt.optlen;
> @@ -2246,7 +2253,8 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
>   */
>  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
>  {
> -	int ret_val;
> +	int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
> +	    hdr_len_delta;

Please keep line lengths under 80-chars whenever possible.  I know Linus
relaxed that requirement a while ago, but I still find the 80-char limit
to be a positive thing.

>  	struct iphdr *iph;
>  	struct ip_options *opt = &IPCB(skb)->opt;
>  	unsigned char *cipso_ptr;
> @@ -2259,16 +2267,37 @@ int cipso_v4_skbuff_delattr(struct sk_buff *skb)
>  	if (ret_val < 0)
>  		return ret_val;
>  
> -	/* the easiest thing to do is just replace the cipso option with noop
> -	 * options since we don't change the size of the packet, although we
> -	 * still need to recalculate the checksum */

Unless you can guarantee that the length change isn't going to have
any adverse effects (even then I would want to know why you are so
confident), I'd feel a lot more comfortable sticking with a
preserve-the-size-and-fill approach.  If you want to change that from
_NOP to _END, I'd be okay with that.

(and if you are talking to who I think you are talking to, I'm guessing
the _NOP to _END swap would likely solve their problem)

>  	iph = ip_hdr(skb);
>  	cipso_ptr = (unsigned char *)iph + opt->cipso;
> -	memset(cipso_ptr, IPOPT_NOOP, cipso_ptr[1]);
> +	cipso_len = cipso_ptr[1];
> +
> +	hdr_len_actual = sizeof(struct iphdr) +
> +			 cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1),
> +						     opt->optlen);
> +	new_hdr_len_actual = hdr_len_actual - cipso_len;
> +	new_hdr_len = (new_hdr_len_actual + 3) & ~3;
> +	hdr_len_delta = (iph->ihl << 2) - new_hdr_len;
> +
> +	/* 1. shift any options after CIPSO to the left */
> +	memmove(cipso_ptr, cipso_ptr + cipso_len,
> +		new_hdr_len_actual - opt->cipso);
> +	/* 2. move the whole IP header to its new place */
> +	memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual);
> +	/* 3. adjust the skb layout */
> +	skb_pull(skb, hdr_len_delta);
> +	skb_reset_network_header(skb);
> +	iph = ip_hdr(skb);
> +	/* 4. re-fill new padding with IPOPT_END (may now be longer) */
> +	memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END,
> +	       new_hdr_len - new_hdr_len_actual);
> +	opt->optlen -= hdr_len_delta;
>  	opt->cipso = 0;
>  	opt->is_changed = 1;
> -
> +	if (hdr_len_delta != 0) {
> +		iph->ihl = new_hdr_len >> 2;
> +		iph_set_totlen(iph, skb->len);
> +	}
>  	ip_send_check(iph);
>  
>  	return 0;
> -- 
> 2.44.0

--
paul-moore.com
Ondrej Mosnacek April 17, 2024, 1:03 p.m. UTC | #2
On Tue, Apr 16, 2024 at 8:39 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > As the comment in this function says, the code currently just clears the
> > CIPSO part with IPOPT_NOP, rather than removing it completely and
> > trimming the packet. This is inconsistent with the other
> > cipso_v4_*_delattr() functions and with CALIPSO (IPv6).
>
> This sentence above implies an equality in handling between those three
> cases that doesn't exist.  IPv6 has a radically different approach to
> IP options, comparisions between the two aren't really valid.

I don't think it's that radically different. Look at
calipso_skbuff_delattr() and you will see that it already does
something very similar as cipso_v4_skbuff_delattr() after this patch.
If the CALIPSO options are the only thing in the hopopt header, it
removes it altogether. If there are other options, it removes the part
with the CALIPSO options - here it is less pedantic and replaces it
with up to 7-byte padding if the length of the CALIPSO part is not
8-byte aligned, presumably to avoid having to move the subsequent
options (if any) and adjust the padding at the end.

> Similarly,
> how we manage IPv4 options on sockets (req or otherwise) is pretty
> different from how we manage them on packets, and that is intentional.

Perhaps, but that is an implementation detail to the user. The
resulting packet content should ideally be the same regardless of
which method of injecting the options is used, when possible.

> Drop the above sentence, or provide a more detailed explanation of the
> three aproaches explaining when they can be compared and when they
> shouldn't be compared.

I'm not sure why you think they shouldn't be compared, but I can
surely be more detailed in making my case there.

> > Implement the proper option removal to make it consistent and producing
> > more optimal IP packets when there are CIPSO options set.
> >
> > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
> >  1 file changed, 59 insertions(+), 30 deletions(-)
>
> Outside of the SELinux test suite, what testing have you done when you
> have a Linux box forwarding between a CIPSO network segment and an
> unlabeled segment?  I'm specifically interested in stream based protocols
> such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
> if the skb changes size in one of the callbacks?  Granted it has been
> *years* since this code was written (decades?), but if I recall
> correctly, at the time it was a big no-no to change the skb size in a
> netfilter callback.

I didn't test that, TBH. But all of cipso_v4_skbuff_setattr(),
calipso_skbuff_setattr(), and calipso_skbuff_delattr() already do
skb_push()/skb_pull(), so they would all be broken if that is (still?)
true. And this new cipso_v4_skbuff_delattr() doesn't do anything
w.r.t. the skb and the IP header that those wouldn't do already.

[...]
> > @@ -2246,7 +2253,8 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
> >   */
> >  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> >  {
> > -     int ret_val;
> > +     int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
> > +         hdr_len_delta;
>
> Please keep line lengths under 80-chars whenever possible.  I know Linus
> relaxed that requirement a while ago, but I still find the 80-char limit
> to be a positive thing.

I believe the line is exactly 80 characters, so still within the
limit. Or is it < 80 instead of <= 80? Does it really matter?

>
> >       struct iphdr *iph;
> >       struct ip_options *opt = &IPCB(skb)->opt;
> >       unsigned char *cipso_ptr;
> > @@ -2259,16 +2267,37 @@ int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> >       if (ret_val < 0)
> >               return ret_val;
> >
> > -     /* the easiest thing to do is just replace the cipso option with noop
> > -      * options since we don't change the size of the packet, although we
> > -      * still need to recalculate the checksum */
>
> Unless you can guarantee that the length change isn't going to have
> any adverse effects (even then I would want to know why you are so
> confident), I'd feel a lot more comfortable sticking with a
> preserve-the-size-and-fill approach.  If you want to change that from
> _NOP to _END, I'd be okay with that.
>
> (and if you are talking to who I think you are talking to, I'm guessing
> the _NOP to _END swap would likely solve their problem)

So, to reveal all the cards, the issue that has prompted me to send
this patch is that a user may have a router configured to drop packets
containing any IP options [1][2] and may expect a Linux host with
NetLabel configured as unlabeled for a target IP address to
output/forward packets without IP options if CIPSO was the only option
present before NetLabel processing (such that it can then pass through
the strict router(s)).

Padding with IPOPT_END *might* solve this particular case, but I'm not
sure if the routers would really interpret such padding as "no
options"... I'll try to ask the reporter to test such a patch and
we'll see. But still, I don't yet see a convincing reason to not go
all the way and make sure we send optimally-sized packets here.

Side note: We could only clear CIPSO with IPOPT_END if it's the last
option in the header, but that limitation doesn't really matter for
the use case described above.

[1] https://www.stigviewer.com/stig/juniper_router_rtr/2019-09-27/finding/V-90937
[2] https://www.stigviewer.com/stig/cisco_ios_xe_router_rtr/2021-03-26/finding/V-217001

>
> >       iph = ip_hdr(skb);
> >       cipso_ptr = (unsigned char *)iph + opt->cipso;
> > -     memset(cipso_ptr, IPOPT_NOOP, cipso_ptr[1]);
> > +     cipso_len = cipso_ptr[1];
> > +
> > +     hdr_len_actual = sizeof(struct iphdr) +
> > +                      cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1),
> > +                                                  opt->optlen);
> > +     new_hdr_len_actual = hdr_len_actual - cipso_len;
> > +     new_hdr_len = (new_hdr_len_actual + 3) & ~3;
> > +     hdr_len_delta = (iph->ihl << 2) - new_hdr_len;
> > +
> > +     /* 1. shift any options after CIPSO to the left */
> > +     memmove(cipso_ptr, cipso_ptr + cipso_len,
> > +             new_hdr_len_actual - opt->cipso);
> > +     /* 2. move the whole IP header to its new place */
> > +     memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual);
> > +     /* 3. adjust the skb layout */
> > +     skb_pull(skb, hdr_len_delta);
> > +     skb_reset_network_header(skb);
> > +     iph = ip_hdr(skb);
> > +     /* 4. re-fill new padding with IPOPT_END (may now be longer) */
> > +     memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END,
> > +            new_hdr_len - new_hdr_len_actual);
> > +     opt->optlen -= hdr_len_delta;
> >       opt->cipso = 0;
> >       opt->is_changed = 1;
> > -
> > +     if (hdr_len_delta != 0) {
> > +             iph->ihl = new_hdr_len >> 2;
> > +             iph_set_totlen(iph, skb->len);
> > +     }
> >       ip_send_check(iph);
> >
> >       return 0;
> > --
> > 2.44.0
>
> --
> paul-moore.com
>

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.
Paul Moore April 25, 2024, 9:48 p.m. UTC | #3
On Wed, Apr 17, 2024 at 9:03 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Tue, Apr 16, 2024 at 8:39 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Apr 16, 2024 Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > >
> > > As the comment in this function says, the code currently just clears the
> > > CIPSO part with IPOPT_NOP, rather than removing it completely and
> > > trimming the packet. This is inconsistent with the other
> > > cipso_v4_*_delattr() functions and with CALIPSO (IPv6).
> >
> > This sentence above implies an equality in handling between those three
> > cases that doesn't exist.  IPv6 has a radically different approach to
> > IP options, comparisions between the two aren't really valid.
>
> I don't think it's that radically different.

They are very different in my mind.  The IPv4 vs IPv6 option format
and handling should be fairly obvious and I'm sure there are plenty of
things written that describe the differences and motivations in
excruciating detail so I'm not going to bother trying to do that here;
as usual, Google is your friend.  I will admit that the skbuff vs
socket-based labeling differences are a bit more subtle, but I believe
if you look at how the packets are labeled in the two approaches as
well as how they are managed and hooked into the LSMs you will start
to get a better idea.  If that doesn't convince you that these three
cases are significantly different, I'm not sure what else I can say
other than we have a difference of opinion.  Regardless, I stand by my
original comment that I don't like the text you chose and would like
you to remove or change it.

> > > Implement the proper option removal to make it consistent and producing
> > > more optimal IP packets when there are CIPSO options set.
> > >
> > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > ---
> > >  net/ipv4/cipso_ipv4.c | 89 ++++++++++++++++++++++++++++---------------
> > >  1 file changed, 59 insertions(+), 30 deletions(-)
> >
> > Outside of the SELinux test suite, what testing have you done when you
> > have a Linux box forwarding between a CIPSO network segment and an
> > unlabeled segment?  I'm specifically interested in stream based protocols
> > such as TCP.  Also, do the rest of the netfilter callbacks handle it okay
> > if the skb changes size in one of the callbacks?  Granted it has been
> > *years* since this code was written (decades?), but if I recall
> > correctly, at the time it was a big no-no to change the skb size in a
> > netfilter callback.
>
> I didn't test that, TBH. But all of cipso_v4_skbuff_setattr(),
> calipso_skbuff_setattr(), and calipso_skbuff_delattr() already do
> skb_push()/skb_pull(), so they would all be broken if that is (still?)
> true. And this new cipso_v4_skbuff_delattr() doesn't do anything
> w.r.t. the skb and the IP header that those wouldn't do already.

Fair point on skbuff size changes in netfilter and
cipso_v4_skbuff_setattr(), that wasn't part of the original
NetLabel/CIPSO support and I forgot about that aspect.  On the other
hand, I believe cipso_v4_skbuff_delattr() was part of the original
work and used the NOOP hack both to preserve the packet length in the
netfilter chain and to help ensure a consistent IP header overhead on
both sides of a forwarding CIPSO<->unlabeled labeling/access control
system.  Which brings me around to the reason why I asked about
testing; I think we need to confirm that nothing bad happens to
bidirectional stream-based connections, e.g. TCP, when crossing over a
CIPSO/unlabeled network boundary and the IP overhead changes.  It's
probably okay, but I would like to see that you've tested it with a
couple different client OSes and everything works as expected.

> [...]
> > > @@ -2246,7 +2253,8 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
> > >   */
> > >  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> > >  {
> > > -     int ret_val;
> > > +     int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
> > > +         hdr_len_delta;
> >
> > Please keep line lengths under 80-chars whenever possible.  I know Linus
> > relaxed that requirement a while ago, but I still find the 80-char limit
> > to be a positive thing.
>
> I believe the line is exactly 80 characters, so still within the
> limit. Or is it < 80 instead of <= 80? Does it really matter?

I thought I saw it wrap on my terminal when reviewing the code, maybe
it was just the newlink wrapping that I saw.  As long as it is <= 80
I'm okay with it.

> >
> > >       struct iphdr *iph;
> > >       struct ip_options *opt = &IPCB(skb)->opt;
> > >       unsigned char *cipso_ptr;
> > > @@ -2259,16 +2267,37 @@ int cipso_v4_skbuff_delattr(struct sk_buff *skb)
> > >       if (ret_val < 0)
> > >               return ret_val;
> > >
> > > -     /* the easiest thing to do is just replace the cipso option with noop
> > > -      * options since we don't change the size of the packet, although we
> > > -      * still need to recalculate the checksum */
> >
> > Unless you can guarantee that the length change isn't going to have
> > any adverse effects (even then I would want to know why you are so
> > confident), I'd feel a lot more comfortable sticking with a
> > preserve-the-size-and-fill approach.  If you want to change that from
> > _NOP to _END, I'd be okay with that.
> >
> > (and if you are talking to who I think you are talking to, I'm guessing
> > the _NOP to _END swap would likely solve their problem)
>
> So, to reveal all the cards, the issue that has prompted me to send
> this patch is that a user may have a router configured to drop packets
> containing any IP options [1][2] and may expect a Linux host with
> NetLabel configured as unlabeled for a target IP address to
> output/forward packets without IP options if CIPSO was the only option
> present before NetLabel processing (such that it can then pass through
> the strict router(s)).
>
> Padding with IPOPT_END *might* solve this particular case, but I'm not
> sure if the routers would really interpret such padding as "no
> options"... I'll try to ask the reporter to test such a patch and
> we'll see. But still, I don't yet see a convincing reason to not go
> all the way and make sure we send optimally-sized packets here.

I'm about 99.5% certain we are talking about the same reporter,
although I was missing the detail about an intermediate
switching/routing node; the problem report I saw was that there was
simply a black-box device on the network that was dropping packets due
to the presence of NOOP options.  IMO, the original RFCs are a little
too vague in this area, but it doesn't really matter if an
intermediate node is dropping the packets.

> Side note: We could only clear CIPSO with IPOPT_END if it's the last
> option in the header ...

Obviously, I kinda assumed anyone following along would understand that.

> ... but that limitation doesn't really matter for
> the use case described above.
diff mbox series

Patch

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 75b5e3c35f9bf..c08c6d0262ba8 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1810,6 +1810,34 @@  static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
 	return CIPSO_V4_HDR_LEN + ret_val;
 }
 
+static int cipso_v4_get_actual_opt_len(const unsigned char *data, int len)
+{
+	int iter = 0, optlen = 0;
+
+	/* determining the new total option length is tricky because of
+	 * the padding necessary, the only thing i can think to do at
+	 * this point is walk the options one-by-one, skipping the
+	 * padding at the end to determine the actual option size and
+	 * from there we can determine the new total option length
+	 */
+	while (iter < len) {
+		if (data[iter] == IPOPT_END) {
+			break;
+		} else if (data[iter] == IPOPT_NOP) {
+			iter++;
+		} else {
+			if (WARN_ON(data[iter + 1] < 2))
+				iter += 2;
+			else
+				iter += data[iter + 1];
+			optlen = iter;
+		}
+	}
+	if (WARN_ON(optlen > len))
+		optlen = len;
+	return optlen;
+}
+
 /**
  * cipso_v4_sock_setattr - Add a CIPSO option to a socket
  * @sk: the socket
@@ -1985,7 +2013,6 @@  static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
 		u8 cipso_len;
 		u8 cipso_off;
 		unsigned char *cipso_ptr;
-		int iter;
 		int optlen_new;
 
 		cipso_off = opt->opt.cipso - sizeof(struct iphdr);
@@ -2005,28 +2032,8 @@  static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr)
 		memmove(cipso_ptr, cipso_ptr + cipso_len,
 			opt->opt.optlen - cipso_off - cipso_len);
 
-		/* determining the new total option length is tricky because of
-		 * the padding necessary, the only thing i can think to do at
-		 * this point is walk the options one-by-one, skipping the
-		 * padding at the end to determine the actual option size and
-		 * from there we can determine the new total option length */
-		iter = 0;
-		optlen_new = 0;
-		while (iter < opt->opt.optlen) {
-			if (opt->opt.__data[iter] == IPOPT_END) {
-				break;
-			} else if (opt->opt.__data[iter] == IPOPT_NOP) {
-				iter++;
-			} else {
-				if (WARN_ON(opt->opt.__data[iter + 1] < 2))
-					iter += 2;
-				else
-					iter += opt->opt.__data[iter + 1];
-				optlen_new = iter;
-			}
-		}
-		if (WARN_ON(optlen_new > opt->opt.optlen))
-			optlen_new = opt->opt.optlen;
+		optlen_new = cipso_v4_get_actual_opt_len(opt->opt.__data,
+							 opt->opt.optlen);
 		hdr_delta = opt->opt.optlen;
 		opt->opt.optlen = (optlen_new + 3) & ~3;
 		hdr_delta -= opt->opt.optlen;
@@ -2246,7 +2253,8 @@  int cipso_v4_skbuff_setattr(struct sk_buff *skb,
  */
 int cipso_v4_skbuff_delattr(struct sk_buff *skb)
 {
-	int ret_val;
+	int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len,
+	    hdr_len_delta;
 	struct iphdr *iph;
 	struct ip_options *opt = &IPCB(skb)->opt;
 	unsigned char *cipso_ptr;
@@ -2259,16 +2267,37 @@  int cipso_v4_skbuff_delattr(struct sk_buff *skb)
 	if (ret_val < 0)
 		return ret_val;
 
-	/* the easiest thing to do is just replace the cipso option with noop
-	 * options since we don't change the size of the packet, although we
-	 * still need to recalculate the checksum */
-
 	iph = ip_hdr(skb);
 	cipso_ptr = (unsigned char *)iph + opt->cipso;
-	memset(cipso_ptr, IPOPT_NOOP, cipso_ptr[1]);
+	cipso_len = cipso_ptr[1];
+
+	hdr_len_actual = sizeof(struct iphdr) +
+			 cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1),
+						     opt->optlen);
+	new_hdr_len_actual = hdr_len_actual - cipso_len;
+	new_hdr_len = (new_hdr_len_actual + 3) & ~3;
+	hdr_len_delta = (iph->ihl << 2) - new_hdr_len;
+
+	/* 1. shift any options after CIPSO to the left */
+	memmove(cipso_ptr, cipso_ptr + cipso_len,
+		new_hdr_len_actual - opt->cipso);
+	/* 2. move the whole IP header to its new place */
+	memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual);
+	/* 3. adjust the skb layout */
+	skb_pull(skb, hdr_len_delta);
+	skb_reset_network_header(skb);
+	iph = ip_hdr(skb);
+	/* 4. re-fill new padding with IPOPT_END (may now be longer) */
+	memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END,
+	       new_hdr_len - new_hdr_len_actual);
+
+	opt->optlen -= hdr_len_delta;
 	opt->cipso = 0;
 	opt->is_changed = 1;
-
+	if (hdr_len_delta != 0) {
+		iph->ihl = new_hdr_len >> 2;
+		iph_set_totlen(iph, skb->len);
+	}
 	ip_send_check(iph);
 
 	return 0;