Message ID: 20170425184734.26563-1-Jason@zx2c4.com (mailing list archive)
State: New, archived
Hey Dave,

David Laight and I have been discussing offlist. It occurred to both
of us that this could just be turned into a loop because perhaps this
is actually just tail-recursive. Upon further inspection, however, the
way the current algorithm works, it's possible that each of the
fraglist skbs has its own fraglist, which would make this into tree
recursion, which is why in the first place I wanted to place that
limit on it. If that's the case, then the patch I proposed above is
the best way forward.

However, perhaps there's the chance that fraglist skbs having separate
fraglists are actually forbidden? Is this the case? Are there other
parts of the API that enforce this contract? Is it something we could
safely rely on here? If you say yes, I'll send a v7 that makes this
into a non-recursive loop.

Regards,
Jason
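To illustrate the tree-recursion concern, here is a minimal userspace
sketch (the toy_skb structure and function names are made up for
illustration, not kernel code): each frag_list member may carry a
frag_list of its own, so the recursive call inside the sibling loop can
fan out again rather than being a simple tail call.

#include <stdio.h>

/* Toy model of the skb geometry under discussion. */
struct toy_skb {
	struct toy_skb *next;		/* sibling within the parent's frag_list */
	struct toy_skb *frag_list;	/* this skb's own frag_list, may be NULL */
	int len;
};

/* Walk @skb and everything hanging off its frag_list.  The recursive
 * call sits inside the loop over siblings, so a child that has its own
 * frag_list fans the traversal out again: tree recursion, not a loop. */
static int toy_walk(const struct toy_skb *skb)
{
	const struct toy_skb *it;
	int total = skb->len;

	for (it = skb->frag_list; it; it = it->next)
		total += toy_walk(it);
	return total;
}

int main(void)
{
	struct toy_skb grandchild = { .len = 100 };
	struct toy_skb child = { .frag_list = &grandchild, .len = 200 };
	struct toy_skb head = { .frag_list = &child, .len = 300 };

	printf("total length: %d\n", toy_walk(&head));	/* 600 */
	return 0;
}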
2017-04-27, 11:21:51 +0200, Jason A. Donenfeld wrote:
> However, perhaps there's the chance that fraglist skbs having
> separate fraglists are actually forbidden? Is this the case?

Hmm, I think this can actually happen:

/* net/ipv4/ip_fragment.c */
static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
			 struct net_device *dev)
{
...
	/* If the first fragment is fragmented itself, we split
	 * it to two chunks: the first with data and paged part
	 * and the second, holding only fragments. */
	if (skb_has_frag_list(head)) {
		struct sk_buff *clone;
		int i, plen = 0;

		clone = alloc_skb(0, GFP_ATOMIC);
		if (!clone)
			goto out_nomem;

		clone->next = head->next;
		head->next = clone;
		skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
		skb_frag_list_init(head);
		for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
			plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
		clone->len = clone->data_len = head->data_len - plen;
		head->data_len -= clone->len;
		head->len -= clone->len;
		clone->csum = 0;
		clone->ip_summed = head->ip_summed;
		add_frag_mem_limit(qp->q.net, clone->truesize);
	}
...
}

You can test that with a vxlan tunnel on top of a vxlan tunnel ("real"
MTU is 1500, first tunnel MTU set to 10000, second tunnel MTU set to
40000 -- or anything, as long as they both get fragmented).
On Thu, Apr 27, 2017 at 1:30 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> Hmm, I think this can actually happen:
Alright, perhaps better to err on the side of caution, then.
Jason
From: Jason A. Donenfeld
> On Thu, Apr 27, 2017 at 1:30 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> > Hmm, I think this can actually happen:
>
> Alright, perhaps better to err on the side of caution, then.

You only need to recurse if both pointers are set.

	David
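One way to read that suggestion, sketched with the same kind of toy
userspace model (hypothetical names, not the kernel code): children
without a frag_list of their own are handled directly in the loop, and
the recursive call is only taken for the rare child that carries
another frag_list.

#include <stdio.h>

struct toy_skb {
	struct toy_skb *next;		/* sibling in the parent's frag_list */
	struct toy_skb *frag_list;	/* the child's own frag_list, usually NULL */
	int len;
};

/* Sum the lengths hanging off @skb's frag_list, recursing only when a
 * child itself has a non-NULL frag_list; the common flat case stays in
 * the loop and never grows the call stack. */
static int toy_walk_children(const struct toy_skb *skb)
{
	const struct toy_skb *it;
	int total = 0;

	for (it = skb->frag_list; it; it = it->next) {
		total += it->len;
		if (it->frag_list)		/* only then is recursion needed */
			total += toy_walk_children(it);
	}
	return total;
}

int main(void)
{
	struct toy_skb leaf = { .len = 10 };
	struct toy_skb mid = { .frag_list = &leaf, .len = 20 };
	struct toy_skb head = { .frag_list = &mid, .len = 30 };

	printf("children total: %d\n", toy_walk_children(&head));	/* 30 */
	return 0;
}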
From: "Jason A. Donenfeld" <Jason@zx2c4.com> Date: Thu, 27 Apr 2017 11:21:51 +0200 > Hey Dave, > > David Laight and I have been discussing offlist. It occurred to both > of us that this could just be turned into a loop because perhaps this > is actually just tail-recursive. Upon further inspection, however, the > way the current algorithm works, it's possible that each of the > fraglist skbs has its own fraglist, which would make this into tree > recursion, which is why in the first place I wanted to place that > limit on it. If that's the case, then the patch I proposed above is > the best way forward. However, perhaps there's the chance that > fraglist skbs having separate fraglists are actually forbidden? Is > this the case? Are there other parts of the API that enforce this > contract? Is it something we could safely rely on here? If you say > yes, I'll send a v7 that makes this into a non-recursive loop. As Sabrina showed, it can happen. There are no such restrictions on the geometry of an SKB.
2017-04-25, 20:47:30 +0200, Jason A. Donenfeld wrote:
> This is a defense-in-depth measure in response to bugs like
> 4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). While
> we're at it, we also limit the amount of recursion this function is
> allowed to do. Not actually providing a bounded base case is a future
> disaster that we can easily avoid here.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v5->v6:
>  * Use unlikely() for the rare overflow conditions.
>  * Also bound recursion, since this is a potential disaster we can avert.
>
>  net/core/skbuff.c | 31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f86bf69cfb8d..24fb53f8534e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3489,16 +3489,22 @@ void __init skb_init(void)
>   * @len: Length of buffer space to be mapped
>   *
>   * Fill the specified scatter-gather list with mappings/pointers into a
> - * region of the buffer space attached to a socket buffer.
> + * region of the buffer space attached to a socket buffer. Returns either
> + * the number of scatterlist items used, or -EMSGSIZE if the contents
> + * could not fit.
>   */

One small thing here: since you're touching this comment, could you
move it next to skb_to_sgvec, since that's the function it's supposed
to document?

Thanks!

>  static int
> -__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
> +__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len,
> +	       unsigned int recursion_level)
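For reference, a sketch of roughly what that move would look like after
this series, with the kernel-doc block sitting above skb_to_sgvec()
itself (the @skb, @sg and @offset lines are reconstructed from memory
of the existing comment and may not be word-for-word):

/**
 * skb_to_sgvec - Fill a scatter-gather list from a socket buffer
 * @skb: Socket buffer containing the buffers to be mapped
 * @sg: The scatter-gather list to map into
 * @offset: The offset into the buffer's contents to start mapping
 * @len: Length of buffer space to be mapped
 *
 * Fill the specified scatter-gather list with mappings/pointers into a
 * region of the buffer space attached to a socket buffer. Returns either
 * the number of scatterlist items used, or -EMSGSIZE if the contents
 * could not fit.
 */
int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
{
	int nsg = __skb_to_sgvec(skb, sg, offset, len, 0);

	if (nsg <= 0)
		return nsg;

	sg_mark_end(&sg[nsg - 1]);

	return nsg;
}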
Hi Sabrina,

On Fri, Apr 28, 2017 at 6:18 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> One small thing here: since you're touching this comment, could you
> move it next to skb_to_sgvec, since that's the function it's supposed
> to document?

Done. I'll wait until next week to resubmit, to give some more time
for comments, but my current living copy of this series is here:

https://git.zx2c4.com/linux-dev/log/?h=jd/safe-skb-vec

One thing I'm considering, after discussing with David Laight, is the
possibility of using an explicit stack array for pushing and popping
skbs, rather than using the call stack. While this increases
complexity, which I'm opposed to, David makes the point that on some
architectures the stack frame is rather large, and 32 levels of
recursive function calls might not be a good idea. Any opinions on
this? Is it overkill, with simplicity preferred? Or is it in fact best
practice? (Either way, I'll do a trial implementation of it to get an
idea of how the end result feels.)
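For concreteness, a rough userspace sketch of the explicit-stack
variant under discussion (toy names, not the actual kernel
implementation): a small fixed-size pointer array bounds the nesting
depth just like the recursion_level check, but each pending skb costs
one pointer instead of a whole stack frame.

#include <stdio.h>

#define MAX_DEPTH 32

struct toy_skb {
	struct toy_skb *next;		/* sibling in the parent's frag_list */
	struct toy_skb *frag_list;	/* this skb's own frag_list, may be NULL */
	int len;
};

/* Depth-first walk over nested frag_lists without call-stack recursion:
 * pending skbs are pushed onto a small fixed-size array, so bounding the
 * nesting costs MAX_DEPTH pointers rather than MAX_DEPTH stack frames.
 * Returns -1 if the geometry nests deeper than the array allows. */
static int toy_walk_iterative(const struct toy_skb *skb)
{
	const struct toy_skb *stack[MAX_DEPTH];
	int top = 0, total = 0;

	stack[top++] = skb;
	while (top > 0) {
		const struct toy_skb *cur = stack[--top];

		total += cur->len;
		/* Push the sibling first so cur's children are visited
		 * before the walk resumes with cur->next. */
		if (cur->next) {
			if (top >= MAX_DEPTH)
				return -1;
			stack[top++] = cur->next;
		}
		if (cur->frag_list) {
			if (top >= MAX_DEPTH)
				return -1;
			stack[top++] = cur->frag_list;
		}
	}
	return total;
}

int main(void)
{
	struct toy_skb leaf = { .len = 1 };
	struct toy_skb mid = { .frag_list = &leaf, .len = 2 };
	struct toy_skb head = { .frag_list = &mid, .len = 3 };

	printf("total: %d\n", toy_walk_iterative(&head));	/* 6 */
	return 0;
}

In the real function, each entry would presumably also have to carry
the offset/length/sg bookkeeping, which is where the added complexity
mentioned above would come in.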
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f86bf69cfb8d..24fb53f8534e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3489,16 +3489,22 @@ void __init skb_init(void)
  * @len: Length of buffer space to be mapped
  *
  * Fill the specified scatter-gather list with mappings/pointers into a
- * region of the buffer space attached to a socket buffer.
+ * region of the buffer space attached to a socket buffer. Returns either
+ * the number of scatterlist items used, or -EMSGSIZE if the contents
+ * could not fit.
  */
 static int
-__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len,
+	       unsigned int recursion_level)
 {
 	int start = skb_headlen(skb);
 	int i, copy = start - offset;
 	struct sk_buff *frag_iter;
 	int elt = 0;
 
+	if (unlikely(recursion_level >= 32))
+		return -EMSGSIZE;
+
 	if (copy > 0) {
 		if (copy > len)
 			copy = len;
@@ -3517,6 +3523,8 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 		end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
 		if ((copy = end - offset) > 0) {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+			if (unlikely(elt && sg_is_last(&sg[elt - 1])))
+				return -EMSGSIZE;
 
 			if (copy > len)
 				copy = len;
@@ -3531,16 +3539,22 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 	}
 
 	skb_walk_frags(skb, frag_iter) {
-		int end;
+		int end, ret;
 
 		WARN_ON(start > offset + len);
 
 		end = start + frag_iter->len;
 		if ((copy = end - offset) > 0) {
+			if (unlikely(elt && sg_is_last(&sg[elt - 1])))
+				return -EMSGSIZE;
+
 			if (copy > len)
 				copy = len;
-			elt += __skb_to_sgvec(frag_iter, sg+elt, offset - start,
-					      copy);
+			ret = __skb_to_sgvec(frag_iter, sg+elt, offset - start,
+					     copy, recursion_level + 1);
+			if (unlikely(ret < 0))
+				return ret;
+			elt += ret;
 			if ((len -= copy) == 0)
 				return elt;
 			offset += copy;
@@ -3573,13 +3587,16 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 int skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist *sg,
 			int offset, int len)
 {
-	return __skb_to_sgvec(skb, sg, offset, len);
+	return __skb_to_sgvec(skb, sg, offset, len, 0);
 }
 EXPORT_SYMBOL_GPL(skb_to_sgvec_nomark);
 
 int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 {
-	int nsg = __skb_to_sgvec(skb, sg, offset, len);
+	int nsg = __skb_to_sgvec(skb, sg, offset, len, 0);
+
+	if (nsg <= 0)
+		return nsg;
 
 	sg_mark_end(&sg[nsg - 1]);
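With that change, callers of skb_to_sgvec() and skb_to_sgvec_nomark()
have to treat the return value as possibly negative; a hedged sketch of
the caller-side pattern this implies (array sizing and variable names
are illustrative only, not taken from any particular caller):

	struct scatterlist sg[MAX_SKB_FRAGS + 1];
	int nsg;

	sg_init_table(sg, ARRAY_SIZE(sg));
	nsg = skb_to_sgvec(skb, sg, 0, skb->len);
	if (unlikely(nsg < 0))
		return nsg;	/* e.g. -EMSGSIZE: skb would not fit in sg[] */
	/* ... hand the first nsg entries of sg[] to the next layer ... */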
This is a defense-in-depth measure in response to bugs like
4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). While
we're at it, we also limit the amount of recursion this function is
allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v5->v6:
 * Use unlikely() for the rare overflow conditions.
 * Also bound recursion, since this is a potential disaster we can avert.

 net/core/skbuff.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)