[v6,1/5] skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow

Message ID 20170425184734.26563-1-Jason@zx2c4.com (mailing list archive)
State New, archived

Commit Message

Jason A. Donenfeld April 25, 2017, 6:47 p.m. UTC
This is a defense-in-depth measure in response to bugs like
4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). While
we're at it, we also limit the amount of recursion this function is
allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v5->v6:
  * Use unlikely() for the rare overflow conditions.
  * Also bound recursion, since this is a potential disaster we can avert.

 net/core/skbuff.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)
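
For callers, the visible change is the return contract: skb_to_sgvec()
can now return a negative error instead of silently overflowing the
scatterlist. A minimal caller-side sketch (the function and variable
names here are illustrative, not taken from this series):

    #include <linux/skbuff.h>
    #include <linux/scatterlist.h>
    
    /* Hypothetical caller: check the new error return before using
     * the scatterlist, instead of assuming success. */
    static int example_map_skb(struct sk_buff *skb, struct scatterlist *sg)
    {
    	int nsg = skb_to_sgvec(skb, sg, 0, skb->len);
    
    	if (unlikely(nsg < 0))
    		return nsg;	/* -EMSGSIZE: sg table too small for this skb */
    
    	/* ... hand the nsg mapped entries to the crypto layer ... */
    	return 0;
    }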

Comments

Jason A. Donenfeld April 27, 2017, 9:21 a.m. UTC | #1
Hey Dave,

David Laight and I have been discussing offlist. It occurred to both
of us that this could just be turned into a loop because perhaps this
is actually just tail-recursive. Upon further inspection, however,
given the way the current algorithm works, it's possible that each of
the fraglist skbs has its own fraglist, which would make this into
tree recursion; that is why I wanted to place that limit on it in the
first place. If that's the case, then the patch I proposed above is
the best way forward. However, perhaps there's the chance that
fraglist skbs having separate fraglists are actually forbidden? Is
this the case? Are there other parts of the API that enforce this
contract? Is it something we could safely rely on here? If you say
yes, I'll send a v7 that makes this into a non-recursive loop.

Regards,
Jason
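
To make the tree-recursion concern concrete, here is a small
standalone model (plain userspace C with a toy struct standing in for
struct sk_buff, not kernel code): every frag_list member may itself
carry a frag_list, so the walk can branch at each level instead of
being a single tail call.

    #include <stdio.h>
    
    struct toy_skb {			/* toy stand-in for struct sk_buff */
    	struct toy_skb *next;		/* sibling within a frag_list */
    	struct toy_skb *frag_list;	/* nested fragment list */
    };
    
    /* Visiting every member can descend at each of them: tree recursion,
     * with depth bounded only by the skb geometry itself. */
    static int max_depth(const struct toy_skb *skb, int level)
    {
    	const struct toy_skb *it;
    	int max = level;
    
    	for (it = skb->frag_list; it; it = it->next) {
    		int d = max_depth(it, level + 1);
    		if (d > max)
    			max = d;
    	}
    	return max;
    }
    
    int main(void)
    {
    	struct toy_skb leaf = { 0, 0 };
    	struct toy_skb mid  = { 0, &leaf };	/* member with its own frag_list */
    	struct toy_skb root = { 0, &mid };
    
    	printf("max depth: %d\n", max_depth(&root, 0));	/* prints 2 */
    	return 0;
    }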
Sabrina Dubroca April 27, 2017, 11:30 a.m. UTC | #2
2017-04-27, 11:21:51 +0200, Jason A. Donenfeld wrote:
> However, perhaps there's the chance that fraglist skbs having
> separate fraglists are actually forbidden? Is this the case?

Hmm, I think this can actually happen:

    /*  net/ipv4/ip_fragment.c  */
    static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
    			 struct net_device *dev)
    {
    
    ...
    
    	/* If the first fragment is fragmented itself, we split
    	 * it to two chunks: the first with data and paged part
    	 * and the second, holding only fragments. */
    	if (skb_has_frag_list(head)) {
    		struct sk_buff *clone;
    		int i, plen = 0;
    
    		clone = alloc_skb(0, GFP_ATOMIC);
    		if (!clone)
    			goto out_nomem;
    		clone->next = head->next;
    		head->next = clone;
    		skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
    		skb_frag_list_init(head);
    		for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
    			plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
    		clone->len = clone->data_len = head->data_len - plen;
    		head->data_len -= clone->len;
    		head->len -= clone->len;
    		clone->csum = 0;
    		clone->ip_summed = head->ip_summed;
    		add_frag_mem_limit(qp->q.net, clone->truesize);
    	}
    
    ...
    }
    

Here the clone takes over head's frag_list before being linked back
into the fragment chain, so once reassembly collects that chain into
the final skb's frag_list, one member of the list carries a frag_list
of its own.

You can test that with a vxlan tunnel on top of a vxlan tunnel ("real"
MTU is 1500, first tunnel MTU set to 10000, second tunnel MTU set to
40000 -- or anything, as long as they both get fragmented).
Jason A. Donenfeld April 27, 2017, 12:04 p.m. UTC | #3
On Thu, Apr 27, 2017 at 1:30 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> Hmm, I think this can actually happen:

Alright, perhaps better to err on the side of caution, then.

Jason
David Laight April 27, 2017, 2:54 p.m. UTC | #4
From: Jason A. Donenfeld

> On Thu, Apr 27, 2017 at 1:30 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> > Hmm, I think this can actually happen:
>
> Alright, perhaps better to err on the side of caution, then.

You only need to recurse if both pointers are set.

	David
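
In other words, a purely iterative walk only breaks down when a
frag_list member has both its ->frag_list and its ->next pointer set;
the last member of a list can be continued as a loop. A hypothetical
sketch of that shape (map_subtree() is an invented helper, and the
root skb's own head and page frags are assumed to be mapped already):

    	while (skb) {
    		struct sk_buff *frag_iter, *tail = NULL;
    
    		skb_walk_frags(skb, frag_iter) {
    			/* ... map frag_iter's head and page frags ... */
    			if (!skb_has_frag_list(frag_iter))
    				continue;		/* nothing beneath this member */
    			if (frag_iter->next)
    				map_subtree(frag_iter);	/* both pointers set: recurse */
    			else
    				tail = frag_iter;	/* last member: loop instead */
    		}
    		skb = tail;			/* tail call turned into iteration */
    	}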
David Miller April 27, 2017, 3:54 p.m. UTC | #5
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Thu, 27 Apr 2017 11:21:51 +0200

> Hey Dave,
> 
> David Laight and I have been discussing offlist. It occurred to both
> of us that this could just be turned into a loop because perhaps this
> is actually just tail-recursive. Upon further inspection, however,
> given the way the current algorithm works, it's possible that each of
> the fraglist skbs has its own fraglist, which would make this into
> tree recursion; that is why I wanted to place that limit on it in the
> first place. If that's the case, then the patch I proposed above is
> the best way forward. However, perhaps there's the chance that
> fraglist skbs having separate fraglists are actually forbidden? Is
> this the case? Are there other parts of the API that enforce this
> contract? Is it something we could safely rely on here? If you say
> yes, I'll send a v7 that makes this into a non-recursive loop.

As Sabrina showed, it can happen.  There are no such restrictions on
the geometry of an SKB.
Sabrina Dubroca April 28, 2017, 4:18 p.m. UTC | #6
2017-04-25, 20:47:30 +0200, Jason A. Donenfeld wrote:
> This is a defense-in-depth measure in response to bugs like
> 4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). While
> we're at it, we also limit the amount of recursion this function is
> allowed to do. Not actually providing a bounded base case is a future
> disaster that we can easily avoid here.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v5->v6:
>   * Use unlikely() for the rare overflow conditions.
>   * Also bound recursion, since this is a potential disaster we can avert.
> 
>  net/core/skbuff.c | 31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f86bf69cfb8d..24fb53f8534e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3489,16 +3489,22 @@ void __init skb_init(void)
>   *	@len: Length of buffer space to be mapped
>   *
>   *	Fill the specified scatter-gather list with mappings/pointers into a
> - *	region of the buffer space attached to a socket buffer.
> + *	region of the buffer space attached to a socket buffer. Returns either
> + *	the number of scatterlist items used, or -EMSGSIZE if the contents
> + *	could not fit.
>   */

One small thing here: since you're touching this comment, could you
move it next to skb_to_sgvec, since that's the function it's supposed
to document?

Thanks!

>  static int
> -__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
> +__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len,
> +	       unsigned int recursion_level)
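
With that move, the kernel-doc would sit directly above skb_to_sgvec()
and read roughly as follows (a sketch: the @skb/@sg/@offset lines come
from the existing comment block, outside the hunks shown here):

    /**
     *	skb_to_sgvec - Fill a scatter-gather list from a socket buffer
     *	@skb: Socket buffer containing the buffers to be mapped
     *	@sg: The scatter-gather list to map into
     *	@offset: The offset into the buffer's contents to start mapping
     *	@len: Length of buffer space to be mapped
     *
     *	Fill the specified scatter-gather list with mappings/pointers into a
     *	region of the buffer space attached to a socket buffer. Returns either
     *	the number of scatterlist items used, or -EMSGSIZE if the contents
     *	could not fit.
     */
    int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len);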
Jason A. Donenfeld April 28, 2017, 10:47 p.m. UTC | #7
Hi Sabrina,

On Fri, Apr 28, 2017 at 6:18 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> One small thing here: since you're touching this comment, could you
> move it next to skb_to_sgvec, since that's the function it's supposed
> to document?

Done. I'll wait until next week to resubmit, to give some more time
for comments, but my current living copy of this series is here:
https://git.zx2c4.com/linux-dev/log/?h=jd/safe-skb-vec

One thing I'm considering, after discussing with David Laight, is the
possibility of just using an explicit stack array for pushing and
popping skbs, rather than using the call stack. While this increases
complexity, which I'm opposed to, David makes the point that on some
architectures the stack frame is rather large, and 32 levels of
recursion might not be a good idea. Any opinions on this? Is it
overkill, with simplicity preferred? Or is it in fact best practice?
(Either way, I'll do a trial implementation of it to get an idea of
how the end result feels.)

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f86bf69cfb8d..24fb53f8534e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3489,16 +3489,22 @@ void __init skb_init(void)
  *	@len: Length of buffer space to be mapped
  *
  *	Fill the specified scatter-gather list with mappings/pointers into a
- *	region of the buffer space attached to a socket buffer.
+ *	region of the buffer space attached to a socket buffer. Returns either
+ *	the number of scatterlist items used, or -EMSGSIZE if the contents
+ *	could not fit.
  */
 static int
-__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len,
+	       unsigned int recursion_level)
 {
 	int start = skb_headlen(skb);
 	int i, copy = start - offset;
 	struct sk_buff *frag_iter;
 	int elt = 0;
 
+	if (unlikely(recursion_level >= 32))
+		return -EMSGSIZE;
+
 	if (copy > 0) {
 		if (copy > len)
 			copy = len;
@@ -3517,6 +3523,8 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 		end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
 		if ((copy = end - offset) > 0) {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+			if (unlikely(elt && sg_is_last(&sg[elt - 1])))
+				return -EMSGSIZE;
 
 			if (copy > len)
 				copy = len;
@@ -3531,16 +3539,22 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 	}
 
 	skb_walk_frags(skb, frag_iter) {
-		int end;
+		int end, ret;
 
 		WARN_ON(start > offset + len);
 
 		end = start + frag_iter->len;
 		if ((copy = end - offset) > 0) {
+			if (unlikely(elt && sg_is_last(&sg[elt - 1])))
+				return -EMSGSIZE;
+
 			if (copy > len)
 				copy = len;
-			elt += __skb_to_sgvec(frag_iter, sg+elt, offset - start,
-					      copy);
+			ret = __skb_to_sgvec(frag_iter, sg+elt, offset - start,
+					      copy, recursion_level + 1);
+			if (unlikely(ret < 0))
+				return ret;
+			elt += ret;
 			if ((len -= copy) == 0)
 				return elt;
 			offset += copy;
@@ -3573,13 +3587,16 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 int skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist *sg,
 			int offset, int len)
 {
-	return __skb_to_sgvec(skb, sg, offset, len);
+	return __skb_to_sgvec(skb, sg, offset, len, 0);
 }
 EXPORT_SYMBOL_GPL(skb_to_sgvec_nomark);
 
 int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 {
-	int nsg = __skb_to_sgvec(skb, sg, offset, len);
+	int nsg = __skb_to_sgvec(skb, sg, offset, len, 0);
+
+	if (nsg <= 0)
+		return nsg;
 
 	sg_mark_end(&sg[nsg - 1]);