From patchwork Thu Nov 12 23:26:39 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11902289
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v2 1/6] bpf, sockmap: fix partial copy_page_to_iter so progress can still be made
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Thu, 12 Nov 2020 15:26:39 -0800
Message-ID: <160522359920.135009.3651330637920905285.stgit@john-XPS-13-9370>
In-Reply-To:
 <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
References: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b
X-Mailing-List: bpf@vger.kernel.org

If copy_page_to_iter() fails, or completes only partially with fewer bytes
copied than expected, we currently reset sg.start and return EFAULT. This is
problematic when we have already copied data into the user buffer before the
error is returned: the copied data is left in the user buffer, but the
scatterlist is never unwound, so the kernel side believes the data has been
delivered while the user side believes it has _not_ been received. The
expected behavior is to return the number of bytes copied, and then on the
next read return the error, assuming it is still present. This can happen
when a copy length spans multiple scatterlist elements and one or more of
them complete before the error is hit.

The error is rare enough that in my normal testing with server side
programs, such as nginx, httpd, envoy, etc., I have never seen it. The only
reliable way to reproduce it that I've found is to stream movies over my
browser for a day or so and wait for it to hang. Not very scientific, but
with a few extra WARN_ON()s in the code the bug was obvious.

When we review the errors from copy_page_to_iter() it seems we are hitting a
page fault from copy_page_to_iter_iovec(), where the code checks
fault_in_pages_writeable(buf, copy) with buf being the user buffer. It also
seems typical server applications don't hit this case.

The other way to try and reproduce this is to run the sockmap selftest tool
test_sockmap with data verification enabled, but it doesn't reproduce the
fault. Perhaps we can trigger this case artificially somehow from the test
tools. I haven't sorted out a way to do that yet though.
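The return-value semantics the patch aims for can be sketched as a small userspace model (not kernel code: `copy_chunk` is an invented stand-in for copy_page_to_iter(), and `budget` models how far the copy gets before faulting). The point is that a short copy surfaces the bytes already delivered, and -EFAULT is returned only when no progress at all was made:

```c
#include <string.h>

/* Hypothetical stand-in for copy_page_to_iter(): copies up to 'len'
 * bytes, bounded by the remaining 'budget', and returns how many
 * bytes were actually copied (0 models a fault). */
static size_t copy_chunk(char *dst, const char *src, size_t len, size_t budget)
{
	size_t n = len < budget ? len : budget;

	memcpy(dst, src, n);
	return n;
}

/* Sketch of the fixed loop over scatterlist elements: on a short or
 * failed copy, report progress already made; only return -EFAULT
 * (-14) when nothing was copied at all. */
static int recv_sketch(char *dst, const char *const *sges,
		       const size_t *lens, int nsge, size_t budget)
{
	int copied = 0;
	int i;

	for (i = 0; i < nsge; i++) {
		size_t copy = copy_chunk(dst + copied, sges[i], lens[i],
					 budget - copied);
		if (!copy)
			return copied ? copied : -14 /* -EFAULT */;
		copied += copy;
		if (copy != lens[i])	/* partial copy: surface progress now */
			return copied;
	}
	return copied;
}
```

On the next read the caller retries from where the scatterlist left off and, if the fault persists, gets the error then.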
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/ipv4/tcp_bpf.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 37f4cb2bba5c..8e950b0bfabc 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -15,8 +15,8 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 {
 	struct iov_iter *iter = &msg->msg_iter;
 	int peek = flags & MSG_PEEK;
-	int i, ret, copied = 0;
 	struct sk_msg *msg_rx;
+	int i, copied = 0;
 
 	msg_rx = list_first_entry_or_null(&psock->ingress_msg,
					  struct sk_msg, list);
@@ -37,11 +37,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 			page = sg_page(sge);
 			if (copied + copy > len)
 				copy = len - copied;
-			ret = copy_page_to_iter(page, sge->offset, copy, iter);
-			if (ret != copy) {
-				msg_rx->sg.start = i;
-				return -EFAULT;
-			}
+			copy = copy_page_to_iter(page, sge->offset, copy, iter);
+			if (!copy)
+				return copied ? copied : -EFAULT;
 
 			copied += copy;
 			if (likely(!peek)) {
@@ -56,6 +54,11 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 				put_page(page);
 			}
 		} else {
+			/* Lets not optimize peek case if copy_page_to_iter
+			 * didn't copy the entire length lets just break.
+			 */
+			if (copy != sge->length)
+				return copied;
 			sk_msg_iter_var_next(i);
 		}

From patchwork Thu Nov 12 23:27:01 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11902291
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v2 2/6] bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Thu, 12 Nov 2020 15:27:01 -0800
Message-ID: <160522362100.135009.18395216656832785566.stgit@john-XPS-13-9370>
In-Reply-To:
 <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
References: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b
X-Mailing-List: bpf@vger.kernel.org

Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This
allows users to tune SO_RCVBUF and have sockmap honor it. We could refactor
the if (charge) case out in later patches, but keep this fix to the point.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Suggested-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 net/core/skmsg.c   |   20 ++++++++++++++++----
 net/ipv4/tcp_bpf.c |    3 ++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 654182ecf87b..fe44280c033e 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -170,10 +170,12 @@ static int sk_msg_free_elem(struct sock *sk, struct sk_msg *msg, u32 i,
 	struct scatterlist *sge = sk_msg_elem(msg, i);
 	u32 len = sge->length;
 
-	if (charge)
-		sk_mem_uncharge(sk, len);
-	if (!msg->skb)
+	/* When the skb owns the memory we free it from consume_skb path. */
+	if (!msg->skb) {
+		if (charge)
+			sk_mem_uncharge(sk, len);
 		put_page(sg_page(sge));
+	}
 	memset(sge, 0, sizeof(*sge));
 	return len;
 }
@@ -403,6 +405,9 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	int copied = 0, num_sge;
 	struct sk_msg *msg;
 
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+		return -EAGAIN;
+
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
 		return -EAGAIN;
@@ -418,7 +423,14 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 		return num_sge;
 	}
 
-	sk_mem_charge(sk, skb->len);
+	/* This will transition ownership of the data from the socket where
+	 * the BPF program was run initiating the redirect to the socket
+	 * we will eventually receive this data on. The data will be released
+	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
+	 * into user buffers.
+	 */
+	skb_set_owner_r(skb, sk);
+
 	copied = skb->len;
 	msg->sg.start = 0;
 	msg->sg.size = copied;
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 8e950b0bfabc..bc7d2a586e18 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -45,7 +45,8 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 			if (likely(!peek)) {
 				sge->offset += copy;
 				sge->length -= copy;
-				sk_mem_uncharge(sk, copy);
+				if (!msg_rx->skb)
+					sk_mem_uncharge(sk, copy);
 				msg_rx->sg.size -= copy;
 
 				if (!sge->length) {

From patchwork Thu Nov 12 23:27:20 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11902293
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v2 3/6] bpf, sockmap: Use truesize with sk_rmem_schedule()
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Thu, 12 Nov 2020 15:27:20 -0800
Message-ID: <160522364023.135009.13166901167357981092.stgit@john-XPS-13-9370>
In-Reply-To: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
References: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b
X-Mailing-List: netdev@vger.kernel.org

We use skb->len with sk_rmem_schedule(), which is not correct. Instead use
truesize to align with the socket and TCP stack usage of sk_rmem_schedule().

Suggested-by: Daniel Borkmann
Signed-off-by: John Fastabend
---
 net/core/skmsg.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fe44280c033e..d09426ce4af3 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -411,7 +411,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
 		return -EAGAIN;
-	if (!sk_rmem_schedule(sk, skb, skb->len)) {
+	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
 		kfree(msg);
 		return -EAGAIN;
 	}

From patchwork Thu Nov 12 23:27:38 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11902295
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v2 4/6] bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Thu, 12 Nov 2020 15:27:38 -0800
Message-ID: <160522365867.135009.14160426037700777343.stgit@john-XPS-13-9370>
In-Reply-To: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
References: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b
X-Mailing-List: bpf@vger.kernel.org

If a socket redirects to itself and is under memory pressure, it is possible
for the socket to become stuck so that recv() returns EAGAIN and the socket
cannot advance for some time.
This happens because, when redirecting an skb to the same socket we received
it on, we first check whether it is OK to enqueue the skb on the receiving
socket by checking memory limits. But if the skb is itself the object holding
the memory needed to enqueue it, we will keep retrying from the kernel side
and always fail with EAGAIN. Userspace will then get a recv() EAGAIN error if
there are no skbs in the psock ingress queue. This continues until either
some skbs get kfree'd, reducing memory pressure far enough that we can
enqueue the pending packet, or the socket is destroyed. In some cases it's
possible to get a socket stuck for a noticeable amount of time if the socket
is only receiving skbs from sk_skb verdict programs.

To reproduce, I make the socket memory limits ridiculously low so sockets are
always under memory pressure. More often, though, under memory pressure this
looks like a spurious EAGAIN error on the user space side, causing userspace
to retry; by the time it does, typically enough has moved on the memory side
that it works.

To fix, skip the memory checks and skb_orphan when receiving on the same sock
the skb is already assigned to. For SK_PASS cases this is easy: it is always
the same socket, so we can just omit the orphan/set_owner pair. For backlog
cases we need to check skb->sk and decide whether the orphan and set_owner
pair are needed.
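The deadlock described above can be sketched as a toy userspace model (not kernel code; the struct fields and function names are invented for illustration). When the pending skb's truesize is already counted in the socket's receive allocation, a limit check that ignores ownership can never pass, while the fixed check admits the self-redirected skb:

```c
#include <stdbool.h>

/* Toy model of the receive-memory limit check. On a self-redirect the
 * skb's memory is already charged to the receiving socket, so
 * rmem_alloc includes it. Fields mirror sk_rmem_alloc / sk_rcvbuf. */
struct sock_model { int rmem_alloc; int rcvbuf; };
struct skb_model  { struct sock_model *sk; int truesize; };

/* Pre-fix behavior: always run the limit check, so the skb that holds
 * the memory blocks itself forever (EAGAIN loop). */
static bool enqueue_old(struct sock_model *sk, struct skb_model *skb)
{
	(void)skb;
	return sk->rmem_alloc <= sk->rcvbuf;
}

/* Post-fix behavior: skip the check when the skb is already owned by
 * and accounted to the receiving socket (skb->sk == sk). */
static bool enqueue_new(struct sock_model *sk, struct skb_model *skb)
{
	if (skb->sk == sk)	/* self-redirect: already charged */
		return true;
	return sk->rmem_alloc <= sk->rcvbuf;
}
```

With rcvbuf 1000 and a self-redirected skb of truesize 1200 already charged, the old check fails indefinitely while the new one lets the pending packet through; cross-socket redirects still honor the limit.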
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c |   72 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 19 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index d09426ce4af3..9aed5a2c7c5b 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -399,38 +399,38 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
-static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
+static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
+						  struct sk_buff *skb)
 {
-	struct sock *sk = psock->sk;
-	int copied = 0, num_sge;
 	struct sk_msg *msg;
 
 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
-		return -EAGAIN;
+		return NULL;
+
+	if (!sk_rmem_schedule(sk, skb, skb->truesize))
+		return NULL;
 
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
-		return -EAGAIN;
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		kfree(msg);
-		return -EAGAIN;
-	}
+		return NULL;
 
 	sk_msg_init(msg);
-	num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
+	return msg;
+}
+
+static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
+					struct sk_psock *psock,
+					struct sock *sk,
+					struct sk_msg *msg)
+{
+	int num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
+	int copied;
+
 	if (unlikely(num_sge < 0)) {
 		kfree(msg);
 		return num_sge;
 	}
 
-	/* This will transition ownership of the data from the socket where
-	 * the BPF program was run initiating the redirect to the socket
-	 * we will eventually receive this data on. The data will be released
-	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
-	 * into user buffers.
-	 */
-	skb_set_owner_r(skb, sk);
-
 	copied = skb->len;
 	msg->sg.start = 0;
 	msg->sg.size = copied;
@@ -442,6 +442,40 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	return copied;
 }
 
+static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
+{
+	struct sock *sk = psock->sk;
+	struct sk_msg *msg;
+
+	msg = sk_psock_create_ingress_msg(sk, skb);
+	if (!msg)
+		return -EAGAIN;
+
+	/* This will transition ownership of the data from the socket where
+	 * the BPF program was run initiating the redirect to the socket
+	 * we will eventually receive this data on. The data will be released
+	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
+	 * into user buffers.
+	 */
+	skb_set_owner_r(skb, sk);
+	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
+}
+
+/* Puts an skb on the ingress queue of the socket already assigned to the
+ * skb. In this case we do not need to check memory limits or skb_set_owner_r
+ * because the skb is already accounted for here.
+ */
+static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb)
+{
+	struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
+	struct sock *sk = psock->sk;
+
+	if (unlikely(!msg))
+		return -EAGAIN;
+	sk_msg_init(msg);
+	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
+}
+
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 			       u32 off, u32 len, bool ingress)
 {
@@ -801,7 +835,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 	 * retrying later from workqueue.
 	 */
 	if (skb_queue_empty(&psock->ingress_skb)) {
-		err = sk_psock_skb_ingress(psock, skb);
+		err = sk_psock_skb_ingress_self(psock, skb);
 	}
 	if (err < 0) {
 		skb_queue_tail(&psock->ingress_skb, skb);

From patchwork Thu Nov 12 23:27:58 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11902299
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v2 5/6] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date:
 Thu, 12 Nov 2020 15:27:58 -0800
Message-ID: <160522367856.135009.17304729578208922913.stgit@john-XPS-13-9370>
In-Reply-To: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
References: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b
X-Mailing-List: bpf@vger.kernel.org

If the skb_verdict prog knowingly redirects an skb to its own socket, fix
your BPF program: this is not optimal and an abuse of the API, please use
SK_PASS instead. That said, there may be cases, such as socket load
balancing, where picking the socket is hash based or otherwise picks the
same socket it was received on in rare cases. When this happens we don't
want to confuse userspace with an EAGAIN error if we can avoid it.

This patch avoids double accounting in these cases. At the moment, even if
the skb has already been charged against the socket's rcvbuf and forward
alloc, we check it again and do skb_set_owner_r(), causing it to be orphaned
and recharged. For one, this is useless work; more importantly, we can hit a
case where the skb could be put on the ingress queue, yet because we are
under memory pressure we return EAGAIN. The trouble is that the skb has
already been accounted for, so any rcvbuf check already includes the memory
associated with the packet. This rolls up and can result in unnecessary
EAGAIN errors in userspace read() calls.

Fix by doing an unlikely check and skipping the checks if skb->sk == sk.
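The double accounting can be shown with a toy userspace model (not kernel code; struct and function names are invented, with `charge` standing in for the accounting side effect of skb_set_owner_r()):

```c
/* Toy model of receive-memory accounting. 'rmem_alloc' mirrors
 * sk_rmem_alloc; an skb's truesize is charged against it. */
struct sk_acct { int rmem_alloc; int rcvbuf; };

/* Models the accounting side effect of skb_set_owner_r(). */
static void charge(struct sk_acct *s, int truesize)
{
	s->rmem_alloc += truesize;
}

/* Pre-fix path: recharge unconditionally. On a self-redirect the skb
 * was charged when it was received, so its truesize is now counted
 * twice and can push the socket over rcvbuf spuriously. */
static int old_path(struct sk_acct *s, int truesize)
{
	charge(s, truesize);
	return s->rmem_alloc;
}

/* Fixed path: only charge when ownership actually changes sockets
 * (the skb->sk != sk case). */
static int new_path(struct sk_acct *s, int truesize, int already_owned)
{
	if (!already_owned)
		charge(s, truesize);
	return s->rmem_alloc;
}
```

With rcvbuf 1000 and an skb of truesize 800 already charged, the old path inflates rmem_alloc to 1600, beyond the limit even though only one skb is queued, while the fixed path leaves it at 800.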
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 9aed5a2c7c5b..f747ee341fe8 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -404,11 +404,13 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
 {
 	struct sk_msg *msg;
 
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
-		return NULL;
+	if (likely(skb->sk != sk)) {
+		if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+			return NULL;
 
-	if (!sk_rmem_schedule(sk, skb, skb->truesize))
-		return NULL;
+		if (!sk_rmem_schedule(sk, skb, skb->truesize))
+			return NULL;
+	}
 
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
@@ -455,9 +457,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	 * the BPF program was run initiating the redirect to the socket
 	 * we will eventually receive this data on. The data will be released
 	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
-	 * into user buffers.
+	 * into user buffers. If we are receiving on the same sock skb->sk is
+	 * already assigned, skip memory accounting and owner transition seeing
+	 * it already set correctly.
 	 */
-	skb_set_owner_r(skb, sk);
+	if (likely(skb->sk != sk))
+		skb_set_owner_r(skb, sk);
 	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
 }

From patchwork Thu Nov 12 23:28:18 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11902297
Subject: [bpf PATCH v2 6/6] bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Thu, 12 Nov 2020 15:28:18 -0800
Message-ID:
 <160522369822.135009.15718253545046438408.stgit@john-XPS-13-9370>
In-Reply-To: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
References: <160522352433.135009.15329422887113794062.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b

When an skb has a frag_list it's possible for skb_to_sgvec() to fail. This
happens when the scatterlist has fewer elements available to store pages
than would be needed for the initial skb plus any of its frags. This case
appears rare, but is possible when running RX parser/verdict programs
exposed to the internet. Currently, when this happens we throw an error,
break the pipe, and kfree the msg. This effectively breaks the application
or forces it to do a retry.

Let's catch this case and handle it by doing an skb_linearize() on any skb
we receive with frags. At this point skb_to_sgvec() should not fail because
the failing conditions would require frags to be in place.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index f747ee341fe8..7ec1fdc083e4 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -425,9 +425,16 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 				       struct sock *sk,
 				       struct sk_msg *msg)
 {
-	int num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
-	int copied;
+	int num_sge, copied;
 
+	/* skb linearize may fail with ENOMEM, but lets simply try again
+	 * later if this happens. Under memory pressure we don't want to
+	 * drop the skb. We need to linearize the skb so that the mapping
+	 * in skb_to_sgvec can not error.
+	 */
+	if (skb_linearize(skb))
+		return -EAGAIN;
+	num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
 	if (unlikely(num_sge < 0)) {
 		kfree(msg);
 		return num_sge;