From patchwork Mon Nov 16 22:27:46 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11910941
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v3 1/6] bpf, sockmap: fix partial copy_page_to_iter so progress can still be made
From: John Fastabend
To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Mon, 16 Nov 2020 14:27:46 -0800
Message-ID: <160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370>
In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>

If copy_page_to_iter() fails, or only partially completes with fewer
bytes copied than expected, we currently reset sg.start and return
EFAULT. This is problematic if we already copied data into the user
buffer before hitting the error: we leave the copied data in the user
buffer but fail to unwind the scatterlist, so the kernel side believes
the data has been copied while the user side believes it has _not_ been
received. The expected behavior is to return the number of bytes copied
and then, on the next read, return the error assuming it is still there.

This can happen when a copy length spans multiple scatterlist elements
and one or more of them complete before the error is hit. The error is
rare enough that in my normal testing with server side programs, such as
nginx, httpd, envoy, etc., I have never seen it. The only reliable way
to reproduce it that I've found is to stream movies over my browser for
a day or so and wait for it to hang. Not very scientific, but with a few
extra WARN_ON()s in the code the bug was obvious.

When we review the errors from copy_page_to_iter() it seems we are
hitting a page fault from copy_page_to_iter_iovec(), where the code
checks fault_in_pages_writeable(buf, copy) with buf being the user
buffer. It also seems typical server applications don't hit this case.

The other way to try to reproduce this is to run the sockmap selftest
tool test_sockmap with data verification enabled, but that doesn't
reproduce the fault. Perhaps we can trigger this case artificially
somehow from the test tools. I haven't sorted out a way to do that yet
though.
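The partial-copy semantics described above can be modeled in plain C. This is a userspace sketch with invented names (fake_sge, fake_copy, recv_model), not the kernel's types or code: the point is that when a later scatterlist element faults, the loop reports the bytes already copied instead of -EFAULT.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for one scatterlist element. */
struct fake_sge { size_t length; };

/* Pretend copy_page_to_iter(): copies up to 'avail' remaining bytes. */
static size_t fake_copy(size_t want, size_t avail)
{
	return want < avail ? want : avail;
}

/* Simplified model of the fixed __tcp_bpf_recvmsg() loop: a partial
 * copy returns the bytes already placed in the user buffer; only a
 * copy that moved nothing at all returns -EFAULT (-14 here).
 */
static long recv_model(struct fake_sge *sges, int nsge,
		       size_t len, size_t iter_avail)
{
	size_t copied = 0;
	for (int i = 0; i < nsge; i++) {
		size_t copy = sges[i].length;
		if (copied + copy > len)
			copy = len - copied;
		copy = fake_copy(copy, iter_avail - copied);
		if (!copy)
			return copied ? (long)copied : -14 /* -EFAULT */;
		copied += copy;
		if (copied == len)
			break;
	}
	return (long)copied;
}
```

With two 100-byte elements and an iterator that faults after 150 bytes, the old behavior would discard the first element's copy and return -EFAULT; the model above returns 150 instead.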
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 net/ipv4/tcp_bpf.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 37f4cb2bba5c..8e950b0bfabc 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -15,8 +15,8 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 {
 	struct iov_iter *iter = &msg->msg_iter;
 	int peek = flags & MSG_PEEK;
-	int i, ret, copied = 0;
 	struct sk_msg *msg_rx;
+	int i, copied = 0;
 
 	msg_rx = list_first_entry_or_null(&psock->ingress_msg,
					  struct sk_msg, list);
@@ -37,11 +37,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 			page = sg_page(sge);
 			if (copied + copy > len)
 				copy = len - copied;
-			ret = copy_page_to_iter(page, sge->offset, copy, iter);
-			if (ret != copy) {
-				msg_rx->sg.start = i;
-				return -EFAULT;
-			}
+			copy = copy_page_to_iter(page, sge->offset, copy, iter);
+			if (!copy)
+				return copied ? copied : -EFAULT;
 
 			copied += copy;
 			if (likely(!peek)) {
@@ -56,6 +54,11 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 					put_page(page);
 			}
 		} else {
+			/* Lets not optimize peek case if copy_page_to_iter
+			 * didn't copy the entire length lets just break.
+			 */
+			if (copy != sge->length)
+				return copied;
 			sk_msg_iter_var_next(i);
 		}

From patchwork Mon Nov 16 22:28:06 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11910945
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v3 2/6] bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
From: John Fastabend
To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Mon, 16 Nov 2020 14:28:06 -0800
Message-ID: <160556568657.73229.8404601585878439060.stgit@john-XPS-13-9370>
In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>

Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This
allows users to tune SO_RCVBUF and have sockmap honor it. We can
refactor the if (charge) case out in later patches, but keep this fix
to the point.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Suggested-by: Jakub Sitnicki
Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 net/core/skmsg.c   | 20 ++++++++++++++++----
 net/ipv4/tcp_bpf.c |  3 ++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 654182ecf87b..fe44280c033e 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -170,10 +170,12 @@ static int sk_msg_free_elem(struct sock *sk, struct sk_msg *msg, u32 i,
 	struct scatterlist *sge = sk_msg_elem(msg, i);
 	u32 len = sge->length;
 
-	if (charge)
-		sk_mem_uncharge(sk, len);
-	if (!msg->skb)
+	/* When the skb owns the memory we free it from consume_skb path.
+	 */
+	if (!msg->skb) {
+		if (charge)
+			sk_mem_uncharge(sk, len);
 		put_page(sg_page(sge));
+	}
 	memset(sge, 0, sizeof(*sge));
 	return len;
 }
@@ -403,6 +405,9 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	int copied = 0, num_sge;
 	struct sk_msg *msg;
 
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+		return -EAGAIN;
+
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
 		return -EAGAIN;
@@ -418,7 +423,14 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 		return num_sge;
 	}
 
-	sk_mem_charge(sk, skb->len);
+	/* This will transition ownership of the data from the socket where
+	 * the BPF program was run initiating the redirect to the socket
+	 * we will eventually receive this data on. The data will be released
+	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
+	 * into user buffers.
+	 */
+	skb_set_owner_r(skb, sk);
+
 	copied = skb->len;
 	msg->sg.start = 0;
 	msg->sg.size = copied;

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 8e950b0bfabc..bc7d2a586e18 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -45,7 +45,8 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 		if (likely(!peek)) {
 			sge->offset += copy;
 			sge->length -= copy;
-			sk_mem_uncharge(sk, copy);
+			if (!msg_rx->skb)
+				sk_mem_uncharge(sk, copy);
 			msg_rx->sg.size -= copy;
 
 			if (!sge->length) {

From patchwork Mon Nov 16 22:28:26 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11910943
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v3 3/6] bpf, sockmap: Use truesize with sk_rmem_schedule()
From: John Fastabend
To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Mon, 16 Nov 2020 14:28:26 -0800
Message-ID: <160556570616.73229.17003722112077507863.stgit@john-XPS-13-9370>
In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>

We use skb->len with sk_rmem_schedule(), which is not correct. Instead
use truesize to align with the socket and TCP stack usage of
sk_rmem_schedule().
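A toy C sketch of why the charge size matters. All names and numbers here are made up for the demo (toy_skb, toy_rmem_schedule), not kernel code: an skb's truesize covers payload plus struct and allocator overhead, so checking the receive budget against only skb->len can admit more memory than the limit actually allows.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for an skb's two size fields. */
struct toy_skb {
	size_t len;      /* payload bytes */
	size_t truesize; /* payload + struct/slab overhead */
};

/* Toy model of the sk_rmem_schedule() question: does 'charge' fit
 * under 'limit' given 'rmem_alloc' bytes already accounted? The real
 * kernel logic is more involved; this only shows len vs truesize.
 */
static int toy_rmem_schedule(size_t rmem_alloc, size_t limit, size_t charge)
{
	return rmem_alloc + charge <= limit;
}
```

For a 512-byte payload carried in an allocation whose true cost is 1280 bytes, a len-based check against a 1024-byte budget passes even though the real memory consumed exceeds the budget, which is the under-accounting this patch closes.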
Suggested-by: Daniel Borkmann
Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fe44280c033e..d09426ce4af3 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -411,7 +411,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
 		return -EAGAIN;
-	if (!sk_rmem_schedule(sk, skb, skb->len)) {
+	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
 		kfree(msg);
 		return -EAGAIN;
 	}

From patchwork Mon Nov 16 22:28:46 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11910973
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v3 4/6] bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
From: John Fastabend
To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Mon, 16 Nov 2020 14:28:46 -0800
Message-ID: <160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370>
In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>

If a socket redirects to itself and it is under memory pressure, it is
possible to get the socket stuck so that recv() returns EAGAIN and the
socket can not advance for some time. This happens because when
redirecting an skb to the same socket we received the skb on, we first
check whether it is OK to enqueue the skb on the receiving socket by
checking memory limits. But if the skb is itself the object holding the
memory needed to enqueue it, we will keep retrying from the kernel side
and always fail with EAGAIN. Then userspace will get a recv() EAGAIN
error if there are no skbs in the psock ingress queue. This continues
until either some skbs get kfree'd, reducing the memory pressure far
enough that we can enqueue the pending packet, or the socket is
destroyed. In some cases it is possible to get a socket stuck for a
noticeable amount of time if the socket is only receiving skbs from
sk_skb verdict programs. To reproduce this I made the socket memory
limits ridiculously low so sockets are always under memory pressure.
More often, though, under memory pressure it looks like a spurious
EAGAIN error on the user space side, causing userspace to retry, and
typically enough has moved on the memory side that it works.

To fix, skip the memory checks and skb_orphan if receiving on the same
sock as already assigned. For SK_PASS cases this is easy: it is always
the same socket, so we can just omit the orphan/set_owner pair. For
backlog cases we need to check skb->sk and decide whether the orphan
and set_owner pair are needed.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 72 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 19 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index d09426ce4af3..9aed5a2c7c5b 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -399,38 +399,38 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
-static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
+static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
+						  struct sk_buff *skb)
 {
-	struct sock *sk = psock->sk;
-	int copied = 0, num_sge;
 	struct sk_msg *msg;
 
 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
-		return -EAGAIN;
+		return NULL;
+
+	if (!sk_rmem_schedule(sk, skb, skb->truesize))
+		return NULL;
 
 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
-		return -EAGAIN;
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		kfree(msg);
-		return -EAGAIN;
-	}
+		return NULL;
 
 	sk_msg_init(msg);
-	num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
+	return msg;
+}
+
+static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
+					struct sk_psock *psock,
+					struct sock *sk,
+					struct sk_msg *msg)
+{
+	int num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
+	int copied;
+
 	if (unlikely(num_sge < 0)) {
 		kfree(msg);
 		return num_sge;
 	}
-	/* This will transition ownership of the data from the socket where
-	 * the BPF program was run initiating the redirect to the socket
-	 * we will eventually receive this data on. The data will be released
-	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
-	 * into user buffers.
-	 */
-	skb_set_owner_r(skb, sk);
-
 	copied = skb->len;
 	msg->sg.start = 0;
 	msg->sg.size = copied;
@@ -442,6 +442,40 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	return copied;
 }
 
+static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
+{
+	struct sock *sk = psock->sk;
+	struct sk_msg *msg;
+
+	msg = sk_psock_create_ingress_msg(sk, skb);
+	if (!msg)
+		return -EAGAIN;
+
+	/* This will transition ownership of the data from the socket where
+	 * the BPF program was run initiating the redirect to the socket
+	 * we will eventually receive this data on. The data will be released
+	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
+	 * into user buffers.
+	 */
+	skb_set_owner_r(skb, sk);
+	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
+}
+
+/* Puts an skb on the ingress queue of the socket already assigned to the
+ * skb. In this case we do not need to check memory limits or skb_set_owner_r
+ * because the skb is already accounted for here.
+ */
+static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb)
+{
+	struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
+	struct sock *sk = psock->sk;
+
+	if (unlikely(!msg))
+		return -EAGAIN;
+	sk_msg_init(msg);
+	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
+}
+
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
			       u32 off, u32 len, bool ingress)
 {
@@ -801,7 +835,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
	 * retrying later from workqueue.
	 */
 	if (skb_queue_empty(&psock->ingress_skb)) {
-		err = sk_psock_skb_ingress(psock, skb);
+		err = sk_psock_skb_ingress_self(psock, skb);
 	}
 	if (err < 0) {
 		skb_queue_tail(&psock->ingress_skb, skb);

From patchwork Mon Nov 16 22:29:08 2020
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 11910969
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [bpf PATCH v3 5/6] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
From: John Fastabend
To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org,
 netdev@vger.kernel.org
Date: Mon, 16 Nov 2020 14:29:08 -0800
Message-ID: <160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370>
In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>

If the skb_verdict prog knowingly redirects an skb to itself, fix your
BPF program: this is not optimal and an abuse of the API, please use
SK_PASS. That said, there may be cases, such as socket load balancing,
where picking the socket is hash based or otherwise picks the same
socket it was received on in some rare cases. If this happens we don't
want to confuse userspace by giving it an EAGAIN error if we can avoid
it.

This patch avoids double accounting in these cases. At the moment, even
if the skb has already been charged against the socket's rcvbuf and
forward alloc, we check it again and do skb_set_owner_r(), causing it
to be orphaned and recharged. For one this is useless work, but more
importantly we can have a case where the skb could be put on the
ingress queue but, because we are under memory pressure, we return
EAGAIN. The trouble here is that the skb has already been accounted
for, so any rcvbuf checks already include the memory associated with
the packet. This rolls up and can result in unnecessary EAGAIN errors
in userspace read() calls.

Fix by doing an unlikely check and skipping the checks if skb->sk == sk.
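The decision described above can be reduced to a small C model. Names here (toy_ingress, TOY_EAGAIN) are illustrative only, not the kernel's: if the skb already belongs to the receiving socket, its memory is already accounted there, so the rcvbuf check is skipped instead of double-charging and risking a spurious EAGAIN.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EAGAIN 11

/* 'owner' stands for the socket the skb is currently charged to
 * (skb->sk); 'sk' is the ingress target. Mirrors the
 * unlikely(skb->sk == sk) fast path: same-socket skbs bypass the
 * memory-pressure check because their cost is already counted.
 */
static int toy_ingress(const void *owner, const void *sk,
		       size_t rmem_alloc, size_t rcvbuf)
{
	if (owner == sk)
		return 0;		/* already accounted: enqueue directly */
	if (rmem_alloc > rcvbuf)
		return -TOY_EAGAIN;	/* would overcommit the receiver */
	return 0;			/* charge to 'sk' and enqueue */
}
```

Under pressure (rmem_alloc above rcvbuf), a self-redirect still enqueues, while a redirect from another socket correctly backs off with -EAGAIN.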
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/skmsg.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 9aed5a2c7c5b..514bc9f6f8ae 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -442,11 +442,19 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	return copied;
 }
 
+static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb);
+
 static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 {
 	struct sock *sk = psock->sk;
 	struct sk_msg *msg;
 
+	/* If we are receiving on the same sock skb->sk is already assigned,
+	 * skip memory accounting and owner transition seeing it already set
+	 * correctly.
+	 */
+	if (unlikely(skb->sk == sk))
+		return sk_psock_skb_ingress_self(psock, skb);
 	msg = sk_psock_create_ingress_msg(sk, skb);
 	if (!msg)
 		return -EAGAIN;

From patchwork Mon Nov 16 22:29:28 2020
Subject: [bpf PATCH v3 6/6] bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list
From: John Fastabend <john.fastabend@gmail.com>
To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Date: Mon, 16 Nov 2020 14:29:28 -0800
Message-ID: <160556576837.73229.14800682790808797635.stgit@john-XPS-13-9370>
In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>
References: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-36-gc01b

When an skb has a frag_list it is possible for skb_to_sgvec() to fail.
This happens when the scatterlist has fewer elements to store pages than
would be needed for the initial skb plus any of its frags. This case
appears rare, but is possible when running RX parser/verdict programs
exposed to the internet. Currently, when this happens we throw an error,
break the pipe, and kfree the msg. This effectively breaks the
application or forces it to do a retry.

Let's catch this case and handle it by doing an skb_linearize() on any
skb we receive with frags. At this point skb_to_sgvec() should not fail,
because the failing conditions would require frags to be in place.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/skmsg.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 514bc9f6f8ae..25cdbb20f3a0 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -423,9 +423,16 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 					struct sock *sk,
 					struct sk_msg *msg)
 {
-	int num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
-	int copied;
+	int num_sge, copied;
+	/* skb linearize may fail with ENOMEM, but lets simply try again
+	 * later if this happens. Under memory pressure we don't want to
+	 * drop the skb. We need to linearize the skb so that the mapping
+	 * in skb_to_sgvec can not error. */
+	if (skb_linearize(skb))
+		return -EAGAIN;
+
+	num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len);
 	if (unlikely(num_sge < 0)) {
 		kfree(msg);
 		return num_sge;