diff mbox series

[bpf,v3,5/6] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self

Message ID 160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370 (mailing list archive)
State Accepted
Commit 2443ca66676d50a4eb3305c236bccd84a9828ce2
Delegated to: BPF
Headers show
Series sockmap fixes | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: 'unecessary' may be misspelled - perhaps 'unnecessary'? WARNING: line length of 82 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

John Fastabend Nov. 16, 2020, 10:29 p.m. UTC
If the skb_verdict_prog redirects an skb knowingly to itself, fix your
BPF program this is not optimal and an abuse of the API please use
SK_PASS. That said there may be cases, such as socket load balancing,
where picking the socket is hashed based or otherwise picks the same
socket it was received on in some rare cases. If this happens we don't
want to confuse userspace giving them an EAGAIN error if we can avoid
it.

To avoid double accounting in these cases. At the moment even if the
skb has already been charged against the sockets rcvbuf and forward
alloc we check it again and do set_owner_r() causing it to be orphaned
and recharged. For one this is useless work, but more importantly we
can have a case where the skb could be put on the ingress queue, but
because we are under memory pressure we return EAGAIN. The trouble
here is the skb has already been accounted for so any rcvbuf checks
include the memory associated with the packet already. This rolls
up and can result in unecessary EAGAIN errors in userspace read()
calls.

Fix by doing an unlikely check and skipping checks if skb->sk == sk.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/skmsg.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jakub Sitnicki Nov. 17, 2020, 8:21 a.m. UTC | #1
On Mon, Nov 16, 2020 at 11:29 PM CET, John Fastabend wrote:
> If the skb_verdict_prog redirects an skb knowingly to itself, fix your
> BPF program this is not optimal and an abuse of the API please use
> SK_PASS. That said there may be cases, such as socket load balancing,
> where picking the socket is hashed based or otherwise picks the same
> socket it was received on in some rare cases. If this happens we don't
> want to confuse userspace giving them an EAGAIN error if we can avoid
> it.
>
> To avoid double accounting in these cases. At the moment even if the
> skb has already been charged against the sockets rcvbuf and forward
> alloc we check it again and do set_owner_r() causing it to be orphaned
> and recharged. For one this is useless work, but more importantly we
> can have a case where the skb could be put on the ingress queue, but
> because we are under memory pressure we return EAGAIN. The trouble
> here is the skb has already been accounted for so any rcvbuf checks
> include the memory associated with the packet already. This rolls
> up and can result in unecessary EAGAIN errors in userspace read()
> calls.
>
> Fix by doing an unlikely check and skipping checks if skb->sk == sk.
>
> Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  net/core/skmsg.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9aed5a2c7c5b..514bc9f6f8ae 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -442,11 +442,19 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
>  	return copied;
>  }
>  
> +static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb);
> +
>  static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
>  {
>  	struct sock *sk = psock->sk;
>  	struct sk_msg *msg;
>  
> +	/* If we are receiving on the same sock skb->sk is already assigned,
> +	 * skip memory accounting and owner transition seeing it already set
> +	 * correctly.
> +	 */
> +	if (unlikely(skb->sk == sk))
> +		return sk_psock_skb_ingress_self(psock, skb);
>  	msg = sk_psock_create_ingress_msg(sk, skb);
>  	if (!msg)
>  		return -EAGAIN;

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
diff mbox series

Patch

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 9aed5a2c7c5b..514bc9f6f8ae 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -442,11 +442,19 @@  static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	return copied;
 }
 
+static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb);
+
 static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 {
 	struct sock *sk = psock->sk;
 	struct sk_msg *msg;
 
+	/* If we are receiving on the same sock skb->sk is already assigned,
+	 * skip memory accounting and owner transition seeing it already set
+	 * correctly.
+	 */
+	if (unlikely(skb->sk == sk))
+		return sk_psock_skb_ingress_self(psock, skb);
 	msg = sk_psock_create_ingress_msg(sk, skb);
 	if (!msg)
 		return -EAGAIN;