net/socket: Acquire cgroup_lock in do_sock_getsockopt

Message ID	20240819082513.27176-1-Tze-nan.Wu@mediatek.com (mailing list archive)
State	New
Headers	show Return-Path: <linux-mediatek-bounces+linux-mediatek=archiver.kernel.org@lists.infradead.org> From: Tze-nan Wu <Tze-nan.Wu@mediatek.com> To: <netdev@vger.kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Matthias Brugger <matthias.bgg@gmail.com>, AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> CC: <bobule.chang@mediatek.com>, <wsd_upstream@mediatek.com>, Tze-nan Wu <Tze-nan.Wu@mediatek.com>, Yanghui Li <yanghui.li@mediatek.com>, Cheng-Jui Wang <cheng-jui.wang@mediatek.com>, <linux-kernel@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, <linux-mediatek@lists.infradead.org>, <bpf@vger.kernel.org> Subject: [PATCH] net/socket: Acquire cgroup_lock in do_sock_getsockopt Date: Mon, 19 Aug 2024 16:25:12 +0800 Message-ID: <20240819082513.27176-1-Tze-nan.Wu@mediatek.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: list Sender: "Linux-mediatek" <linux-mediatek-bounces@lists.infradead.org> Errors-To: linux-mediatek-bounces+linux-mediatek=archiver.kernel.org@lists.infradead.org
Series	net/socket: Acquire cgroup_lock in do_sock_getsockopt \| expand net/socket: Acquire cgroup_lock in do_sock_getsockopt

Message ID

20240819082513.27176-1-Tze-nan.Wu@mediatek.com (mailing list archive)

State

New

Headers

From: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
To: <netdev@vger.kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric
 Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni
	<pabeni@redhat.com>, Matthias Brugger <matthias.bgg@gmail.com>,
	AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
CC: <bobule.chang@mediatek.com>, <wsd_upstream@mediatek.com>, Tze-nan Wu
	<Tze-nan.Wu@mediatek.com>, Yanghui Li <yanghui.li@mediatek.com>, Cheng-Jui
 Wang <cheng-jui.wang@mediatek.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <linux-mediatek@lists.infradead.org>,
	<bpf@vger.kernel.org>
Subject: [PATCH] net/socket: Acquire cgroup_lock in do_sock_getsockopt
Date: Mon, 19 Aug 2024 16:25:12 +0800
Message-ID: <20240819082513.27176-1-Tze-nan.Wu@mediatek.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: list
Sender: "Linux-mediatek" <linux-mediatek-bounces@lists.infradead.org>
Errors-To: 
 linux-mediatek-bounces+linux-mediatek=archiver.kernel.org@lists.infradead.org

Series

net/socket: Acquire cgroup_lock in do_sock_getsockopt | expand

Commit Message

Tze-nan Wu Aug. 19, 2024, 8:25 a.m. UTC

The return value from `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` can change
between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and
`BPF_CGROUP_RUN_PROG_GETSOCKOPT`.

If `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` changes from "false" to
"true"
between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and
`BPF_CGROUP_RUN_PROG_GETSOCKOPT`,
`BPF_CGROUP_RUN_PROG_GETSOCKOPT` will receive an -EFAULT from
`__cgroup_bpf_run_filter_getsockopt(max_optlen=0)` due to `get_user()`
had not reached in `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`.

Scenario shown as below:

           `process A`                      `process B`
           -----------                      ------------
  BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
                                            enable CGROUP_GETSOCKOPT
  BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)

Prevent `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` change between
`BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and `BPF_CGROUP_RUN_PROG_GETSOCKOPT`
by acquiring cgroup_lock.

Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>

---

We have encountered this issue by observing that process A could sometimes
get an -EFAULT from getsockopt() during our device boot-up, while another
process B triggers the race condition by enabling CGROUP_GETSOCKOPT
through bpf syscall at the same time.

The race condition is shown below:

           `process A`                        `process B`
           -----------                        ------------
  BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
         
                                              bpf syscall 
                                        (CGROUP_GETSOCKOPT enabled)

  BPF_CGROUP_RUN_PROG_GETSOCKOPT
  -> __cgroup_bpf_run_filter_getsockopt
    (-EFAULT)

__cgroup_bpf_run_filter_getsockopt return -EFAULT at the line shown below:
	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
		if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
			pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
				     ctx.optlen, max_optlen);
			ret = retval;
			goto out;
		}
		ret = -EFAULT; <== return EFAULT here
		goto out;
	}

This patch should fix the race but not sure if it introduces any potential
side effects or regression.

And we wondering if this is a real issue in do_sock_getsockopt or if
getsockopt() is designed to expect such race conditions.
Should the userspace caller always anticipate an -EFAULT from getsockopt()
if another process enables CGROUP_GETSOCKOPT at the same time?

Any comment will be appreciated!

BTW, I added Chengjui and Yanghui to Co-developed due to we have several
discussions on this issue. And we both spend some time on this issue.

---
 net/socket.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Aug. 19, 2024, 9:13 a.m. UTC | #1

On Mon, Aug 19, 2024 at 10:27 AM Tze-nan Wu <Tze-nan.Wu@mediatek.com> wrote:
>
> The return value from `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` can change
> between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and
> `BPF_CGROUP_RUN_PROG_GETSOCKOPT`.
>
> If `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` changes from "false" to
> "true"
> between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and
> `BPF_CGROUP_RUN_PROG_GETSOCKOPT`,
> `BPF_CGROUP_RUN_PROG_GETSOCKOPT` will receive an -EFAULT from
> `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)` due to `get_user()`
> had not reached in `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`.
>
> Scenario shown as below:
>
>            `process A`                      `process B`
>            -----------                      ------------
>   BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
>                                             enable CGROUP_GETSOCKOPT
>   BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
>
> Prevent `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` change between
> `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and `BPF_CGROUP_RUN_PROG_GETSOCKOPT`
> by acquiring cgroup_lock.
>
> Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
> Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
> Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
> Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
>
> ---
>
> We have encountered this issue by observing that process A could sometimes
> get an -EFAULT from getsockopt() during our device boot-up, while another
> process B triggers the race condition by enabling CGROUP_GETSOCKOPT
> through bpf syscall at the same time.
>
> The race condition is shown below:
>
>            `process A`                        `process B`
>            -----------                        ------------
>   BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
>
>                                               bpf syscall
>                                         (CGROUP_GETSOCKOPT enabled)
>
>   BPF_CGROUP_RUN_PROG_GETSOCKOPT
>   -> __cgroup_bpf_run_filter_getsockopt
>     (-EFAULT)
>
> __cgroup_bpf_run_filter_getsockopt return -EFAULT at the line shown below:
>         if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
>                 if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
>                         pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
>                                      ctx.optlen, max_optlen);
>                         ret = retval;
>                         goto out;
>                 }
>                 ret = -EFAULT; <== return EFAULT here
>                 goto out;
>         }
>
> This patch should fix the race but not sure if it introduces any potential
> side effects or regression.
>
> And we wondering if this is a real issue in do_sock_getsockopt or if
> getsockopt() is designed to expect such race conditions.
> Should the userspace caller always anticipate an -EFAULT from getsockopt()
> if another process enables CGROUP_GETSOCKOPT at the same time?
>
> Any comment will be appreciated!
>
> BTW, I added Chengjui and Yanghui to Co-developed due to we have several
> discussions on this issue. And we both spend some time on this issue.
>
> ---
>  net/socket.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/socket.c b/net/socket.c
> index fcbdd5bc47ac..e0b2b16fd238 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2370,8 +2370,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>         if (err)
>                 return err;
>
> -       if (!compat)
> +       if (!compat) {
> +               cgroup_lock();
>                 max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> +       }
>

Acquiring cgroup_lock mutex in socket getsockopt() fast path ?

There is no way we can accept such a patch, please come up with a
reasonable patch.

cgroup_bpf_enabled() should probably be used once.

diff --git a/net/socket.c b/net/socket.c
index fcbdd5bc47ac..e0b2b16fd238 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2370,8 +2370,10 @@  int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 	if (err)
 		return err;
 
-	if (!compat)
+	if (!compat) {
+		cgroup_lock();
 		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+	}
 
 	ops = READ_ONCE(sock->ops);
 	if (level == SOL_SOCKET) {
@@ -2387,10 +2389,12 @@  int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 				      optlen.user);
 	}
 
-	if (!compat)
+	if (!compat) {
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
 						     optval, optlen, max_optlen,
 						     err);
+		cgroup_unlock();
+	}
 
 	return err;
 }

net/socket: Acquire cgroup_lock in do_sock_getsockopt

Commit Message

Comments

Patch