From patchwork Wed Jun 14 08:34:30 2023
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 13279722
Date: Wed, 14 Jun 2023 10:34:30 +0200
From: Sebastian Andrzej Siewior
To: Andrii Nakryiko
Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
 John Fastabend, "Paul E. McKenney", Peter Zijlstra, Thomas Gleixner,
 Linux-Fsdevel
Subject: [PATCH v4] bpf: Remove in_atomic() from bpf_link_put().
Message-ID: <20230614083430.oENawF8f@linutronix.de>
References: <20230509132433.2FSY_6t7@linutronix.de>
 <20230525141813.TFZLWM4M@linutronix.de>
 <20230526112356.fOlWmeOF@linutronix.de>
 <20230605163733.LD-UCcso@linutronix.de>
X-Mailing-List: linux-fsdevel@vger.kernel.org

bpf_free_inode() is invoked as an RCU callback. Usually RCU callbacks are
invoked within softirq context. With the rcutree.use_softirq=0 boot
option, RCU callbacks are instead invoked in a per-CPU kthread with
bottom halves disabled, which implies an RCU read section.

On PREEMPT_RT the context remains fully preemptible. The RCU read
section, however, does not allow schedule() to be invoked. That is what
happens in mutex_lock(), which is called by bpf_trampoline_unlink_prog(),
which in turn originates from bpf_link_put().

It was pointed out that the bpf_link_put() invocation should not be
delayed if it originates from close(). It was also pointed out that other
invocations from within a syscall should also avoid the workqueue.
Everyone else should use the workqueue by default to remain safe in the
future (while auditing the code, every caller was preemptible except for
the RCU case).

Let bpf_link_put() use the worker unconditionally. Add
bpf_link_put_direct(), which frees the resources directly and is used by
close() and from within __sys_bpf().

Signed-off-by: Sebastian Andrzej Siewior
---
v3…v4:
  - Go back to using bpf_link_put_direct() for the direct free and let
    bpf_link_put() use the worker. Let close() and all invocations from
    within the syscall use bpf_link_put_direct(); these are all the
    instances within syscall.c here.

v2…v3:
  - Drop bpf_link_put_direct(). Let bpf_link_put() do the direct free
    and add bpf_link_put_from_atomic() to do the delayed free via the
    worker.

v1…v2:
  - Add bpf_link_put_direct() to be used from bpf_link_release() as
    suggested.

 kernel/bpf/syscall.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 14f39c1e573ee..8f09aef5949d4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2777,28 +2777,31 @@ static void bpf_link_put_deferred(struct work_struct *work)
 	bpf_link_free(link);
 }
 
-/* bpf_link_put can be called from atomic context, but ensures that resources
- * are freed from process context
+/* bpf_link_put might be called from atomic context. It needs to be called
+ * from sleepable context in order to acquire sleeping locks during the process.
  */
 void bpf_link_put(struct bpf_link *link)
 {
 	if (!atomic64_dec_and_test(&link->refcnt))
 		return;
 
-	if (in_atomic()) {
-		INIT_WORK(&link->work, bpf_link_put_deferred);
-		schedule_work(&link->work);
-	} else {
-		bpf_link_free(link);
-	}
+	INIT_WORK(&link->work, bpf_link_put_deferred);
+	schedule_work(&link->work);
 }
 EXPORT_SYMBOL(bpf_link_put);
 
+static void bpf_link_put_direct(struct bpf_link *link)
+{
+	if (!atomic64_dec_and_test(&link->refcnt))
+		return;
+	bpf_link_free(link);
+}
+
 static int bpf_link_release(struct inode *inode, struct file *filp)
 {
 	struct bpf_link *link = filp->private_data;
 
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 	return 0;
 }
 
@@ -4764,7 +4767,7 @@ static int link_update(union bpf_attr *attr)
 	if (ret)
 		bpf_prog_put(new_prog);
 out_put_link:
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 	return ret;
 }
 
@@ -4787,7 +4790,7 @@ static int link_detach(union bpf_attr *attr)
 	else
 		ret = -EOPNOTSUPP;
 
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 	return ret;
 }
 
@@ -4857,7 +4860,7 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
 
 	fd = bpf_link_new_fd(link);
 	if (fd < 0)
-		bpf_link_put(link);
+		bpf_link_put_direct(link);
 
 	return fd;
 }
@@ -4934,7 +4937,7 @@ static int bpf_iter_create(union bpf_attr *attr)
 		return PTR_ERR(link);
 
 	err = bpf_iter_new_fd(link);
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 
 	return err;
 }