From patchwork Tue Jul 9 18:54:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13728422 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-ed1-f68.google.com (mail-ed1-f68.google.com [209.85.208.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F22EB17B045 for ; Tue, 9 Jul 2024 18:54:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720551292; cv=none; b=JGDzPM+UBAHKUEhl+T7MSS4d+satLK/Sh2GaEJ8d2lYs8TsrofyKahqZhlO1g4ABFAMwpMwn2UJRKt3FrsdDN1aIUhetACgpujjBuhU8DCHBXZMCe+T7fqScxKxBf4mBziOm1iDYvp8NROGqQP6RktbjitqM0QP7bW7rvVu1SB0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720551292; c=relaxed/simple; bh=MNW+92E/Nw/DexAy7fnWsWfXJIC72yC+uqyEd9vvLe8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=R75YtmoT3COGoK7TxDNY27iisrMmU8v474vOI6JDrSjamPqc/scFOhV7en2Y4qvGbuDluMhj5kfFBrh09iawU5G2x+JRLq2Kkmchb0LeW5IxwBxa4d/NFlB+64h2D3UVGqKWrguJIQjzRgVQHGkyqARrPip2aPxaZ+7BYibEqNM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IcOTrehF; arc=none smtp.client-ip=209.85.208.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IcOTrehF" Received: by mail-ed1-f68.google.com with SMTP id 4fb4d7f45d1cf-58bac81f419so7300328a12.0 for ; Tue, 09 Jul 2024 11:54:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720551288; x=1721156088; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rsI5BPkq+USbt3b1GIUPIPi4gqY5NbC67o1W6F1/MlY=; b=IcOTrehFzIt6LNLA5BHv46fZg6PQELOFyQh0/J0Ks+CEGZEK0jgngdf8PVXm4MMcTD 44dJROPRETYsp47jR/m+BE6GsWXHB4/mbK6SxpO2oMBO15p/IHBokcWq7ApHAU2FoSS7 k/Q+LA1wpWBbjLBLOU7c1zPi8Ic62Ovv1GD+iB4QvBCXc1tfVl5D1WTMvPws9ikUk79b XN5bwG3j8W/jaOo7Twwr3sbcfxtvB5a7Damf1j6VIhJJg7zj8Ekgci1F7IShqnTNDg3M MbbH3XSMR1rw4ESGQJqd6c43en15SCPg8H0ua+XMY3hVzRc72Fj17UW/iqeBdq24LVmj Onlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720551288; x=1721156088; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rsI5BPkq+USbt3b1GIUPIPi4gqY5NbC67o1W6F1/MlY=; b=ZiX4ogdnSS733IV9/5PadUQ/uav89venT+Otq+7qwKQwXOvYGvURQ6DrmYxqM05B+a RyuROSdM6Mcg7EAipmy/E/zRRRjrO9iidrc2nwL+najm2I9fFeVyqyT6zECIhEBvzPpQ aBpCbBOiFf7X4ekakGdHVrCFAdZHLk0/pqGvivIhrOk0f6VjLf2EsglS9hGhGW/U+/8w bGMnizi+tiJ1Pr24b0+iGtBCPM90fqC4R9Bu6DCKVZLXpLaDZucs1ctZR2nt2pMG1wa7 pa9fhVLIGL7a8y5A+2H4whWfhPF8JPEHzAF+JWz9R6VspQ9jsAwVqKwOlEypM5XcJUnN YrKA== X-Gm-Message-State: AOJu0Ywvr+g5/HbQLQ+tYZG1EeiX+SSwyqO3YnQpNcp2m4jP4dfD1sHg J9Oul5f9yMqIuw2dOSYH54L55Ih4nDk+tTkWz/Aw56cCO0uHXPnyM+VEWf21 X-Google-Smtp-Source: AGHT+IGOuHYCH5BoI1MXT31d9Vp9tIkDcQ3XhhUgXl83joPX/omfw0hym1E/RVIQUqS+tbztoKKuIw== X-Received: by 2002:a05:6402:2688:b0:57c:ff70:5429 with SMTP id 4fb4d7f45d1cf-594ba98f43dmr2571304a12.8.1720551288289; Tue, 09 Jul 2024 11:54:48 -0700 (PDT) Received: from localhost (nat-icclus-192-26-29-3.epfl.ch. [192.26.29.3]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-594bda27f56sm1364348a12.80.2024.07.09.11.54.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Jul 2024 11:54:47 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org Cc: Dohyun Kim , Neel Natu , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , Barret Rhoden , Tejun Heo , David Vernet Subject: [PATCH bpf v1 1/3] bpf: Fail bpf_timer_cancel when callback is being cancelled Date: Tue, 9 Jul 2024 18:54:38 +0000 Message-ID: <20240709185440.1104957-2-memxor@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240709185440.1104957-1-memxor@gmail.com> References: <20240709185440.1104957-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5488; i=memxor@gmail.com; h=from:subject; bh=MNW+92E/Nw/DexAy7fnWsWfXJIC72yC+uqyEd9vvLe8=; b=owEBbQKS/ZANAwAKAUzgyIZIvxHKAcsmYgBmjYULVCA5muRocCwBiYJKLfs4+U03w0OplMC+0 b/Q5vX5Z46JAjMEAAEKAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZo2FCwAKCRBM4MiGSL8R ytj2D/0Q0zR5QtIWT90OdzImpLz3L0js8aFVmj7CthpBTB4RUfYDRZz/O0+4c8Vjg0HmzE1Vwy0 pf+/1NIJVagW4Vo7d+vaNHQqz839W9eaN3Xws39332A1V+iwDnTx2hoV7+rtjK+0srvoeVtib+c oDfOupTK3A7Tsc3rNoTLISdB0aPobRWoUo2mx6F+NAOJ6uvp2SmIZxGoQBR/DzOmw6rOEwvQG5v YwXi2gRHTbPpPEa+6vnJX/leXXqF1lqVBJsstNtNQu8lXOVCSRqn63RDsKYi5PVXzOfAqGjtjqV 0ISnS1szmq6yx67pz2YM2FxK5L9lW2tzS9D6cx2upHzcm9bf53jw2rz8YJ/nZSoM3/cFSBa5QmU +KY/9hkQFcjaSaDuKGM/La1DaVgSTopVNy6mxEgbUvFNRT8IZMoP9uz9Ymn8c5kH1hpyiEQoIEX fOR/rM0SxRPXI7+w0+VmJ1Pk5XV+VpxYhe3+le5dmE5mRaXiChFEx0//4d2ji6WoEP8KSVnt8xN 6jBGGnYN3XLyqesQDqBquKora2hB9JMot2/swCqATV9kdUJLSTLnlvIhZfBQxQf0RG4l9UjWorV gGCBy24HHRtm9tGlkzzlpXbOBmbz+lPKyv2FDJ/t9axCRqQpeDFO5gVCSiOPJZAjwMJIjSknSRu 9HYuM42CWmlgzJw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Given a schedule: timer1 cb timer2 cb bpf_timer_cancel(timer2); bpf_timer_cancel(timer1); Both bpf_timer_cancel calls would wait for the other callback to finish executing, introducing a lockup. Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps track of all in-flight cancellation requests for a given BPF timer. Whenever cancelling a BPF timer, we must check if we have outstanding cancellation requests, and if so, we must fail the operation with an error (-EDEADLK) since cancellation is synchronous and waits for the callback to finish executing. This implies that we can enter a deadlock situation involving two or more timer callbacks executing in parallel and attempting to cancel one another. Note that we avoid incrementing the cancelling counter for the target timer (the one being cancelled) if bpf_timer_cancel is not invoked from a callback, to avoid spurious errors. The whole point of detecting cur->cancelling and returning -EDEADLK is to not enter a busy wait loop (which may or may not lead to a lockup). This does not apply in case the caller is in a non-callback context, the other side can continue to cancel as it sees fit without running into errors. Background on prior attempts: Earlier versions of this patch used a bool 'cancelling' bit and used the following pattern under timer->lock to publish cancellation status. lock(t->lock); t->cancelling = true; mb(); if (cur->cancelling) return -EDEADLK; unlock(t->lock); hrtimer_cancel(t->timer); t->cancelling = false; The store outside the critical section could overwrite a parallel requests t->cancelling assignment to true, to ensure the parallely executing callback observes its cancellation status. It would be necessary to clear this cancelling bit once hrtimer_cancel is done, but lack of serialization introduced races. Another option was explored where bpf_timer_start would clear the bit when (re)starting the timer under timer->lock. This would ensure serialized access to the cancelling bit, but may allow it to be cleared before in-flight hrtimer_cancel has finished executing, such that lockups can occur again. Thus, we choose an atomic counter to keep track of all outstanding cancellation requests and use it to prevent lockups in case callbacks attempt to cancel each other while executing in parallel. Reported-by: Dohyun Kim Reported-by: Neel Natu Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/helpers.c | 38 +++++++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 2a69a9a36c0f..22e779ca50d5 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1107,6 +1107,7 @@ struct bpf_async_cb { struct bpf_hrtimer { struct bpf_async_cb cb; struct hrtimer timer; + atomic_t cancelling; }; struct bpf_work { @@ -1262,6 +1263,7 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u clockid = flags & (MAX_CLOCKS - 1); t = (struct bpf_hrtimer *)cb; + atomic_set(&t->cancelling, 0); hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT); t->timer.function = bpf_timer_cb; cb->value = (void *)async - map->record->timer_off; @@ -1440,7 +1442,8 @@ static void drop_prog_refcnt(struct bpf_async_cb *async) BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer) { - struct bpf_hrtimer *t; + struct bpf_hrtimer *t, *cur_t; + bool inc = false; int ret = 0; if (in_nmi()) @@ -1452,14 +1455,41 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer) ret = -EINVAL; goto out; } - if (this_cpu_read(hrtimer_running) == t) { + + cur_t = this_cpu_read(hrtimer_running); + if (cur_t == t) { /* If bpf callback_fn is trying to bpf_timer_cancel() * its own timer the hrtimer_cancel() will deadlock - * since it waits for callback_fn to finish + * since it waits for callback_fn to finish. + */ + ret = -EDEADLK; + goto out; + } + + /* Only account in-flight cancellations when invoked from a timer + * callback, since we want to avoid waiting only if other _callbacks_ + * are waiting on us, to avoid introducing lockups. Non-callback paths + * are ok, since nobody would synchronously wait for their completion. + */ + if (!cur_t) + goto drop; + atomic_inc(&t->cancelling); + /* Need full barrier after relaxed atomic_inc */ + smp_mb__after_atomic(); + inc = true; + if (atomic_read(&cur_t->cancelling)) { + /* We're cancelling timer t, while some other timer callback is + * attempting to cancel us. In such a case, it might be possible + * that timer t belongs to the other callback, or some other + * callback waiting upon it (creating transitive dependencies + * upon us), and we will enter a deadlock if we continue + * cancelling and waiting for it synchronously, since it might + * do the same. Bail! */ ret = -EDEADLK; goto out; } +drop: drop_prog_refcnt(&t->cb); out: __bpf_spin_unlock_irqrestore(&timer->lock); @@ -1467,6 +1497,8 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer) * if it was running. */ ret = ret ?: hrtimer_cancel(&t->timer); + if (inc) + atomic_dec(&t->cancelling); rcu_read_unlock(); return ret; } From patchwork Tue Jul 9 18:54:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13728423 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-ej1-f65.google.com (mail-ej1-f65.google.com [209.85.218.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0D2B80BEC for ; Tue, 9 Jul 2024 18:54:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720551293; cv=none; b=fCe2vLcuD0S1CmtMtl/yXED0LeUafmlg6ikpdPWAYw451SahwxjbHhFNmkUImz/Yn3s7xjEY+M2OrT85VEzL+E2UuJum03hxMjELzHzdg7aYsOZjcTe4/hQ8CDzKrphKyo0qM7RXRqlzm4s4nJQ59EL+6weB+Uu3uvIpKcBf1/8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720551293; c=relaxed/simple; bh=nKc6Xfyz1LTZezI04trVSCNAspYleV6waAmDm9GBKbo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VwJUuWD3Ub+QIC7EaNUa72nZJO7+oXeDqKJxZgbZfVCQsDb+VMtdSIv7V+hoU04p1WKxRMUw1WLRWXCwS6EuirZXterWZ6SwuulyMjPDGX4V898pxjoDMGxrQxFipbeJHLexcLxD4VXHNy0Ei33e4TyxIEc68ZtveNkIGwRIXfM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Jg9kliOI; arc=none smtp.client-ip=209.85.218.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Jg9kliOI" Received: by mail-ej1-f65.google.com with SMTP id a640c23a62f3a-a77dc08db60so489397866b.1 for ; Tue, 09 Jul 2024 11:54:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720551290; x=1721156090; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GEBqLiX+v7k3/CljmBNVSBmS+deJTGADxiomoBN2mjs=; b=Jg9kliOIEiQdzM0Banrqe19ln8AnAzqly6enC0o7ZZZ3f06443MyT6FsB1yck4LDvK yFGPMytv6CC3taFIOl/AEsOhvDQIQSjuYLQ79Ghg57Z3a+YEibqMvZXFOJg0pVr9cONV u4L88kCPilVVlMFzP2u+le4lMltYov4jX+YPOv/CDU2eTs4yfmWQ57mu/Fp3brTP8Bq2 QMDAP+TSCGx8d1gErc59XDm0n1xHgdHEoodoL3L3Gi3UZfp9FpiDvdKu7egcHeC69MA0 RDQqkVzOWFycrtRaqw2uonhik0X0xoI57VkJiMe1OFF+n5lO8zuecOdW9u6vkOh++g+L YnFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720551290; x=1721156090; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GEBqLiX+v7k3/CljmBNVSBmS+deJTGADxiomoBN2mjs=; b=gklZZW232SyBOs+mSWZl7YZsuN0deFWBI41QzQZ2XzgwSJk9E6YpudmbXRx9oylrp1 KMSDRo3lp1m0g6zyAV3f3lJ01t9fGVfDe5OMdat6wa9hZEF5axLoR6aCDBd7mj0SbPus qNZF7pE0nZ+VLTdW2GcuGHVvSYVy3VWB9Oq6FQyQUyiK3iwmYPsmitOBLDTqFcGaLc/6 +7gvfrxMwd3bvHuEsLP9KYP5huxGZ+Bniz/XAug58WLPJ7dAvFlXhXdXLekyyOcfsg6z Hka5lTMVJJ89r3BjZbEwrEyYsTUujlGNkYhdpLCWMZL46WRp+LT4kCtFP7flUs/C9r7T C2KQ== X-Gm-Message-State: AOJu0Yye8iXy8e//WRSkb7i/RZrJpeP62LvOAC5Ng3L73v6gC3jGQkOU e4aQesGkOzRAJ0S2sogt7cj+sanEIHihTcNb9rAMSX7S10HCW8gY/N+lgv1/ X-Google-Smtp-Source: AGHT+IGWLRP+twsYvajQegmakqFuVUsc8ZFq8+ZmmdOj3aztNCVMSz0Pr9oeeuSpwsDTKsug9yEw+w== X-Received: by 2002:a17:906:a08:b0:a6f:b702:8a21 with SMTP id a640c23a62f3a-a780b89cbf1mr258099966b.63.1720551289964; Tue, 09 Jul 2024 11:54:49 -0700 (PDT) Received: from localhost (nat-icclus-192-26-29-3.epfl.ch. [192.26.29.3]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a780a6dfa93sm97151266b.65.2024.07.09.11.54.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Jul 2024 11:54:49 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , Dohyun Kim , Neel Natu , Barret Rhoden , Tejun Heo , David Vernet Subject: [PATCH bpf v1 2/3] bpf: Defer work in bpf_timer_cancel_and_free Date: Tue, 9 Jul 2024 18:54:39 +0000 Message-ID: <20240709185440.1104957-3-memxor@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240709185440.1104957-1-memxor@gmail.com> References: <20240709185440.1104957-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6215; i=memxor@gmail.com; h=from:subject; bh=nKc6Xfyz1LTZezI04trVSCNAspYleV6waAmDm9GBKbo=; b=owEBbQKS/ZANAwAKAUzgyIZIvxHKAcsmYgBmjYUMrTotNGJbP852a7CanXC1Q5MVSa/4rQoVJ gDiS+YMa6OJAjMEAAEKAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZo2FDAAKCRBM4MiGSL8R ykjPEACx51BoHawpHaiNC5hrO9IoHSO9CC5STU5r9v/fI7xCQKQYEdUOljv2KVw2ShPU0NHEYLU R3FoxXFEuYWv8FQn3CrOrkw9AHj/MUgkli9MG86slHBKnrpcxi4gBgQxmBLlUZjMIWJs9dYLLsY 78p6yxwAuNotjb/zWe9deoubkPqQOmJxVRLb78djAiJ8Eb7JMOO5LdXFt0oIY1MrzZmol6bYv/x AorfUjKEFi7QPEd5r9aEF1EVO9spjUogD2Ktht+e50QJWi7mcSduuVUIyeYZp96hTNkWeVQr7qB tj/i/Yo5zmQKXe9leO8DLkrjrE/NvN9U4bXHTmX+BEsRA5a4KkyWgmSusaIQcwx3xDkyV03pULY k0o2172hN/Ox807/WF3zNhTTYRKIZvsha9v3tqWnwFjU7ZMAf65dl72FtoYO7EwuF4AYb9mja8I RNIk1dYDu9RTJXdMm7z8qM7oGqiFqHEj003XhBNSklzZMRzSWi6zZBKVyiNQIlsO2qmDTXX0GEh bV+8na0pHEcQBlsg6uUtiCLnSeapyc6zZLXh4hHTw+wR8oAxviM7+wewO5iFHFxdFPSqJ9ApvEp JU1rYCUxixOGDUeH8cqhmo/jYnosF29wzXon9y/SSQI0oxcgjVoOVZ20YoyC8amCEK3iZQsMugz QjQ702wsLokbltQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Currently, the same case as previous patch (two timer callbacks trying to cancel each other) can be invoked through bpf_map_update_elem as well, or more precisely, freeing map elements containing timers. Since this relies on hrtimer_cancel as well, it is prone to the same deadlock situation as the previous patch. It would be sufficient to use hrtimer_try_to_cancel to fix this problem, as the timer cannot be enqueued after async_cancel_and_free. Once async_cancel_and_free has been done, the timer must be reinitialized before it can be armed again. The callback running in parallel trying to arm the timer will fail, and freeing bpf_hrtimer without waiting is sufficient (given kfree_rcu), and bpf_timer_cb will return HRTIMER_NORESTART, preventing the timer from being rearmed again. However, there exists a UAF scenario where the callback arms the timer before entering this function, such that if cancellation fails (due to timer callback invoking this routine, or the target timer callback running concurrently). In such a case, if the timer expiration is significantly far in the future, the RCU grace period expiration happening before it will free the bpf_hrtimer state and along with it the struct hrtimer, that is enqueued. Hence, it is clear cancellation needs to occur after async_cancel_and_free, and yet it cannot be done inline due to deadlock issues. We thus modify bpf_timer_cancel_and_free to defer work to the global workqueue, adding a work_struct alongside rcu_head (both used at _different_ points of time, so can share space). Update existing code comments to reflect the new state of affairs. Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/helpers.c | 61 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 47 insertions(+), 14 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 22e779ca50d5..3243c83ef3e3 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1084,7 +1084,10 @@ struct bpf_async_cb { struct bpf_prog *prog; void __rcu *callback_fn; void *value; - struct rcu_head rcu; + union { + struct rcu_head rcu; + struct work_struct delete_work; + }; u64 flags; }; @@ -1220,6 +1223,21 @@ static void bpf_wq_delete_work(struct work_struct *work) kfree_rcu(w, cb.rcu); } +static void bpf_timer_delete_work(struct work_struct *work) +{ + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, cb.delete_work); + + /* Cancel the timer and wait for callback to complete if it was running. + * If hrtimer_cancel() can be safely called it's safe to call + * kfree_rcu(t) right after for both preallocated and non-preallocated + * maps. The async->cb = NULL was already done and no code path can see + * address 't' anymore. Timer if armed for existing bpf_hrtimer before + * bpf_timer_cancel_and_free will have been cancelled. + */ + hrtimer_cancel(&t->timer); + kfree_rcu(t, cb.rcu); +} + static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags, enum bpf_async_type type) { @@ -1264,6 +1282,7 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u t = (struct bpf_hrtimer *)cb; atomic_set(&t->cancelling, 0); + INIT_WORK(&t->cb.delete_work, bpf_timer_delete_work); hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT); t->timer.function = bpf_timer_cb; cb->value = (void *)async - map->record->timer_off; @@ -1544,25 +1563,39 @@ void bpf_timer_cancel_and_free(void *val) if (!t) return; - /* Cancel the timer and wait for callback to complete if it was running. - * If hrtimer_cancel() can be safely called it's safe to call kfree(t) - * right after for both preallocated and non-preallocated maps. - * The async->cb = NULL was already done and no code path can - * see address 't' anymore. - * - * Check that bpf_map_delete/update_elem() wasn't called from timer - * callback_fn. In such case don't call hrtimer_cancel() (since it will - * deadlock) and don't call hrtimer_try_to_cancel() (since it will just - * return -1). Though callback_fn is still running on this cpu it's + /* We check that bpf_map_delete/update_elem() was called from timer + * callback_fn. In such case we don't call hrtimer_cancel() (since it + * will deadlock) and don't call hrtimer_try_to_cancel() (since it will + * just return -1). Though callback_fn is still running on this cpu it's * safe to do kfree(t) because bpf_timer_cb() read everything it needed * from 't'. The bpf subprog callback_fn won't be able to access 't', * since async->cb = NULL was already done. The timer will be * effectively cancelled because bpf_timer_cb() will return * HRTIMER_NORESTART. + * + * However, it is possible the timer callback_fn calling us armed the + * timer _before_ calling us, such that failing to cancel it here will + * cause it to possibly use struct hrtimer after freeing bpf_hrtimer. + * Therefore, we _need_ to cancel any outstanding timers before we do + * kfree_rcu, even though no more timers can be armed. + * + * Moreover, we need to schedule work even if timer does not belong to + * the calling callback_fn, as on two different CPUs, we can end up in a + * situation where both sides run in parallel, try to cancel one + * another, and we end up waiting on both sides in hrtimer_cancel + * without making forward progress, since timer1 depends on time2 + * callback to finish, and vice versa. + * + * CPU 1 (timer1_cb) CPU 2 (timer2_cb) + * bpf_timer_cancel_and_free(timer2) bpf_timer_cancel_and_free(timer1) + * + * To avoid these issues, punt to workqueue context when we are in a + * timer callback. */ - if (this_cpu_read(hrtimer_running) != t) - hrtimer_cancel(&t->timer); - kfree_rcu(t, cb.rcu); + if (this_cpu_read(hrtimer_running)) + queue_work(system_unbound_wq, &t->cb.delete_work); + else + bpf_timer_delete_work(&t->cb.delete_work); } /* This function is called by map_delete/update_elem for individual element and From patchwork Tue Jul 9 18:54:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13728424 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-ej1-f67.google.com (mail-ej1-f67.google.com [209.85.218.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5ABB1182A41 for ; Tue, 9 Jul 2024 18:54:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720551295; cv=none; b=QjyQU9xOVhAIHdyZ2l3Q3w43C6SyjarPJPXUD6IqfItFf6EZPNoO2DxfXf05sS8DRj1P2UhcxAzeHn3a0ng+fd6Ps77Z8sPtVopf/Oeg0sbabYHtO0mFZ6kASWjVLyulbzV6D4STqbdsTMkBEvy4nsMkXTSalOY17/4xPwoo/fo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720551295; c=relaxed/simple; bh=jcn0AeM1rbLzBgyDbcxQ/jmlpvg/rwiV8DyXu9Q36PQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fCja5ituiBmRNYak8/GNd9r7fZKNCkytY/KDjuIOsRNDlvb6CEEdq2/jJpHvTm0465apdfUty88L9ywBPGx62GpTwjQhs9jC+3fJYWtKulVB+RhsDUSvuVPZxnHjpdLmynOhs3vqFLRWIF4cy9eXMO6A64JJuNt3NlGqV58A8/s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=eWmXgLcN; arc=none smtp.client-ip=209.85.218.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="eWmXgLcN" Received: by mail-ej1-f67.google.com with SMTP id a640c23a62f3a-a77e7a6cfa7so315807366b.1 for ; Tue, 09 Jul 2024 11:54:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720551291; x=1721156091; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=H6vPWh/Cdfv+g6Y41vcOMOGSawu5s0lT6PzyDhP1yBc=; b=eWmXgLcNHcxmOWEdK0xhQJZJNxNHBh5Jx41sTZv5/WVDWsSHZ2nJXJ2BitY9skX2an mfhBFtAAD3hUujPObxIAk0B95FJxSo/Z+2vhSztm1vNYeW8NRxcwmcdK99VcsQBEf+cI 3WrmACCpG+1cC7Sn/s4BiaLmr/EJ7A5zu5+V9/BoLyg+W5TafK4SqZP8weOMDqBkl5qv NrICrO+FuNkjeRXQFp/tkrRz+GyvNzcyOptGrCX/PG5cHSNQ4OqBzkpduKxHymCTv5q2 VO1Vf1f70EspmPzRtxmHT80sYDYd5EXNgmi6dHMCMVYkAeBMcoaAW3DzSI8iRVkwG5KM 2sLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720551291; x=1721156091; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=H6vPWh/Cdfv+g6Y41vcOMOGSawu5s0lT6PzyDhP1yBc=; b=jZEFmzQ9vxEo3hoa8YUaHS17ISVjverWosao+8YSInwmGT0aIHGc9vpHSUtGwfCQeQ wltqc+jwkJtXYCcRsPjDecjgIOVEUyVNLNWFua6kVkVUvWwSptpLrSl+Rv05cbYdjXsK 1waVEMhvafbbGtGupwv9ub3TtcL25K78tOvl7YbJnC0H8x8qAL/6Hm14VztJ3IV+TgWb ALDPdGgaA5ZASnHTmzj9GdqNBV+06OsfHotYbd9DwVG/GXgOzDg93tBVhrTT+ChH0bgx +edc8YJQOSf14T6piPqgXN1yxEpFqYpKphI7S6enxU2rvQqcCyUIgdgSwW+dbNJb5WaU 3Dvw== X-Gm-Message-State: AOJu0Yz4qo+1rJcPWvXLeb8/UH79QVyycH/S8/cYVAtGcVAZbhoA340G DxsLjrUeVxyobaAwJ9Yy2jIB0nGDJ2HHBWCt0gVIuCjQGH5M9h2LR8wdzM7y X-Google-Smtp-Source: AGHT+IFGknCJmuYQzN8tfIXXN6/7lNKCtwusaSU6WcsaIzaxoKfSEfWN31L2cX/r266o5eDdVk+rpQ== X-Received: by 2002:a17:906:3c04:b0:a6f:2206:99ae with SMTP id a640c23a62f3a-a780b705382mr192121466b.41.1720551291274; Tue, 09 Jul 2024 11:54:51 -0700 (PDT) Received: from localhost (nat-icclus-192-26-29-3.epfl.ch. [192.26.29.3]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a780a6e1742sm97488766b.70.2024.07.09.11.54.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Jul 2024 11:54:50 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , Dohyun Kim , Neel Natu , Barret Rhoden , Tejun Heo , David Vernet Subject: [PATCH bpf v1 3/3] selftests/bpf: Add timer lockup selftest Date: Tue, 9 Jul 2024 18:54:40 +0000 Message-ID: <20240709185440.1104957-4-memxor@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240709185440.1104957-1-memxor@gmail.com> References: <20240709185440.1104957-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4896; i=memxor@gmail.com; h=from:subject; bh=jcn0AeM1rbLzBgyDbcxQ/jmlpvg/rwiV8DyXu9Q36PQ=; b=owEBbQKS/ZANAwAKAUzgyIZIvxHKAcsmYgBmjYUMSOZhgQ9ffn0BxhgAVW4Rmng9L5aJJqNRz 6ELcVWEehmJAjMEAAEKAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZo2FDAAKCRBM4MiGSL8R yhoZD/9CnQC4MjclpusmtpL0t9Vl0wqBMmPalO53vQqY8S/ah5C/iP5M6OHrso/izlaprZo4AfF nSr6GefR8/J8zQfEDSYXwTQcmptbVIJlD4EwcsmOqBUndc4B4y/pQMS7Uc00jXD+hmPPa7u+NFT 6gNX3EjPqIenf+3p1cWqcedKBnfDFaN3C8sEjKLn7i0D5jM15nDJErscZs2EFpTTYBq/Ahco9KN pcYY1p/sDRUWACfX88/Tm1h0hzSGrTqklmEpw4Ef79e2yDeOAGt4xeqdTlXI1i7UUGENn1ZCZHy xhbKBESMaNdGr/HFVxSZoNjcI5hzBcdYA/41in7jTsDcSmvJqb4Lbfx41kI4nsRFP1tqPGj1Nn1 7ugO0aFkIi0SgiATGvIflm4qQ5Cpl6DmJGS/FDing+O/Skg9DL7Ik2x/hAK8nJSZUsIKWwzeT+J hyMSVnWxV7pnP6oxsD9zNjsZTWHlAwI2vWl140s2aNivnzWFcE2fJw1DlkHHqej4bk+pgyd1upj Npjf2BKPlfUeth8Fn4X/iuZFAVshYvPJEUu5r0+419s7F4s7EsbLoimNLk3HcuWdNI5SSkhVyN4 g1xzgP8MCte/YuLOE+ZC8inGRAZJMBEHZ3Rv2sW8VNMWtQwqKkXI3LUvPLsU6xuevTfoLHfYpWb 9HsE0/2iNQYdSKA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Add a selftest that tries to trigger a situation where two timer callbacks are attempting to cancel each other's timer. By running them continuously, we hit a condition where both run in parallel and cancel each other. Without the fix in the previous patch, this would cause a lockup as hrtimer_cancel on either side will wait for forward progress from the callback. Ensure that this situation leads to a EDEADLK error. Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/timer_lockup.c | 65 ++++++++++++++ .../selftests/bpf/progs/timer_lockup.c | 85 +++++++++++++++++++ 2 files changed, 150 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_lockup.c create mode 100644 tools/testing/selftests/bpf/progs/timer_lockup.c diff --git a/tools/testing/selftests/bpf/prog_tests/timer_lockup.c b/tools/testing/selftests/bpf/prog_tests/timer_lockup.c new file mode 100644 index 000000000000..73e376fc5bbd --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/timer_lockup.c @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include +#include +#include +#include +#include "timer_lockup.skel.h" + +long cpu; +int *timer1_err; +int *timer2_err; + +static void *timer_lockup_thread(void *arg) +{ + LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + int prog_fd = *(int *)arg; + cpu_set_t cpuset; + + CPU_ZERO(&cpuset); + CPU_SET(__sync_fetch_and_add(&cpu, 1), &cpuset); + ASSERT_OK(pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset), "cpu affinity"); + + while (!*timer1_err && !*timer2_err) + bpf_prog_test_run_opts(prog_fd, &opts); + + return NULL; +} + +void test_timer_lockup(void) +{ + struct timer_lockup *skel; + pthread_t thrds[2]; + void *ret; + + skel = timer_lockup__open_and_load(); + if (!ASSERT_OK_PTR(skel, "timer_lockup__open_and_load")) + return; + + int timer1_prog = bpf_program__fd(skel->progs.timer1_prog); + int timer2_prog = bpf_program__fd(skel->progs.timer2_prog); + + timer1_err = &skel->bss->timer1_err; + timer2_err = &skel->bss->timer2_err; + + if (!ASSERT_OK(pthread_create(&thrds[0], NULL, timer_lockup_thread, &timer1_prog), "pthread_create thread1")) + return; + if (!ASSERT_OK(pthread_create(&thrds[1], NULL, timer_lockup_thread, &timer2_prog), "pthread_create thread2")) { + pthread_exit(&thrds[0]); + return; + } + + pthread_join(thrds[1], &ret); + pthread_join(thrds[0], &ret); + + if (*timer1_err != -EDEADLK && *timer1_err != 0) + ASSERT_FAIL("timer1_err bad value"); + if (*timer2_err != -EDEADLK && *timer2_err != 0) + ASSERT_FAIL("timer2_err bad value"); + + return; +} diff --git a/tools/testing/selftests/bpf/progs/timer_lockup.c b/tools/testing/selftests/bpf/progs/timer_lockup.c new file mode 100644 index 000000000000..ca29da9ff25c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/timer_lockup.c @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct elem { + struct bpf_timer t; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct elem); +} timer1_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct elem); +} timer2_map SEC(".maps"); + +int timer1_err; +int timer2_err; + +static int timer_cb1(void *map, int *k, struct bpf_timer *timer) +{ + int key = 0; + + timer = bpf_map_lookup_elem(&timer2_map, &key); + if (timer) { + timer2_err = bpf_timer_cancel(timer); + } + return 0; +} + +static int timer_cb2(void *map, int *k, struct bpf_timer *timer) +{ + int key = 0; + + timer = bpf_map_lookup_elem(&timer1_map, &key); + if (timer) { + timer1_err = bpf_timer_cancel(timer); + } + return 0; +} + +SEC("tc") +int timer1_prog(void *ctx) +{ + int key = 0; + struct bpf_timer *timer; + + timer = bpf_map_lookup_elem(&timer1_map, &key); + if (timer) { + bpf_timer_init(timer, &timer1_map, CLOCK_BOOTTIME); + bpf_timer_set_callback(timer, timer_cb1); + bpf_timer_start(timer, 1, BPF_F_TIMER_CPU_PIN); + } + + return 0; +} + +SEC("tc") +int timer2_prog(void *ctx) +{ + int key = 0; + struct bpf_timer *timer; + + timer = bpf_map_lookup_elem(&timer2_map, &key); + if (timer) { + bpf_timer_init(timer, &timer2_map, CLOCK_BOOTTIME); + bpf_timer_set_callback(timer, timer_cb2); + bpf_timer_start(timer, 1, BPF_F_TIMER_CPU_PIN); + } + + return 0; +}