From patchwork Mon Dec 19 04:15:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tonghao Zhang
X-Patchwork-Id: 13076228
X-Patchwork-Delegate: bpf@iogearbox.net
From: xiangxia.m.yue@gmail.com
To: bpf@vger.kernel.org
Cc: Tonghao Zhang, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
 Stanislav Fomichev, Hao Luo, Jiri Olsa, Hou Tao
Subject: [bpf-next v3 1/2] bpf: hash map, avoid deadlock with suitable hash mask
Date: Mon, 19 Dec 2022 12:15:50 +0800
Message-Id: <20221219041551.69344-1-xiangxia.m.yue@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
X-Mailing-List: bpf@vger.kernel.org

From: Tonghao Zhang

A deadlock may still occur when the hash map is accessed in both NMI and
non-NMI context, because in NMI context we may still access the same
bucket but with a different map_locked index.

For example, on the same CPU with .max_entries = 2, we update the hash
map with key = 4 while a bpf prog running in NMI context (nmi_handle())
updates the hash map with key = 20. Both keys have the same bucket index
but different map_locked indexes.

To fix this issue, mask the hash with the smaller of
HASHTAB_MAP_LOCK_MASK and n_buckets - 1, so that the same bucket always
uses the same map_locked index.

Signed-off-by: Tonghao Zhang
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Andrii Nakryiko
Cc: Martin KaFai Lau
Cc: Song Liu
Cc: Yonghong Song
Cc: John Fastabend
Cc: KP Singh
Cc: Stanislav Fomichev
Cc: Hao Luo
Cc: Jiri Olsa
Cc: Hou Tao
Acked-by: Yonghong Song
Acked-by: Hou Tao
---
 kernel/bpf/hashtab.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5aa2b5525f79..974f104f47a0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -152,7 +152,7 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
 {
 	unsigned long flags;
 
-	hash = hash & HASHTAB_MAP_LOCK_MASK;
+	hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
 
 	preempt_disable();
 	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
@@ -171,7 +171,7 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
 				      struct bucket *b, u32 hash,
 				      unsigned long flags)
 {
-	hash = hash & HASHTAB_MAP_LOCK_MASK;
+	hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
 	raw_spin_unlock_irqrestore(&b->raw_lock, flags);
 	__this_cpu_dec(*(htab->map_locked[hash]));
 	preempt_enable();

From patchwork Mon Dec 19 04:15:51 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tonghao Zhang
X-Patchwork-Id: 13076229
X-Patchwork-Delegate: bpf@iogearbox.net
From: xiangxia.m.yue@gmail.com
To: bpf@vger.kernel.org
Cc: Tonghao Zhang, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
 Stanislav Fomichev, Hao Luo, Jiri Olsa, Hou Tao
Subject: [bpf-next v3 2/2] selftests/bpf: add test case for htab map
Date: Mon, 19 Dec 2022 12:15:51 +0800
Message-Id: <20221219041551.69344-2-xiangxia.m.yue@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
In-Reply-To: <20221219041551.69344-1-xiangxia.m.yue@gmail.com>
References: <20221219041551.69344-1-xiangxia.m.yue@gmail.com>
X-Mailing-List: bpf@vger.kernel.org

From: Tonghao Zhang

This test shows how to reproduce the deadlock in a special case. We
update the htab map in task and NMI context. A task can be interrupted
by an NMI, and if the same map bucket is already locked, there will be a
deadlock.

 * map max_entries is 2.
 * NMI context uses key 4 and task context uses key 20.
 * so the bucket index is the same but the map_locked index differs.

The selftest uses perf to produce the NMI and attaches an fentry prog to
nmi_handle. Note that bpf_overflow_handler checks bpf_prog_active, and
the bpf map update syscall increases this counter in
bpf_disable_instrumentation, so a perf_event prog does not run while the
syscall is updating the map. An fentry prog on nmi_handle that updates
the hash map therefore reproduces the issue.

Signed-off-by: Tonghao Zhang
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Andrii Nakryiko
Cc: Martin KaFai Lau
Cc: Song Liu
Cc: Yonghong Song
Cc: John Fastabend
Cc: KP Singh
Cc: Stanislav Fomichev
Cc: Hao Luo
Cc: Jiri Olsa
Cc: Hou Tao
Acked-by: Yonghong Song
---
 tools/testing/selftests/bpf/DENYLIST.aarch64  |  1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
 .../selftests/bpf/prog_tests/htab_deadlock.c  | 75 +++++++++++++++++++
 .../selftests/bpf/progs/htab_deadlock.c       | 32 ++++++++
 4 files changed, 109 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_deadlock.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 99cc33c51eaa..87e8fc9c9df2 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -24,6 +24,7 @@ fexit_test # fexit_attach unexpected error
 get_func_args_test # get_func_args_test__attach unexpected error: -524 (errno 524) (trampoline)
 get_func_ip_test # get_func_ip_test__attach unexpected error: -524 (errno 524) (trampoline)
 htab_update/reenter_update
+htab_deadlock # failed to find kernel BTF type ID of 'nmi_handle': -3 (trampoline)
 kfree_skb # attach fentry unexpected error: -524 (trampoline)
 kfunc_call/subprog # extern (var ksym) 'bpf_prog_active': not found in kernel BTF
 kfunc_call/subprog_lskel # skel unexpected error: -2
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 585fcf73c731..735239b31050 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -26,6 +26,7 @@ get_func_args_test # trampoline
 get_func_ip_test # get_func_ip_test__attach unexpected error: -524 (trampoline)
 get_stack_raw_tp # user_stack corrupted user stack (no backchain userspace)
 htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
+htab_deadlock # failed to find kernel BTF type ID of 'nmi_handle': -3 (trampoline)
 kfree_skb # attach fentry unexpected error: -524 (trampoline)
 kfunc_call # 'bpf_prog_active': not found in kernel BTF (?)
 kfunc_dynptr_param # JIT does not support calling kernel function (kfunc)
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_deadlock.c b/tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
new file mode 100644
index 000000000000..137dce8f1346
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 DiDi Global Inc. */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <test_progs.h>
+
+#include "htab_deadlock.skel.h"
+
+static int perf_event_open(void)
+{
+	struct perf_event_attr attr = {0};
+	int pfd;
+
+	/* create perf event on CPU 0 */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	attr.freq = 1;
+	attr.sample_freq = 1000;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+
+	return pfd >= 0 ? pfd : -errno;
+}
+
+void test_htab_deadlock(void)
+{
+	unsigned int val = 0, key = 20;
+	struct bpf_link *link = NULL;
+	struct htab_deadlock *skel;
+	int err, i, pfd;
+	cpu_set_t cpus;
+
+	skel = htab_deadlock__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	err = htab_deadlock__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto clean_skel;
+
+	/* NMI events. */
+	pfd = perf_event_open();
+	if (pfd < 0) {
+		if (pfd == -ENOENT || pfd == -EOPNOTSUPP) {
+			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n", __func__);
+			test__skip();
+			goto clean_skel;
+		}
+		if (!ASSERT_GE(pfd, 0, "perf_event_open"))
+			goto clean_skel;
+	}
+
+	link = bpf_program__attach_perf_event(skel->progs.bpf_empty, pfd);
+	if (!ASSERT_OK_PTR(link, "attach_perf_event"))
+		goto clean_pfd;
+
+	/* Pin this thread to CPU 0 */
+	CPU_ZERO(&cpus);
+	CPU_SET(0, &cpus);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
+
+	/* Update the bpf map concurrently on CPU 0 in NMI and task context.
+	 * There should be no kernel deadlock.
+	 */
+	for (i = 0; i < 100000; i++)
+		bpf_map_update_elem(bpf_map__fd(skel->maps.htab),
+				    &key, &val, BPF_ANY);
+
+	bpf_link__destroy(link);
+clean_pfd:
+	close(pfd);
+clean_skel:
+	htab_deadlock__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
new file mode 100644
index 000000000000..d394f95e97c3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 DiDi Global Inc. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 2);
+	__uint(map_flags, BPF_F_ZERO_SEED);
+	__type(key, unsigned int);
+	__type(value, unsigned int);
+} htab SEC(".maps");
+
+/* nmi_handle() is x86-specific. If the kernel changes it from
+ * "static" to "inline", this prog will fail to load. */
+SEC("fentry/nmi_handle")
+int bpf_nmi_handle(struct pt_regs *regs)
+{
+	unsigned int val = 0, key = 4;
+
+	bpf_map_update_elem(&htab, &key, &val, BPF_ANY);
+	return 0;
+}
+
+SEC("perf_event")
+int bpf_empty(struct pt_regs *regs)
+{
+	return 0;
+}
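
For readers following the index arithmetic in patch 1/2, here is a minimal userspace sketch of how the bucket index and the map_locked index can diverge, and how taking the smaller mask realigns them. It is an illustration only, not kernel code: the helper names (bucket_index, lock_index_old, lock_index_new, check_example) are hypothetical, HASHTAB_MAP_LOCK_COUNT is assumed to be 8 as in kernel/bpf/hashtab.c at the time of this series, and the hash values 0x2 and 0x4 are made up for the example. They are not the real jhash results for keys 4 and 20.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match kernel/bpf/hashtab.c at the time of this series. */
#define HASHTAB_MAP_LOCK_COUNT 8
#define HASHTAB_MAP_LOCK_MASK  (HASHTAB_MAP_LOCK_COUNT - 1)

/* Bucket selection: n_buckets is a power of two, so masking is enough. */
uint32_t bucket_index(uint32_t hash, uint32_t n_buckets)
{
	return hash & (n_buckets - 1);
}

/* map_locked index before the patch: always uses the fixed lock mask. */
uint32_t lock_index_old(uint32_t hash)
{
	return hash & HASHTAB_MAP_LOCK_MASK;
}

/* map_locked index after the patch: never uses more bits than the bucket mask. */
uint32_t lock_index_new(uint32_t hash, uint32_t n_buckets)
{
	uint32_t bucket_mask = n_buckets - 1;
	uint32_t mask = bucket_mask < HASHTAB_MAP_LOCK_MASK ?
			bucket_mask : HASHTAB_MAP_LOCK_MASK;

	return hash & mask;
}

/* Hypothetical hashes for a map with max_entries = 2, i.e. n_buckets = 2. */
void check_example(void)
{
	const uint32_t n_buckets = 2;
	const uint32_t hash_task = 0x2, hash_nmi = 0x4;	/* made-up values */

	/* Both hashes land in the same bucket, so they share one raw_lock. */
	assert(bucket_index(hash_task, n_buckets) == bucket_index(hash_nmi, n_buckets));
	/* Old mask: different map_locked slots, so the reentrancy check is bypassed. */
	assert(lock_index_old(hash_task) != lock_index_old(hash_nmi));
	/* New mask: same map_locked slot, so the NMI-context update is refused. */
	assert(lock_index_new(hash_task, n_buckets) == lock_index_new(hash_nmi, n_buckets));
}
```

For maps with at least HASHTAB_MAP_LOCK_COUNT buckets the two masks coincide, so under these assumptions the sketch suggests the change only affects small maps.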