From patchwork Fri Dec 16 01:59:11 2022
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13074729
X-Patchwork-Delegate: bpf@iogearbox.net
X-Mailing-List: bpf@vger.kernel.org
From: Kui-Feng Lee
CC: Kui-Feng Lee, Nathan Slingerland
Subject: [PATCH bpf-next 1/2] bpf: keep a reference to the mm, in case the task is dead.
Date: Thu, 15 Dec 2022 17:59:11 -0800
Message-ID: <20221216015912.991616-2-kuifeng@meta.com>
In-Reply-To: <20221216015912.991616-1-kuifeng@meta.com>
References: <20221216015912.991616-1-kuifeng@meta.com>

Fix the system crash that happens when a task iterator travels through
the vmas of tasks.

Task iterators used to access the mm by following the pointer on the
task_struct; however, the death of a task clears that pointer, even
though we still hold a reference to the task_struct. That can cause an
unexpected NULL pointer dereference when an iterator visits a task that
dies during the visit. Keeping a reference to the mm on the iterator
ensures that we always have a valid pointer to it.

Co-developed-by: Song Liu
Signed-off-by: Song Liu
Signed-off-by: Kui-Feng Lee
Reported-by: Nathan Slingerland
Acked-by: Yonghong Song
---
 kernel/bpf/task_iter.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)
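[Note for reviewers, not part of the patch: the sketch below illustrates
the lifetime pattern the patch switches to. get_task_mm() takes a
refcount on mm->mm_users, so the mm stays usable even if the task exits
in the middle of the walk; mmput() must then be called on every exit
path. walk_task_mm() is a hypothetical helper name chosen for
illustration, not something in the tree.]

	/* Illustrative only -- a minimal sketch of the get_task_mm()/mmput()
	 * pattern, assuming a valid task reference is already held.
	 */
	#include <linux/errno.h>
	#include <linux/mm.h>		/* find_vma(), mmap_read_lock_killable() */
	#include <linux/sched/mm.h>	/* get_task_mm(), mmput() */

	static int walk_task_mm(struct task_struct *task)
	{
		struct vm_area_struct *vma;
		struct mm_struct *mm;

		/* Pin mm->mm_users; returns NULL for kernel threads or when
		 * the task has already passed exit_mm() and dropped its mm.
		 */
		mm = get_task_mm(task);
		if (!mm)
			return -ESRCH;

		if (mmap_read_lock_killable(mm)) {
			mmput(mm);	/* drop the pin on every exit path */
			return -EINTR;
		}

		/* The mm cannot go away here, even if the task dies meanwhile. */
		for (vma = find_vma(mm, 0); vma; vma = find_vma(mm, vma->vm_end)) {
			/* ... inspect vma ... */
		}

		mmap_read_unlock(mm);
		mmput(mm);
		return 0;
	}

Compare with the old iterator code, which dereferenced curr_task->mm
directly: once a dying task reached exit_mm() and its ->mm pointer was
cleared, the iterator would dereference NULL.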
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c2a2182ce570..c4ab9d6cdbe9 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -438,6 +438,7 @@ struct bpf_iter_seq_task_vma_info {
 	 */
 	struct bpf_iter_seq_task_common common;
 	struct task_struct *task;
+	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	u32 tid;
 	unsigned long prev_vm_start;
@@ -456,16 +457,19 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	enum bpf_task_vma_iter_find_op op;
 	struct vm_area_struct *curr_vma;
 	struct task_struct *curr_task;
+	struct mm_struct *curr_mm;
 	u32 saved_tid = info->tid;
 
 	/* If this function returns a non-NULL vma, it holds a reference to
-	 * the task_struct, and holds read lock on vma->mm->mmap_lock.
+	 * the task_struct, holds a refcount on mm->mm_users, and holds
+	 * read lock on vma->mm->mmap_lock.
 	 * If this function returns NULL, it does not hold any reference or
 	 * lock.
 	 */
 	if (info->task) {
 		curr_task = info->task;
 		curr_vma = info->vma;
+		curr_mm = info->mm;
 		/* In case of lock contention, drop mmap_lock to unblock
 		 * the writer.
 		 *
@@ -504,13 +508,15 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 		 * 4.2) VMA2 and VMA2' covers different ranges, process
 		 *      VMA2'.
 		 */
-		if (mmap_lock_is_contended(curr_task->mm)) {
+		if (mmap_lock_is_contended(curr_mm)) {
 			info->prev_vm_start = curr_vma->vm_start;
 			info->prev_vm_end = curr_vma->vm_end;
 			op = task_vma_iter_find_vma;
-			mmap_read_unlock(curr_task->mm);
-			if (mmap_read_lock_killable(curr_task->mm))
+			mmap_read_unlock(curr_mm);
+			if (mmap_read_lock_killable(curr_mm)) {
+				mmput(curr_mm);
 				goto finish;
+			}
 		} else {
 			op = task_vma_iter_next_vma;
 		}
@@ -535,42 +541,47 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 			op = task_vma_iter_find_vma;
 		}
 
-		if (!curr_task->mm)
+		curr_mm = get_task_mm(curr_task);
+		if (!curr_mm)
 			goto next_task;
-		if (mmap_read_lock_killable(curr_task->mm))
+		if (mmap_read_lock_killable(curr_mm)) {
+			mmput(curr_mm);
 			goto finish;
+		}
 	}
 
 	switch (op) {
 	case task_vma_iter_first_vma:
-		curr_vma = find_vma(curr_task->mm, 0);
+		curr_vma = find_vma(curr_mm, 0);
 		break;
 	case task_vma_iter_next_vma:
-		curr_vma = find_vma(curr_task->mm, curr_vma->vm_end);
+		curr_vma = find_vma(curr_mm, curr_vma->vm_end);
 		break;
 	case task_vma_iter_find_vma:
 		/* We dropped mmap_lock so it is necessary to use find_vma
 		 * to find the next vma. This is similar to the mechanism
 		 * in show_smaps_rollup().
 		 */
-		curr_vma = find_vma(curr_task->mm, info->prev_vm_end - 1);
+		curr_vma = find_vma(curr_mm, info->prev_vm_end - 1);
 		/* case 1) and 4.2) above just use curr_vma */
 
 		/* check for case 2) or case 4.1) above */
 		if (curr_vma &&
 		    curr_vma->vm_start == info->prev_vm_start &&
 		    curr_vma->vm_end == info->prev_vm_end)
-			curr_vma = find_vma(curr_task->mm, curr_vma->vm_end);
+			curr_vma = find_vma(curr_mm, curr_vma->vm_end);
 		break;
 	}
 
 	if (!curr_vma) {
 		/* case 3) above, or case 2) 4.1) with vma->next == NULL */
-		mmap_read_unlock(curr_task->mm);
+		mmap_read_unlock(curr_mm);
+		mmput(curr_mm);
 		goto next_task;
 	}
 	info->task = curr_task;
 	info->vma = curr_vma;
+	info->mm = curr_mm;
 	return curr_vma;
 
 next_task:
@@ -579,6 +590,7 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 
 	put_task_struct(curr_task);
 	info->task = NULL;
+	info->mm = NULL;
 	info->tid++;
 	goto again;
 
@@ -587,6 +599,7 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	put_task_struct(curr_task);
 	info->task = NULL;
 	info->vma = NULL;
+	info->mm = NULL;
 	return NULL;
 }
 
@@ -658,7 +671,9 @@ static void task_vma_seq_stop(struct seq_file *seq, void *v)
 	 */
 	info->prev_vm_start = ~0UL;
 	info->prev_vm_end = info->vma->vm_end;
-	mmap_read_unlock(info->task->mm);
+	mmap_read_unlock(info->mm);
+	mmput(info->mm);
+	info->mm = NULL;
 	put_task_struct(info->task);
 	info->task = NULL;
 }

From patchwork Fri Dec 16 01:59:12 2022
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13074730
X-Patchwork-Delegate: bpf@iogearbox.net
X-Mailing-List: bpf@vger.kernel.org
From: Kui-Feng Lee
CC: Kui-Feng Lee
Subject: [PATCH bpf-next 2/2] selftests/bpf: create new processes repeatedly in the background.
Date: Thu, 15 Dec 2022 17:59:12 -0800
Message-ID: <20221216015912.991616-3-kuifeng@meta.com>
In-Reply-To: <20221216015912.991616-1-kuifeng@meta.com>
References: <20221216015912.991616-1-kuifeng@meta.com>

According to a report, the system may crash when a task iterator
travels through the vmas of tasks. The investigation shows it happens
if the visited task dies during the visit. This test case creates
iterators repeatedly and forks short-lived processes in the background
to detect this bug. The test runs for 3 seconds to give the race a
chance to trigger.

Signed-off-by: Kui-Feng Lee
Acked-by: Yonghong Song
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 79 +++++++++++++++++++
 1 file changed, 79 insertions(+)
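[Note, not part of the patch: assuming the usual test_progs subtest
filtering, the new case should be runnable on its own roughly as
follows; the exact filter string is illustrative.]

	# from tools/testing/selftests/bpf/, after building the selftests
	./test_progs -t bpf_iter/task_vma_dead_task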
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 6f8ed61fc4b4..df13350d615a 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -1465,6 +1465,83 @@ static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
 	bpf_iter_task_vma__destroy(skel);
 }
 
+static void test_task_vma_dead_task(void)
+{
+	int err, iter_fd = -1;
+	struct bpf_iter_task_vma *skel;
+	int wstatus, child_pid = -1;
+	time_t start_tm, cur_tm;
+	int wait_sec = 3;
+
+	skel = bpf_iter_task_vma__open();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open"))
+		return;
+
+	skel->bss->pid = getpid();
+
+	err = bpf_iter_task_vma__load(skel);
+	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
+		goto out;
+
+	skel->links.proc_maps = bpf_program__attach_iter(
+		skel->progs.proc_maps, NULL);
+
+	if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
+		skel->links.proc_maps = NULL;
+		goto out;
+	}
+
+	start_tm = time(NULL);
+	if (start_tm < 0)
+		goto out;
+	cur_tm = start_tm;
+
+	child_pid = fork();
+	if (child_pid == 0) {
+		/* Fork short-lived processes in the background. */
+		while (cur_tm < start_tm + wait_sec) {
+			system("echo > /dev/null");
+			cur_tm = time(NULL);
+			if (cur_tm < 0)
+				exit(1);
+		}
+		exit(0);
+	}
+
+	if (!ASSERT_GE(child_pid, 0, "fork_child"))
+		goto out;
+
+	while (cur_tm < start_tm + wait_sec) {
+		iter_fd = bpf_iter_create(bpf_link__fd(skel->links.proc_maps));
+		if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+			goto out;
+
+		/* Drain all data from iter_fd. */
+		while (cur_tm < start_tm + wait_sec) {
+			err = read_fd_into_buffer(iter_fd, task_vma_output, CMP_BUFFER_SIZE);
+			if (!ASSERT_GE(err, 0, "read_iter_fd"))
+				goto out;
+
+			cur_tm = time(NULL);
+			if (cur_tm < 0)
+				goto out;
+
+			if (err == 0)
+				break;
+		}
+
+		close(iter_fd);
+		iter_fd = -1;
+	}
+
+	check_bpf_link_info(skel->progs.proc_maps);
+
+out:
+	waitpid(child_pid, &wstatus, 0);
+	close(iter_fd);
+	bpf_iter_task_vma__destroy(skel);
+}
+
 void test_bpf_sockmap_map_iter_fd(void)
 {
 	struct bpf_iter_sockmap *skel;
@@ -1586,6 +1663,8 @@ void test_bpf_iter(void)
 		test_task_file();
 	if (test__start_subtest("task_vma"))
 		test_task_vma();
+	if (test__start_subtest("task_vma_dead_task"))
+		test_task_vma_dead_task();
 	if (test__start_subtest("task_btf"))
 		test_task_btf();
 	if (test__start_subtest("tcp4"))