From patchwork Mon Oct 7 19:05:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 13825184
References: <20240830040101.822209-1-Liam.Howlett@oracle.com> <20240830040101.822209-15-Liam.Howlett@oracle.com>
In-Reply-To: <20240830040101.822209-15-Liam.Howlett@oracle.com>
From: Jann Horn
Date: Mon, 7 Oct 2024 21:05:38 +0200
Subject: [BUG] page table UAF, Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
To: "Liam R. Howlett", Andrew Morton
Cc: Lorenzo Stoakes, Linux-MM, kernel list, Suren Baghdasaryan, Matthew Wilcox, Vlastimil Babka, Sidhartha Kumar, Bert Karwatzki, Jiri Olsa, Kees Cook, "Paul E . McKenney", Jeff Xu, Seth Jenkins
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On Fri, Aug 30, 2024 at 6:00 AM Liam R. Howlett wrote:
> Instead of zeroing the vma tree and then overwriting the area, let the
> area be overwritten and then clean up the gathered vmas using
> vms_complete_munmap_vmas().
>
> To ensure locking is downgraded correctly, the mm is set regardless of
> MAP_FIXED or not (NULL vma).
>
> If a driver is mapping over an existing vma, then clear the ptes before
> the call_mmap() invocation. This is done using the vms_clean_up_area()
> helper. If there is a close vm_ops, that must also be called to ensure
> any cleanup is done before mapping over the area. This also means that
> calling open has been added to the abort of an unmap operation, for now.

As currently implemented, this is not a valid optimization because it
violates the (unwritten?) rule that you must not call free_pgd_range()
on a region in the page tables which can concurrently be walked. A
region in the page tables can be concurrently walked if it overlaps a
VMA which is linked into rmaps which are not write-locked.

On Linux 6.12-rc2, when you mmap(MAP_FIXED) over an existing VMA, and
the new mapping is created by expanding an adjacent VMA, the following
race with an ftruncate() is possible (because page tables for the old
mapping are removed while the new VMA in the same location is already
fully set up and linked into the rmap):


task 1 (mmap, MAP_FIXED)     task 2 (ftruncate)
========================     ==================
mmap_region
  vma_merge_new_range
    vma_expand
      commit_merge
        vma_prepare
          [take rmap locks]
        vma_set_range
          [expand adjacent mapping]
        vma_complete
          [drop rmap locks]
  vms_complete_munmap_vmas
    vms_clear_ptes
      unmap_vmas
        [removes ptes]
      free_pgtables
        [unlinks old vma from rmap]
                             unmap_mapping_range
                               unmap_mapping_pages
                                 i_mmap_lock_read
                                 unmap_mapping_range_tree
                                   [loop]
                                     unmap_mapping_range_vma
                                       zap_page_range_single
                                         unmap_single_vma
                                           unmap_page_range
                                             zap_p4d_range
                                               zap_pud_range
                                                 zap_pmd_range
                                                   [looks up pmd entry]
        free_pgd_range
          [frees pmd]
                                                   [UAF pmd entry access]


To reproduce this, apply the attached mmap-vs-truncate-racewiden.diff
to widen the race windows, then build and run the attached reproducer
mmap-fixed-race.c.

Under a kernel with KASAN, you should ideally get a KASAN splat like this:

[ 90.012655][ T1113] ==================================================================
[ 90.013937][ T1113] BUG: KASAN: use-after-free in unmap_page_range+0x2d4e/0x3a80
[ 90.015136][ T1113] Read of size 8 at addr ffff888006a3b000 by task SLOWME2/1113
[ 90.016322][ T1113]
[ 90.016695][ T1113] CPU: 12 UID: 1000 PID: 1113 Comm: SLOWME2 Tainted: G W 6.12.0-rc2-dirty #500
[ 90.018355][ T1113] Tainted: [W]=WARN
[ 90.018959][ T1113] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 90.020598][ T1113] Call Trace:
[ 90.021126][ T1113]
[ 90.021590][ T1113]  dump_stack_lvl+0x53/0x70
[ 90.022307][ T1113]  print_report+0xce/0x670
[ 90.023008][ T1113]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 90.023942][ T1113]  ? unmap_page_range+0x2d4e/0x3a80
[ 90.024763][ T1113]  kasan_report+0xe2/0x120
[ 90.025468][ T1113]  ? unmap_page_range+0x2d4e/0x3a80
[ 90.026293][ T1113]  unmap_page_range+0x2d4e/0x3a80
[ 90.027087][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.027855][ T1113]  ? next_uptodate_folio+0x148/0x890
[ 90.029299][ T1113]  ? set_pte_range+0x265/0x6c0
[ 90.030058][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.030826][ T1113]  ? page_table_check_set.part.0+0x2ab/0x3e0
[ 90.031773][ T1113]  ? __pfx_unmap_page_range+0x10/0x10
[ 90.032622][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.033394][ T1113]  ? unmap_single_vma+0xc6/0x2c0
[ 90.034211][ T1113]  zap_page_range_single+0x28a/0x4b0
[ 90.035052][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.035821][ T1113]  ? __pfx_zap_page_range_single+0x10/0x10
[ 90.036739][ T1113]  ? __pte_offset_map+0x1d/0x250
[ 90.037528][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.038295][ T1113]  ? do_fault+0x6c4/0x1270
[ 90.038999][ T1113]  ? __pfx___handle_mm_fault+0x10/0x10
[ 90.039862][ T1113]  unmap_mapping_range+0x1b2/0x240
[ 90.040671][ T1113]  ? __pfx_unmap_mapping_range+0x10/0x10
[ 90.041563][ T1113]  ? setattr_prepare+0xed/0x7e0
[ 90.042330][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.043097][ T1113]  ? current_time+0x88/0x200
[ 90.043826][ T1113]  shmem_setattr+0x880/0xea0
[ 90.044556][ T1113]  notify_change+0x42b/0xea0
[ 90.045913][ T1113]  ? do_truncate+0x10b/0x1b0
[ 90.046641][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.047407][ T1113]  do_truncate+0x10b/0x1b0
[ 90.048107][ T1113]  ? __pfx_do_truncate+0x10/0x10
[ 90.048890][ T1113]  ? __set_task_comm+0x53/0x140
[ 90.049668][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.050472][ T1113]  ? __do_sys_prctl+0x71e/0x11f0
[ 90.051262][ T1113]  do_ftruncate+0x334/0x470
[ 90.051976][ T1113]  ? srso_return_thunk+0x5/0x5f
[ 90.052745][ T1113]  do_sys_ftruncate+0x3d/0x80
[ 90.053493][ T1113]  do_syscall_64+0x4b/0x110
[ 90.054209][ T1113]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 90.055142][ T1113] RIP: 0033:0x7f89cbf2bf47
[ 90.055844][ T1113] Code: 77 01 c3 48 8b 15 b9 1e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 4d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 89 1e 0d 00 f7 d8 64 89 02 b8
[ 90.058924][ T1113] RSP: 002b:00007fff96968aa8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
[ 90.060248][ T1113] RAX: ffffffffffffffda RBX: 00007fff96968c28 RCX: 00007f89cbf2bf47
[ 90.061507][ T1113] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000003
[ 90.063471][ T1113] RBP: 00007fff96968b10 R08: 0000000000000000 R09: 00007f89cbe29740
[ 90.064727][ T1113] R10: 00007f89cbefb443 R11: 0000000000000213 R12: 0000000000000000
[ 90.065990][ T1113] R13: 00007fff96968c38 R14: 000055ce5a58add8 R15: 00007f89cc05f020
[ 90.067291][ T1113]
[ 90.067772][ T1113]
[ 90.068147][ T1113] The buggy address belongs to the physical page:
[ 90.069168][ T1113] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888006a3bf50 pfn:0x6a3b
[ 90.070741][ T1113] flags: 0x100000000000000(node=0|zone=1)
[ 90.071649][ T1113] raw: 0100000000000000 ffffea0000111748 ffffea000020c488 0000000000000000
[ 90.073009][ T1113] raw: ffff888006a3bf50 0000000000000000 00000000ffffffff 0000000000000000
[ 90.074368][ T1113] page dumped because: kasan: bad access detected
[ 90.075382][ T1113]
[ 90.075751][ T1113] Memory state around the buggy address:
[ 90.076636][ T1113]  ffff888006a3af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 90.077905][ T1113]  ffff888006a3af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 90.079874][ T1113] >ffff888006a3b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 90.081173][ T1113]                    ^
[ 90.081821][ T1113]  ffff888006a3b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 90.083134][ T1113]  ffff888006a3b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 90.084416][ T1113] ==================================================================

diff --git a/mm/memory.c b/mm/memory.c
index 2366578015ad..b95a43221058 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -78,6 +78,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -410,6 +411,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 				unlink_file_vma_batch_add(&vb, vma);
 			}
 			unlink_file_vma_batch_final(&vb);
+
+			if (strcmp(current->comm, "SLOWME1") == 0) {
+				pr_warn("%s: starting delay\n", __func__);
+				mdelay(2000);
+				pr_warn("%s: ending delay\n", __func__);
+			}
+
 			free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
 		}
@@ -1711,6 +1719,13 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	unsigned long next;
 
 	pmd = pmd_offset(pud, addr);
+
+	if (strcmp(current->comm, "SLOWME2") == 0) {
+		pr_warn("%s: starting delay\n", __func__);
+		mdelay(2000);
+		pr_warn("%s: ending delay\n", __func__);
+	}
+
 	do {
 		next = pmd_addr_end(addr, end);
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
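
The delays in the diff above only fire when current->comm is "SLOWME1"
(in free_pgtables) or "SLOWME2" (in zap_pmd_range), so a test program has
to rename each racing task before it enters the corresponding syscall.
Below is a minimal userspace sketch of that step only, assuming
prctl(PR_SET_NAME) is used for the rename; it is not the attached
mmap-fixed-race.c, and the helper name set_task_comm_name() is
hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper (not taken from the attached reproducer): rename the
 * calling thread so that a kernel-side strcmp(current->comm, name) == 0
 * check matches. Returns 0 on success, -1 on failure.
 */
int set_task_comm_name(const char *name)
{
	/* PR_GET_NAME needs a buffer of at least 16 bytes (15 chars + NUL). */
	char check[16] = { 0 };

	assert(name != NULL);

	/* Longer names are truncated by the kernel, which would break the match. */
	if (strlen(name) > 15)
		return -1;

	/* PR_SET_NAME affects only the calling thread. */
	if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
		return -1;

	/* Read the name back to confirm it is what the kernel will compare. */
	if (prctl(PR_GET_NAME, (unsigned long)check, 0, 0, 0) != 0)
		return -1;

	return strcmp(check, name) == 0 ? 0 : -1;
}
```

Under that assumption, the task doing the mmap(MAP_FIXED) would call
set_task_comm_name("SLOWME1") and the task doing the ftruncate() would
call set_task_comm_name("SLOWME2"), each right before its syscall, so
that unrelated tasks are not slowed down by the mdelay() calls.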