From patchwork Tue Aug 13 20:25:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 13762451
From: Jann Horn
Date: Tue, 13 Aug 2024 22:25:21 +0200
Subject: [PATCH v2 1/2] userfaultfd: Fix checks for huge PMDs
Message-Id: <20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com>
References: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com>
In-Reply-To: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com>
To: Andrew Morton, Pavel Emelianov, Andrea Arcangeli, Hugh Dickins
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand, Qi Zheng, Jann Horn, stable@vger.kernel.org

This fixes two issues.

I discovered that the following race can occur:

  mfill_atomic                other thread
  ============                ============
  pmdp_get_lockless() [reads none pmd]
  __pte_alloc [no-op]
  BUG_ON(pmd_none(*dst_pmd))

I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.

On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.

The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).

On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.

Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).

If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.

As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.

Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.

Reported-by: Qi Zheng
Closes: https://lore.kernel.org/r/59bf3c2e-d58b-41af-ab10-3e631d802229@bytedance.com
Cc: stable@vger.kernel.org
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn
Acked-by: David Hildenbrand
Reviewed-by: Qi Zheng
---
 mm/userfaultfd.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e54e5c8907fa..290b2a0d84ac 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -787,21 +787,23 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 		}
 
 		dst_pmdval = pmdp_get_lockless(dst_pmd);
-		/*
-		 * If the dst_pmd is mapped as THP don't
-		 * override it and just be strict.
-		 */
-		if (unlikely(pmd_trans_huge(dst_pmdval))) {
-			err = -EEXIST;
-			break;
-		}
 		if (unlikely(pmd_none(dst_pmdval)) &&
 		    unlikely(__pte_alloc(dst_mm, dst_pmd))) {
 			err = -ENOMEM;
 			break;
 		}
-		/* If an huge pmd materialized from under us fail */
-		if (unlikely(pmd_trans_huge(*dst_pmd))) {
+		dst_pmdval = pmdp_get_lockless(dst_pmd);
+		/*
+		 * If the dst_pmd is THP don't override it and just be strict.
+		 * (This includes the case where the PMD used to be THP and
+		 * changed back to none after __pte_alloc().)
+		 */
+		if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
+			     pmd_devmap(dst_pmdval))) {
+			err = -EEXIST;
+			break;
+		}
+		if (unlikely(pmd_bad(dst_pmdval))) {
 			err = -EFAULT;
 			break;
 		}
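
Illustration only, not part of the patch: the order of the checks that the
hunk above performs on the re-read PMD value can be modeled in user space
roughly as follows. This is a minimal sketch under stated assumptions.
struct model_pmd, model_check_dst_pmd(), model_self_check() and the
model_* sample values are hypothetical names invented for this example;
they are not kernel types or APIs, and the boolean fields only stand in
for what pmd_present(), pmd_trans_huge(), pmd_devmap() and pmd_bad()
would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a PMD value; not a kernel type. */
struct model_pmd {
	bool present;		/* stands in for pmd_present() */
	bool trans_huge;	/* stands in for pmd_trans_huge() */
	bool devmap;		/* stands in for pmd_devmap() */
	bool bad;		/* stands in for pmd_bad() */
};

/*
 * Same order as the checks after __pte_alloc() in the hunk:
 * a non-present entry (none or migration), a THP or a devmap PMD
 * gives -EEXIST; a malformed entry gives -EFAULT; only an ordinary
 * page table pointer passes.
 */
int model_check_dst_pmd(struct model_pmd pmd)
{
	if (!pmd.present || pmd.trans_huge || pmd.devmap)
		return -EEXIST;
	if (pmd.bad)
		return -EFAULT;
	return 0;
}

/* Sample values for the cases named in the commit message (assumed encodings). */
const struct model_pmd model_none      = { .present = false };
const struct model_pmd model_migration = { .present = false };
const struct model_pmd model_thp       = { .present = true, .trans_huge = true };
const struct model_pmd model_devmap    = { .present = true, .devmap = true };
const struct model_pmd model_corrupt   = { .present = true, .bad = true };
const struct model_pmd model_table     = { .present = true };

/* Asserts the outcome the model gives for each sample value. */
void model_self_check(void)
{
	assert(model_check_dst_pmd(model_none) == -EEXIST);
	assert(model_check_dst_pmd(model_migration) == -EEXIST);
	assert(model_check_dst_pmd(model_thp) == -EEXIST);
	assert(model_check_dst_pmd(model_devmap) == -EEXIST);
	assert(model_check_dst_pmd(model_corrupt) == -EFAULT);
	assert(model_check_dst_pmd(model_table) == 0);
}
```

In this model a PMD that went back to none after __pte_alloc() and a
migration PMD both take the -EEXIST path, which is the behaviour the new
comment in the hunk describes; the model says nothing about locking or
about how the real helpers decode architecture-specific PMD bits.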