From patchwork Thu Mar 4 09:54:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nadav Amit X-Patchwork-Id: 12115779 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D389AC433DB for ; Thu, 4 Mar 2021 09:59:02 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 61F1F64F27 for ; Thu, 4 Mar 2021 09:59:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 61F1F64F27 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9D10E6B0006; Thu, 4 Mar 2021 04:59:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9A6856B0007; Thu, 4 Mar 2021 04:59:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 847DD6B0008; Thu, 4 Mar 2021 04:59:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236]) by kanga.kvack.org (Postfix) with ESMTP id 675196B0006 for ; Thu, 4 Mar 2021 04:59:01 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 22B67180ACEEE for ; Thu, 4 Mar 2021 09:59:01 +0000 (UTC) X-FDA: 77881743282.12.BD42EFF Received: from mail-pj1-f50.google.com (mail-pj1-f50.google.com [209.85.216.50]) by imf14.hostedemail.com (Postfix) with ESMTP id C0CECC0001FA for ; Thu, 4 Mar 2021 09:58:58 +0000 (UTC) Received: by mail-pj1-f50.google.com with SMTP id ch11so541539pjb.4 for ; Thu, 04 Mar 2021 01:59:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=n4CkASMCqnRVKlVMuJIlE3MUCpSbMkYMNXckSMWm4yU=; b=kD12OzyrjUv66c03Uw8JyGct02j39byIAVQFPy6UHJHrlPPciybOg2kcF90RCEzvPp BOASrUL8jYq3O+VLz+Qhlx6/MtJ29eQwxLorY2NT9i9xUQij+1BxCkexGif8nEJNfXmx V+NrpzmY8rUXFounOiJuCEXwG2qEKKImOMD2qW0DeO7g0MKne00MmADavgl3nHiyg5kD GJQI62DCzVFOaoFmejkNCFSZcauw0Jcyo2Lqg+opaDR5txmoZlb9AAIfenlJu2lNYTBI pkQHCns3c3blvQswShIsz05pAXNU3q+Blfium59uxCnNtf5THamebdhOb//+TcbLWAEo k0Ng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=n4CkASMCqnRVKlVMuJIlE3MUCpSbMkYMNXckSMWm4yU=; b=drT5XmR2phHAKZ2MwM4IefhHreITyPsZxJpBgH+xizhUjVf0Qmj7PYG89+pnM2YBcU z9PcOvxPk6r28DP4dhm4Td4mrcppaclft21c8UK7/kHuRUcHOeRLNrP5xgxdUAnFCSoa zRicBjaqWEUWk5AgRXQK052MUcE2SnQ1kVeEVzb15L28QGNaH92KZnrlMaIw0UJYbZ4Z 6uH9bp8+JzBFHPUpJHHt1TEBlKqyBeGDfWVHDkieOpZTeVnNb1Nhmz/TjETCLCnf1hup j1n7ndVRm1cDYux6HLLfIXf/48ASRK7lJwlEaZZptIRrrAv5GGUuoImrQVbjI5auTao2 9ltg== X-Gm-Message-State: AOAM531n+L4w811tfdzeRr9QXOriyvQgAYjpUTJ0sNlTWXb04Ycb6+2w DlJRE/eZNz7Ua1RSAv2YJCHzQiVUhsCWgA== X-Google-Smtp-Source: ABdhPJz11DGPywcaGov16qLHJPteMitUsxd1qVspYeC8e4qrs50QmNjkEhzLrLlQrMX0OpoY3+F+qQ== X-Received: by 2002:a17:90a:7309:: with SMTP id m9mr3743213pjk.23.1614851939139; Thu, 04 Mar 2021 01:58:59 -0800 (PST) Received: from sc2-haas01-esx0118.eng.vmware.com ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id a19sm9503339pjh.39.2021.03.04.01.58.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Mar 2021 01:58:58 -0800 (PST) From: Nadav Amit X-Google-Original-From: Nadav Amit To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Nadav Amit , Andrea Arcangeli , Andy Lutomirski , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , Minchan Kim , Will Deacon , Peter Zijlstra , stable@vger.kernel.org, Yu Zhao , Peter Xu Subject: [PATCH v4] mm/userfaultfd: fix memory corruption due to writeprotect Date: Thu, 4 Mar 2021 01:54:23 -0800 Message-Id: <20210304095423.3825684-1-namit@vmware.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: C0CECC0001FA X-Stat-Signature: xg3p9ao56t57nh1c4ng7ofx4rtxxiy89 Received-SPF: none (<>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from="<>"; helo=mail-pj1-f50.google.com; client-ip=209.85.216.50 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1614851938-566928 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Nadav Amit Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0 cpu1 cpu2 ---- ---- ---- [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0 cpu1 cpu2 cpu3 ---- ---- ---- ---- [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ... [ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Pavel Emelyanov Cc: Mike Kravetz Cc: Mike Rapoport Cc: Minchan Kim Cc: Will Deacon Cc: Peter Zijlstra Cc: stable@vger.kernel.org # 5.9+ Suggested-by: Yu Zhao Reviewed-by: Peter Xu Tested-by: Peter Xu Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit --- v3->v4: * Fix the "Fixes" tag for real [Peter Xu] * Reviewed-by, suggested-by tags [Peter Xu] * Adding unlikely() [Peter Xu] v2->v3: * Do not acquire mmap_lock for write, flush conditionally instead [Yu] * Change the fixes tag to the patch that made the race apparent [Yu] * Removing patch to avoid write-protect on uffd unprotect. More comprehensive solution to follow (and avoid the TLB flush as well). --- mm/memory.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 9e8576a83147..79253cb3bcd5 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3092,6 +3092,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) return handle_userfault(vmf, VM_UFFD_WP); } + /* + * Userfaultfd write-protect can defer flushes. Ensure the TLB + * is flushed in this case before copying. + */ + if (unlikely(userfaultfd_wp(vmf->vma) && + mm_tlb_flush_pending(vmf->vma->vm_mm))) + flush_tlb_page(vmf->vma, vmf->address); + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /*