From patchwork Sun Aug 8 02:07:22 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12424641
From: Nadav Amit
To: Andrew Morton
Cc: Pavel Emelyanov, Mike Rapoport, Peter Xu, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Nadav Amit
Subject: [PATCH 1/3] userfaultfd: change mmap_changing to atomic
Date: Sat, 7 Aug 2021 19:07:22 -0700
Message-Id: <20210808020724.1022515-2-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210808020724.1022515-1-namit@vmware.com>
References: <20210808020724.1022515-1-namit@vmware.com>

From: Nadav Amit

mmap_changing is currently a boolean variable, which is set and cleared
without any lock that protects against concurrent modifications.
mmap_changing is supposed to mark whether userfaultfd page-fault handling
should be retried since the mappings are undergoing a change. However,
concurrent calls, for instance to madvise(MADV_DONTNEED), might cause
mmap_changing to be false although the remove event has still not been
read (and hence acknowledged) by the user.

Change mmap_changing to atomic_t and increment/decrement it appropriately.
Add a debug assertion to catch cases in which mmap_changing becomes
negative.

Cc: Mike Rapoport
Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c              | 25 +++++++++++++------------
 include/linux/userfaultfd_k.h |  8 ++++----
 mm/userfaultfd.c              | 15 ++++++++-------
 3 files changed, 25 insertions(+), 23 deletions(-)
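The following is an illustrative sketch, not part of the patch: a minimal
user-space analogue (hypothetical names, C11 atomics standing in for the
kernel's atomic_t) of why a plain boolean cannot track overlapping
non-cooperative events. When two events overlap, the first completion
clears the boolean while the second event is still unread, so ioctls that
should fail with -EAGAIN are let through; a counter only returns to zero
once every event has completed.

#include <stdatomic.h>
#include <stdbool.h>

static bool changing_flag;          /* the old scheme: a plain boolean */
static atomic_int changing_cnt;     /* the new scheme: a counter */

static void event_start(void)       /* e.g., fork, mremap, madvise */
{
        changing_flag = true;       /* both overlapping events set it */
        atomic_fetch_add(&changing_cnt, 1);
}

static void event_done(void)        /* event read and acknowledged */
{
        /*
         * With two overlapping events, the first completion already
         * clears the flag while the second event is still unread.
         * The counter only reaches zero once both have completed.
         */
        changing_flag = false;
        atomic_fetch_sub(&changing_cnt, 1);
}

static bool may_service_ioctl(void)
{
        return atomic_load(&changing_cnt) == 0;   /* else fail with -EAGAIN */
}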
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 5c2d806e6ae5..29a3016f16c9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -74,7 +74,7 @@ struct userfaultfd_ctx {
         /* released */
         bool released;
         /* memory mappings are changing because of non-cooperative event */
-        bool mmap_changing;
+        atomic_t mmap_changing;
         /* mm with one ore more vmas attached to this userfaultfd_ctx */
         struct mm_struct *mm;
 };
@@ -623,7 +623,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
          * already released.
          */
 out:
-        WRITE_ONCE(ctx->mmap_changing, false);
+        atomic_dec(&ctx->mmap_changing);
+        VM_BUG_ON(atomic_read(&ctx->mmap_changing) < 0);
         userfaultfd_ctx_put(ctx);
 }

@@ -669,12 +670,12 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
                 ctx->state = UFFD_STATE_RUNNING;
                 ctx->features = octx->features;
                 ctx->released = false;
-                ctx->mmap_changing = false;
+                atomic_set(&ctx->mmap_changing, 0);
                 ctx->mm = vma->vm_mm;
                 mmgrab(ctx->mm);

                 userfaultfd_ctx_get(octx);
-                WRITE_ONCE(octx->mmap_changing, true);
+                atomic_inc(&octx->mmap_changing);
                 fctx->orig = octx;
                 fctx->new = ctx;
                 list_add_tail(&fctx->list, fcs);
@@ -721,7 +722,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
         if (ctx->features & UFFD_FEATURE_EVENT_REMAP) {
                 vm_ctx->ctx = ctx;
                 userfaultfd_ctx_get(ctx);
-                WRITE_ONCE(ctx->mmap_changing, true);
+                atomic_inc(&ctx->mmap_changing);
         } else {
                 /* Drop uffd context if remap feature not enabled */
                 vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
@@ -766,7 +767,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
                 return true;

         userfaultfd_ctx_get(ctx);
-        WRITE_ONCE(ctx->mmap_changing, true);
+        atomic_inc(&ctx->mmap_changing);
         mmap_read_unlock(mm);

         msg_init(&ewq.msg);
@@ -810,7 +811,7 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma,
                         return -ENOMEM;

                 userfaultfd_ctx_get(ctx);
-                WRITE_ONCE(ctx->mmap_changing, true);
+                atomic_inc(&ctx->mmap_changing);
                 unmap_ctx->ctx = ctx;
                 unmap_ctx->start = start;
                 unmap_ctx->end = end;
@@ -1700,7 +1701,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
         user_uffdio_copy = (struct uffdio_copy __user *) arg;

         ret = -EAGAIN;
-        if (READ_ONCE(ctx->mmap_changing))
+        if (atomic_read(&ctx->mmap_changing))
                 goto out;

         ret = -EFAULT;
@@ -1757,7 +1758,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
         user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;

         ret = -EAGAIN;
-        if (READ_ONCE(ctx->mmap_changing))
+        if (atomic_read(&ctx->mmap_changing))
                 goto out;

         ret = -EFAULT;
@@ -1807,7 +1808,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
         struct userfaultfd_wake_range range;
         bool mode_wp, mode_dontwake;

-        if (READ_ONCE(ctx->mmap_changing))
+        if (atomic_read(&ctx->mmap_changing))
                 return -EAGAIN;

         user_uffdio_wp = (struct uffdio_writeprotect __user *) arg;
@@ -1855,7 +1856,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
         user_uffdio_continue = (struct uffdio_continue __user *)arg;

         ret = -EAGAIN;
-        if (READ_ONCE(ctx->mmap_changing))
+        if (atomic_read(&ctx->mmap_changing))
                 goto out;

         ret = -EFAULT;
@@ -2087,7 +2088,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
         ctx->features = 0;
         ctx->state = UFFD_STATE_WAIT_API;
         ctx->released = false;
-        ctx->mmap_changing = false;
+        atomic_set(&ctx->mmap_changing, 0);
         ctx->mm = current->mm;
         /* prevent the mm struct to be freed */
         mmgrab(ctx->mm);
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 331d2ccf0bcc..33cea484d1ad 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -60,16 +60,16 @@ extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,

 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
                             unsigned long src_start, unsigned long len,
-                            bool *mmap_changing, __u64 mode);
+                            atomic_t *mmap_changing, __u64 mode);
 extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
                               unsigned long dst_start,
                               unsigned long len,
-                              bool *mmap_changing);
+                              atomic_t *mmap_changing);
 extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
-                              unsigned long len, bool *mmap_changing);
+                              unsigned long len, atomic_t *mmap_changing);
 extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
                                unsigned long len,
-                               bool enable_wp, bool *mmap_changing);
+                               bool enable_wp, atomic_t *mmap_changing);

 /* mm helpers */
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0e2132834bc7..7a9008415534 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -483,7 +483,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
                                               unsigned long src_start,
                                               unsigned long len,
                                               enum mcopy_atomic_mode mcopy_mode,
-                                              bool *mmap_changing,
+                                              atomic_t *mmap_changing,
                                               __u64 mode)
 {
         struct vm_area_struct *dst_vma;
@@ -517,7 +517,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
          * request the user to retry later
          */
         err = -EAGAIN;
-        if (mmap_changing && READ_ONCE(*mmap_changing))
+        if (mmap_changing && atomic_read(mmap_changing))
                 goto out_unlock;

         /*
@@ -650,28 +650,29 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,

 ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
                      unsigned long src_start, unsigned long len,
-                     bool *mmap_changing, __u64 mode)
+                     atomic_t *mmap_changing, __u64 mode)
 {
         return __mcopy_atomic(dst_mm, dst_start, src_start, len,
                               MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
 }

 ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
-                       unsigned long len, bool *mmap_changing)
+                       unsigned long len, atomic_t *mmap_changing)
 {
         return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
                               mmap_changing, 0);
 }

 ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
-                       unsigned long len, bool *mmap_changing)
+                       unsigned long len, atomic_t *mmap_changing)
 {
         return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
                               mmap_changing, 0);
 }

 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
-                        unsigned long len, bool enable_wp, bool *mmap_changing)
+                        unsigned long len, bool enable_wp,
+                        atomic_t *mmap_changing)
 {
         struct vm_area_struct *dst_vma;
         pgprot_t newprot;
@@ -694,7 +695,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
          * request the user to retry later
          */
         err = -EAGAIN;
-        if (mmap_changing && READ_ONCE(*mmap_changing))
+        if (mmap_changing && atomic_read(mmap_changing))
                 goto out_unlock;

         err = -ENOENT;
From patchwork Sun Aug 8 02:07:23 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12424643
From: Nadav Amit
To: Andrew Morton
Cc: Pavel Emelyanov, Mike Rapoport, Peter Xu, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Nadav Amit
Subject: [PATCH 2/3] userfaultfd: prevent concurrent API initialization
Date: Sat, 7 Aug 2021 19:07:23 -0700
Message-Id: <20210808020724.1022515-3-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210808020724.1022515-1-namit@vmware.com>
References: <20210808020724.1022515-1-namit@vmware.com>

From: Nadav Amit

userfaultfd assumes that the enabled features are set once and never
changed after the UFFDIO_API ioctl succeeded.

However, currently UFFDIO_API can be called concurrently from two
different threads, succeed on both threads, and leave userfaultfd's
features in a non-deterministic state. Theoretically, other uffd
operations (ioctls and page faults) can be dispatched while being
adversely affected by such changes of features.

Moreover, the writes to ctx->state and ctx->features are not ordered,
which can, theoretically again, let userfaultfd_ioctl() think that the
userfaultfd API completed while the features are still not initialized.

To avoid races, it is arguably best to get rid of ctx->state. Since there
are only two states, record the API initialization in ctx->features as
the uppermost bit and remove ctx->state.

Cc: Pavel Emelyanov
Fixes: 9cd75c3cd4c3d ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor")
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c | 91 +++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 47 deletions(-)
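The following is an illustrative sketch, not part of the patch: the
one-shot initialization idiom the patch adopts, reduced to a minimal
user-space form (hypothetical names, C11 atomics standing in for the
kernel's cmpxchg()). Because the features word can only move away from
zero once, exactly one UFFDIO_API-style call can succeed, and the reserved
uppermost bit doubles as the "initialized" state that ctx->state used to
track.

#include <errno.h>
#include <stdatomic.h>

#define FEATURE_INITIALIZED (1u << 31)   /* reserved uppermost bit */

static _Atomic unsigned int features;

/* Only the first caller can move features away from 0. */
static int api_init(unsigned int requested)
{
        unsigned int expected = 0;

        if (!atomic_compare_exchange_strong(&features, &expected,
                                            requested | FEATURE_INITIALIZED))
                return -EINVAL;   /* lost the race: already initialized */
        return 0;
}

static int is_initialized(void)
{
        return atomic_load(&features) & FEATURE_INITIALIZED;
}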
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 29a3016f16c9..003f0d31743e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -33,11 +33,6 @@ int sysctl_unprivileged_userfaultfd __read_mostly;

 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;

-enum userfaultfd_state {
-        UFFD_STATE_WAIT_API,
-        UFFD_STATE_RUNNING,
-};
-
 /*
  * Start with fault_pending_wqh and fault_wqh so they're more likely
  * to be in the same cacheline.
@@ -69,8 +64,6 @@ struct userfaultfd_ctx {
         unsigned int flags;
         /* features requested from the userspace */
         unsigned int features;
-        /* state machine */
-        enum userfaultfd_state state;
         /* released */
         bool released;
         /* memory mappings are changing because of non-cooperative event */
@@ -104,6 +97,14 @@ struct userfaultfd_wake_range {
         unsigned long len;
 };

+/* internal indication that UFFD_API ioctl was successfully executed */
+#define UFFD_FEATURE_INITIALIZED                (1u << 31)
+
+static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
+{
+        return ctx->features & UFFD_FEATURE_INITIALIZED;
+}
+
 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
                                      int wake_flags, void *key)
 {
@@ -667,7 +668,6 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)

                 refcount_set(&ctx->refcount, 1);
                 ctx->flags = octx->flags;
-                ctx->state = UFFD_STATE_RUNNING;
                 ctx->features = octx->features;
                 ctx->released = false;
                 atomic_set(&ctx->mmap_changing, 0);
@@ -944,38 +944,33 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)

         poll_wait(file, &ctx->fd_wqh, wait);

-        switch (ctx->state) {
-        case UFFD_STATE_WAIT_API:
+        if (!userfaultfd_is_initialized(ctx))
                 return EPOLLERR;
-        case UFFD_STATE_RUNNING:
-                /*
-                 * poll() never guarantees that read won't block.
-                 * userfaults can be waken before they're read().
-                 */
-                if (unlikely(!(file->f_flags & O_NONBLOCK)))
-                        return EPOLLERR;
-                /*
-                 * lockless access to see if there are pending faults
-                 * __pollwait last action is the add_wait_queue but
-                 * the spin_unlock would allow the waitqueue_active to
-                 * pass above the actual list_add inside
-                 * add_wait_queue critical section. So use a full
-                 * memory barrier to serialize the list_add write of
-                 * add_wait_queue() with the waitqueue_active read
-                 * below.
-                 */
-                ret = 0;
-                smp_mb();
-                if (waitqueue_active(&ctx->fault_pending_wqh))
-                        ret = EPOLLIN;
-                else if (waitqueue_active(&ctx->event_wqh))
-                        ret = EPOLLIN;
-                return ret;
-        default:
-                WARN_ON_ONCE(1);
+        /*
+         * poll() never guarantees that read won't block.
+         * userfaults can be waken before they're read().
+         */
+        if (unlikely(!(file->f_flags & O_NONBLOCK)))
                 return EPOLLERR;
-        }
+        /*
+         * lockless access to see if there are pending faults
+         * __pollwait last action is the add_wait_queue but
+         * the spin_unlock would allow the waitqueue_active to
+         * pass above the actual list_add inside
+         * add_wait_queue critical section. So use a full
+         * memory barrier to serialize the list_add write of
+         * add_wait_queue() with the waitqueue_active read
+         * below.
+         */
+        ret = 0;
+        smp_mb();
+        if (waitqueue_active(&ctx->fault_pending_wqh))
+                ret = EPOLLIN;
+        else if (waitqueue_active(&ctx->event_wqh))
+                ret = EPOLLIN;
+
+        return ret;
 }

 static const struct file_operations userfaultfd_fops;
@@ -1170,7 +1165,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
         int no_wait = file->f_flags & O_NONBLOCK;
         struct inode *inode = file_inode(file);

-        if (ctx->state == UFFD_STATE_WAIT_API)
+        if (!userfaultfd_is_initialized(ctx))
                 return -EINVAL;

         for (;;) {
@@ -1909,9 +1904,10 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
         /*
-         * For the current set of features the bits just coincide
+         * For the current set of features the bits just coincide. Set
+         * UFFD_FEATURE_INITIALIZED to mark the features as enabled.
          */
-        return (unsigned int)user_features;
+        return (unsigned int)user_features | UFFD_FEATURE_INITIALIZED;
 }

 /*
@@ -1924,12 +1920,10 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 {
         struct uffdio_api uffdio_api;
         void __user *buf = (void __user *)arg;
+        unsigned int ctx_features;
         int ret;
         __u64 features;

-        ret = -EINVAL;
-        if (ctx->state != UFFD_STATE_WAIT_API)
-                goto out;
         ret = -EFAULT;
         if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
                 goto out;
@@ -1953,9 +1947,13 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
         ret = -EFAULT;
         if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
                 goto out;
-        ctx->state = UFFD_STATE_RUNNING;
+
         /* only enable the requested features for this uffd context */
-        ctx->features = uffd_ctx_features(features);
+        ctx_features = uffd_ctx_features(features);
+        ret = -EINVAL;
+        if (cmpxchg(&ctx->features, 0, ctx_features) != 0)
+                goto err_out;
+
         ret = 0;
 out:
         return ret;
@@ -1972,7 +1970,7 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
         int ret = -EINVAL;
         struct userfaultfd_ctx *ctx = file->private_data;

-        if (cmd != UFFDIO_API && ctx->state == UFFD_STATE_WAIT_API)
+        if (cmd != UFFDIO_API && !userfaultfd_is_initialized(ctx))
                 return -EINVAL;

         switch(cmd) {
@@ -2086,7 +2084,6 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
         refcount_set(&ctx->refcount, 1);
         ctx->flags = flags;
         ctx->features = 0;
-        ctx->state = UFFD_STATE_WAIT_API;
         ctx->released = false;
         atomic_set(&ctx->mmap_changing, 0);
         ctx->mm = current->mm;
From patchwork Sun Aug 8 02:07:24 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12424645
From: Nadav Amit
To: Andrew Morton
Cc: Pavel Emelyanov, Mike Rapoport, Peter Xu, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Nadav Amit, Jens Axboe, Andrea Arcangeli, Alexander Viro, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 3/3] selftests/vm/userfaultfd: wake after copy failure
Date: Sat, 7 Aug 2021 19:07:24 -0700
Message-Id: <20210808020724.1022515-4-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210808020724.1022515-1-namit@vmware.com>
References: <20210808020724.1022515-1-namit@vmware.com>

From: Nadav Amit

When the userfaultfd copy ioctl fails because the PTE already exists, an
-EEXIST error is returned and the faulting thread is not woken. The
current userfaultfd test does not wake the faulting thread in such a
case. The assumption is presumably that another thread set the PTE
through a copy/wp ioctl and would wake the faulting thread, or that
alternatively the fault handler would realize there is no need to
"must_wait" and continue. This is not necessarily true.

There is an assumption that the "must_wait" tests in handle_userfault()
are sufficient to provide a definitive answer as to whether the offending
PTE is populated or not. However, the userfaultfd_must_wait() test is
lockless. Consequently, concurrent calls to ptep_modify_prot_start(), for
instance, can clear the PTE and cause userfaultfd_must_wait() to wrongly
assume it is not populated and that a wait is needed.

There are therefore three options:
(1) Change the tests to wake on copy failure.
(2) Wake the faulting thread unconditionally on zero/copy ioctls before
    returning -EEXIST.
(3) Change userfaultfd_must_wait() to hold locks.

This patch takes the first approach, but the others are valid solutions
with different tradeoffs.

Cc: Jens Axboe
Cc: Andrea Arcangeli
Cc: Peter Xu
Cc: Alexander Viro
Cc: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Nadav Amit
---
 tools/testing/selftests/vm/userfaultfd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
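The following is an illustrative sketch, not part of the patch: how a
uffd monitor thread might apply the same wake-after-EEXIST pattern that
the selftest change below adds. The helper name is hypothetical, and it
assumes an already-initialized userfaultfd descriptor (uffd) and a known
page_size.

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

static int copy_or_wake(int uffd, unsigned long dst, unsigned long src,
                        unsigned long page_size)
{
        struct uffdio_copy copy;

        memset(&copy, 0, sizeof(copy));
        copy.dst = dst;
        copy.src = src;
        copy.len = page_size;

        if (ioctl(uffd, UFFDIO_COPY, &copy) && copy.copy == -EEXIST) {
                /*
                 * Another thread populated the page first. Do not assume
                 * it will wake the faulting thread: a lockless must_wait
                 * check may have put the faulter to sleep on a PTE that
                 * was only transiently non-present, so wake explicitly.
                 */
                struct uffdio_range range = {
                        .start = dst,
                        .len = page_size,
                };

                if (ioctl(uffd, UFFDIO_WAKE, &range))
                        return -1;
        }
        return 0;
}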
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 2ea438e6b8b1..10ab56c2484a 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -566,6 +566,18 @@ static void retry_copy_page(int ufd, struct uffdio_copy *uffdio_copy,
         }
 }

+static void wake_range(int ufd, unsigned long addr, unsigned long len)
+{
+        struct uffdio_range uffdio_wake;
+
+        uffdio_wake.start = addr;
+        uffdio_wake.len = len;
+
+        if (ioctl(ufd, UFFDIO_WAKE, &uffdio_wake))
+                fprintf(stderr, "error waking %lu\n",
+                        addr), exit(1);
+}
+
 static int __copy_page(int ufd, unsigned long offset, bool retry)
 {
         struct uffdio_copy uffdio_copy;
@@ -585,6 +597,7 @@ static int __copy_page(int ufd, unsigned long offset, bool retry)
                 if (uffdio_copy.copy != -EEXIST)
                         err("UFFDIO_COPY error: %"PRId64,
                             (int64_t)uffdio_copy.copy);
+                wake_range(ufd, uffdio_copy.dst, page_size);
         } else if (uffdio_copy.copy != page_size) {
                 err("UFFDIO_COPY error: %"PRId64,
                     (int64_t)uffdio_copy.copy);
         } else {