From patchwork Wed May 23 07:42:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 10420457 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 5F5C96016C for ; Wed, 23 May 2018 07:42:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 434682899F for ; Wed, 23 May 2018 07:42:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 37A0928EA1; Wed, 23 May 2018 07:42:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2244C2899F for ; Wed, 23 May 2018 07:42:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E42C96B0003; Wed, 23 May 2018 03:42:17 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E1A2B6B0005; Wed, 23 May 2018 03:42:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D0B146B0006; Wed, 23 May 2018 03:42:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 71BC46B0003 for ; Wed, 23 May 2018 03:42:17 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id t195-v6so1808715wmt.9 for ; Wed, 23 May 2018 00:42:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id; bh=h28jgQMMQlbiQ2iuenT137t22IYrVz9wDGXhaIkaaoY=; b=qc5W5F+I36ms2Li03UZOGvb4JxzXtiDGF+Rk3a9HsbAAD5peKW6oMExwLggF2wIqgM 0O4xm0MrQPi2bBbh8zGD1z53u8bMxTKZKm3pa6qshELw2+AvCVN5OvMDiGtUdAbkJbyj 0bDtDvhn8iQB8sjsIFAOf5SJFHQwukcN7dZ33/wTRUmlJiQnxvJ8Z6OA1c0x9sTTdmon K6MBp3iGOa1ovy6CWRONPJ3tr1c8mIP01BXSjRciptagGbR7tiHWWODFZldQByOTA4r4 5MciBufvsaRCCcBSFJySqJZxeCsGMA4A5bHGK8VPS7hN9Av/A32OgwgPdGEjT4MpAqvV qZBg== X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of rppt@linux.vnet.ibm.com) smtp.mailfrom=rppt@linux.vnet.ibm.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com X-Gm-Message-State: ALKqPwdyah5Fx9rOdogVPVhgQu1GoSHE7vb07My86wsQzSfnarW5rDt+ 3sdMSCePoKwyznXvFgAeQS2zTM3eWyjxSQ74cbwoZsrLNl2DEZrHCq36XZ3Q2Q+wreqaYX+TyrS HLM1jX1Cs3bET2hdEUaKrVIOLqo3L7C90PYuOMNaYQXlThp3u6K1BSei5aMTlymk= X-Received: by 2002:a1c:4a5d:: with SMTP id x90-v6mr3148654wma.101.1527061336947; Wed, 23 May 2018 00:42:16 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrZZsDiB+rnawRPB3Pd3gIu3ToR3ADGwj66C6BcyiGw7vi8ZmjzYSPnLFDIPViEKkNgv9lz X-Received: by 2002:a1c:4a5d:: with SMTP id x90-v6mr3148608wma.101.1527061335617; Wed, 23 May 2018 00:42:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527061335; cv=none; d=google.com; s=arc-20160816; b=lH+nXrU+qqdsPZXOCO1ewrQnqE/QIhOCWkvg50vVe5UHGom+hcigDCO5o3tN3ExabO /pWhsqmgL2sXwPIOYgx75Jwa2fFOPOtcBn4fLc5wGSQ+xj+1bbALGpKcNmPuNRCIHbL4 CBHMzCrbX7t59Z4wVlmeneKoZzM0x/qii5ZVXa4jahSDSaeJXyaLfN3MqmTOwRB7xww1 jjWqpA7mN7MzoARsiDQi8K4J3kmmwqUtHCT9HE0z0W5Ly0aBk4gQ10qEvA07FCnmq3nG IrA8J+ahuC27OeRdwbTFLKXOag6mafNdGPOsK4gUoCfrz8RzmNuEpFeBNWzT0gWt/tnF CEVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=message-id:date:subject:cc:to:from:arc-authentication-results; bh=h28jgQMMQlbiQ2iuenT137t22IYrVz9wDGXhaIkaaoY=; b=LzbxM2mVZXjfsemFuLL8bERjdR6lXV3rAzqha3J6AWcU0jH/5bFx7OSCvPQzmOAV7d vOEKrejJo6MTpylkb5xBR2iIAkL7zLOSn1VZUwFK5CIQkcCdp1y3ydkzLnRbRBYiKvxu r1uS3dQlX6U7zLmfGM3eYKeg3bJeFzumRwbxejvHEUh9z4kbKPSaP8aE8gCTtSkxROmN HSJssD/edJNslOXlt93202t2Y2+hvwyJl9NrI3xzwKWCUgkrg4yWcDzw5iESW3BUP6pc TV6X8fa+qVKKoA9l8td4/gBLpB+XxbzQtE0gUYtE+ABZqDPQgKvb880NQLsxDdlNiTC5 XPKw== ARC-Authentication-Results: i=1; mx.google.com; spf=neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of rppt@linux.vnet.ibm.com) smtp.mailfrom=rppt@linux.vnet.ibm.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com. [148.163.158.5]) by mx.google.com with ESMTPS id b33-v6si16353303wrd.207.2018.05.23.00.42.15 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 May 2018 00:42:15 -0700 (PDT) Received-SPF: neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of rppt@linux.vnet.ibm.com) client-ip=148.163.158.5; Authentication-Results: mx.google.com; spf=neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of rppt@linux.vnet.ibm.com) smtp.mailfrom=rppt@linux.vnet.ibm.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w4N7eV1C015901 for ; Wed, 23 May 2018 03:42:14 -0400 Received: from e06smtp10.uk.ibm.com (e06smtp10.uk.ibm.com [195.75.94.106]) by mx0b-001b2d01.pphosted.com with ESMTP id 2j50yby7gj-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Wed, 23 May 2018 03:42:13 -0400 Received: from localhost by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 23 May 2018 08:42:12 +0100 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp10.uk.ibm.com (192.168.101.140) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Wed, 23 May 2018 08:42:09 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w4N7g9og9175296; Wed, 23 May 2018 07:42:09 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EC3D311C04A; Wed, 23 May 2018 08:33:14 +0100 (BST) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5912211C050; Wed, 23 May 2018 08:33:13 +0100 (BST) Received: from rapoport-lnx (unknown [9.148.8.83]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Wed, 23 May 2018 08:33:13 +0100 (BST) Received: by rapoport-lnx (sSMTP sendmail emulation); Wed, 23 May 2018 10:42:06 +0300 From: Mike Rapoport To: Andrew Morton Cc: linux-mm , lkml , Mike Rapoport , Andrea Arcangeli , Mike Kravetz , Pavel Emelyanov , Andrei Vagin Subject: [PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races Date: Wed, 23 May 2018 10:42:04 +0300 X-Mailer: git-send-email 2.7.4 X-TM-AS-GCONF: 00 x-cbid: 18052307-0040-0000-0000-0000043CE2E9 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18052307-0041-0000-0000-0000264219DA Message-Id: <1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-05-23_04:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805230075 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP If a process monitored with userfaultfd changes it's memory mappings or forks() at the same time as uffd monitor fills the process memory with UFFDIO_COPY, the actual creation of page table entries and copying of the data in mcopy_atomic may happen either before of after the memory mapping modifications and there is no way for the uffd monitor to maintain consistent view of the process memory layout. For instance, let's consider fork() running in parallel with userfaultfd_copy(): process | uffd monitor ---------------------------------+------------------------------ fork() | userfaultfd_copy() ... | ... dup_mmap() | down_read(mmap_sem) down_write(mmap_sem) | /* create PTEs, copy data */ dup_uffd() | up_read(mmap_sem) copy_page_range() | up_write(mmap_sem) | dup_uffd_complete() | /* notify monitor */ | If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will be present by the time copy_page_range() is called and they will appear in the child's memory mappings. However, if the fork() is the first to take the mmap_sem, the new pages won't be mapped in the child's address space. Since userfaultfd monitor has no way to determine what was the order, let's disallow userfaultfd_copy in parallel with the non-cooperative events. In such case we return -EAGAIN and the uffd monitor can understand that userfaultfd_copy() clashed with a non-cooperative event and take an appropriate action. Signed-off-by: Mike Rapoport Cc: Andrea Arcangeli Cc: Mike Kravetz Cc: Pavel Emelyanov Cc: Andrei Vagin Acked-by: Pavel Emelyanov --- fs/userfaultfd.c | 22 ++++++++++++++++++++-- include/linux/userfaultfd_k.h | 6 ++++-- mm/userfaultfd.c | 22 +++++++++++++++++----- 3 files changed, 41 insertions(+), 9 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index cec550c8468f..123bf7d516fc 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -62,6 +62,8 @@ struct userfaultfd_ctx { enum userfaultfd_state state; /* released */ bool released; + /* memory mappings are changing because of non-cooperative event */ + bool mmap_changing; /* mm with one ore more vmas attached to this userfaultfd_ctx */ struct mm_struct *mm; }; @@ -641,6 +643,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, * already released. */ out: + WRITE_ONCE(ctx->mmap_changing, false); userfaultfd_ctx_put(ctx); } @@ -686,10 +689,12 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) ctx->state = UFFD_STATE_RUNNING; ctx->features = octx->features; ctx->released = false; + ctx->mmap_changing = false; ctx->mm = vma->vm_mm; mmgrab(ctx->mm); userfaultfd_ctx_get(octx); + WRITE_ONCE(octx->mmap_changing, true); fctx->orig = octx; fctx->new = ctx; list_add_tail(&fctx->list, fcs); @@ -732,6 +737,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma, if (ctx && (ctx->features & UFFD_FEATURE_EVENT_REMAP)) { vm_ctx->ctx = ctx; userfaultfd_ctx_get(ctx); + WRITE_ONCE(ctx->mmap_changing, true); } } @@ -772,6 +778,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma, return true; userfaultfd_ctx_get(ctx); + WRITE_ONCE(ctx->mmap_changing, true); up_read(&mm->mmap_sem); msg_init(&ewq.msg); @@ -815,6 +822,7 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma, return -ENOMEM; userfaultfd_ctx_get(ctx); + WRITE_ONCE(ctx->mmap_changing, true); unmap_ctx->ctx = ctx; unmap_ctx->start = start; unmap_ctx->end = end; @@ -1653,6 +1661,10 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, user_uffdio_copy = (struct uffdio_copy __user *) arg; + ret = -EAGAIN; + if (READ_ONCE(ctx->mmap_changing)) + goto out; + ret = -EFAULT; if (copy_from_user(&uffdio_copy, user_uffdio_copy, /* don't copy "copy" last field */ @@ -1674,7 +1686,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, goto out; if (mmget_not_zero(ctx->mm)) { ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, - uffdio_copy.len); + uffdio_copy.len, &ctx->mmap_changing); mmput(ctx->mm); } else { return -ESRCH; @@ -1705,6 +1717,10 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg; + ret = -EAGAIN; + if (READ_ONCE(ctx->mmap_changing)) + goto out; + ret = -EFAULT; if (copy_from_user(&uffdio_zeropage, user_uffdio_zeropage, /* don't copy "zeropage" last field */ @@ -1721,7 +1737,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, if (mmget_not_zero(ctx->mm)) { ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, - uffdio_zeropage.range.len); + uffdio_zeropage.range.len, + &ctx->mmap_changing); mmput(ctx->mm); } else { return -ESRCH; @@ -1900,6 +1917,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) ctx->features = 0; ctx->state = UFFD_STATE_WAIT_API; ctx->released = false; + ctx->mmap_changing = false; ctx->mm = current->mm; /* prevent the mm struct to be freed */ mmgrab(ctx->mm); diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index f2f3b68ba910..e091f0a11b11 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -31,10 +31,12 @@ extern int handle_userfault(struct vm_fault *vmf, unsigned long reason); extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, - unsigned long src_start, unsigned long len); + unsigned long src_start, unsigned long len, + bool *mmap_changing); extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start, - unsigned long len); + unsigned long len, + bool *mmap_changing); /* mm helpers */ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 39791b81ede7..5029f241908f 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -404,7 +404,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - bool zeropage) + bool zeropage, + bool *mmap_changing) { struct vm_area_struct *dst_vma; ssize_t err; @@ -431,6 +432,15 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, down_read(&dst_mm->mmap_sem); /* + * If memory mappings are changing because of non-cooperative + * operation (e.g. mremap) running in parallel, bail out and + * request the user to retry later + */ + err = -EAGAIN; + if (mmap_changing && READ_ONCE(*mmap_changing)) + goto out_unlock; + + /* * Make sure the vma is not shared, that the dst range is * both valid and fully within a single existing vma. */ @@ -563,13 +573,15 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, } ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, - unsigned long src_start, unsigned long len) + unsigned long src_start, unsigned long len, + bool *mmap_changing) { - return __mcopy_atomic(dst_mm, dst_start, src_start, len, false); + return __mcopy_atomic(dst_mm, dst_start, src_start, len, false, + mmap_changing); } ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, - unsigned long len) + unsigned long len, bool *mmap_changing) { - return __mcopy_atomic(dst_mm, start, 0, len, true); + return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing); }