From patchwork Tue Jan 29 16:54:19 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10786599
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Ralph Campbell, John Hubbard, Andrew Morton
Subject: [PATCH 01/10] mm/hmm: use reference counting for HMM struct
Date: Tue, 29 Jan 2019 11:54:19 -0500
Message-Id: <20190129165428.3931-2-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

Every time I read the code to check that the HMM structure does not vanish
before it should, thanks to the many locks protecting its removal, I get a
headache. Switch to reference counting instead: it is much easier to follow
and harder to break. This also removes some code that is no longer needed
with refcounting.
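In short, the object lifetime becomes a plain kref pattern. The fragment below
is an illustrative reduction of what the patch does, not the patch itself (the
real hmm_free() also unregisters the mmu notifier and clears mm->hmm, which is
omitted here):

	static struct hmm *hmm_get(struct mm_struct *mm)
	{
		struct hmm *hmm = READ_ONCE(mm->hmm);

		/* Only take a reference if the struct is still live. */
		if (hmm && kref_get_unless_zero(&hmm->kref))
			return hmm;
		return NULL;
	}

	static void hmm_free(struct kref *kref)
	{
		struct hmm *hmm = container_of(kref, struct hmm, kref);

		kfree(hmm);
	}

	static void hmm_put(struct hmm *hmm)
	{
		/* The last put frees the structure. */
		kref_put(&hmm->kref, hmm_free);
	}

Every code path that used to dereference mm->hmm directly now brackets the
access with hmm_get()/hmm_put() instead of relying on locks to keep the
object alive.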
Signed-off-by: Jérôme Glisse Cc: Ralph Campbell Cc: John Hubbard Cc: Andrew Morton --- include/linux/hmm.h | 2 + mm/hmm.c | 178 +++++++++++++++++++++++++++++--------------- 2 files changed, 120 insertions(+), 60 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 66f9ebbb1df3..bd6e058597a6 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -131,6 +131,7 @@ enum hmm_pfn_value_e { /* * struct hmm_range - track invalidation lock on virtual address range * + * @hmm: the core HMM structure this range is active against * @vma: the vm area struct for the range * @list: all range lock are on a list * @start: range virtual start address (inclusive) @@ -142,6 +143,7 @@ enum hmm_pfn_value_e { * @valid: pfns array did not change since it has been fill by an HMM function */ struct hmm_range { + struct hmm *hmm; struct vm_area_struct *vma; struct list_head list; unsigned long start; diff --git a/mm/hmm.c b/mm/hmm.c index a04e4b810610..b9f384ea15e9 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -50,6 +50,7 @@ static const struct mmu_notifier_ops hmm_mmu_notifier_ops; */ struct hmm { struct mm_struct *mm; + struct kref kref; spinlock_t lock; struct list_head ranges; struct list_head mirrors; @@ -57,6 +58,16 @@ struct hmm { struct rw_semaphore mirrors_sem; }; +static inline struct hmm *hmm_get(struct mm_struct *mm) +{ + struct hmm *hmm = READ_ONCE(mm->hmm); + + if (hmm && kref_get_unless_zero(&hmm->kref)) + return hmm; + + return NULL; +} + /* * hmm_register - register HMM against an mm (HMM internal) * @@ -67,14 +78,9 @@ struct hmm { */ static struct hmm *hmm_register(struct mm_struct *mm) { - struct hmm *hmm = READ_ONCE(mm->hmm); + struct hmm *hmm = hmm_get(mm); bool cleanup = false; - /* - * The hmm struct can only be freed once the mm_struct goes away, - * hence we should always have pre-allocated an new hmm struct - * above. 
- */ if (hmm) return hmm; @@ -86,6 +92,7 @@ static struct hmm *hmm_register(struct mm_struct *mm) hmm->mmu_notifier.ops = NULL; INIT_LIST_HEAD(&hmm->ranges); spin_lock_init(&hmm->lock); + kref_init(&hmm->kref); hmm->mm = mm; spin_lock(&mm->page_table_lock); @@ -106,7 +113,7 @@ static struct hmm *hmm_register(struct mm_struct *mm) if (__mmu_notifier_register(&hmm->mmu_notifier, mm)) goto error_mm; - return mm->hmm; + return hmm; error_mm: spin_lock(&mm->page_table_lock); @@ -118,9 +125,41 @@ static struct hmm *hmm_register(struct mm_struct *mm) return NULL; } +static void hmm_free(struct kref *kref) +{ + struct hmm *hmm = container_of(kref, struct hmm, kref); + struct mm_struct *mm = hmm->mm; + + mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm); + + spin_lock(&mm->page_table_lock); + if (mm->hmm == hmm) + mm->hmm = NULL; + spin_unlock(&mm->page_table_lock); + + kfree(hmm); +} + +static inline void hmm_put(struct hmm *hmm) +{ + kref_put(&hmm->kref, hmm_free); +} + void hmm_mm_destroy(struct mm_struct *mm) { - kfree(mm->hmm); + struct hmm *hmm; + + spin_lock(&mm->page_table_lock); + hmm = hmm_get(mm); + mm->hmm = NULL; + if (hmm) { + hmm->mm = NULL; + spin_unlock(&mm->page_table_lock); + hmm_put(hmm); + return; + } + + spin_unlock(&mm->page_table_lock); } static int hmm_invalidate_range(struct hmm *hmm, bool device, @@ -165,7 +204,7 @@ static int hmm_invalidate_range(struct hmm *hmm, bool device, static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct hmm_mirror *mirror; - struct hmm *hmm = mm->hmm; + struct hmm *hmm = hmm_get(mm); down_write(&hmm->mirrors_sem); mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror, @@ -186,36 +225,50 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) struct hmm_mirror, list); } up_write(&hmm->mirrors_sem); + + hmm_put(hmm); } static int hmm_invalidate_range_start(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { struct hmm_update update; - struct hmm *hmm = range->mm->hmm; + struct hmm *hmm = hmm_get(range->mm); + int ret; VM_BUG_ON(!hmm); + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL) + return 0; + update.start = range->start; update.end = range->end; update.event = HMM_UPDATE_INVALIDATE; update.blockable = range->blockable; - return hmm_invalidate_range(hmm, true, &update); + ret = hmm_invalidate_range(hmm, true, &update); + hmm_put(hmm); + return ret; } static void hmm_invalidate_range_end(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { struct hmm_update update; - struct hmm *hmm = range->mm->hmm; + struct hmm *hmm = hmm_get(range->mm); VM_BUG_ON(!hmm); + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL) + return; + update.start = range->start; update.end = range->end; update.event = HMM_UPDATE_INVALIDATE; update.blockable = true; hmm_invalidate_range(hmm, false, &update); + hmm_put(hmm); } static const struct mmu_notifier_ops hmm_mmu_notifier_ops = { @@ -241,24 +294,13 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm) if (!mm || !mirror || !mirror->ops) return -EINVAL; -again: mirror->hmm = hmm_register(mm); if (!mirror->hmm) return -ENOMEM; down_write(&mirror->hmm->mirrors_sem); - if (mirror->hmm->mm == NULL) { - /* - * A racing hmm_mirror_unregister() is about to destroy the hmm - * struct. Try again to allocate a new one. 
- */ - up_write(&mirror->hmm->mirrors_sem); - mirror->hmm = NULL; - goto again; - } else { - list_add(&mirror->list, &mirror->hmm->mirrors); - up_write(&mirror->hmm->mirrors_sem); - } + list_add(&mirror->list, &mirror->hmm->mirrors); + up_write(&mirror->hmm->mirrors_sem); return 0; } @@ -273,33 +315,18 @@ EXPORT_SYMBOL(hmm_mirror_register); */ void hmm_mirror_unregister(struct hmm_mirror *mirror) { - bool should_unregister = false; - struct mm_struct *mm; - struct hmm *hmm; + struct hmm *hmm = READ_ONCE(mirror->hmm); - if (mirror->hmm == NULL) + if (hmm == NULL) return; - hmm = mirror->hmm; down_write(&hmm->mirrors_sem); list_del_init(&mirror->list); - should_unregister = list_empty(&hmm->mirrors); + /* To protect us against double unregister ... */ mirror->hmm = NULL; - mm = hmm->mm; - hmm->mm = NULL; up_write(&hmm->mirrors_sem); - if (!should_unregister || mm == NULL) - return; - - mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm); - - spin_lock(&mm->page_table_lock); - if (mm->hmm == hmm) - mm->hmm = NULL; - spin_unlock(&mm->page_table_lock); - - kfree(hmm); + hmm_put(hmm); } EXPORT_SYMBOL(hmm_mirror_unregister); @@ -708,6 +735,8 @@ int hmm_vma_get_pfns(struct hmm_range *range) struct mm_walk mm_walk; struct hmm *hmm; + range->hmm = NULL; + /* Sanity check, this really should not happen ! */ if (range->start < vma->vm_start || range->start >= vma->vm_end) return -EINVAL; @@ -717,14 +746,18 @@ int hmm_vma_get_pfns(struct hmm_range *range) hmm = hmm_register(vma->vm_mm); if (!hmm) return -ENOMEM; - /* Caller must have registered a mirror, via hmm_mirror_register() ! */ - if (!hmm->mmu_notifier.ops) + + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL) { + hmm_put(hmm); return -EINVAL; + } /* FIXME support hugetlb fs */ if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) || vma_is_dax(vma)) { hmm_pfns_special(range); + hmm_put(hmm); return -EINVAL; } @@ -736,6 +769,7 @@ int hmm_vma_get_pfns(struct hmm_range *range) * operations such has atomic access would not work. */ hmm_pfns_clear(range, range->pfns, range->start, range->end); + hmm_put(hmm); return -EPERM; } @@ -758,6 +792,12 @@ int hmm_vma_get_pfns(struct hmm_range *range) mm_walk.pte_hole = hmm_vma_walk_hole; walk_page_range(range->start, range->end, &mm_walk); + /* + * Transfer hmm reference to the range struct it will be drop inside + * the hmm_vma_range_done() function (which _must_ be call if this + * function return 0). + */ + range->hmm = hmm; return 0; } EXPORT_SYMBOL(hmm_vma_get_pfns); @@ -802,25 +842,27 @@ EXPORT_SYMBOL(hmm_vma_get_pfns); */ bool hmm_vma_range_done(struct hmm_range *range) { - unsigned long npages = (range->end - range->start) >> PAGE_SHIFT; - struct hmm *hmm; + bool ret = false; - if (range->end <= range->start) { + /* Sanity check this really should not happen. */ + if (range->hmm == NULL || range->end <= range->start) { BUG(); return false; } - hmm = hmm_register(range->vma->vm_mm); - if (!hmm) { - memset(range->pfns, 0, sizeof(*range->pfns) * npages); - return false; - } - - spin_lock(&hmm->lock); + spin_lock(&range->hmm->lock); list_del_rcu(&range->list); - spin_unlock(&hmm->lock); + ret = range->valid; + spin_unlock(&range->hmm->lock); - return range->valid; + /* Is the mm still alive ? 
*/ + if (range->hmm->mm == NULL) + ret = false; + + /* Drop reference taken by hmm_vma_fault() or hmm_vma_get_pfns() */ + hmm_put(range->hmm); + range->hmm = NULL; + return ret; } EXPORT_SYMBOL(hmm_vma_range_done); @@ -880,6 +922,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block) struct hmm *hmm; int ret; + range->hmm = NULL; + /* Sanity check, this really should not happen ! */ if (range->start < vma->vm_start || range->start >= vma->vm_end) return -EINVAL; @@ -891,14 +935,18 @@ int hmm_vma_fault(struct hmm_range *range, bool block) hmm_pfns_clear(range, range->pfns, range->start, range->end); return -ENOMEM; } - /* Caller must have registered a mirror using hmm_mirror_register() */ - if (!hmm->mmu_notifier.ops) + + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL) { + hmm_put(hmm); return -EINVAL; + } /* FIXME support hugetlb fs */ if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) || vma_is_dax(vma)) { hmm_pfns_special(range); + hmm_put(hmm); return -EINVAL; } @@ -910,6 +958,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block) * operations such has atomic access would not work. */ hmm_pfns_clear(range, range->pfns, range->start, range->end); + hmm_put(hmm); return -EPERM; } @@ -945,7 +994,16 @@ int hmm_vma_fault(struct hmm_range *range, bool block) hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last, range->end); hmm_vma_range_done(range); + hmm_put(hmm); + } else { + /* + * Transfer hmm reference to the range struct it will be drop + * inside the hmm_vma_range_done() function (which _must_ be + * call if this function return 0). + */ + range->hmm = hmm; } + return ret; } EXPORT_SYMBOL(hmm_vma_fault); From patchwork Tue Jan 29 16:54:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10786601 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F2A81399 for ; Tue, 29 Jan 2019 16:54:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EFF622CDF2 for ; Tue, 29 Jan 2019 16:54:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E3FEF2D615; Tue, 29 Jan 2019 16:54:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7C3782CDF2 for ; Tue, 29 Jan 2019 16:54:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6EC118E0004; Tue, 29 Jan 2019 11:54:42 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 699808E0001; Tue, 29 Jan 2019 11:54:42 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 53A7D8E0004; Tue, 29 Jan 2019 11:54:42 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by kanga.kvack.org (Postfix) with ESMTP id 2306B8E0001 for ; Tue, 29 Jan 2019 11:54:42 -0500 (EST) Received: by mail-qk1-f199.google.com with SMTP id 
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Ralph Campbell, John Hubbard
Subject: [PATCH 02/10] mm/hmm: do not erase snapshot when a range is invalidated
Date: Tue, 29 Jan 2019 11:54:20 -0500
Message-Id: <20190129165428.3931-3-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

Users of HMM might be using the snapshot information to do preparatory steps,
like DMA mapping pages to a device, before checking for invalidation through
hmm_vma_range_done(), so do not erase that information; assume users will do
the right thing.
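The driver-side pattern this preserves looks roughly like the sketch below;
the mydevice_*() helpers are made-up placeholders for whatever the driver does
with the snapshot, only hmm_vma_get_pfns() and hmm_vma_range_done() are real
HMM calls:

	down_read(&mm->mmap_sem);
	ret = hmm_vma_get_pfns(&range);
	if (ret) {
		up_read(&mm->mmap_sem);
		return ret;
	}

	/* Preparatory work based on the snapshot, e.g. DMA map the pages. */
	mydevice_dma_map(mydevice, range.pfns, npages);

	/* Same lock used by the mirror sync_cpu_device_pagetables() callback. */
	mydevice_page_table_lock(mydevice);
	if (!hmm_vma_range_done(&range)) {
		/* Range was invalidated: undo the preparatory work and retry. */
		mydevice_page_table_unlock(mydevice);
		mydevice_dma_unmap(mydevice, range.pfns, npages);
		goto retry;
	}
	mydevice_populate_page_table(mydevice, &range);
	mydevice_page_table_unlock(mydevice);
	up_read(&mm->mmap_sem);

Before this patch, a racing invalidation would zero the affected pfn entries
while the driver might still be consulting them for that preparatory work;
after it, only range->valid is cleared and the snapshot contents are kept.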
Signed-off-by: Jérôme Glisse
Cc: Andrew Morton
Cc: Ralph Campbell
Cc: John Hubbard
---
 mm/hmm.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index b9f384ea15e9..74d69812d6be 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -170,16 +170,10 @@ static int hmm_invalidate_range(struct hmm *hmm, bool device,
 
 	spin_lock(&hmm->lock);
 	list_for_each_entry(range, &hmm->ranges, list) {
-		unsigned long addr, idx, npages;
-
 		if (update->end < range->start || update->start >= range->end)
 			continue;
 
 		range->valid = false;
-		addr = max(update->start, range->start);
-		idx = (addr - range->start) >> PAGE_SHIFT;
-		npages = (min(range->end, update->end) - addr) >> PAGE_SHIFT;
-		memset(&range->pfns[idx], 0, sizeof(*range->pfns) * npages);
 	}
 	spin_unlock(&hmm->lock);
 

From patchwork Tue Jan 29 16:54:21 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10786603
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Ralph Campbell, John Hubbard
Subject: [PATCH 03/10] mm/hmm: improve and rename hmm_vma_get_pfns() to hmm_range_snapshot()
Date: Tue, 29 Jan 2019 11:54:21 -0500
Message-Id: <20190129165428.3931-4-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

Rename for consistency between code, comments and documentation. Also improve
the comments on all the possible return values, and improve the function by
returning the number of populated entries in the pfns array.

Signed-off-by: Jérôme Glisse
Cc: Andrew Morton
Cc: Ralph Campbell
Cc: John Hubbard
--- include/linux/hmm.h | 4 ++-- mm/hmm.c | 23 ++++++++++------------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index bd6e058597a6..ddf49c1b1f5e 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -365,11 +365,11 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * table invalidation serializes on it. * * YOU MUST CALL hmm_vma_range_done() ONCE AND ONLY ONCE EACH TIME YOU CALL - * hmm_vma_get_pfns() WITHOUT ERROR ! + * hmm_range_snapshot() WITHOUT ERROR ! * * IF YOU DO NOT FOLLOW THE ABOVE RULE THE SNAPSHOT CONTENT MIGHT BE INVALID !
*/ -int hmm_vma_get_pfns(struct hmm_range *range); +long hmm_range_snapshot(struct hmm_range *range); bool hmm_vma_range_done(struct hmm_range *range); diff --git a/mm/hmm.c b/mm/hmm.c index 74d69812d6be..0d9ecd3337e5 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -706,23 +706,19 @@ static void hmm_pfns_special(struct hmm_range *range) } /* - * hmm_vma_get_pfns() - snapshot CPU page table for a range of virtual addresses - * @range: range being snapshotted + * hmm_range_snapshot() - snapshot CPU page table for a range + * @range: range * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid - * vma permission, 0 success + * permission (for instance asking for write and range is read only), + * -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid + * vma or it is illegal to access that range), number of valid pages + * in range->pfns[] (from range start address). * * This snapshots the CPU page table for a range of virtual addresses. Snapshot * validity is tracked by range struct. See hmm_vma_range_done() for further * information. - * - * The range struct is initialized here. It tracks the CPU page table, but only - * if the function returns success (0), in which case the caller must then call - * hmm_vma_range_done() to stop CPU page table update tracking on this range. - * - * NOT CALLING hmm_vma_range_done() IF FUNCTION RETURNS 0 WILL LEAD TO SERIOUS - * MEMORY CORRUPTION ! YOU HAVE BEEN WARNED ! */ -int hmm_vma_get_pfns(struct hmm_range *range) +long hmm_range_snapshot(struct hmm_range *range) { struct vm_area_struct *vma = range->vma; struct hmm_vma_walk hmm_vma_walk; @@ -776,6 +772,7 @@ int hmm_vma_get_pfns(struct hmm_range *range) hmm_vma_walk.fault = false; hmm_vma_walk.range = range; mm_walk.private = &hmm_vma_walk; + hmm_vma_walk.last = range->start; mm_walk.vma = vma; mm_walk.mm = vma->vm_mm; @@ -792,9 +789,9 @@ int hmm_vma_get_pfns(struct hmm_range *range) * function return 0). 
*/ range->hmm = hmm; - return 0; + return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } -EXPORT_SYMBOL(hmm_vma_get_pfns); +EXPORT_SYMBOL(hmm_range_snapshot); /* * hmm_vma_range_done() - stop tracking change to CPU page table over a range

From patchwork Tue Jan 29 16:54:22 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10786605
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Ralph Campbell, John Hubbard
Subject: [PATCH 04/10] mm/hmm: improve and rename hmm_vma_fault() to hmm_range_fault()
Date: Tue, 29 Jan 2019 11:54:22 -0500
Message-Id: <20190129165428.3931-5-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

Rename for consistency between code, comments and documentation. Also improve
the comments on all the possible return values, and improve the function by
returning the number of populated entries in the pfns array.
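With the long return type, a caller can tell partial progress apart from
outright failure. A caller-side sketch (the retry policy here is illustrative,
not part of the patch):

	long ret;
	unsigned long npages = (range.end - range.start) >> PAGE_SHIFT;

	ret = hmm_range_fault(&range, true);
	if (ret < 0)
		return ret; /* -EINVAL, -ENOMEM, -EPERM, -EAGAIN, -EBUSY or -EFAULT */
	if (ret < npages) {
		/* Only the first ret pages were populated; fault the rest or give up. */
	}

hmm_vma_fault() is kept as a temporary wrapper that maps the new return
convention back to the old one, so existing callers keep working until the
trees that use them are converted.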
Signed-off-by: Jérôme Glisse Cc: Andrew Morton Cc: Ralph Campbell Cc: John Hubbard --- include/linux/hmm.h | 13 ++++++- mm/hmm.c | 93 ++++++++++++++++++++------------------------- 2 files changed, 53 insertions(+), 53 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index ddf49c1b1f5e..ccf2b630447e 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -391,7 +391,18 @@ bool hmm_vma_range_done(struct hmm_range *range); * * See the function description in mm/hmm.c for further documentation. */ -int hmm_vma_fault(struct hmm_range *range, bool block); +long hmm_range_fault(struct hmm_range *range, bool block); + +/* This is a temporary helper to avoid merge conflict between trees. */ +static inline int hmm_vma_fault(struct hmm_range *range, bool block) +{ + long ret = hmm_range_fault(range, block); + if (ret == -EBUSY) + ret = -EAGAIN; + else if (ret == -EAGAIN) + ret = -EBUSY; + return ret < 0 ? ret : 0; +} /* Below are for HMM internal use only! Not to be used by device driver! */ void hmm_mm_destroy(struct mm_struct *mm); diff --git a/mm/hmm.c b/mm/hmm.c index 0d9ecd3337e5..04235455b4d2 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -344,13 +344,13 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr, flags |= write_fault ? FAULT_FLAG_WRITE : 0; ret = handle_mm_fault(vma, addr, flags); if (ret & VM_FAULT_RETRY) - return -EBUSY; + return -EAGAIN; if (ret & VM_FAULT_ERROR) { *pfn = range->values[HMM_PFN_ERROR]; return -EFAULT; } - return -EAGAIN; + return -EBUSY; } static int hmm_pfns_bad(unsigned long addr, @@ -376,7 +376,7 @@ static int hmm_pfns_bad(unsigned long addr, * @fault: should we fault or not ? * @write_fault: write fault ? * @walk: mm_walk structure - * Returns: 0 on success, -EAGAIN after page fault, or page fault error + * Returns: 0 on success, -EBUSY after page fault, or page fault error * * This function will be called whenever pmd_none() or pte_none() returns true, * or whenever there is no page directory covering the virtual address range. @@ -399,12 +399,12 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end, ret = hmm_vma_do_fault(walk, addr, write_fault, &pfns[i]); - if (ret != -EAGAIN) + if (ret != -EBUSY) return ret; } } - return (fault || write_fault) ? -EAGAIN : 0; + return (fault || write_fault) ? 
-EBUSY : 0; } static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, @@ -535,11 +535,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, uint64_t orig_pfn = *pfn; *pfn = range->values[HMM_PFN_NONE]; - cpu_flags = pte_to_hmm_pfn_flags(range, pte); - hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, - &fault, &write_fault); + fault = write_fault = false; if (pte_none(pte)) { + hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, + &fault, &write_fault); if (fault || write_fault) goto fault; return 0; @@ -578,7 +578,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, hmm_vma_walk->last = addr; migration_entry_wait(vma->vm_mm, pmdp, addr); - return -EAGAIN; + return -EBUSY; } return 0; } @@ -586,6 +586,10 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, /* Report error for everything else */ *pfn = range->values[HMM_PFN_ERROR]; return -EFAULT; + } else { + cpu_flags = pte_to_hmm_pfn_flags(range, pte); + hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, + &fault, &write_fault); } if (fault || write_fault) @@ -636,7 +640,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, if (fault || write_fault) { hmm_vma_walk->last = addr; pmd_migration_entry_wait(vma->vm_mm, pmdp); - return -EAGAIN; + return -EBUSY; } return 0; } else if (!pmd_present(pmd)) @@ -858,53 +862,36 @@ bool hmm_vma_range_done(struct hmm_range *range) EXPORT_SYMBOL(hmm_vma_range_done); /* - * hmm_vma_fault() - try to fault some address in a virtual address range + * hmm_range_fault() - try to fault some address in a virtual address range * @range: range being faulted * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem) - * Returns: 0 success, error otherwise (-EAGAIN means mmap_sem have been drop) + * Returns: 0 on success ortherwise: + * -EINVAL: + * Invalid argument + * -ENOMEM: + * Out of memory. + * -EPERM: + * Invalid permission (for instance asking for write and range + * is read only). + * -EAGAIN: + * If you need to retry and mmap_sem was drop. This can only + * happens if block argument is false. + * -EBUSY: + * If the the range is being invalidated and you should wait for + * invalidation to finish. + * -EFAULT: + * Invalid (ie either no valid vma or it is illegal to access that + * range), number of valid pages in range->pfns[] (from range start + * address). * * This is similar to a regular CPU page fault except that it will not trigger - * any memory migration if the memory being faulted is not accessible by CPUs. + * any memory migration if the memory being faulted is not accessible by CPUs + * and caller does not ask for migration. * * On error, for one virtual address in the range, the function will mark the * corresponding HMM pfn entry with an error flag. - * - * Expected use pattern: - * retry: - * down_read(&mm->mmap_sem); - * // Find vma and address device wants to fault, initialize hmm_pfn_t - * // array accordingly - * ret = hmm_vma_fault(range, write, block); - * switch (ret) { - * case -EAGAIN: - * hmm_vma_range_done(range); - * // You might want to rate limit or yield to play nicely, you may - * // also commit any valid pfn in the array assuming that you are - * // getting true from hmm_vma_range_monitor_end() - * goto retry; - * case 0: - * break; - * case -ENOMEM: - * case -EINVAL: - * case -EPERM: - * default: - * // Handle error ! 
- up_read(&mm->mmap_sem) - return; - } - // Take device driver lock that serialize device page table update - driver_lock_device_page_table_update(); - hmm_vma_range_done(range); - // Commit pfns we got from hmm_vma_fault() - driver_unlock_device_page_table_update(); - up_read(&mm->mmap_sem) - - * YOU MUST CALL hmm_vma_range_done() AFTER THIS FUNCTION RETURN SUCCESS (0) - * BEFORE FREEING THE range struct OR YOU WILL HAVE SERIOUS MEMORY CORRUPTION ! - * - * YOU HAVE BEEN WARNED ! */ -int hmm_vma_fault(struct hmm_range *range, bool block) +long hmm_range_fault(struct hmm_range *range, bool block) { struct vm_area_struct *vma = range->vma; unsigned long start = range->start; @@ -976,7 +963,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block) do { ret = walk_page_range(start, range->end, &mm_walk); start = hmm_vma_walk.last; - } while (ret == -EAGAIN); + /* Keep trying while the range is valid. */ + } while (ret == -EBUSY && range->valid); if (ret) { unsigned long i; @@ -986,6 +974,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block) range->end); hmm_vma_range_done(range); hmm_put(hmm); + return ret; } else { /* * Transfer hmm reference to the range struct it will be drop @@ -995,9 +984,9 @@ int hmm_vma_fault(struct hmm_range *range, bool block) range->hmm = hmm; } - return ret; + return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } -EXPORT_SYMBOL(hmm_vma_fault); +EXPORT_SYMBOL(hmm_range_fault); #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */

From patchwork Tue Jan 29 16:54:23 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10786607
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Ralph Campbell, John Hubbard
Subject: [PATCH 05/10] mm/hmm: improve driver API to work and wait over a range
Date: Tue, 29 Jan 2019 11:54:23 -0500
Message-Id: <20190129165428.3931-6-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

A common use case for an HMM mirror is a user trying to mirror a range that
gets invalidated by some core mm event before the user can program the
hardware. Instead of having the user retry right away, provide a completion
mechanism so they can wait for any active invalidation affecting the range.

This also changes how hmm_range_snapshot() and hmm_range_fault() work by not
relying on the vma, so that we can drop the mmap_sem while waiting and look up
the vma again on retry.

Signed-off-by: Jérôme Glisse
Cc: Andrew Morton
Cc: Ralph Campbell
Cc: John Hubbard
--- include/linux/hmm.h | 208 +++++++++++++++--- mm/hmm.c | 526 +++++++++++++++++++++----------------- 2 files changed, 430 insertions(+), 304 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index ccf2b630447e..93dc88edc293 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -77,8 +77,34 @@ #include #include #include +#include -struct hmm; + +/* + * struct hmm - HMM per mm struct + * + * @mm: mm struct this HMM struct is bound to + * @lock: lock protecting ranges list + * @ranges: list of range being snapshotted + * @mirrors: list of mirrors for this mm + * @mmu_notifier: mmu notifier to track updates to CPU page table + * @mirrors_sem: read/write semaphore protecting the mirrors list + * @wq: wait queue for user waiting on a range invalidation + * @notifiers: count of active mmu notifiers + * @dead: is the mm dead ?
+ */ +struct hmm { + struct mm_struct *mm; + struct kref kref; + struct mutex lock; + struct list_head ranges; + struct list_head mirrors; + struct mmu_notifier mmu_notifier; + struct rw_semaphore mirrors_sem; + wait_queue_head_t wq; + long notifiers; + bool dead; +}; /* * hmm_pfn_flag_e - HMM flag enums @@ -155,6 +181,38 @@ struct hmm_range { bool valid; }; +/* + * hmm_range_wait_until_valid() - wait for range to be valid + * @range: range affected by invalidation to wait on + * @timeout: time out for wait in ms (ie abort wait after that period of time) + * Returns: true if the range is valid, false otherwise. + */ +static inline bool hmm_range_wait_until_valid(struct hmm_range *range, + unsigned long timeout) +{ + /* Check if mm is dead ? */ + if (range->hmm == NULL || range->hmm->dead || range->hmm->mm == NULL) { + range->valid = false; + return false; + } + if (range->valid) + return true; + wait_event_timeout(range->hmm->wq, range->valid || range->hmm->dead, + msecs_to_jiffies(timeout)); + /* Return current valid status just in case we get lucky */ + return range->valid; +} + +/* + * hmm_range_valid() - test if a range is valid or not + * @range: range + * Returns: true if the range is valid, false otherwise. + */ +static inline bool hmm_range_valid(struct hmm_range *range) +{ + return range->valid; +} + /* * hmm_pfn_to_page() - return struct page pointed to by a valid HMM pfn * @range: range use to decode HMM pfn value @@ -357,51 +415,133 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); /* - * To snapshot the CPU page table, call hmm_vma_get_pfns(), then take a device - * driver lock that serializes device page table updates, then call - * hmm_vma_range_done(), to check if the snapshot is still valid. The same - * device driver page table update lock must also be used in the - * hmm_mirror_ops.sync_cpu_device_pagetables() callback, so that CPU page - * table invalidation serializes on it. + * To snapshot the CPU page table you first have to call hmm_range_register() + * to register the range. If hmm_range_register() return an error then some- + * thing is horribly wrong and you should fail loudly. If it returned true then + * you can wait for the range to be stable with hmm_range_wait_until_valid() + * function, a range is valid when there are no concurrent changes to the CPU + * page table for the range. + * + * Once the range is valid you can call hmm_range_snapshot() if that returns + * without error then you can take your device page table lock (the same lock + * you use in the HMM mirror sync_cpu_device_pagetables() callback). After + * taking that lock you have to check the range validity, if it is still valid + * (ie hmm_range_valid() returns true) then you can program the device page + * table, otherwise you have to start again. Pseudo code: + * + * mydevice_prefault(mydevice, mm, start, end) + * { + * struct hmm_range range; + * ... * - * YOU MUST CALL hmm_vma_range_done() ONCE AND ONLY ONCE EACH TIME YOU CALL - * hmm_range_snapshot() WITHOUT ERROR ! + * ret = hmm_range_register(&range, mm, start, end); + * if (ret) + * return ret; * - * IF YOU DO NOT FOLLOW THE ABOVE RULE THE SNAPSHOT CONTENT MIGHT BE INVALID ! - */ -long hmm_range_snapshot(struct hmm_range *range); -bool hmm_vma_range_done(struct hmm_range *range); - - -/* - * Fault memory on behalf of device driver. Unlike handle_mm_fault(), this will - * not migrate any device memory back to system memory. 
The HMM pfn array will - * be updated with the fault result and current snapshot of the CPU page table - * for the range. + * down_read(mm->mmap_sem); + * again: + * + * if (!hmm_range_wait_until_valid(&range, TIMEOUT)) { + * up_read(&mm->mmap_sem); + * hmm_range_unregister(range); + * // Handle time out, either sleep or retry or something else + * ... + * return -ESOMETHING; || goto again; + * } + * + * ret = hmm_range_snapshot(&range); or hmm_range_fault(&range); + * if (ret == -EAGAIN) { + * down_read(mm->mmap_sem); + * goto again; + * } else if (ret == -EBUSY) { + * goto again; + * } + * + * up_read(&mm->mmap_sem); + * if (ret) { + * hmm_range_unregister(range); + * return ret; + * } + * + * // It might not have snap-shoted the whole range but only the first + * // npages, the return values is the number of valid pages from the + * // start of the range. + * npages = ret; * - * The mmap_sem must be taken in read mode before entering and it might be - * dropped by the function if the block argument is false. In that case, the - * function returns -EAGAIN. + * ... * - * Return value does not reflect if the fault was successful for every single - * address or not. Therefore, the caller must to inspect the HMM pfn array to - * determine fault status for each address. + * mydevice_page_table_lock(mydevice); + * if (!hmm_range_valid(range)) { + * mydevice_page_table_unlock(mydevice); + * goto again; + * } * - * Trying to fault inside an invalid vma will result in -EINVAL. + * mydevice_populate_page_table(mydevice, range, npages); + * ... + * mydevice_take_page_table_unlock(mydevice); + * hmm_range_unregister(range); * - * See the function description in mm/hmm.c for further documentation. + * return 0; + * } + * + * The same scheme apply to hmm_range_fault() (ie replace hmm_range_snapshot() + * with hmm_range_fault() in above pseudo code). + * + * YOU MUST CALL hmm_range_unregister() ONCE AND ONLY ONCE EACH TIME YOU CALL + * hmm_range_register() AND hmm_range_register() RETURNED TRUE ! IF YOU DO NOT + * FOLLOW THIS RULE MEMORY CORRUPTION WILL ENSUE ! */ +int hmm_range_register(struct hmm_range *range, + struct mm_struct *mm, + unsigned long start, + unsigned long end); +void hmm_range_unregister(struct hmm_range *range); +long hmm_range_snapshot(struct hmm_range *range); long hmm_range_fault(struct hmm_range *range, bool block); +/* + * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range + * + * When waiting for mmu notifiers we need some kind of time out otherwise we + * could potentialy wait for ever, 1000ms ie 1s sounds like a long time to + * wait already. + */ +#define HMM_RANGE_DEFAULT_TIMEOUT 1000 + /* This is a temporary helper to avoid merge conflict between trees. */ +static inline bool hmm_vma_range_done(struct hmm_range *range) +{ + bool ret = hmm_range_valid(range); + + hmm_range_unregister(range); + return ret; +} + static inline int hmm_vma_fault(struct hmm_range *range, bool block) { - long ret = hmm_range_fault(range, block); - if (ret == -EBUSY) - ret = -EAGAIN; - else if (ret == -EAGAIN) - ret = -EBUSY; - return ret < 0 ? 
ret : 0; + long ret; + + ret = hmm_range_register(range, range->vma->vm_mm, + range->start, range->end); + if (ret) + return (int)ret; + + if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { + up_read(&range->vma->vm_mm->mmap_sem); + return -EAGAIN; + } + + ret = hmm_range_fault(range, block); + if (ret <= 0) { + if (ret == -EBUSY || !ret) { + up_read(&range->vma->vm_mm->mmap_sem); + ret = -EBUSY; + } else if (ret == -EAGAIN) + ret = -EBUSY; + hmm_range_unregister(range); + return ret; + } + return 0; } /* Below are for HMM internal use only! Not to be used by device driver! */ diff --git a/mm/hmm.c b/mm/hmm.c index 04235455b4d2..860ebe5d4b07 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -38,26 +38,6 @@ #if IS_ENABLED(CONFIG_HMM_MIRROR) static const struct mmu_notifier_ops hmm_mmu_notifier_ops; -/* - * struct hmm - HMM per mm struct - * - * @mm: mm struct this HMM struct is bound to - * @lock: lock protecting ranges list - * @ranges: list of range being snapshotted - * @mirrors: list of mirrors for this mm - * @mmu_notifier: mmu notifier to track updates to CPU page table - * @mirrors_sem: read/write semaphore protecting the mirrors list - */ -struct hmm { - struct mm_struct *mm; - struct kref kref; - spinlock_t lock; - struct list_head ranges; - struct list_head mirrors; - struct mmu_notifier mmu_notifier; - struct rw_semaphore mirrors_sem; -}; - static inline struct hmm *hmm_get(struct mm_struct *mm) { struct hmm *hmm = READ_ONCE(mm->hmm); @@ -87,12 +67,15 @@ static struct hmm *hmm_register(struct mm_struct *mm) hmm = kmalloc(sizeof(*hmm), GFP_KERNEL); if (!hmm) return NULL; + init_waitqueue_head(&hmm->wq); INIT_LIST_HEAD(&hmm->mirrors); init_rwsem(&hmm->mirrors_sem); hmm->mmu_notifier.ops = NULL; INIT_LIST_HEAD(&hmm->ranges); - spin_lock_init(&hmm->lock); + mutex_init(&hmm->lock); kref_init(&hmm->kref); + hmm->notifiers = 0; + hmm->dead = false; hmm->mm = mm; spin_lock(&mm->page_table_lock); @@ -154,6 +137,7 @@ void hmm_mm_destroy(struct mm_struct *mm) mm->hmm = NULL; if (hmm) { hmm->mm = NULL; + hmm->dead = true; spin_unlock(&mm->page_table_lock); hmm_put(hmm); return; @@ -162,43 +146,22 @@ void hmm_mm_destroy(struct mm_struct *mm) spin_unlock(&mm->page_table_lock); } -static int hmm_invalidate_range(struct hmm *hmm, bool device, - const struct hmm_update *update) +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) { + struct hmm *hmm = hmm_get(mm); struct hmm_mirror *mirror; struct hmm_range *range; - spin_lock(&hmm->lock); - list_for_each_entry(range, &hmm->ranges, list) { - if (update->end < range->start || update->start >= range->end) - continue; + /* Report this HMM as dying. */ + hmm->dead = true; + /* Wake-up everyone waiting on any range. 
*/ + mutex_lock(&hmm->lock); + list_for_each_entry(range, &hmm->ranges, list) { range->valid = false; } - spin_unlock(&hmm->lock); - - if (!device) - return 0; - - down_read(&hmm->mirrors_sem); - list_for_each_entry(mirror, &hmm->mirrors, list) { - int ret; - - ret = mirror->ops->sync_cpu_device_pagetables(mirror, update); - if (!update->blockable && ret == -EAGAIN) { - up_read(&hmm->mirrors_sem); - return -EAGAIN; - } - } - up_read(&hmm->mirrors_sem); - - return 0; -} - -static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) -{ - struct hmm_mirror *mirror; - struct hmm *hmm = hmm_get(mm); + wake_up_all(&hmm->wq); + mutex_unlock(&hmm->lock); down_write(&hmm->mirrors_sem); mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror, @@ -224,44 +187,88 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) } static int hmm_invalidate_range_start(struct mmu_notifier *mn, - const struct mmu_notifier_range *range) + const struct mmu_notifier_range *nrange) { + struct hmm *hmm = hmm_get(nrange->mm); + struct hmm_mirror *mirror; struct hmm_update update; - struct hmm *hmm = hmm_get(range->mm); - int ret; + struct hmm_range *range; + int ret = 0; VM_BUG_ON(!hmm); /* Check if hmm_mm_destroy() was call. */ if (hmm->mm == NULL) - return 0; + goto out; - update.start = range->start; - update.end = range->end; + update.start = nrange->start; + update.end = nrange->end; update.event = HMM_UPDATE_INVALIDATE; - update.blockable = range->blockable; - ret = hmm_invalidate_range(hmm, true, &update); + update.blockable = nrange->blockable; + + if (!nrange->blockable && !mutex_trylock(&hmm->lock)) { + ret = -EAGAIN; + goto out; + } else + mutex_lock(&hmm->lock); + hmm->notifiers++; + list_for_each_entry(range, &hmm->ranges, list) { + if (update.end < range->start || update.start >= range->end) + continue; + + range->valid = false; + } + mutex_unlock(&hmm->lock); + + + if (!nrange->blockable && !down_read_trylock(&hmm->mirrors_sem)) { + ret = -EAGAIN; + goto out; + } else + down_read(&hmm->mirrors_sem); + list_for_each_entry(mirror, &hmm->mirrors, list) { + int ret; + + ret = mirror->ops->sync_cpu_device_pagetables(mirror, &update); + if (!update.blockable && ret == -EAGAIN) { + up_read(&hmm->mirrors_sem); + ret = -EAGAIN; + goto out; + } + } + up_read(&hmm->mirrors_sem); + +out: hmm_put(hmm); return ret; } static void hmm_invalidate_range_end(struct mmu_notifier *mn, - const struct mmu_notifier_range *range) + const struct mmu_notifier_range *nrange) { - struct hmm_update update; - struct hmm *hmm = hmm_get(range->mm); + struct hmm *hmm = hmm_get(nrange->mm); VM_BUG_ON(!hmm); /* Check if hmm_mm_destroy() was call. 
*/ if (hmm->mm == NULL) - return; + goto out; - update.start = range->start; - update.end = range->end; - update.event = HMM_UPDATE_INVALIDATE; - update.blockable = true; - hmm_invalidate_range(hmm, false, &update); + mutex_lock(&hmm->lock); + hmm->notifiers--; + if (!hmm->notifiers) { + struct hmm_range *range; + + list_for_each_entry(range, &hmm->ranges, list) { + if (range->valid) + continue; + range->valid = true; + } + wake_up_all(&hmm->wq); + } + mutex_unlock(&hmm->lock); + +out: hmm_put(hmm); } @@ -413,7 +420,6 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, { struct hmm_range *range = hmm_vma_walk->range; - *fault = *write_fault = false; if (!hmm_vma_walk->fault) return; @@ -452,10 +458,11 @@ static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk, return; } + *fault = *write_fault = false; for (i = 0; i < npages; ++i) { hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags, fault, write_fault); - if ((*fault) || (*write_fault)) + if ((*write_fault)) return; } } @@ -710,156 +717,152 @@ static void hmm_pfns_special(struct hmm_range *range) } /* - * hmm_range_snapshot() - snapshot CPU page table for a range + * hmm_range_register() - start tracking change to CPU page table over a range * @range: range - * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid - * permission (for instance asking for write and range is read only), - * -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid - * vma or it is illegal to access that range), number of valid pages - * in range->pfns[] (from range start address). + * @mm: the mm struct for the range of virtual address + * @start: start virtual address (inclusive) + * @end: end virtual address (exclusive) + * Returns 0 on success, -EFAULT if the address space is no longer valid * - * This snapshots the CPU page table for a range of virtual addresses. Snapshot - * validity is tracked by range struct. See hmm_vma_range_done() for further - * information. + * Track updates to the CPU page table see include/linux/hmm.h */ -long hmm_range_snapshot(struct hmm_range *range) +int hmm_range_register(struct hmm_range *range, + struct mm_struct *mm, + unsigned long start, + unsigned long end) { - struct vm_area_struct *vma = range->vma; - struct hmm_vma_walk hmm_vma_walk; - struct mm_walk mm_walk; - struct hmm *hmm; - + range->start = start & PAGE_MASK; + range->end = end & PAGE_MASK; + range->valid = false; range->hmm = NULL; - /* Sanity check, this really should not happen ! */ - if (range->start < vma->vm_start || range->start >= vma->vm_end) - return -EINVAL; - if (range->end < vma->vm_start || range->end > vma->vm_end) + if (range->start >= range->end) return -EINVAL; - hmm = hmm_register(vma->vm_mm); - if (!hmm) - return -ENOMEM; + range->hmm = hmm_register(mm); + if (!range->hmm) + return -EFAULT; /* Check if hmm_mm_destroy() was call. */ - if (hmm->mm == NULL) { - hmm_put(hmm); - return -EINVAL; + if (range->hmm->mm == NULL || range->hmm->dead) { + hmm_put(range->hmm); + return -EFAULT; } - /* FIXME support hugetlb fs */ - if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) || - vma_is_dax(vma)) { - hmm_pfns_special(range); - hmm_put(hmm); - return -EINVAL; - } + /* Initialize range to track CPU page table update */ + mutex_lock(&range->hmm->lock); - if (!(vma->vm_flags & VM_READ)) { - /* - * If vma do not allow read access, then assume that it does - * not allow write access, either. 
Architecture that allow - * write without read access are not supported by HMM, because - * operations such has atomic access would not work. - */ - hmm_pfns_clear(range, range->pfns, range->start, range->end); - hmm_put(hmm); - return -EPERM; - } + list_add_rcu(&range->list, &range->hmm->ranges); - /* Initialize range to track CPU page table update */ - spin_lock(&hmm->lock); - range->valid = true; - list_add_rcu(&range->list, &hmm->ranges); - spin_unlock(&hmm->lock); - - hmm_vma_walk.fault = false; - hmm_vma_walk.range = range; - mm_walk.private = &hmm_vma_walk; - hmm_vma_walk.last = range->start; - - mm_walk.vma = vma; - mm_walk.mm = vma->vm_mm; - mm_walk.pte_entry = NULL; - mm_walk.test_walk = NULL; - mm_walk.hugetlb_entry = NULL; - mm_walk.pmd_entry = hmm_vma_walk_pmd; - mm_walk.pte_hole = hmm_vma_walk_hole; - - walk_page_range(range->start, range->end, &mm_walk); /* - * Transfer hmm reference to the range struct it will be drop inside - * the hmm_vma_range_done() function (which _must_ be call if this - * function return 0). + * If there are any concurrent notifiers we have to wait for them for + * the range to be valid (see hmm_range_wait_until_valid()). */ - range->hmm = hmm; - return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; + if (!range->hmm->notifiers) + range->valid = true; + mutex_unlock(&range->hmm->lock); + + return 0; } -EXPORT_SYMBOL(hmm_range_snapshot); +EXPORT_SYMBOL(hmm_range_register); /* - * hmm_vma_range_done() - stop tracking change to CPU page table over a range - * @range: range being tracked - * Returns: false if range data has been invalidated, true otherwise + * hmm_range_unregister() - stop tracking change to CPU page table over a range + * @range: range * * Range struct is used to track updates to the CPU page table after a call to - * either hmm_vma_get_pfns() or hmm_vma_fault(). Once the device driver is done - * using the data, or wants to lock updates to the data it got from those - * functions, it must call the hmm_vma_range_done() function, which will then - * stop tracking CPU page table updates. - * - * Note that device driver must still implement general CPU page table update - * tracking either by using hmm_mirror (see hmm_mirror_register()) or by using - * the mmu_notifier API directly. - * - * CPU page table update tracking done through hmm_range is only temporary and - * to be used while trying to duplicate CPU page table contents for a range of - * virtual addresses. - * - * There are two ways to use this : - * again: - * hmm_vma_get_pfns(range); or hmm_vma_fault(...); - * trans = device_build_page_table_update_transaction(pfns); - * device_page_table_lock(); - * if (!hmm_vma_range_done(range)) { - * device_page_table_unlock(); - * goto again; - * } - * device_commit_transaction(trans); - * device_page_table_unlock(); - * - * Or: - * hmm_vma_get_pfns(range); or hmm_vma_fault(...); - * device_page_table_lock(); - * hmm_vma_range_done(range); - * device_update_page_table(range->pfns); - * device_page_table_unlock(); + * hmm_range_register(). See include/linux/hmm.h for how to use it. */ -bool hmm_vma_range_done(struct hmm_range *range) +void hmm_range_unregister(struct hmm_range *range) { - bool ret = false; - /* Sanity check this really should not happen. 
*/ - if (range->hmm == NULL || range->end <= range->start) { - BUG(); - return false; - } + if (range->hmm == NULL || range->end <= range->start) + return; - spin_lock(&range->hmm->lock); + mutex_lock(&range->hmm->lock); list_del_rcu(&range->list); - ret = range->valid; - spin_unlock(&range->hmm->lock); - - /* Is the mm still alive ? */ - if (range->hmm->mm == NULL) - ret = false; + mutex_unlock(&range->hmm->lock); - /* Drop reference taken by hmm_vma_fault() or hmm_vma_get_pfns() */ + /* Drop reference taken by hmm_range_register() */ + range->valid = false; hmm_put(range->hmm); range->hmm = NULL; - return ret; } -EXPORT_SYMBOL(hmm_vma_range_done); +EXPORT_SYMBOL(hmm_range_unregister); + +/* + * hmm_range_snapshot() - snapshot CPU page table for a range + * @range: range + * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid + * permission (for instance asking for write and range is read only), + * -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid + * vma or it is illegal to access that range), number of valid pages + * in range->pfns[] (from range start address). + * + * This snapshots the CPU page table for a range of virtual addresses. Snapshot + * validity is tracked by range struct. See in include/linux/hmm.h for example + * on how to use. + */ +long hmm_range_snapshot(struct hmm_range *range) +{ + unsigned long start = range->start, end; + struct hmm_vma_walk hmm_vma_walk; + struct hmm *hmm = range->hmm; + struct vm_area_struct *vma; + struct mm_walk mm_walk; + + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL || hmm->dead) + return -EFAULT; + + do { + /* If range is no longer valid force retry. */ + if (!range->valid) + return -EAGAIN; + + vma = find_vma(hmm->mm, start); + if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + return -EFAULT; + + /* FIXME support hugetlb fs/dax */ + if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + hmm_pfns_special(range); + return -EINVAL; + } + + if (!(vma->vm_flags & VM_READ)) { + /* + * If vma do not allow read access, then assume that it + * does not allow write access, either. HMM does not + * support architecture that allow write without read. + */ + hmm_pfns_clear(range, range->pfns, + range->start, range->end); + return -EPERM; + } + + range->vma = vma; + hmm_vma_walk.last = start; + hmm_vma_walk.fault = false; + hmm_vma_walk.range = range; + mm_walk.private = &hmm_vma_walk; + end = min(range->end, vma->vm_end); + + mm_walk.vma = vma; + mm_walk.mm = vma->vm_mm; + mm_walk.pte_entry = NULL; + mm_walk.test_walk = NULL; + mm_walk.hugetlb_entry = NULL; + mm_walk.pmd_entry = hmm_vma_walk_pmd; + mm_walk.pte_hole = hmm_vma_walk_hole; + + walk_page_range(start, end, &mm_walk); + start = end; + } while (start < range->end); + + return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; +} +EXPORT_SYMBOL(hmm_range_snapshot); /* * hmm_range_fault() - try to fault some address in a virtual address range @@ -893,96 +896,79 @@ EXPORT_SYMBOL(hmm_vma_range_done); */ long hmm_range_fault(struct hmm_range *range, bool block) { - struct vm_area_struct *vma = range->vma; - unsigned long start = range->start; + unsigned long start = range->start, end; struct hmm_vma_walk hmm_vma_walk; + struct hmm *hmm = range->hmm; + struct vm_area_struct *vma; struct mm_walk mm_walk; - struct hmm *hmm; int ret; - range->hmm = NULL; - - /* Sanity check, this really should not happen ! 
*/ - if (range->start < vma->vm_start || range->start >= vma->vm_end) - return -EINVAL; - if (range->end < vma->vm_start || range->end > vma->vm_end) - return -EINVAL; + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL || hmm->dead) + return -EFAULT; - hmm = hmm_register(vma->vm_mm); - if (!hmm) { - hmm_pfns_clear(range, range->pfns, range->start, range->end); - return -ENOMEM; - } + do { + /* If range is no longer valid force retry. */ + if (!range->valid) { + up_read(&hmm->mm->mmap_sem); + return -EAGAIN; + } - /* Check if hmm_mm_destroy() was call. */ - if (hmm->mm == NULL) { - hmm_put(hmm); - return -EINVAL; - } + vma = find_vma(hmm->mm, start); + if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + return -EFAULT; - /* FIXME support hugetlb fs */ - if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) || - vma_is_dax(vma)) { - hmm_pfns_special(range); - hmm_put(hmm); - return -EINVAL; - } + /* FIXME support hugetlb fs/dax */ + if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + hmm_pfns_special(range); + return -EINVAL; + } - if (!(vma->vm_flags & VM_READ)) { - /* - * If vma do not allow read access, then assume that it does - * not allow write access, either. Architecture that allow - * write without read access are not supported by HMM, because - * operations such has atomic access would not work. - */ - hmm_pfns_clear(range, range->pfns, range->start, range->end); - hmm_put(hmm); - return -EPERM; - } + if (!(vma->vm_flags & VM_READ)) { + /* + * If vma do not allow read access, then assume that it + * does not allow write access, either. HMM does not + * support architecture that allow write without read. + */ + hmm_pfns_clear(range, range->pfns, + range->start, range->end); + return -EPERM; + } - /* Initialize range to track CPU page table update */ - spin_lock(&hmm->lock); - range->valid = true; - list_add_rcu(&range->list, &hmm->ranges); - spin_unlock(&hmm->lock); - - hmm_vma_walk.fault = true; - hmm_vma_walk.block = block; - hmm_vma_walk.range = range; - mm_walk.private = &hmm_vma_walk; - hmm_vma_walk.last = range->start; - - mm_walk.vma = vma; - mm_walk.mm = vma->vm_mm; - mm_walk.pte_entry = NULL; - mm_walk.test_walk = NULL; - mm_walk.hugetlb_entry = NULL; - mm_walk.pmd_entry = hmm_vma_walk_pmd; - mm_walk.pte_hole = hmm_vma_walk_hole; + range->vma = vma; + hmm_vma_walk.last = start; + hmm_vma_walk.fault = true; + hmm_vma_walk.block = block; + hmm_vma_walk.range = range; + mm_walk.private = &hmm_vma_walk; + end = min(range->end, vma->vm_end); + + mm_walk.vma = vma; + mm_walk.mm = vma->vm_mm; + mm_walk.pte_entry = NULL; + mm_walk.test_walk = NULL; + mm_walk.hugetlb_entry = NULL; + mm_walk.pmd_entry = hmm_vma_walk_pmd; + mm_walk.pte_hole = hmm_vma_walk_hole; + + do { + ret = walk_page_range(start, end, &mm_walk); + start = hmm_vma_walk.last; + + /* Keep trying while the range is valid. */ + } while (ret == -EBUSY && range->valid); + + if (ret) { + unsigned long i; + + i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; + hmm_pfns_clear(range, &range->pfns[i], + hmm_vma_walk.last, range->end); + return ret; + } + start = end; - do { - ret = walk_page_range(start, range->end, &mm_walk); - start = hmm_vma_walk.last; - /* Keep trying while the range is valid. 
*/ - } while (ret == -EBUSY && range->valid); - - if (ret) { - unsigned long i; - - i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; - hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last, - range->end); - hmm_vma_range_done(range); - hmm_put(hmm); - return ret; - } else { - /* - * Transfer hmm reference to the range struct it will be drop - * inside the hmm_vma_range_done() function (which _must_ be - * call if this function return 0). - */ - range->hmm = hmm; - } + } while (start < range->end); return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } From patchwork Tue Jan 29 16:54:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10786609 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B3C111390 for ; Tue, 29 Jan 2019 16:54:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A0F7E2CDF2 for ; Tue, 29 Jan 2019 16:54:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9506A2D462; Tue, 29 Jan 2019 16:54:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1CCEA2CDF2 for ; Tue, 29 Jan 2019 16:54:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C661D8E0001; Tue, 29 Jan 2019 11:54:48 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B2BFA8E0008; Tue, 29 Jan 2019 11:54:48 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 92F6F8E0001; Tue, 29 Jan 2019 11:54:48 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by kanga.kvack.org (Postfix) with ESMTP id 603D78E0008 for ; Tue, 29 Jan 2019 11:54:48 -0500 (EST) Received: by mail-qk1-f200.google.com with SMTP id w185so22504642qka.9 for ; Tue, 29 Jan 2019 08:54:48 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=PI7GUp8ZvA3zxJ+GcKn6vnogCxLfh9GqOCOpMvvb278=; b=CVVNfuwOxG4ly1CMsjF7FCanmhmTbsCIYXCAIl0VV5mq2b8lHJWTbDfr9xkD2zkBUs 2akZE3ccHbhmr7IAOx88cHGKwlfq9Undr75cltlMqz6HKSfUXnc1yVv/8SqSqTT6lSTY Xe9mgvpV5SuQVqYBtk+3GP7i+B4ETLGxHmHA4vLLJw27P/VTOaiSq1gLSmgbejUVycba 9EJGT68siWlYDq4h5eL5yxijwgzYjnDct+DRltW+egHOojb3UB5uAX9UByRKvzM3Ur6G ZV/Us6y2ea/mHWTnF1HVvVmm9m0y0bg+FXGA/VNWG55qbzBnPP9NM+QpUYEp92WooE7L PXxA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AJcUuke9A1v6wE1Xc+9Kc5dyNfIhVE3k11YZaLjmgoVtSw4ZmMGLqDzk ueUYt3bl3xhacXwmi03B/YFV/K3KlMQgWxN47djSDO8autYI6S76u2cWYJNFfhrgAPpUxEQjPgv 
From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , Ralph Campbell , John Hubbard Subject: [PATCH 06/10] mm/hmm: add default fault flags to avoid the need to pre-fill pfns arrays.
Date: Tue, 29 Jan 2019 11:54:24 -0500 Message-Id: <20190129165428.3931-7-jglisse@redhat.com> In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com> References: <20190129165428.3931-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Tue, 29 Jan 2019 16:54:46 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse The HMM mirror API can be use in two fashions. The first one where the HMM user coalesce multiple page faults into one request and set flags per pfns for of those faults. The second one where the HMM user want to pre-fault a range with specific flags. For the latter one it is a waste to have the user pre-fill the pfn arrays with a default flags value. This patch adds a default flags value allowing user to set them for a range without having to pre-fill the pfn array. Signed-off-by: Jérôme Glisse Cc: Andrew Morton Cc: Ralph Campbell Cc: John Hubbard --- include/linux/hmm.h | 7 +++++++ mm/hmm.c | 12 ++++++++++++ 2 files changed, 19 insertions(+) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 93dc88edc293..4263f8fb32e5 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -165,6 +165,8 @@ enum hmm_pfn_value_e { * @pfns: array of pfns (big enough for the range) * @flags: pfn flags to match device driver page table * @values: pfn value for some special case (none, special, error, ...) + * @default_flags: default flags for the range (write, read, ...) + * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT) * @valid: pfns array did not change since it has been fill by an HMM function */ @@ -177,6 +179,8 @@ struct hmm_range { uint64_t *pfns; const uint64_t *flags; const uint64_t *values; + uint64_t default_flags; + uint64_t pfn_flags_mask; uint8_t pfn_shift; bool valid; }; @@ -521,6 +525,9 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) { long ret; + range->default_flags = 0; + range->pfn_flags_mask = -1UL; + ret = hmm_range_register(range, range->vma->vm_mm, range->start, range->end); if (ret) diff --git a/mm/hmm.c b/mm/hmm.c index 860ebe5d4b07..0a4ff31e9d7a 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -423,6 +423,18 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, if (!hmm_vma_walk->fault) return; + /* + * So we not only consider the individual per page request we also + * consider the default flags requested for the range. The API can + * be use in 2 fashions. The first one where the HMM user coalesce + * multiple page fault into one request and set flags per pfns for + * of those faults. The second one where the HMM user want to pre- + * fault a range with specific flags. For the latter one it is a + * waste to have the user pre-fill the pfn arrays with a default + * flags value. + */ + pfns = (pfns & range->pfn_flags_mask) | range->default_flags; + /* We aren't ask to do anything ... 
*/ if (!(pfns & range->flags[HMM_PFN_VALID])) return; From patchwork Tue Jan 29 16:54:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10786611 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A51131390 for ; Tue, 29 Jan 2019 16:55:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 934DE2CDF2 for ; Tue, 29 Jan 2019 16:55:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 874852D5E1; Tue, 29 Jan 2019 16:55:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CF5512CDF2 for ; Tue, 29 Jan 2019 16:54:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3B41C8E0009; Tue, 29 Jan 2019 11:54:50 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 315B68E0008; Tue, 29 Jan 2019 11:54:50 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1BFE28E0009; Tue, 29 Jan 2019 11:54:50 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) by kanga.kvack.org (Postfix) with ESMTP id D83158E0008 for ; Tue, 29 Jan 2019 11:54:49 -0500 (EST) Received: by mail-qt1-f198.google.com with SMTP id n39so25006460qtn.18 for ; Tue, 29 Jan 2019 08:54:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=5VHDy0ljnDYb93eYIPu560MNKs/uH4VBN2aPll6gvwM=; b=bkONQpIBQ7PeZZ2xYysaU+AyC5xHHHRFtjDvw6HXpPZexkubPjwBjihYzbfj8I2bPt /cf6cyWYoga2pIJ1mjWdNdB+Oqv/6wTakQqmi8NmxDZV1jaSI/2WzFgoNP5ZY7jFpYfQ F5vYJVrGZyXbk7bprJGHyevu3ScPEvbKwSxQ19BP3HaKNLsqsP7c4UKP1W/QYPZD+A3o gpAHm3Eg20ktK6xdlmRKN9pMOMSXArqMxCjtrEaFnEKQyXdHTjk8INhDoQTsSRdyrAhE VgJgHOcCXu07QxxJcDiZzw+YCq+gPiiZNd5MzsXs5WpLnxApzCrdQzGvtJGnu4PPwh9r 7cqw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AJcUukcVLvLc7OPCKgjx3pYp8n27C4MJM0jtEXcDgpLleQ7R4ZAU64cR VAU0n/xn/XaNY4Zb1Vi0nrBBaGSu5gt3fIjR6dvuoOnJprF1mH+vZdb0QvMQGooFQbC+UY/0kSw S8qwrVgyH6DoXOv8gKuqp+gcj1qafPa5BvuSyoNe0hPW4d/2JMaGUgbmolrMXc0fOlg== X-Received: by 2002:a37:9c57:: with SMTP id f84mr24020613qke.176.1548780889652; Tue, 29 Jan 2019 08:54:49 -0800 (PST) X-Google-Smtp-Source: ALg8bN4Jlk1c8feMOlSqrMRTk7C5WtERn8xC1Gbt3ZcKdvnaG7Af+WwIHNuzs4qn6cc7z4nJpLcP X-Received: by 2002:a37:9c57:: with SMTP id f84mr24020575qke.176.1548780889045; Tue, 29 Jan 2019 08:54:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548780889; cv=none; d=google.com; s=arc-20160816; 
From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , Ralph Campbell , John Hubbard Subject: [PATCH 07/10] mm/hmm: add an helper function that fault pages and map them to a device Date: Tue, 29 Jan 2019 11:54:25 -0500 Message-Id: <20190129165428.3931-8-jglisse@redhat.com> In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com> References: <20190129165428.3931-1-jglisse@redhat.com> MIME-Version: 1.0 From: Jérôme Glisse This is an all-in-one helper that faults pages in a range and maps them to a device, so that every single device driver does not have to re-implement this common pattern.
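As an illustration only (not part of this patch), here is a minimal sketch of how a driver might call the helper. The mydevice structure, its dma_device pointer and the caller-allocated pfns[]/daddrs[] arrays are assumptions made up for the example; it also keeps per pfn fault flags untouched through the default_flags/pfn_flags_mask knobs added in the previous patch:

    static long mydevice_map_range(struct mydevice *dev, struct mm_struct *mm,
                                   struct hmm_range *range, dma_addr_t *daddrs)
    {
        long ret;

        /*
         * Caller is assumed to have set range->start and range->end and to
         * have allocated range->pfns[] and daddrs[] with one entry per page.
         */
        range->default_flags = 0;
        range->pfn_flags_mask = -1UL;

        ret = hmm_range_register(range, mm, range->start, range->end);
        if (ret)
            return ret;

        down_read(&mm->mmap_sem);
        if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
            up_read(&mm->mmap_sem);
            hmm_range_unregister(range);
            return -EBUSY;
        }

        /* Fault missing pages and dma map them to the device in one call. */
        ret = hmm_range_dma_map(range, dev->dma_device, daddrs, true);
        if (ret == -EAGAIN) {
            /* mmap_sem was dropped for us, the range was invalidated. */
            hmm_range_unregister(range);
            return ret;
        }
        up_read(&mm->mmap_sem);
        if (ret < 0) {
            hmm_range_unregister(range);
            return ret;
        }

        /*
         * ret pages are now faulted and dma mapped. The driver still has to
         * take its page table lock, re-check hmm_range_valid(), program the
         * device and then call hmm_range_unregister(), following the usage
         * pattern documented in include/linux/hmm.h.
         */
        return ret;
    }

Tear down would then go through hmm_range_dma_unmap() with the same daddrs array once the device no longer uses the pages.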
Signed-off-by: Jérôme Glisse Cc: Andrew Morton Cc: Ralph Campbell Cc: John Hubbard --- include/linux/hmm.h | 9 +++ mm/hmm.c | 152 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 161 insertions(+) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 4263f8fb32e5..fc3630d0bbfd 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -502,6 +502,15 @@ int hmm_range_register(struct hmm_range *range, void hmm_range_unregister(struct hmm_range *range); long hmm_range_snapshot(struct hmm_range *range); long hmm_range_fault(struct hmm_range *range, bool block); +long hmm_range_dma_map(struct hmm_range *range, + struct device *device, + dma_addr_t *daddrs, + bool block); +long hmm_range_dma_unmap(struct hmm_range *range, + struct vm_area_struct *vma, + struct device *device, + dma_addr_t *daddrs, + bool dirty); /* * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range diff --git a/mm/hmm.c b/mm/hmm.c index 0a4ff31e9d7a..9cd68334a759 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -985,6 +986,157 @@ long hmm_range_fault(struct hmm_range *range, bool block) return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } EXPORT_SYMBOL(hmm_range_fault); + +/* + * hmm_range_dma_map() - hmm_range_fault() and dma map page all in one. + * @range: range being faulted + * @device: device against to dma map page to + * @daddrs: dma address of mapped pages + * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem) + * Returns: number of pages mapped on success, -EAGAIN if mmap_sem have been + * drop and you need to try again, some other error value otherwise + * + * Note same usage pattern as hmm_range_fault(). + */ +long hmm_range_dma_map(struct hmm_range *range, + struct device *device, + dma_addr_t *daddrs, + bool block) +{ + unsigned long i, npages, mapped; + long ret; + + ret = hmm_range_fault(range, block); + if (ret <= 0) + return ret ? ret : -EBUSY; + + npages = (range->end - range->start) >> PAGE_SHIFT; + for (i = 0, mapped = 0; i < npages; ++i) { + enum dma_data_direction dir = DMA_FROM_DEVICE; + struct page *page; + + /* + * FIXME need to update DMA API to provide invalid DMA address + * value instead of a function to test dma address value. This + * would remove lot of dumb code duplicated accross many arch. + * + * For now setting it to 0 here is good enough as the pfns[] + * value is what is use to check what is valid and what isn't. + */ + daddrs[i] = 0; + + page = hmm_pfn_to_page(range, range->pfns[i]); + if (page == NULL) + continue; + + /* Check if range is being invalidated */ + if (!range->valid) { + ret = -EBUSY; + goto unmap; + } + + /* If it is read and write than map bi-directional. */ + if (range->pfns[i] & range->values[HMM_PFN_WRITE]) + dir = DMA_BIDIRECTIONAL; + + daddrs[i] = dma_map_page(device, page, 0, PAGE_SIZE, dir); + if (dma_mapping_error(device, daddrs[i])) { + ret = -EFAULT; + goto unmap; + } + + mapped++; + } + + return mapped; + +unmap: + for (npages = i, i = 0; (i < npages) && mapped; ++i) { + enum dma_data_direction dir = DMA_FROM_DEVICE; + struct page *page; + + page = hmm_pfn_to_page(range, range->pfns[i]); + if (page == NULL) + continue; + + if (dma_mapping_error(device, daddrs[i])) + continue; + + /* If it is read and write than map bi-directional. 
*/ + if (range->pfns[i] & range->values[HMM_PFN_WRITE]) + dir = DMA_BIDIRECTIONAL; + + dma_unmap_page(device, daddrs[i], PAGE_SIZE, dir); + mapped--; + } + + return ret; +} +EXPORT_SYMBOL(hmm_range_dma_map); + +/* + * hmm_range_dma_unmap() - unmap range of that was map with hmm_range_dma_map() + * @range: range being unmapped + * @vma: the vma against which the range (optional) + * @device: device against which dma map was done + * @daddrs: dma address of mapped pages + * @dirty: dirty page if it had the write flag set + * Returns: number of page unmapped on success, -EINVAL otherwise + * + * Note that caller MUST abide by mmu notifier or use HMM mirror and abide + * to the sync_cpu_device_pagetables() callback so that it is safe here to + * call set_page_dirty(). Caller must also take appropriate locks to avoid + * concurrent mmu notifier or sync_cpu_device_pagetables() to make progress. + */ +long hmm_range_dma_unmap(struct hmm_range *range, + struct vm_area_struct *vma, + struct device *device, + dma_addr_t *daddrs, + bool dirty) +{ + unsigned long i, npages; + long cpages = 0; + + /* Sanity check. */ + if (range->end <= range->start) + return -EINVAL; + if (!daddrs) + return -EINVAL; + if (!range->pfns) + return -EINVAL; + + npages = (range->end - range->start) >> PAGE_SHIFT; + for (i = 0; i < npages; ++i) { + enum dma_data_direction dir = DMA_FROM_DEVICE; + struct page *page; + + page = hmm_pfn_to_page(range, range->pfns[i]); + if (page == NULL) + continue; + + /* If it is read and write than map bi-directional. */ + if (range->pfns[i] & range->values[HMM_PFN_WRITE]) { + dir = DMA_BIDIRECTIONAL; + + /* + * See comments in function description on why it is + * safe here to call set_page_dirty() + */ + if (dirty) + set_page_dirty(page); + } + + /* Unmap and clear pfns/dma address */ + dma_unmap_page(device, daddrs[i], PAGE_SIZE, dir); + range->pfns[i] = range->values[HMM_PFN_NONE]; + /* FIXME see comments in hmm_vma_dma_map() */ + daddrs[i] = 0; + cpages++; + } + + return cpages; +} +EXPORT_SYMBOL(hmm_range_dma_unmap); #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */ From patchwork Tue Jan 29 16:54:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10786613 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A20411399 for ; Tue, 29 Jan 2019 16:55:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8E5852CDF2 for ; Tue, 29 Jan 2019 16:55:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 810A32D462; Tue, 29 Jan 2019 16:55:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8BA512CDF2 for ; Tue, 29 Jan 2019 16:55:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 254918E000A; Tue, 29 Jan 2019 11:54:53 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2086B8E0008; Tue, 29 Jan 2019 11:54:53 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org 
Received: by kanga.kvack.org (Postfix, from userid 63042) id 056E38E000A; Tue, 29 Jan 2019 11:54:53 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by kanga.kvack.org (Postfix) with ESMTP id CDF4E8E0008 for ; Tue, 29 Jan 2019 11:54:52 -0500 (EST) Received: by mail-qt1-f199.google.com with SMTP id n50so25446901qtb.9 for ; Tue, 29 Jan 2019 08:54:52 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=tJ4kFBT5/PoC5R6rCwqTGnIldGPFFgCmYxq4Ju4J1I8=; b=ORYF6Zjxywlp8zfl6Od47Id6C09jD39uj0Br+HbawqVPjTyFOsDM/e9ecgpeJKNXZI X1LQO/yMQHeBk8+mZKR0ZggalW5rr0TBCWumQk17ogoXAjYc6EnpU5czs4EPswxDIsfW hy86rE4KMwtdil1qQ2Iw8++j+PiddR2A2U7T9Uh7+gjEbzt5BiiT+kPrG4sv9qX9DJw2 G5kJe84fafpSsZpLbXUhB3G8nHWDbftLnwiaZMYoPfgeI/b8sXKM8CsLPc7QQDS9QqmS e5Z2FLTJGE3cmZp0d49WSTUTKrJVphglPtKSjowYNv6RVG+m/JV/AjeYkVIQtGG6PBvQ Of5A== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AJcUuke5nCZDrsd/LgCNu0u7cm9gFmVpS+I/u/mY80KbIpzKYlKFBxRF xA+rxNLBNz5q0rzRU6UwNsUGzAPpfLQc5IRUDN6vxQGzPhYP5RiZ57aqz7QRNdyzVLgHixOanHR jOK/E5Dh0dAAyaw9KkCZxM7a47TD2denGcbcTqSDvXNvGbOg6+ZeYeV6ayPSQMSFsSA== X-Received: by 2002:aed:2fc4:: with SMTP id m62mr25126403qtd.8.1548780892602; Tue, 29 Jan 2019 08:54:52 -0800 (PST) X-Google-Smtp-Source: ALg8bN7EckU2meyofUz7Wi65GwvAkWLVwClt/KJSzr0QEg1zwZMRj/StbydZyQpNCMmnK2siXfwJ X-Received: by 2002:aed:2fc4:: with SMTP id m62mr25126364qtd.8.1548780891883; Tue, 29 Jan 2019 08:54:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548780891; cv=none; d=google.com; s=arc-20160816; b=iaoFvlrvpFtnOlq3BduTIAJLJAm8P92FUnmm4jZ/CtO17Gg068eBxeS2eCdzkM8z8Q tG5daFVVv0JN11XjZzqCeb92j/AffRjC8cv0YJRLBkfWJsTkYyqIbasE1OobuVuVOlD0 tcyfEfzJfq9LjL1+g7O9dhs6khj1KJJvnM3vwVxCoEPI4EEiLihjBfC5e51kE3jODVU8 pYHLePpQ63KbHo4TaMnjzcNtIbQY5HGOHNR7fe2aE/ieMRQfbQp0y//brzRrYovsJhKi hnWtKXWE9CByX9i3ZxBcugyUsPpcq6arxMAp9SQahY8iq2BKhIV+EW8/HLsMWkCDcns/ 3y+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=tJ4kFBT5/PoC5R6rCwqTGnIldGPFFgCmYxq4Ju4J1I8=; b=s7r0hDAhdpcx25dQTObgMwSwY8yfpmnu0LiQM/e/fj6+nJ3tLbMgIn8Sq/kd43b44l 9rqRomGAg0PwDrwvhMUFO6rSiYU52TtnteoRoirK4OGcwusUhOU40ABuZyytKeM+M/Yz tPOWrR/XiyhsKQdCgDVIbcBfniFcVfSOQez8vAatEGeg2DEA+KgOFnJTb2gLZ3DYtXHc QDjYXVj/xn4a6Pla9dM3R2Ru02NTbShinZ/ZoR6CwhW8NkMeGbjZEtcarYGicf5kXPok pc2rLCeBt0xzk75FxSqbWXu8Opgdv3FdTT2Hjaa1ubvZMVl51Sc+M1ALJdnzu3Dd5OK7 uKYQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id 88si3177158qte.245.2019.01.29.08.54.51 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 29 Jan 2019 08:54:51 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 13C377D0E0; Tue, 29 Jan 2019 16:54:51 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-122-2.rdu2.redhat.com [10.10.122.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3DE881048105; Tue, 29 Jan 2019 16:54:48 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , Ralph Campbell , John Hubbard Subject: [PATCH 08/10] mm/hmm: support hugetlbfs (snap shoting, faulting and DMA mapping) Date: Tue, 29 Jan 2019 11:54:26 -0500 Message-Id: <20190129165428.3931-9-jglisse@redhat.com> In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com> References: <20190129165428.3931-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Tue, 29 Jan 2019 16:54:51 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse This adds support for hugetlbfs so that HMM user can map mirror range of virtual address back by hugetlbfs. Note that now the range allows user to optimize DMA mapping of such page so that we can map a huge page as one chunk. Signed-off-by: Jérôme Glisse Cc: Andrew Morton Cc: Ralph Campbell Cc: John Hubbard --- include/linux/hmm.h | 29 ++++++++- mm/hmm.c | 141 +++++++++++++++++++++++++++++++++++++------- 2 files changed, 147 insertions(+), 23 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index fc3630d0bbfd..b3850297352f 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -181,10 +181,31 @@ struct hmm_range { const uint64_t *values; uint64_t default_flags; uint64_t pfn_flags_mask; + uint8_t page_shift; uint8_t pfn_shift; bool valid; }; +/* + * hmm_range_page_shift() - return the page shift for the range + * @range: range being queried + * Returns: page shift (page size = 1 << page shift) for the range + */ +static inline unsigned hmm_range_page_shift(const struct hmm_range *range) +{ + return range->page_shift; +} + +/* + * hmm_range_page_size() - return the page size for the range + * @range: range being queried + * Returns: page size for the range in bytes + */ +static inline unsigned long hmm_range_page_size(const struct hmm_range *range) +{ + return 1UL << hmm_range_page_shift(range); +} + /* * hmm_range_wait_until_valid() - wait for range to be valid * @range: range affected by invalidation to wait on @@ -438,7 +459,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * struct hmm_range range; * ... 
* - * ret = hmm_range_register(&range, mm, start, end); + * ret = hmm_range_register(&range, mm, start, end, page_shift); * if (ret) * return ret; * @@ -498,7 +519,8 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); int hmm_range_register(struct hmm_range *range, struct mm_struct *mm, unsigned long start, - unsigned long end); + unsigned long end, + unsigned page_shift); void hmm_range_unregister(struct hmm_range *range); long hmm_range_snapshot(struct hmm_range *range); long hmm_range_fault(struct hmm_range *range, bool block); @@ -538,7 +560,8 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) range->pfn_flags_mask = -1UL; ret = hmm_range_register(range, range->vma->vm_mm, - range->start, range->end); + range->start, range->end, + PAGE_SHIFT); if (ret) return (int)ret; diff --git a/mm/hmm.c b/mm/hmm.c index 9cd68334a759..8b87e1813313 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -396,11 +396,13 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end, struct hmm_vma_walk *hmm_vma_walk = walk->private; struct hmm_range *range = hmm_vma_walk->range; uint64_t *pfns = range->pfns; - unsigned long i; + unsigned long i, page_size; hmm_vma_walk->last = addr; - i = (addr - range->start) >> PAGE_SHIFT; - for (; addr < end; addr += PAGE_SIZE, i++) { + page_size = 1UL << range->page_shift; + i = (addr - range->start) >> range->page_shift; + + for (; addr < end; addr += page_size, i++) { pfns[i] = range->values[HMM_PFN_NONE]; if (fault || write_fault) { int ret; @@ -712,6 +714,69 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, return 0; } +static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, + unsigned long start, unsigned long end, + struct mm_walk *walk) +{ +#ifdef CONFIG_HUGETLB_PAGE + unsigned long addr = start, i, pfn, mask, size, pfn_inc; + struct hmm_vma_walk *hmm_vma_walk = walk->private; + struct hmm_range *range = hmm_vma_walk->range; + struct vm_area_struct *vma = walk->vma; + struct hstate *h = hstate_vma(vma); + uint64_t orig_pfn, cpu_flags; + bool fault, write_fault; + spinlock_t *ptl; + pte_t entry; + int ret = 0; + + size = 1UL << huge_page_shift(h); + mask = size - 1; + if (range->page_shift != PAGE_SHIFT) { + /* Make sure we are looking at full page. 
*/ + if (start & mask) + return -EINVAL; + if (end < (start + size)) + return -EINVAL; + pfn_inc = size >> PAGE_SHIFT; + } else { + pfn_inc = 1; + size = PAGE_SIZE; + } + + + ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); + entry = huge_ptep_get(pte); + + i = (start - range->start) >> range->page_shift; + orig_pfn = range->pfns[i]; + range->pfns[i] = range->values[HMM_PFN_NONE]; + cpu_flags = pte_to_hmm_pfn_flags(range, entry); + fault = write_fault = false; + hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, + &fault, &write_fault); + if (fault || write_fault) { + ret = -ENOENT; + goto unlock; + } + + pfn = pte_pfn(entry) + (start & mask); + for (; addr < end; addr += size, i++, pfn += pfn_inc) + range->pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags; + hmm_vma_walk->last = end; + +unlock: + spin_unlock(ptl); + + if (ret == -ENOENT) + return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); + + return ret; +#else /* CONFIG_HUGETLB_PAGE */ + return -EINVAL; +#endif +} + static void hmm_pfns_clear(struct hmm_range *range, uint64_t *pfns, unsigned long addr, @@ -735,6 +800,7 @@ static void hmm_pfns_special(struct hmm_range *range) * @mm: the mm struct for the range of virtual address * @start: start virtual address (inclusive) * @end: end virtual address (exclusive) + * @page_shift: expect page shift for the range * Returns 0 on success, -EFAULT if the address space is no longer valid * * Track updates to the CPU page table see include/linux/hmm.h @@ -742,15 +808,22 @@ static void hmm_pfns_special(struct hmm_range *range) int hmm_range_register(struct hmm_range *range, struct mm_struct *mm, unsigned long start, - unsigned long end) + unsigned long end, + unsigned page_shift) { - range->start = start & PAGE_MASK; - range->end = end & PAGE_MASK; + unsigned long mask = ((1UL << page_shift) - 1UL); + range->valid = false; range->hmm = NULL; - if (range->start >= range->end) + if ((start & mask) || (end & mask)) return -EINVAL; + if (start >= end) + return -EINVAL; + + range->page_shift = page_shift; + range->start = start; + range->end = end; range->hmm = hmm_register(mm); if (!range->hmm) @@ -818,6 +891,7 @@ EXPORT_SYMBOL(hmm_range_unregister); */ long hmm_range_snapshot(struct hmm_range *range) { + const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP; unsigned long start = range->start, end; struct hmm_vma_walk hmm_vma_walk; struct hmm *hmm = range->hmm; @@ -834,15 +908,26 @@ long hmm_range_snapshot(struct hmm_range *range) return -EAGAIN; vma = find_vma(hmm->mm, start); - if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support hugetlb fs/dax */ - if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + /* FIXME support dax */ + if (vma_is_dax(vma)) { hmm_pfns_special(range); return -EINVAL; } + if (is_vm_hugetlb_page(vma)) { + struct hstate *h = hstate_vma(vma); + + if (huge_page_shift(h) != range->page_shift && + range->page_shift != PAGE_SHIFT) + return -EINVAL; + } else { + if (range->page_shift != PAGE_SHIFT) + return -EINVAL; + } + if (!(vma->vm_flags & VM_READ)) { /* * If vma do not allow read access, then assume that it @@ -868,6 +953,7 @@ long hmm_range_snapshot(struct hmm_range *range) mm_walk.hugetlb_entry = NULL; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; + mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; walk_page_range(start, end, &mm_walk); start = end; @@ -909,6 +995,7 @@ EXPORT_SYMBOL(hmm_range_snapshot); */ long 
hmm_range_fault(struct hmm_range *range, bool block) { + const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP; unsigned long start = range->start, end; struct hmm_vma_walk hmm_vma_walk; struct hmm *hmm = range->hmm; @@ -928,15 +1015,26 @@ long hmm_range_fault(struct hmm_range *range, bool block) } vma = find_vma(hmm->mm, start); - if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support hugetlb fs/dax */ - if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + /* FIXME support dax */ + if (vma_is_dax(vma)) { hmm_pfns_special(range); return -EINVAL; } + if (is_vm_hugetlb_page(vma)) { + struct hstate *h = hstate_vma(vma); + + if (huge_page_shift(h) != range->page_shift && + range->page_shift != PAGE_SHIFT) + return -EINVAL; + } else { + if (range->page_shift != PAGE_SHIFT) + return -EINVAL; + } + if (!(vma->vm_flags & VM_READ)) { /* * If vma do not allow read access, then assume that it @@ -963,6 +1061,7 @@ long hmm_range_fault(struct hmm_range *range, bool block) mm_walk.hugetlb_entry = NULL; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; + mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; do { ret = walk_page_range(start, end, &mm_walk); @@ -1003,14 +1102,15 @@ long hmm_range_dma_map(struct hmm_range *range, dma_addr_t *daddrs, bool block) { - unsigned long i, npages, mapped; + unsigned long i, npages, mapped, page_size; long ret; ret = hmm_range_fault(range, block); if (ret <= 0) return ret ? ret : -EBUSY; - npages = (range->end - range->start) >> PAGE_SHIFT; + page_size = hmm_range_page_size(range); + npages = (range->end - range->start) >> range->page_shift; for (i = 0, mapped = 0; i < npages; ++i) { enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; @@ -1039,7 +1139,7 @@ long hmm_range_dma_map(struct hmm_range *range, if (range->pfns[i] & range->values[HMM_PFN_WRITE]) dir = DMA_BIDIRECTIONAL; - daddrs[i] = dma_map_page(device, page, 0, PAGE_SIZE, dir); + daddrs[i] = dma_map_page(device, page, 0, page_size, dir); if (dma_mapping_error(device, daddrs[i])) { ret = -EFAULT; goto unmap; @@ -1066,7 +1166,7 @@ long hmm_range_dma_map(struct hmm_range *range, if (range->pfns[i] & range->values[HMM_PFN_WRITE]) dir = DMA_BIDIRECTIONAL; - dma_unmap_page(device, daddrs[i], PAGE_SIZE, dir); + dma_unmap_page(device, daddrs[i], page_size, dir); mapped--; } @@ -1094,7 +1194,7 @@ long hmm_range_dma_unmap(struct hmm_range *range, dma_addr_t *daddrs, bool dirty) { - unsigned long i, npages; + unsigned long i, npages, page_size; long cpages = 0; /* Sanity check. 
*/ @@ -1105,7 +1205,8 @@ long hmm_range_dma_unmap(struct hmm_range *range, if (!range->pfns) return -EINVAL; - npages = (range->end - range->start) >> PAGE_SHIFT; + page_size = hmm_range_page_size(range); + npages = (range->end - range->start) >> range->page_shift; for (i = 0; i < npages; ++i) { enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; @@ -1127,7 +1228,7 @@ long hmm_range_dma_unmap(struct hmm_range *range, } /* Unmap and clear pfns/dma address */ - dma_unmap_page(device, daddrs[i], PAGE_SIZE, dir); + dma_unmap_page(device, daddrs[i], page_size, dir); range->pfns[i] = range->values[HMM_PFN_NONE]; /* FIXME see comments in hmm_vma_dma_map() */ daddrs[i] = 0;
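The hunks above make hmm_range_register() take an explicit page shift and teach the walker, DMA map and DMA unmap paths to honour it. As a rough illustration of the caller side, here is a minimal sketch of how a driver might pick the shift before registering a range; the wrapper function and its name are hypothetical, only the HMM and hugetlb helpers used below come from this series, and the mmap_sem / hmm_range_wait_until_valid() handling described in the header comment is omitted for brevity.

/*
 * Illustrative sketch only, not part of the patch: pick the page shift
 * before registering the range, then snapshot it.
 */
static long driver_snapshot_range(struct hmm_range *range, struct mm_struct *mm,
				  struct vm_area_struct *vma,
				  unsigned long start, unsigned long end)
{
	unsigned page_shift = PAGE_SHIFT;
	long ret;

	/* Hugetlbfs vmas can be mirrored at their native huge page size. */
	if (is_vm_hugetlb_page(vma))
		page_shift = huge_page_shift(hstate_vma(vma));

	/* start/end must be aligned to 1UL << page_shift or this returns -EINVAL. */
	ret = hmm_range_register(range, mm, start, end, page_shift);
	if (ret)
		return ret;

	ret = hmm_range_snapshot(range);
	hmm_range_unregister(range);
	return ret;
}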
From patchwork Tue Jan 29 16:54:27 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10786615
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Dan Williams, Ralph Campbell, John Hubbard
Subject: [PATCH 09/10] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
Date: Tue, 29 Jan 2019 11:54:27 -0500
Message-Id: <20190129165428.3931-10-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

This adds support for mirroring a vma which is an mmap of a file on a filesystem using a DAX block device. There is no reason not to support that case. Note that unlike the GUP code we do not take a page reference, so when we back off there is nothing to undo.

Signed-off-by: Jérôme Glisse
Cc: Andrew Morton
Cc: Dan Williams
Cc: Ralph Campbell
Cc: John Hubbard
---
mm/hmm.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 112 insertions(+), 21 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c index 8b87e1813313..1a444885404e 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -334,6 +334,7 @@ EXPORT_SYMBOL(hmm_mirror_unregister); struct hmm_vma_walk { struct hmm_range *range; + struct dev_pagemap *pgmap; unsigned long last; bool fault; bool block; @@ -508,6 +509,15 @@ static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd) range->flags[HMM_PFN_VALID]; } +static inline uint64_t pud_to_hmm_pfn_flags(struct hmm_range *range, pud_t pud) +{ + if (!pud_present(pud)) + return 0; + return pud_write(pud) ?
range->flags[HMM_PFN_VALID] | + range->flags[HMM_PFN_WRITE] : + range->flags[HMM_PFN_VALID]; +} + static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, unsigned long end, @@ -529,8 +539,19 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); pfn = pmd_pfn(pmd) + pte_index(addr); - for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) + for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) { + if (pmd_devmap(pmd)) { + hmm_vma_walk->pgmap = get_dev_pagemap(pfn, + hmm_vma_walk->pgmap); + if (unlikely(!hmm_vma_walk->pgmap)) + return -EBUSY; + } pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags; + } + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } hmm_vma_walk->last = end; return 0; } @@ -617,10 +638,24 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, if (fault || write_fault) goto fault; + if (pte_devmap(pte)) { + hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte), + hmm_vma_walk->pgmap); + if (unlikely(!hmm_vma_walk->pgmap)) + return -EBUSY; + } else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) { + *pfn = range->values[HMM_PFN_SPECIAL]; + return -EFAULT; + } + *pfn = hmm_pfn_from_pfn(range, pte_pfn(pte)) | cpu_flags; return 0; fault: + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } pte_unmap(ptep); /* Fault any virtual address we were asked to fault */ return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); @@ -708,12 +743,84 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, return r; } } + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } pte_unmap(ptep - 1); hmm_vma_walk->last = addr; return 0; } +static int hmm_vma_walk_pud(pud_t *pudp, + unsigned long start, + unsigned long end, + struct mm_walk *walk) +{ + struct hmm_vma_walk *hmm_vma_walk = walk->private; + struct hmm_range *range = hmm_vma_walk->range; + struct vm_area_struct *vma = walk->vma; + unsigned long addr = start, next; + pmd_t *pmdp; + pud_t pud; + int ret; + +again: + pud = READ_ONCE(*pudp); + if (pud_none(pud)) + return hmm_vma_walk_hole(start, end, walk); + + if (pud_huge(pud) && pud_devmap(pud)) { + unsigned long i, npages, pfn; + uint64_t *pfns, cpu_flags; + bool fault, write_fault; + + if (!pud_present(pud)) + return hmm_vma_walk_hole(start, end, walk); + + i = (addr - range->start) >> PAGE_SHIFT; + npages = (end - addr) >> PAGE_SHIFT; + pfns = &range->pfns[i]; + + cpu_flags = pud_to_hmm_pfn_flags(range, pud); + hmm_range_need_fault(hmm_vma_walk, pfns, npages, + cpu_flags, &fault, &write_fault); + if (fault || write_fault) + return hmm_vma_walk_hole_(addr, end, fault, + write_fault, walk); + + pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + for (i = 0; i < npages; ++i, ++pfn) { + hmm_vma_walk->pgmap = get_dev_pagemap(pfn, + hmm_vma_walk->pgmap); + if (unlikely(!hmm_vma_walk->pgmap)) + return -EBUSY; + pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags; + } + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } + hmm_vma_walk->last = end; + return 0; + } + + split_huge_pud(vma, pudp, addr); + if (pud_none(*pudp)) + goto again; + + pmdp = pmd_offset(pudp, addr); + do { + next = pmd_addr_end(addr, end); + ret = hmm_vma_walk_pmd(pmdp, addr, next, walk); + if (ret) + return ret; + } while (pmdp++, addr = next, addr != end); + + return 0; +} + static int 
hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, unsigned long start, unsigned long end, struct mm_walk *walk) @@ -786,14 +893,6 @@ static void hmm_pfns_clear(struct hmm_range *range, *pfns = range->values[HMM_PFN_NONE]; } -static void hmm_pfns_special(struct hmm_range *range) -{ - unsigned long addr = range->start, i = 0; - - for (; addr < range->end; addr += PAGE_SIZE, i++) - range->pfns[i] = range->values[HMM_PFN_SPECIAL]; -} - /* * hmm_range_register() - start tracking change to CPU page table over a range * @range: range @@ -911,12 +1010,6 @@ long hmm_range_snapshot(struct hmm_range *range) if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support dax */ - if (vma_is_dax(vma)) { - hmm_pfns_special(range); - return -EINVAL; - } - if (is_vm_hugetlb_page(vma)) { struct hstate *h = hstate_vma(vma); @@ -940,6 +1033,7 @@ long hmm_range_snapshot(struct hmm_range *range) } range->vma = vma; + hmm_vma_walk.pgmap = NULL; hmm_vma_walk.last = start; hmm_vma_walk.fault = false; hmm_vma_walk.range = range; @@ -951,6 +1045,7 @@ long hmm_range_snapshot(struct hmm_range *range) mm_walk.pte_entry = NULL; mm_walk.test_walk = NULL; mm_walk.hugetlb_entry = NULL; + mm_walk.pud_entry = hmm_vma_walk_pud; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; @@ -1018,12 +1113,6 @@ long hmm_range_fault(struct hmm_range *range, bool block) if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support dax */ - if (vma_is_dax(vma)) { - hmm_pfns_special(range); - return -EINVAL; - } - if (is_vm_hugetlb_page(vma)) { struct hstate *h = hstate_vma(vma); @@ -1047,6 +1136,7 @@ long hmm_range_fault(struct hmm_range *range, bool block) } range->vma = vma; + hmm_vma_walk.pgmap = NULL; hmm_vma_walk.last = start; hmm_vma_walk.fault = true; hmm_vma_walk.block = block; @@ -1059,6 +1149,7 @@ long hmm_range_fault(struct hmm_range *range, bool block) mm_walk.pte_entry = NULL; mm_walk.test_walk = NULL; mm_walk.hugetlb_entry = NULL; + mm_walk.pud_entry = hmm_vma_walk_pud; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
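The devmap hunks in this patch all follow the same pattern: cache the dev_pagemap returned by get_dev_pagemap() in hmm_vma_walk so that consecutive pfns falling in the same pagemap do not pay for a fresh lookup, and drop the reference once the walk leaves the mapping. A standalone sketch of that reference pattern follows; the helper name and its argument list are hypothetical, only get_dev_pagemap(), put_dev_pagemap() and hmm_pfn_from_pfn() are calls taken from this code.

/*
 * Illustrative sketch only, not part of the patch: fill pfns for a run of
 * device pages while reusing the cached dev_pagemap reference.
 */
static int sketch_fill_devmap_pfns(struct hmm_range *range, uint64_t cpu_flags,
				   unsigned long pfn, unsigned long npages,
				   uint64_t *pfns, struct dev_pagemap **pgmap)
{
	unsigned long i;

	for (i = 0; i < npages; ++i, ++pfn) {
		/*
		 * get_dev_pagemap() returns *pgmap unchanged when the pfn
		 * still falls inside it, so only the first lookup is costly.
		 */
		*pgmap = get_dev_pagemap(pfn, *pgmap);
		if (unlikely(!*pgmap))
			return -EBUSY;
		pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags;
	}

	/* Drop the last cached reference once the run is done. */
	if (*pgmap) {
		put_dev_pagemap(*pgmap);
		*pgmap = NULL;
	}
	return 0;
}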
From patchwork Tue Jan 29 16:54:28 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10786617
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Ralph Campbell, John Hubbard
Subject: [PATCH 10/10] mm/hmm: add helpers for driver to safely take the mmap_sem
Date: Tue, 29 Jan 2019 11:54:28 -0500
Message-Id: <20190129165428.3931-11-jglisse@redhat.com>
In-Reply-To: <20190129165428.3931-1-jglisse@redhat.com>
References: <20190129165428.3931-1-jglisse@redhat.com>

From: Jérôme Glisse

The device driver context which holds a reference to the mirror, and thus to the core hmm struct, might outlive the mm against which it was created. To avoid making every driver check for that case, provide a helper that checks whether the mm is still alive and takes the mmap_sem in read mode if so. If the mm has been destroyed (the mmu_notifier release callback already ran) we return -EINVAL so that the calling code knows it is trying to operate on an mm that is no longer valid.

Signed-off-by: Jérôme Glisse
Cc: Andrew Morton
Cc: Ralph Campbell
Cc: John Hubbard
---
include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h index b3850297352f..4a1454e3efba 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -438,6 +438,50 @@ struct hmm_mirror { int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm); void hmm_mirror_unregister(struct hmm_mirror *mirror); +/* + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken). + * + * The device driver context which holds reference to mirror and thus to core + * hmm struct might outlive the mm against which it was created. To avoid every + * driver to check for that case provide an helper that check if mm is still + * alive and take the mmap_sem in read mode if so.
If the mm have been destroy + * (mmu_notifier release call back did happen) then we return -EINVAL so that + * calling code knows that it is trying to do something against a mm that is + * no longer valid. + */ +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror) +{ + struct mm_struct *mm; + + /* Sanity check ... */ + if (!mirror || !mirror->hmm) + return -EINVAL; + /* + * Before trying to take the mmap_sem make sure the mm is still + * alive as device driver context might outlive the mm lifetime. + * + * FIXME: should we also check for mm that outlive its owning + * task ? + */ + mm = READ_ONCE(mirror->hmm->mm); + if (mirror->hmm->dead || !mm) + return -EINVAL; + + down_read(&mm->mmap_sem); + return 0; +} + +/* + * hmm_mirror_mm_up_read() - unlock the mmap_sem from read mode + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem + */ +static inline void hmm_mirror_mm_up_read(struct hmm_mirror *mirror) +{ + up_read(&mirror->hmm->mm->mmap_sem); +} + /* * To snapshot the CPU page table you first have to call hmm_range_register() @@ -463,7 +507,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * if (ret) * return ret; * - * down_read(mm->mmap_sem); + * hmm_mirror_mm_down_read(mirror); * again: * * if (!hmm_range_wait_until_valid(&range, TIMEOUT)) { @@ -476,13 +520,13 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * * ret = hmm_range_snapshot(&range); or hmm_range_fault(&range); * if (ret == -EAGAIN) { - * down_read(mm->mmap_sem); + * hmm_mirror_mm_down_read(mirror); * goto again; * } else if (ret == -EBUSY) { * goto again; * } * - * up_read(&mm->mmap_sem); + * hmm_mirror_mm_up_read(mirror); * if (ret) { * hmm_range_unregister(range); * return ret;
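To make the documented flow above concrete, here is a minimal driver-side sketch built on the two helpers. Everything outside the HMM API, including the function name, its arguments and the policy of giving up when the range does not become valid within the timeout, is illustrative only; the comment in the header leaves those choices to the driver.

/*
 * Illustrative sketch only, not part of the patch: fault a registered range
 * while taking the mmap_sem through the mirror so a dead mm is detected.
 */
static long sketch_fault_range(struct hmm_mirror *mirror, struct hmm_range *range,
			       struct mm_struct *mm, unsigned long start,
			       unsigned long end, unsigned long timeout)
{
	long ret;

	ret = hmm_range_register(range, mm, start, end, PAGE_SHIFT);
	if (ret)
		return ret;

	/* Fails with -EINVAL if the mm already went through mmu_notifier release. */
	if (hmm_mirror_mm_down_read(mirror)) {
		hmm_range_unregister(range);
		return -EINVAL;
	}

again:
	if (!hmm_range_wait_until_valid(range, timeout)) {
		/* Simplified policy: give up rather than retry forever. */
		hmm_mirror_mm_up_read(mirror);
		hmm_range_unregister(range);
		return -EAGAIN;
	}

	ret = hmm_range_fault(range, true);
	if (ret == -EAGAIN) {
		/* Per the header comment, retake the mmap_sem and retry. */
		if (hmm_mirror_mm_down_read(mirror)) {
			hmm_range_unregister(range);
			return -EINVAL;
		}
		goto again;
	}
	if (ret == -EBUSY)
		goto again;

	hmm_mirror_mm_up_read(mirror);
	hmm_range_unregister(range);
	return ret;
}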