From patchwork Wed Apr 8 15:56:52 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boaz Harrosh
X-Patchwork-Id: 6181201
Message-ID: <55254FC4.3050206@plexistor.com>
Date: Wed, 08 Apr 2015 18:56:52 +0300
From: Boaz Harrosh
To: Stable Tree
References: <55239645.9000507@plexistor.com>
In-Reply-To: <55239645.9000507@plexistor.com>
Cc: Jan Kara, Eryu Guan, Dave Chinner, Hugh Dickins, Christoph Hellwig,
 linux-mm@kvack.org, Matthew Wilcox, linux-nvdimm, linux-fsdevel,
 Andrew Morton, "Kirill A. Shutemov", Mel Gorman
Subject: [Linux-nvdimm] [PATCH 1/3 @stable] mm(v4.0): New pfn_mkwrite same as page_mkwrite for VM_PFNMAP
List-Id: "Linux-nvdimm developer list."

From: Yigal Korman

[For Stable 4.0.X]

The parallel patch to this one at 4.1-rc1 is:
  Subject: mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP

We need this patch in the 4.0.X stable tree if the patch
  Subject: dax: use pfn_mkwrite to update c/mtime + freeze protection
is pulled into stable, because that patch depends on this one.
mm/memory.c changed heavily in 4.1, hence this separate version.

[v3]
In the !pte_same case, when we lost the race, return 0 instead of
VM_FAULT_NOPAGE.

[v2]
Fixed according to Kirill's comments.

[v1]
This allows a filesystem that uses VM_PFNMAP | VM_MIXEDMAP (no page
structs) to be notified when a write access hits a read-only PFN.

This can happen if we mmap() a file, first mmap-read from it to page in
a read-only PFN, and then mmap-write to the same page.

We need this functionality to fix a DAX bug: in the scenario above we
fail to set ctime/mtime even though we modified the file. An xfstest is
attached to this patchset that shows the failure and the fix. (A DAX
patch will follow.)

This functionality is extra important for us, because upon dirtying of
a pmem page we also want to RDMA the page to a remote cluster node.

We define a new pfn_mkwrite and do not reuse page_mkwrite because
  1 - The name ;-)
  2 - But mainly because it would take a very long and tedious
      audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
      users, to make sure they do not now CRASH. For example, the
      current DAX code (which this is for) would crash.
      If we wanted to reuse page_mkwrite, we would first need to
      patch all users so that they do not crash on a missing page.
      Then enable this patch. But even if I did that I would not
      sleep so well at night. Adding a new vector is the safest thing
      to do, and it is not that expensive: one extra pointer in a
      static function vector per driver.
      The new vector is also better for performance, because
      otherwise we would call every existing page_mkwrite vector in
      the kernel only for it to:
        check, ah, no page, do nothing, and return.

No need to call it from do_shared_fault because do_wp_page is called to
change pte permissions anyway.

Signed-off-by: Yigal Korman
Signed-off-by: Boaz Harrosh
CC: Kirill A. Shutemov
CC: Matthew Wilcox
CC: Jan Kara
CC: Andrew Morton
CC: Hugh Dickins
CC: Mel Gorman
CC: Konstantin Khlebnikov
CC: linux-mm@kvack.org
CC: Stable Tree
---
 Documentation/filesystems/Locking |  8 ++++++++
 include/linux/mm.h                |  3 +++
 mm/memory.c                       | 27 ++++++++++++++++++++++++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index f91926f..25f36e6 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -525,6 +525,7 @@ prototypes:
 	void (*close)(struct vm_area_struct*);
 	int (*fault)(struct vm_area_struct*, struct vm_fault *);
 	int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
+	int (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
 	int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
 
 locking rules:
@@ -534,6 +535,7 @@ close:		yes
 fault:		yes		can return with page locked
 map_pages:	yes
 page_mkwrite:	yes		can return with page locked
+pfn_mkwrite:	yes
 access:		yes
 
 	->fault() is called when a previously not present pte is about
@@ -560,6 +562,12 @@ the page has been truncated, the filesystem should not look up a new page
 like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
 will cause the VM to retry the fault.
 
+	->pfn_mkwrite() is the same as page_mkwrite but when the pte is
+VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is
+VM_FAULT_NOPAGE. Or one of the VM_FAULT_ERROR types. The default behavior
+after this call is to make the pte read-write, unless pfn_mkwrite()
+already touched the pte, in that case it is untouched.
+
 	->access() is called when get_user_pages() fails in
 access_process_vm(), typically used to debug a process through
 /proc/pid/mem or ptrace. This function is needed only for
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..85ba9c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -251,6 +251,9 @@ struct vm_operations_struct {
 	 * writable, if an error is returned it will cause a SIGBUS */
 	int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
 
+	/* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */
+	int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
+
 	/* called by access_process_vm when get_user_pages() fails, typically
 	 * for use by special VMAs that can switch between memory and hardware
 	 */
diff --git a/mm/memory.c b/mm/memory.c
index 97839f5..6029777 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1982,6 +1982,18 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 	return ret;
 }
 
+static int do_pfn_mkwrite(struct vm_area_struct *vma, unsigned long address)
+{
+	struct vm_fault vmf = {
+		.page = NULL,
+		.pgoff = linear_page_index(vma, address),
+		.virtual_address = (void __user *)(address & PAGE_MASK),
+		.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
+	};
+
+	return vma->vm_ops->pfn_mkwrite(vma, &vmf);
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2025,8 +2037,21 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * accounting on raw pfn maps.
 		 */
 		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
-				     (VM_WRITE|VM_SHARED))
+				     (VM_WRITE|VM_SHARED)) {
+			if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
+				pte_unmap_unlock(page_table, ptl);
+				ret = do_pfn_mkwrite(vma, address);
+				if (ret & VM_FAULT_ERROR)
+					return ret;
+				page_table = pte_offset_map_lock(mm, pmd,
+								 address, &ptl);
+				if (!pte_same(*page_table, orig_pte)) {
+					ret = 0;
+					goto unlock;
+				}
+			}
 			goto reuse;
+		}
 		goto gotten;
 	}
 
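
For readers who want to poke at the control flow outside the kernel, here is a rough user-space model of the do_wp_page() hunk above. It is only a sketch under stated assumptions. Every sim_* and SIM_* name, the mkwrite_* callbacks and the *_ops tables are hypothetical stand-ins invented for illustration and are not kernel API. A single unsigned long stands in for the pte and a bool for the pte lock. The SIM_FAULT_WRITE bit models what the reuse path is assumed to report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_FAULT_* bits; the values are arbitrary. */
#define SIM_FAULT_ERROR  0x1
#define SIM_FAULT_NOPAGE 0x2
#define SIM_FAULT_WRITE  0x4
#define SIM_PTE_WRITE    0x1UL

struct sim_vma;

struct sim_vm_ops {
	int (*pfn_mkwrite)(struct sim_vma *vma, unsigned long pgoff);
};

struct sim_vma {
	const struct sim_vm_ops *ops;
	unsigned long pte;	/* the one "pte" this model tracks */
	bool locked;		/* models the pte lock */
	int mkwrite_calls;	/* bookkeeping for the demo only */
};

static void sim_lock(struct sim_vma *vma)
{
	assert(!vma->locked);
	vma->locked = true;
}

static void sim_unlock(struct sim_vma *vma)
{
	assert(vma->locked);
	vma->locked = false;
}

/* Caller holds the lock and passes the pte value it observed. */
static int sim_wp_pfn(struct sim_vma *vma, unsigned long pgoff,
		      unsigned long orig_pte)
{
	int ret = 0;

	if (vma->ops && vma->ops->pfn_mkwrite) {
		/* The callback may sleep, so drop the lock around it. */
		sim_unlock(vma);
		ret = vma->ops->pfn_mkwrite(vma, pgoff);
		if (ret & SIM_FAULT_ERROR)
			return ret;	/* lock is not held here */
		sim_lock(vma);
		if (vma->pte != orig_pte) {
			/* Lost the race: leave the pte alone, return 0. */
			ret = 0;
			goto unlock;
		}
	}
	/* "reuse": make the existing mapping writable. */
	vma->pte |= SIM_PTE_WRITE;
	ret |= SIM_FAULT_WRITE;
unlock:
	sim_unlock(vma);
	return ret;
}

/* Well-behaved callback: e.g. would update c/mtime, then report NOPAGE. */
static int mkwrite_ok(struct sim_vma *vma, unsigned long pgoff)
{
	(void)pgoff;
	assert(!vma->locked);
	vma->mkwrite_calls++;
	return SIM_FAULT_NOPAGE;
}

/* Simulates the pte changing while the lock was dropped. */
static int mkwrite_racy(struct sim_vma *vma, unsigned long pgoff)
{
	(void)pgoff;
	vma->mkwrite_calls++;
	vma->pte = 0x2000;
	return SIM_FAULT_NOPAGE;
}

/* Simulates a callback that fails. */
static int mkwrite_fail(struct sim_vma *vma, unsigned long pgoff)
{
	(void)vma;
	(void)pgoff;
	return SIM_FAULT_ERROR;
}

static const struct sim_vm_ops ok_ops = { .pfn_mkwrite = mkwrite_ok };
static const struct sim_vm_ops racy_ops = { .pfn_mkwrite = mkwrite_racy };
static const struct sim_vm_ops fail_ops = { .pfn_mkwrite = mkwrite_fail };
static const struct sim_vm_ops no_ops = { .pfn_mkwrite = NULL };
```

To try it, take the lock with sim_lock(), then call sim_wp_pfn() with the pte value you observed. With ok_ops the pte gains the write bit. With racy_ops the call returns 0 and leaves the changed pte alone. With fail_ops the error comes straight back and the pte is untouched. With no_ops the model goes directly to the reuse step, as the pre-patch code did.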