From patchwork Fri Aug 5 11:03:28 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12937203
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Mike Kravetz,
    Muchun Song, Peter Xu, Peter Feiner, "Kirill A . Shutemov",
    stable@vger.kernel.org
Subject: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
Date: Fri, 5 Aug 2022 13:03:28 +0200
Message-Id: <20220805110329.80540-2-david@redhat.com>
In-Reply-To: <20220805110329.80540-1-david@redhat.com>
References: <20220805110329.80540-1-david@redhat.com>

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get
away with that because e.g., mprotect() should always do the right thing
and map all pages directly writable.
Looks like we were wrong:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

/* Writing "4" to clear_refs clears the soft-dirty bits of this process. */
static void clear_softdirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	const char *ctrl = "4";
	int ret;

	if (fd < 0) {
		fprintf(stderr, "open(clear_refs) failed\n");
		exit(1);
	}
	ret = write(fd, ctrl, strlen(ctrl));
	if (ret != strlen(ctrl)) {
		fprintf(stderr, "write(clear_refs) failed\n");
		exit(1);
	}
	close(fd);
}

int main(int argc, char **argv)
{
	char *map;
	int fd;

	/* O_CREAT requires an explicit mode argument. */
	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		fprintf(stderr, "open() failed\n");
		return -errno;
	}
	if (ftruncate(fd, HUGETLB_SIZE)) {
		fprintf(stderr, "ftruncate() failed\n");
		return -errno;
	}

	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	/* Fault in the hugetlb page writable. */
	*map = 0;

	if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
		fprintf(stderr, "mprotect() failed\n");
		return -errno;
	}

	clear_softdirty();

	if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
		fprintf(stderr, "mprotect() failed\n");
		return -errno;
	}

	/* This write now faults on the page that is still mapped R/O. */
	*map = 0;

	return 0;
}
--------------------------------------------------------------------------

The test above fails with SIGBUS when there is only a single free hugetlb page.
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:

 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0

(HugePages_Rsvd has wrapped around to 2^64 - 1, i.e., the reservation
counter underflowed.)

The reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected.

Let's teach vma_wants_writenotify() that hugetlb does not support
write-notify, including softdirty tracking.

Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: David Hildenbrand
Acked-by: Mike Kravetz
---
 mm/mmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..462a6b0344ac 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
 		return 0;
 
+	/*
+	 * Hugetlb does not require/support writenotify; especially, it does not
+	 * support softdirty tracking.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		return 0;
+
 	/* The backer wishes to know when pages are first written to? */
 	if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite))
 		return 1;

From patchwork Fri Aug 5 11:03:29 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12937202
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Mike Kravetz,
    Muchun Song, Peter Xu, Peter Feiner, "Kirill A . Shutemov"
Subject: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
Date: Fri, 5 Aug 2022 13:03:29 +0200
Message-Id: <20220805110329.80540-3-david@redhat.com>
In-Reply-To: <20220805110329.80540-1-david@redhat.com>
References: <20220805110329.80540-1-david@redhat.com>

Let's add a safety net if we ever get (again) a write-fault on a R/O-mapped
page in a shared mapping, in which case we simply have to map the
page writable.

VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
indicates that this was at least envisioned, but could never have worked
as expected.

This theoretically paves the way for softdirty tracking support in
hugetlb.

Tested without the fix for softdirty tracking.

Note that there is no need to do any kind of reservation in hugetlb_fault()
in this case ... because we already have a hugetlb page mapped R/O
that we will simply map writable and we are not dealing with COW/unsharing.

Signed-off-by: David Hildenbrand
---
 mm/hugetlb.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a18c071c294e..bbab7aa9d8f8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5233,6 +5233,16 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	VM_BUG_ON(unshare && (flags & FOLL_WRITE));
 	VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
 
+	/* Let's take out shared mappings first, this should be a rare event. */
+	if (unlikely(vma->vm_flags & VM_MAYSHARE)) {
+		if (unshare)
+			return 0;
+		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
+			return VM_FAULT_SIGSEGV;
+		set_huge_ptep_writable(vma, haddr, ptep);
+		return 0;
+	}
+
 	pte = huge_ptep_get(ptep);
 	old_page = pte_page(pte);
 
@@ -5767,12 +5777,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * If we are going to COW/unshare the mapping later, we examine the
 	 * pending reservations for this page now.  This will ensure that any
 	 * allocations necessary to record that reservation occur outside the
-	 * spinlock.  For private mappings, we also lookup the pagecache
-	 * page now as it is used to determine if a reservation has been
-	 * consumed.
+	 * spinlock.  Also lookup the pagecache page now as it is used to
+	 * determine if a reservation has been consumed.
 	 */
 	if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
-	    !huge_pte_write(entry)) {
+	    !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(entry)) {
 		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto out_mutex;
@@ -5780,9 +5789,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Just decrements count, does not deallocate */
 		vma_end_reservation(h, vma, haddr);
 
-		if (!(vma->vm_flags & VM_MAYSHARE))
-			pagecache_page = hugetlbfs_pagecache_page(h,
-								vma, haddr);
+		pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr);
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
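
For readers who want to reason about the shared-mapping branch that patch
2/2 adds to hugetlb_wp() without a kernel tree at hand, its decision order
can be modelled in plain userspace C. This is only a sketch under stated
assumptions: the MODEL_* flag values, the result enum, and the functions
model_hugetlb_wp() and model_selftest() are hypothetical names made up for
illustration, not kernel definitions. The real code additionally warns via
WARN_ON_ONCE() and upgrades the PTE with set_huge_ptep_writable().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; these are NOT the kernel's VM_* values. */
#define MODEL_VM_WRITE    (1u << 0)
#define MODEL_VM_MAYSHARE (1u << 1)

/* Hypothetical outcomes, one per exit path of the new branch. */
enum model_result {
	MODEL_MAP_WRITABLE,  /* shared + writable VMA: upgrade the R/O PTE */
	MODEL_NOTHING_TO_DO, /* unshare request on a shared mapping */
	MODEL_SIGSEGV,       /* write fault although the VMA lacks VM_WRITE */
	MODEL_COW_PATH,      /* private mapping: continue with COW/unshare */
};

/*
 * Mirrors the order of checks in the hunk: shared mappings are handled
 * first and never reach the COW/unshare logic, so no page is allocated
 * and no reservation is needed for them.
 */
static enum model_result model_hugetlb_wp(unsigned int vm_flags, bool unshare)
{
	if (vm_flags & MODEL_VM_MAYSHARE) {
		if (unshare)
			return MODEL_NOTHING_TO_DO;
		if (!(vm_flags & MODEL_VM_WRITE))
			return MODEL_SIGSEGV;
		return MODEL_MAP_WRITABLE;
	}
	return MODEL_COW_PATH;
}

/* Walks through all four exit paths once; returns 0 if they all hold. */
static int model_selftest(void)
{
	const unsigned int shared_rw = MODEL_VM_MAYSHARE | MODEL_VM_WRITE;

	assert(model_hugetlb_wp(shared_rw, false) == MODEL_MAP_WRITABLE);
	assert(model_hugetlb_wp(shared_rw, true) == MODEL_NOTHING_TO_DO);
	assert(model_hugetlb_wp(MODEL_VM_MAYSHARE, false) == MODEL_SIGSEGV);
	assert(model_hugetlb_wp(MODEL_VM_WRITE, false) == MODEL_COW_PATH);
	return 0;
}
```

The reproducer from patch 1/2 corresponds to the first case in
model_selftest(): a write fault on a shared, writable mapping whose page
is still mapped R/O, which the new branch resolves by mapping the existing
page writable instead of entering the COW path.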