From patchwork Wed Jan 29 11:54:07 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13953680
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand,
 Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
 Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 09/12] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() Date: Wed, 29 Jan 2025 12:54:07 +0100 Message-ID: <20250129115411.2077152-10-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: kLdNtL7dRGQ9lOqCfPieft0NJn6-Xjve2ic_CYYXeaI_1738151680 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true X-Rspamd-Queue-Id: 6970A140004 X-Stat-Signature: zo6h343qorkd48jr8bdxnkftrhz6bezf X-Rspam-User: X-Rspamd-Server: rspam12 X-HE-Tag: 1738151682-629642 X-HE-Meta: U2FsdGVkX1/PJKwI3M8UhgWAUrYszSw4lBvizHinl7OgrmJqHlPt9tDjWt0tORUEGh/phgrSospaQKAcN0OU4fjR/jl6JH5bkEJbwkBvJ9uTMVsenY5HeLtUsGn7mHSC+x+Uw5mPqBGMihbGV7h5ApcnvQHhqJZhWiYKwMMYSFXEXiFSxWqvS8rA9TYT/Tg/U3kNAvpaUcpyYXuXKhihuI27OQ+XCzoYZ7mVtNqIeP7bREp4bRhwT8ijZafgHRo341rLvaHYDeKSsNYPNhfyp1Am9p05P2TlBrQBDhaTfO6DnhI0zMguGAaGHpOtXAFcpvaVwjUX5+VwQHcJjR/J73yyF9ksGDvdUmMZjcsJNPPfFDuhPZV6Qq0+9iwahxryk+NATT9S385qw9x0AsBryWJ3127IBTdPQh71PTEt018dk3FWmtsaLMWjn8RUnlzlCr8zPGA2efqMGkG7m0XXogf3VJUsSgySvmqrBbTMIrLaFHUB4GIyVD8xxoOhtouPebCF4FFOH1vtP65BwLvBDaJD/4pDWNwT/Ry0kYuBZgenxxsC5RorU4VWkT0zF2KQVeRTDZJMyXGBcKIyP1CjS9r59u5SzKolrR9fVIav6tej46AgVYn4to8kTZ9deUcdHH5/4DEQmQpvRcBGChnCiNX9susVCajl6a2eA5xdCdlHsPfQ3CqSsw9hLFZtGVZsanU5lznFvaqRvzhnH95prc1zrPIKrpzJmpO7UFkrmm3VDiJHTDP8PS+hJzyYwDLvk4vO5K15fEghxfxlNOmlwH2ngqjOibNmQ9SUFj1Yx3doAPhg6o0AAdFTFEhrm2cygBB2YuOF+y74Fd/rPFeS7fRCYkrkl84JZn/8PIgH6mZXUoxGIlJSJqgfXpudpo24ZA7bceaKXgVwLuD3+B5ieej9Chgj7h9SpJAZrSgIwRin+Q1d1UkcH4E2g9S4qU/IkpavqCMUhlzpQl8/K3L 1NGxnruN wa2Swx4v+92Ei1azGV+M3ObEwRxtoxJvunGQHgyRbMZnWquZG3wrlHDeXVM3IxQzlO+rY84oNyG2gJHrFko2JxKbKjp/za807yVr9EauhyWA1u+DkdkZ2fWZblTSNxM06JRyQq29J30bDQHOU9Yld2apsGCG0G0pawxX+OOJY3B7bZIu+I/yQkhVPHuIoXPD5eucCoszmZdm1DkkLZNb9JmLW54oQuR84VNViFsdQ/he76X+ktUiwCNNA5cWBMEnbLg/D4daUvp4lrBkZEk4g15r6HiO7o08NWZwbNAYn6v7oYQRo0X1yJRRbMVmltsObVZB9r3Uf/BxzB4Ajc5qfHoqiXQCvJQMgf4mf/1EYBM7Sz1Jey/4Ujk6P69q2yc98VLz8wUvVYa5BrvV3Gp/kiiTvPY3BfrCxtTE5bi1hJZqqfGy9wqHa1Dt3akNRqRn12rNoov45y5TOFkUO3Kvges5ZU0ULlD6Vu9o/g1GUTnIYvWReWHrc0upCz8fXMcJzNa/meWy9l/2u41cZJiuGnbiSjw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000002, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_migrate_one() is not prepared for that, so teach it about these non-present nonswap PTEs. We already handle device-private entries by specializing on the folio, so we can reshuffle that code to make it work on the non-present nonswap PTEs instead. Get rid of most folio_is_device_private() handling, except when handling HWPoison. It's unclear what the right thing to do here is. Note that we could currently only run into this case with device-exclusive entries on THPs; but as we have a refcount vs. mapcount inbalance, folio splitting etc. will just bail out early and not even try migrating. For order-0 folios, we still adjust the mapcount on conversion to device-exclusive, making the rmap walk abort early (folio_mapcount() == 0 and breaking swapout). We'll fix that next, now that try_to_migrate_one() can handle it. 

Further note that try_to_migrate() calls MMU notifiers and holds the folio
lock, so any device-exclusive users should be properly prepared for this
device-exclusive PTE to "vanish".

Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand
---
 mm/rmap.c | 125 ++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 74 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 12900f367a2a..903a78e60781 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2040,9 +2040,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	bool anon_exclusive, writable, ret = true;
 	pte_t pteval;
 	struct page *subpage;
-	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 	unsigned long pfn;
@@ -2109,24 +2109,20 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
 
-		pfn = pte_pfn(ptep_get(pvmw.pte));
-
-		if (folio_is_zone_device(folio)) {
-			/*
-			 * Our PTE is a non-present device exclusive entry and
-			 * calculating the subpage as for the common case would
-			 * result in an invalid pointer.
-			 *
-			 * Since only PAGE_SIZE pages can currently be
-			 * migrated, just set it to page. This will need to be
-			 * changed when hugepage migrations to device private
-			 * memory are supported.
-			 */
-			VM_BUG_ON_FOLIO(folio_nr_pages(folio) > 1, folio);
-			subpage = &folio->page;
+		/*
+		 * We can end up here with selected non-swap entries that
+		 * actually map pages similar to PROT_NONE; see
+		 * page_vma_mapped_walk()->check_pte().
+		 */
+		pteval = ptep_get(pvmw.pte);
+		if (likely(pte_present(pteval))) {
+			pfn = pte_pfn(pteval);
 		} else {
-			subpage = folio_page(folio, pfn - folio_pfn(folio));
+			pfn = swp_offset_pfn(pte_to_swp_entry(pteval));
+			VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 		}
+
+		subpage = folio_page(folio, pfn - folio_pfn(folio));
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
@@ -2182,7 +2178,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			}
 			/* Nuke the hugetlb page table entry */
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
-		} else {
+			if (pte_dirty(pteval))
+				folio_mark_dirty(folio);
+			writable = pte_write(pteval);
+		} else if (likely(pte_present(pteval))) {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
 			if (should_defer_flush(mm, flags)) {
@@ -2200,54 +2199,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			} else {
 				pteval = ptep_clear_flush(vma, address, pvmw.pte);
 			}
+			if (pte_dirty(pteval))
+				folio_mark_dirty(folio);
+			writable = pte_write(pteval);
+		} else {
+			pte_clear(mm, address, pvmw.pte);
+			writable = is_writable_device_private_entry(pte_to_swp_entry(pteval));
 		}
 
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
+		VM_WARN_ON_FOLIO(writable && folio_test_anon(folio) &&
+				 !anon_exclusive, folio);
 
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
 
-		if (folio_is_device_private(folio)) {
-			unsigned long pfn = folio_pfn(folio);
-			swp_entry_t entry;
-			pte_t swp_pte;
-
-			if (anon_exclusive)
-				WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio,
-									   subpage));
-
-			/*
-			 * Store the pfn of the page in a special migration
-			 * pte. do_swap_page() will wait until the migration
-			 * pte is removed and then restart fault handling.
-			 */
-			entry = pte_to_swp_entry(pteval);
-			if (is_writable_device_private_entry(entry))
-				entry = make_writable_migration_entry(pfn);
-			else if (anon_exclusive)
-				entry = make_readable_exclusive_migration_entry(pfn);
-			else
-				entry = make_readable_migration_entry(pfn);
-			swp_pte = swp_entry_to_pte(entry);
-
-			/*
-			 * pteval maps a zone device page and is therefore
-			 * a swap pte.
-			 */
-			if (pte_swp_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_swp_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
-			trace_set_migration_pte(pvmw.address, pte_val(swp_pte),
-						folio_order(folio));
-			/*
-			 * No need to invalidate here it will synchronize on
-			 * against the special swap migration pte.
-			 */
-		} else if (PageHWPoison(subpage)) {
+		if (PageHWPoison(subpage) && !folio_is_device_private(folio)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(folio_nr_pages(folio), mm);
@@ -2257,8 +2223,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				dec_mm_counter(mm, mm_counter(folio));
 				set_pte_at(mm, address, pvmw.pte, pteval);
 			}
-
-		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+		} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
+			   !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no
 			 * interest anymore. Simply discard the pte, vmscan
@@ -2274,6 +2240,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			swp_entry_t entry;
 			pte_t swp_pte;
 
+			/*
+			 * arch_unmap_one() is expected to be a NOP on
+			 * architectures where we could have non-swp entries
+			 * here.
+			 */
 			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
 				if (folio_test_hugetlb(folio))
 					set_huge_pte_at(mm, address, pvmw.pte,
@@ -2284,8 +2255,6 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
-			VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
-				       !anon_exclusive, subpage);
 
 			/* See folio_try_share_anon_rmap_pte(): clear PTE first. */
 			if (folio_test_hugetlb(folio)) {
@@ -2310,7 +2279,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * pte. do_swap_page() will wait until the migration
 			 * pte is removed and then restart fault handling.
 			 */
-			if (pte_write(pteval))
+			if (writable)
 				entry = make_writable_migration_entry(
 							page_to_pfn(subpage));
 			else if (anon_exclusive)
@@ -2319,15 +2288,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			else
 				entry = make_readable_migration_entry(
 							page_to_pfn(subpage));
-			if (pte_young(pteval))
-				entry = make_migration_entry_young(entry);
-			if (pte_dirty(pteval))
-				entry = make_migration_entry_dirty(entry);
-			swp_pte = swp_entry_to_pte(entry);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			if (likely(pte_present(pteval))) {
+				if (pte_young(pteval))
+					entry = make_migration_entry_young(entry);
+				if (pte_dirty(pteval))
+					entry = make_migration_entry_dirty(entry);
+				swp_pte = swp_entry_to_pte(entry);
+				if (pte_soft_dirty(pteval))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				if (pte_uffd_wp(pteval))
+					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			} else {
+				swp_pte = swp_entry_to_pte(entry);
+				if (pte_swp_soft_dirty(pteval))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				if (pte_swp_uffd_wp(pteval))
+					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			}
 			if (folio_test_hugetlb(folio))
 				set_huge_pte_at(mm, address, pvmw.pte, swp_pte,
 						hsz);