From patchwork Fri Nov 8 16:20:39 2024
From: Fuad Tabba <tabba@google.com>
Date: Fri, 8 Nov 2024 16:20:39 +0000
Subject: [RFC PATCH v1 09/10] mm: Use owner_ops on folio_put for zone device
 pages
Message-ID: <20241108162040.159038-10-tabba@google.com>
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
 jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
 simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com, seanjc@google.com,
 willy@infradead.org, jgg@nvidia.com, jhubbard@nvidia.com,
 ackerleytng@google.com, vannapurve@google.com, mail@maciej.szmigiero.name,
 kirill.shutemov@linux.intel.com, quic_eberman@quicinc.com, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
 tabba@google.com

Now that we have the folio_owner_ops callback, use it for zone device
pages instead of using a dedicated callback.

Note that struct dev_pagemap (pgmap) in struct page is overlaid with
struct folio owner_ops. Therefore, make struct dev_pagemap contain an
instance of struct folio_owner_ops, so that it can be handled the same
way as folio owner_ops.

Also note that, although struct dev_pagemap_ops has a page_free()
function, it has neither the same intention nor the same behavior as
the folio_owner_ops free() callback: page_free() is an optional
callback that informs drivers using ZONE_DEVICE that a page is being
freed.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/memremap.h |  8 +++++++
 include/linux/mm_types.h | 16 ++++++++++++--
 mm/internal.h            |  1 -
 mm/memremap.c            | 44 --------------------------------------
 mm/mm_init.c             | 46 ++++++++++++++++++++++++++++++++++++++++
 mm/swap.c                | 18 ++--------------
 6 files changed, 70 insertions(+), 63 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 060e27b6aee0..5b68bbc588a3 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -106,6 +106,7 @@ struct dev_pagemap_ops {
 
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
+ * @folio_ops: method table for folio operations.
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
  * @ref: reference count that pins the devm_memremap_pages() mapping
  * @done: completion for @ref
@@ -125,6 +126,7 @@ struct dev_pagemap_ops {
  * @ranges: array of ranges to be mapped when nr_range > 1
  */
 struct dev_pagemap {
+	struct folio_owner_ops folio_ops;
 	struct vmem_altmap altmap;
 	struct percpu_ref ref;
 	struct completion done;
@@ -140,6 +142,12 @@ struct dev_pagemap {
 	};
 };
 
+/*
+ * The folio_owner_ops structure needs to be first since pgmap in struct page is
+ * overlaid with owner_ops in struct folio.
+ */
+static_assert(offsetof(struct dev_pagemap, folio_ops) == 0);
+
 static inline bool pgmap_has_memory_failure(struct dev_pagemap *pgmap)
 {
 	return pgmap->ops && pgmap->ops->memory_failure;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 27075ea24e67..a72fda20d5e9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -427,6 +427,7 @@ FOLIO_MATCH(lru, lru);
 FOLIO_MATCH(mapping, mapping);
 FOLIO_MATCH(compound_head, lru);
 FOLIO_MATCH(compound_head, owner_ops);
+FOLIO_MATCH(pgmap, owner_ops);
 FOLIO_MATCH(index, index);
 FOLIO_MATCH(private, private);
 FOLIO_MATCH(_mapcount, _mapcount);
@@ -618,15 +619,26 @@ static inline const struct folio_owner_ops *folio_get_owner_ops(struct folio *folio)
 
 /*
  * Get the page dev_pagemap pgmap pointer.
+ *
+ * The page pgmap is overlaid with the folio owner_ops, where bit 1 is used to
+ * indicate that the page/folio has owner ops. The dev_pagemap contains
+ * owner_ops and is handled the same way. The getter returns a sanitized
+ * pointer.
  */
-#define page_get_pgmap(page) ((page)->pgmap)
+#define page_get_pgmap(page) \
+	((struct dev_pagemap *)((unsigned long)(page)->pgmap & ~FOLIO_OWNER_OPS))
 
 /*
  * Set the page dev_pagemap pgmap pointer.
+ *
+ * The page pgmap is overlaid with the folio owner_ops, where bit 1 is used to
+ * indicate that the page/folio has owner ops. The dev_pagemap contains
+ * owner_ops and is handled the same way. The setter sets bit 1 to indicate
+ * that the page has owner_ops.
  */
 static inline void page_set_pgmap(struct page *page, struct dev_pagemap *pgmap)
 {
-	page->pgmap = pgmap;
+	page->pgmap = (struct dev_pagemap *)((unsigned long)pgmap | FOLIO_OWNER_OPS);
 }
 
 struct page_frag_cache {
diff --git a/mm/internal.h b/mm/internal.h
index 5a7302baeed7..a041247bed10 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1262,7 +1262,6 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 			unsigned long addr, int *flags, bool writable,
 			int *last_cpupid);
 
-void free_zone_device_folio(struct folio *folio);
 int migrate_device_coherent_folio(struct folio *folio);
 
 struct vm_struct *__get_vm_area_node(unsigned long size,
diff --git a/mm/memremap.c b/mm/memremap.c
index 931bc85da1df..9fd5f57219eb 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -456,50 +456,6 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 }
 EXPORT_SYMBOL_GPL(get_dev_pagemap);
 
-void free_zone_device_folio(struct folio *folio)
-{
-	struct dev_pagemap *pgmap = page_get_pgmap(&folio->page);
-
-	if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free))
-		return;
-
-	mem_cgroup_uncharge(folio);
-
-	/*
-	 * Note: we don't expect anonymous compound pages yet. Once supported
-	 * and we could PTE-map them similar to THP, we'd have to clear
-	 * PG_anon_exclusive on all tail pages.
-	 */
-	if (folio_test_anon(folio)) {
-		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
-		__ClearPageAnonExclusive(folio_page(folio, 0));
-	}
-
-	/*
-	 * When a device managed page is freed, the folio->mapping field
-	 * may still contain a (stale) mapping value. For example, the
-	 * lower bits of folio->mapping may still identify the folio as an
-	 * anonymous folio. Ultimately, this entire field is just stale
-	 * and wrong, and it will cause errors if not cleared.
-	 *
-	 * For other types of ZONE_DEVICE pages, migration is either
-	 * handled differently or not done at all, so there is no need
-	 * to clear folio->mapping.
-	 */
-	folio->mapping = NULL;
-	pgmap->ops->page_free(folio_page(folio, 0));
-
-	if (pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the refcount to 1 to prepare for handing out the page
-		 * again.
-		 */
-		folio_set_count(folio, 1);
-	else
-		put_dev_pagemap(pgmap);
-}
-
 void zone_device_page_init(struct page *page)
 {
 	/*
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 279cdaebfd2b..47c1f8fd4914 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -974,6 +974,51 @@ static void __init memmap_init(void)
 }
 
 #ifdef CONFIG_ZONE_DEVICE
+
+static void free_zone_device_folio(struct folio *folio)
+{
+	struct dev_pagemap *pgmap = page_get_pgmap(&folio->page);
+
+	if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free))
+		return;
+
+	mem_cgroup_uncharge(folio);
+
+	/*
+	 * Note: we don't expect anonymous compound pages yet. Once supported
+	 * and we could PTE-map them similar to THP, we'd have to clear
+	 * PG_anon_exclusive on all tail pages.
+	 */
+	if (folio_test_anon(folio)) {
+		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
+		__ClearPageAnonExclusive(folio_page(folio, 0));
+	}
+
+	/*
+	 * When a device managed page is freed, the folio->mapping field
+	 * may still contain a (stale) mapping value. For example, the
+	 * lower bits of folio->mapping may still identify the folio as an
+	 * anonymous folio. Ultimately, this entire field is just stale
+	 * and wrong, and it will cause errors if not cleared.
+	 *
+	 * For other types of ZONE_DEVICE pages, migration is either
+	 * handled differently or not done at all, so there is no need
+	 * to clear folio->mapping.
+	 */
+	folio->mapping = NULL;
+	pgmap->ops->page_free(folio_page(folio, 0));
+
+	if (pgmap->type != MEMORY_DEVICE_PRIVATE &&
+	    pgmap->type != MEMORY_DEVICE_COHERENT)
+		/*
+		 * Reset the refcount to 1 to prepare for handing out the page
+		 * again.
+		 */
+		folio_set_count(folio, 1);
+	else
+		put_dev_pagemap(pgmap);
+}
+
 static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 					  unsigned long zone_idx, int nid,
 					  struct dev_pagemap *pgmap)
@@ -995,6 +1040,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * and zone_device_data. It is a bug if a ZONE_DEVICE page is
 	 * ever freed or placed on a driver-private list.
 	 */
+	pgmap->folio_ops.free = free_zone_device_folio;
 	page_set_pgmap(page, pgmap);
 	page->zone_device_data = NULL;
 
diff --git a/mm/swap.c b/mm/swap.c
index 767ff6d8f47b..d2578465e270 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -117,11 +117,6 @@ void __folio_put(struct folio *folio)
 		return;
 	}
-	if (unlikely(folio_is_zone_device(folio))) {
-		free_zone_device_folio(folio);
-		return;
-	}
-
 	if (folio_test_hugetlb(folio)) {
 		free_huge_folio(folio);
 		return;
 	}
@@ -947,20 +942,11 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 			if (lruvec) {
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			if (folio_ref_sub_and_test(folio, nr_refs))
-				owner_ops->free(folio);
-			continue;
-		}
-
-		if (folio_is_zone_device(folio)) {
-			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
-				lruvec = NULL;
-			}
+			/* fenced by folio_is_zone_device() */
 			if (put_devmap_managed_folio_refs(folio, nr_refs))
 				continue;
 			if (folio_ref_sub_and_test(folio, nr_refs))
-				free_zone_device_folio(folio);
+				owner_ops->free(folio);
 			continue;
 		}