From patchwork Fri Nov 8 16:20:31 2024
Date: Fri, 8 Nov 2024 16:20:31 +0000
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
Message-ID: <20241108162040.159038-2-tabba@google.com>
Subject: [RFC PATCH v1 01/10] mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb()
From: Fuad Tabba
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
    jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
    simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com,
    seanjc@google.com, willy@infradead.org, jgg@nvidia.com,
    jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com,
    mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
    quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    tabba@google.com

From: David Hildenbrand

Let's make the function name match "folio_isolate_lru()", and add some
kernel doc.

Signed-off-by: David Hildenbrand
Signed-off-by: Fuad Tabba
---
 include/linux/hugetlb.h |  4 ++--
 mm/gup.c                |  2 +-
 mm/hugetlb.c            | 23 ++++++++++++++++++++---
 mm/mempolicy.c          |  2 +-
 mm/migrate.c            |  6 +++---
 5 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe8615bb6..b0cf8dbfeb6a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -153,7 +153,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						vm_flags_t vm_flags);
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
-bool isolate_hugetlb(struct folio *folio, struct list_head *list);
+bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
 int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
 int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 				bool *migratable_cleared);
@@ -414,7 +414,7 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 	return NULL;
 }
 
-static inline bool isolate_hugetlb(struct folio *folio, struct list_head *list)
+static inline bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list)
 {
 	return false;
 }
diff --git a/mm/gup.c b/mm/gup.c
index 28ae330ec4dd..40bbcffca865 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2301,7 +2301,7 @@ static unsigned long collect_longterm_unpinnable_folios(
 			continue;
 
 		if (folio_test_hugetlb(folio)) {
-			isolate_hugetlb(folio, movable_folio_list);
+			folio_isolate_hugetlb(folio, movable_folio_list);
 			continue;
 		}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cec4b121193f..e17bb2847572 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2868,7 +2868,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
 		 * Fail with -EBUSY if not possible.
 		 */
 		spin_unlock_irq(&hugetlb_lock);
-		isolated = isolate_hugetlb(old_folio, list);
+		isolated = folio_isolate_hugetlb(old_folio, list);
 		ret = isolated ? 0 : -EBUSY;
 		spin_lock_irq(&hugetlb_lock);
 		goto free_new;
@@ -2953,7 +2953,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	if (hstate_is_gigantic(h))
 		return -ENOMEM;
 
-	if (folio_ref_count(folio) && isolate_hugetlb(folio, list))
+	if (folio_ref_count(folio) && folio_isolate_hugetlb(folio, list))
 		ret = 0;
 	else if (!folio_ref_count(folio))
 		ret = alloc_and_dissolve_hugetlb_folio(h, folio, list);
@@ -7396,7 +7396,24 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
-bool isolate_hugetlb(struct folio *folio, struct list_head *list)
+/**
+ * folio_isolate_hugetlb: try to isolate an allocated hugetlb folio
+ * @folio: the folio to isolate
+ * @list: the list to add the folio to on success
+ *
+ * Isolate an allocated (refcount > 0) hugetlb folio, marking it as
+ * isolated/non-migratable, and moving it from the active list to the
+ * given list.
+ *
+ * Isolation will fail if @folio is not an allocated hugetlb folio, or if
+ * it is already isolated/non-migratable.
+ *
+ * On success, an additional folio reference is taken that must be dropped
+ * using folio_putback_active_hugetlb() to undo the isolation.
+ *
+ * Return: True if isolation worked, otherwise False.
+ */
+bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list)
 {
 	bool ret = true;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..41bdff67757c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -647,7 +647,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
 	 */
 	if ((flags & MPOL_MF_MOVE_ALL) ||
 	    (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
-		if (!isolate_hugetlb(folio, qp->pagelist))
+		if (!folio_isolate_hugetlb(folio, qp->pagelist))
 			qp->nr_failed++;
 unlock:
 	spin_unlock(ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index dfb5eba3c522..55585b5f57ec 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -136,7 +136,7 @@ static void putback_movable_folio(struct folio *folio)
  *
  * This function shall be used whenever the isolated pageset has been
  * built from lru, balloon, hugetlbfs page. See isolate_migratepages_range()
- * and isolate_hugetlb().
+ * and folio_isolate_hugetlb().
  */
 void putback_movable_pages(struct list_head *l)
 {
@@ -177,7 +177,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list)
 	bool isolated, lru;
 
 	if (folio_test_hugetlb(folio))
-		return isolate_hugetlb(folio, list);
+		return folio_isolate_hugetlb(folio, list);
 
 	lru = !__folio_test_movable(folio);
 	if (lru)
@@ -2208,7 +2208,7 @@ static int __add_folio_for_migration(struct folio *folio, int node,
 		return -EACCES;
 
 	if (folio_test_hugetlb(folio)) {
-		if (isolate_hugetlb(folio, pagelist))
+		if (folio_isolate_hugetlb(folio, pagelist))
 			return 1;
 	} else if (folio_isolate_lru(folio)) {
 		list_add_tail(&folio->lru, pagelist);

From patchwork Fri Nov 8 16:20:32 2024
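A minimal caller sketch of the renamed API (illustrative only, not part
of the patch; the helper name is made up, and at this point in the
series the undo side is still called folio_putback_active_hugetlb(),
renamed in patch 03):

	static void hugetlb_isolate_sketch(struct folio *folio)
	{
		LIST_HEAD(movable_folio_list);

		if (!folio_test_hugetlb(folio))
			return;

		if (folio_isolate_hugetlb(folio, &movable_folio_list)) {
			/*
			 * Success: the folio left the hstate's active list,
			 * is marked non-migratable, and we hold an extra
			 * reference that only the putback call drops.
			 */
			folio_putback_active_hugetlb(folio);
		}
	}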
Date: Fri, 8 Nov 2024 16:20:32 +0000
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
Message-ID: <20241108162040.159038-3-tabba@google.com>
Subject: [RFC PATCH v1 02/10] mm/migrate: don't call folio_putback_active_hugetlb() on dst hugetlb folio
From: Fuad Tabba
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
    jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
    simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com,
    seanjc@google.com, willy@infradead.org, jgg@nvidia.com,
    jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com,
    mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
    quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    tabba@google.com

From: David Hildenbrand

We replaced a simple put_page() by a putback_active_hugepage() call in
commit 3aaa76e125c1 ("mm: migrate: hugetlb: putback destination hugepage
to active list"), to set the "active" flag on the dst hugetlb folio. We
have since decoupled the "active" list from that flag, which is now
called "migratable".

Calling "putback" on something that was never isolated is weird and not
future proof, especially if we might reach that path when migration
failed and we just want to free the freshly allocated hugetlb folio.

Let's simply set the "migratable" flag in move_hugetlb_state(), where we
know that allocation succeeded, and use a simple folio_put() to return
our reference.

Do we need the hugetlb_lock for setting that flag? Staring at other
users of folio_set_hugetlb_migratable(), it does not look like it. After
all, the dst folio should already be on the active list, and we are not
modifying that list.

Signed-off-by: David Hildenbrand
Signed-off-by: Fuad Tabba
---
 mm/hugetlb.c | 5 +++++
 mm/migrate.c | 8 ++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e17bb2847572..da3fe1840ab8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7508,6 +7508,11 @@ void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason)
 		}
 		spin_unlock_irq(&hugetlb_lock);
 	}
+	/*
+	 * Our old folio is isolated and has "migratable" cleared until it
+	 * is putback. As migration succeeded, set the new folio "migratable".
+	 */
+	folio_set_hugetlb_migratable(new_folio);
 }
 
 static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
diff --git a/mm/migrate.c b/mm/migrate.c
index 55585b5f57ec..b129dc41c140 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,14 +1547,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 		list_move_tail(&src->lru, ret);
 
 	/*
-	 * If migration was not successful and there's a freeing callback, use
-	 * it. Otherwise, put_page() will drop the reference grabbed during
-	 * isolation.
+	 * If migration was not successful and there's a freeing callback,
+	 * return the folio to that special allocator. Otherwise, simply drop
+	 * our additional reference.
 	 */
 	if (put_new_folio)
 		put_new_folio(dst, private);
 	else
-		folio_putback_active_hugetlb(dst);
+		folio_put(dst);
 
 	return rc;
 }

From patchwork Fri Nov 8 16:20:33 2024
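The resulting dst reference flow, condensed into one sketch
(illustrative; it mirrors the hunks above rather than adding new code,
and elides the unrelated parts of unmap_and_move_huge_page()):

	/* dst carries one reference from get_new_folio() at allocation. */
	if (rc == MIGRATEPAGE_SUCCESS)
		move_hugetlb_state(src, dst, reason);	/* sets "migratable" on dst */
	...
	if (put_new_folio)
		put_new_folio(dst, private);	/* return dst to its special allocator */
	else
		folio_put(dst);			/* just drop the allocation reference */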
Date: Fri, 8 Nov 2024 16:20:33 +0000
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
Message-ID: <20241108162040.159038-4-tabba@google.com>
Subject: [RFC PATCH v1 03/10] mm/hugetlb: rename "folio_putback_active_hugetlb()" to "folio_putback_hugetlb()"
From: Fuad Tabba
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
    jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
    simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com,
    seanjc@google.com, willy@infradead.org, jgg@nvidia.com,
    jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com,
    mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
    quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    tabba@google.com

From: David Hildenbrand

Now that folio_putback_active_hugetlb() is only called on folios that
were previously isolated through folio_isolate_hugetlb(), let's rename
it to match folio_putback_lru().

Add some kernel doc to clarify how this function is supposed to be used.
Signed-off-by: David Hildenbrand
Signed-off-by: Fuad Tabba
---
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 15 +++++++++++++--
 mm/migrate.c            |  6 +++---
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b0cf8dbfeb6a..e846d7dac77c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -157,7 +157,7 @@ bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
 int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
 int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 				bool *migratable_cleared);
-void folio_putback_active_hugetlb(struct folio *folio);
+void folio_putback_hugetlb(struct folio *folio);
 void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
 void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
@@ -430,7 +430,7 @@ static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return 0;
 }
 
-static inline void folio_putback_active_hugetlb(struct folio *folio)
+static inline void folio_putback_hugetlb(struct folio *folio)
 {
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index da3fe1840ab8..d58bd815fdf2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7409,7 +7409,7 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
  * it is already isolated/non-migratable.
  *
  * On success, an additional folio reference is taken that must be dropped
- * using folio_putback_active_hugetlb() to undo the isolation.
+ * using folio_putback_hugetlb() to undo the isolation.
  *
  * Return: True if isolation worked, otherwise False.
  */
@@ -7461,7 +7461,18 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return ret;
 }
 
-void folio_putback_active_hugetlb(struct folio *folio)
+/**
+ * folio_putback_hugetlb: unisolate a hugetlb folio
+ * @folio: the isolated hugetlb folio
+ *
+ * Putback/un-isolate the hugetlb folio that was previously isolated using
+ * folio_isolate_hugetlb(): marking it non-isolated/migratable and putting it
+ * back onto the active list.
+ *
+ * Will drop the additional folio reference obtained through
+ * folio_isolate_hugetlb().
+ */
+void folio_putback_hugetlb(struct folio *folio)
 {
 	spin_lock_irq(&hugetlb_lock);
 	folio_set_hugetlb_migratable(folio);
diff --git a/mm/migrate.c b/mm/migrate.c
index b129dc41c140..89292d131148 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -145,7 +145,7 @@ void putback_movable_pages(struct list_head *l)
 
 	list_for_each_entry_safe(folio, folio2, l, lru) {
 		if (unlikely(folio_test_hugetlb(folio))) {
-			folio_putback_active_hugetlb(folio);
+			folio_putback_hugetlb(folio);
 			continue;
 		}
 		list_del(&folio->lru);
@@ -1459,7 +1459,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 
 	if (folio_ref_count(src) == 1) {
 		/* page was freed from under us. So we are done. */
-		folio_putback_active_hugetlb(src);
+		folio_putback_hugetlb(src);
 		return MIGRATEPAGE_SUCCESS;
 	}
 
@@ -1542,7 +1542,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	folio_unlock(src);
 out:
 	if (rc == MIGRATEPAGE_SUCCESS)
-		folio_putback_active_hugetlb(src);
+		folio_putback_hugetlb(src);
 	else if (rc != -EAGAIN)
 		list_move_tail(&src->lru, ret);

From patchwork Fri Nov 8 16:20:34 2024
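With the rename, the pairing reads like the LRU API it mimics; a short
sketch (hypothetical helper, not from the patch):

	static void hugetlb_isolation_cycle_sketch(struct folio *folio)
	{
		LIST_HEAD(list);

		/*
		 * folio_isolate_hugetlb()/folio_putback_hugetlb() now pair
		 * like folio_isolate_lru()/folio_putback_lru(): the putback
		 * side re-sets "migratable", returns the folio to the
		 * active list, and drops the isolation reference.
		 */
		if (folio_isolate_hugetlb(folio, &list))
			folio_putback_hugetlb(folio);
	}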
Date: Fri, 8 Nov 2024 16:20:34 +0000
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
Message-ID: <20241108162040.159038-5-tabba@google.com>
Subject: [RFC PATCH v1 04/10] mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios
From: Fuad Tabba
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
    jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
    simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com,
    seanjc@google.com, willy@infradead.org, jgg@nvidia.com,
    jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com,
    mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
    quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    tabba@google.com

From: David Hildenbrand

Let's convert hugetlb_cgroup_css_offline() and
hugetlb_cgroup_move_parent() to work on folios. hugepage_activelist
contains folios, not pages.

While at it, rename page_hcg simply to hcg, removing most of the "page"
terminology.

Signed-off-by: David Hildenbrand
Signed-off-by: Fuad Tabba
---
 mm/hugetlb_cgroup.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index d8d0e665caed..1bdeaf25f640 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -195,24 +195,23 @@ static void hugetlb_cgroup_css_free(struct cgroup_subsys_state *css)
  * cannot fail.
  */
 static void hugetlb_cgroup_move_parent(int idx, struct hugetlb_cgroup *h_cg,
-				       struct page *page)
+				       struct folio *folio)
 {
 	unsigned int nr_pages;
 	struct page_counter *counter;
-	struct hugetlb_cgroup *page_hcg;
+	struct hugetlb_cgroup *hcg;
 	struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(h_cg);
-	struct folio *folio = page_folio(page);
 
-	page_hcg = hugetlb_cgroup_from_folio(folio);
+	hcg = hugetlb_cgroup_from_folio(folio);
 	/*
	 * We can have pages in active list without any cgroup
	 * ie, hugepage with less than 3 pages. We can safely
	 * ignore those pages.
	 */
-	if (!page_hcg || page_hcg != h_cg)
+	if (!hcg || hcg != h_cg)
 		goto out;
 
-	nr_pages = compound_nr(page);
+	nr_pages = folio_nr_pages(folio);
 	if (!parent) {
 		parent = root_h_cgroup;
 		/* root has no limit */
@@ -235,13 +234,13 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css);
 	struct hstate *h;
-	struct page *page;
+	struct folio *folio;
 
 	do {
 		for_each_hstate(h) {
 			spin_lock_irq(&hugetlb_lock);
-			list_for_each_entry(page, &h->hugepage_activelist, lru)
-				hugetlb_cgroup_move_parent(hstate_index(h), h_cg, page);
+			list_for_each_entry(folio, &h->hugepage_activelist, lru)
+				hugetlb_cgroup_move_parent(hstate_index(h), h_cg, folio);
 
 			spin_unlock_irq(&hugetlb_lock);
 		}

From patchwork Fri Nov 8 16:20:35 2024
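The conversion follows the usual page-to-folio idiom; a sketch of the
pattern (illustrative, helper name made up; it assumes @page is a head
page, as the entries on hugepage_activelist are):

	static unsigned int nr_pages_sketch(struct page *page)
	{
		struct folio *folio = page_folio(page);

		/*
		 * folio_nr_pages(folio) replaces compound_nr(page): the
		 * same value today, but it keeps the code correct once
		 * folios and pages are split apart.
		 */
		return folio_nr_pages(folio);
	}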
Date: Fri, 8 Nov 2024 16:20:35 +0000
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
Message-ID: <20241108162040.159038-6-tabba@google.com>
Subject: [RFC PATCH v1 05/10] mm/hugetlb: use folio->lru in demote_free_hugetlb_folios()
From: Fuad Tabba
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
    jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
    simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com,
    seanjc@google.com, willy@infradead.org, jgg@nvidia.com,
    jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com,
    mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
    quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    tabba@google.com

From: David Hildenbrand

Let's avoid messing with pages.
Signed-off-by: David Hildenbrand
Signed-off-by: Fuad Tabba
---
 mm/hugetlb.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d58bd815fdf2..a64852280213 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3806,13 +3806,15 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst,
 
 		for (i = 0; i < pages_per_huge_page(src); i += pages_per_huge_page(dst)) {
 			struct page *page = folio_page(folio, i);
+			struct folio *new_folio;
 
 			page->mapping = NULL;
 			clear_compound_head(page);
 			prep_compound_page(page, dst->order);
+			new_folio = page_folio(page);
 
-			init_new_hugetlb_folio(dst, page_folio(page));
-			list_add(&page->lru, &dst_list);
+			init_new_hugetlb_folio(dst, new_folio);
+			list_add(&new_folio->lru, &dst_list);
 		}
 	}
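Spelled out, the point of the new temporary (an illustrative annotation
of the loop body above, not extra code in the patch):

	struct page *page = folio_page(folio, i);
	struct folio *new_folio;

	page->mapping = NULL;
	clear_compound_head(page);
	prep_compound_page(page, dst->order);	/* page is now a head page... */
	new_folio = page_folio(page);		/* ...so this names the new folio */

	init_new_hugetlb_folio(dst, new_folio);
	/* link the list through the folio, never through the raw page */
	list_add(&new_folio->lru, &dst_list);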
From patchwork Fri Nov 8 16:20:36 2024
Date: Fri, 8 Nov 2024 16:20:36 +0000
In-Reply-To: <20241108162040.159038-1-tabba@google.com>
References: <20241108162040.159038-1-tabba@google.com>
Message-ID: <20241108162040.159038-7-tabba@google.com>
Subject: [RFC PATCH v1 06/10] mm/hugetlb: use separate folio->_hugetlb_list for hugetlb-internals
From: Fuad Tabba
To: linux-mm@kvack.org
Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org,
    jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev,
    simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com,
    seanjc@google.com, willy@infradead.org, jgg@nvidia.com,
    jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com,
    mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
    quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    tabba@google.com

From: David Hildenbrand

Let's use a separate list head in the folio, as long as hugetlb folios
are not isolated. This way, we can reuse folio->lru for a different
purpose (e.g., owner_ops) as long as they are not isolated.

Consequently, folio->lru will only be used while there is an additional
folio reference that cannot be dropped until the folio is putback/
un-isolated.

Signed-off-by: David Hildenbrand
Signed-off-by: Fuad Tabba
---
 include/linux/mm_types.h | 18 +++++++++
 mm/hugetlb.c             | 81 +++++++++++++++++++++-------------------
 mm/hugetlb_cgroup.c      |  4 +-
 mm/hugetlb_vmemmap.c     |  8 ++--
 4 files changed, 66 insertions(+), 45 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 80fef38d9d64..365c73be0bb4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -310,6 +310,7 @@ typedef struct {
  * @_hugetlb_cgroup: Do not use directly, use accessor in hugetlb_cgroup.h.
  * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
  * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
+ * @_hugetlb_list: To be used in hugetlb core code only.
  * @_deferred_list: Folios to be split under memory pressure.
  * @_unused_slab_obj_exts: Placeholder to match obj_exts in struct slab.
  *
@@ -397,6 +398,17 @@ struct folio {
 		};
 		struct page __page_2;
 	};
+	union {
+		struct {
+			unsigned long _flags_3;
+			unsigned long _head_3;
+	/* public: */
+			struct list_head _hugetlb_list;
+	/* private: the union with struct page is transitional */
+		};
+		struct page __page_3;
+	};
+
 };
 
 #define FOLIO_MATCH(pg, fl)						\
@@ -433,6 +445,12 @@ FOLIO_MATCH(compound_head, _head_2);
 FOLIO_MATCH(flags, _flags_2a);
 FOLIO_MATCH(compound_head, _head_2a);
 #undef FOLIO_MATCH
+#define FOLIO_MATCH(pg, fl)						\
+	static_assert(offsetof(struct folio, fl) ==			\
+			offsetof(struct page, pg) + 3 * sizeof(struct page))
+FOLIO_MATCH(flags, _flags_3);
+FOLIO_MATCH(compound_head, _head_3);
+#undef FOLIO_MATCH
 
 /**
  * struct ptdesc - Memory descriptor for page tables.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a64852280213..2308e94d8615 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1316,7 +1316,7 @@ static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 	lockdep_assert_held(&hugetlb_lock);
 	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
 
-	list_move(&folio->lru, &h->hugepage_freelists[nid]);
+	list_move(&folio->_hugetlb_list, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
 	folio_set_hugetlb_freed(folio);
@@ -1329,14 +1329,14 @@ static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h,
 	bool pin = !!(current->flags & PF_MEMALLOC_PIN);
 
 	lockdep_assert_held(&hugetlb_lock);
-	list_for_each_entry(folio, &h->hugepage_freelists[nid], lru) {
+	list_for_each_entry(folio, &h->hugepage_freelists[nid], _hugetlb_list) {
 		if (pin && !folio_is_longterm_pinnable(folio))
 			continue;
 
 		if (folio_test_hwpoison(folio))
 			continue;
 
-		list_move(&folio->lru, &h->hugepage_activelist);
+		list_move(&folio->_hugetlb_list, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
 		folio_clear_hugetlb_freed(folio);
 		h->free_huge_pages--;
@@ -1599,7 +1599,7 @@ static void remove_hugetlb_folio(struct hstate *h, struct folio *folio,
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	list_del(&folio->lru);
+	list_del(&folio->_hugetlb_list);
 
 	if (folio_test_hugetlb_freed(folio)) {
 		folio_clear_hugetlb_freed(folio);
@@ -1616,8 +1616,9 @@ static void remove_hugetlb_folio(struct hstate *h, struct folio *folio,
 	 * pages. Otherwise, someone (memory error handling) may try to write
 	 * to tail struct pages.
 	 */
-	if (!folio_test_hugetlb_vmemmap_optimized(folio))
+	if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
 		__folio_clear_hugetlb(folio);
+	}
 
 	h->nr_huge_pages--;
 	h->nr_huge_pages_node[nid]--;
@@ -1632,7 +1633,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
 
 	lockdep_assert_held(&hugetlb_lock);
 
-	INIT_LIST_HEAD(&folio->lru);
+	INIT_LIST_HEAD(&folio->_hugetlb_list);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 
@@ -1640,8 +1641,8 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
 		h->surplus_huge_pages++;
 		h->surplus_huge_pages_node[nid]++;
 	}
-	__folio_set_hugetlb(folio);
 
+	__folio_set_hugetlb(folio);
 	folio_change_private(folio, NULL);
 
 	/*
	 * We have to set hugetlb_vmemmap_optimized again as above
@@ -1789,8 +1790,8 @@ static void bulk_vmemmap_restore_error(struct hstate *h,
	 * hugetlb pages with vmemmap we will free up memory so that we
	 * can allocate vmemmap for more hugetlb pages.
	 */
-	list_for_each_entry_safe(folio, t_folio, non_hvo_folios, lru) {
-		list_del(&folio->lru);
+	list_for_each_entry_safe(folio, t_folio, non_hvo_folios, _hugetlb_list) {
+		list_del(&folio->_hugetlb_list);
 		spin_lock_irq(&hugetlb_lock);
 		__folio_clear_hugetlb(folio);
 		spin_unlock_irq(&hugetlb_lock);
@@ -1808,14 +1809,14 @@ static void bulk_vmemmap_restore_error(struct hstate *h,
	 * If are able to restore vmemmap and free one hugetlb page, we
	 * quit processing the list to retry the bulk operation.
	 */
-	list_for_each_entry_safe(folio, t_folio, folio_list, lru)
+	list_for_each_entry_safe(folio, t_folio, folio_list, _hugetlb_list)
 		if (hugetlb_vmemmap_restore_folio(h, folio)) {
-			list_del(&folio->lru);
+			list_del(&folio->_hugetlb_list);
 			spin_lock_irq(&hugetlb_lock);
 			add_hugetlb_folio(h, folio, true);
 			spin_unlock_irq(&hugetlb_lock);
 		} else {
-			list_del(&folio->lru);
+			list_del(&folio->_hugetlb_list);
 			spin_lock_irq(&hugetlb_lock);
 			__folio_clear_hugetlb(folio);
 			spin_unlock_irq(&hugetlb_lock);
@@ -1856,12 +1857,12 @@ static void update_and_free_pages_bulk(struct hstate *h,
 	VM_WARN_ON(ret < 0);
 	if (!list_empty(&non_hvo_folios) && ret) {
 		spin_lock_irq(&hugetlb_lock);
-		list_for_each_entry(folio, &non_hvo_folios, lru)
+		list_for_each_entry(folio, &non_hvo_folios, _hugetlb_list)
 			__folio_clear_hugetlb(folio);
 		spin_unlock_irq(&hugetlb_lock);
 	}
 
-	list_for_each_entry_safe(folio, t_folio, &non_hvo_folios, lru) {
+	list_for_each_entry_safe(folio, t_folio, &non_hvo_folios, _hugetlb_list) {
 		update_and_free_hugetlb_folio(h, folio, false);
 		cond_resched();
 	}
@@ -1959,7 +1960,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	__folio_set_hugetlb(folio);
-	INIT_LIST_HEAD(&folio->lru);
+	INIT_LIST_HEAD(&folio->_hugetlb_list);
 	hugetlb_set_folio_subpool(folio, NULL);
 	set_hugetlb_cgroup(folio, NULL);
 	set_hugetlb_cgroup_rsvd(folio, NULL);
@@ -2112,7 +2113,7 @@ static void prep_and_add_allocated_folios(struct hstate *h,
 
 	/* Add all new pool pages to free lists in one lock cycle */
 	spin_lock_irqsave(&hugetlb_lock, flags);
-	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
+	list_for_each_entry_safe(folio, tmp_f, folio_list, _hugetlb_list) {
 		__prep_account_new_huge_page(h, folio_nid(folio));
 		enqueue_hugetlb_folio(h, folio);
 	}
@@ -2165,7 +2166,7 @@ static struct folio *remove_pool_hugetlb_folio(struct hstate *h,
 		if ((!acct_surplus || h->surplus_huge_pages_node[node]) &&
 		    !list_empty(&h->hugepage_freelists[node])) {
 			folio = list_entry(h->hugepage_freelists[node].next,
-					  struct folio, lru);
+					  struct folio, _hugetlb_list);
 			remove_hugetlb_folio(h, folio, acct_surplus);
 			break;
 		}
@@ -2491,7 +2492,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 			alloc_ok = false;
 			break;
 		}
-		list_add(&folio->lru, &surplus_list);
+		list_add(&folio->_hugetlb_list, &surplus_list);
 		cond_resched();
 	}
 	allocated += i;
@@ -2526,7 +2527,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	ret = 0;
 
 	/* Free the needed pages to the hugetlb pool */
-	list_for_each_entry_safe(folio, tmp, &surplus_list, lru) {
+	list_for_each_entry_safe(folio, tmp, &surplus_list, _hugetlb_list) {
 		if ((--needed) < 0)
 			break;
 		/* Add the page to the hugetlb allocator */
@@ -2539,7 +2540,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
	 * Free unnecessary surplus pages to the buddy allocator.
	 * Pages have no ref count, call free_huge_folio directly.
	 */
-	list_for_each_entry_safe(folio, tmp, &surplus_list, lru)
+	list_for_each_entry_safe(folio, tmp, &surplus_list, _hugetlb_list)
 		free_huge_folio(folio);
 	spin_lock_irq(&hugetlb_lock);
@@ -2588,7 +2589,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 		if (!folio)
 			goto out;
 
-		list_add(&folio->lru, &page_list);
+		list_add(&folio->_hugetlb_list, &page_list);
 	}
 
 out:
@@ -3051,7 +3052,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
-		list_add(&folio->lru, &h->hugepage_activelist);
+		list_add(&folio->_hugetlb_list, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
 		/* Fall through */
 	}
@@ -3211,7 +3212,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
 	/* Send list for bulk vmemmap optimization processing */
 	hugetlb_vmemmap_optimize_folios(h, folio_list);
 
-	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
+	list_for_each_entry_safe(folio, tmp_f, folio_list, _hugetlb_list) {
 		if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
 			/*
			 * If HVO fails, initialize all tail struct pages
@@ -3260,7 +3261,7 @@ static void __init gather_bootmem_prealloc_node(unsigned long nid)
 		hugetlb_folio_init_vmemmap(folio, h,
 					   HUGETLB_VMEMMAP_RESERVE_PAGES);
 		init_new_hugetlb_folio(h, folio);
-		list_add(&folio->lru, &folio_list);
+		list_add(&folio->_hugetlb_list, &folio_list);
 
 		/*
		 * We need to restore the 'stolen' pages to totalram_pages
@@ -3317,7 +3318,7 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 					&node_states[N_MEMORY], NULL);
 			if (!folio)
 				break;
-			list_add(&folio->lru, &folio_list);
+			list_add(&folio->_hugetlb_list, &folio_list);
 		}
 		cond_resched();
 	}
@@ -3379,7 +3380,7 @@ static void __init hugetlb_pages_alloc_boot_node(unsigned long start, unsigned long end, void *arg)
 		if (!folio)
 			break;
 
-		list_move(&folio->lru, &folio_list);
+		list_move(&folio->_hugetlb_list, &folio_list);
 		cond_resched();
 	}
 
@@ -3544,13 +3545,13 @@ static void try_to_free_low(struct hstate *h, unsigned long count,
 	for_each_node_mask(i, *nodes_allowed) {
 		struct folio *folio, *next;
 		struct list_head *freel = &h->hugepage_freelists[i];
-		list_for_each_entry_safe(folio, next, freel, lru) {
+		list_for_each_entry_safe(folio, next, freel, _hugetlb_list) {
 			if (count >= h->nr_huge_pages)
 				goto out;
 			if (folio_test_highmem(folio))
 				continue;
 			remove_hugetlb_folio(h, folio, false);
-			list_add(&folio->lru, &page_list);
+			list_add(&folio->_hugetlb_list, &page_list);
 		}
 	}
 
@@ -3703,7 +3704,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			goto out;
 		}
 
-		list_add(&folio->lru, &page_list);
+		list_add(&folio->_hugetlb_list, &page_list);
 		allocated++;
 
 		/* Bail for signals.
Probably ctrl-c from user */ @@ -3750,7 +3751,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, if (!folio) break; - list_add(&folio->lru, &page_list); + list_add(&folio->_hugetlb_list, &page_list); } /* free the pages after dropping lock */ spin_unlock_irq(&hugetlb_lock); @@ -3793,13 +3794,13 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst, */ mutex_lock(&dst->resize_lock); - list_for_each_entry_safe(folio, next, src_list, lru) { + list_for_each_entry_safe(folio, next, src_list, _hugetlb_list) { int i; if (folio_test_hugetlb_vmemmap_optimized(folio)) continue; - list_del(&folio->lru); + list_del(&folio->_hugetlb_list); split_page_owner(&folio->page, huge_page_order(src), huge_page_order(dst)); pgalloc_tag_split(folio, huge_page_order(src), huge_page_order(dst)); @@ -3814,7 +3815,7 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst, new_folio = page_folio(page); init_new_hugetlb_folio(dst, new_folio); - list_add(&new_folio->lru, &dst_list); + list_add(&new_folio->_hugetlb_list, &dst_list); } } @@ -3847,12 +3848,12 @@ static long demote_pool_huge_page(struct hstate *src, nodemask_t *nodes_allowed, LIST_HEAD(list); struct folio *folio, *next; - list_for_each_entry_safe(folio, next, &src->hugepage_freelists[node], lru) { + list_for_each_entry_safe(folio, next, &src->hugepage_freelists[node], _hugetlb_list) { if (folio_test_hwpoison(folio)) continue; remove_hugetlb_folio(src, folio, false); - list_add(&folio->lru, &list); + list_add(&folio->_hugetlb_list, &list); if (++nr_demoted == nr_to_demote) break; @@ -3864,8 +3865,8 @@ static long demote_pool_huge_page(struct hstate *src, nodemask_t *nodes_allowed, spin_lock_irq(&hugetlb_lock); - list_for_each_entry_safe(folio, next, &list, lru) { - list_del(&folio->lru); + list_for_each_entry_safe(folio, next, &list, _hugetlb_list) { + list_del(&folio->_hugetlb_list); add_hugetlb_folio(src, folio, false); nr_demoted--; @@ -7427,7 +7428,8 @@ bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list) goto unlock; } folio_clear_hugetlb_migratable(folio); - list_move_tail(&folio->lru, list); + list_del_init(&folio->_hugetlb_list); + list_add_tail(&folio->lru, list); unlock: spin_unlock_irq(&hugetlb_lock); return ret; @@ -7478,7 +7480,8 @@ void folio_putback_hugetlb(struct folio *folio) { spin_lock_irq(&hugetlb_lock); folio_set_hugetlb_migratable(folio); - list_move_tail(&folio->lru, &(folio_hstate(folio))->hugepage_activelist); + list_del_init(&folio->lru); + list_add_tail(&folio->_hugetlb_list, &(folio_hstate(folio))->hugepage_activelist); spin_unlock_irq(&hugetlb_lock); folio_put(folio); } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 1bdeaf25f640..ee720eeaf6b1 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -239,7 +239,7 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css) do { for_each_hstate(h) { spin_lock_irq(&hugetlb_lock); - list_for_each_entry(folio, &h->hugepage_activelist, lru) + list_for_each_entry(folio, &h->hugepage_activelist, _hugetlb_list) hugetlb_cgroup_move_parent(hstate_index(h), h_cg, folio); spin_unlock_irq(&hugetlb_lock); @@ -933,7 +933,7 @@ void hugetlb_cgroup_migrate(struct folio *old_folio, struct folio *new_folio) /* move the h_cg details to new cgroup */ set_hugetlb_cgroup(new_folio, h_cg); set_hugetlb_cgroup_rsvd(new_folio, h_cg_rsvd); - list_move(&new_folio->lru, &h->hugepage_activelist); + list_move(&new_folio->_hugetlb_list, &h->hugepage_activelist); 
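Where the new list head lives, and the invariant it buys, as a sketch
(illustrative; the helper is hypothetical, the offsets follow the
FOLIO_MATCH() asserts in the mm_types.h hunk):

	/*
	 * _hugetlb_list overlays "struct page" number 3 of a hugetlb folio,
	 * the way _deferred_list overlays page 2. While a folio sits on a
	 * hugetlb-internal list (free list, active list) it is linked via
	 * _hugetlb_list; only an isolated folio -- which carries an extra
	 * reference until putback -- is linked via folio->lru.
	 */
	static inline void hugetlb_list_layout_sketch(struct folio *folio)
	{
		BUILD_BUG_ON(offsetof(struct folio, _hugetlb_list) <
			     3 * sizeof(struct page));
		/* page 3 only exists for folios of order >= 2 */
		VM_WARN_ON_FOLIO(folio_order(folio) < 2, folio);
	}

From patchwork Fri Nov 8 16:20:37 2024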
spin_unlock_irq(&hugetlb_lock); return; } diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 57b7f591eee8..b2cb8d328aac 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -519,7 +519,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h, long ret = 0; unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU; - list_for_each_entry_safe(folio, t_folio, folio_list, lru) { + list_for_each_entry_safe(folio, t_folio, folio_list, _hugetlb_list) { if (folio_test_hugetlb_vmemmap_optimized(folio)) { ret = __hugetlb_vmemmap_restore_folio(h, folio, flags); /* only need to synchronize_rcu() once for each batch */ @@ -531,7 +531,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h, } /* Add non-optimized folios to output list */ - list_move(&folio->lru, non_hvo_folios); + list_move(&folio->_hugetlb_list, non_hvo_folios); } if (restored) @@ -651,7 +651,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l LIST_HEAD(vmemmap_pages); unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU; - list_for_each_entry(folio, folio_list, lru) { + list_for_each_entry(folio, folio_list, _hugetlb_list) { int ret = hugetlb_vmemmap_split_folio(h, folio); /* @@ -666,7 +666,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l flush_tlb_all(); - list_for_each_entry(folio, folio_list, lru) { + list_for_each_entry(folio, folio_list, _hugetlb_list) { int ret; ret = __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
From patchwork Fri Nov 8 16:20:37 2024 X-Patchwork-Submitter: Fuad Tabba X-Patchwork-Id: 13868464
Date: Fri, 8 Nov 2024 16:20:37 +0000 In-Reply-To: <20241108162040.159038-1-tabba@google.com> References: <20241108162040.159038-1-tabba@google.com> Message-ID: <20241108162040.159038-8-tabba@google.com> Subject: [RFC PATCH v1 07/10] mm: Introduce struct folio_owner_ops From: Fuad Tabba To: linux-mm@kvack.org Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com, seanjc@google.com, willy@infradead.org, jgg@nvidia.com, jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com, mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com, quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, tabba@google.com

Introduce struct folio_owner_ops, a method table containing callbacks to owners of folios that need special handling for certain operations. For now, it contains only a callback for folio free(), which is called immediately after the folio refcount drops to 0. Add a pointer to this struct, overlaid on struct page compound_head, pgmap, and struct page/folio lru. Users of this struct either do not use lru at all (e.g., zone device), or can easily tell when lru is in use (e.g., hugetlb) and handle it accordingly. While a folio is isolated, lru is in use and the owner_ops pointer is unavailable; that is safe, because an isolated folio cannot get freed and thus never needs its free() callback during that window.
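As an illustration of the intended usage, a hypothetical owner could register a table so that the core free path hands folios back to its own pool. This is a sketch using the helpers introduced below; my_pool_free() and my_pool_add() are assumed owner-side names, not real kernel APIs:

/* Hypothetical owner: reclaim folios into a private pool instead of the buddy allocator. */
static void my_pool_free(struct folio *folio)
{
	/* Invoked from __folio_put() once the refcount has dropped to 0. */
	my_pool_add(folio);	/* assumed owner-side helper */
}

static const struct folio_owner_ops my_pool_owner_ops = {
	.free = my_pool_free,
};

/* At hand-out time, tag the folio so freeing dispatches to the owner: */
folio_set_owner_ops(folio, &my_pool_owner_ops);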
This isolation guarantee is sufficient for the current use case of returning these folios to a custom allocator. To identify that a folio has owner_ops, we set bit 1 of the field, similar to how bit 0 of compound_head is used to identify compound pages. Signed-off-by: Fuad Tabba --- include/linux/mm_types.h | 64 +++++++++++++++++++++++++++++++++++++--- mm/swap.c | 19 ++++++++++++ 2 files changed, 79 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 365c73be0bb4..6e06286f44f1 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -41,10 +41,12 @@ struct mem_cgroup; * * If you allocate the page using alloc_pages(), you can use some of the * space in struct page for your own purposes. The five words in the main - * union are available, except for bit 0 of the first word which must be - * kept clear. Many users use this word to store a pointer to an object - * which is guaranteed to be aligned. If you use the same storage as - * page->mapping, you must restore it to NULL before freeing the page. + * union are available, except for bit 0 (used for compound_head pages) + * and bit 1 (used for owner_ops) of the first word, which must be kept + * clear and used with care. Many users use this word to store a pointer + * to an object which is guaranteed to be aligned. If you use the same + * storage as page->mapping, you must restore it to NULL before freeing + * the page. * * The mapcount field must not be used for own purposes. * @@ -283,10 +285,16 @@ typedef struct { unsigned long val; } swp_entry_t; +struct folio_owner_ops; + /** * struct folio - Represents a contiguous set of bytes. * @flags: Identical to the page flags. * @lru: Least Recently Used list; tracks how recently this folio was used. + * @owner_ops: Pointer to callback operations of the folio owner. Valid if bit 1 + * is set. + * NOTE: Cannot be used with lru, since it is overlaid with it. To use lru, + * owner_ops must be cleared first, and restored once done with lru. * @mlock_count: Number of times this folio has been pinned by mlock(). * @mapping: The file this page belongs to, or refers to the anon_vma for * anonymous memory. @@ -330,6 +338,7 @@ struct folio { unsigned long flags; union { struct list_head lru; + const struct folio_owner_ops *owner_ops; /* Bit 1 is set */ /* private: avoid cluttering the output */ struct { void *__filler; @@ -417,6 +426,7 @@ FOLIO_MATCH(flags, flags); FOLIO_MATCH(lru, lru); FOLIO_MATCH(mapping, mapping); FOLIO_MATCH(compound_head, lru); +FOLIO_MATCH(compound_head, owner_ops); FOLIO_MATCH(index, index); FOLIO_MATCH(private, private); FOLIO_MATCH(_mapcount, _mapcount); @@ -452,6 +462,13 @@ FOLIO_MATCH(flags, _flags_3); FOLIO_MATCH(compound_head, _head_3); #undef FOLIO_MATCH +struct folio_owner_ops { + /* + * Called once the folio refcount reaches 0. + */ + void (*free)(struct folio *folio); +}; + /** * struct ptdesc - Memory descriptor for page tables. * @__page_flags: Same as page flags. Powerpc only. @@ -560,6 +577,45 @@ static inline void *folio_get_private(struct folio *folio) return folio->private; } +/* + * Use bit 1, since bit 0 is used to indicate a compound page in compound_head, + * which owner_ops is overlaid with. + */ +#define FOLIO_OWNER_OPS_BIT 1UL +#define FOLIO_OWNER_OPS (1UL << FOLIO_OWNER_OPS_BIT) + +/* + * Set the folio owner_ops as well as bit 1 of the pointer to indicate that the + * folio has owner_ops.
+ */ +static inline void folio_set_owner_ops(struct folio *folio, const struct folio_owner_ops *owner_ops) +{ + owner_ops = (const struct folio_owner_ops *)((unsigned long)owner_ops | FOLIO_OWNER_OPS); + folio->owner_ops = owner_ops; +} + +/* + * Clear the folio owner_ops including bit 1 of the pointer. + */ +static inline void folio_clear_owner_ops(struct folio *folio) +{ + folio->owner_ops = NULL; +} + +/* + * Return the folio's owner_ops if it has them, otherwise, return NULL. + */ +static inline const struct folio_owner_ops *folio_get_owner_ops(struct folio *folio) +{ + const struct folio_owner_ops *owner_ops = folio->owner_ops; + + if (!((unsigned long)owner_ops & FOLIO_OWNER_OPS)) + return NULL; + + owner_ops = (const struct folio_owner_ops *)((unsigned long)owner_ops & ~FOLIO_OWNER_OPS); + return owner_ops; +} + struct page_frag_cache { void * va; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) diff --git a/mm/swap.c b/mm/swap.c index 638a3f001676..767ff6d8f47b 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -110,6 +110,13 @@ static void page_cache_release(struct folio *folio) void __folio_put(struct folio *folio) { + const struct folio_owner_ops *owner_ops = folio_get_owner_ops(folio); + + if (unlikely(owner_ops)) { + owner_ops->free(folio); + return; + } + if (unlikely(folio_is_zone_device(folio))) { free_zone_device_folio(folio); return; @@ -929,10 +936,22 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) for (i = 0, j = 0; i < folios->nr; i++) { struct folio *folio = folios->folios[i]; unsigned int nr_refs = refs ? refs[i] : 1; + const struct folio_owner_ops *owner_ops; if (is_huge_zero_folio(folio)) continue; + owner_ops = folio_get_owner_ops(folio); + if (unlikely(owner_ops)) { + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; + } + if (folio_ref_sub_and_test(folio, nr_refs)) + owner_ops->free(folio); + continue; + } + if (folio_is_zone_device(folio)) { if (lruvec) { unlock_page_lruvec_irqrestore(lruvec, flags);
From patchwork Fri Nov 8 16:20:38 2024 X-Patchwork-Submitter: Fuad Tabba X-Patchwork-Id: 13868465
Date: Fri, 8 Nov 2024 16:20:38 +0000 In-Reply-To: <20241108162040.159038-1-tabba@google.com> References: <20241108162040.159038-1-tabba@google.com> Message-ID: <20241108162040.159038-9-tabba@google.com> Subject: [RFC PATCH v1 08/10] mm: Use getters and setters to access page pgmap From: Fuad Tabba To: linux-mm@kvack.org Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com, seanjc@google.com, willy@infradead.org, jgg@nvidia.com, jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com, mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com, quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, tabba@google.com

The pointer to pgmap in struct page is overlaid with folio owner_ops. To indicate that a page/folio has owner ops, bit 1 is set.
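To see why raw loads of the shared field would go wrong once tagging starts, here is a minimal userspace model of the scheme (illustrative names and code, not part of this series):

#include <assert.h>
#include <stdint.h>

/* Userspace model of the bit-1 tagging used for the overlaid pointer word. */
#define OWNER_OPS_TAG (1UL << 1)

struct pgmap_demo { int type; };

static uintptr_t encode_pgmap(struct pgmap_demo *p)
{
	/* The target is at least 4-byte aligned, so bits 0-1 are free for tags. */
	return (uintptr_t)p | OWNER_OPS_TAG;
}

static struct pgmap_demo *decode_pgmap(uintptr_t word)
{
	/* Every reader must strip the tag before dereferencing. */
	return (struct pgmap_demo *)(word & ~OWNER_OPS_TAG);
}

int main(void)
{
	static struct pgmap_demo pm = { .type = 7 };
	uintptr_t word = encode_pgmap(&pm);

	assert(word != (uintptr_t)&pm);		/* a raw load is no longer a valid pointer */
	assert(decode_pgmap(word)->type == 7);	/* a sanitizing accessor recovers it */
	return 0;
}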
Therefore, before we can start using owner_ops, we need to ensure that all accesses to page pgmap sanitize the pointer value. This patch introduces the accessors, which will be modified in the following patch to sanitize the pointer values. No functional change intended. Signed-off-by: Fuad Tabba --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 4 +++- drivers/pci/p2pdma.c | 8 +++++--- include/linux/memremap.h | 6 +++--- include/linux/mm_types.h | 13 +++++++++++++ lib/test_hmm.c | 2 +- mm/hmm.c | 2 +- mm/memory.c | 2 +- mm/memremap.c | 19 +++++++++++-------- mm/migrate_device.c | 4 ++-- mm/mm_init.c | 2 +- 10 files changed, 41 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 1a072568cef6..d7d9d9476bb0 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -88,7 +88,9 @@ struct nouveau_dmem { static struct nouveau_dmem_chunk *nouveau_page_to_chunk(struct page *page) { - return container_of(page->pgmap, struct nouveau_dmem_chunk, pagemap); + struct dev_pagemap *pgmap = page_get_pgmap(page); + + return container_of(pgmap, struct nouveau_dmem_chunk, pagemap); } static struct nouveau_drm *page_to_drm(struct page *page) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 4f47a13cb500..19519bb4ba56 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -193,7 +193,7 @@ static const struct attribute_group p2pmem_group = { static void p2pdma_page_free(struct page *page) { - struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap); + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_get_pgmap(page)); /* safe to dereference while a reference is held to the percpu ref */ struct pci_p2pdma *p2pdma = rcu_dereference_protected(pgmap->provider->p2pdma, 1); @@ -1016,8 +1016,10 @@ enum pci_p2pdma_map_type pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, struct scatterlist *sg) { - if (state->pgmap != sg_page(sg)->pgmap) { - state->pgmap = sg_page(sg)->pgmap; + struct dev_pagemap *pgmap = page_get_pgmap(sg_page(sg)); + + if (state->pgmap != pgmap) { + state->pgmap = pgmap; state->map = pci_p2pdma_map_type(state->pgmap, dev); state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset; } diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 3f7143ade32c..060e27b6aee0 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -161,7 +161,7 @@ static inline bool is_device_private_page(const struct page *page) { return IS_ENABLED(CONFIG_DEVICE_PRIVATE) && is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_PRIVATE; + page_get_pgmap(page)->type == MEMORY_DEVICE_PRIVATE; } static inline bool folio_is_device_private(const struct folio *folio) @@ -173,13 +173,13 @@ static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_PCI_P2PDMA) && is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; + page_get_pgmap(page)->type == MEMORY_DEVICE_PCI_P2PDMA; } static inline bool is_device_coherent_page(const struct page *page) { return is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_COHERENT; + page_get_pgmap(page)->type == MEMORY_DEVICE_COHERENT; } static inline bool folio_is_device_coherent(const struct folio *folio) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6e06286f44f1..27075ea24e67 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -616,6 +616,19 @@ static inline const struct
folio_owner_ops *folio_get_owner_ops(struct folio *fo return owner_ops; } +/* + * Get the page dev_pagemap pgmap pointer. + */ +#define page_get_pgmap(page) ((page)->pgmap) + +/* + * Set the page dev_pagemap pgmap pointer. + */ +static inline void page_set_pgmap(struct page *page, struct dev_pagemap *pgmap) +{ + page->pgmap = pgmap; +} + struct page_frag_cache { void * va; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 056f2e411d7b..d3e3843f57dd 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -195,7 +195,7 @@ static int dmirror_fops_release(struct inode *inode, struct file *filp) static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page) { - return container_of(page->pgmap, struct dmirror_chunk, pagemap); + return container_of(page_get_pgmap(page), struct dmirror_chunk, pagemap); } static struct dmirror_device *dmirror_page_to_device(struct page *page) diff --git a/mm/hmm.c b/mm/hmm.c index 7e0229ae4a5a..b5f5ac218fda 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -248,7 +248,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, * just report the PFN. */ if (is_device_private_entry(entry) && - pfn_swap_entry_to_page(entry)->pgmap->owner == + page_get_pgmap(pfn_swap_entry_to_page(entry))->owner == range->dev_private_owner) { cpu_flags = HMM_PFN_VALID; if (is_writable_device_private_entry(entry)) diff --git a/mm/memory.c b/mm/memory.c index 80850cad0e6f..5853fa5767c7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4276,7 +4276,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ get_page(vmf->page); pte_unmap_unlock(vmf->pte, vmf->ptl); - ret = vmf->page->pgmap->ops->migrate_to_ram(vmf); + ret = page_get_pgmap(vmf->page)->ops->migrate_to_ram(vmf); put_page(vmf->page); } else if (is_hwpoison_entry(entry)) { ret = VM_FAULT_HWPOISON; diff --git a/mm/memremap.c b/mm/memremap.c index 40d4547ce514..931bc85da1df 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -458,8 +458,9 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap); void free_zone_device_folio(struct folio *folio) { - if (WARN_ON_ONCE(!folio->page.pgmap->ops || - !folio->page.pgmap->ops->page_free)) + struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); + + if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free)) return; mem_cgroup_uncharge(folio); @@ -486,17 +487,17 @@ void free_zone_device_folio(struct folio *folio) * to clear folio->mapping. */ folio->mapping = NULL; - folio->page.pgmap->ops->page_free(folio_page(folio, 0)); + pgmap->ops->page_free(folio_page(folio, 0)); - if (folio->page.pgmap->type != MEMORY_DEVICE_PRIVATE && - folio->page.pgmap->type != MEMORY_DEVICE_COHERENT) + if (pgmap->type != MEMORY_DEVICE_PRIVATE && + pgmap->type != MEMORY_DEVICE_COHERENT) /* * Reset the refcount to 1 to prepare for handing out the page * again. */ folio_set_count(folio, 1); else - put_dev_pagemap(folio->page.pgmap); + put_dev_pagemap(pgmap); } void zone_device_page_init(struct page *page) @@ -505,7 +506,7 @@ void zone_device_page_init(struct page *page) * Drivers shouldn't be allocating pages after calling * memunmap_pages(). 
*/ - WARN_ON_ONCE(!percpu_ref_tryget_live(&page->pgmap->ref)); + WARN_ON_ONCE(!percpu_ref_tryget_live(&page_get_pgmap(page)->ref)); set_page_count(page, 1); lock_page(page); } @@ -514,7 +515,9 @@ EXPORT_SYMBOL_GPL(zone_device_page_init); #ifdef CONFIG_FS_DAX bool __put_devmap_managed_folio_refs(struct folio *folio, int refs) { - if (folio->page.pgmap->type != MEMORY_DEVICE_FS_DAX) + struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); + + if (pgmap->type != MEMORY_DEVICE_FS_DAX) return false; /* diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 9cf26592ac93..368def358d02 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -135,7 +135,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, page = pfn_swap_entry_to_page(entry); if (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) || - page->pgmap->owner != migrate->pgmap_owner) + page_get_pgmap(page)->owner != migrate->pgmap_owner) goto next; mpfn = migrate_pfn(page_to_pfn(page)) | @@ -156,7 +156,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, goto next; else if (page && is_device_coherent_page(page) && (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) || - page->pgmap->owner != migrate->pgmap_owner)) + page_get_pgmap(page)->owner != migrate->pgmap_owner)) goto next; mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE; mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0; diff --git a/mm/mm_init.c b/mm/mm_init.c index 1c205b0a86ed..279cdaebfd2b 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -995,7 +995,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, * and zone_device_data. It is a bug if a ZONE_DEVICE page is * ever freed or placed on a driver-private list. */ - page->pgmap = pgmap; + page_set_pgmap(page, pgmap); page->zone_device_data = NULL; /*
From patchwork Fri Nov 8 16:20:39 2024 X-Patchwork-Submitter: Fuad Tabba X-Patchwork-Id: 13868466
Date: Fri, 8 Nov 2024 16:20:39 +0000 In-Reply-To: <20241108162040.159038-1-tabba@google.com> References: <20241108162040.159038-1-tabba@google.com> Message-ID: <20241108162040.159038-10-tabba@google.com> Subject: [RFC PATCH v1 09/10] mm: Use owner_ops on folio_put for zone device pages From: Fuad Tabba To: linux-mm@kvack.org Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com, seanjc@google.com, willy@infradead.org, jgg@nvidia.com, jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com, mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com, quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, tabba@google.com

Now that we have the folio_owner_ops callback, use it for zone device pages instead of using a dedicated callback. Note that struct dev_pagemap (pgmap) in struct page is overlaid with struct folio owner_ops. Therefore, make struct dev_pagemap contain an instance of struct folio_owner_ops as its first member, so that the overlaid pgmap pointer can be handled the same way as a folio owner_ops pointer.
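As a userspace illustration of why the first-member placement matters (illustrative names and code, not from this series), the same address is valid under either view of the overlaid word:

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct owner_ops_demo { void (*free)(void *obj); };

struct pagemap_demo {
	struct owner_ops_demo folio_ops;	/* must stay at offset 0 */
	int type;
};

static void demo_free(void *obj)
{
	(void)obj;
	puts("freed via the embedded ops table");
}

int main(void)
{
	struct pagemap_demo pm = { .folio_ops = { .free = demo_free }, .type = 1 };

	/* Mirrors the static_assert this patch adds to struct dev_pagemap. */
	assert(offsetof(struct pagemap_demo, folio_ops) == 0);

	/* A pointer to the pagemap is also a valid pointer to its ops table. */
	const struct owner_ops_demo *ops = (const struct owner_ops_demo *)&pm;
	ops->free(NULL);
	return 0;
}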
Also note that, although struct dev_pagemap_ops has a page_free() function, it has neither the same intent nor the same behavior as the folio_owner_ops free() callback: page_free() is an optional callback used to inform drivers that use zone device of the freeing of the page. Signed-off-by: Fuad Tabba --- include/linux/memremap.h | 8 +++++++ include/linux/mm_types.h | 16 ++++++++++++-- mm/internal.h | 1 - mm/memremap.c | 44 -------------------------------------- mm/mm_init.c | 46 ++++++++++++++++++++++++++++++++++++++++ mm/swap.c | 18 ++-------------- 6 files changed, 70 insertions(+), 63 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 060e27b6aee0..5b68bbc588a3 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -106,6 +106,7 @@ struct dev_pagemap_ops { /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings + * @folio_ops: method table for folio operations. * @altmap: pre-allocated/reserved memory for vmemmap allocations * @ref: reference count that pins the devm_memremap_pages() mapping * @done: completion for @ref @@ -125,6 +126,7 @@ struct dev_pagemap_ops { * @ranges: array of ranges to be mapped when nr_range > 1 */ struct dev_pagemap { + struct folio_owner_ops folio_ops; struct vmem_altmap altmap; struct percpu_ref ref; struct completion done; @@ -140,6 +142,12 @@ struct dev_pagemap { }; }; +/* + * The folio_owner_ops structure needs to be first since pgmap in struct page is + * overlaid with owner_ops in struct folio. + */ +static_assert(offsetof(struct dev_pagemap, folio_ops) == 0); + static inline bool pgmap_has_memory_failure(struct dev_pagemap *pgmap) { return pgmap->ops && pgmap->ops->memory_failure; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 27075ea24e67..a72fda20d5e9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -427,6 +427,7 @@ FOLIO_MATCH(lru, lru); FOLIO_MATCH(mapping, mapping); FOLIO_MATCH(compound_head, lru); FOLIO_MATCH(compound_head, owner_ops); +FOLIO_MATCH(pgmap, owner_ops); FOLIO_MATCH(index, index); FOLIO_MATCH(private, private); FOLIO_MATCH(_mapcount, _mapcount); @@ -618,15 +619,26 @@ static inline const struct folio_owner_ops *folio_get_owner_ops(struct folio *fo /* * Get the page dev_pagemap pgmap pointer. + * + * The page pgmap is overlaid with the folio owner_ops, where bit 1 is used to + * indicate that the page/folio has owner ops. The dev_pagemap contains + * owner_ops and is handled the same way. The getter returns a sanitized + * pointer. */ -#define page_get_pgmap(page) ((page)->pgmap) +#define page_get_pgmap(page) \ ((struct dev_pagemap *)((unsigned long)(page)->pgmap & ~FOLIO_OWNER_OPS)) /* * Set the page dev_pagemap pgmap pointer. + * + * The page pgmap is overlaid with the folio owner_ops, where bit 1 is used to + * indicate that the page/folio has owner ops. The dev_pagemap contains + * owner_ops and is handled the same way. The setter sets bit 1 to indicate + * that the page has owner_ops.
*/ static inline void page_set_pgmap(struct page *page, struct dev_pagemap *pgmap) { - page->pgmap = pgmap; + page->pgmap = (struct dev_pagemap *)((unsigned long)pgmap | FOLIO_OWNER_OPS); } struct page_frag_cache { diff --git a/mm/internal.h b/mm/internal.h index 5a7302baeed7..a041247bed10 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1262,7 +1262,6 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf, unsigned long addr, int *flags, bool writable, int *last_cpupid); -void free_zone_device_folio(struct folio *folio); int migrate_device_coherent_folio(struct folio *folio); struct vm_struct *__get_vm_area_node(unsigned long size, diff --git a/mm/memremap.c b/mm/memremap.c index 931bc85da1df..9fd5f57219eb 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -456,50 +456,6 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, } EXPORT_SYMBOL_GPL(get_dev_pagemap); -void free_zone_device_folio(struct folio *folio) -{ - struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); - - if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free)) - return; - - mem_cgroup_uncharge(folio); - - /* - * Note: we don't expect anonymous compound pages yet. Once supported - * and we could PTE-map them similar to THP, we'd have to clear - * PG_anon_exclusive on all tail pages. - */ - if (folio_test_anon(folio)) { - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); - __ClearPageAnonExclusive(folio_page(folio, 0)); - } - - /* - * When a device managed page is freed, the folio->mapping field - * may still contain a (stale) mapping value. For example, the - * lower bits of folio->mapping may still identify the folio as an - * anonymous folio. Ultimately, this entire field is just stale - * and wrong, and it will cause errors if not cleared. - * - * For other types of ZONE_DEVICE pages, migration is either - * handled differently or not done at all, so there is no need - * to clear folio->mapping. - */ - folio->mapping = NULL; - pgmap->ops->page_free(folio_page(folio, 0)); - - if (pgmap->type != MEMORY_DEVICE_PRIVATE && - pgmap->type != MEMORY_DEVICE_COHERENT) - /* - * Reset the refcount to 1 to prepare for handing out the page - * again. - */ - folio_set_count(folio, 1); - else - put_dev_pagemap(pgmap); -} - void zone_device_page_init(struct page *page) { /* diff --git a/mm/mm_init.c b/mm/mm_init.c index 279cdaebfd2b..47c1f8fd4914 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -974,6 +974,51 @@ static void __init memmap_init(void) } #ifdef CONFIG_ZONE_DEVICE + +static void free_zone_device_folio(struct folio *folio) +{ + struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); + + if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free)) + return; + + mem_cgroup_uncharge(folio); + + /* + * Note: we don't expect anonymous compound pages yet. Once supported + * and we could PTE-map them similar to THP, we'd have to clear + * PG_anon_exclusive on all tail pages. + */ + if (folio_test_anon(folio)) { + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + __ClearPageAnonExclusive(folio_page(folio, 0)); + } + + /* + * When a device managed page is freed, the folio->mapping field + * may still contain a (stale) mapping value. For example, the + * lower bits of folio->mapping may still identify the folio as an + * anonymous folio. Ultimately, this entire field is just stale + * and wrong, and it will cause errors if not cleared. + * + * For other types of ZONE_DEVICE pages, migration is either + * handled differently or not done at all, so there is no need + * to clear folio->mapping. 
+ */ + folio->mapping = NULL; + pgmap->ops->page_free(folio_page(folio, 0)); + + if (pgmap->type != MEMORY_DEVICE_PRIVATE && + pgmap->type != MEMORY_DEVICE_COHERENT) + /* + * Reset the refcount to 1 to prepare for handing out the page + * again. + */ + folio_set_count(folio, 1); + else + put_dev_pagemap(pgmap); +} + static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, unsigned long zone_idx, int nid, struct dev_pagemap *pgmap) @@ -995,6 +1040,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, * and zone_device_data. It is a bug if a ZONE_DEVICE page is * ever freed or placed on a driver-private list. */ + pgmap->folio_ops.free = free_zone_device_folio; page_set_pgmap(page, pgmap); page->zone_device_data = NULL; diff --git a/mm/swap.c b/mm/swap.c index 767ff6d8f47b..d2578465e270 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -117,11 +117,6 @@ void __folio_put(struct folio *folio) return; } - if (unlikely(folio_is_zone_device(folio))) { - free_zone_device_folio(folio); - return; - } - if (folio_test_hugetlb(folio)) { free_huge_folio(folio); return; @@ -947,20 +942,11 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) unlock_page_lruvec_irqrestore(lruvec, flags); lruvec = NULL; } - if (folio_ref_sub_and_test(folio, nr_refs)) - owner_ops->free(folio); - continue; - } - - if (folio_is_zone_device(folio)) { - if (lruvec) { - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = NULL; - } + /* fenced by folio_is_zone_device() */ if (put_devmap_managed_folio_refs(folio, nr_refs)) continue; if (folio_ref_sub_and_test(folio, nr_refs)) - free_zone_device_folio(folio); + owner_ops->free(folio); continue; }
From patchwork Fri Nov 8 16:20:40 2024 X-Patchwork-Submitter: Fuad Tabba X-Patchwork-Id: 13868467
Date: Fri, 8 Nov 2024 16:20:40 +0000 In-Reply-To: <20241108162040.159038-1-tabba@google.com> References: <20241108162040.159038-1-tabba@google.com> Message-ID: <20241108162040.159038-11-tabba@google.com> Subject: [RFC PATCH v1 10/10] mm: hugetlb: Use owner_ops on folio_put for hugetlb From: Fuad Tabba To: linux-mm@kvack.org Cc: kvm@vger.kernel.org, nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org, david@redhat.com, rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org, muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com, pbonzini@redhat.com, seanjc@google.com, willy@infradead.org, jgg@nvidia.com, jhubbard@nvidia.com, ackerleytng@google.com, vannapurve@google.com, mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com, quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, tabba@google.com

Now that we have the folio_owner_ops callback, use it for hugetlb pages instead of using a dedicated callback. Since owner_ops is overlaid with lru, we need to unset owner_ops to allow the use of lru while the folio is isolated. At that point we know that the folio's reference count is elevated and cannot reach 0, so the free() callback cannot be triggered. It is therefore safe to do so, provided we restore owner_ops before we put the folio back.
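Schematically, the ordering that makes this safe looks as follows (a condensed sketch of the folio_isolate_hugetlb() and folio_putback_hugetlb() changes below, not new code):

/* isolate: the caller's reference keeps free() from running while lru is in use */
folio_clear_owner_ops(folio);		/* the shared word may now be used as lru */
list_del_init(&folio->_hugetlb_list);
list_add_tail(&folio->lru, list);

/* putback: restore the shared word before the reference is dropped */
list_del(&folio->lru);
folio_set_owner_ops(folio, &hugetlb_owner_ops);
list_add_tail(&folio->_hugetlb_list, &folio_hstate(folio)->hugepage_activelist);
folio_put(folio);			/* may now dispatch to free_huge_folio() */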
Signed-off-by: Fuad Tabba --- include/linux/hugetlb.h | 2 -- mm/hugetlb.c | 57 +++++++++++++++++++++++++++++++++-------- mm/swap.c | 14 ---------- 3 files changed, 47 insertions(+), 26 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e846d7dac77c..500848862702 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -20,8 +20,6 @@ struct user_struct; struct mmu_gather; struct node; -void free_huge_folio(struct folio *folio); - #ifdef CONFIG_HUGETLB_PAGE #include diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2308e94d8615..4e1c87e37968 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -89,6 +89,33 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); static void hugetlb_unshare_pmds(struct vm_area_struct *vma, unsigned long start, unsigned long end); static struct resv_map *vma_resv_map(struct vm_area_struct *vma); +static void free_huge_folio(struct folio *folio); + +static const struct folio_owner_ops hugetlb_owner_ops = { + .free = free_huge_folio, +}; + +/* + * Mark this folio as a hugetlb-owned folio. + * + * Set the folio hugetlb flag and owner operations. + */ +static void folio_set_hugetlb_owner(struct folio *folio) +{ + __folio_set_hugetlb(folio); + folio_set_owner_ops(folio, &hugetlb_owner_ops); +} + +/* + * Unmark this folio from being a hugetlb-owned folio. + * + * Clear the folio hugetlb flag and owner operations. + */ +static void folio_clear_hugetlb_owner(struct folio *folio) +{ + folio_clear_owner_ops(folio); + __folio_clear_hugetlb(folio); +} static void hugetlb_free_folio(struct folio *folio) { @@ -1617,7 +1644,7 @@ static void remove_hugetlb_folio(struct hstate *h, struct folio *folio, * to tail struct pages. */ if (!folio_test_hugetlb_vmemmap_optimized(folio)) { - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); } h->nr_huge_pages--; @@ -1641,7 +1668,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio, h->surplus_huge_pages++; h->surplus_huge_pages_node[nid]++; } - __folio_set_hugetlb(folio); + folio_set_hugetlb_owner(folio); folio_change_private(folio, NULL); /* @@ -1692,7 +1719,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h, */ if (folio_test_hugetlb(folio)) { spin_lock_irq(&hugetlb_lock); - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); } @@ -1793,7 +1820,7 @@ static void bulk_vmemmap_restore_error(struct hstate *h, list_for_each_entry_safe(folio, t_folio, non_hvo_folios, _hugetlb_list) { list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); update_and_free_hugetlb_folio(h, folio, false); cond_resched(); @@ -1818,7 +1845,7 @@ static void bulk_vmemmap_restore_error(struct hstate *h, } else { list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); update_and_free_hugetlb_folio(h, folio, false); cond_resched(); @@ -1851,14 +1878,14 @@ static void update_and_free_pages_bulk(struct hstate *h, * should only be pages on the non_hvo_folios list. * Do note that the non_hvo_folios list could be empty. * Without HVO enabled, ret will be 0 and there is no need to call - * __folio_clear_hugetlb as this was done previously. + * folio_clear_hugetlb_owner as this was done previously. 
*/ VM_WARN_ON(!list_empty(folio_list)); VM_WARN_ON(ret < 0); if (!list_empty(&non_hvo_folios) && ret) { spin_lock_irq(&hugetlb_lock); list_for_each_entry(folio, &non_hvo_folios, _hugetlb_list) - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); } @@ -1879,7 +1906,7 @@ struct hstate *size_to_hstate(unsigned long size) return NULL; } -void free_huge_folio(struct folio *folio) +static void free_huge_folio(struct folio *folio) { /* * Can't pass hstate in here because it is called from the @@ -1959,7 +1986,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid) static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio) { - __folio_set_hugetlb(folio); + folio_set_hugetlb_owner(folio); INIT_LIST_HEAD(&folio->_hugetlb_list); hugetlb_set_folio_subpool(folio, NULL); set_hugetlb_cgroup(folio, NULL); @@ -7428,6 +7455,14 @@ bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list) goto unlock; } folio_clear_hugetlb_migratable(folio); + /* + * Clear folio->owner_ops; now we can use folio->lru. + * Note that the folio cannot get freed because we are holding a + * reference. The reference will be put in folio_putback_hugetlb(), + * after restoring folio->owner_ops. + */ + folio_clear_owner_ops(folio); + INIT_LIST_HEAD(&folio->lru); list_del_init(&folio->_hugetlb_list); list_add_tail(&folio->lru, list); unlock: @@ -7480,7 +7515,9 @@ void folio_putback_hugetlb(struct folio *folio) { spin_lock_irq(&hugetlb_lock); folio_set_hugetlb_migratable(folio); - list_del_init(&folio->lru); + list_del(&folio->lru); + /* Restore folio->owner_ops since we can no longer use folio->lru. */ + folio_set_owner_ops(folio, &hugetlb_owner_ops); list_add_tail(&folio->_hugetlb_list, &(folio_hstate(folio))->hugepage_activelist); spin_unlock_irq(&hugetlb_lock); folio_put(folio); diff --git a/mm/swap.c b/mm/swap.c index d2578465e270..9798ca47f26a 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -117,11 +117,6 @@ void __folio_put(struct folio *folio) return; } - if (folio_test_hugetlb(folio)) { - free_huge_folio(folio); - return; - } - page_cache_release(folio); folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); @@ -953,15 +948,6 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) if (!folio_ref_sub_and_test(folio, nr_refs)) continue; - /* hugetlb has its own memcg */ - if (folio_test_hugetlb(folio)) { - if (lruvec) { - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = NULL; - } - free_huge_folio(folio); - continue; - } folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags);