diff mbox series

[v3,1/2] mm: collect the number of anon large folios

Message ID 20240822224015.93186-2-21cnbao@gmail.com (mailing list archive)
State New
Series mm: collect the number of anon mTHP

Commit Message

Barry Song Aug. 22, 2024, 10:40 p.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

Anon large folios come from three places:
1. newly allocated large folios in page faults, which call
   folio_add_new_anon_rmap() for rmap;
2. a large folio is split into multiple lower-order large folios;
3. a large folio is migrated to a new large folio.

In all three cases, we increase nr_anon by 1 for each new anon
large folio.

Anon large folios go away either because they are split or because
they are freed; in these cases, we reduce the count by 1.

Folios added to the swap cache without an anonymous mapping won't
be counted. This aligns with the AnonPages statistics in /proc/meminfo.
However, folios that have been fully unmapped but not yet freed are
counted. Unlike AnonPages, this can help identify anonymous memory
leaks, such as when an anon folio is still pinned after being unmapped.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst |  5 +++++
 include/linux/huge_mm.h                    | 15 +++++++++++++--
 mm/huge_memory.c                           | 13 ++++++++++---
 mm/migrate.c                               |  4 ++++
 mm/page_alloc.c                            |  5 ++++-
 mm/rmap.c                                  |  1 +
 6 files changed, 37 insertions(+), 6 deletions(-)

Comments

David Hildenbrand Aug. 23, 2024, 11:31 a.m. UTC | #1
On 23.08.24 00:40, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 

Nit: "anon large folios" come in two flavors: THP and hugetlb.

I suggest just calling it "anon THP" in the subjects/descriptions of 
both patches (sorry, I should have realized that earlier) to make it 
clearer.

This patch I would call

"mm: count the number of anonymous THPs per size"


Here I would start with:

'
Let's track, for each anonymous THP size, how many of them are currently 
allocated. We'll track the complete lifespan of an anon THP, starting 
when it becomes an anon THP ("large anon folio") (->mapping gets set), 
until it gets freed (->mapping gets cleared).

Introduce a new "nr_anon" counter per THP size and adjust the 
corresponding counter in the following cases:
* We allocate a new THP and call folio_add_new_anon_rmap() to map
   it for the first time and turn it into an anon THP.
* We split an anon THP into multiple smaller ones.
* We migrate an anon THP, when we prepare the destination.
* We free an anon THP back to the buddy.
'
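
The four cases listed above can be sketched as a small user-space model. This is only an illustration of the accounting, not kernel code: the `model_*` helpers and the single global array are hypothetical (the kernel keeps per-CPU counters and sums them when the statistic is read), and `PMD_ORDER` is assumed to be 9, which matches 4 KiB base pages with 2 MiB PMDs.

```c
#include <stdio.h>

/* Assumption: 4 KiB base pages and 2 MiB PMD mappings. */
#define PMD_ORDER 9

/* One counter per folio order; a stand-in for the per-CPU mthp_stats. */
long nr_anon[PMD_ORDER + 1];

/* Same bounds check as mod_mthp_stat(): order-0 and > PMD_ORDER are ignored. */
void mod_nr_anon(int order, long delta)
{
	if (order <= 0 || order > PMD_ORDER)
		return;
	nr_anon[order] += delta;
}

/* A new folio is mapped for the first time and becomes an anon THP. */
void model_fault_alloc(int order)
{
	mod_nr_anon(order, 1);
}

/* One folio of 'order' turns into 2^(order - new_order) folios of 'new_order'. */
void model_split(int order, int new_order)
{
	mod_nr_anon(order, -1);
	mod_nr_anon(new_order, 1L << (order - new_order));
}

/* The migration destination is counted once it has been prepared. */
void model_migrate_dst(int order)
{
	mod_nr_anon(order, 1);
}

/* An anon THP (for example a migration source) is freed back to the buddy. */
void model_free(int order)
{
	mod_nr_anon(order, -1);
}

/* Print every non-zero counter of the model. */
void model_dump(void)
{
	for (int order = 1; order <= PMD_ORDER; order++)
		if (nr_anon[order])
			printf("order-%d: %ld\n", order, nr_anon[order]);
}
```

In this model, splitting one order-9 folio to order-4 moves the count from one order-9 entry to 32 order-4 entries, and a migration is briefly counted twice until the source folio is freed. Splitting down to order-0 only decrements the source order, because order-0 folios are not tracked.
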

> Anon large folios come from three places:
> 1. newly allocated large folios in page faults, which call
>     folio_add_new_anon_rmap() for rmap;
> 2. a large folio is split into multiple lower-order large folios;
> 3. a large folio is migrated to a new large folio.
> 
> In all three cases, we increase nr_anon by 1 for each new anon
> large folio.
> 
> Anon large folios go away either because they are split or because
> they are freed; in these cases, we reduce the count by 1.
> 
> Folios added to the swap cache without an anonymous mapping won't
> be counted. This aligns with the AnonPages statistics in /proc/meminfo.
> However, folios that have been fully unmapped but not yet freed are
> counted. Unlike AnonPages, this can help identify anonymous memory
> leaks, such as when an anon folio is still pinned after being unmapped.

I would just mention here:

"Note that AnonPages in /proc/meminfo currently tracks the total number 
of *mapped* anonymous *pages*, and therefore has slightly different 
semantics. In the future, we might also want to track "nr_anon_mapped" 
for each THP size, which might be helpful when comparing it to the 
number of allocated anon THPs (long-term pinning, stuck in swapcache, 
memory leaks, ...).

Further note that for now, we only track anon THPs after they got their 
->mapping set, for example via folio_add_new_anon_rmap(). If we 
allocate some in the swapcache, they will only show up in the statistics 
after they have been mapped to user space for the first time, where 
we call folio_add_new_anon_rmap().
"



> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
>   Documentation/admin-guide/mm/transhuge.rst |  5 +++++
>   include/linux/huge_mm.h                    | 15 +++++++++++++--
>   mm/huge_memory.c                           | 13 ++++++++++---
>   mm/migrate.c                               |  4 ++++
>   mm/page_alloc.c                            |  5 ++++-
>   mm/rmap.c                                  |  1 +
>   6 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 79435c537e21..b78f2148b242 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -551,6 +551,11 @@ split_deferred
>           it would free up some memory. Pages on split queue are going to
>           be split under memory pressure, if splitting is possible.
>   
> +nr_anon
> +       the number of transparent anon huge pages we have in the whole system.
> +       These huge pages could be entirely mapped or have partially
> +       unmapped/unused subpages.

To be clearer "This includes THPs that might or might not be currently 
mapped to user space".
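
For reference, reading the new counter from user space could look like the sketch below. The glob pattern is an assumption based on where the existing per-size mTHP statistics are exposed (`/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/`); `read_counter()` and `print_nr_anon()` are hypothetical helper names, not part of the patch.

```c
#include <glob.h>
#include <stdio.h>

/* Read one unsigned decimal counter from 'path'; returns 0 on success, -1 on error. */
int read_counter(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Print the counter for every file matching 'pattern'.
 * Returns the number of counters printed, or -1 if glob() itself failed.
 */
int print_nr_anon(const char *pattern)
{
	glob_t g;
	int printed = 0;
	int rc = glob(pattern, 0, NULL, &g);

	if (rc == GLOB_NOMATCH)
		return 0;	/* kernel without this counter, or THP disabled */
	if (rc != 0)
		return -1;

	for (size_t i = 0; i < g.gl_pathc; i++) {
		unsigned long v;

		if (read_counter(g.gl_pathv[i], &v) == 0) {
			printf("%s: %lu\n", g.gl_pathv[i], v);
			printed++;
		}
	}
	globfree(&g);
	return printed;
}
```

A caller would pass the assumed pattern, for example `print_nr_anon("/sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/nr_anon")`, and get one line per THP size that exposes the file.
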

David Hildenbrand Aug. 23, 2024, 11:39 a.m. UTC | #2
On 23.08.24 13:31, David Hildenbrand wrote:
> On 23.08.24 00:40, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
> 
> Nit: "anon large folios" come in two flavors: THP and hugetlb.
> 
> I suggest just calling it "anon THP" in the subjects/descriptions of
> both patches (sorry, I should have realized that earlier) to make it
> clearer.
> 
> This patch I would call
> 
> "mm: count the number of anonymous THPs per size"

Oh, and in the cover letter subject you did it right. Just be consistent 
:) (I don't care if we call it THP or mTHP here, whatever floats your boat)

Patch

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 79435c537e21..b78f2148b242 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -551,6 +551,11 @@  split_deferred
         it would free up some memory. Pages on split queue are going to
         be split under memory pressure, if splitting is possible.
 
+nr_anon
+       the number of transparent anon huge pages we have in the whole system.
+       These huge pages could be entirely mapped or have partially
+       unmapped/unused subpages.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4c32058cacfe..2ee2971e4e10 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -126,6 +126,7 @@  enum mthp_stat_item {
 	MTHP_STAT_SPLIT,
 	MTHP_STAT_SPLIT_FAILED,
 	MTHP_STAT_SPLIT_DEFERRED,
+	MTHP_STAT_NR_ANON,
 	__MTHP_STAT_COUNT
 };
 
@@ -136,14 +137,24 @@  struct mthp_stat {
 
 DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
 
-static inline void count_mthp_stat(int order, enum mthp_stat_item item)
+static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
 {
 	if (order <= 0 || order > PMD_ORDER)
 		return;
 
-	this_cpu_inc(mthp_stats.stats[order][item]);
+	this_cpu_add(mthp_stats.stats[order][item], delta);
+}
+
+static inline void count_mthp_stat(int order, enum mthp_stat_item item)
+{
+	mod_mthp_stat(order, item, 1);
 }
+
 #else
+static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
+{
+}
+
 static inline void count_mthp_stat(int order, enum mthp_stat_item item)
 {
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 513e7c87efee..26ad75fcda62 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -597,6 +597,7 @@  DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
 DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
 DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
 DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
+DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
 
 static struct attribute *anon_stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
@@ -609,6 +610,7 @@  static struct attribute *anon_stats_attrs[] = {
 	&split_attr.attr,
 	&split_failed_attr.attr,
 	&split_deferred_attr.attr,
+	&nr_anon_attr.attr,
 	NULL,
 };
 
@@ -3314,8 +3316,9 @@  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	/* reset xarray order to new order after split */
 	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
-	struct anon_vma *anon_vma = NULL;
+	bool is_anon = folio_test_anon(folio);
 	struct address_space *mapping = NULL;
+	struct anon_vma *anon_vma = NULL;
 	int order = folio_order(folio);
 	int extra_pins, ret;
 	pgoff_t end;
@@ -3327,7 +3330,7 @@  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
-	if (folio_test_anon(folio)) {
+	if (is_anon) {
 		/* order-1 is not supported for anonymous THP. */
 		if (new_order == 1) {
 			VM_WARN_ONCE(1, "Cannot split to order-1 folio");
@@ -3367,7 +3370,7 @@  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	if (folio_test_writeback(folio))
 		return -EBUSY;
 
-	if (folio_test_anon(folio)) {
+	if (is_anon) {
 		/*
 		 * The caller does not necessarily hold an mmap_lock that would
 		 * prevent the anon_vma disappearing so we first we take a
@@ -3480,6 +3483,10 @@  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 			}
 		}
 
+		if (is_anon) {
+			mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
+			mod_mthp_stat(new_order, MTHP_STAT_NR_ANON, 1 << (order - new_order));
+		}
 		__split_huge_page(page, list, end, new_order);
 		ret = 0;
 	} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 4f55f4930fe8..3cc8555de6d6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -450,6 +450,8 @@  static int __folio_migrate_mapping(struct address_space *mapping,
 		/* No turning back from here */
 		newfolio->index = folio->index;
 		newfolio->mapping = folio->mapping;
+		if (folio_test_anon(folio) && folio_test_large(folio))
+			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
 		if (folio_test_swapbacked(folio))
 			__folio_set_swapbacked(newfolio);
 
@@ -474,6 +476,8 @@  static int __folio_migrate_mapping(struct address_space *mapping,
 	 */
 	newfolio->index = folio->index;
 	newfolio->mapping = folio->mapping;
+	if (folio_test_anon(folio) && folio_test_large(folio))
+		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
 	folio_ref_add(newfolio, nr); /* add cache reference */
 	if (folio_test_swapbacked(folio)) {
 		__folio_set_swapbacked(newfolio);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8a67d760b71a..7dcb0713eb57 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1084,8 +1084,11 @@  __always_inline bool free_pages_prepare(struct page *page,
 			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
-	if (PageMappingFlags(page))
+	if (PageMappingFlags(page)) {
+		if (PageAnon(page))
+			mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
 		page->mapping = NULL;
+	}
 	if (is_check_pages_enabled()) {
 		if (free_page_is_bad(page))
 			bad++;
diff --git a/mm/rmap.c b/mm/rmap.c
index 1103a536e474..78529cf0fd66 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1467,6 +1467,7 @@  void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	}
 
 	__folio_mod_stat(folio, nr, nr_pmdmapped);
+	mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
 }
 
 static __always_inline void __folio_add_file_rmap(struct folio *folio,