[v1] mm/page_alloc: drop pr_info_ratelimited() in alloc_contig_range()

Message ID 20210301150945.77012-1-david@redhat.com (mailing list archive)
State New, archived

Commit Message

David Hildenbrand March 1, 2021, 3:09 p.m. UTC
The information that some PFNs are busy is:
a) not helpful for ordinary users: we don't even know *who* called
   alloc_contig_range(). This is certainly not worth a pr_info.*().
b) not really helpful for debugging: we don't have any details *why*
   these PFNs are busy, and that is what we usually care about.
c) not complete: there are other cases where we fail alloc_contig_range()
   using different paths that are not getting recorded.

For example, we reach this path once we succeeded in isolating pageblocks,
but failed to migrate some pages - which can happen easily on
ZONE_NORMAL (i.e., has_unmovable_pages() is racy) but also on ZONE_MOVABLE
(i.e., we would have to retry longer to migrate).

For example via virtio-mem when unplugging memory, we can create quite
some noise (especially with ZONE_NORMAL) that is not of interest to
users - it's expected that some allocations may fail as memory is busy.

Let's just drop that pr_info_ratelimited() and rather implement a dynamic
debugging mechanism in the future that can give us a better reason why
alloc_contig_range() failed on specific pages.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Zi Yan March 1, 2021, 3:35 p.m. UTC | #1
On 1 Mar 2021, at 10:09, David Hildenbrand wrote:

> The information that some PFNs are busy is:
> a) not helpful for ordinary users: we don't even know *who* called
>    alloc_contig_range(). This is certainly not worth a pr_info.*().
> b) not really helpful for debugging: we don't have any details *why*
>    these PFNs are busy, and that is what we usually care about.
> c) not complete: there are other cases where we fail alloc_contig_range()
>    using different paths that are not getting recorded.
>
> For example, we reach this path once we succeeded in isolating pageblocks,
> but failed to migrate some pages - which can happen easily on
> ZONE_NORMAL (i.e., has_unmovable_pages() is racy) but also on ZONE_MOVABLE
> (i.e., we would have to retry longer to migrate).
>
> For example via virtio-mem when unplugging memory, we can create quite
> some noise (especially with ZONE_NORMAL) that is not of interest to
> users - it's expected that some allocations may fail as memory is busy.
>
> Let's just drop that pr_info_ratelimited() and rather implement a dynamic
> debugging mechanism in the future that can give us a better reason why
> alloc_contig_range() failed on specific pages.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

LGTM. I agree that the printout is not quite useful.

Reviewed-by: Zi Yan <ziy@nvidia.com>


--
Best Regards,
Yan Zi
Michal Hocko March 1, 2021, 3:52 p.m. UTC | #2
On Mon 01-03-21 16:09:45, David Hildenbrand wrote:
> The information that some PFNs are busy is:
> a) not helpful for ordinary users: we don't even know *who* called
>    alloc_contig_range(). This is certainly not worth a pr_info.*().
> b) not really helpful for debugging: we don't have any details *why*
>    these PFNs are busy, and that is what we usually care about.
> c) not complete: there are other cases where we fail alloc_contig_range()
>    using different paths that are not getting recorded.
> 
> For example, we reach this path once we succeeded in isolating pageblocks,
> but failed to migrate some pages - which can happen easily on
> ZONE_NORMAL (i.e., has_unmovable_pages() is racy) but also on ZONE_MOVABLE
> (i.e., we would have to retry longer to migrate).
> 
> For example via virtio-mem when unplugging memory, we can create quite
> some noise (especially with ZONE_NORMAL) that is not of interest to
> users - it's expected that some allocations may fail as memory is busy.
> 
> Let's just drop that pr_info_ratelimited() and rather implement a dynamic
> debugging mechanism in the future that can give us a better reason why
> alloc_contig_range() failed on specific pages.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/page_alloc.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 519a60d5b6f7..efb924fb13e8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8647,8 +8647,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  
>  	/* Make sure the range is really isolated. */
>  	if (test_pages_isolated(outer_start, end, 0)) {
> -		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> -			__func__, outer_start, end);
>  		ret = -EBUSY;
>  		goto done;
>  	}
> -- 
> 2.29.2
Oscar Salvador March 1, 2021, 10:05 p.m. UTC | #3
On Mon, Mar 01, 2021 at 04:09:45PM +0100, David Hildenbrand wrote:
> The information that some PFNs are busy is:
> a) not helpful for ordinary users: we don't even know *who* called
>    alloc_contig_range(). This is certainly not worth a pr_info.*().
> b) not really helpful for debugging: we don't have any details *why*
>    these PFNs are busy, and that is what we usually care about.
> c) not complete: there are other cases where we fail alloc_contig_range()
>    using different paths that are not getting recorded.
> 
> For example, we reach this path once we succeeded in isolating pageblocks,
> but failed to migrate some pages - which can happen easily on
> ZONE_NORMAL (i.e., has_unmovable_pages() is racy) but also on ZONE_MOVABLE
> (i.e., we would have to retry longer to migrate).
> 
> For example via virtio-mem when unplugging memory, we can create quite
> some noise (especially with ZONE_NORMAL) that is not of interest to
> users - it's expected that some allocations may fail as memory is busy.
> 
> Let's just drop that pr_info_ratelimited() and rather implement a dynamic
> debugging mechanism in the future that can give us a better reason why
> alloc_contig_range() failed on specific pages.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 519a60d5b6f7..efb924fb13e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8647,8 +8647,6 @@  int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, 0)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
 		ret = -EBUSY;
 		goto done;
 	}
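
The effect of the hunk above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every model_* name, the fixed-size PFN array, and the caller are hypothetical and exist purely for illustration. It assumes only what the patch shows, namely that a nonzero result from the isolation check makes the function return -EBUSY, and after the patch it does so without printing anything. Any reporting is left to the caller, which (unlike alloc_contig_range()) knows who asked for the range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: one flag per PFN, true while the page is still busy. */
#define MODEL_NR_PFNS 16
static bool model_pfn_busy[MODEL_NR_PFNS];

/* Stand-in for the isolation check: 0 if [start, end) is free, else -EBUSY. */
static int model_test_pages_isolated(unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++)
		if (model_pfn_busy[pfn])
			return -EBUSY;
	return 0;
}

/* Models the final check after the patch: fail with -EBUSY, log nothing. */
static int model_alloc_contig_range(unsigned long start, unsigned long end)
{
	assert(start < end && end <= MODEL_NR_PFNS);

	/* Make sure the range is really isolated. */
	if (model_test_pages_isolated(start, end))
		return -EBUSY;
	return 0;
}

/* Hypothetical caller: it has the context, so it decides whether to report. */
static int model_unplug_block(unsigned long start, unsigned long end,
			      bool verbose)
{
	int ret = model_alloc_contig_range(start, end);

	if (ret == -EBUSY && verbose)
		fprintf(stderr, "unplug: [%lx, %lx) busy, will retry later\n",
			start, end);
	return ret;
}
```

With model_pfn_busy[5] set, model_alloc_contig_range(4, 8) returns -EBUSY silently, and model_unplug_block(4, 8, true) is the only place a message is emitted. A real replacement for the dropped message would more likely be built on the kernel's dynamic debug facility, as the commit message suggests, rather than on a caller-supplied flag.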