Message ID | 20210301150945.77012-1-david@redhat.com |
---|---|
State | New, archived |
Series | [v1] mm/page_alloc: drop pr_info_ratelimited() in alloc_contig_range() |
On 1 Mar 2021, at 10:09, David Hildenbrand wrote:
> The information that some PFNs are busy is:
> a) not helpful for ordinary users: we don't even know *who* called
> alloc_contig_range(). This is certainly not worth a pr_info.*().
> b) not really helpful for debugging: we don't have any details *why*
> these PFNs are busy, and that is what we usually care about.
> c) not complete: there are other cases where we fail alloc_contig_range()
> using different paths that are not getting recorded.
> [...]

LGTM. I agree that the printout is not quite useful.

Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan Zi
On Mon 01-03-21 16:09:45, David Hildenbrand wrote:
> The information that some PFNs are busy is:
> [...]
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>
On Mon, Mar 01, 2021 at 04:09:45PM +0100, David Hildenbrand wrote:
> The information that some PFNs are busy is:
> [...]
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>
```diff
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 519a60d5b6f7..efb924fb13e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8647,8 +8647,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, 0)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
 		ret = -EBUSY;
 		goto done;
 	}
```
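After the patch, the isolation check still fails the call with -EBUSY; only the log line is gone, which leaves any reporting to the caller, the one party that knows who asked and why. The following user-space sketch models that error path under stated assumptions: `model_test_pages_isolated()` and `model_alloc_contig_range()` are hypothetical stand-ins, and "not isolated" is represented by a per-PFN `busy` flag rather than the kernel's real pageblock and page-isolation state.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for test_pages_isolated(): returns -EBUSY if any
 * PFN in [start, end) is still busy, 0 otherwise. A plain bool array
 * replaces the kernel's real isolation state.
 */
static int model_test_pages_isolated(const bool *busy, unsigned long start,
				     unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (busy[pfn])
			return -EBUSY;
	}
	return 0;
}

/*
 * Hypothetical model of the tail of alloc_contig_range() after the patch:
 * the failure is still reported through the return value, but nothing is
 * logged here.
 */
static int model_alloc_contig_range(const bool *busy, unsigned long start,
				    unsigned long end)
{
	/* Make sure the range is really isolated. */
	if (model_test_pages_isolated(busy, start, end))
		return -EBUSY; /* no pr_info_ratelimited() anymore */
	return 0;
}
```

For example, with `bool busy[4] = { false, false, true, false };`, `model_alloc_contig_range(busy, 0, 4)` returns -EBUSY while `model_alloc_contig_range(busy, 0, 2)` returns 0. A caller such as virtio-mem still sees the -EBUSY and can treat it as the expected outcome it often is.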
The information that some PFNs are busy is:
a) not helpful for ordinary users: we don't even know *who* called alloc_contig_range(). This is certainly not worth a pr_info.*().
b) not really helpful for debugging: we don't have any details about *why* these PFNs are busy, and that is what we usually care about.
c) not complete: there are other cases where we fail alloc_contig_range() using different paths that are not getting recorded.

For example, we reach this path once we have succeeded in isolating pageblocks but failed to migrate some pages, which can happen easily on ZONE_NORMAL (i.e., has_unmovable_pages() is racy) but also on ZONE_MOVABLE (i.e., we would have to retry longer to migrate).

For example, when unplugging memory via virtio-mem, we can create quite a lot of noise (especially with ZONE_NORMAL) that is not of interest to users; it's expected that some allocations may fail as memory is busy.

Let's just drop that pr_info_ratelimited() and instead implement a dynamic debugging mechanism in the future that can give us a better reason why alloc_contig_range() failed on specific pages.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Hildenbrand <david@redhat.com>

---
 mm/page_alloc.c | 2 --
 1 file changed, 2 deletions(-)
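Rate limiting alone does not remove the noise described above. pr_info_ratelimited() uses a ratelimit state that, by default, lets a burst of 10 messages through every 5-second interval (DEFAULT_RATELIMIT_BURST and DEFAULT_RATELIMIT_INTERVAL), so a steady stream of expected failures, such as repeated virtio-mem unplug attempts, still produces a steady trickle of log lines. The sketch below is a simplified user-space model of that interval/burst policy, assuming millisecond timestamps; `struct model_ratelimit`, `model_ratelimit_ok()` and `model_count_logged()` are hypothetical names, and the model omits details of the kernel's `___ratelimit()` such as locking and the "callbacks suppressed" summary.

```c
#include <stdbool.h>

/* Hypothetical user-space model of interval/burst rate limiting;
 * not the kernel's struct ratelimit_state. */
struct model_ratelimit {
	long interval_ms; /* length of one window */
	int burst;        /* messages allowed per window */
	bool started;     /* has the first window begun? */
	long begin_ms;    /* start of the current window */
	int printed;      /* messages emitted in the current window */
	int missed;       /* messages suppressed in the current window */
};

/* Return true if a message may be emitted at time now_ms. */
static bool model_ratelimit_ok(struct model_ratelimit *rs, long now_ms)
{
	if (!rs->started) {
		rs->started = true;
		rs->begin_ms = now_ms;
	}
	/* Window expired: start a new one and reset the counters. */
	if (now_ms - rs->begin_ms > rs->interval_ms) {
		rs->begin_ms = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Count how many of `failures` events, spaced step_ms apart starting
 * at t = 0, would be logged under a 5 s interval with a burst of 10. */
static int model_count_logged(int failures, long step_ms)
{
	struct model_ratelimit rs = { .interval_ms = 5000, .burst = 10 };
	int logged = 0;

	for (int i = 0; i < failures; i++) {
		if (model_ratelimit_ok(&rs, i * step_ms))
			logged++;
	}
	return logged;
}
```

With 100 failures spaced 100 ms apart, `model_count_logged(100, 100)` returns 20: ten messages in each of the two windows, with the remaining 80 suppressed. Fewer lines than without rate limiting, but still output that, per the commit message, ordinary users cannot act on.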