Message ID | 20180816100628.26428-6-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/memory_hotplug: online/offline_pages refactorings | expand |
> failed_addition: > +#ifdef CONFIG_DEBUG_VM > pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", > (unsigned long long) pfn << PAGE_SHIFT, > (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); > +#endif I have never been sure about this. IMO, if I fail to online pages, I want to know I failed. I think that pr_err would be better than pr_debug and without CONFIG_DEBUG_VM. But at least, if not, envolve it with a CONFIG_DEBUG_VM, but change pr_debug to pr_info. > +#ifdef CONFIG_DEBUG_VM > pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", > (unsigned long long) start_pfn << PAGE_SHIFT, > ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); > +#endif Same goes here. Thanks
On Fri, Aug 17, 2018 at 10:18:53AM +0200, Oscar Salvador wrote: >> failed_addition: >> +#ifdef CONFIG_DEBUG_VM >> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", >> (unsigned long long) pfn << PAGE_SHIFT, >> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); >> +#endif > >I have never been sure about this. >IMO, if I fail to online pages, I want to know I failed. >I think that pr_err would be better than pr_debug and without CONFIG_DEBUG_VM. > >But at least, if not, envolve it with a CONFIG_DEBUG_VM, but change pr_debug to pr_info. > I don't have a clear rule about these debug macro neither. While when you look at the page related logs in calculate_node_totalpages(), it is KERNEL_DEBUG level and without any config macro. Maybe we should leave them at the same state? >> +#ifdef CONFIG_DEBUG_VM >> pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", >> (unsigned long long) start_pfn << PAGE_SHIFT, >> ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); >> +#endif > >Same goes here. > >Thanks >-- >Oscar Salvador >SUSE L3
On 17.08.2018 10:18, Oscar Salvador wrote: >> failed_addition: >> +#ifdef CONFIG_DEBUG_VM >> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", >> (unsigned long long) pfn << PAGE_SHIFT, >> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); >> +#endif > > I have never been sure about this. > IMO, if I fail to online pages, I want to know I failed. > I think that pr_err would be better than pr_debug and without CONFIG_DEBUG_VM. I consider both error messages only partially useful, as 1. They only catch a subset of actual failures the function handles. E.g. onlining will not report an error message if the memory notifier failed. 2. Onlining/Offlining is usually (with exceptions - e.g. onlining during add_memory) triggered from user space, where we present an error code. At any times, the actual state of the memory blocks can be observed by querying the state. I would even vote for dropping the two error case messages completely. At least I don't consider them very useful. > > But at least, if not, envolve it with a CONFIG_DEBUG_VM, but change pr_debug to pr_info. > >> +#ifdef CONFIG_DEBUG_VM >> pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", >> (unsigned long long) start_pfn << PAGE_SHIFT, >> ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); >> +#endif > > Same goes here. > > Thanks >
On 20.08.2018 11:45, David Hildenbrand wrote: > On 17.08.2018 10:18, Oscar Salvador wrote: >>> failed_addition: >>> +#ifdef CONFIG_DEBUG_VM >>> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", >>> (unsigned long long) pfn << PAGE_SHIFT, >>> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); >>> +#endif >> >> I have never been sure about this. >> IMO, if I fail to online pages, I want to know I failed. >> I think that pr_err would be better than pr_debug and without CONFIG_DEBUG_VM. > > I consider both error messages only partially useful, as > > 1. They only catch a subset of actual failures the function handles. > E.g. onlining will not report an error message if the memory notifier > failed. That statement was wrong. It is rather in offline_pages, errors from start_isolate_page_range() are ignored. > 2. Onlining/Offlining is usually (with exceptions - e.g. onlining during > add_memory) triggered from user space, where we present an error > code. At any times, the actual state of the memory blocks can be > observed by querying the state. > > I would even vote for dropping the two error case messages completely. > At least I don't consider them very useful. > >> >> But at least, if not, envolve it with a CONFIG_DEBUG_VM, but change pr_debug to pr_info. >> >>> +#ifdef CONFIG_DEBUG_VM >>> pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", >>> (unsigned long long) start_pfn << PAGE_SHIFT, >>> ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); >>> +#endif >> >> Same goes here. >> >> Thanks >> > >
On 19.08.2018 14:34, Wei Yang wrote: > On Fri, Aug 17, 2018 at 10:18:53AM +0200, Oscar Salvador wrote: >>> failed_addition: >>> +#ifdef CONFIG_DEBUG_VM >>> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", >>> (unsigned long long) pfn << PAGE_SHIFT, >>> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); >>> +#endif >> >> I have never been sure about this. >> IMO, if I fail to online pages, I want to know I failed. >> I think that pr_err would be better than pr_debug and without CONFIG_DEBUG_VM. >> >> But at least, if not, envolve it with a CONFIG_DEBUG_VM, but change pr_debug to pr_info. >> > > I don't have a clear rule about these debug macro neither. > > While when you look at the page related logs in calculate_node_totalpages(), > it is KERNEL_DEBUG level and without any config macro. > > Maybe we should leave them at the same state? I guess we can do that for the to debug messages. When offlining memory right now: :/# echo 0 > /sys/devices/system/memory/memory9/online [ 24.476207] Offlined Pages 32768 [ 24.477200] remove from free list 48000 1024 50000 [ 24.477896] remove from free list 48400 1024 50000 [ 24.478584] remove from free list 48800 1024 50000 [ 24.479454] remove from free list 48c00 1024 50000 [ 24.480192] remove from free list 49000 1024 50000 [ 24.480957] remove from free list 49400 1024 50000 [ 24.481752] remove from free list 49800 1024 50000 [ 24.482578] remove from free list 49c00 1024 50000 [ 24.483302] remove from free list 4a000 1024 50000 [ 24.484300] remove from free list 4a400 1024 50000 [ 24.484902] remove from free list 4a800 1024 50000 [ 24.485462] remove from free list 4ac00 1024 50000 [ 24.486381] remove from free list 4b000 1024 50000 [ 24.487108] remove from free list 4b400 1024 50000 [ 24.487842] remove from free list 4b800 1024 50000 [ 24.488610] remove from free list 4bc00 1024 50000 [ 24.489548] remove from free list 4c000 1024 50000 [ 24.490392] remove from free list 4c400 1024 50000 [ 24.491224] remove from free list 4c800 1024 50000 ... While "remove from free list" is pr_info under CONFIG_DEBUG_VM, "Offlined Pages ..." is pr_info without CONFIG_DEBUG_VM.
On 16.08.2018 12:06, David Hildenbrand wrote: > Let's try to minimze the noise. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/memory_hotplug.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index bbbd16f9d877..6fec2dc6a73d 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -966,9 +966,11 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ > return 0; > > failed_addition: > +#ifdef CONFIG_DEBUG_VM > pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", > (unsigned long long) pfn << PAGE_SHIFT, > (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); > +#endif > memory_notify(MEM_CANCEL_ONLINE, &arg); > return ret; > } > @@ -1660,7 +1662,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > offlined_pages = check_pages_isolated(start_pfn, end_pfn); > if (offlined_pages < 0) > goto repeat; > +#ifdef CONFIG_DEBUG_VM > pr_info("Offlined Pages %ld\n", offlined_pages); > +#endif > /* Ok, all of our target is isolated. > We cannot do rollback at this point. */ > offline_isolated_pages(start_pfn, end_pfn); > @@ -1695,9 +1699,11 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > return 0; > > failed_removal: > +#ifdef CONFIG_DEBUG_VM > pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", > (unsigned long long) start_pfn << PAGE_SHIFT, > ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); > +#endif > memory_notify(MEM_CANCEL_OFFLINE, &arg); > /* pushback to free area */ > undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); > I'll drop this patch for now, maybe the error messages are actually useful when debugging a crashdump of a system without CONFIG_DEBUG_VM.
On Mon, Aug 20, 2018 at 11:57:04AM +0200, David Hildenbrand wrote: >On 19.08.2018 14:34, Wei Yang wrote: >> On Fri, Aug 17, 2018 at 10:18:53AM +0200, Oscar Salvador wrote: >>>> failed_addition: >>>> +#ifdef CONFIG_DEBUG_VM >>>> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", >>>> (unsigned long long) pfn << PAGE_SHIFT, >>>> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); >>>> +#endif >>> >>> I have never been sure about this. >>> IMO, if I fail to online pages, I want to know I failed. >>> I think that pr_err would be better than pr_debug and without CONFIG_DEBUG_VM. >>> >>> But at least, if not, envolve it with a CONFIG_DEBUG_VM, but change pr_debug to pr_info. >>> >> >> I don't have a clear rule about these debug macro neither. >> >> While when you look at the page related logs in calculate_node_totalpages(), >> it is KERNEL_DEBUG level and without any config macro. >> >> Maybe we should leave them at the same state? > >I guess we can do that for the to debug messages. > >When offlining memory right now: > >:/# echo 0 > /sys/devices/system/memory/memory9/online >[ 24.476207] Offlined Pages 32768 >[ 24.477200] remove from free list 48000 1024 50000 >[ 24.477896] remove from free list 48400 1024 50000 >[ 24.478584] remove from free list 48800 1024 50000 >[ 24.479454] remove from free list 48c00 1024 50000 >[ 24.480192] remove from free list 49000 1024 50000 >[ 24.480957] remove from free list 49400 1024 50000 >[ 24.481752] remove from free list 49800 1024 50000 >[ 24.482578] remove from free list 49c00 1024 50000 >[ 24.483302] remove from free list 4a000 1024 50000 >[ 24.484300] remove from free list 4a400 1024 50000 >[ 24.484902] remove from free list 4a800 1024 50000 >[ 24.485462] remove from free list 4ac00 1024 50000 >[ 24.486381] remove from free list 4b000 1024 50000 >[ 24.487108] remove from free list 4b400 1024 50000 >[ 24.487842] remove from free list 4b800 1024 50000 >[ 24.488610] remove from free list 4bc00 1024 50000 >[ 24.489548] remove from free list 4c000 1024 50000 >[ 24.490392] remove from free list 4c400 1024 50000 >[ 24.491224] remove from free list 4c800 1024 50000 >... > >While "remove from free list" is pr_info under CONFIG_DEBUG_VM, >"Offlined Pages ..." is pr_info without CONFIG_DEBUG_VM. Hmm... yes, don't see the logic between them. > >-- > >Thanks, > >David / dhildenb
On 8/20/18 6:46 AM, David Hildenbrand wrote: > On 16.08.2018 12:06, David Hildenbrand wrote: >> Let's try to minimze the noise. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/memory_hotplug.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index bbbd16f9d877..6fec2dc6a73d 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -966,9 +966,11 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ >> return 0; >> >> failed_addition: >> +#ifdef CONFIG_DEBUG_VM >> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", >> (unsigned long long) pfn << PAGE_SHIFT, >> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); >> +#endif >> memory_notify(MEM_CANCEL_ONLINE, &arg); >> return ret; >> } >> @@ -1660,7 +1662,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) >> offlined_pages = check_pages_isolated(start_pfn, end_pfn); >> if (offlined_pages < 0) >> goto repeat; >> +#ifdef CONFIG_DEBUG_VM >> pr_info("Offlined Pages %ld\n", offlined_pages); >> +#endif >> /* Ok, all of our target is isolated. >> We cannot do rollback at this point. */ >> offline_isolated_pages(start_pfn, end_pfn); >> @@ -1695,9 +1699,11 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) >> return 0; >> >> failed_removal: >> +#ifdef CONFIG_DEBUG_VM >> pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", >> (unsigned long long) start_pfn << PAGE_SHIFT, >> ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); >> +#endif >> memory_notify(MEM_CANCEL_OFFLINE, &arg); >> /* pushback to free area */ >> undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); >> > > I'll drop this patch for now, maybe the error messages are actually > useful when debugging a crashdump of a system without CONFIG_DEBUG_VM. > Yes, please drop it, no reason to reduce amount of these messages. They are useful. Thank you, Pavel
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index bbbd16f9d877..6fec2dc6a73d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -966,9 +966,11 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ return 0; failed_addition: +#ifdef CONFIG_DEBUG_VM pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); +#endif memory_notify(MEM_CANCEL_ONLINE, &arg); return ret; } @@ -1660,7 +1662,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) offlined_pages = check_pages_isolated(start_pfn, end_pfn); if (offlined_pages < 0) goto repeat; +#ifdef CONFIG_DEBUG_VM pr_info("Offlined Pages %ld\n", offlined_pages); +#endif /* Ok, all of our target is isolated. We cannot do rollback at this point. */ offline_isolated_pages(start_pfn, end_pfn); @@ -1695,9 +1699,11 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) return 0; failed_removal: +#ifdef CONFIG_DEBUG_VM pr_debug("memory offlining [mem %#010llx-%#010llx] failed\n", (unsigned long long) start_pfn << PAGE_SHIFT, ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); +#endif memory_notify(MEM_CANCEL_OFFLINE, &arg); /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
Let's try to minimze the noise. Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/memory_hotplug.c | 6 ++++++ 1 file changed, 6 insertions(+)