Message ID | 20190409100148.24703-2-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/memory_hotplug: Better error handling when removing memory | expand |
On Tue, 9 Apr 2019 12:01:45 +0200 David Hildenbrand <david@redhat.com> wrote: > __add_pages() doesn't add the memory resource, so __remove_pages() > shouldn't remove it. Let's factor it out. Especially as it is a special > case for memory used as system memory, added via add_memory() and > friends. > > We now remove the resource after removing the sections instead of doing > it the other way around. I don't think this change is problematic. > > add_memory() > register memory resource > arch_add_memory() > > remove_memory > arch_remove_memory() > release memory resource > > While at it, explain why we ignore errors and that it only happeny if > we remove memory in a different granularity as we added it. Seems sane. > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1820,6 +1806,25 @@ void try_offline_node(int nid) > } > EXPORT_SYMBOL(try_offline_node); > > +static void __release_memory_resource(u64 start, u64 size) > +{ > + int ret; > + > + /* > + * When removing memory in the same granularity as it was added, > + * this function never fails. It might only fail if resources > + * have to be adjusted or split. We'll ignore the error, as > + * removing of memory cannot fail. > + */ > + ret = release_mem_region_adjustable(&iomem_resource, start, size); > + if (ret) { > + resource_size_t endres = start + size - 1; > + > + pr_warn("Unable to release resource <%pa-%pa> (%d)\n", > + &start, &endres, ret); > + } > +} The types seem confused here. Should `start' and `size' be resource_size_t? Or maybe phys_addr_t. release_mem_region_adjustable() takes resource_size_t's. Is %pa the way to print a resource_size_t? I guess it happens to work because resource_size_t happens to map onto phys_addr_t, which isn't ideal. Wanna have a headscratch over that? > /** > * remove_memory > * @nid: the node ID > @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size) > memblock_remove(start, size); > > arch_remove_memory(nid, start, size, NULL); > + __release_memory_resource(start, size); > > try_offline_node(nid);
On 10.04.19 00:41, Andrew Morton wrote: > On Tue, 9 Apr 2019 12:01:45 +0200 David Hildenbrand <david@redhat.com> wrote: > >> __add_pages() doesn't add the memory resource, so __remove_pages() >> shouldn't remove it. Let's factor it out. Especially as it is a special >> case for memory used as system memory, added via add_memory() and >> friends. >> >> We now remove the resource after removing the sections instead of doing >> it the other way around. I don't think this change is problematic. >> >> add_memory() >> register memory resource >> arch_add_memory() >> >> remove_memory >> arch_remove_memory() >> release memory resource >> >> While at it, explain why we ignore errors and that it only happeny if >> we remove memory in a different granularity as we added it. > > Seems sane. > >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid) >> } >> EXPORT_SYMBOL(try_offline_node); >> >> +static void __release_memory_resource(u64 start, u64 size) >> +{ >> + int ret; >> + >> + /* >> + * When removing memory in the same granularity as it was added, >> + * this function never fails. It might only fail if resources >> + * have to be adjusted or split. We'll ignore the error, as >> + * removing of memory cannot fail. >> + */ >> + ret = release_mem_region_adjustable(&iomem_resource, start, size); >> + if (ret) { >> + resource_size_t endres = start + size - 1; >> + >> + pr_warn("Unable to release resource <%pa-%pa> (%d)\n", >> + &start, &endres, ret); >> + } >> +} > > The types seem confused here. Should `start' and `size' be > resource_size_t? Or maybe phys_addr_t. Hmm, right now it has the same prototype as register_memory_resource. I guess using resource_size_t is the right thing to do. > > release_mem_region_adjustable() takes resource_size_t's. > > Is %pa the way to print a resource_size_t? I guess it happens to work > because resource_size_t happens to map onto phys_addr_t, which isn't > ideal. Documentation/core-api/printk-formats.rst " %pa[p] 0x01234567 or 0x0123456789abcdef For printing a phys_addr_t type (and its derivatives, such as resource_size_t) ... " Care to fixup both u64 to resource_size_t? Or should I send a patch? Whatever you prefer. Thanks!
On Wed, 10 Apr 2019 10:07:24 +0200 David Hildenbrand <david@redhat.com> wrote: > Care to fixup both u64 to resource_size_t? Or should I send a patch? > Whatever you prefer. Please send along a fixup. This patch series has no evidence of having been reviewed :(. Can you suggest who could help us out here?
On 17.04.19 05:37, Andrew Morton wrote: > On Wed, 10 Apr 2019 10:07:24 +0200 David Hildenbrand <david@redhat.com> wrote: > >> Care to fixup both u64 to resource_size_t? Or should I send a patch? >> Whatever you prefer. > > Please send along a fixup. Will do! > > This patch series has no evidence of having been reviewed :(. Can you > suggest who could help us out here? Usually I would say Oscar and Michal, but they seem to be fairly busy. I guess only the first patch is a real change, the other three patches are mostly function prototype changes, handling errors in a nicer way. Thanks!
On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote: > __add_pages() doesn't add the memory resource, so __remove_pages() > shouldn't remove it. Let's factor it out. Especially as it is a > special > case for memory used as system memory, added via add_memory() and > friends. I would call the special case the other way, aka: zone_device hooking into hotplug path. > > We now remove the resource after removing the sections instead of > doing > it the other way around. I don't think this change is problematic. > > add_memory() > register memory resource > arch_add_memory() > > remove_memory > arch_remove_memory() > release memory resource > > While at it, explain why we ignore errors and that it only happeny if > we remove memory in a different granularity as we added it. In the future we may want to allow drivers to hook directly into arch_add_memory()/arch_remove_memory(), and this will lead to different granularity in hot_add/hot_remove operations. At least that was one of the conclusions I drew from the last vmemmap- patchset. So, we will have to see how we can handle those kind of errors. > > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Michal Hocko <mhocko@suse.com> > Cc: David Hildenbrand <david@redhat.com> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com> > Cc: Wei Yang <richard.weiyang@gmail.com> > Cc: Qian Cai <cai@lca.pw> > Cc: Arun KS <arunks@codeaurora.org> > Cc: Mathieu Malaterre <malat@debian.org> > Signed-off-by: David Hildenbrand <david@redhat.com> Besides what Andrew pointed out about the types of start,size, I do not see anything wrong: Reviewed-by: Oscar Salvador <osalvador@suse.de> > --- > mm/memory_hotplug.c | 34 ++++++++++++++++++++-------------- > 1 file changed, 20 insertions(+), 14 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 4970ff658055..696ed7ee5e28 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned > long phys_start_pfn, > if (is_dev_zone(zone)) { > if (altmap) > map_offset = vmem_altmap_offset(altmap); > - } else { > - resource_size_t start, size; > - > - start = phys_start_pfn << PAGE_SHIFT; > - size = nr_pages * PAGE_SIZE; > - > - ret = release_mem_region_adjustable(&iomem_resource, > start, > - size); > - if (ret) { > - resource_size_t endres = start + size - 1; > - > - pr_warn("Unable to release resource <%pa- > %pa> (%d)\n", > - &start, &endres, ret); > - } > } > > clear_zone_contiguous(zone); > @@ -1820,6 +1806,25 @@ void try_offline_node(int nid) > } > EXPORT_SYMBOL(try_offline_node); > > +static void __release_memory_resource(u64 start, u64 size) > +{ > + int ret; > + > + /* > + * When removing memory in the same granularity as it was > added, > + * this function never fails. It might only fail if > resources > + * have to be adjusted or split. We'll ignore the error, as > + * removing of memory cannot fail. > + */ > + ret = release_mem_region_adjustable(&iomem_resource, start, > size); > + if (ret) { > + resource_size_t endres = start + size - 1; > + > + pr_warn("Unable to release resource <%pa-%pa> > (%d)\n", > + &start, &endres, ret); > + } > +} > + > /** > * remove_memory > * @nid: the node ID > @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, > u64 size) > memblock_remove(start, size); > > arch_remove_memory(nid, start, size, NULL); > + __release_memory_resource(start, size); > > try_offline_node(nid); >
On Tue 09-04-19 12:01:45, David Hildenbrand wrote: > __add_pages() doesn't add the memory resource, so __remove_pages() > shouldn't remove it. Let's factor it out. Especially as it is a special > case for memory used as system memory, added via add_memory() and > friends. > > We now remove the resource after removing the sections instead of doing > it the other way around. I don't think this change is problematic. > > add_memory() > register memory resource > arch_add_memory() > > remove_memory > arch_remove_memory() > release memory resource > > While at it, explain why we ignore errors and that it only happeny if > we remove memory in a different granularity as we added it. OK, I agree that the symmetry is good in general and it certainly makes sense here as well. But does it make sense to pick up this particular part without larger considerations of add vs. remove apis? I have a strong feeling this wouldn't be the only thing to care about. In other words does this help future changes or it is more likely to cause more code conflicts with other features being developed right now? > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Michal Hocko <mhocko@suse.com> > Cc: David Hildenbrand <david@redhat.com> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com> > Cc: Wei Yang <richard.weiyang@gmail.com> > Cc: Qian Cai <cai@lca.pw> > Cc: Arun KS <arunks@codeaurora.org> > Cc: Mathieu Malaterre <malat@debian.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/memory_hotplug.c | 34 ++++++++++++++++++++-------------- > 1 file changed, 20 insertions(+), 14 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 4970ff658055..696ed7ee5e28 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, > if (is_dev_zone(zone)) { > if (altmap) > map_offset = vmem_altmap_offset(altmap); > - } else { > - resource_size_t start, size; > - > - start = phys_start_pfn << PAGE_SHIFT; > - size = nr_pages * PAGE_SIZE; > - > - ret = release_mem_region_adjustable(&iomem_resource, start, > - size); > - if (ret) { > - resource_size_t endres = start + size - 1; > - > - pr_warn("Unable to release resource <%pa-%pa> (%d)\n", > - &start, &endres, ret); > - } > } > > clear_zone_contiguous(zone); > @@ -1820,6 +1806,25 @@ void try_offline_node(int nid) > } > EXPORT_SYMBOL(try_offline_node); > > +static void __release_memory_resource(u64 start, u64 size) > +{ > + int ret; > + > + /* > + * When removing memory in the same granularity as it was added, > + * this function never fails. It might only fail if resources > + * have to be adjusted or split. We'll ignore the error, as > + * removing of memory cannot fail. > + */ > + ret = release_mem_region_adjustable(&iomem_resource, start, size); > + if (ret) { > + resource_size_t endres = start + size - 1; > + > + pr_warn("Unable to release resource <%pa-%pa> (%d)\n", > + &start, &endres, ret); > + } > +} > + > /** > * remove_memory > * @nid: the node ID > @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size) > memblock_remove(start, size); > > arch_remove_memory(nid, start, size, NULL); > + __release_memory_resource(start, size); > > try_offline_node(nid); > > -- > 2.17.2 >
On 17.04.19 15:12, Michal Hocko wrote: > On Tue 09-04-19 12:01:45, David Hildenbrand wrote: >> __add_pages() doesn't add the memory resource, so __remove_pages() >> shouldn't remove it. Let's factor it out. Especially as it is a special >> case for memory used as system memory, added via add_memory() and >> friends. >> >> We now remove the resource after removing the sections instead of doing >> it the other way around. I don't think this change is problematic. >> >> add_memory() >> register memory resource >> arch_add_memory() >> >> remove_memory >> arch_remove_memory() >> release memory resource >> >> While at it, explain why we ignore errors and that it only happeny if >> we remove memory in a different granularity as we added it. > > OK, I agree that the symmetry is good in general and it certainly makes > sense here as well. But does it make sense to pick up this particular > part without larger considerations of add vs. remove apis? I have a > strong feeling this wouldn't be the only thing to care about. In other > words does this help future changes or it is more likely to cause more > code conflicts with other features being developed right now? I am planning to 1. factor out memory block device handling, so features like sub-section add/remove are easier to add internally. Move it to the user that wants it. Clean up all the mess we have right now due to supporting memory block devices that span several sections. 2. Make sure that any arch_add_pages() and friends clean up properly if they fail instead of indicating failure but leaving some partially added memory lying around. 3. Clean up node handling regarding to memory hotplug/unplug. Especially don't allow to offline/remove memory spanning several nodes etc. IOW, in order to properly clean up memory block device handling and prepare for more changes people are interested in (e.g. sub-section add of device memory), this is the right thing to do. The other parts are bigger changes.
On Wed 17-04-19 15:24:47, David Hildenbrand wrote: > On 17.04.19 15:12, Michal Hocko wrote: > > On Tue 09-04-19 12:01:45, David Hildenbrand wrote: > >> __add_pages() doesn't add the memory resource, so __remove_pages() > >> shouldn't remove it. Let's factor it out. Especially as it is a special > >> case for memory used as system memory, added via add_memory() and > >> friends. > >> > >> We now remove the resource after removing the sections instead of doing > >> it the other way around. I don't think this change is problematic. > >> > >> add_memory() > >> register memory resource > >> arch_add_memory() > >> > >> remove_memory > >> arch_remove_memory() > >> release memory resource > >> > >> While at it, explain why we ignore errors and that it only happeny if > >> we remove memory in a different granularity as we added it. > > > > OK, I agree that the symmetry is good in general and it certainly makes > > sense here as well. But does it make sense to pick up this particular > > part without larger considerations of add vs. remove apis? I have a > > strong feeling this wouldn't be the only thing to care about. In other > > words does this help future changes or it is more likely to cause more > > code conflicts with other features being developed right now? > > I am planning to > > 1. factor out memory block device handling, so features like sub-section > add/remove are easier to add internally. Move it to the user that wants > it. Clean up all the mess we have right now due to supporting memory > block devices that span several sections. > > 2. Make sure that any arch_add_pages() and friends clean up properly if > they fail instead of indicating failure but leaving some partially added > memory lying around. > > 3. Clean up node handling regarding to memory hotplug/unplug. Especially > don't allow to offline/remove memory spanning several nodes etc. Yes, this all sounds sane to me. > IOW, in order to properly clean up memory block device handling and > prepare for more changes people are interested in (e.g. sub-section add > of device memory), this is the right thing to do. The other parts are > bigger changes. This would be really valuable to have in the cover. Beause there is so much to clean up in this mess but making random small cleanups without a larger plan tends to step on others toes more than being useful.
On 17.04.19 15:31, Michal Hocko wrote: > On Wed 17-04-19 15:24:47, David Hildenbrand wrote: >> On 17.04.19 15:12, Michal Hocko wrote: >>> On Tue 09-04-19 12:01:45, David Hildenbrand wrote: >>>> __add_pages() doesn't add the memory resource, so __remove_pages() >>>> shouldn't remove it. Let's factor it out. Especially as it is a special >>>> case for memory used as system memory, added via add_memory() and >>>> friends. >>>> >>>> We now remove the resource after removing the sections instead of doing >>>> it the other way around. I don't think this change is problematic. >>>> >>>> add_memory() >>>> register memory resource >>>> arch_add_memory() >>>> >>>> remove_memory >>>> arch_remove_memory() >>>> release memory resource >>>> >>>> While at it, explain why we ignore errors and that it only happeny if >>>> we remove memory in a different granularity as we added it. >>> >>> OK, I agree that the symmetry is good in general and it certainly makes >>> sense here as well. But does it make sense to pick up this particular >>> part without larger considerations of add vs. remove apis? I have a >>> strong feeling this wouldn't be the only thing to care about. In other >>> words does this help future changes or it is more likely to cause more >>> code conflicts with other features being developed right now? >> >> I am planning to >> >> 1. factor out memory block device handling, so features like sub-section >> add/remove are easier to add internally. Move it to the user that wants >> it. Clean up all the mess we have right now due to supporting memory >> block devices that span several sections. >> >> 2. Make sure that any arch_add_pages() and friends clean up properly if >> they fail instead of indicating failure but leaving some partially added >> memory lying around. >> >> 3. Clean up node handling regarding to memory hotplug/unplug. Especially >> don't allow to offline/remove memory spanning several nodes etc. > > Yes, this all sounds sane to me. > >> IOW, in order to properly clean up memory block device handling and >> prepare for more changes people are interested in (e.g. sub-section add >> of device memory), this is the right thing to do. The other parts are >> bigger changes. > > This would be really valuable to have in the cover. Beause there is so > much to clean up in this mess but making random small cleanups without a > larger plan tends to step on others toes more than being useful. I agree, let's discuss the bigger plan I have in mind 1. arch_add_memory()/arch_remove_memory() don't deal with memory block devices. add_memory()/remove_memory()/online_pages()/offline_pages() do. 2. add_memory()/remove_memory()/online_pages()/offline_pages() - Only work on memory block device alignment/granularity - Only work on single nodes. - Only work on single zones. 3. mem->nid correctly indicates if - Memory block devices belongs to single node / no node / multiple nodes - Fast and reliable way to detect remove_memory()/online_pages()/offline_pages() being called with multiple nodes. 4. arch_remove_memory() and friends never fail. Removing of memory always succeeds. This allows better error handling when adding of memory fails. We will move some parts from CONFIG_MEMORY_HOTREMOVE to CONFIG_MEMORY_HOTPLUG, so we can use them to clean up if adding of memory fails. 5. Remove all accesses to struct_page from memory removal path. Pages might never have been initialized, they should not be touched. All other features I see on the horizon (vmemmap on added memory, sub-section hot-add) would mainly be centered around arch_add_memory()/arch_remove_memory(), which would not have to deal with any special cases around memory block device memory. Do you agree, do you have any other points/things in mind you consider important?
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 4970ff658055..696ed7ee5e28 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, if (is_dev_zone(zone)) { if (altmap) map_offset = vmem_altmap_offset(altmap); - } else { - resource_size_t start, size; - - start = phys_start_pfn << PAGE_SHIFT; - size = nr_pages * PAGE_SIZE; - - ret = release_mem_region_adjustable(&iomem_resource, start, - size); - if (ret) { - resource_size_t endres = start + size - 1; - - pr_warn("Unable to release resource <%pa-%pa> (%d)\n", - &start, &endres, ret); - } } clear_zone_contiguous(zone); @@ -1820,6 +1806,25 @@ void try_offline_node(int nid) } EXPORT_SYMBOL(try_offline_node); +static void __release_memory_resource(u64 start, u64 size) +{ + int ret; + + /* + * When removing memory in the same granularity as it was added, + * this function never fails. It might only fail if resources + * have to be adjusted or split. We'll ignore the error, as + * removing of memory cannot fail. + */ + ret = release_mem_region_adjustable(&iomem_resource, start, size); + if (ret) { + resource_size_t endres = start + size - 1; + + pr_warn("Unable to release resource <%pa-%pa> (%d)\n", + &start, &endres, ret); + } +} + /** * remove_memory * @nid: the node ID @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size) memblock_remove(start, size); arch_remove_memory(nid, start, size, NULL); + __release_memory_resource(start, size); try_offline_node(nid);
__add_pages() doesn't add the memory resource, so __remove_pages() shouldn't remove it. Let's factor it out. Especially as it is a special case for memory used as system memory, added via add_memory() and friends. We now remove the resource after removing the sections instead of doing it the other way around. I don't think this change is problematic. add_memory() register memory resource arch_add_memory() remove_memory arch_remove_memory() release memory resource While at it, explain why we ignore errors and that it only happeny if we remove memory in a different granularity as we added it. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Qian Cai <cai@lca.pw> Cc: Arun KS <arunks@codeaurora.org> Cc: Mathieu Malaterre <malat@debian.org> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/memory_hotplug.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-)