diff mbox series

[RFC,v4,08/13] mm/memory_hotplug: Introduce offline_and_remove_memory()

Message ID 20191212171137.13872-9-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series virtio-mem: paravirtualized memory | expand

Commit Message

David Hildenbrand Dec. 12, 2019, 5:11 p.m. UTC
virtio-mem wants to offline and remove a memory block once it unplugged
all subblocks (e.g., using alloc_contig_range()). Let's provide
an interface to do that from a driver. virtio-mem already supports to
offline partially unplugged memory blocks. Offlining a fully unplugged
memory block will not require to migrate any pages. All unplugged
subblocks are PageOffline() and have a reference count of 0 - so
offlining code will simply skip them.

All we need an interface to trigger the "offlining" and the removing in a
single operation - to make sure the memory block cannot get onlined by
user space again before it gets removed.

To keep things simple, allow to only work on a single memory block.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c            | 35 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

Comments

Michal Hocko Feb. 25, 2020, 2:11 p.m. UTC | #1
On Thu 12-12-19 18:11:32, David Hildenbrand wrote:
> virtio-mem wants to offline and remove a memory block once it unplugged
> all subblocks (e.g., using alloc_contig_range()). Let's provide
> an interface to do that from a driver. virtio-mem already supports to
> offline partially unplugged memory blocks. Offlining a fully unplugged
> memory block will not require to migrate any pages. All unplugged
> subblocks are PageOffline() and have a reference count of 0 - so
> offlining code will simply skip them.
> 
> All we need an interface to trigger the "offlining" and the removing in a
> single operation - to make sure the memory block cannot get onlined by
> user space again before it gets removed.

Why does that matter? Is it really likely that the userspace would
interfere? What would be the scenario?

Or is still mostly about not requiring callers to open code this general
patter?

> To keep things simple, allow to only work on a single memory block.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Qian Cai <cai@lca.pw>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/memory_hotplug.h |  1 +
>  mm/memory_hotplug.c            | 35 ++++++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ba0dca6aac6e..586f5c59c291 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -310,6 +310,7 @@ extern void try_offline_node(int nid);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern int remove_memory(int nid, u64 start, u64 size);
>  extern void __remove_memory(int nid, u64 start, u64 size);
> +extern int offline_and_remove_memory(int nid, u64 start, u64 size);
>  
>  #else
>  static inline bool is_mem_section_removable(unsigned long pfn,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index da01453a04e6..d04369e6d3cc 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1825,4 +1825,39 @@ int remove_memory(int nid, u64 start, u64 size)
>  	return rc;
>  }
>  EXPORT_SYMBOL_GPL(remove_memory);
> +
> +/*
> + * Try to offline and remove a memory block. Might take a long time to
> + * finish in case memory is still in use. Primarily useful for memory devices
> + * that logically unplugged all memory (so it's no longer in use) and want to
> + * offline + remove the memory block.
> + */
> +int offline_and_remove_memory(int nid, u64 start, u64 size)
> +{
> +	struct memory_block *mem;
> +	int rc = -EINVAL;
> +
> +	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
> +	    size != memory_block_size_bytes())
> +		return rc;
> +
> +	lock_device_hotplug();
> +	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
> +	if (mem)
> +		rc = device_offline(&mem->dev);
> +	/* Ignore if the device is already offline. */
> +	if (rc > 0)
> +		rc = 0;
> +
> +	/*
> +	 * In case we succeeded to offline the memory block, remove it.
> +	 * This cannot fail as it cannot get onlined in the meantime.
> +	 */
> +	if (!rc && try_remove_memory(nid, start, size))
> +		BUG();
> +	unlock_device_hotplug();
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(offline_and_remove_memory);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> -- 
> 2.23.0
David Hildenbrand Feb. 25, 2020, 2:27 p.m. UTC | #2
On 25.02.20 15:11, Michal Hocko wrote:
> On Thu 12-12-19 18:11:32, David Hildenbrand wrote:
>> virtio-mem wants to offline and remove a memory block once it unplugged
>> all subblocks (e.g., using alloc_contig_range()). Let's provide
>> an interface to do that from a driver. virtio-mem already supports to
>> offline partially unplugged memory blocks. Offlining a fully unplugged
>> memory block will not require to migrate any pages. All unplugged
>> subblocks are PageOffline() and have a reference count of 0 - so
>> offlining code will simply skip them.
>>
>> All we need an interface to trigger the "offlining" and the removing in a
>> single operation - to make sure the memory block cannot get onlined by
>> user space again before it gets removed.
> 
> Why does that matter? Is it really likely that the userspace would
> interfere? What would be the scenario?

I guess it's not that relevant after all (I think this comment dates
back to the times where we didn't have try_remove_memory() and could
actually BUG_ON() in remove_memory() if there would have been a race).
Can drop that part.

> 
> Or is still mostly about not requiring callers to open code this general
> patter?

From kernel module context, I cannot get access to the actual memory
block device (find_memory_block()) and call the device_unregister().

Especially, also the device hotplug lock is not exported. So this is a
clean helper function to be used from kernel module context. (e.g., also
hyper-v showed interest for using that)
Michal Hocko March 2, 2020, 12:48 p.m. UTC | #3
On Tue 25-02-20 15:27:28, David Hildenbrand wrote:
> On 25.02.20 15:11, Michal Hocko wrote:
> > On Thu 12-12-19 18:11:32, David Hildenbrand wrote:
> >> virtio-mem wants to offline and remove a memory block once it unplugged
> >> all subblocks (e.g., using alloc_contig_range()). Let's provide
> >> an interface to do that from a driver. virtio-mem already supports to
> >> offline partially unplugged memory blocks. Offlining a fully unplugged
> >> memory block will not require to migrate any pages. All unplugged
> >> subblocks are PageOffline() and have a reference count of 0 - so
> >> offlining code will simply skip them.
> >>
> >> All we need an interface to trigger the "offlining" and the removing in a
> >> single operation - to make sure the memory block cannot get onlined by
> >> user space again before it gets removed.
> > 
> > Why does that matter? Is it really likely that the userspace would
> > interfere? What would be the scenario?
> 
> I guess it's not that relevant after all (I think this comment dates
> back to the times where we didn't have try_remove_memory() and could
> actually BUG_ON() in remove_memory() if there would have been a race).
> Can drop that part.
> 
> > 
> > Or is still mostly about not requiring callers to open code this general
> > patter?
> 
> From kernel module context, I cannot get access to the actual memory
> block device (find_memory_block()) and call the device_unregister().
> 
> Especially, also the device hotplug lock is not exported. So this is a
> clean helper function to be used from kernel module context. (e.g., also
> hyper-v showed interest for using that)

Fair enough.
David Hildenbrand March 2, 2020, 12:53 p.m. UTC | #4
On 02.03.20 13:48, Michal Hocko wrote:
> On Tue 25-02-20 15:27:28, David Hildenbrand wrote:
>> On 25.02.20 15:11, Michal Hocko wrote:
>>> On Thu 12-12-19 18:11:32, David Hildenbrand wrote:
>>>> virtio-mem wants to offline and remove a memory block once it unplugged
>>>> all subblocks (e.g., using alloc_contig_range()). Let's provide
>>>> an interface to do that from a driver. virtio-mem already supports to
>>>> offline partially unplugged memory blocks. Offlining a fully unplugged
>>>> memory block will not require to migrate any pages. All unplugged
>>>> subblocks are PageOffline() and have a reference count of 0 - so
>>>> offlining code will simply skip them.
>>>>
>>>> All we need an interface to trigger the "offlining" and the removing in a
>>>> single operation - to make sure the memory block cannot get onlined by
>>>> user space again before it gets removed.
>>>
>>> Why does that matter? Is it really likely that the userspace would
>>> interfere? What would be the scenario?
>>
>> I guess it's not that relevant after all (I think this comment dates
>> back to the times where we didn't have try_remove_memory() and could
>> actually BUG_ON() in remove_memory() if there would have been a race).
>> Can drop that part.
>>
>>>
>>> Or is still mostly about not requiring callers to open code this general
>>> patter?
>>
>> From kernel module context, I cannot get access to the actual memory
>> block device (find_memory_block()) and call the device_unregister().
>>
>> Especially, also the device hotplug lock is not exported. So this is a
>> clean helper function to be used from kernel module context. (e.g., also
>> hyper-v showed interest for using that)
> 
> Fair enough.
> 

I'll send a v1 shortly, I rephrased the description to make this clear.
Thanks!
diff mbox series

Patch

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ba0dca6aac6e..586f5c59c291 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -310,6 +310,7 @@  extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
 extern void __remove_memory(int nid, u64 start, u64 size);
+extern int offline_and_remove_memory(int nid, u64 start, u64 size);
 
 #else
 static inline bool is_mem_section_removable(unsigned long pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index da01453a04e6..d04369e6d3cc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1825,4 +1825,39 @@  int remove_memory(int nid, u64 start, u64 size)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(remove_memory);
+
+/*
+ * Try to offline and remove a memory block. Might take a long time to
+ * finish in case memory is still in use. Primarily useful for memory devices
+ * that logically unplugged all memory (so it's no longer in use) and want to
+ * offline + remove the memory block.
+ */
+int offline_and_remove_memory(int nid, u64 start, u64 size)
+{
+	struct memory_block *mem;
+	int rc = -EINVAL;
+
+	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
+	    size != memory_block_size_bytes())
+		return rc;
+
+	lock_device_hotplug();
+	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
+	if (mem)
+		rc = device_offline(&mem->dev);
+	/* Ignore if the device is already offline. */
+	if (rc > 0)
+		rc = 0;
+
+	/*
+	 * In case we succeeded to offline the memory block, remove it.
+	 * This cannot fail as it cannot get onlined in the meantime.
+	 */
+	if (!rc && try_remove_memory(nid, start, size))
+		BUG();
+	unlock_device_hotplug();
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(offline_and_remove_memory);
 #endif /* CONFIG_MEMORY_HOTREMOVE */