diff mbox

[RFC] memory-hotplug: Add memblock_state notifier

Message ID 1342783088-29970-1-git-send-email-vasilis.liaskovitis@profitbricks.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vasilis Liaskovitis July 20, 2012, 11:18 a.m. UTC
hot-remove initiated by acpi_memhotplug driver tries to offline pages and then
remove section/sysfs files in remove_memory(). remove_memory() will only proceed
if is_memblk_offline() returns true, i.e. only if the corresponding memblock
is in MEM_OFFLINE state. However, the memblock state is currently only updated
if the offlining has been initiated from the sysfs interface (echo offline >
/sys/devices/system/memory/memoryXX/state). The acpi hot-remove codepath does
not use the sysfs interface but directly calls offline_pages. So remove_memory()
will always fail, even if offline_pages has succeeded.

This patch solves this by registering a memblock_state notifier function in the
memory_notify chain. This will change state of memblocks independently of sysfs
usage.

The patch is based on work-in-progress patches for memory hot-remove, see:
http://lwn.net/Articles/507244/

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 drivers/base/memory.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

Comments

Wen Congyang July 23, 2012, 9:08 a.m. UTC | #1
At 07/20/2012 07:18 PM, Vasilis Liaskovitis Wrote:
> hot-remove initiated by acpi_memhotplug driver tries to offline pages and then
> remove section/sysfs files in remove_memory(). remove_memory() will only proceed
> if is_memblk_offline() returns true, i.e. only if the corresponding memblock
> is in MEM_OFFLINE state. However, the memblock state is currently only updated
> if the offlining has been initiated from the sysfs interface (echo offline >
> /sys/devices/system/memory/memoryXX/state). The acpi hot-remove codepath does
> not use the sysfs interface but directly calls offline_pages. So remove_memory()
> will always fail, even if offline_pages has succeeded.

Thank you for pointing this problem.

> 
> This patch solves this by registering a memblock_state notifier function in the
> memory_notify chain. This will change state of memblocks independently of sysfs
> usage.

I think this patch does not solve this problem.

> 
> The patch is based on work-in-progress patches for memory hot-remove, see:
> http://lwn.net/Articles/507244/
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  drivers/base/memory.c |   37 +++++++++++++++++++++++++++++++++++++
>  1 files changed, 37 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8981568..4095f3f 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -706,6 +706,42 @@ int unregister_memory_section(struct mem_section *section)
>  	return remove_memory_block(0, section, 0);
>  }
>  
> +static int memblock_state_notifier_nb(struct notifier_block *nb, unsigned long
> +		val, void *v)
> +{
> +	struct memory_notify *arg = (struct memory_notify *)v;
> +	struct memory_block *mem = NULL;
> +	struct mem_section *ms;
> +	unsigned long section_nr;
> +
> +	section_nr = pfn_to_section_nr(arg->start_pfn);
> +	ms = __nr_to_section(section_nr);
> +	mem = find_memory_block(ms);
> +	if (!mem)
> +		goto out;

we may offline more than one memory block.

> +
> +	switch (val) {
> +	case MEM_GOING_OFFLINE:
> +	case MEM_OFFLINE:
> +	case MEM_GOING_ONLINE:
> +	case MEM_ONLINE:
> +	case MEM_CANCEL_ONLINE:
> +	case MEM_CANCEL_OFFLINE:
> +		mem->state = val;

mem->state is protected by the lock mem->state_mutex, so if you want to
update the state, you must lock mem->state_mutex. But you cannot lock it
here, because it may cause deadlock:

acpi_memhotplug                           sysfs interface
===============================================================================
                                          memory_block_change_state()
                                              lock mem->state_mutex
                                              memory_block_action()
offline_pages()
    lock_memory_hotplug()
                                                  offline_memory()
                                                      lock_memory_hotplug() // block
    memory_notify()
        memblock_state_notifier_nb()
===============================================================================

I'm writing another patch to fix it.

Thanks
Wen Congyang

> +		break;
> +	default:
> +		printk(KERN_WARNING "invalid memblock state\n");
> +		break;
> +	}
> +out:
> +	return NOTIFY_OK;
> +}
> +
> +static struct notifier_block memblock_state_nb = {
> +	.notifier_call = memblock_state_notifier_nb,
> +	.priority = 0
> +};
> +
>  /*
>   * Initialize the sysfs support for memory devices...
>   */
> @@ -724,6 +760,7 @@ int __init memory_dev_init(void)
>  	block_sz = get_memory_block_size();
>  	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>  
> +	register_memory_notifier(&memblock_state_nb);
>  	/*
>  	 * Create entries for memory sections that were found
>  	 * during boot and have been initialized

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vasilis Liaskovitis July 23, 2012, 11:06 a.m. UTC | #2
Hi,

On Mon, Jul 23, 2012 at 05:08:04PM +0800, Wen Congyang wrote:
> > +static int memblock_state_notifier_nb(struct notifier_block *nb, unsigned long
> > +		val, void *v)
> > +{
> > +	struct memory_notify *arg = (struct memory_notify *)v;
> > +	struct memory_block *mem = NULL;
> > +	struct mem_section *ms;
> > +	unsigned long section_nr;
> > +
> > +	section_nr = pfn_to_section_nr(arg->start_pfn);
> > +	ms = __nr_to_section(section_nr);
> > +	mem = find_memory_block(ms);
> > +	if (!mem)
> > +		goto out;
> 
> we may offline more than one memory block.
>
thanks, you are right.

> > +
> > +	switch (val) {
> > +	case MEM_GOING_OFFLINE:
> > +	case MEM_OFFLINE:
> > +	case MEM_GOING_ONLINE:
> > +	case MEM_ONLINE:
> > +	case MEM_CANCEL_ONLINE:
> > +	case MEM_CANCEL_OFFLINE:
> > +		mem->state = val;
> 
> mem->state is protected by the lock mem->state_mutex, so if you want to
> update the state, you must lock mem->state_mutex. But you cannot lock it
> here, because it may cause deadlock:
> 
> acpi_memhotplug                           sysfs interface
> ===============================================================================
>                                           memory_block_change_state()
>                                               lock mem->state_mutex
>                                               memory_block_action()
> offline_pages()
>     lock_memory_hotplug()
>                                                   offline_memory()
>                                                       lock_memory_hotplug() // block
>     memory_notify()
>         memblock_state_notifier_nb()
> ===============================================================================

good point. Maybe if memory_hotplug_lock and state_mutex locks are acquired in
the same order in the 2 code paths, this could be avoided.

> I'm writing another patch to fix it.

ok, I 'll test.
thanks,

- Vasilis
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wen Congyang July 23, 2012, 12:13 p.m. UTC | #3
At 2012/7/23 19:06, Vasilis Liaskovitis Wrote:
> Hi,
>
> On Mon, Jul 23, 2012 at 05:08:04PM +0800, Wen Congyang wrote:
>>> +static int memblock_state_notifier_nb(struct notifier_block *nb, unsigned long
>>> +		val, void *v)
>>> +{
>>> +	struct memory_notify *arg = (struct memory_notify *)v;
>>> +	struct memory_block *mem = NULL;
>>> +	struct mem_section *ms;
>>> +	unsigned long section_nr;
>>> +
>>> +	section_nr = pfn_to_section_nr(arg->start_pfn);
>>> +	ms = __nr_to_section(section_nr);
>>> +	mem = find_memory_block(ms);
>>> +	if (!mem)
>>> +		goto out;
>>
>> we may offline more than one memory block.
>>
> thanks, you are right.
>
>>> +
>>> +	switch (val) {
>>> +	case MEM_GOING_OFFLINE:
>>> +	case MEM_OFFLINE:
>>> +	case MEM_GOING_ONLINE:
>>> +	case MEM_ONLINE:
>>> +	case MEM_CANCEL_ONLINE:
>>> +	case MEM_CANCEL_OFFLINE:
>>> +		mem->state = val;
>>
>> mem->state is protected by the lock mem->state_mutex, so if you want to
>> update the state, you must lock mem->state_mutex. But you cannot lock it
>> here, because it may cause deadlock:
>>
>> acpi_memhotplug                           sysfs interface
>> ===============================================================================
>>                                            memory_block_change_state()
>>                                                lock mem->state_mutex
>>                                                memory_block_action()
>> offline_pages()
>>      lock_memory_hotplug()
>>                                                    offline_memory()
>>                                                        lock_memory_hotplug() // block
>>      memory_notify()
>>          memblock_state_notifier_nb()
>> ===============================================================================
>
> good point. Maybe if memory_hotplug_lock and state_mutex locks are acquired in
> the same order in the 2 code paths, this could be avoided.

Yes, I am trying to fix another 2 problems(also based on ishimatsu's 
patchset):
1. offline_memory() will fail if part of the memory is onlined and part 
of the memory
    is offlined.
2. notify the userspace if the memory block's status is changed

I guess this problem can be fixed together.

Thanks
Wen Congyang

>
>> I'm writing another patch to fix it.
>
> ok, I 'll test.
> thanks,
>
> - Vasilis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8981568..4095f3f 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -706,6 +706,42 @@  int unregister_memory_section(struct mem_section *section)
 	return remove_memory_block(0, section, 0);
 }
 
+static int memblock_state_notifier_nb(struct notifier_block *nb, unsigned long
+		val, void *v)
+{
+	struct memory_notify *arg = (struct memory_notify *)v;
+	struct memory_block *mem = NULL;
+	struct mem_section *ms;
+	unsigned long section_nr;
+
+	section_nr = pfn_to_section_nr(arg->start_pfn);
+	ms = __nr_to_section(section_nr);
+	mem = find_memory_block(ms);
+	if (!mem)
+		goto out;
+
+	switch (val) {
+	case MEM_GOING_OFFLINE:
+	case MEM_OFFLINE:
+	case MEM_GOING_ONLINE:
+	case MEM_ONLINE:
+	case MEM_CANCEL_ONLINE:
+	case MEM_CANCEL_OFFLINE:
+		mem->state = val;
+		break;
+	default:
+		printk(KERN_WARNING "invalid memblock state\n");
+		break;
+	}
+out:
+	return NOTIFY_OK;
+}
+
+static struct notifier_block memblock_state_nb = {
+	.notifier_call = memblock_state_notifier_nb,
+	.priority = 0
+};
+
 /*
  * Initialize the sysfs support for memory devices...
  */
@@ -724,6 +760,7 @@  int __init memory_dev_init(void)
 	block_sz = get_memory_block_size();
 	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
 
+	register_memory_notifier(&memblock_state_nb);
 	/*
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized