diff mbox

[RFC,v3,3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

Message ID 1353693037-21704-4-git-send-email-vasilis.liaskovitis@profitbricks.com (mailing list archive)
State RFC, archived
Headers show

Commit Message

Vasilis Liaskovitis Nov. 23, 2012, 5:50 p.m. UTC
Consider the following sequence of operations for a hotplugged memory device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/bind
3. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

The driver is successfully re-bound to the device in step 2. However step 3 will
not attempt to remove the memory. This is because the acpi_memory_info enabled
bit for the newly bound driver has not been set to 1. This bit needs to be set
in the case where the memory is already used by the kernel (add_memory returns
-EEXIST)

Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
driver function properly after a rebind of the driver i.e. eject operation
attempts to remove memory after a successful rebind.

I am not sure if this breaks some other usage of the enabled bit (see commit
65479472). When is it possible for the memory to be in use by the kernel but
not managed by the acpi driver, apart from a driver unbind scenario?

Perhaps the patch is not needed, depending on expected semantics of re-binding.
Is the newly bound driver supposed to manage the device, if it was earlier
managed by the same driver?

This patch is only specific to this scenario, and can be dropped from the patch
series if needed.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 drivers/acpi/acpi_memhotplug.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

Comments

Wen Congyang Nov. 24, 2012, 4:20 p.m. UTC | #1
At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Consider the following sequence of operations for a hotplugged memory device:
> 
> 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> The driver is successfully re-bound to the device in step 2. However step 3 will
> not attempt to remove the memory. This is because the acpi_memory_info enabled
> bit for the newly bound driver has not been set to 1. This bit needs to be set
> in the case where the memory is already used by the kernel (add_memory returns
> -EEXIST)

Hmm, I think the reason is that we don't offline/remove memory when
unbinding it
from the driver. I have sent a patch to fix this problem, and this patch
is in
pm tree now. With this patch, we will offline/remove memory when
unbinding it from
the drriver.

Consider the following sequence of operations for a hotplugged memory
device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

If we don't offline/remove the memory, we have no chance to do it in
step 2. After
step2, the memory is used by the kernel, but we have powered off it. It
is very
dangerous.

So this patch is unnecessary now.

Thanks
Wen Congyang

> 
> Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
> driver function properly after a rebind of the driver i.e. eject operation
> attempts to remove memory after a successful rebind.
> 
> I am not sure if this breaks some other usage of the enabled bit (see commit
> 65479472). When is it possible for the memory to be in use by the kernel but
> not managed by the acpi driver, apart from a driver unbind scenario?
> 
> Perhaps the patch is not needed, depending on expected semantics of re-binding.
> Is the newly bound driver supposed to manage the device, if it was earlier
> managed by the same driver?
> 
> This patch is only specific to this scenario, and can be dropped from the patch
> series if needed.
> 
> Signed-off-by: Vasilis Liaskovitis<vasilis.liaskovitis@profitbricks.com>
> ---
>   drivers/acpi/acpi_memhotplug.c |    3 +--
>   1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index d0cfbd9..0562cb4 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -271,12 +271,11 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>   			continue;
>   		}
> 
> -		if (!result)
> -			info->enabled = 1;
>   		/*
>   		 * Add num_enable even if add_memory() returns -EEXIST, so the
>   		 * device is bound to this driver.
>   		 */
> +		info->enabled = 1;
>   		num_enabled++;
>   	}
>   	if (!num_enabled) {
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vasilis Liaskovitis Nov. 26, 2012, 8:36 a.m. UTC | #2
On Sun, Nov 25, 2012 at 12:20:47AM +0800, Wen Congyang wrote:
> At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> > Consider the following sequence of operations for a hotplugged memory device:
> > 
> > 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
> > 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> > 
> > The driver is successfully re-bound to the device in step 2. However step 3 will
> > not attempt to remove the memory. This is because the acpi_memory_info enabled
> > bit for the newly bound driver has not been set to 1. This bit needs to be set
> > in the case where the memory is already used by the kernel (add_memory returns
> > -EEXIST)
> 
> Hmm, I think the reason is that we don't offline/remove memory when
> unbinding it
> from the driver. I have sent a patch to fix this problem, and this patch
> is in
> pm tree now. With this patch, we will offline/remove memory when
> unbinding it from
> the drriver.

ok. Which patch is this? Does it require driver-core changes?

> 
> Consider the following sequence of operations for a hotplugged memory
> device:
> 
> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> If we don't offline/remove the memory, we have no chance to do it in
> step 2. After
> step2, the memory is used by the kernel, but we have powered off it. It
> is very
> dangerous.

How does power-off happen after unbind? acpi_eject_store checks for existing
driver before taking any action:

#ifndef FORCE_EJECT
	if (acpi_device->driver == NULL) {
		ret = -ENODEV;
		goto err;
	}
#endif

FORCE_EJECT is not defined afaict, so the function returns without scheduling
acpi_bus_hot_remove_device. Is there another code path that calls power-off?

thanks,

- Vasilis
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wen Congyang Nov. 26, 2012, 9:11 a.m. UTC | #3
At 11/26/2012 04:36 PM, Vasilis Liaskovitis Wrote:
> On Sun, Nov 25, 2012 at 12:20:47AM +0800, Wen Congyang wrote:
>> At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
>>> Consider the following sequence of operations for a hotplugged memory device:
>>>
>>> 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>>> 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
>>> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> The driver is successfully re-bound to the device in step 2. However step 3 will
>>> not attempt to remove the memory. This is because the acpi_memory_info enabled
>>> bit for the newly bound driver has not been set to 1. This bit needs to be set
>>> in the case where the memory is already used by the kernel (add_memory returns
>>> -EEXIST)
>>
>> Hmm, I think the reason is that we don't offline/remove memory when
>> unbinding it
>> from the driver. I have sent a patch to fix this problem, and this patch
>> is in
>> pm tree now. With this patch, we will offline/remove memory when
>> unbinding it from
>> the drriver.
> 
> ok. Which patch is this? Does it require driver-core changes?

https://lkml.org/lkml/2012/11/15/21

Patch 1-6 is in pm tree now.

> 
>>
>> Consider the following sequence of operations for a hotplugged memory
>> device:
>>
>> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>
>> If we don't offline/remove the memory, we have no chance to do it in
>> step 2. After
>> step2, the memory is used by the kernel, but we have powered off it. It
>> is very
>> dangerous.
> 
> How does power-off happen after unbind? acpi_eject_store checks for existing
> driver before taking any action:
> 
> #ifndef FORCE_EJECT
> 	if (acpi_device->driver == NULL) {
> 		ret = -ENODEV;
> 		goto err;
> 	}
> #endif
> 
> FORCE_EJECT is not defined afaict, so the function returns without scheduling
> acpi_bus_hot_remove_device. Is there another code path that calls power-off?

Consider the following case:

We hotremove the memory device by SCI and unbind it from the driver at the same time:

CPUa                                                  CPUb
acpi_memory_device_notify()
                                       unbind it from the driver
    acpi_bus_hot_remove_device()

Thanks
Wen Congyang

> 
> thanks,
> 
> - Vasilis
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 27, 2012, 12:19 a.m. UTC | #4
> >> Consider the following sequence of operations for a hotplugged memory
> >> device:
> >>
> >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>
> >> If we don't offline/remove the memory, we have no chance to do it in
> >> step 2. After
> >> step2, the memory is used by the kernel, but we have powered off it. It
> >> is very
> >> dangerous.
> > 
> > How does power-off happen after unbind? acpi_eject_store checks for existing
> > driver before taking any action:
> > 
> > #ifndef FORCE_EJECT
> > 	if (acpi_device->driver == NULL) {
> > 		ret = -ENODEV;
> > 		goto err;
> > 	}
> > #endif
> > 
> > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> 
> Consider the following case:
> 
> We hotremove the memory device by SCI and unbind it from the driver at the same time:
> 
> CPUa                                                  CPUb
> acpi_memory_device_notify()
>                                        unbind it from the driver
>     acpi_bus_hot_remove_device()

Can we make acpi_bus_remove() to fail if a given acpi_device is not
bound with a driver?  If so, can we make the unbind operation to perform
unbind only?

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vasilis Liaskovitis Nov. 27, 2012, 6:32 p.m. UTC | #5
On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > >> Consider the following sequence of operations for a hotplugged memory
> > >> device:
> > >>
> > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > >>
> > >> If we don't offline/remove the memory, we have no chance to do it in
> > >> step 2. After
> > >> step2, the memory is used by the kernel, but we have powered off it. It
> > >> is very
> > >> dangerous.
> > > 
> > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > driver before taking any action:
> > > 
> > > #ifndef FORCE_EJECT
> > > 	if (acpi_device->driver == NULL) {
> > > 		ret = -ENODEV;
> > > 		goto err;
> > > 	}
> > > #endif
> > > 
> > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > 
> > Consider the following case:
> > 
> > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > 
> > CPUa                                                  CPUb
> > acpi_memory_device_notify()
> >                                        unbind it from the driver
> >     acpi_bus_hot_remove_device()
> 
> Can we make acpi_bus_remove() to fail if a given acpi_device is not
> bound with a driver?  If so, can we make the unbind operation to perform
> unbind only?

acpi_bus_remove_device could check if the driver is present, and return -ENODEV
if it's not present (dev->driver == NULL).

But there can still be a race between an eject and an unbind operation happening
simultaneously. This seems like a general problem to me i.e. not specific to an
acpi memory device. How do we ensure an eject does not race with a driver unbind
for other acpi devices?

Is there a per-device lock in acpi-core or device-core that can prevent this from
happening? Driver core does a device_lock(dev) on all operations, but this is
probably not grabbed on SCI-initiated acpi ejects.

thanks,

- Vasilis
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 27, 2012, 10:03 p.m. UTC | #6
On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > >> Consider the following sequence of operations for a hotplugged memory
> > > >> device:
> > > >>
> > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > >>
> > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > >> step 2. After
> > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > >> is very
> > > >> dangerous.
> > > > 
> > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > driver before taking any action:
> > > > 
> > > > #ifndef FORCE_EJECT
> > > > 	if (acpi_device->driver == NULL) {
> > > > 		ret = -ENODEV;
> > > > 		goto err;
> > > > 	}
> > > > #endif
> > > > 
> > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > 
> > > Consider the following case:
> > > 
> > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > 
> > > CPUa                                                  CPUb
> > > acpi_memory_device_notify()
> > >                                        unbind it from the driver
> > >     acpi_bus_hot_remove_device()
> > 
> > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > bound with a driver?  If so, can we make the unbind operation to perform
> > unbind only?
> 
> acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> if it's not present (dev->driver == NULL).
> 
> But there can still be a race between an eject and an unbind operation happening
> simultaneously. This seems like a general problem to me i.e. not specific to an
> acpi memory device. How do we ensure an eject does not race with a driver unbind
> for other acpi devices?
> 
> Is there a per-device lock in acpi-core or device-core that can prevent this from
> happening? Driver core does a device_lock(dev) on all operations, but this is
> probably not grabbed on SCI-initiated acpi ejects.

Since driver_unbind() calls device_lock(dev->parent) before calling
device_release_driver(), I am wondering if we can call
device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
(i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
parent lock is otherwise released after device_release_driver() is done.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 27, 2012, 11:41 p.m. UTC | #7
On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > >> device:
> > > > >>
> > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > >>
> > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > >> step 2. After
> > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > >> is very
> > > > >> dangerous.
> > > > > 
> > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > driver before taking any action:
> > > > > 
> > > > > #ifndef FORCE_EJECT
> > > > > 	if (acpi_device->driver == NULL) {
> > > > > 		ret = -ENODEV;
> > > > > 		goto err;
> > > > > 	}
> > > > > #endif
> > > > > 
> > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > 
> > > > Consider the following case:
> > > > 
> > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > 
> > > > CPUa                                                  CPUb
> > > > acpi_memory_device_notify()
> > > >                                        unbind it from the driver
> > > >     acpi_bus_hot_remove_device()
> > > 
> > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > bound with a driver?  If so, can we make the unbind operation to perform
> > > unbind only?
> > 
> > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > if it's not present (dev->driver == NULL).
> > 
> > But there can still be a race between an eject and an unbind operation happening
> > simultaneously. This seems like a general problem to me i.e. not specific to an
> > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > for other acpi devices?
> > 
> > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > happening? Driver core does a device_lock(dev) on all operations, but this is
> > probably not grabbed on SCI-initiated acpi ejects.
> 
> Since driver_unbind() calls device_lock(dev->parent) before calling
> device_release_driver(), I am wondering if we can call
> device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> parent lock is otherwise released after device_release_driver() is done.

I would be careful.  You may introduce some subtle locking-related issues
this way.

Besides, there may be an alternative approach to all this.  For example,
what if we don't remove struct device objects on eject?  The ACPI handles
associated with them don't go away in that case after all, do they?

Rafael
Toshi Kani Nov. 28, 2012, 4:01 p.m. UTC | #8
On Wed, 2012-11-28 at 00:41 +0100, Rafael J. Wysocki wrote:
> On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> > On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > > >> device:
> > > > > >>
> > > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > >>
> > > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > > >> step 2. After
> > > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > > >> is very
> > > > > >> dangerous.
> > > > > > 
> > > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > > driver before taking any action:
> > > > > > 
> > > > > > #ifndef FORCE_EJECT
> > > > > > 	if (acpi_device->driver == NULL) {
> > > > > > 		ret = -ENODEV;
> > > > > > 		goto err;
> > > > > > 	}
> > > > > > #endif
> > > > > > 
> > > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > > 
> > > > > Consider the following case:
> > > > > 
> > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > 
> > > > > CPUa                                                  CPUb
> > > > > acpi_memory_device_notify()
> > > > >                                        unbind it from the driver
> > > > >     acpi_bus_hot_remove_device()
> > > > 
> > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > unbind only?
> > > 
> > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > if it's not present (dev->driver == NULL).
> > > 
> > > But there can still be a race between an eject and an unbind operation happening
> > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > for other acpi devices?
> > > 
> > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > probably not grabbed on SCI-initiated acpi ejects.
> > 
> > Since driver_unbind() calls device_lock(dev->parent) before calling
> > device_release_driver(), I am wondering if we can call
> > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > parent lock is otherwise released after device_release_driver() is done.
> 
> I would be careful.  You may introduce some subtle locking-related issues
> this way.

Right.  This requires careful inspection and testing.  As far as the
locking is concerned, I am not keen on using fine grained locking for
hot-plug.  It is much simpler and solid if we serialize such operations.

> Besides, there may be an alternative approach to all this.  For example,
> what if we don't remove struct device objects on eject?  The ACPI handles
> associated with them don't go away in that case after all, do they?

Umm...  Sorry, I am not getting your point.  The issue is that we need
to be able to fail a request when memory range cannot be off-lined.
Otherwise, we end up ejecting online memory range.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 28, 2012, 6:40 p.m. UTC | #9
On Wednesday, November 28, 2012 09:01:13 AM Toshi Kani wrote:
> On Wed, 2012-11-28 at 00:41 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> > > On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > > > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > > > >> device:
> > > > > > >>
> > > > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > > >>
> > > > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > > > >> step 2. After
> > > > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > > > >> is very
> > > > > > >> dangerous.
> > > > > > > 
> > > > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > > > driver before taking any action:
> > > > > > > 
> > > > > > > #ifndef FORCE_EJECT
> > > > > > > 	if (acpi_device->driver == NULL) {
> > > > > > > 		ret = -ENODEV;
> > > > > > > 		goto err;
> > > > > > > 	}
> > > > > > > #endif
> > > > > > > 
> > > > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > > > 
> > > > > > Consider the following case:
> > > > > > 
> > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > 
> > > > > > CPUa                                                  CPUb
> > > > > > acpi_memory_device_notify()
> > > > > >                                        unbind it from the driver
> > > > > >     acpi_bus_hot_remove_device()
> > > > > 
> > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > unbind only?
> > > > 
> > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > if it's not present (dev->driver == NULL).
> > > > 
> > > > But there can still be a race between an eject and an unbind operation happening
> > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > for other acpi devices?
> > > > 
> > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > probably not grabbed on SCI-initiated acpi ejects.
> > > 
> > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > device_release_driver(), I am wondering if we can call
> > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > parent lock is otherwise released after device_release_driver() is done.
> > 
> > I would be careful.  You may introduce some subtle locking-related issues
> > this way.
> 
> Right.  This requires careful inspection and testing.  As far as the
> locking is concerned, I am not keen on using fine grained locking for
> hot-plug.  It is much simpler and solid if we serialize such operations.
> 
> > Besides, there may be an alternative approach to all this.  For example,
> > what if we don't remove struct device objects on eject?  The ACPI handles
> > associated with them don't go away in that case after all, do they?
> 
> Umm...  Sorry, I am not getting your point.  The issue is that we need
> to be able to fail a request when memory range cannot be off-lined.
> Otherwise, we end up ejecting online memory range.

Yes, this is the major one.  The minor issue, however, is a race condition
between unbinding a driver from a device and removing the device if I
understand it correctly.  Which will go away automatically if the device is
not removed in the first place.  Or so I would think. :-)

Thanks,
Rafael
Toshi Kani Nov. 28, 2012, 9:02 p.m. UTC | #10
> > > > > > > Consider the following case:
> > > > > > > 
> > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > 
> > > > > > > CPUa                                                  CPUb
> > > > > > > acpi_memory_device_notify()
> > > > > > >                                        unbind it from the driver
> > > > > > >     acpi_bus_hot_remove_device()
> > > > > > 
> > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > unbind only?
> > > > > 
> > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > if it's not present (dev->driver == NULL).
> > > > > 
> > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > for other acpi devices?
> > > > > 
> > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > 
> > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > device_release_driver(), I am wondering if we can call
> > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > parent lock is otherwise released after device_release_driver() is done.
> > > 
> > > I would be careful.  You may introduce some subtle locking-related issues
> > > this way.
> > 
> > Right.  This requires careful inspection and testing.  As far as the
> > locking is concerned, I am not keen on using fine grained locking for
> > hot-plug.  It is much simpler and solid if we serialize such operations.
> > 
> > > Besides, there may be an alternative approach to all this.  For example,
> > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > associated with them don't go away in that case after all, do they?
> > 
> > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > to be able to fail a request when memory range cannot be off-lined.
> > Otherwise, we end up ejecting online memory range.
> 
> Yes, this is the major one.  The minor issue, however, is a race condition
> between unbinding a driver from a device and removing the device if I
> understand it correctly.  Which will go away automatically if the device is
> not removed in the first place.  Or so I would think. :-)

I see.  I do not think whether or not the device is removed on eject
makes any difference here.  The issue is that after driver_unbind() is
done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
driver (hence, it cannot fail in prepare_remove), and goes ahead to call
_EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
it cannot off-line kernel memory ranges.  So, we basically need to
either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
during the operation.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 28, 2012, 9:40 p.m. UTC | #11
On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > Consider the following case:
> > > > > > > > > 
> > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > 
> > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > acpi_memory_device_notify()
> > > > > > > > >                                        unbind it from the driver
> > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > 
> > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > unbind only?
> > > > > > > 
> > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > 
> > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > for other acpi devices?
> > > > > > > 
> > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > 
> > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > device_release_driver(), I am wondering if we can call
> > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > 
> > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > this way.
> > > > 
> > > > Right.  This requires careful inspection and testing.  As far as the
> > > > locking is concerned, I am not keen on using fine grained locking for
> > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > 
> > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > associated with them don't go away in that case after all, do they?
> > > > 
> > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > to be able to fail a request when memory range cannot be off-lined.
> > > > Otherwise, we end up ejecting online memory range.
> > > 
> > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > between unbinding a driver from a device and removing the device if I
> > > understand it correctly.  Which will go away automatically if the device is
> > > not removed in the first place.  Or so I would think. :-)
> > 
> > I see.  I do not think whether or not the device is removed on eject
> > makes any difference here.  The issue is that after driver_unbind() is
> > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > it cannot off-line kernel memory ranges.  So, we basically need to
> > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > during the operation.
> 
> OK, I see the problem now.
> 
> What exactly is triggering the driver_unbind() in this scenario?

User can request driver_unbind() from sysfs as follows.  I do not see
much reason why user has to do for memory, though.

echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind


Thanks,
-Toshi



--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 28, 2012, 9:40 p.m. UTC | #12
On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > Consider the following case:
> > > > > > > > 
> > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > 
> > > > > > > > CPUa                                                  CPUb
> > > > > > > > acpi_memory_device_notify()
> > > > > > > >                                        unbind it from the driver
> > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > 
> > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > unbind only?
> > > > > > 
> > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > if it's not present (dev->driver == NULL).
> > > > > > 
> > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > for other acpi devices?
> > > > > > 
> > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > 
> > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > device_release_driver(), I am wondering if we can call
> > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > 
> > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > this way.
> > > 
> > > Right.  This requires careful inspection and testing.  As far as the
> > > locking is concerned, I am not keen on using fine grained locking for
> > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > 
> > > > Besides, there may be an alternative approach to all this.  For example,
> > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > associated with them don't go away in that case after all, do they?
> > > 
> > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > to be able to fail a request when memory range cannot be off-lined.
> > > Otherwise, we end up ejecting online memory range.
> > 
> > Yes, this is the major one.  The minor issue, however, is a race condition
> > between unbinding a driver from a device and removing the device if I
> > understand it correctly.  Which will go away automatically if the device is
> > not removed in the first place.  Or so I would think. :-)
> 
> I see.  I do not think whether or not the device is removed on eject
> makes any difference here.  The issue is that after driver_unbind() is
> done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> it cannot off-line kernel memory ranges.  So, we basically need to
> either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> during the operation.

OK, I see the problem now.

What exactly is triggering the driver_unbind() in this scenario?

Rafael
Rafael Wysocki Nov. 28, 2012, 10:01 p.m. UTC | #13
On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > Consider the following case:
> > > > > > > > > > 
> > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > 
> > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > 
> > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > unbind only?
> > > > > > > > 
> > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > 
> > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > for other acpi devices?
> > > > > > > > 
> > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > 
> > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > 
> > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > this way.
> > > > > 
> > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > 
> > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > associated with them don't go away in that case after all, do they?
> > > > > 
> > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > Otherwise, we end up ejecting online memory range.
> > > > 
> > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > between unbinding a driver from a device and removing the device if I
> > > > understand it correctly.  Which will go away automatically if the device is
> > > > not removed in the first place.  Or so I would think. :-)
> > > 
> > > I see.  I do not think whether or not the device is removed on eject
> > > makes any difference here.  The issue is that after driver_unbind() is
> > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> > 
> > OK, I see the problem now.
> > 
> > What exactly is triggering the driver_unbind() in this scenario?
> 
> User can request driver_unbind() from sysfs as follows.  I do not see
> much reason why user has to do for memory, though.
> 
> echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

This is wrong.  Even if we want to permit user space to forcibly unbind
drivers from anything like this, we should at least check for some
situations in which it is plain dangerous.  Like in this case.  So I think
the above should fail unless we know that the driver won't be necessary
to handle hot-removal of memory.

Alternatively, this may actually try to carry out the hot-removal and only
call driver_unbind() if that succeeds.  Whichever is preferable, I'd say.

Thanks,
Rafael
Toshi Kani Nov. 28, 2012, 10:04 p.m. UTC | #14
On Wed, 2012-11-28 at 23:01 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > > 
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > 
> > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > 
> > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > > unbind only?
> > > > > > > > > 
> > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > 
> > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > for other acpi devices?
> > > > > > > > > 
> > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > 
> > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > 
> > > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > > this way.
> > > > > > 
> > > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > > 
> > > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > 
> > > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > Otherwise, we end up ejecting online memory range.
> > > > > 
> > > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > > between unbinding a driver from a device and removing the device if I
> > > > > understand it correctly.  Which will go away automatically if the device is
> > > > > not removed in the first place.  Or so I would think. :-)
> > > > 
> > > > I see.  I do not think whether or not the device is removed on eject
> > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > > 
> > > OK, I see the problem now.
> > > 
> > > What exactly is triggering the driver_unbind() in this scenario?
> > 
> > User can request driver_unbind() from sysfs as follows.  I do not see
> > much reason why user has to do for memory, though.
> > 
> > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 
> This is wrong.  Even if we want to permit user space to forcibly unbind
> drivers from anything like this, we should at least check for some
> situations in which it is plain dangerous.  Like in this case.  So I think
> the above should fail unless we know that the driver won't be necessary
> to handle hot-removal of memory.

Well, we tried twice already... :)
https://lkml.org/lkml/2012/11/16/649

> Alternatively, this may actually try to carry out the hot-removal and only
> call driver_unbind() if that succeeds.  Whichever is preferable, I'd say.

Greg clarified in the above link that this interface is "unbind", not
remove.


Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 28, 2012, 10:16 p.m. UTC | #15
> > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > during the operation.
> > > > > 
> > > > > OK, I see the problem now.
> > > > > 
> > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > 
> > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > much reason why user has to do for memory, though.
> > > > 
> > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > 
> > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > drivers from anything like this, we should at least check for some
> > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > the above should fail unless we know that the driver won't be necessary
> > > to handle hot-removal of memory.
> > 
> > Well, we tried twice already... :)
> > https://lkml.org/lkml/2012/11/16/649
> 
> I didn't mean driver_unbind() should fail.  The code path that executes
> driver_unbind() eventually should fail _before_ executing it.

driver_unbind() is the handler, so it is called directly from this
unbind interface.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 28, 2012, 10:21 p.m. UTC | #16
On Wednesday, November 28, 2012 03:04:52 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 23:01 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > 
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > 
> > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > 
> > > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > > > unbind only?
> > > > > > > > > > 
> > > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > > 
> > > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > > for other acpi devices?
> > > > > > > > > > 
> > > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > > 
> > > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > > 
> > > > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > > > this way.
> > > > > > > 
> > > > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > > > 
> > > > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > > 
> > > > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > > Otherwise, we end up ejecting online memory range.
> > > > > > 
> > > > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > > > between unbinding a driver from a device and removing the device if I
> > > > > > understand it correctly.  Which will go away automatically if the device is
> > > > > > not removed in the first place.  Or so I would think. :-)
> > > > > 
> > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > during the operation.
> > > > 
> > > > OK, I see the problem now.
> > > > 
> > > > What exactly is triggering the driver_unbind() in this scenario?
> > > 
> > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > much reason why user has to do for memory, though.
> > > 
> > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > 
> > This is wrong.  Even if we want to permit user space to forcibly unbind
> > drivers from anything like this, we should at least check for some
> > situations in which it is plain dangerous.  Like in this case.  So I think
> > the above should fail unless we know that the driver won't be necessary
> > to handle hot-removal of memory.
> 
> Well, we tried twice already... :)
> https://lkml.org/lkml/2012/11/16/649

I didn't mean driver_unbind() should fail.  The code path that executes
driver_unbind() eventually should fail _before_ executing it.

> > Alternatively, this may actually try to carry out the hot-removal and only
> > call driver_unbind() if that succeeds.  Whichever is preferable, I'd say.
> 
> Greg clarified in the above link that this interface is "unbind", not
> remove.

OK, so that's clear.

Thanks,
Rafael
Rafael Wysocki Nov. 28, 2012, 10:39 p.m. UTC | #17
On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > during the operation.
> > > > > > 
> > > > > > OK, I see the problem now.
> > > > > > 
> > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > 
> > > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > > much reason why user has to do for memory, though.
> > > > > 
> > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > 
> > > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > > drivers from anything like this, we should at least check for some
> > > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > > the above should fail unless we know that the driver won't be necessary
> > > > to handle hot-removal of memory.
> > > 
> > > Well, we tried twice already... :)
> > > https://lkml.org/lkml/2012/11/16/649
> > 
> > I didn't mean driver_unbind() should fail.  The code path that executes
> > driver_unbind() eventually should fail _before_ executing it.
> 
> driver_unbind() is the handler, so it is called directly from this
> unbind interface.

Yes, sorry for the confusion.

So, it looks like the driver core wants us to handle driver unbinding no
matter what.

This pretty much means that it is a bad idea to have a driver that is
exposed as a "device driver" in sysfs for memory hotplugging.

Thanks,
Rafael
Greg Kroah-Hartman Nov. 28, 2012, 10:46 p.m. UTC | #18
On Wed, Nov 28, 2012 at 11:39:22PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > > during the operation.
> > > > > > > 
> > > > > > > OK, I see the problem now.
> > > > > > > 
> > > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > > 
> > > > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > > > much reason why user has to do for memory, though.
> > > > > > 
> > > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > 
> > > > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > > > drivers from anything like this, we should at least check for some
> > > > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > > > the above should fail unless we know that the driver won't be necessary
> > > > > to handle hot-removal of memory.
> > > > 
> > > > Well, we tried twice already... :)
> > > > https://lkml.org/lkml/2012/11/16/649
> > > 
> > > I didn't mean driver_unbind() should fail.  The code path that executes
> > > driver_unbind() eventually should fail _before_ executing it.
> > 
> > driver_unbind() is the handler, so it is called directly from this
> > unbind interface.
> 
> Yes, sorry for the confusion.
> 
> So, it looks like the driver core wants us to handle driver unbinding no
> matter what.

Yes.  Well, the driver core does the unbinding no matter what, if it was
told, by a user, to do so.  Why is that a problem?  The user then is
responsible for any bad things (i.e. not able to control the device any
more), if they do so.

> This pretty much means that it is a bad idea to have a driver that is
> exposed as a "device driver" in sysfs for memory hotplugging.

Again, why?  All this means is that the driver is now not connected to
the device (memory in this case.)  The memory is still there, still
operates as before, only difference is, the driver can't touch it
anymore.

This is the same for any ACPI driver, and has been for years.

Please don't confuse unbind with any "normal" system operation, it is
not to be used for memory hotplug, or anything else like this.

Also, if you really do not want to do this, turn off the ability to
unbind/bind for these devices, that is under your control in your bus
logic.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 28, 2012, 11:05 p.m. UTC | #19
On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> On Wed, Nov 28, 2012 at 11:39:22PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > > > during the operation.
> > > > > > > > 
> > > > > > > > OK, I see the problem now.
> > > > > > > > 
> > > > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > > > 
> > > > > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > > > > much reason why user has to do for memory, though.
> > > > > > > 
> > > > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > > 
> > > > > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > > > > drivers from anything like this, we should at least check for some
> > > > > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > > > > the above should fail unless we know that the driver won't be necessary
> > > > > > to handle hot-removal of memory.
> > > > > 
> > > > > Well, we tried twice already... :)
> > > > > https://lkml.org/lkml/2012/11/16/649
> > > > 
> > > > I didn't mean driver_unbind() should fail.  The code path that executes
> > > > driver_unbind() eventually should fail _before_ executing it.
> > > 
> > > driver_unbind() is the handler, so it is called directly from this
> > > unbind interface.
> > 
> > Yes, sorry for the confusion.
> > 
> > So, it looks like the driver core wants us to handle driver unbinding no
> > matter what.
> 
> Yes.  Well, the driver core does the unbinding no matter what, if it was
> told, by a user, to do so.  Why is that a problem?  The user then is
> responsible for any bad things (i.e. not able to control the device any
> more), if they do so.

I don't really agree with that, because the user may simply not know what
the consequences of that will be.  In my not so humble opinion any interface
allowing user space to crash the kernel is a bad one.  And this is an example
of that.

> > This pretty much means that it is a bad idea to have a driver that is
> > exposed as a "device driver" in sysfs for memory hotplugging.
> 
> Again, why?  All this means is that the driver is now not connected to
> the device (memory in this case.)  The memory is still there, still
> operates as before, only difference is, the driver can't touch it
> anymore.
> 
> This is the same for any ACPI driver, and has been for years.

Except that if this driver has been unbound and the removal is triggered by
an SCI, the core will just go on and remove the memory, although it may
be killing the kernel this way.

Arguably, this may be considered as the core's fault, but the only way to
fix that would be to move the code from that driver into the core and not to
register it as a "driver" any more.  Which was my point. :-)

> Please don't confuse unbind with any "normal" system operation, it is
> not to be used for memory hotplug, or anything else like this.
> 
> Also, if you really do not want to do this, turn off the ability to
> unbind/bind for these devices, that is under your control in your bus
> logic.

OK, but how?  I'm looking at driver_unbind() and not seeing any way to do
that actually.

Thanks,
Rafael
Greg Kroah-Hartman Nov. 28, 2012, 11:10 p.m. UTC | #20
On Thu, Nov 29, 2012 at 12:05:20AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> > > So, it looks like the driver core wants us to handle driver unbinding no
> > > matter what.
> > 
> > Yes.  Well, the driver core does the unbinding no matter what, if it was
> > told, by a user, to do so.  Why is that a problem?  The user then is
> > responsible for any bad things (i.e. not able to control the device any
> > more), if they do so.
> 
> I don't really agree with that, because the user may simply not know what
> the consequences of that will be.  In my not so humble opinion any interface
> allowing user space to crash the kernel is a bad one.  And this is an example
> of that.

This has been in place since 2005, over 7 years now, and I have never
heard any problems with it being used to crash the kernel, despite the
easy ability for people to unbind all of their devices from drivers and
instantly cause a system hang.  So really doubt this is a problem in
real life :)

> > > This pretty much means that it is a bad idea to have a driver that is
> > > exposed as a "device driver" in sysfs for memory hotplugging.
> > 
> > Again, why?  All this means is that the driver is now not connected to
> > the device (memory in this case.)  The memory is still there, still
> > operates as before, only difference is, the driver can't touch it
> > anymore.
> > 
> > This is the same for any ACPI driver, and has been for years.
> 
> Except that if this driver has been unbound and the removal is triggered by
> an SCI, the core will just go on and remove the memory, although it may
> be killing the kernel this way.

Why would memory go away if a driver is unbound from a device?  The
device didn't go away.  It's the same if the driver was a module and it
was unloaded, you should not turn memory off in that situation, right?
Are you also going to prevent module unloading of this driver?

> Arguably, this may be considered as the core's fault, but the only way to
> fix that would be to move the code from that driver into the core and not to
> register it as a "driver" any more.  Which was my point. :-)

No, I think people are totally overreacting to the unbind/bind files,
which are there to aid in development, and in adding new device ids to
drivers, as well as sometimes doing a hacky revoke() call.

> > Please don't confuse unbind with any "normal" system operation, it is
> > not to be used for memory hotplug, or anything else like this.
> > 
> > Also, if you really do not want to do this, turn off the ability to
> > unbind/bind for these devices, that is under your control in your bus
> > logic.
> 
> OK, but how?  I'm looking at driver_unbind() and not seeing any way to do
> that actually.

See the suppress_bind_attrs field in struct device_driver.  It's even
documented in device.h, but sadly, no one reads documentation :)

I recommend you set this field if you don't want the bind/unbind files
to show up for your memory driver, although I would argue that the
driver needs to be fixed up to not do foolish things like removing
memory from a system unless it really does go away...

hope this helps,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 28, 2012, 11:31 p.m. UTC | #21
On Wednesday, November 28, 2012 03:10:46 PM Greg KH wrote:
> On Thu, Nov 29, 2012 at 12:05:20AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> > > > So, it looks like the driver core wants us to handle driver unbinding no
> > > > matter what.
> > > 
> > > Yes.  Well, the driver core does the unbinding no matter what, if it was
> > > told, by a user, to do so.  Why is that a problem?  The user then is
> > > responsible for any bad things (i.e. not able to control the device any
> > > more), if they do so.
> > 
> > I don't really agree with that, because the user may simply not know what
> > the consequences of that will be.  In my not so humble opinion any interface
> > allowing user space to crash the kernel is a bad one.  And this is an example
> > of that.
> 
> This has been in place since 2005, over 7 years now, and I have never
> heard any problems with it being used to crash the kernel, despite the
> easy ability for people to unbind all of their devices from drivers and
> instantly cause a system hang.  So really doubt this is a problem in
> real life :)
> 
> > > > This pretty much means that it is a bad idea to have a driver that is
> > > > exposed as a "device driver" in sysfs for memory hotplugging.
> > > 
> > > Again, why?  All this means is that the driver is now not connected to
> > > the device (memory in this case.)  The memory is still there, still
> > > operates as before, only difference is, the driver can't touch it
> > > anymore.
> > > 
> > > This is the same for any ACPI driver, and has been for years.
> > 
> > Except that if this driver has been unbound and the removal is triggered by
> > an SCI, the core will just go on and remove the memory, although it may
> > be killing the kernel this way.
> 
> Why would memory go away if a driver is unbound from a device?  The
> device didn't go away.  It's the same if the driver was a module and it
> was unloaded, you should not turn memory off in that situation, right?

Right.  It looks like there's some confusion about the role of .remove()
in the ACPI subsystem, but I need to investigate it a bit more.

> Are you also going to prevent module unloading of this driver?

I'm not sure what I'm going to do with that driver at the moment to be honest. :-)

> > Arguably, this may be considered as the core's fault, but the only way to
> > fix that would be to move the code from that driver into the core and not to
> > register it as a "driver" any more.  Which was my point. :-)
> 
> No, I think people are totally overreacting to the unbind/bind files,
> which are there to aid in development, and in adding new device ids to
> drivers, as well as sometimes doing a hacky revoke() call.
> 
> > > Please don't confuse unbind with any "normal" system operation, it is
> > > not to be used for memory hotplug, or anything else like this.
> > > 
> > > Also, if you really do not want to do this, turn off the ability to
> > > unbind/bind for these devices, that is under your control in your bus
> > > logic.
> > 
> > OK, but how?  I'm looking at driver_unbind() and not seeing any way to do
> > that actually.
> 
> See the suppress_bind_attrs field in struct device_driver.  It's even
> documented in device.h, but sadly, no one reads documentation :)

That's good to know, thanks. :-)

And if I knew I could find that information in device.h, I'd look in there.

> I recommend you set this field if you don't want the bind/unbind files
> to show up for your memory driver, although I would argue that the
> driver needs to be fixed up to not do foolish things like removing
> memory from a system unless it really does go away...

Quite frankly, I need to look at that driver and how things are supposed to
work more thoroughly, because I don't seem to see a reason to do various
things the way they are done.  Well, maybe it's just me.

Thanks,
Rafael
Rafael Wysocki Nov. 28, 2012, 11:49 p.m. UTC | #22
On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > Consider the following case:
> > > > > > > > 
> > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > 
> > > > > > > > CPUa                                                  CPUb
> > > > > > > > acpi_memory_device_notify()
> > > > > > > >                                        unbind it from the driver
> > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > 
> > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > unbind only?
> > > > > > 
> > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > if it's not present (dev->driver == NULL).
> > > > > > 
> > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > for other acpi devices?
> > > > > > 
> > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > 
> > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > device_release_driver(), I am wondering if we can call
> > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > 
> > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > this way.
> > > 
> > > Right.  This requires careful inspection and testing.  As far as the
> > > locking is concerned, I am not keen on using fine grained locking for
> > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > 
> > > > Besides, there may be an alternative approach to all this.  For example,
> > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > associated with them don't go away in that case after all, do they?
> > > 
> > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > to be able to fail a request when memory range cannot be off-lined.
> > > Otherwise, we end up ejecting online memory range.
> > 
> > Yes, this is the major one.  The minor issue, however, is a race condition
> > between unbinding a driver from a device and removing the device if I
> > understand it correctly.  Which will go away automatically if the device is
> > not removed in the first place.  Or so I would think. :-)
> 
> I see.  I do not think whether or not the device is removed on eject
> makes any difference here.  The issue is that after driver_unbind() is
> done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> _EJ0.

I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
acpi_eject_store() which is exposed through sysfs.  If we disabled exposing
acpi_eject_store() for memory devices, then the only way would be from the
notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
notify handler for memory (so that memory eject events are simply dropped on
the floor after unbinding the driver)?

Rafael
Toshi Kani Nov. 29, 2012, 1:02 a.m. UTC | #23
On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > Consider the following case:
> > > > > > > > > 
> > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > 
> > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > acpi_memory_device_notify()
> > > > > > > > >                                        unbind it from the driver
> > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > 
> > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > unbind only?
> > > > > > > 
> > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > 
> > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > for other acpi devices?
> > > > > > > 
> > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > 
> > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > device_release_driver(), I am wondering if we can call
> > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > 
> > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > this way.
> > > > 
> > > > Right.  This requires careful inspection and testing.  As far as the
> > > > locking is concerned, I am not keen on using fine grained locking for
> > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > 
> > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > associated with them don't go away in that case after all, do they?
> > > > 
> > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > to be able to fail a request when memory range cannot be off-lined.
> > > > Otherwise, we end up ejecting online memory range.
> > > 
> > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > between unbinding a driver from a device and removing the device if I
> > > understand it correctly.  Which will go away automatically if the device is
> > > not removed in the first place.  Or so I would think. :-)
> > 
> > I see.  I do not think whether or not the device is removed on eject
> > makes any difference here.  The issue is that after driver_unbind() is
> > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > _EJ0.
> 
> I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> acpi_eject_store() which is exposed through sysfs.  

Yes, that is correct.

> If we disabled exposing
> acpi_eject_store() for memory devices, then the only way would be from the
> notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> notify handler for memory (so that memory eject events are simply dropped on
> the floor after unbinding the driver)?

If driver_unbind() happens before an eject request, we do not have a
problem.  acpi_eject_store() fails if a driver is not bound to the
device.  acpi_memory_device_notify() fails as well.

The race condition Wen pointed out (see the top of this email) is that
driver_unbind() may come in while eject operation is in-progress.  This
is why I mentioned the following in previous email.

> So, we basically need to either 1) serialize
> acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> during the operation.


Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 29, 2012, 1:15 a.m. UTC | #24
On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > Consider the following case:
> > > > > > > > > > 
> > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > 
> > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > 
> > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > unbind only?
> > > > > > > > 
> > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > 
> > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > for other acpi devices?
> > > > > > > > 
> > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > 
> > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > 
> > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > this way.
> > > > > 
> > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > 
> > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > associated with them don't go away in that case after all, do they?
> > > > > 
> > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > Otherwise, we end up ejecting online memory range.
> > > > 
> > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > between unbinding a driver from a device and removing the device if I
> > > > understand it correctly.  Which will go away automatically if the device is
> > > > not removed in the first place.  Or so I would think. :-)
> > > 
> > > I see.  I do not think whether or not the device is removed on eject
> > > makes any difference here.  The issue is that after driver_unbind() is
> > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > _EJ0.
> > 
> > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > acpi_eject_store() which is exposed through sysfs.  
> 
> Yes, that is correct.
> 
> > If we disabled exposing
> > acpi_eject_store() for memory devices, then the only way would be from the
> > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > notify handler for memory (so that memory eject events are simply dropped on
> > the floor after unbinding the driver)?
> 
> If driver_unbind() happens before an eject request, we do not have a
> problem.  acpi_eject_store() fails if a driver is not bound to the
> device.  acpi_memory_device_notify() fails as well.
> 
> The race condition Wen pointed out (see the top of this email) is that
> driver_unbind() may come in while eject operation is in-progress.  This
> is why I mentioned the following in previous email.
> 
> > So, we basically need to either 1) serialize
> > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > during the operation.

Forgot to mention.  The 3rd option is what Greg said -- use the
suppress_bind_attrs field.  I think this is a good option to address
this race condition for now.  For a long term solution, we should have a
better infrastructure in place to address such issue in general.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 29, 2012, 10:03 a.m. UTC | #25
On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > > 
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > 
> > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > 
> > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > > unbind only?
> > > > > > > > > 
> > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > 
> > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > for other acpi devices?
> > > > > > > > > 
> > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > 
> > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > 
> > > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > > this way.
> > > > > > 
> > > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > > 
> > > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > 
> > > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > Otherwise, we end up ejecting online memory range.
> > > > > 
> > > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > > between unbinding a driver from a device and removing the device if I
> > > > > understand it correctly.  Which will go away automatically if the device is
> > > > > not removed in the first place.  Or so I would think. :-)
> > > > 
> > > > I see.  I do not think whether or not the device is removed on eject
> > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > _EJ0.
> > > 
> > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > acpi_eject_store() which is exposed through sysfs.  
> > 
> > Yes, that is correct.
> > 
> > > If we disabled exposing
> > > acpi_eject_store() for memory devices, then the only way would be from the
> > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > notify handler for memory (so that memory eject events are simply dropped on
> > > the floor after unbinding the driver)?
> > 
> > If driver_unbind() happens before an eject request, we do not have a
> > problem.  acpi_eject_store() fails if a driver is not bound to the
> > device.  acpi_memory_device_notify() fails as well.
> > 
> > The race condition Wen pointed out (see the top of this email) is that
> > driver_unbind() may come in while eject operation is in-progress.  This
> > is why I mentioned the following in previous email.
> > 
> > > So, we basically need to either 1) serialize
> > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> 
> Forgot to mention.  The 3rd option is what Greg said -- use the
> suppress_bind_attrs field.  I think this is a good option to address
> this race condition for now.  For a long term solution, we should have a
> better infrastructure in place to address such issue in general.

Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
friends and I think there's a way to address all of these problems
without big redesign (for now).

First, why don't we introduce an ACPI device flag (in the flags field of
struct acpi_device) called eject_forbidden or something like this such that:

(1) It will be clear by default.
(2) It may only be set by a driver's .add() routine if necessary.
(3) Once set, it may only be cleared by the driver's .remove() routine if
    it's safe to physically remove the device after the .remove().

Then, after the .remove() (which must be successful) has returned, and the
flag is set, it will tell acpi_bus_remove() to return a specific error code
(such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
earlier, because if it left the flag set, there's no way to clear it afterward
and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
should be unregistered anyway if that error code is to be returned.

[By the way, do you know where we free the memory allocated for struct
 acpi_device objects?]

Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
store it, but continue the trimming normally and finally it should return that
error code to acpi_bus_hot_remove_device().

Now, if acpi_bus_hot_remove_device() gets that error code, it should just
reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
we attempted to eject) and notify the firmware about the failure.

If we have that, then the memory hotplug driver would only need to set
flags.eject_forbidden in its .add() routine and make its .remove() routine
only clear that flag if it is safe to actually remove the memory.

Does this make sense to you?

[BTW, using _PS3 in acpi_bus_hot_remove_device() directly to power off the
 device is a nonsense, because this method is not guaranteed to turn the power
 off in the first place (it may just put the device into D3hot).  If anything,
 acpi_device_set_power() should be used for that, but even that is not
 guaranteed to actually remove the power (power resources may be shared with
 other devices, so in fact that operation should be done by acpi_bus_trim()
 for each of the trimmed devices.]

Thanks,
Rafael
Vasilis Liaskovitis Nov. 29, 2012, 11:04 a.m. UTC | #26
Hi,

On Wed, Nov 28, 2012 at 06:15:42PM -0700, Toshi Kani wrote:
> On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > > 
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > 
> > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > acpi_eject_store() which is exposed through sysfs.  
> > 
> > Yes, that is correct.
> > 
> > > If we disabled exposing
> > > acpi_eject_store() for memory devices, then the only way would be from the
> > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > notify handler for memory (so that memory eject events are simply dropped on
> > > the floor after unbinding the driver)?
> > 
> > If driver_unbind() happens before an eject request, we do not have a
> > problem.  acpi_eject_store() fails if a driver is not bound to the
> > device.  acpi_memory_device_notify() fails as well.
> > 
> > The race condition Wen pointed out (see the top of this email) is that
> > driver_unbind() may come in while eject operation is in-progress.  This
> > is why I mentioned the following in previous email.
> > 
> > > So, we basically need to either 1) serialize
> > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> 
> Forgot to mention.  The 3rd option is what Greg said -- use the
> suppress_bind_attrs field.  I think this is a good option to address
> this race condition for now.  For a long term solution, we should have a
> better infrastructure in place to address such issue in general.

I like the suppress_bind_attrs idea, I 'll take a look.

As I said for option 2), acpi_bus_remove could check for driver presence.
But It's more a quick hack to abort the eject (the race with unbind can still
happen, but acpi_bus_remove can now detect it later in the eject path).
Something like:

 static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
+	int ret;
 	if (!dev)
 		return -EINVAL;
 
 	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
+
+	if (dev->driver && dev->driver->ops.prepare_remove) {
+		ret = dev->driver->ops.prepare_remove(dev);
+		if (ret)
+			return ret;
+	}
+	else if (!dev->driver)
+		return -ENODEV;
 	device_release_driver(&dev->dev);

thanks,

- Vasilis
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vasilis Liaskovitis Nov. 29, 2012, 11:30 a.m. UTC | #27
On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > 
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > 
> > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > 
[...]
> Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> friends and I think there's a way to address all of these problems
> without big redesign (for now).
> 
> First, why don't we introduce an ACPI device flag (in the flags field of
> struct acpi_device) called eject_forbidden or something like this such that:
> 
> (1) It will be clear by default.
> (2) It may only be set by a driver's .add() routine if necessary.
> (3) Once set, it may only be cleared by the driver's .remove() routine if
>     it's safe to physically remove the device after the .remove().
> 
> Then, after the .remove() (which must be successful) has returned, and the
> flag is set, it will tell acpi_bus_remove() to return a specific error code
> (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> earlier, because if it left the flag set, there's no way to clear it afterward
> and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> should be unregistered anyway if that error code is to be returned.
> 
> [By the way, do you know where we free the memory allocated for struct
>  acpi_device objects?]
> 
> Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> store it, but continue the trimming normally and finally it should return that
> error code to acpi_bus_hot_remove_device().

Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
fails). Trimming is not continued. 

Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
error. Is this the desired behaviour that we want to keep for bus_trim? (This is
more a general question, not specific to the eject_forbidden suggestion)

> 
> Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> we attempted to eject) and notify the firmware about the failure.

sounds like this rollback needs to be implemented in any solution we choose
to implement, correct?

> 
> If we have that, then the memory hotplug driver would only need to set
> flags.eject_forbidden in its .add() routine and make its .remove() routine
> only clear that flag if it is safe to actually remove the memory.
> 

But when .remove op is called, we are already in the irreversible/error-free
removal (final removal step).
Maybe we need to reset eject_forbidden in a prepare_remove operation which
handles the removal part that can fail ?

thanks,

- Vasilis
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 29, 2012, 4:43 p.m. UTC | #28
On Thu, 2012-11-29 at 11:03 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > If we disabled exposing
> > > > acpi_eject_store() for memory devices, then the only way would be from the
> > > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > > notify handler for memory (so that memory eject events are simply dropped on
> > > > the floor after unbinding the driver)?
> > > 
> > > If driver_unbind() happens before an eject request, we do not have a
> > > problem.  acpi_eject_store() fails if a driver is not bound to the
> > > device.  acpi_memory_device_notify() fails as well.
> > > 
> > > The race condition Wen pointed out (see the top of this email) is that
> > > driver_unbind() may come in while eject operation is in-progress.  This
> > > is why I mentioned the following in previous email.
> > > 
> > > > So, we basically need to either 1) serialize
> > > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > 
> > Forgot to mention.  The 3rd option is what Greg said -- use the
> > suppress_bind_attrs field.  I think this is a good option to address
> > this race condition for now.  For a long term solution, we should have a
> > better infrastructure in place to address such issue in general.
> 
> Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> friends and I think there's a way to address all of these problems
> without big redesign (for now).
> 
> First, why don't we introduce an ACPI device flag (in the flags field of
> struct acpi_device) called eject_forbidden or something like this such that:
> 
> (1) It will be clear by default.
> (2) It may only be set by a driver's .add() routine if necessary.
> (3) Once set, it may only be cleared by the driver's .remove() routine if
>     it's safe to physically remove the device after the .remove().
> 
> Then, after the .remove() (which must be successful) has returned, and the
> flag is set, it will tell acpi_bus_remove() to return a specific error code
> (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> earlier, because if it left the flag set, there's no way to clear it afterward
> and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> should be unregistered anyway if that error code is to be returned.

I like the idea!  It's a good intermediate solution if we need to keep
the bind/unbind interface.  That said, I still prefer to go with option
3) for now.  I do not see much reason to keep the bind/unbind interface
for ACPI hotplug drivers, and it seems that the semantics of .remove()
is .remove_driver(), not .remove_device() for driver_unbind().  So, I
think we should disable the bind/unbind interface until we settle this
issue.

> [By the way, do you know where we free the memory allocated for struct
>  acpi_device objects?]

device_release() -> acpi_device_release().

> Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> store it, but continue the trimming normally and finally it should return that
> error code to acpi_bus_hot_remove_device().
> 
> Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> we attempted to eject) and notify the firmware about the failure.
> 
> If we have that, then the memory hotplug driver would only need to set
> flags.eject_forbidden in its .add() routine and make its .remove() routine
> only clear that flag if it is safe to actually remove the memory.
> 
> Does this make sense to you?

In high-level, yes.  Rollback strategy, such as we should continue the
trimming after an error, is something we need to think about along with
the framework design.  I think we need a good framework before
implementing rollback.

> [BTW, using _PS3 in acpi_bus_hot_remove_device() directly to power off the
>  device is a nonsense, because this method is not guaranteed to turn the power
>  off in the first place (it may just put the device into D3hot).  If anything,
>  acpi_device_set_power() should be used for that, but even that is not
>  guaranteed to actually remove the power (power resources may be shared with
>  other devices, so in fact that operation should be done by acpi_bus_trim()
>  for each of the trimmed devices.]

I agree.  I cannot tell for other vendor's implementation, but I expect
that _EJ0 takes care of the power state after it is ejected.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 29, 2012, 4:57 p.m. UTC | #29
On Thursday, November 29, 2012 12:30:30 PM Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > 
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> > 
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> > 
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> >     it's safe to physically remove the device after the .remove().
> > 
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> > 
> > [By the way, do you know where we free the memory allocated for struct
> >  acpi_device objects?]
> > 
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
> 
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued. 
> 
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)
> 
> > 
> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
> 
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?
> 
> > 
> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> > 
> 
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).

Why so?  What prevents us from doing a bus scan again and binding the driver
again to the device?  Is .remove() doing something to the firmware?

Rafael
Toshi Kani Nov. 29, 2012, 5:44 p.m. UTC | #30
On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> Hi,
> 
> On Wed, Nov 28, 2012 at 06:15:42PM -0700, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > 
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > 
> > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > > acpi_eject_store() which is exposed through sysfs.  
> > > 
> > > Yes, that is correct.
> > > 
> > > > If we disabled exposing
> > > > acpi_eject_store() for memory devices, then the only way would be from the
> > > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > > notify handler for memory (so that memory eject events are simply dropped on
> > > > the floor after unbinding the driver)?
> > > 
> > > If driver_unbind() happens before an eject request, we do not have a
> > > problem.  acpi_eject_store() fails if a driver is not bound to the
> > > device.  acpi_memory_device_notify() fails as well.
> > > 
> > > The race condition Wen pointed out (see the top of this email) is that
> > > driver_unbind() may come in while eject operation is in-progress.  This
> > > is why I mentioned the following in previous email.
> > > 
> > > > So, we basically need to either 1) serialize
> > > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > 
> > Forgot to mention.  The 3rd option is what Greg said -- use the
> > suppress_bind_attrs field.  I think this is a good option to address
> > this race condition for now.  For a long term solution, we should have a
> > better infrastructure in place to address such issue in general.
> 
> I like the suppress_bind_attrs idea, I 'll take a look.

Great!

> As I said for option 2), acpi_bus_remove could check for driver presence.
> But It's more a quick hack to abort the eject (the race with unbind can still
> happen, but acpi_bus_remove can now detect it later in the eject path).
> Something like:
> 
>  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
>  {
> +	int ret;
>  	if (!dev)
>  		return -EINVAL;
>  
>  	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> +
> +	if (dev->driver && dev->driver->ops.prepare_remove) {
> +		ret = dev->driver->ops.prepare_remove(dev);
> +		if (ret)
> +			return ret;
> +	}
> +	else if (!dev->driver)
> +		return -ENODEV;
>  	device_release_driver(&dev->dev);

Yes, that's what I had in mind along with device_lock().  I think the
lock is necessary to close the window.
http://www.spinics.net/lists/linux-mm/msg46973.html

But as I mentioned in other email, I prefer option 3 with
suppress_bind_attrs.  So, yes, please take a look to see how it works
out.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani Nov. 29, 2012, 5:56 p.m. UTC | #31
On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > 
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> > 
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> > 
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> >     it's safe to physically remove the device after the .remove().
> > 
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> > 
> > [By the way, do you know where we free the memory allocated for struct
> >  acpi_device objects?]
> > 
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
> 
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued. 
> 
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)

Your change makes sense to me.  At least until we have rollback code in
place, we need to fail as soon as we hit an error.
 
> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
> 
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?

Yes, rollback is necessary.  But I do not think we need to include it
into your patch, though.

Thanks,
-Toshi
 
> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> > 
> 
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).
> Maybe we need to reset eject_forbidden in a prepare_remove operation which
> handles the removal part that can fail ?
> 
> thanks,
> 
> - Vasilis


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 29, 2012, 8:25 p.m. UTC | #32
On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > 
> > [...]
> > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > friends and I think there's a way to address all of these problems
> > > without big redesign (for now).
> > > 
> > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > struct acpi_device) called eject_forbidden or something like this such that:
> > > 
> > > (1) It will be clear by default.
> > > (2) It may only be set by a driver's .add() routine if necessary.
> > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > >     it's safe to physically remove the device after the .remove().
> > > 
> > > Then, after the .remove() (which must be successful) has returned, and the
> > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > should be unregistered anyway if that error code is to be returned.
> > > 
> > > [By the way, do you know where we free the memory allocated for struct
> > >  acpi_device objects?]
> > > 
> > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > store it, but continue the trimming normally and finally it should return that
> > > error code to acpi_bus_hot_remove_device().
> > 
> > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > fails). Trimming is not continued. 
> > 
> > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > more a general question, not specific to the eject_forbidden suggestion)
> 
> Your change makes sense to me.  At least until we have rollback code in
> place, we need to fail as soon as we hit an error.

Are you sure this makes sense?  What happens to the devices that we have
trimmed already and then there's an error?  Looks like they are just unusable
going forward, aren't they?

> > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > we attempted to eject) and notify the firmware about the failure.
> > 
> > sounds like this rollback needs to be implemented in any solution we choose
> > to implement, correct?
> 
> Yes, rollback is necessary.  But I do not think we need to include it
> into your patch, though.

As the first step, we should just trim everything and then return an error
code in my opinion.

Thanks,
Rafael
Toshi Kani Nov. 29, 2012, 8:38 p.m. UTC | #33
On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > > 
> > > [...]
> > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > friends and I think there's a way to address all of these problems
> > > > without big redesign (for now).
> > > > 
> > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > > 
> > > > (1) It will be clear by default.
> > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > >     it's safe to physically remove the device after the .remove().
> > > > 
> > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > > should be unregistered anyway if that error code is to be returned.
> > > > 
> > > > [By the way, do you know where we free the memory allocated for struct
> > > >  acpi_device objects?]
> > > > 
> > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > store it, but continue the trimming normally and finally it should return that
> > > > error code to acpi_bus_hot_remove_device().
> > > 
> > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > fails). Trimming is not continued. 
> > > 
> > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > more a general question, not specific to the eject_forbidden suggestion)
> > 
> > Your change makes sense to me.  At least until we have rollback code in
> > place, we need to fail as soon as we hit an error.
> 
> Are you sure this makes sense?  What happens to the devices that we have
> trimmed already and then there's an error?  Looks like they are just unusable
> going forward, aren't they?

Yes, the devices trimmed already are released from the kernel, and their
memory ranges become unusable.  This is bad.  But I do not think we
should trim further to make more devices unusable after an error. 


> > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > we attempted to eject) and notify the firmware about the failure.
> > > 
> > > sounds like this rollback needs to be implemented in any solution we choose
> > > to implement, correct?
> > 
> > Yes, rollback is necessary.  But I do not think we need to include it
> > into your patch, though.
> 
> As the first step, we should just trim everything and then return an error
> code in my opinion.

But we cannot trim devices with kernel memory.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 29, 2012, 9:23 p.m. UTC | #34
On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > > > 
> > > > [...]
> > > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > > friends and I think there's a way to address all of these problems
> > > > > without big redesign (for now).
> > > > > 
> > > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > > > 
> > > > > (1) It will be clear by default.
> > > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > > >     it's safe to physically remove the device after the .remove().
> > > > > 
> > > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > > > should be unregistered anyway if that error code is to be returned.
> > > > > 
> > > > > [By the way, do you know where we free the memory allocated for struct
> > > > >  acpi_device objects?]
> > > > > 
> > > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > > store it, but continue the trimming normally and finally it should return that
> > > > > error code to acpi_bus_hot_remove_device().
> > > > 
> > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > fails). Trimming is not continued. 
> > > > 
> > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > more a general question, not specific to the eject_forbidden suggestion)
> > > 
> > > Your change makes sense to me.  At least until we have rollback code in
> > > place, we need to fail as soon as we hit an error.
> > 
> > Are you sure this makes sense?  What happens to the devices that we have
> > trimmed already and then there's an error?  Looks like they are just unusable
> > going forward, aren't they?
> 
> Yes, the devices trimmed already are released from the kernel, and their
> memory ranges become unusable.  This is bad.  But I do not think we
> should trim further to make more devices unusable after an error. 
> 
> 
> > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > we attempted to eject) and notify the firmware about the failure.
> > > > 
> > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > to implement, correct?
> > > 
> > > Yes, rollback is necessary.  But I do not think we need to include it
> > > into your patch, though.
> > 
> > As the first step, we should just trim everything and then return an error
> > code in my opinion.
> 
> But we cannot trim devices with kernel memory.

Well, let's put it this way: If we started a trim, we should just do it
completely, in which case we know we can go for the eject, or we should
roll it back completely.  Now, if you just break the trim on first error,
the complete rollback is kind of problematic.  It should be doable, but
it won't be easy.  On the other hand, if you go for the full trim,
doing a rollback is trivial, it's as though you have reinserted the whole
stuff.

Now, that need not harm functionality, and that's why I proposed the
eject_forbidden flag, so that .remove() can say "I'm not done, please
rollback", in which case the device can happily function going forward,
even if we don't rebind the driver to it.

Thanks,
Rafael
Toshi Kani Nov. 29, 2012, 9:46 p.m. UTC | #35
On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > > fails). Trimming is not continued. 
> > > > > 
> > > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > > more a general question, not specific to the eject_forbidden suggestion)
> > > > 
> > > > Your change makes sense to me.  At least until we have rollback code in
> > > > place, we need to fail as soon as we hit an error.
> > > 
> > > Are you sure this makes sense?  What happens to the devices that we have
> > > trimmed already and then there's an error?  Looks like they are just unusable
> > > going forward, aren't they?
> > 
> > Yes, the devices trimmed already are released from the kernel, and their
> > memory ranges become unusable.  This is bad.  But I do not think we
> > should trim further to make more devices unusable after an error. 
> > 
> > 
> > > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > > we attempted to eject) and notify the firmware about the failure.
> > > > > 
> > > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > > to implement, correct?
> > > > 
> > > > Yes, rollback is necessary.  But I do not think we need to include it
> > > > into your patch, though.
> > > 
> > > As the first step, we should just trim everything and then return an error
> > > code in my opinion.
> > 
> > But we cannot trim devices with kernel memory.
> 
> Well, let's put it this way: If we started a trim, we should just do it
> completely, in which case we know we can go for the eject, or we should
> roll it back completely.  Now, if you just break the trim on first error,
> the complete rollback is kind of problematic.  It should be doable, but
> it won't be easy.  On the other hand, if you go for the full trim,
> doing a rollback is trivial, it's as though you have reinserted the whole
> stuff.

acpi_bus_check_add() skips initialization when an ACPI device already
has its associated acpi_device.  So, I think it works either way.


> Now, that need not harm functionality, and that's why I proposed the
> eject_forbidden flag, so that .remove() can say "I'm not done, please
> rollback", in which case the device can happily function going forward,
> even if we don't rebind the driver to it.

A partially trimmed acpi_device is hard to rollback.  acpi_device should
be either trimmed completely or intact.  When a function failed to trim
an acpi_device, it needs to rollback its operation for the device before
returning an error.  This is because only the failed function has enough
context to rollback when an error occurred in the middle of its
procedure.

Thanks,
-Toshi  




--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 29, 2012, 10:11 p.m. UTC | #36
On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > > > fails). Trimming is not continued. 
> > > > > > 
> > > > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > > > more a general question, not specific to the eject_forbidden suggestion)
> > > > > 
> > > > > Your change makes sense to me.  At least until we have rollback code in
> > > > > place, we need to fail as soon as we hit an error.
> > > > 
> > > > Are you sure this makes sense?  What happens to the devices that we have
> > > > trimmed already and then there's an error?  Looks like they are just unusable
> > > > going forward, aren't they?
> > > 
> > > Yes, the devices trimmed already are released from the kernel, and their
> > > memory ranges become unusable.  This is bad.  But I do not think we
> > > should trim further to make more devices unusable after an error. 
> > > 
> > > 
> > > > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > > > we attempted to eject) and notify the firmware about the failure.
> > > > > > 
> > > > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > > > to implement, correct?
> > > > > 
> > > > > Yes, rollback is necessary.  But I do not think we need to include it
> > > > > into your patch, though.
> > > > 
> > > > As the first step, we should just trim everything and then return an error
> > > > code in my opinion.
> > > 
> > > But we cannot trim devices with kernel memory.
> > 
> > Well, let's put it this way: If we started a trim, we should just do it
> > completely, in which case we know we can go for the eject, or we should
> > roll it back completely.  Now, if you just break the trim on first error,
> > the complete rollback is kind of problematic.  It should be doable, but
> > it won't be easy.  On the other hand, if you go for the full trim,
> > doing a rollback is trivial, it's as though you have reinserted the whole
> > stuff.
> 
> acpi_bus_check_add() skips initialization when an ACPI device already
> has its associated acpi_device.  So, I think it works either way.

OK

> > Now, that need not harm functionality, and that's why I proposed the
> > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > rollback", in which case the device can happily function going forward,
> > even if we don't rebind the driver to it.
> 
> A partially trimmed acpi_device is hard to rollback.  acpi_device should
> be either trimmed completely or intact.

I may or may not agree, depending on what you mean by "trimmed". :-)

> When a function failed to trim
> an acpi_device, it needs to rollback its operation for the device before
> returning an error.

Unless it is .remove(), because .remove() is supposed to always succeed
(ie. unbind the driver from the device).  However, it may signal the caller
that something's fishy, by setting a flag in the device object, for example.

> This is because only the failed function has enough
> context to rollback when an error occurred in the middle of its
> procedure.

Not really.  If it actually removes the struct acpi_device then the caller
may run acpi_bus_scan() on that device if necessary.  There may be a problem
if the device has an associated physical node (or more of them), but that
requires special care anyway.

Thanks,
Rafael
Toshi Kani Nov. 29, 2012, 11:17 p.m. UTC | #37
On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > 
> > > Well, let's put it this way: If we started a trim, we should just do it
> > > completely, in which case we know we can go for the eject, or we should
> > > roll it back completely.  Now, if you just break the trim on first error,
> > > the complete rollback is kind of problematic.  It should be doable, but
> > > it won't be easy.  On the other hand, if you go for the full trim,
> > > doing a rollback is trivial, it's as though you have reinserted the whole
> > > stuff.
> > 
> > acpi_bus_check_add() skips initialization when an ACPI device already
> > has its associated acpi_device.  So, I think it works either way.
> 
> OK
> 
> > > Now, that need not harm functionality, and that's why I proposed the
> > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > rollback", in which case the device can happily function going forward,
> > > even if we don't rebind the driver to it.
> > 
> > A partially trimmed acpi_device is hard to rollback.  acpi_device should
> > be either trimmed completely or intact.
> 
> I may or may not agree, depending on what you mean by "trimmed". :-)
> 
> > When a function failed to trim
> > an acpi_device, it needs to rollback its operation for the device before
> > returning an error.
> 
> Unless it is .remove(), because .remove() is supposed to always succeed
> (ie. unbind the driver from the device).  However, it may signal the caller
> that something's fishy, by setting a flag in the device object, for example.

Right, .remove() cannot fail.  We still need to check if we should
continue to use .remove(), though.

As for the flag, are you thinking that we call acpi_bus_trim() with
rmdevice false first, so that it won't remove acpi_device?

> > This is because only the failed function has enough
> > context to rollback when an error occurred in the middle of its
> > procedure.
> 
> Not really.  If it actually removes the struct acpi_device then the caller
> may run acpi_bus_scan() on that device if necessary.  There may be a problem
> if the device has an associated physical node (or more of them), but that
> requires special care anyway.

Well, hot-remove to a device fails when there is a reason to fail.  IOW,
such reason prevented the device to be removed safely.  So, I think we
need to put it back to the original state in this case.  Removing it by
ignoring the cause of failure sounds unsafe to me.  Some status/data may
be left un-deleted as a result.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Nov. 30, 2012, 12:13 a.m. UTC | #38
On Thursday, November 29, 2012 04:17:19 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > > 
> > > > Well, let's put it this way: If we started a trim, we should just do it
> > > > completely, in which case we know we can go for the eject, or we should
> > > > roll it back completely.  Now, if you just break the trim on first error,
> > > > the complete rollback is kind of problematic.  It should be doable, but
> > > > it won't be easy.  On the other hand, if you go for the full trim,
> > > > doing a rollback is trivial, it's as though you have reinserted the whole
> > > > stuff.
> > > 
> > > acpi_bus_check_add() skips initialization when an ACPI device already
> > > has its associated acpi_device.  So, I think it works either way.
> > 
> > OK
> > 
> > > > Now, that need not harm functionality, and that's why I proposed the
> > > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > > rollback", in which case the device can happily function going forward,
> > > > even if we don't rebind the driver to it.
> > > 
> > > A partially trimmed acpi_device is hard to rollback.  acpi_device should
> > > be either trimmed completely or intact.
> > 
> > I may or may not agree, depending on what you mean by "trimmed". :-)
> > 
> > > When a function failed to trim
> > > an acpi_device, it needs to rollback its operation for the device before
> > > returning an error.
> > 
> > Unless it is .remove(), because .remove() is supposed to always succeed
> > (ie. unbind the driver from the device).  However, it may signal the caller
> > that something's fishy, by setting a flag in the device object, for example.
> 
> Right, .remove() cannot fail.  We still need to check if we should
> continue to use .remove(), though.
> 
> As for the flag, are you thinking that we call acpi_bus_trim() with
> rmdevice false first, so that it won't remove acpi_device?

I'm not sure if that's going to help.

Definitely, .remove() should just unbind the driver from the device.
That's what it's supposed to do.  Still, it may leave some information for
the caller in the device structure itself.  For example, "I have unbound
from the device, but it is not safe to remove it physically".

I'm now thinking that we may need to rework the trimming so that
.remove() is called for all drivers first and the struct acpi_device
objects are not removed at this stage.  Then, if .remove() from one
driver signals the situation like above, the routine will have to
rebind the drivers that have been unbound and we're done.

After that stage, when all drivers have been unbound, we should be
able to go for full eject.  First, we can drop all struct acpi_device
objects in the relevant subtree and then we can run _EJ0.

> > > This is because only the failed function has enough
> > > context to rollback when an error occurred in the middle of its
> > > procedure.
> > 
> > Not really.  If it actually removes the struct acpi_device then the caller
> > may run acpi_bus_scan() on that device if necessary.  There may be a problem
> > if the device has an associated physical node (or more of them), but that
> > requires special care anyway.
> 
> Well, hot-remove to a device fails when there is a reason to fail.  IOW,
> such reason prevented the device to be removed safely.  So, I think we
> need to put it back to the original state in this case.  Removing it by
> ignoring the cause of failure sounds unsafe to me.  Some status/data may
> be left un-deleted as a result.

Again, I may or may not agree with that, depending on whether you're talking
about physical devices or about struct acpi_device objects.

Anyway, I agree that removing struct acpi_device objects may not be worth the
effort if we're going to re-create them in a while, because that may be costly.

Thanks,
Rafael
Toshi Kani Nov. 30, 2012, 1:09 a.m. UTC | #39
On Fri, 2012-11-30 at 01:13 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 04:17:19 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > > > Now, that need not harm functionality, and that's why I proposed the
> > > > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > > > rollback", in which case the device can happily function going forward,
> > > > > even if we don't rebind the driver to it.
> > > > 
> > > > A partially trimmed acpi_device is hard to rollback.  acpi_device should
> > > > be either trimmed completely or intact.
> > > 
> > > I may or may not agree, depending on what you mean by "trimmed". :-)
> > > 
> > > > When a function failed to trim
> > > > an acpi_device, it needs to rollback its operation for the device before
> > > > returning an error.
> > > 
> > > Unless it is .remove(), because .remove() is supposed to always succeed
> > > (ie. unbind the driver from the device).  However, it may signal the caller
> > > that something's fishy, by setting a flag in the device object, for example.
> > 
> > Right, .remove() cannot fail.  We still need to check if we should
> > continue to use .remove(), though.
> > 
> > As for the flag, are you thinking that we call acpi_bus_trim() with
> > rmdevice false first, so that it won't remove acpi_device?
> 
> I'm not sure if that's going to help.
> 
> Definitely, .remove() should just unbind the driver from the device.
> That's what it's supposed to do.  Still, it may leave some information for
> the caller in the device structure itself.  For example, "I have unbound
> from the device, but it is not safe to remove it physically".

Right.

> I'm now thinking that we may need to rework the trimming so that
> .remove() is called for all drivers first and the struct acpi_device
> objects are not removed at this stage.  Then, if .remove() from one
> driver signals the situation like above, the routine will have to
> rebind the drivers that have been unbound and we're done.
> 
> After that stage, when all drivers have been unbound, we should be
> able to go for full eject.  First, we can drop all struct acpi_device
> objects in the relevant subtree and then we can run _EJ0.

I agree that such approach is worth pursuing.

> > > > This is because only the failed function has enough
> > > > context to rollback when an error occurred in the middle of its
> > > > procedure.
> > > 
> > > Not really.  If it actually removes the struct acpi_device then the caller
> > > may run acpi_bus_scan() on that device if necessary.  There may be a problem
> > > if the device has an associated physical node (or more of them), but that
> > > requires special care anyway.
> > 
> > Well, hot-remove to a device fails when there is a reason to fail.  IOW,
> > such reason prevented the device to be removed safely.  So, I think we
> > need to put it back to the original state in this case.  Removing it by
> > ignoring the cause of failure sounds unsafe to me.  Some status/data may
> > be left un-deleted as a result.
> 
> Again, I may or may not agree with that, depending on whether you're talking
> about physical devices or about struct acpi_device objects.

Sorry, by "hot-remove a device", I was referring removing struct
acpi_device and off-lining its resource.  By "left un-delted", I was
referring its resource left un-deleted, such as memory ranges.

> Anyway, I agree that removing struct acpi_device objects may not be worth the
> effort if we're going to re-create them in a while, because that may be costly.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index d0cfbd9..0562cb4 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -271,12 +271,11 @@  static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 			continue;
 		}
 
-		if (!result)
-			info->enabled = 1;
 		/*
 		 * Add num_enable even if add_memory() returns -EEXIST, so the
 		 * device is bound to this driver.
 		 */
+		info->enabled = 1;
 		num_enabled++;
 	}
 	if (!num_enabled) {