[v2,0/3] ACPI: New eject flow to remove devices cautiously

Message ID 20190703101348.3506-1-clin@suse.com (mailing list archive)

Message

Chester Lin July 3, 2019, 10:14 a.m. UTC
Currently there are two ways to handle ACPI device ejection. When an eject
event happens on a container, the kernel just sends KOBJ_CHANGE to
userland and userland should handle offline operation. For other device
types, acpi_scan_try_to_offline() is called and it tries to put target
device(s) offline and then removes all nodes once they are all offline.

However, we found that applications sometimes access resources on ejectable
devices intensively, so they are at risk if an ejection suddenly removes
those devices without any notification. Instead of executing the offline
callbacks directly, we want to introduce a new approach that sends change
events to notify all target nodes beforehand and hands offline handling
over to userland, so that userland has a chance to schedule the offline
task based on the current workload. The online function used to recover
from failure is changed as well: it follows the same approach and sends
change events rather than putting devices online directly, which means
userland also needs to take care of online handling.

Since ordinary users might not have their own offline/online handling, we
will submit a generic udev rule to systemd upstream as a default, so that
the eject function keeps working: the rule reacts to change events and
takes the corresponding offline or online action. The hot-removing part
itself remains, so the hotplug function can proceed to it once all target
nodes are offline.

To make it easy to monitor eject status and restart an eject process, this
eject flow includes a status trace mechanism. It counts the devices that
are currently online under the ejectable target, and it can reschedule an
eject event once all nodes within the device tree have been put offline.
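
As a rough userspace analogy of the status trace described above (not the
kernel code in this series), the sketch below counts the nodes under a
directory tree whose `online` attribute reads "1" and reports when none are
left. The function names `count_online` and `ready_to_remove` are
hypothetical, and the directory layout is only assumed to resemble sysfs.

```python
import os


def count_online(root):
    """Count nodes under `root` whose `online` attribute file reads "1".

    Hypothetical helper: `root` is any directory tree laid out like a
    sysfs subtree, with one `online` file per ejectable node.
    """
    online = 0
    # os.walk does not follow symlinks by default, so linked nodes are skipped.
    for dirpath, _dirnames, filenames in os.walk(root):
        if "online" in filenames:
            with open(os.path.join(dirpath, "online")) as f:
                if f.read().strip() == "1":
                    online += 1
    return online


def ready_to_remove(root):
    """Return True once no node under `root` is online any more."""
    return count_online(root) == 0
```

A monitoring tool could poll `ready_to_remove()` on the target's subtree
after an eject request; in the series itself this bookkeeping is done inside
the kernel, which then reschedules the eject event.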

v2:
- device_sysfs: Add descriptions in Documentation/ABI/testing/sysfs-bus-acpi
- device_sysfs: Replace the declaration with DEVICE_ATTR_RW and add a cancel
  option in eject_store.
- scan: Add a retry mechanism when userspace fails to put a device offline.
- scan: Add ready-to-remove state.

Chester Lin (3):
  ACPI / hotplug: Send change events for offline/online requests when
    eject is triggered
  ACPI / hotplug: Eject status trace and auto-remove approach
  ACPI / device_sysfs: Add eject_show and add a cancel option in
    eject_store

 Documentation/ABI/testing/sysfs-bus-acpi |   9 +-
 drivers/acpi/container.c                 |   2 +-
 drivers/acpi/device_sysfs.c              |  94 ++++++-
 drivers/acpi/glue.c                      | 146 +++++++++++
 drivers/acpi/internal.h                  |  34 ++-
 drivers/acpi/scan.c                      | 318 +++++++++++++++++------
 drivers/base/core.c                      |   4 +
 include/acpi/acpi_bus.h                  |   3 +-
 include/linux/acpi.h                     |   6 +
 9 files changed, 523 insertions(+), 93 deletions(-)

Comments

Chester Lin July 3, 2019, 10:42 a.m. UTC | #1
On Wed, Jul 03, 2019 at 10:14:39AM +0000, Chester Lin wrote:
> Since ordinary users might not have their own offline/online handling, we
> will submit a generic udev rule to systemd upstream as a default, so that
> the eject function keeps working: the rule reacts to change events and
> takes the corresponding offline or online action. The hot-removing part
> itself remains, so the hotplug function can proceed to it once all target
> nodes are offline.
> 


Here are default rules we are going to propose:

# 80-acpi-hotplug-eject.rules
# Generic rules for handling ACPI hotplug eject.

SUBSYSTEM=="*", ACTION=="change", ENV{EVENT}=="offline", ATTR{online}=="1", \
DEVPATH=="*", ATTR{online}="0"

SUBSYSTEM=="*", ACTION=="change", ENV{EVENT}=="online", ATTR{online}=="0", \
DEVPATH=="*", ATTR{online}="1"
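
For readers who handle these events outside udev, the logic of the two rules
above can be sketched in Python. This is only an illustration under
assumptions: the function name `handle_change_event` and the `sysfs_root`
parameter are hypothetical, and the event is assumed to arrive as a dict
carrying the ACTION, EVENT and DEVPATH keys that the rules match on.

```python
import os

# Requested EVENT -> (value `online` must currently hold, value to write).
_TRANSITIONS = {
    "offline": ("1", "0"),
    "online": ("0", "1"),
}


def handle_change_event(sysfs_root, env):
    """Flip a device's `online` attribute when a change event requests it.

    Returns True if the attribute was written, False if the event did not
    match either rule. `sysfs_root` would normally be "/sys"; it is a
    parameter so the sketch can be tried on a scratch directory.
    """
    if env.get("ACTION") != "change":
        return False
    transition = _TRANSITIONS.get(env.get("EVENT"))
    devpath = env.get("DEVPATH")
    if transition is None or not devpath:
        return False
    expected, wanted = transition
    attr = os.path.join(sysfs_root, devpath.lstrip("/"), "online")
    try:
        with open(attr) as f:
            current = f.read().strip()
    except OSError:
        return False  # device has no readable `online` attribute
    if current != expected:
        return False  # already in the requested state, or unknown value
    with open(attr, "w") as f:
        f.write(wanted)
    return True
```

Like the rules, the sketch only writes when the attribute currently holds
the opposite value, so a repeated event for a device that is already in the
requested state is ignored.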


Chester Lin Aug. 5, 2019, 7:45 a.m. UTC | #2

Gentle ping. I would appreciate any comments on this series.

Thanks,
Chester