Message ID | 20130219181157.GA4366@dhcp-192-168-178-175.profitbricks.localdomain (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Hi Vasilis, 2013/02/20 3:11, Vasilis Liaskovitis wrote: > Hi, > > On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> Make the ACPI memory hotplug driver use struct acpi_scan_handler >> for representing the object used to set up ACPI memory hotplug >> functionality and to remove hotplug memory ranges and data >> structures used by the driver before unregistering ACPI device >> nodes representing memory. Register the new struct acpi_scan_handler >> object with the help of acpi_scan_add_handler_with_hotplug() to allow >> user space to manipulate the attributes of the memory hotplug >> profile. > > Let's consider an example where we want acpi memory device ejection to be safely > handled by userspace. We do the following: > > echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject > echo 1 > /sys/firmware/acpi/hotplug/memory/uevents > > We succesfully hotplug acpi device: > /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > and its corresponding memblocks /sys/devices/system/memory/memoryXX are > also successfully onlined. > > On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for: > /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/ > correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ? > This will be needed so that userspace tries to offline the memblocks (and only > if successful, issue the eject operation on the acpi device). As far as I see, > we don't create any sysfs links or files for this scenario - can userspace get > this info somehow? > > /sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented > for this to work I think, see Documentation/ABI/testing/sysfs-memory > > The following test patch works toward that direction. Let me know if it's of > interest or if there are better ideas /comments. How about use ../PNP0C80:00/physical_node/resources file? In my system, the file shows following information. $ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources state = active mem 0x0-0x80000000 mem 0x100000000-0x800000000 It means PNP0C80:00's memory ranges are "0x0-0x7fffffff" and "0x100000000-0x7ffffffff". In x86 architecture, memory section size is 128MiB. So, if these memory range is divided by 128MiB, you can calculate memory section number as follow: 0x0-0x7fffffff => 0x0-0x10 0x100000000-0x7ffffffff => 0x20-0xff But there is one problem. The problem is that resources file of added memory is not created. If the problem is fixed, I think you can use the way. > > From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> > Date: Tue, 19 Feb 2013 18:36:25 +0100 > Subject: [RFC PATCH] acpi / memory-hotplug: implement phys_device > > In order for userspace to know which memblocks in: > /sys/devices/system/memory/memoryXX correspond to which acpi memory devices in: > /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:YY, > /sys/devices/system/memory/memoryXX/phys_device should return a name (or index > YY) of the memory device each memblock XX belongs to. > > WIth this patch, the acpi mem_hotplug driver keeps a global list of acpi memory > devices (inserted in hotplug_order). The base memory driver checks against this > list in arch_get_memory_phys_device to determine the zero-based index of the > physical memory device each new memblock belongs to. > > For initial memory or for non-acpi/hotplug enabled systems, phys_device is > always -1. > > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> > --- > Documentation/ABI/testing/sysfs-devices-memory | 8 ++++++- > drivers/acpi/acpi_memhotplug.c | 27 ++++++++++++++++++++++++ > drivers/base/memory.c | 7 +++++- > include/linux/acpi.h | 2 + > 4 files changed, 42 insertions(+), 2 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory > index 7405de2..290c62a 100644 > --- a/Documentation/ABI/testing/sysfs-devices-memory > +++ b/Documentation/ABI/testing/sysfs-devices-memory > @@ -27,7 +27,13 @@ Contact: Badari Pulavarty <pbadari@us.ibm.com> > Description: > The file /sys/devices/system/memory/memoryX/phys_device > is read-only and is designed to show the name of physical > - memory device. Implementation is currently incomplete. > + memory device. Implementation is currently incomplete. In a > + system with CONFIG_ACPI_HOTPLUG_MEMORY=n, phys_device is always > + -1. In a system with CONFIG_ACPI_HOTPLUG_MEMORY=y, phys_device > + is -1 for all initial / non-hot-removable memory. For > + memory that has been hot-plugged, phys_device will return the > + zero-based index of the physical device that this memory block > + belongs to. Indices are determined by hotplug order. > > What: /sys/devices/system/memory/memoryX/phys_index > Date: September 2008 > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > index 3be9501..4154dc5 100644 > --- a/drivers/acpi/acpi_memhotplug.c > +++ b/drivers/acpi/acpi_memhotplug.c > @@ -48,6 +48,7 @@ ACPI_MODULE_NAME("acpi_memhotplug"); > #define MEMORY_POWER_ON_STATE 1 > #define MEMORY_POWER_OFF_STATE 2 > > +static LIST_HEAD(acpi_mem_device_list); > static int acpi_memory_device_add(struct acpi_device *device, > const struct acpi_device_id *not_used); > static void acpi_memory_device_remove(struct acpi_device *device); > @@ -81,6 +82,7 @@ struct acpi_memory_device { > struct acpi_device * device; > unsigned int state; /* State of the memory device */ > struct list_head res_list; > + struct list_head mem_device_list; > }; > > static acpi_status > @@ -287,6 +289,7 @@ static int acpi_memory_device_add(struct acpi_device *device, > return -ENOMEM; > > INIT_LIST_HEAD(&mem_device->res_list); > + INIT_LIST_HEAD(&mem_device->mem_device_list); > mem_device->device = device; > sprintf(acpi_device_name(device), "%s", ACPI_MEMORY_DEVICE_NAME); > sprintf(acpi_device_class(device), "%s", ACPI_MEMORY_DEVICE_CLASS); > @@ -308,9 +311,11 @@ static int acpi_memory_device_add(struct acpi_device *device, > return 0; > } > > + list_add_tail(&mem_device->mem_device_list, &acpi_mem_device_list); > result = acpi_memory_enable_device(mem_device); > if (result) { > dev_err(&device->dev, "acpi_memory_enable_device() error\n"); > + list_del(&mem_device->mem_device_list); > acpi_memory_device_free(mem_device); > return -ENODEV; > } > @@ -328,9 +333,31 @@ static void acpi_memory_device_remove(struct acpi_device *device) > > mem_device = acpi_driver_data(device); > acpi_memory_remove_memory(mem_device); > + list_del(&mem_device->mem_device_list); > acpi_memory_device_free(mem_device); > } > > +int acpi_memory_phys_device(unsigned long start_pfn) > +{ > + struct acpi_memory_device *mem_dev; > + struct acpi_memory_info *info; > + unsigned long start_addr = start_pfn << PAGE_SHIFT; > + int id = 0; > + > + list_for_each_entry(mem_dev, &acpi_mem_device_list, mem_device_list) { > + list_for_each_entry(info, &mem_dev->res_list, list) { > + if ((info->start_addr <= start_addr) && > + (info->start_addr + info->length > start_addr)) > + return id; > + } > + id++; > + } I don't think this solve your problem. When hot adding memory device in my system, consecutive index number is applied to PNP0C80 as follows: $ ls /sys/bus/acpi/devices/ |grep PNP0C80 PNP0C80:00 PNP0C80:01 => hot added memory device PNP0C80:02 => hot added memory device In this case, we can know PNP0C80:YY by memoryXX/phys_device file. But if hot removing and adding the same device, index number is changed as follows: $ ls /sys/bus/acpi/devices/ PNP0C80:00 PNP0C80:03 => hot added memory device PNP0C80:04 => hot added memory device In this case, we cannot know PNP0C80:YY by memoryXX/phys_device file. Thanks, Yasuaki Ishimatsu > + > + /* Memory not associated with a hot-pluggable device gets -1. For > + * example, initial memory. */ > + return -1; > +} > + > void __init acpi_memory_hotplug_init(void) > { > acpi_scan_add_handler_with_hotplug(&memory_device_handler, "memory"); > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 8300a18..2cc98df 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -22,6 +22,7 @@ > #include <linux/mutex.h> > #include <linux/stat.h> > #include <linux/slab.h> > +#include <linux/acpi.h> > > #include <linux/atomic.h> > #include <asm/uaccess.h> > @@ -522,7 +523,11 @@ static inline int memory_fail_init(void) > */ > int __weak arch_get_memory_phys_device(unsigned long start_pfn) > { > - return 0; > +#ifdef CONFIG_ACPI_HOTPLUG_MEMORY > + return acpi_memory_phys_device(start_pfn); > +#else > + return -1; > +#endif > } > > /* > diff --git a/include/linux/acpi.h b/include/linux/acpi.h > index f46cfd7..00302fc 100644 > --- a/include/linux/acpi.h > +++ b/include/linux/acpi.h > @@ -562,6 +562,8 @@ static inline __printf(3, 4) void > acpi_handle_printk(const char *level, void *handle, const char *fmt, ...) {} > #endif /* !CONFIG_ACPI */ > > +int acpi_memory_phys_device(unsigned long start_pfn); > + > /* > * acpi_handle_<level>: Print message with ACPI prefix and object path > * > -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Yasuaki, On Wed, Feb 20, 2013 at 12:35:48PM +0900, Yasuaki Ishimatsu wrote: > Hi Vasilis, > > 2013/02/20 3:11, Vasilis Liaskovitis wrote: > >Hi, > > > >On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote: > >>From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> > >>Make the ACPI memory hotplug driver use struct acpi_scan_handler > >>for representing the object used to set up ACPI memory hotplug > >>functionality and to remove hotplug memory ranges and data > >>structures used by the driver before unregistering ACPI device > >>nodes representing memory. Register the new struct acpi_scan_handler > >>object with the help of acpi_scan_add_handler_with_hotplug() to allow > >>user space to manipulate the attributes of the memory hotplug > >>profile. > > > >Let's consider an example where we want acpi memory device ejection to be safely > >handled by userspace. We do the following: > > > >echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject > >echo 1 > /sys/firmware/acpi/hotplug/memory/uevents > > > >We succesfully hotplug acpi device: > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > >and its corresponding memblocks /sys/devices/system/memory/memoryXX are > >also successfully onlined. > > > >On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for: > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > > >Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/ > >correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ? > >This will be needed so that userspace tries to offline the memblocks (and only > >if successful, issue the eject operation on the acpi device). As far as I see, > >we don't create any sysfs links or files for this scenario - can userspace get > >this info somehow? > > > > >/sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented > >for this to work I think, see Documentation/ABI/testing/sysfs-memory > > > >The following test patch works toward that direction. Let me know if it's of > >interest or if there are better ideas /comments. > > How about use ../PNP0C80:00/physical_node/resources file? > In my system, the file shows following information. > > $ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources > state = active > mem 0x0-0x80000000 > mem 0x100000000-0x800000000 > > It means PNP0C80:00's memory ranges are "0x0-0x7fffffff" and > "0x100000000-0x7ffffffff". In x86 architecture, memory section size is > 128MiB. So, if these memory range is divided by 128MiB, you can > calculate memory section number as follow: > > 0x0-0x7fffffff => 0x0-0x10 > 0x100000000-0x7ffffffff => 0x20-0xff > > But there is one problem. The problem is that resources file of added memory > is not created. If the problem is fixed, I think you can use the way. thanks for your suggestion. Is this resources file a property of the physical_node or of each acpi devices? If it's a node specific file could there be a chance that adjacent memory ranges get merged? We 'd like these to not get merged. I will look more into this property. I don't see it currently in my system (probably because initial memory is not backed by acpi devices in my seabios/virtual machine). > [...] > >+int acpi_memory_phys_device(unsigned long start_pfn) > >+{ > >+ struct acpi_memory_device *mem_dev; > >+ struct acpi_memory_info *info; > >+ unsigned long start_addr = start_pfn << PAGE_SHIFT; > >+ int id = 0; > >+ > >+ list_for_each_entry(mem_dev, &acpi_mem_device_list, mem_device_list) { > >+ list_for_each_entry(info, &mem_dev->res_list, list) { > >+ if ((info->start_addr <= start_addr) && > >+ (info->start_addr + info->length > start_addr)) > >+ return id; > >+ } > >+ id++; > >+ } > > I don't think this solve your problem. > > When hot adding memory device in my system, consecutive index number is > applied to PNP0C80 as follows: > > $ ls /sys/bus/acpi/devices/ |grep PNP0C80 > PNP0C80:00 > PNP0C80:01 => hot added memory device > PNP0C80:02 => hot added memory device > > In this case, we can know PNP0C80:YY by memoryXX/phys_device file. > But if hot removing and adding the same device, index number is changed > as follows: > > $ ls /sys/bus/acpi/devices/ > PNP0C80:00 > PNP0C80:03 => hot added memory device > PNP0C80:04 => hot added memory device > > In this case, we cannot know PNP0C80:YY by memoryXX/phys_device file. > thanks, yes you are right. I forgot each new hotplug event will create a new PNP0C80:XX device where XX always increases. So the hot-add/hot-remove/hot-add scenario would have a problem. Then it would be enough to be able to return this monotonically increasing counter from phys_device instead of the current list iterator. Is this counter available somehwere in drivers/acpi/scan.c or bus.c? I 'll take a look. thanks, - Vasilis -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-02-20 at 11:42 +0100, Vasilis Liaskovitis wrote: > Hi Yasuaki, > > On Wed, Feb 20, 2013 at 12:35:48PM +0900, Yasuaki Ishimatsu wrote: > > Hi Vasilis, > > > > 2013/02/20 3:11, Vasilis Liaskovitis wrote: > > >Hi, > > > > > >On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote: > > >>From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > >> > > >>Make the ACPI memory hotplug driver use struct acpi_scan_handler > > >>for representing the object used to set up ACPI memory hotplug > > >>functionality and to remove hotplug memory ranges and data > > >>structures used by the driver before unregistering ACPI device > > >>nodes representing memory. Register the new struct acpi_scan_handler > > >>object with the help of acpi_scan_add_handler_with_hotplug() to allow > > >>user space to manipulate the attributes of the memory hotplug > > >>profile. > > > > > >Let's consider an example where we want acpi memory device ejection to be safely > > >handled by userspace. We do the following: > > > > > >echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject > > >echo 1 > /sys/firmware/acpi/hotplug/memory/uevents > > > > > >We succesfully hotplug acpi device: > > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > >and its corresponding memblocks /sys/devices/system/memory/memoryXX are > > >also successfully onlined. > > > > > >On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for: > > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > > > > >Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/ > > >correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ? > > >This will be needed so that userspace tries to offline the memblocks (and only > > >if successful, issue the eject operation on the acpi device). As far as I see, > > >we don't create any sysfs links or files for this scenario - can userspace get > > >this info somehow? > > > > > > > >/sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented > > >for this to work I think, see Documentation/ABI/testing/sysfs-memory > > > > > >The following test patch works toward that direction. Let me know if it's of > > >interest or if there are better ideas /comments. > > > > How about use ../PNP0C80:00/physical_node/resources file? > > In my system, the file shows following information. > > > > $ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources > > state = active > > mem 0x0-0x80000000 > > mem 0x100000000-0x800000000 > > > > It means PNP0C80:00's memory ranges are "0x0-0x7fffffff" and > > "0x100000000-0x7ffffffff". In x86 architecture, memory section size is > > 128MiB. So, if these memory range is divided by 128MiB, you can > > calculate memory section number as follow: > > > > 0x0-0x7fffffff => 0x0-0x10 > > 0x100000000-0x7ffffffff => 0x20-0xff > > > > But there is one problem. The problem is that resources file of added memory > > is not created. If the problem is fixed, I think you can use the way. > > thanks for your suggestion. Is this resources file a property of the > physical_node or of each acpi devices? > > If it's a node specific file could there be a chance that adjacent memory > ranges get merged? We 'd like these to not get merged. > > I will look more into this property. I don't see it currently in my system > (probably because initial memory is not backed by acpi devices in my > seabios/virtual machine). I have been thinking about this issue as well. In case of CPUs, we have a sysdev link that links LNXCPU:%d to its system/cpu/cpu%d file. /sys/bus/acpi/devices/LNXCPU:02 > ll sysdev lrwxrwxrwx. 1 root root 0 Feb 8 13:52 sysdev -> ../../system/cpu/cpu2 So, one possible approach is to create a sysdev directory under PNP0C080:%d, and then create links to associated system/cpu/memory%d files underneath. What do you think? Thanks, -Toshi -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday, February 20, 2013 02:50:12 PM Toshi Kani wrote: > On Wed, 2013-02-20 at 11:42 +0100, Vasilis Liaskovitis wrote: > > Hi Yasuaki, > > > > On Wed, Feb 20, 2013 at 12:35:48PM +0900, Yasuaki Ishimatsu wrote: > > > Hi Vasilis, > > > > > > 2013/02/20 3:11, Vasilis Liaskovitis wrote: > > > >Hi, > > > > > > > >On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote: > > > >>From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > >> > > > >>Make the ACPI memory hotplug driver use struct acpi_scan_handler > > > >>for representing the object used to set up ACPI memory hotplug > > > >>functionality and to remove hotplug memory ranges and data > > > >>structures used by the driver before unregistering ACPI device > > > >>nodes representing memory. Register the new struct acpi_scan_handler > > > >>object with the help of acpi_scan_add_handler_with_hotplug() to allow > > > >>user space to manipulate the attributes of the memory hotplug > > > >>profile. > > > > > > > >Let's consider an example where we want acpi memory device ejection to be safely > > > >handled by userspace. We do the following: > > > > > > > >echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject > > > >echo 1 > /sys/firmware/acpi/hotplug/memory/uevents > > > > > > > >We succesfully hotplug acpi device: > > > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > > >and its corresponding memblocks /sys/devices/system/memory/memoryXX are > > > >also successfully onlined. > > > > > > > >On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for: > > > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > > > > > > >Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/ > > > >correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ? > > > >This will be needed so that userspace tries to offline the memblocks (and only > > > >if successful, issue the eject operation on the acpi device). As far as I see, > > > >we don't create any sysfs links or files for this scenario - can userspace get > > > >this info somehow? > > > > > > > > > > >/sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented > > > >for this to work I think, see Documentation/ABI/testing/sysfs-memory > > > > > > > >The following test patch works toward that direction. Let me know if it's of > > > >interest or if there are better ideas /comments. > > > > > > How about use ../PNP0C80:00/physical_node/resources file? > > > In my system, the file shows following information. > > > > > > $ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources > > > state = active > > > mem 0x0-0x80000000 > > > mem 0x100000000-0x800000000 > > > > > > It means PNP0C80:00's memory ranges are "0x0-0x7fffffff" and > > > "0x100000000-0x7ffffffff". In x86 architecture, memory section size is > > > 128MiB. So, if these memory range is divided by 128MiB, you can > > > calculate memory section number as follow: > > > > > > 0x0-0x7fffffff => 0x0-0x10 > > > 0x100000000-0x7ffffffff => 0x20-0xff > > > > > > But there is one problem. The problem is that resources file of added memory > > > is not created. If the problem is fixed, I think you can use the way. > > > > thanks for your suggestion. Is this resources file a property of the > > physical_node or of each acpi devices? > > > > If it's a node specific file could there be a chance that adjacent memory > > ranges get merged? We 'd like these to not get merged. > > > > I will look more into this property. I don't see it currently in my system > > (probably because initial memory is not backed by acpi devices in my > > seabios/virtual machine). > > I have been thinking about this issue as well. In case of CPUs, we have > a sysdev link that links LNXCPU:%d to its system/cpu/cpu%d file. > > /sys/bus/acpi/devices/LNXCPU:02 > ll sysdev > lrwxrwxrwx. 1 root root 0 Feb 8 13:52 sysdev -> ../../system/cpu/cpu2 > > So, one possible approach is to create a sysdev directory under > PNP0C080:%d, and then create links to associated system/cpu/memory%d > files underneath. What do you think? I need to think about that a bit more, quite frankly. I probably would prefer links to be created directly from under PNP0C080:%d to system/cpu/memory%d (in analogy with PCI and other devices having ACPI companions). One of the problems here is that the whole /sys/bus/acpi/ stuff is kind of confusing, because it makes people think that there's an "ACPI bus", while there's no such thing. For this reason, it should go away and the /sys/devices/LNXSYSTM:00/ directory should be moved somewhere under /sys/firmware/acpi/. Of course, that's long-term, but this means I'd prefer not to add more stuff under /sys/devices/LNXSYSTM:00/ that may potentially cause that to be more difficult to do in the future. Thanks, Rafael
On Wed, 2013-02-20 at 23:29 +0100, Rafael J. Wysocki wrote: > On Wednesday, February 20, 2013 02:50:12 PM Toshi Kani wrote: > > On Wed, 2013-02-20 at 11:42 +0100, Vasilis Liaskovitis wrote: > > > Hi Yasuaki, > > > > > > On Wed, Feb 20, 2013 at 12:35:48PM +0900, Yasuaki Ishimatsu wrote: > > > > Hi Vasilis, > > > > > > > > 2013/02/20 3:11, Vasilis Liaskovitis wrote: > > > > >Hi, > > > > > > > > > >On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote: > > > > >>From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > >> > > > > >>Make the ACPI memory hotplug driver use struct acpi_scan_handler > > > > >>for representing the object used to set up ACPI memory hotplug > > > > >>functionality and to remove hotplug memory ranges and data > > > > >>structures used by the driver before unregistering ACPI device > > > > >>nodes representing memory. Register the new struct acpi_scan_handler > > > > >>object with the help of acpi_scan_add_handler_with_hotplug() to allow > > > > >>user space to manipulate the attributes of the memory hotplug > > > > >>profile. > > > > > > > > > >Let's consider an example where we want acpi memory device ejection to be safely > > > > >handled by userspace. We do the following: > > > > > > > > > >echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject > > > > >echo 1 > /sys/firmware/acpi/hotplug/memory/uevents > > > > > > > > > >We succesfully hotplug acpi device: > > > > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > > > >and its corresponding memblocks /sys/devices/system/memory/memoryXX are > > > > >also successfully onlined. > > > > > > > > > >On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for: > > > > >/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 > > > > > > > > > >Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/ > > > > >correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ? > > > > >This will be needed so that userspace tries to offline the memblocks (and only > > > > >if successful, issue the eject operation on the acpi device). As far as I see, > > > > >we don't create any sysfs links or files for this scenario - can userspace get > > > > >this info somehow? > > > > > > > > > > > > > >/sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented > > > > >for this to work I think, see Documentation/ABI/testing/sysfs-memory > > > > > > > > > >The following test patch works toward that direction. Let me know if it's of > > > > >interest or if there are better ideas /comments. > > > > > > > > How about use ../PNP0C80:00/physical_node/resources file? > > > > In my system, the file shows following information. > > > > > > > > $ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources > > > > state = active > > > > mem 0x0-0x80000000 > > > > mem 0x100000000-0x800000000 > > > > > > > > It means PNP0C80:00's memory ranges are "0x0-0x7fffffff" and > > > > "0x100000000-0x7ffffffff". In x86 architecture, memory section size is > > > > 128MiB. So, if these memory range is divided by 128MiB, you can > > > > calculate memory section number as follow: > > > > > > > > 0x0-0x7fffffff => 0x0-0x10 > > > > 0x100000000-0x7ffffffff => 0x20-0xff > > > > > > > > But there is one problem. The problem is that resources file of added memory > > > > is not created. If the problem is fixed, I think you can use the way. > > > > > > thanks for your suggestion. Is this resources file a property of the > > > physical_node or of each acpi devices? > > > > > > If it's a node specific file could there be a chance that adjacent memory > > > ranges get merged? We 'd like these to not get merged. > > > > > > I will look more into this property. I don't see it currently in my system > > > (probably because initial memory is not backed by acpi devices in my > > > seabios/virtual machine). > > > > I have been thinking about this issue as well. In case of CPUs, we have > > a sysdev link that links LNXCPU:%d to its system/cpu/cpu%d file. > > > > /sys/bus/acpi/devices/LNXCPU:02 > ll sysdev > > lrwxrwxrwx. 1 root root 0 Feb 8 13:52 sysdev -> ../../system/cpu/cpu2 > > > > So, one possible approach is to create a sysdev directory under > > PNP0C080:%d, and then create links to associated system/cpu/memory%d > > files underneath. What do you think? > > I need to think about that a bit more, quite frankly. > > I probably would prefer links to be created directly from under PNP0C080:%d > to system/cpu/memory%d (in analogy with PCI and other devices having ACPI > companions). > > One of the problems here is that the whole /sys/bus/acpi/ stuff is kind of > confusing, because it makes people think that there's an "ACPI bus", while > there's no such thing. For this reason, it should go away and the > /sys/devices/LNXSYSTM:00/ directory should be moved somewhere under > /sys/firmware/acpi/. Of course, that's long-term, but this means I'd prefer > not to add more stuff under /sys/devices/LNXSYSTM:00/ that may potentially > cause that to be more difficult to do in the future. I see. I think PCI creates files under PNP0A08:%d/physical_node. Either way, we need to do something in this space. I will think about a bit more as well. Thanks, -Toshi -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Vasilis, 2013/02/20 19:42, Vasilis Liaskovitis wrote: > Hi Yasuaki, > > On Wed, Feb 20, 2013 at 12:35:48PM +0900, Yasuaki Ishimatsu wrote: >> Hi Vasilis, >> >> 2013/02/20 3:11, Vasilis Liaskovitis wrote: >>> Hi, >>> >>> On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote: >>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>> >>>> Make the ACPI memory hotplug driver use struct acpi_scan_handler >>>> for representing the object used to set up ACPI memory hotplug >>>> functionality and to remove hotplug memory ranges and data >>>> structures used by the driver before unregistering ACPI device >>>> nodes representing memory. Register the new struct acpi_scan_handler >>>> object with the help of acpi_scan_add_handler_with_hotplug() to allow >>>> user space to manipulate the attributes of the memory hotplug >>>> profile. >>> >>> Let's consider an example where we want acpi memory device ejection to be safely >>> handled by userspace. We do the following: >>> >>> echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject >>> echo 1 > /sys/firmware/acpi/hotplug/memory/uevents >>> >>> We succesfully hotplug acpi device: >>> /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 >>> and its corresponding memblocks /sys/devices/system/memory/memoryXX are >>> also successfully onlined. >>> >>> On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for: >>> /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 >>> >>> Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/ >>> correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ? >>> This will be needed so that userspace tries to offline the memblocks (and only >>> if successful, issue the eject operation on the acpi device). As far as I see, >>> we don't create any sysfs links or files for this scenario - can userspace get >>> this info somehow? >> >>> >>> /sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented >>> for this to work I think, see Documentation/ABI/testing/sysfs-memory >>> >>> The following test patch works toward that direction. Let me know if it's of >>> interest or if there are better ideas /comments. >> >> How about use ../PNP0C80:00/physical_node/resources file? >> In my system, the file shows following information. >> >> $ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources >> state = active >> mem 0x0-0x80000000 >> mem 0x100000000-0x800000000 >> >> It means PNP0C80:00's memory ranges are "0x0-0x7fffffff" and >> "0x100000000-0x7ffffffff". In x86 architecture, memory section size is >> 128MiB. So, if these memory range is divided by 128MiB, you can >> calculate memory section number as follow: >> >> 0x0-0x7fffffff => 0x0-0x10 >> 0x100000000-0x7ffffffff => 0x20-0xff >> >> But there is one problem. The problem is that resources file of added memory >> is not created. If the problem is fixed, I think you can use the way. > > thanks for your suggestion. Is this resources file a property of the > physical_node or of each acpi devices? > > If it's a node specific file could there be a chance that adjacent memory > ranges get merged? We 'd like these to not get merged. This information is created by pnppacpi_init(). It seems that: - resources file is created to each acpi_devices. - the memory range does not get merged. Thanks, Yasuaki Ishimatsu > > I will look more into this property. I don't see it currently in my system > (probably because initial memory is not backed by acpi devices in my > seabios/virtual machine). > >> > [...] >>> +int acpi_memory_phys_device(unsigned long start_pfn) >>> +{ >>> + struct acpi_memory_device *mem_dev; >>> + struct acpi_memory_info *info; >>> + unsigned long start_addr = start_pfn << PAGE_SHIFT; >>> + int id = 0; >>> + >>> + list_for_each_entry(mem_dev, &acpi_mem_device_list, mem_device_list) { >>> + list_for_each_entry(info, &mem_dev->res_list, list) { >>> + if ((info->start_addr <= start_addr) && >>> + (info->start_addr + info->length > start_addr)) >>> + return id; >>> + } >>> + id++; >>> + } >> >> I don't think this solve your problem. >> >> When hot adding memory device in my system, consecutive index number is >> applied to PNP0C80 as follows: >> >> $ ls /sys/bus/acpi/devices/ |grep PNP0C80 >> PNP0C80:00 >> PNP0C80:01 => hot added memory device >> PNP0C80:02 => hot added memory device >> >> In this case, we can know PNP0C80:YY by memoryXX/phys_device file. >> But if hot removing and adding the same device, index number is changed >> as follows: >> >> $ ls /sys/bus/acpi/devices/ >> PNP0C80:00 >> PNP0C80:03 => hot added memory device >> PNP0C80:04 => hot added memory device >> >> In this case, we cannot know PNP0C80:YY by memoryXX/phys_device file. >> > > thanks, yes you are right. I forgot each new hotplug event will create a new > PNP0C80:XX device where XX always increases. So the hot-add/hot-remove/hot-add > scenario would have a problem. > Then it would be enough to be able to return this monotonically increasing > counter from phys_device instead of the current list iterator. Is this counter > available somehwere in drivers/acpi/scan.c or bus.c? I 'll take a look. > > thanks, > > - Vasilis > -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory index 7405de2..290c62a 100644 --- a/Documentation/ABI/testing/sysfs-devices-memory +++ b/Documentation/ABI/testing/sysfs-devices-memory @@ -27,7 +27,13 @@ Contact: Badari Pulavarty <pbadari@us.ibm.com> Description: The file /sys/devices/system/memory/memoryX/phys_device is read-only and is designed to show the name of physical - memory device. Implementation is currently incomplete. + memory device. Implementation is currently incomplete. In a + system with CONFIG_ACPI_HOTPLUG_MEMORY=n, phys_device is always + -1. In a system with CONFIG_ACPI_HOTPLUG_MEMORY=y, phys_device + is -1 for all initial / non-hot-removable memory. For + memory that has been hot-plugged, phys_device will return the + zero-based index of the physical device that this memory block + belongs to. Indices are determined by hotplug order. What: /sys/devices/system/memory/memoryX/phys_index Date: September 2008 diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 3be9501..4154dc5 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -48,6 +48,7 @@ ACPI_MODULE_NAME("acpi_memhotplug"); #define MEMORY_POWER_ON_STATE 1 #define MEMORY_POWER_OFF_STATE 2 +static LIST_HEAD(acpi_mem_device_list); static int acpi_memory_device_add(struct acpi_device *device, const struct acpi_device_id *not_used); static void acpi_memory_device_remove(struct acpi_device *device); @@ -81,6 +82,7 @@ struct acpi_memory_device { struct acpi_device * device; unsigned int state; /* State of the memory device */ struct list_head res_list; + struct list_head mem_device_list; }; static acpi_status @@ -287,6 +289,7 @@ static int acpi_memory_device_add(struct acpi_device *device, return -ENOMEM; INIT_LIST_HEAD(&mem_device->res_list); + INIT_LIST_HEAD(&mem_device->mem_device_list); mem_device->device = device; sprintf(acpi_device_name(device), "%s", ACPI_MEMORY_DEVICE_NAME); sprintf(acpi_device_class(device), "%s", ACPI_MEMORY_DEVICE_CLASS); @@ -308,9 +311,11 @@ static int acpi_memory_device_add(struct acpi_device *device, return 0; } + list_add_tail(&mem_device->mem_device_list, &acpi_mem_device_list); result = acpi_memory_enable_device(mem_device); if (result) { dev_err(&device->dev, "acpi_memory_enable_device() error\n"); + list_del(&mem_device->mem_device_list); acpi_memory_device_free(mem_device); return -ENODEV; } @@ -328,9 +333,31 @@ static void acpi_memory_device_remove(struct acpi_device *device) mem_device = acpi_driver_data(device); acpi_memory_remove_memory(mem_device); + list_del(&mem_device->mem_device_list); acpi_memory_device_free(mem_device); } +int acpi_memory_phys_device(unsigned long start_pfn) +{ + struct acpi_memory_device *mem_dev; + struct acpi_memory_info *info; + unsigned long start_addr = start_pfn << PAGE_SHIFT; + int id = 0; + + list_for_each_entry(mem_dev, &acpi_mem_device_list, mem_device_list) { + list_for_each_entry(info, &mem_dev->res_list, list) { + if ((info->start_addr <= start_addr) && + (info->start_addr + info->length > start_addr)) + return id; + } + id++; + } + + /* Memory not associated with a hot-pluggable device gets -1. For + * example, initial memory. */ + return -1; +} + void __init acpi_memory_hotplug_init(void) { acpi_scan_add_handler_with_hotplug(&memory_device_handler, "memory"); diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 8300a18..2cc98df 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -22,6 +22,7 @@ #include <linux/mutex.h> #include <linux/stat.h> #include <linux/slab.h> +#include <linux/acpi.h> #include <linux/atomic.h> #include <asm/uaccess.h> @@ -522,7 +523,11 @@ static inline int memory_fail_init(void) */ int __weak arch_get_memory_phys_device(unsigned long start_pfn) { - return 0; +#ifdef CONFIG_ACPI_HOTPLUG_MEMORY + return acpi_memory_phys_device(start_pfn); +#else + return -1; +#endif } /* diff --git a/include/linux/acpi.h b/include/linux/acpi.h index f46cfd7..00302fc 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -562,6 +562,8 @@ static inline __printf(3, 4) void acpi_handle_printk(const char *level, void *handle, const char *fmt, ...) {} #endif /* !CONFIG_ACPI */ +int acpi_memory_phys_device(unsigned long start_pfn); + /* * acpi_handle_<level>: Print message with ACPI prefix and object path *