[RFT] ACPI: OSL: Use rwlock instead of RCU for memory management
diff mbox series

Message ID 2643462.teTRrieJB7@kreacher
State New
Headers show
Series
  • [RFT] ACPI: OSL: Use rwlock instead of RCU for memory management
Related show

Commit Message

Rafael J. Wysocki June 5, 2020, 2:06 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] ACPI: OSL: Use rwlock instead of RCU for memory management

The ACPI OS layer uses RCU to protect the list of ACPI memory
mappings from being walked while it is updated.  Among other
situations, that list can be walked in non-NMI interrupt context,
so using a sleeping lock to protect it is not an option.

However, performance issues related to the RCU usage in there
appear, as described by Dan Williams:

"Recently a performance problem was reported for a process invoking
a non-trival ASL program. The method call in this case ends up
repetitively triggering a call path like:

    acpi_ex_store
    acpi_ex_store_object_to_node
    acpi_ex_write_data_to_field
    acpi_ex_insert_into_field
    acpi_ex_write_with_update_rule
    acpi_ex_field_datum_io
    acpi_ex_access_region
    acpi_ev_address_space_dispatch
    acpi_ex_system_memory_space_handler
    acpi_os_map_cleanup.part.14
    _synchronize_rcu_expedited.constprop.89
    schedule

The end result of frequent synchronize_rcu_expedited() invocation is
tiny sub-millisecond spurts of execution where the scheduler freely
migrates this apparently sleepy task. The overhead of frequent
scheduler invocation multiplies the execution time by a factor
of 2-3X."

In order to avoid these issues, replace the RCU in the ACPI OS
layer by an rwlock.

That rwlock should not be frequently contended, so the performance
impact of it is not expected to be significant.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

Hi Dan,

This is a possible fix for the ACPI OSL RCU-related performance issues, but
can you please arrange for the testing of it on the affected systems?

Cheers!

---
 drivers/acpi/osl.c |   50 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 16 deletions(-)

Comments

Dan Williams June 5, 2020, 5:08 p.m. UTC | #1
On Fri, Jun 5, 2020 at 7:06 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] ACPI: OSL: Use rwlock instead of RCU for memory management
>
> The ACPI OS layer uses RCU to protect the list of ACPI memory
> mappings from being walked while it is updated.  Among other
> situations, that list can be walked in non-NMI interrupt context,
> so using a sleeping lock to protect it is not an option.
>
> However, performance issues related to the RCU usage in there
> appear, as described by Dan Williams:
>
> "Recently a performance problem was reported for a process invoking
> a non-trival ASL program. The method call in this case ends up
> repetitively triggering a call path like:
>
>     acpi_ex_store
>     acpi_ex_store_object_to_node
>     acpi_ex_write_data_to_field
>     acpi_ex_insert_into_field
>     acpi_ex_write_with_update_rule
>     acpi_ex_field_datum_io
>     acpi_ex_access_region
>     acpi_ev_address_space_dispatch
>     acpi_ex_system_memory_space_handler
>     acpi_os_map_cleanup.part.14
>     _synchronize_rcu_expedited.constprop.89
>     schedule
>
> The end result of frequent synchronize_rcu_expedited() invocation is
> tiny sub-millisecond spurts of execution where the scheduler freely
> migrates this apparently sleepy task. The overhead of frequent
> scheduler invocation multiplies the execution time by a factor
> of 2-3X."
>
> In order to avoid these issues, replace the RCU in the ACPI OS
> layer by an rwlock.
>
> That rwlock should not be frequently contended, so the performance
> impact of it is not expected to be significant.
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> Hi Dan,
>
> This is a possible fix for the ACPI OSL RCU-related performance issues, but
> can you please arrange for the testing of it on the affected systems?

Ugh, is it really this simple? I did not realize the read-side is NMI
safe. I'll take a look.
Andy Shevchenko June 5, 2020, 7:40 p.m. UTC | #2
On Fri, Jun 5, 2020 at 5:11 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

...

> +       if (!refcount) {
> +               write_lock_irq(&acpi_ioremaps_list_lock);
> +
> +               list_del(&map->list);
> +
> +               write_unlock_irq(&acpi_ioremaps_list_lock);
> +       }
>         return refcount;

It seems we can decrease indentation level at the same time:

  if (refcount)
    return refcount;

 ...
 return 0;
Rafael J. Wysocki June 6, 2020, 6:48 a.m. UTC | #3
On Fri, Jun 5, 2020 at 9:40 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Fri, Jun 5, 2020 at 5:11 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> ...
>
> > +       if (!refcount) {
> > +               write_lock_irq(&acpi_ioremaps_list_lock);
> > +
> > +               list_del(&map->list);
> > +
> > +               write_unlock_irq(&acpi_ioremaps_list_lock);
> > +       }
> >         return refcount;
>
> It seems we can decrease indentation level at the same time:
>
>   if (refcount)
>     return refcount;
>
>  ...
>  return 0;

Right, but the patch will need to be dropped anyway I think.
Rafael J. Wysocki June 6, 2020, 6:56 a.m. UTC | #4
On Fri, Jun 5, 2020 at 7:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Jun 5, 2020 at 7:06 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Subject: [PATCH] ACPI: OSL: Use rwlock instead of RCU for memory management
> >
> > The ACPI OS layer uses RCU to protect the list of ACPI memory
> > mappings from being walked while it is updated.  Among other
> > situations, that list can be walked in non-NMI interrupt context,
> > so using a sleeping lock to protect it is not an option.
> >
> > However, performance issues related to the RCU usage in there
> > appear, as described by Dan Williams:
> >
> > "Recently a performance problem was reported for a process invoking
> > a non-trival ASL program. The method call in this case ends up
> > repetitively triggering a call path like:
> >
> >     acpi_ex_store
> >     acpi_ex_store_object_to_node
> >     acpi_ex_write_data_to_field
> >     acpi_ex_insert_into_field
> >     acpi_ex_write_with_update_rule
> >     acpi_ex_field_datum_io
> >     acpi_ex_access_region
> >     acpi_ev_address_space_dispatch
> >     acpi_ex_system_memory_space_handler
> >     acpi_os_map_cleanup.part.14
> >     _synchronize_rcu_expedited.constprop.89
> >     schedule
> >
> > The end result of frequent synchronize_rcu_expedited() invocation is
> > tiny sub-millisecond spurts of execution where the scheduler freely
> > migrates this apparently sleepy task. The overhead of frequent
> > scheduler invocation multiplies the execution time by a factor
> > of 2-3X."
> >
> > In order to avoid these issues, replace the RCU in the ACPI OS
> > layer by an rwlock.
> >
> > That rwlock should not be frequently contended, so the performance
> > impact of it is not expected to be significant.
> >
> > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > Hi Dan,
> >
> > This is a possible fix for the ACPI OSL RCU-related performance issues, but
> > can you please arrange for the testing of it on the affected systems?
>
> Ugh, is it really this simple? I did not realize the read-side is NMI
> safe. I'll take a look.

But if an NMI triggers while the lock is being held for writing, it
will deadlock, won't it?

OTOH, according to the RCU documentation it is valid to call
rcu_read_[un]lock() from an NMI handler (see Interrupts and NMIs in
Documentation/RCU/Design/Requirements/Requirements.rst) so we are good
from this perspective today.

Unless we teach APEI to avoid mapping lookups from
apei_{read|write}(), which wouldn't be unreasonable by itself, we need
to hold on to the RCU in ACPI OSL, so it looks like addressing the
problem in ACPICA is the best way to do it (and the current ACPICA
code in question is suboptimal, so it would be good to rework it
anyway).

Cheers!
Rafael J. Wysocki June 8, 2020, 3:33 p.m. UTC | #5
On Saturday, June 6, 2020 8:56:26 AM CEST Rafael J. Wysocki wrote:
> On Fri, Jun 5, 2020 at 7:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Fri, Jun 5, 2020 at 7:06 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Subject: [PATCH] ACPI: OSL: Use rwlock instead of RCU for memory management
> > >
> > > The ACPI OS layer uses RCU to protect the list of ACPI memory
> > > mappings from being walked while it is updated.  Among other
> > > situations, that list can be walked in non-NMI interrupt context,
> > > so using a sleeping lock to protect it is not an option.
> > >
> > > However, performance issues related to the RCU usage in there
> > > appear, as described by Dan Williams:
> > >
> > > "Recently a performance problem was reported for a process invoking
> > > a non-trival ASL program. The method call in this case ends up
> > > repetitively triggering a call path like:
> > >
> > >     acpi_ex_store
> > >     acpi_ex_store_object_to_node
> > >     acpi_ex_write_data_to_field
> > >     acpi_ex_insert_into_field
> > >     acpi_ex_write_with_update_rule
> > >     acpi_ex_field_datum_io
> > >     acpi_ex_access_region
> > >     acpi_ev_address_space_dispatch
> > >     acpi_ex_system_memory_space_handler
> > >     acpi_os_map_cleanup.part.14
> > >     _synchronize_rcu_expedited.constprop.89
> > >     schedule
> > >
> > > The end result of frequent synchronize_rcu_expedited() invocation is
> > > tiny sub-millisecond spurts of execution where the scheduler freely
> > > migrates this apparently sleepy task. The overhead of frequent
> > > scheduler invocation multiplies the execution time by a factor
> > > of 2-3X."
> > >
> > > In order to avoid these issues, replace the RCU in the ACPI OS
> > > layer by an rwlock.
> > >
> > > That rwlock should not be frequently contended, so the performance
> > > impact of it is not expected to be significant.
> > >
> > > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > Hi Dan,
> > >
> > > This is a possible fix for the ACPI OSL RCU-related performance issues, but
> > > can you please arrange for the testing of it on the affected systems?
> >
> > Ugh, is it really this simple? I did not realize the read-side is NMI
> > safe. I'll take a look.
> 
> But if an NMI triggers while the lock is being held for writing, it
> will deadlock, won't it?
> 
> OTOH, according to the RCU documentation it is valid to call
> rcu_read_[un]lock() from an NMI handler (see Interrupts and NMIs in
> Documentation/RCU/Design/Requirements/Requirements.rst) so we are good
> from this perspective today.
> 
> Unless we teach APEI to avoid mapping lookups from
> apei_{read|write}(), which wouldn't be unreasonable by itself, we need
> to hold on to the RCU in ACPI OSL, so it looks like addressing the
> problem in ACPICA is the best way to do it (and the current ACPICA
> code in question is suboptimal, so it would be good to rework it
> anyway).
> 
> Cheers!

I've sent the prototype patch below to you, Bob and Erik in private, so
here it goes to the lists for completeness.

It introduces a "fast-path" variant of acpi_os_map_memory() that only
returns non-NULL if a matching mapping is already there in the list
and reworks acpi_ex_system_memory_space_handler() to use it.

The idea is to do a fast-path lookup first for every new mapping and
only run the full acpi_os_map_memory() if that returns NULL and then
save the mapping return by it and do a fast-path lookup for it again
to bump up its reference counter in the OSL layer.  That should prevent
the mappings from going away until the opregions that they belong to
go away (the opregion deactivation code is updated too to remove the
saved mappings), so in the cases when there's not too much opregion
creation and removal activity, it should make the RCU-related overhead
go away.

Please test.

Cheers!

---
 drivers/acpi/acpica/evrgnini.c    |   14 ++++++++-
 drivers/acpi/acpica/exregion.c    |   49 +++++++++++++++++++++++++++++--
 drivers/acpi/osl.c                |   59 ++++++++++++++++++++++++++++----------
 include/acpi/actypes.h            |    7 ++++
 include/acpi/platform/aclinuxex.h |    2 +
 5 files changed, 112 insertions(+), 19 deletions(-)

Index: linux-pm/drivers/acpi/osl.c
===================================================================
--- linux-pm.orig/drivers/acpi/osl.c
+++ linux-pm/drivers/acpi/osl.c
@@ -302,21 +302,8 @@ static void acpi_unmap(acpi_physical_add
 		iounmap(vaddr);
 }
 
-/**
- * acpi_os_map_iomem - Get a virtual address for a given physical address range.
- * @phys: Start of the physical address range to map.
- * @size: Size of the physical address range to map.
- *
- * Look up the given physical address range in the list of existing ACPI memory
- * mappings.  If found, get a reference to it and return a pointer to it (its
- * virtual address).  If not found, map it, add it to that list and return a
- * pointer to it.
- *
- * During early init (when acpi_permanent_mmap has not been set yet) this
- * routine simply calls __acpi_map_table() to get the job done.
- */
-void __iomem __ref
-*acpi_os_map_iomem(acpi_physical_address phys, acpi_size size)
+static void __iomem __ref *__acpi_os_map_iomem(acpi_physical_address phys,
+					       acpi_size size, bool fastpath)
 {
 	struct acpi_ioremap *map;
 	void __iomem *virt;
@@ -339,6 +326,11 @@ void __iomem __ref
 		goto out;
 	}
 
+	if (fastpath) {
+		mutex_unlock(&acpi_ioremap_lock);
+		return NULL;
+	}
+
 	map = kzalloc(sizeof(*map), GFP_KERNEL);
 	if (!map) {
 		mutex_unlock(&acpi_ioremap_lock);
@@ -366,6 +358,25 @@ out:
 	mutex_unlock(&acpi_ioremap_lock);
 	return map->virt + (phys - map->phys);
 }
+
+/**
+ * acpi_os_map_iomem - Get a virtual address for a given physical address range.
+ * @phys: Start of the physical address range to map.
+ * @size: Size of the physical address range to map.
+ *
+ * Look up the given physical address range in the list of existing ACPI memory
+ * mappings.  If found, get a reference to it and return a pointer representing
+ * its virtual address.  If not found, map it, add it to that list and return a
+ * pointer representing its virtual address.
+ *
+ * During early init (when acpi_permanent_mmap has not been set yet) call
+ * __acpi_map_table() to obtain the mapping.
+ */
+void __iomem __ref *acpi_os_map_iomem(acpi_physical_address phys,
+				      acpi_size size)
+{
+	return __acpi_os_map_iomem(phys, size, false);
+}
 EXPORT_SYMBOL_GPL(acpi_os_map_iomem);
 
 void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
@@ -374,6 +385,24 @@ void *__ref acpi_os_map_memory(acpi_phys
 }
 EXPORT_SYMBOL_GPL(acpi_os_map_memory);
 
+/**
+ * acpi_os_map_memory_fastpath - Fast-path physical-to-virtual address mapping.
+ * @phys: Start of the physical address range to map.
+ * @size: Size of the physical address range to map.
+ *
+ * Look up the given physical address range in the list of existing ACPI memory
+ * mappings.  If found, get a reference to it and return a pointer representing
+ * its virtual address.  If not found, return NULL.
+ *
+ * During early init (when acpi_permanent_mmap has not been set yet) call
+ * __acpi_map_table() to obtain the mapping.
+ */
+void __ref *acpi_os_map_memory_fastpath(acpi_physical_address phys,
+					acpi_size size)
+{
+	return __acpi_os_map_iomem(phys, size, true);
+}
+
 /* Must be called with mutex_lock(&acpi_ioremap_lock) */
 static unsigned long acpi_os_drop_map_ref(struct acpi_ioremap *map)
 {
Index: linux-pm/include/acpi/actypes.h
===================================================================
--- linux-pm.orig/include/acpi/actypes.h
+++ linux-pm/include/acpi/actypes.h
@@ -1200,12 +1200,19 @@ struct acpi_pci_id {
 	u16 function;
 };
 
+struct acpi_mem_mapping {
+	u8 *logical_address;
+	acpi_size length;
+	struct acpi_mem_mapping *next;
+};
+
 struct acpi_mem_space_context {
 	u32 length;
 	acpi_physical_address address;
 	acpi_physical_address mapped_physical_address;
 	u8 *mapped_logical_address;
 	acpi_size mapped_length;
+	struct acpi_mem_mapping *first_mapping;
 };
 
 /*
Index: linux-pm/drivers/acpi/acpica/exregion.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/exregion.c
+++ linux-pm/drivers/acpi/acpica/exregion.c
@@ -44,6 +44,9 @@ acpi_ex_system_memory_space_handler(u32
 	u32 length;
 	acpi_size map_length;
 	acpi_size page_boundary_map_length;
+#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
+	struct acpi_mem_mapping *new_mapping;
+#endif
 #ifdef ACPI_MISALIGNMENT_NOT_SUPPORTED
 	u32 remainder;
 #endif
@@ -143,9 +146,20 @@ acpi_ex_system_memory_space_handler(u32
 
 		/* Create a new mapping starting at the address given */
 
-		mem_info->mapped_logical_address =
-		    acpi_os_map_memory(address, map_length);
-		if (!mem_info->mapped_logical_address) {
+#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
+		/* Look for an existing mapping matching the request at hand. */
+		logical_addr_ptr = acpi_os_map_memory_fastpath(address, length);
+		if (logical_addr_ptr) {
+			/*
+			 * A matching mapping has been found, so cache it and
+			 * carry our the access as requested.
+			 */
+			goto access;
+		}
+#endif /* ACPI_OS_MAP_MEMORY_FASTPATH */
+
+		logical_addr_ptr = acpi_os_map_memory(address, map_length);
+		if (!logical_addr_ptr) {
 			ACPI_ERROR((AE_INFO,
 				    "Could not map memory at 0x%8.8X%8.8X, size %u",
 				    ACPI_FORMAT_UINT64(address),
@@ -154,8 +168,37 @@ acpi_ex_system_memory_space_handler(u32
 			return_ACPI_STATUS(AE_NO_MEMORY);
 		}
 
+#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
+		new_mapping = ACPI_ALLOCATE_ZEROED(sizeof(*new_mapping));
+		if (new_mapping) {
+			new_mapping->logical_address = logical_addr_ptr;
+			new_mapping->length = map_length;
+			new_mapping->next = mem_info->first_mapping;
+			mem_info->first_mapping = new_mapping;
+			/*
+			 * Carry out an extra fast-path lookup to get one more
+			 * reference to this mapping to prevent it from getting
+			 * dropped if a future access involving this region does
+			 * not fall into it.
+			 */
+			acpi_os_map_memory_fastpath(address, map_length);
+		} else {
+			/*
+			 * No room to save the new mapping, but this is not
+			 * critical.  Just log the error and carry out the
+			 * access as requested.
+			 */
+			ACPI_ERROR((AE_INFO,
+				    "Not enough memory to save memory mapping at 0x%8.8X%8.8X, size %u",
+				    ACPI_FORMAT_UINT64(address),
+				    (u32)map_length));
+		}
+
+access:
+#endif /* ACPI_OS_MAP_MEMORY_FASTPATH */
 		/* Save the physical address and mapping size */
 
+		mem_info->mapped_logical_address = logical_addr_ptr;
 		mem_info->mapped_physical_address = address;
 		mem_info->mapped_length = map_length;
 	}
Index: linux-pm/drivers/acpi/acpica/evrgnini.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/evrgnini.c
+++ linux-pm/drivers/acpi/acpica/evrgnini.c
@@ -38,6 +38,9 @@ acpi_ev_system_memory_region_setup(acpi_
 	union acpi_operand_object *region_desc =
 	    (union acpi_operand_object *)handle;
 	struct acpi_mem_space_context *local_region_context;
+#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
+	struct acpi_mem_mapping *mapping;
+#endif
 
 	ACPI_FUNCTION_TRACE(ev_system_memory_region_setup);
 
@@ -46,13 +49,22 @@ acpi_ev_system_memory_region_setup(acpi_
 			local_region_context =
 			    (struct acpi_mem_space_context *)*region_context;
 
-			/* Delete a cached mapping if present */
+			/* Delete memory mappings if present */
 
 			if (local_region_context->mapped_length) {
 				acpi_os_unmap_memory(local_region_context->
 						     mapped_logical_address,
 						     local_region_context->
 						     mapped_length);
+#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
+				while (local_region_context->first_mapping) {
+					mapping = local_region_context->first_mapping;
+					local_region_context->first_mapping = mapping->next;
+					acpi_os_unmap_memory(mapping->logical_address,
+							     mapping->length);
+					ACPI_FREE(mapping);
+				}
+#endif /* ACPI_OS_MAP_MEMORY_FASTPATH */
 			}
 			ACPI_FREE(local_region_context);
 			*region_context = NULL;
Index: linux-pm/include/acpi/platform/aclinuxex.h
===================================================================
--- linux-pm.orig/include/acpi/platform/aclinuxex.h
+++ linux-pm/include/acpi/platform/aclinuxex.h
@@ -138,6 +138,8 @@ static inline void acpi_os_terminate_deb
 /*
  * OSL interfaces added by Linux
  */
+#define ACPI_OS_MAP_MEMORY_FASTPATH
+void *acpi_os_map_memory_fastpath(acpi_physical_address where, acpi_size length);
 
 #endif				/* __KERNEL__ */
Rafael J. Wysocki June 8, 2020, 4:29 p.m. UTC | #6
On Mon, Jun 8, 2020 at 5:33 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Saturday, June 6, 2020 8:56:26 AM CEST Rafael J. Wysocki wrote:
> > On Fri, Jun 5, 2020 at 7:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Fri, Jun 5, 2020 at 7:06 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Subject: [PATCH] ACPI: OSL: Use rwlock instead of RCU for memory management
> > > >
> > > > The ACPI OS layer uses RCU to protect the list of ACPI memory
> > > > mappings from being walked while it is updated.  Among other
> > > > situations, that list can be walked in non-NMI interrupt context,
> > > > so using a sleeping lock to protect it is not an option.
> > > >
> > > > However, performance issues related to the RCU usage in there
> > > > appear, as described by Dan Williams:
> > > >
> > > > "Recently a performance problem was reported for a process invoking
> > > > a non-trival ASL program. The method call in this case ends up
> > > > repetitively triggering a call path like:
> > > >
> > > >     acpi_ex_store
> > > >     acpi_ex_store_object_to_node
> > > >     acpi_ex_write_data_to_field
> > > >     acpi_ex_insert_into_field
> > > >     acpi_ex_write_with_update_rule
> > > >     acpi_ex_field_datum_io
> > > >     acpi_ex_access_region
> > > >     acpi_ev_address_space_dispatch
> > > >     acpi_ex_system_memory_space_handler
> > > >     acpi_os_map_cleanup.part.14
> > > >     _synchronize_rcu_expedited.constprop.89
> > > >     schedule
> > > >
> > > > The end result of frequent synchronize_rcu_expedited() invocation is
> > > > tiny sub-millisecond spurts of execution where the scheduler freely
> > > > migrates this apparently sleepy task. The overhead of frequent
> > > > scheduler invocation multiplies the execution time by a factor
> > > > of 2-3X."
> > > >
> > > > In order to avoid these issues, replace the RCU in the ACPI OS
> > > > layer by an rwlock.
> > > >
> > > > That rwlock should not be frequently contended, so the performance
> > > > impact of it is not expected to be significant.
> > > >
> > > > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >
> > > > Hi Dan,
> > > >
> > > > This is a possible fix for the ACPI OSL RCU-related performance issues, but
> > > > can you please arrange for the testing of it on the affected systems?
> > >
> > > Ugh, is it really this simple? I did not realize the read-side is NMI
> > > safe. I'll take a look.
> >
> > But if an NMI triggers while the lock is being held for writing, it
> > will deadlock, won't it?
> >
> > OTOH, according to the RCU documentation it is valid to call
> > rcu_read_[un]lock() from an NMI handler (see Interrupts and NMIs in
> > Documentation/RCU/Design/Requirements/Requirements.rst) so we are good
> > from this perspective today.
> >
> > Unless we teach APEI to avoid mapping lookups from
> > apei_{read|write}(), which wouldn't be unreasonable by itself, we need
> > to hold on to the RCU in ACPI OSL, so it looks like addressing the
> > problem in ACPICA is the best way to do it (and the current ACPICA
> > code in question is suboptimal, so it would be good to rework it
> > anyway).
> >
> > Cheers!
>
> I've sent the prototype patch below to you, Bob and Erik in private, so
> here it goes to the lists for completeness.
>
> It introduces a "fast-path" variant of acpi_os_map_memory() that only
> returns non-NULL if a matching mapping is already there in the list
> and reworks acpi_ex_system_memory_space_handler() to use it.
>
> The idea is to do a fast-path lookup first for every new mapping and
> only run the full acpi_os_map_memory() if that returns NULL and then
> save the mapping return by it and do a fast-path lookup for it again
> to bump up its reference counter in the OSL layer.  That should prevent
> the mappings from going away until the opregions that they belong to
> go away (the opregion deactivation code is updated too to remove the
> saved mappings), so in the cases when there's not too much opregion
> creation and removal activity, it should make the RCU-related overhead
> go away.
>
> Please test.
>
> Cheers!
>
> ---
>  drivers/acpi/acpica/evrgnini.c    |   14 ++++++++-
>  drivers/acpi/acpica/exregion.c    |   49 +++++++++++++++++++++++++++++--
>  drivers/acpi/osl.c                |   59 ++++++++++++++++++++++++++++----------
>  include/acpi/actypes.h            |    7 ++++
>  include/acpi/platform/aclinuxex.h |    2 +
>  5 files changed, 112 insertions(+), 19 deletions(-)
>
> Index: linux-pm/drivers/acpi/osl.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/osl.c
> +++ linux-pm/drivers/acpi/osl.c
> @@ -302,21 +302,8 @@ static void acpi_unmap(acpi_physical_add
>                 iounmap(vaddr);
>  }
>
> -/**
> - * acpi_os_map_iomem - Get a virtual address for a given physical address range.
> - * @phys: Start of the physical address range to map.
> - * @size: Size of the physical address range to map.
> - *
> - * Look up the given physical address range in the list of existing ACPI memory
> - * mappings.  If found, get a reference to it and return a pointer to it (its
> - * virtual address).  If not found, map it, add it to that list and return a
> - * pointer to it.
> - *
> - * During early init (when acpi_permanent_mmap has not been set yet) this
> - * routine simply calls __acpi_map_table() to get the job done.
> - */
> -void __iomem __ref
> -*acpi_os_map_iomem(acpi_physical_address phys, acpi_size size)
> +static void __iomem __ref *__acpi_os_map_iomem(acpi_physical_address phys,
> +                                              acpi_size size, bool fastpath)
>  {
>         struct acpi_ioremap *map;
>         void __iomem *virt;
> @@ -339,6 +326,11 @@ void __iomem __ref
>                 goto out;
>         }
>
> +       if (fastpath) {
> +               mutex_unlock(&acpi_ioremap_lock);
> +               return NULL;
> +       }
> +
>         map = kzalloc(sizeof(*map), GFP_KERNEL);
>         if (!map) {
>                 mutex_unlock(&acpi_ioremap_lock);
> @@ -366,6 +358,25 @@ out:
>         mutex_unlock(&acpi_ioremap_lock);
>         return map->virt + (phys - map->phys);
>  }
> +
> +/**
> + * acpi_os_map_iomem - Get a virtual address for a given physical address range.
> + * @phys: Start of the physical address range to map.
> + * @size: Size of the physical address range to map.
> + *
> + * Look up the given physical address range in the list of existing ACPI memory
> + * mappings.  If found, get a reference to it and return a pointer representing
> + * its virtual address.  If not found, map it, add it to that list and return a
> + * pointer representing its virtual address.
> + *
> + * During early init (when acpi_permanent_mmap has not been set yet) call
> + * __acpi_map_table() to obtain the mapping.
> + */
> +void __iomem __ref *acpi_os_map_iomem(acpi_physical_address phys,
> +                                     acpi_size size)
> +{
> +       return __acpi_os_map_iomem(phys, size, false);
> +}
>  EXPORT_SYMBOL_GPL(acpi_os_map_iomem);
>
>  void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
> @@ -374,6 +385,24 @@ void *__ref acpi_os_map_memory(acpi_phys
>  }
>  EXPORT_SYMBOL_GPL(acpi_os_map_memory);
>
> +/**
> + * acpi_os_map_memory_fastpath - Fast-path physical-to-virtual address mapping.
> + * @phys: Start of the physical address range to map.
> + * @size: Size of the physical address range to map.
> + *
> + * Look up the given physical address range in the list of existing ACPI memory
> + * mappings.  If found, get a reference to it and return a pointer representing
> + * its virtual address.  If not found, return NULL.
> + *
> + * During early init (when acpi_permanent_mmap has not been set yet) call
> + * __acpi_map_table() to obtain the mapping.
> + */
> +void __ref *acpi_os_map_memory_fastpath(acpi_physical_address phys,
> +                                       acpi_size size)
> +{
> +       return __acpi_os_map_iomem(phys, size, true);
> +}
> +
>  /* Must be called with mutex_lock(&acpi_ioremap_lock) */
>  static unsigned long acpi_os_drop_map_ref(struct acpi_ioremap *map)
>  {
> Index: linux-pm/include/acpi/actypes.h
> ===================================================================
> --- linux-pm.orig/include/acpi/actypes.h
> +++ linux-pm/include/acpi/actypes.h
> @@ -1200,12 +1200,19 @@ struct acpi_pci_id {
>         u16 function;
>  };
>
> +struct acpi_mem_mapping {
> +       u8 *logical_address;
> +       acpi_size length;
> +       struct acpi_mem_mapping *next;
> +};
> +
>  struct acpi_mem_space_context {
>         u32 length;
>         acpi_physical_address address;
>         acpi_physical_address mapped_physical_address;
>         u8 *mapped_logical_address;
>         acpi_size mapped_length;
> +       struct acpi_mem_mapping *first_mapping;
>  };
>
>  /*
> Index: linux-pm/drivers/acpi/acpica/exregion.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/exregion.c
> +++ linux-pm/drivers/acpi/acpica/exregion.c
> @@ -44,6 +44,9 @@ acpi_ex_system_memory_space_handler(u32
>         u32 length;
>         acpi_size map_length;
>         acpi_size page_boundary_map_length;
> +#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
> +       struct acpi_mem_mapping *new_mapping;
> +#endif
>  #ifdef ACPI_MISALIGNMENT_NOT_SUPPORTED
>         u32 remainder;
>  #endif
> @@ -143,9 +146,20 @@ acpi_ex_system_memory_space_handler(u32
>
>                 /* Create a new mapping starting at the address given */
>
> -               mem_info->mapped_logical_address =
> -                   acpi_os_map_memory(address, map_length);
> -               if (!mem_info->mapped_logical_address) {
> +#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
> +               /* Look for an existing mapping matching the request at hand. */
> +               logical_addr_ptr = acpi_os_map_memory_fastpath(address, length);

s/length/map_length/

but the patch should still work as is, with more overhead though.

> +               if (logical_addr_ptr) {
> +                       /*
> +                        * A matching mapping has been found, so cache it and
> +                        * carry our the access as requested.
> +                        */
> +                       goto access;
> +               }
> +#endif /* ACPI_OS_MAP_MEMORY_FASTPATH */
> +
> +               logical_addr_ptr = acpi_os_map_memory(address, map_length);
> +               if (!logical_addr_ptr) {
>                         ACPI_ERROR((AE_INFO,
>                                     "Could not map memory at 0x%8.8X%8.8X, size %u",
>                                     ACPI_FORMAT_UINT64(address),
> @@ -154,8 +168,37 @@ acpi_ex_system_memory_space_handler(u32
>                         return_ACPI_STATUS(AE_NO_MEMORY);
>                 }
>
> +#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
> +               new_mapping = ACPI_ALLOCATE_ZEROED(sizeof(*new_mapping));
> +               if (new_mapping) {
> +                       new_mapping->logical_address = logical_addr_ptr;
> +                       new_mapping->length = map_length;
> +                       new_mapping->next = mem_info->first_mapping;
> +                       mem_info->first_mapping = new_mapping;
> +                       /*
> +                        * Carry out an extra fast-path lookup to get one more
> +                        * reference to this mapping to prevent it from getting
> +                        * dropped if a future access involving this region does
> +                        * not fall into it.
> +                        */
> +                       acpi_os_map_memory_fastpath(address, map_length);
> +               } else {
> +                       /*
> +                        * No room to save the new mapping, but this is not
> +                        * critical.  Just log the error and carry out the
> +                        * access as requested.
> +                        */
> +                       ACPI_ERROR((AE_INFO,
> +                                   "Not enough memory to save memory mapping at 0x%8.8X%8.8X, size %u",
> +                                   ACPI_FORMAT_UINT64(address),
> +                                   (u32)map_length));
> +               }
> +
> +access:
> +#endif /* ACPI_OS_MAP_MEMORY_FASTPATH */
>                 /* Save the physical address and mapping size */
>
> +               mem_info->mapped_logical_address = logical_addr_ptr;
>                 mem_info->mapped_physical_address = address;
>                 mem_info->mapped_length = map_length;
>         }
> Index: linux-pm/drivers/acpi/acpica/evrgnini.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/evrgnini.c
> +++ linux-pm/drivers/acpi/acpica/evrgnini.c
> @@ -38,6 +38,9 @@ acpi_ev_system_memory_region_setup(acpi_
>         union acpi_operand_object *region_desc =
>             (union acpi_operand_object *)handle;
>         struct acpi_mem_space_context *local_region_context;
> +#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
> +       struct acpi_mem_mapping *mapping;
> +#endif
>
>         ACPI_FUNCTION_TRACE(ev_system_memory_region_setup);
>
> @@ -46,13 +49,22 @@ acpi_ev_system_memory_region_setup(acpi_
>                         local_region_context =
>                             (struct acpi_mem_space_context *)*region_context;
>
> -                       /* Delete a cached mapping if present */
> +                       /* Delete memory mappings if present */
>
>                         if (local_region_context->mapped_length) {
>                                 acpi_os_unmap_memory(local_region_context->
>                                                      mapped_logical_address,
>                                                      local_region_context->
>                                                      mapped_length);
> +#ifdef ACPI_OS_MAP_MEMORY_FASTPATH
> +                               while (local_region_context->first_mapping) {
> +                                       mapping = local_region_context->first_mapping;
> +                                       local_region_context->first_mapping = mapping->next;
> +                                       acpi_os_unmap_memory(mapping->logical_address,
> +                                                            mapping->length);
> +                                       ACPI_FREE(mapping);
> +                               }
> +#endif /* ACPI_OS_MAP_MEMORY_FASTPATH */
>                         }
>                         ACPI_FREE(local_region_context);
>                         *region_context = NULL;
> Index: linux-pm/include/acpi/platform/aclinuxex.h
> ===================================================================
> --- linux-pm.orig/include/acpi/platform/aclinuxex.h
> +++ linux-pm/include/acpi/platform/aclinuxex.h
> @@ -138,6 +138,8 @@ static inline void acpi_os_terminate_deb
>  /*
>   * OSL interfaces added by Linux
>   */
> +#define ACPI_OS_MAP_MEMORY_FASTPATH
> +void *acpi_os_map_memory_fastpath(acpi_physical_address where, acpi_size length);
>
>  #endif                         /* __KERNEL__ */
>
>
>
>

Patch
diff mbox series

Index: linux-pm/drivers/acpi/osl.c
===================================================================
--- linux-pm.orig/drivers/acpi/osl.c
+++ linux-pm/drivers/acpi/osl.c
@@ -81,8 +81,8 @@  struct acpi_ioremap {
 };
 
 static LIST_HEAD(acpi_ioremaps);
+static DEFINE_RWLOCK(acpi_ioremaps_list_lock);
 static DEFINE_MUTEX(acpi_ioremap_lock);
-#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
 
 static void __init acpi_request_region (struct acpi_generic_address *gas,
 	unsigned int length, char *desc)
@@ -214,13 +214,13 @@  acpi_physical_address __init acpi_os_get
 	return pa;
 }
 
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
+/* Must be called with 'acpi_ioremap_lock' or 'acpi_ioremaps_list_lock' held. */
 static struct acpi_ioremap *
 acpi_map_lookup(acpi_physical_address phys, acpi_size size)
 {
 	struct acpi_ioremap *map;
 
-	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
+	list_for_each_entry(map, &acpi_ioremaps, list)
 		if (map->phys <= phys &&
 		    phys + size <= map->phys + map->size)
 			return map;
@@ -228,7 +228,7 @@  acpi_map_lookup(acpi_physical_address ph
 	return NULL;
 }
 
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
+/* Must be called with 'acpi_ioremap_lock' or 'acpi_ioremaps_list_lock' held. */
 static void __iomem *
 acpi_map_vaddr_lookup(acpi_physical_address phys, unsigned int size)
 {
@@ -257,13 +257,13 @@  void __iomem *acpi_os_get_iomem(acpi_phy
 }
 EXPORT_SYMBOL_GPL(acpi_os_get_iomem);
 
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
+/* Must be called with 'acpi_ioremap_lock' or 'acpi_ioremaps_list_lock' held. */
 static struct acpi_ioremap *
 acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
 {
 	struct acpi_ioremap *map;
 
-	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
+	list_for_each_entry(map, &acpi_ioremaps, list)
 		if (map->virt <= virt &&
 		    virt + size <= map->virt + map->size)
 			return map;
@@ -360,7 +360,11 @@  void __iomem __ref
 	map->size = pg_sz;
 	map->refcount = 1;
 
-	list_add_tail_rcu(&map->list, &acpi_ioremaps);
+	write_lock_irq(&acpi_ioremaps_list_lock);
+
+	list_add_tail(&map->list, &acpi_ioremaps);
+
+	write_unlock_irq(&acpi_ioremaps_list_lock);
 
 out:
 	mutex_unlock(&acpi_ioremap_lock);
@@ -379,14 +383,18 @@  static unsigned long acpi_os_drop_map_re
 {
 	unsigned long refcount = --map->refcount;
 
-	if (!refcount)
-		list_del_rcu(&map->list);
+	if (!refcount) {
+		write_lock_irq(&acpi_ioremaps_list_lock);
+
+		list_del(&map->list);
+
+		write_unlock_irq(&acpi_ioremaps_list_lock);
+	}
 	return refcount;
 }
 
 static void acpi_os_map_cleanup(struct acpi_ioremap *map)
 {
-	synchronize_rcu_expedited();
 	acpi_unmap(map->phys, map->virt);
 	kfree(map);
 }
@@ -704,18 +712,23 @@  acpi_status
 acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width)
 {
 	void __iomem *virt_addr;
+	unsigned long flags;
 	unsigned int size = width / 8;
 	bool unmap = false;
 	u64 dummy;
 	int error;
 
-	rcu_read_lock();
+	read_lock_irqsave(&acpi_ioremaps_list_lock, flags);
+
 	virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
 	if (!virt_addr) {
-		rcu_read_unlock();
+
+		read_unlock_irqrestore(&acpi_ioremaps_list_lock, flags);
+
 		virt_addr = acpi_os_ioremap(phys_addr, size);
 		if (!virt_addr)
 			return AE_BAD_ADDRESS;
+
 		unmap = true;
 	}
 
@@ -728,7 +741,7 @@  acpi_os_read_memory(acpi_physical_addres
 	if (unmap)
 		iounmap(virt_addr);
 	else
-		rcu_read_unlock();
+		read_unlock_irqrestore(&acpi_ioremaps_list_lock, flags);
 
 	return AE_OK;
 }
@@ -737,16 +750,21 @@  acpi_status
 acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width)
 {
 	void __iomem *virt_addr;
+	unsigned long flags;
 	unsigned int size = width / 8;
 	bool unmap = false;
 
-	rcu_read_lock();
+	read_lock_irqsave(&acpi_ioremaps_list_lock, flags);
+
 	virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
 	if (!virt_addr) {
-		rcu_read_unlock();
+
+		read_unlock_irqrestore(&acpi_ioremaps_list_lock, flags);
+
 		virt_addr = acpi_os_ioremap(phys_addr, size);
 		if (!virt_addr)
 			return AE_BAD_ADDRESS;
+
 		unmap = true;
 	}
 
@@ -770,7 +788,7 @@  acpi_os_write_memory(acpi_physical_addre
 	if (unmap)
 		iounmap(virt_addr);
 	else
-		rcu_read_unlock();
+		read_unlock_irqrestore(&acpi_ioremaps_list_lock, flags);
 
 	return AE_OK;
 }