
[1/4] pci: Allow lockless access path to PCI mmconfig

Message ID 20170302232104.10136-1-andi@firstfloor.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Andi Kleen March 2, 2017, 11:21 p.m. UTC
From: Andi Kleen <ak@linux.intel.com>

The Intel uncore driver can do a lot of PCI config accesses to read
performance counters. I had a situation on a 4S system where it
was spending 40+% of CPU time grabbing the pci_cfg_lock due to that.

For 64bit x86 with MMCONFIG there isn't really any reason to take
a lock. The access is directly mapped to an underlying MMIO area,
which can fully operate lockless.

Add a new flag that allows the PCI mid layer to skip the lock
and set it for the 64bit mmconfig code.

There's a small risk that someone relies on this lock for synchronization,
but I think that's unlikely because there isn't really any useful
synchronization at this individual operation level. Any useful
synchronization would likely need to protect at least a
read-modify-write or similar.  So I made it unconditional without opt-in.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/pci/mmconfig_64.c |  1 +
 drivers/pci/access.c       | 14 ++++++++++----
 include/linux/pci.h        |  2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

Comments

Thomas Gleixner March 14, 2017, 1:06 p.m. UTC | #1
On Thu, 2 Mar 2017, Andi Kleen wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> The Intel uncore driver can do a lot of PCI config accesses to read
> performance counters. I had a situation on a 4S system where it
> was spending 40+% of CPU time grabbing the pci_cfg_lock due to that.
> 
> For 64bit x86 with MMCONFIG there isn't really any reason to take
> a lock. The access is directly mapped to an underlying MMIO area,
> which can fully operate lockless.
> 
> Add a new flag that allows the PCI mid layer to skip the lock
> and set it for the 64bit mmconfig code.
> 
> There's a small risk that someone relies on this lock for synchronization,
> but I think that's unlikely because there isn't really any useful
> synchronization at this individual operation level. Any useful
> synchronization would likely need to protect at least a
> read-modify-write or similar.  So I made it unconditional without opt-in.

This part of the changelog is just crap.

The reason why pci_lock exists and is taken for each single read/write
config is that some ops implementations, e.g. the generic ones, must
protect at this granularity level because

	ops->map_bus()
	read/writeX()

needs to be 'atomic'.

MMCONFIG obviously does not require this at all because it's a simple
byte/word/dword read/write which is serialized by itself. So it's obvious
that the serialization with pci_lock is pointless in this case.
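
To make the distinction concrete, here is a minimal userspace sketch, not kernel code. It assumes a hypothetical "windowed" config mechanism where a shared register selects the target device before the access, and contrasts it with a direct-mapped access that touches no shared state. All `toy_*` names are invented for illustration, and the spinlock is a C11 `atomic_flag` stand-in for pci_lock.

```c
/* Userspace model only; the toy_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_DEVS 4
#define TOY_REGS 16

/* Backing store standing in for each device's config space. */
static uint32_t toy_cfg[TOY_DEVS][TOY_REGS];

/* Indirect mechanism: one shared window register selects the device. */
static unsigned int toy_window;
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;	/* stand-in for pci_lock */

/* Step 1: point the shared window at a device (the "map" step). */
static void toy_select(unsigned int dev)
{
	toy_window = dev;
}

/* Step 2: access whatever device the window currently selects. */
static uint32_t toy_window_read(unsigned int reg)
{
	return toy_cfg[toy_window][reg];
}

/*
 * Indirect read: select + access must appear atomic, otherwise another
 * caller could move the window between the two steps.
 */
static uint32_t toy_indirect_read(unsigned int dev, unsigned int reg)
{
	uint32_t val;

	while (atomic_flag_test_and_set_explicit(&toy_lock, memory_order_acquire))
		;	/* spin until the lock is free */
	toy_select(dev);
	val = toy_window_read(reg);
	atomic_flag_clear_explicit(&toy_lock, memory_order_release);
	return val;
}

/*
 * Direct-mapped read: the address depends only on the arguments, so a
 * single load is all there is and no lock is needed.
 */
static uint32_t toy_direct_read(unsigned int dev, unsigned int reg)
{
	return *(volatile uint32_t *)&toy_cfg[dev][reg];
}
```

In this model only toy_indirect_read() has a two-step sequence to protect; toy_direct_read() never touches toy_window, which is the property the MMCONFIG path relies on.
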

It's not that hard to figure it out and write up a proper changelog instead
of handwaving about risk and whatever.

Thanks,

	tglx

H. Peter Anvin March 14, 2017, 5:28 p.m. UTC | #2
On 03/02/17 15:21, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> The Intel uncore driver can do a lot of PCI config accesses to read
> performance counters. I had a situation on a 4S system where it
> was spending 40+% of CPU time grabbing the pci_cfg_lock due to that.
> 
> For 64bit x86 with MMCONFIG there isn't really any reason to take
> a lock. The access is directly mapped to an underlying MMIO area,
> which can fully operate lockless.
> 
> Add a new flag that allows the PCI mid layer to skip the lock
> and set it for the 64bit mmconfig code.
> 
> There's a small risk that someone relies on this lock for synchronization,
> but I think that's unlikely because there isn't really any useful
> synchronization at this individual operation level. Any useful
> synchronization would likely need to protect at least a
> read-modify-write or similar.  So I made it unconditional without opt-in.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/pci/mmconfig_64.c |  1 +
>  drivers/pci/access.c       | 14 ++++++++++----
>  include/linux/pci.h        |  2 ++
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index bea52496aea6..8bf10f41e626 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -121,6 +121,7 @@ int __init pci_mmcfg_arch_init(void)
>  		}
>  
>  	raw_pci_ext_ops = &pci_mmcfg;
> +	pci_root_ops.ll_allowed = true;
>  

"ll_allowed" is pretty awful naming... you spend almost all the
characters telling us nothing.  I spent several seconds trying to figure
out what "ll" stood for, and without the context of the patch I'd have
had to do a massive grep.  Just call it "lockless" or something.
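
As a rough illustration of that suggestion, here is a userspace sketch (not the kernel code) of an ops structure with a self-describing `lockless` flag and an accessor that skips the lock when the flag is set. The `demo_*` names, the register array and the acquisition counter are all hypothetical; the spinlock is a C11 `atomic_flag` standing in for pci_lock.

```c
/* Userspace model only; the demo_* names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_ops {
	int (*read)(unsigned int where, uint32_t *val);
	bool lockless;	/* true: read() needs no external serialization */
};

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;	/* stand-in for pci_lock */
static unsigned int demo_lock_taken;	/* acquisitions, for illustration */

static uint32_t demo_regs[4] = { 0x8086, 0x1, 0x2, 0x3 };

/* Single-load backend, comparable in spirit to a direct-mapped access. */
static int demo_read(unsigned int where, uint32_t *val)
{
	if (where >= 4)
		return -1;
	*val = demo_regs[where];
	return 0;
}

/* Mid-layer accessor: take the lock only when the ops require it. */
static int demo_read_config(const struct demo_ops *ops, unsigned int where,
			    uint32_t *val)
{
	bool lockless = ops->lockless;
	int res;

	if (!lockless) {
		while (atomic_flag_test_and_set_explicit(&demo_lock,
							 memory_order_acquire))
			;	/* spin until the lock is free */
		demo_lock_taken++;
	}
	res = ops->read(where, val);
	if (!lockless)
		atomic_flag_clear_explicit(&demo_lock, memory_order_release);
	return res;
}
```

With the flag named `lockless`, a reader of the accessor can tell what the branch does without looking up the structure definition.
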

	-hpa

Patch

diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index bea52496aea6..8bf10f41e626 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -121,6 +121,7 @@  int __init pci_mmcfg_arch_init(void)
 		}
 
 	raw_pci_ext_ops = &pci_mmcfg;
+	pci_root_ops.ll_allowed = true;
 
 	return 1;
 }
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index db239547fefd..22552c6606c1 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -32,11 +32,14 @@  int pci_bus_read_config_##size \
 	int res;							\
 	unsigned long flags;						\
 	u32 data = 0;							\
+	bool ll_allowed = bus->ops->ll_allowed;				\
 	if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;	\
-	raw_spin_lock_irqsave(&pci_lock, flags);			\
+	if (!ll_allowed)						\
+		raw_spin_lock_irqsave(&pci_lock, flags);		\
 	res = bus->ops->read(bus, devfn, pos, len, &data);		\
 	*value = (type)data;						\
-	raw_spin_unlock_irqrestore(&pci_lock, flags);		\
+	if (!ll_allowed)						\
+		raw_spin_unlock_irqrestore(&pci_lock, flags);		\
 	return res;							\
 }
 
@@ -46,10 +49,13 @@  int pci_bus_write_config_##size \
 {									\
 	int res;							\
 	unsigned long flags;						\
+	bool ll_allowed = bus->ops->ll_allowed;				\
 	if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;	\
-	raw_spin_lock_irqsave(&pci_lock, flags);			\
+	if (!ll_allowed)						\
+		raw_spin_lock_irqsave(&pci_lock, flags);		\
 	res = bus->ops->write(bus, devfn, pos, len, value);		\
-	raw_spin_unlock_irqrestore(&pci_lock, flags);		\
+	if (!ll_allowed)						\
+		raw_spin_unlock_irqrestore(&pci_lock, flags);		\
 	return res;							\
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e2d1a124216a..9b234cbc7ae1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -612,6 +612,8 @@  struct pci_ops {
 	void __iomem *(*map_bus)(struct pci_bus *bus, unsigned int devfn, int where);
 	int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
 	int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
+	/* Set to true when pci_lock is not needed for read/write */
+	bool ll_allowed;
 };
 
 /*