diff mbox series

[v2] PCI: vmd: Fix spinlock usage on config access for RT kernel

Message ID 20241218115951.83062-1-ryotkkr98@gmail.com (mailing list archive)
State Superseded
Delegated to: Krzysztof WilczyƄski
Headers show
Series [v2] PCI: vmd: Fix spinlock usage on config access for RT kernel | expand

Commit Message

Ryo Takakura Dec. 18, 2024, 11:59 a.m. UTC
PCI config access is locked with pci_lock which serializes
pci_user/bus_write_config*() and pci_user/bus_read_config*().
The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
serialized as they are only invoked by them respectively.

Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
for their serialization as its already serialized by pci_lock.

This fixes the spinlock_t(cfg_lock) usage within
raw_spinlock_t(pci_lock) on RT kernels where spinlock_t becomes
sleepable.

Reported as following:
[   18.756807] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[   18.756810] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1617, name: nodedev-init
[   18.756810] preempt_count: 1, expected: 0
[   18.756811] RCU nest depth: 0, expected: 0
[   18.756811] INFO: lockdep is turned off.
[   18.756812] irq event stamp: 0
[   18.756812] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[   18.756815] hardirqs last disabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90
[   18.756819] softirqs last  enabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90
[   18.756820] softirqs last disabled at (0): [<0000000000000000>] 0x0
[   18.756822] CPU: 3 UID: 0 PID: 1617 Comm: nodedev-init Tainted: G        W          6.12.1 #11
[   18.756823] Tainted: [W]=WARN
[   18.756824] Hardware name: Dell Inc. Vostro 3710/0K1D6X, BIOS 1.14.0 06/09/2023
[   18.756825] Call Trace:
[   18.756826]  <TASK>
[   18.756827]  dump_stack_lvl+0x9b/0xf0
[   18.756830]  dump_stack+0x10/0x20
[   18.756831]  __might_resched+0x158/0x230
[   18.756833]  rt_spin_lock+0x4e/0x130
[   18.756837]  ? vmd_pci_read+0x8d/0x100 [vmd]
[   18.756839]  vmd_pci_read+0x8d/0x100 [vmd]
[   18.756840]  pci_user_read_config_byte+0x6f/0xe0
[   18.756843]  pci_read_config+0xfe/0x290
[   18.756845]  sysfs_kf_bin_read+0x68/0x90
[   18.756847]  kernfs_fop_read_iter+0xd7/0x200
[   18.756850]  vfs_read+0x26d/0x360
[   18.756853]  ksys_read+0x70/0xf0
[   18.756855]  __x64_sys_read+0x1a/0x20
[   18.756857]  x64_sys_call+0x1715/0x20d0
[   18.756859]  do_syscall_64+0x8f/0x170
[   18.756861]  ? syscall_exit_to_user_mode+0xcd/0x2c0
[   18.756863]  ? do_syscall_64+0x9b/0x170
[   18.756865]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
---

Thanks Luis for feedback!

Changes since v1:
https://lore.kernel.org/lkml/20241215141321.383144-1-ryotkkr98@gmail.com/T/

- Remove cfg_lock instead of converting it to raw spinlock as suggested
  by Sebastian[0].
  
[0] https://lore.kernel.org/linux-rt-users/20230620154434.MosrzRUh@linutronix.de/

---
 drivers/pci/controller/vmd.c | 8 --------
 1 file changed, 8 deletions(-)

Comments

Luis Claudio R. Goncalves Dec. 18, 2024, 1:17 p.m. UTC | #1
On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
> PCI config access is locked with pci_lock which serializes
> pci_user/bus_write_config*() and pci_user/bus_read_config*().
> The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
> serialized as they are only invoked by them respectively.
> 
> Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
> for their serialization as its already serialized by pci_lock.
> 
> This fixes the spinlock_t(cfg_lock) usage within
> raw_spinlock_t(pci_lock) on RT kernels where spinlock_t becomes
> sleepable.
> 
> Reported as following:
> [   18.756807] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> [   18.756810] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1617, name: nodedev-init
> [   18.756810] preempt_count: 1, expected: 0
> [   18.756811] RCU nest depth: 0, expected: 0
> [   18.756811] INFO: lockdep is turned off.
> [   18.756812] irq event stamp: 0
> [   18.756812] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [   18.756815] hardirqs last disabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90
> [   18.756819] softirqs last  enabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90
> [   18.756820] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   18.756822] CPU: 3 UID: 0 PID: 1617 Comm: nodedev-init Tainted: G        W          6.12.1 #11
> [   18.756823] Tainted: [W]=WARN
> [   18.756824] Hardware name: Dell Inc. Vostro 3710/0K1D6X, BIOS 1.14.0 06/09/2023
> [   18.756825] Call Trace:
> [   18.756826]  <TASK>
> [   18.756827]  dump_stack_lvl+0x9b/0xf0
> [   18.756830]  dump_stack+0x10/0x20
> [   18.756831]  __might_resched+0x158/0x230
> [   18.756833]  rt_spin_lock+0x4e/0x130
> [   18.756837]  ? vmd_pci_read+0x8d/0x100 [vmd]
> [   18.756839]  vmd_pci_read+0x8d/0x100 [vmd]
> [   18.756840]  pci_user_read_config_byte+0x6f/0xe0
> [   18.756843]  pci_read_config+0xfe/0x290
> [   18.756845]  sysfs_kf_bin_read+0x68/0x90
> [   18.756847]  kernfs_fop_read_iter+0xd7/0x200
> [   18.756850]  vfs_read+0x26d/0x360
> [   18.756853]  ksys_read+0x70/0xf0
> [   18.756855]  __x64_sys_read+0x1a/0x20
> [   18.756857]  x64_sys_call+0x1715/0x20d0
> [   18.756859]  do_syscall_64+0x8f/0x170
> [   18.756861]  ? syscall_exit_to_user_mode+0xcd/0x2c0
> [   18.756863]  ? do_syscall_64+0x9b/0x170
> [   18.756865]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
> ---

Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>

Best regards,
Luis
 
> Thanks Luis for feedback!
> 
> Changes since v1:
> https://lore.kernel.org/lkml/20241215141321.383144-1-ryotkkr98@gmail.com/T/
> 
> - Remove cfg_lock instead of converting it to raw spinlock as suggested
>   by Sebastian[0].
>   
> [0] https://lore.kernel.org/linux-rt-users/20230620154434.MosrzRUh@linutronix.de/
> 
> ---
>  drivers/pci/controller/vmd.c | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 9d9596947..2be898424 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -125,7 +125,6 @@ struct vmd_irq_list {
>  struct vmd_dev {
>  	struct pci_dev		*dev;
>  
> -	spinlock_t		cfg_lock;
>  	void __iomem		*cfgbar;
>  
>  	int msix_count;
> @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
>  {
>  	struct vmd_dev *vmd = vmd_from_bus(bus);
>  	void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
> -	unsigned long flags;
>  	int ret = 0;
>  
>  	if (!addr)
>  		return -EFAULT;
>  
> -	spin_lock_irqsave(&vmd->cfg_lock, flags);
>  	switch (len) {
>  	case 1:
>  		*value = readb(addr);
> @@ -406,7 +403,6 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
>  		ret = -EINVAL;
>  		break;
>  	}
> -	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
>  	return ret;
>  }
>  
> @@ -420,13 +416,11 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg,
>  {
>  	struct vmd_dev *vmd = vmd_from_bus(bus);
>  	void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
> -	unsigned long flags;
>  	int ret = 0;
>  
>  	if (!addr)
>  		return -EFAULT;
>  
> -	spin_lock_irqsave(&vmd->cfg_lock, flags);
>  	switch (len) {
>  	case 1:
>  		writeb(value, addr);
> @@ -444,7 +438,6 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg,
>  		ret = -EINVAL;
>  		break;
>  	}
> -	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
>  	return ret;
>  }
>  
> @@ -1009,7 +1002,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  	if (features & VMD_FEAT_OFFSET_FIRST_VECTOR)
>  		vmd->first_vec = 1;
>  
> -	spin_lock_init(&vmd->cfg_lock);
>  	pci_set_drvdata(dev, vmd);
>  	err = vmd_enable_domain(vmd, features);
>  	if (err)
> -- 
> 2.34.1
> 
---end quoted text---
Keith Busch Dec. 18, 2024, 3:36 p.m. UTC | #2
On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
> PCI config access is locked with pci_lock which serializes
> pci_user/bus_write_config*() and pci_user/bus_read_config*().
> The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
> serialized as they are only invoked by them respectively.
> 
> Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
> for their serialization as its already serialized by pci_lock.

That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock
won't help with concurrent kernel config access in such a setup. I think
the previous change to raw lock proposal was the correct approach.
 
> @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
>  {
>  	struct vmd_dev *vmd = vmd_from_bus(bus);
>  	void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
> -	unsigned long flags;
>  	int ret = 0;
>  
>  	if (!addr)
>  		return -EFAULT;
>  
> -	spin_lock_irqsave(&vmd->cfg_lock, flags);
>  	switch (len) {
>  	case 1:
>  		*value = readb(addr);

There's a comment above this function explaining the need for the lock,
which doesn't make a lot of sense after this patch.
Sebastian Andrzej Siewior Dec. 18, 2024, 3:48 p.m. UTC | #3
On 2024-12-18 08:36:54 [-0700], Keith Busch wrote:
> On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
> > PCI config access is locked with pci_lock which serializes
> > pci_user/bus_write_config*() and pci_user/bus_read_config*().
> > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
> > serialized as they are only invoked by them respectively.
> > 
> > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
> > for their serialization as its already serialized by pci_lock.
> 
> That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock
> won't help with concurrent kernel config access in such a setup. I think
> the previous change to raw lock proposal was the correct approach.

I overlooked that. Wouldn't it make sense to let the vmd driver select
that option rather than adding/ having a lock for the same purpose?

Sebastian
Keith Busch Dec. 18, 2024, 3:57 p.m. UTC | #4
On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-12-18 08:36:54 [-0700], Keith Busch wrote:
> > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
> > > PCI config access is locked with pci_lock which serializes
> > > pci_user/bus_write_config*() and pci_user/bus_read_config*().
> > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
> > > serialized as they are only invoked by them respectively.
> > > 
> > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
> > > for their serialization as its already serialized by pci_lock.
> > 
> > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock
> > won't help with concurrent kernel config access in such a setup. I think
> > the previous change to raw lock proposal was the correct approach.
> 
> I overlooked that. Wouldn't it make sense to let the vmd driver select
> that option rather than adding/ having a lock for the same purpose?

The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think
the vmd driver can require it be turned off. Besides, no need to punish
all PCI access if only this device requires it be serialized.
Sebastian Andrzej Siewior Dec. 18, 2024, 4:03 p.m. UTC | #5
On 2024-12-18 08:57:53 [-0700], Keith Busch wrote:
> On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote:
> > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
> > > > PCI config access is locked with pci_lock which serializes
> > > > pci_user/bus_write_config*() and pci_user/bus_read_config*().
> > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
> > > > serialized as they are only invoked by them respectively.
> > > > 
> > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
> > > > for their serialization as its already serialized by pci_lock.
> > > 
> > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock
> > > won't help with concurrent kernel config access in such a setup. I think
> > > the previous change to raw lock proposal was the correct approach.
> > 
> > I overlooked that. Wouldn't it make sense to let the vmd driver select
> > that option rather than adding/ having a lock for the same purpose?
> 
> The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think
> the vmd driver can require it be turned off. Besides, no need to punish
> all PCI access if only this device requires it be serialized.

Okay. That makes sense.

Sebastian
Luis Claudio R. Goncalves Dec. 18, 2024, 4:35 p.m. UTC | #6
On Wed, Dec 18, 2024 at 08:57:53AM -0700, Keith Busch wrote:
> On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote:
> > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
> > > > PCI config access is locked with pci_lock which serializes
> > > > pci_user/bus_write_config*() and pci_user/bus_read_config*().
> > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
> > > > serialized as they are only invoked by them respectively.
> > > > 
> > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
> > > > for their serialization as its already serialized by pci_lock.
> > > 
> > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock
> > > won't help with concurrent kernel config access in such a setup. I think
> > > the previous change to raw lock proposal was the correct approach.
> > 
> > I overlooked that. Wouldn't it make sense to let the vmd driver select
> > that option rather than adding/ having a lock for the same purpose?
> 
> The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think
> the vmd driver can require it be turned off. Besides, no need to punish
> all PCI access if only this device requires it be serialized.

Sorry, I also missed that and induced Ryo Takakura to rewrite the patch.

Luis
Ryo Takakura Dec. 19, 2024, 12:53 a.m. UTC | #7
Hi all!

on Wed, 18 Dec 2024 13:35:07 -0300, Luis Claudio R. Goncalves wrote:
>On Wed, Dec 18, 2024 at 08:57:53AM -0700, Keith Busch wrote:
>> On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote:
>> > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote:
>> > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote:
>> > > > PCI config access is locked with pci_lock which serializes
>> > > > pci_user/bus_write_config*() and pci_user/bus_read_config*().
>> > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also
>> > > > serialized as they are only invoked by them respectively.
>> > > > 
>> > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read()
>> > > > for their serialization as its already serialized by pci_lock.
>> > > 
>> > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock
>> > > won't help with concurrent kernel config access in such a setup. I think
>> > > the previous change to raw lock proposal was the correct approach.
>> > 
>> > I overlooked that. Wouldn't it make sense to let the vmd driver select
>> > that option rather than adding/ having a lock for the same purpose?
>> 
>> The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think
>> the vmd driver can require it be turned off. Besides, no need to punish
>> all PCI access if only this device requires it be serialized.
>
>Sorry, I also missed that and induced Ryo Takakura to rewrite the patch.

My apologies I missed that as well.
I'll send the first version of the patch once again!

>Luis

Sincerely,
Ryo Takakura
Ryo Takakura Dec. 19, 2024, 12:55 a.m. UTC | #8
Hi Keith,

On Wed, 18 Dec 2024 08:36:54 -0700, Keith Busch wrote:
>> @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
>>  {
>>  	struct vmd_dev *vmd = vmd_from_bus(bus);
>>  	void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
>> -	unsigned long flags;
>>  	int ret = 0;
>>  
>>  	if (!addr)
>>  		return -EFAULT;
>>  
>> -	spin_lock_irqsave(&vmd->cfg_lock, flags);
>>  	switch (len) {
>>  	case 1:
>>  		*value = readb(addr);
>
>There's a comment above this function explaining the need for the lock,
>which doesn't make a lot of sense after this patch.

Thanks for pointing it out! It slipped my mind.
I'll leave it as it is for the next patch :)

Sincerely,
Ryo Takakura
diff mbox series

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 9d9596947..2be898424 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -125,7 +125,6 @@  struct vmd_irq_list {
 struct vmd_dev {
 	struct pci_dev		*dev;
 
-	spinlock_t		cfg_lock;
 	void __iomem		*cfgbar;
 
 	int msix_count;
@@ -385,13 +384,11 @@  static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
 {
 	struct vmd_dev *vmd = vmd_from_bus(bus);
 	void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
-	unsigned long flags;
 	int ret = 0;
 
 	if (!addr)
 		return -EFAULT;
 
-	spin_lock_irqsave(&vmd->cfg_lock, flags);
 	switch (len) {
 	case 1:
 		*value = readb(addr);
@@ -406,7 +403,6 @@  static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
 		ret = -EINVAL;
 		break;
 	}
-	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
 	return ret;
 }
 
@@ -420,13 +416,11 @@  static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg,
 {
 	struct vmd_dev *vmd = vmd_from_bus(bus);
 	void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
-	unsigned long flags;
 	int ret = 0;
 
 	if (!addr)
 		return -EFAULT;
 
-	spin_lock_irqsave(&vmd->cfg_lock, flags);
 	switch (len) {
 	case 1:
 		writeb(value, addr);
@@ -444,7 +438,6 @@  static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg,
 		ret = -EINVAL;
 		break;
 	}
-	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
 	return ret;
 }
 
@@ -1009,7 +1002,6 @@  static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	if (features & VMD_FEAT_OFFSET_FIRST_VECTOR)
 		vmd->first_vec = 1;
 
-	spin_lock_init(&vmd->cfg_lock);
 	pci_set_drvdata(dev, vmd);
 	err = vmd_enable_domain(vmd, features);
 	if (err)