Message ID | 20241218115951.83062-1-ryotkkr98@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Krzysztof WilczyĆski |
Headers | show |
Series | [v2] PCI: vmd: Fix spinlock usage on config access for RT kernel | expand |
On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: > PCI config access is locked with pci_lock which serializes > pci_user/bus_write_config*() and pci_user/bus_read_config*(). > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also > serialized as they are only invoked by them respectively. > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() > for their serialization as its already serialized by pci_lock. > > This fixes the spinlock_t(cfg_lock) usage within > raw_spinlock_t(pci_lock) on RT kernels where spinlock_t becomes > sleepable. > > Reported as following: > [ 18.756807] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > [ 18.756810] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1617, name: nodedev-init > [ 18.756810] preempt_count: 1, expected: 0 > [ 18.756811] RCU nest depth: 0, expected: 0 > [ 18.756811] INFO: lockdep is turned off. > [ 18.756812] irq event stamp: 0 > [ 18.756812] hardirqs last enabled at (0): [<0000000000000000>] 0x0 > [ 18.756815] hardirqs last disabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90 > [ 18.756819] softirqs last enabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90 > [ 18.756820] softirqs last disabled at (0): [<0000000000000000>] 0x0 > [ 18.756822] CPU: 3 UID: 0 PID: 1617 Comm: nodedev-init Tainted: G W 6.12.1 #11 > [ 18.756823] Tainted: [W]=WARN > [ 18.756824] Hardware name: Dell Inc. Vostro 3710/0K1D6X, BIOS 1.14.0 06/09/2023 > [ 18.756825] Call Trace: > [ 18.756826] <TASK> > [ 18.756827] dump_stack_lvl+0x9b/0xf0 > [ 18.756830] dump_stack+0x10/0x20 > [ 18.756831] __might_resched+0x158/0x230 > [ 18.756833] rt_spin_lock+0x4e/0x130 > [ 18.756837] ? vmd_pci_read+0x8d/0x100 [vmd] > [ 18.756839] vmd_pci_read+0x8d/0x100 [vmd] > [ 18.756840] pci_user_read_config_byte+0x6f/0xe0 > [ 18.756843] pci_read_config+0xfe/0x290 > [ 18.756845] sysfs_kf_bin_read+0x68/0x90 > [ 18.756847] kernfs_fop_read_iter+0xd7/0x200 > [ 18.756850] vfs_read+0x26d/0x360 > [ 18.756853] ksys_read+0x70/0xf0 > [ 18.756855] __x64_sys_read+0x1a/0x20 > [ 18.756857] x64_sys_call+0x1715/0x20d0 > [ 18.756859] do_syscall_64+0x8f/0x170 > [ 18.756861] ? syscall_exit_to_user_mode+0xcd/0x2c0 > [ 18.756863] ? do_syscall_64+0x9b/0x170 > [ 18.756865] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> > --- Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Best regards, Luis > Thanks Luis for feedback! > > Changes since v1: > https://lore.kernel.org/lkml/20241215141321.383144-1-ryotkkr98@gmail.com/T/ > > - Remove cfg_lock instead of converting it to raw spinlock as suggested > by Sebastian[0]. > > [0] https://lore.kernel.org/linux-rt-users/20230620154434.MosrzRUh@linutronix.de/ > > --- > drivers/pci/controller/vmd.c | 8 -------- > 1 file changed, 8 deletions(-) > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index 9d9596947..2be898424 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -125,7 +125,6 @@ struct vmd_irq_list { > struct vmd_dev { > struct pci_dev *dev; > > - spinlock_t cfg_lock; > void __iomem *cfgbar; > > int msix_count; > @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg, > { > struct vmd_dev *vmd = vmd_from_bus(bus); > void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len); > - unsigned long flags; > int ret = 0; > > if (!addr) > return -EFAULT; > > - spin_lock_irqsave(&vmd->cfg_lock, flags); > switch (len) { > case 1: > *value = readb(addr); > @@ -406,7 +403,6 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg, > ret = -EINVAL; > break; > } > - spin_unlock_irqrestore(&vmd->cfg_lock, flags); > return ret; > } > > @@ -420,13 +416,11 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg, > { > struct vmd_dev *vmd = vmd_from_bus(bus); > void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len); > - unsigned long flags; > int ret = 0; > > if (!addr) > return -EFAULT; > > - spin_lock_irqsave(&vmd->cfg_lock, flags); > switch (len) { > case 1: > writeb(value, addr); > @@ -444,7 +438,6 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg, > ret = -EINVAL; > break; > } > - spin_unlock_irqrestore(&vmd->cfg_lock, flags); > return ret; > } > > @@ -1009,7 +1002,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) > if (features & VMD_FEAT_OFFSET_FIRST_VECTOR) > vmd->first_vec = 1; > > - spin_lock_init(&vmd->cfg_lock); > pci_set_drvdata(dev, vmd); > err = vmd_enable_domain(vmd, features); > if (err) > -- > 2.34.1 > ---end quoted text---
On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: > PCI config access is locked with pci_lock which serializes > pci_user/bus_write_config*() and pci_user/bus_read_config*(). > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also > serialized as they are only invoked by them respectively. > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() > for their serialization as its already serialized by pci_lock. That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock won't help with concurrent kernel config access in such a setup. I think the previous change to raw lock proposal was the correct approach. > @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg, > { > struct vmd_dev *vmd = vmd_from_bus(bus); > void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len); > - unsigned long flags; > int ret = 0; > > if (!addr) > return -EFAULT; > > - spin_lock_irqsave(&vmd->cfg_lock, flags); > switch (len) { > case 1: > *value = readb(addr); There's a comment above this function explaining the need for the lock, which doesn't make a lot of sense after this patch.
On 2024-12-18 08:36:54 [-0700], Keith Busch wrote: > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: > > PCI config access is locked with pci_lock which serializes > > pci_user/bus_write_config*() and pci_user/bus_read_config*(). > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also > > serialized as they are only invoked by them respectively. > > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() > > for their serialization as its already serialized by pci_lock. > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock > won't help with concurrent kernel config access in such a setup. I think > the previous change to raw lock proposal was the correct approach. I overlooked that. Wouldn't it make sense to let the vmd driver select that option rather than adding/ having a lock for the same purpose? Sebastian
On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote: > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote: > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: > > > PCI config access is locked with pci_lock which serializes > > > pci_user/bus_write_config*() and pci_user/bus_read_config*(). > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also > > > serialized as they are only invoked by them respectively. > > > > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() > > > for their serialization as its already serialized by pci_lock. > > > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock > > won't help with concurrent kernel config access in such a setup. I think > > the previous change to raw lock proposal was the correct approach. > > I overlooked that. Wouldn't it make sense to let the vmd driver select > that option rather than adding/ having a lock for the same purpose? The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think the vmd driver can require it be turned off. Besides, no need to punish all PCI access if only this device requires it be serialized.
On 2024-12-18 08:57:53 [-0700], Keith Busch wrote: > On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote: > > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote: > > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: > > > > PCI config access is locked with pci_lock which serializes > > > > pci_user/bus_write_config*() and pci_user/bus_read_config*(). > > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also > > > > serialized as they are only invoked by them respectively. > > > > > > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() > > > > for their serialization as its already serialized by pci_lock. > > > > > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock > > > won't help with concurrent kernel config access in such a setup. I think > > > the previous change to raw lock proposal was the correct approach. > > > > I overlooked that. Wouldn't it make sense to let the vmd driver select > > that option rather than adding/ having a lock for the same purpose? > > The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think > the vmd driver can require it be turned off. Besides, no need to punish > all PCI access if only this device requires it be serialized. Okay. That makes sense. Sebastian
On Wed, Dec 18, 2024 at 08:57:53AM -0700, Keith Busch wrote: > On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote: > > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote: > > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: > > > > PCI config access is locked with pci_lock which serializes > > > > pci_user/bus_write_config*() and pci_user/bus_read_config*(). > > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also > > > > serialized as they are only invoked by them respectively. > > > > > > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() > > > > for their serialization as its already serialized by pci_lock. > > > > > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock > > > won't help with concurrent kernel config access in such a setup. I think > > > the previous change to raw lock proposal was the correct approach. > > > > I overlooked that. Wouldn't it make sense to let the vmd driver select > > that option rather than adding/ having a lock for the same purpose? > > The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think > the vmd driver can require it be turned off. Besides, no need to punish > all PCI access if only this device requires it be serialized. Sorry, I also missed that and induced Ryo Takakura to rewrite the patch. Luis
Hi all! on Wed, 18 Dec 2024 13:35:07 -0300, Luis Claudio R. Goncalves wrote: >On Wed, Dec 18, 2024 at 08:57:53AM -0700, Keith Busch wrote: >> On Wed, Dec 18, 2024 at 04:48:38PM +0100, Sebastian Andrzej Siewior wrote: >> > On 2024-12-18 08:36:54 [-0700], Keith Busch wrote: >> > > On Wed, Dec 18, 2024 at 08:59:51PM +0900, Ryo Takakura wrote: >> > > > PCI config access is locked with pci_lock which serializes >> > > > pci_user/bus_write_config*() and pci_user/bus_read_config*(). >> > > > The subsequently invoked vmd_pci_write() and vmd_pci_read() are also >> > > > serialized as they are only invoked by them respectively. >> > > > >> > > > Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() >> > > > for their serialization as its already serialized by pci_lock. >> > > >> > > That's only true if CONFIG_PCI_LOCKLESS_CONFIG isn't set, so pci_lock >> > > won't help with concurrent kernel config access in such a setup. I think >> > > the previous change to raw lock proposal was the correct approach. >> > >> > I overlooked that. Wouldn't it make sense to let the vmd driver select >> > that option rather than adding/ having a lock for the same purpose? >> >> The arch/x86/Kconfig always selects PCI_LOCKESS_CONFIG, so I don't think >> the vmd driver can require it be turned off. Besides, no need to punish >> all PCI access if only this device requires it be serialized. > >Sorry, I also missed that and induced Ryo Takakura to rewrite the patch. My apologies I missed that as well. I'll send the first version of the patch once again! >Luis Sincerely, Ryo Takakura
Hi Keith, On Wed, 18 Dec 2024 08:36:54 -0700, Keith Busch wrote: >> @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg, >> { >> struct vmd_dev *vmd = vmd_from_bus(bus); >> void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len); >> - unsigned long flags; >> int ret = 0; >> >> if (!addr) >> return -EFAULT; >> >> - spin_lock_irqsave(&vmd->cfg_lock, flags); >> switch (len) { >> case 1: >> *value = readb(addr); > >There's a comment above this function explaining the need for the lock, >which doesn't make a lot of sense after this patch. Thanks for pointing it out! It slipped my mind. I'll leave it as it is for the next patch :) Sincerely, Ryo Takakura
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 9d9596947..2be898424 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -125,7 +125,6 @@ struct vmd_irq_list { struct vmd_dev { struct pci_dev *dev; - spinlock_t cfg_lock; void __iomem *cfgbar; int msix_count; @@ -385,13 +384,11 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg, { struct vmd_dev *vmd = vmd_from_bus(bus); void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len); - unsigned long flags; int ret = 0; if (!addr) return -EFAULT; - spin_lock_irqsave(&vmd->cfg_lock, flags); switch (len) { case 1: *value = readb(addr); @@ -406,7 +403,6 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg, ret = -EINVAL; break; } - spin_unlock_irqrestore(&vmd->cfg_lock, flags); return ret; } @@ -420,13 +416,11 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg, { struct vmd_dev *vmd = vmd_from_bus(bus); void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len); - unsigned long flags; int ret = 0; if (!addr) return -EFAULT; - spin_lock_irqsave(&vmd->cfg_lock, flags); switch (len) { case 1: writeb(value, addr); @@ -444,7 +438,6 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg, ret = -EINVAL; break; } - spin_unlock_irqrestore(&vmd->cfg_lock, flags); return ret; } @@ -1009,7 +1002,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) if (features & VMD_FEAT_OFFSET_FIRST_VECTOR) vmd->first_vec = 1; - spin_lock_init(&vmd->cfg_lock); pci_set_drvdata(dev, vmd); err = vmd_enable_domain(vmd, features); if (err)
PCI config access is locked with pci_lock which serializes pci_user/bus_write_config*() and pci_user/bus_read_config*(). The subsequently invoked vmd_pci_write() and vmd_pci_read() are also serialized as they are only invoked by them respectively. Remove cfg_lock which is taken by vmd_pci_write() and vmd_pci_read() for their serialization as its already serialized by pci_lock. This fixes the spinlock_t(cfg_lock) usage within raw_spinlock_t(pci_lock) on RT kernels where spinlock_t becomes sleepable. Reported as following: [ 18.756807] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [ 18.756810] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1617, name: nodedev-init [ 18.756810] preempt_count: 1, expected: 0 [ 18.756811] RCU nest depth: 0, expected: 0 [ 18.756811] INFO: lockdep is turned off. [ 18.756812] irq event stamp: 0 [ 18.756812] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 18.756815] hardirqs last disabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90 [ 18.756819] softirqs last enabled at (0): [<ffffffff864f1fe2>] copy_process+0xa62/0x2d90 [ 18.756820] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 18.756822] CPU: 3 UID: 0 PID: 1617 Comm: nodedev-init Tainted: G W 6.12.1 #11 [ 18.756823] Tainted: [W]=WARN [ 18.756824] Hardware name: Dell Inc. Vostro 3710/0K1D6X, BIOS 1.14.0 06/09/2023 [ 18.756825] Call Trace: [ 18.756826] <TASK> [ 18.756827] dump_stack_lvl+0x9b/0xf0 [ 18.756830] dump_stack+0x10/0x20 [ 18.756831] __might_resched+0x158/0x230 [ 18.756833] rt_spin_lock+0x4e/0x130 [ 18.756837] ? vmd_pci_read+0x8d/0x100 [vmd] [ 18.756839] vmd_pci_read+0x8d/0x100 [vmd] [ 18.756840] pci_user_read_config_byte+0x6f/0xe0 [ 18.756843] pci_read_config+0xfe/0x290 [ 18.756845] sysfs_kf_bin_read+0x68/0x90 [ 18.756847] kernfs_fop_read_iter+0xd7/0x200 [ 18.756850] vfs_read+0x26d/0x360 [ 18.756853] ksys_read+0x70/0xf0 [ 18.756855] __x64_sys_read+0x1a/0x20 [ 18.756857] x64_sys_call+0x1715/0x20d0 [ 18.756859] do_syscall_64+0x8f/0x170 [ 18.756861] ? syscall_exit_to_user_mode+0xcd/0x2c0 [ 18.756863] ? do_syscall_64+0x9b/0x170 [ 18.756865] entry_SYSCALL_64_after_hwframe+0x76/0x7e Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> --- Thanks Luis for feedback! Changes since v1: https://lore.kernel.org/lkml/20241215141321.383144-1-ryotkkr98@gmail.com/T/ - Remove cfg_lock instead of converting it to raw spinlock as suggested by Sebastian[0]. [0] https://lore.kernel.org/linux-rt-users/20230620154434.MosrzRUh@linutronix.de/ --- drivers/pci/controller/vmd.c | 8 -------- 1 file changed, 8 deletions(-)