Message ID: 1457529455-38314-2-git-send-email-quan.xu@intel.com (mailing list archive)
State: New, archived
On Wed, 2016-03-09 at 21:17 +0800, Quan Xu wrote:
> pcidevs_lock should be held with interrupt enabled.
>
There's a message from Jan when he says:

<<Well, I'd say something like "pcidevs_lock doesn't require interrupts
to be disabled while being acquired".>>

:-O

> However there remains
> an exception in AMD IOMMU code, where the lock is acquired with interrupt
> disabled. This inconsistency might lead to deadlock.
>
I appreciate that 'might' is weaker than 'can'. Personally, I still don't
find this correct, or at least clear enough, as far as this patch is
concerned, but I won't be in the way because of this.

> The fix is straightforward to use spin_lock instead. Also interrupt has been
> enabled when this function is invoked, so we're sure consistency around
> pcidevs_lock can be guaranteed after this fix.
>
And I also can't really get what "so we're sure consistency around
pcidevs_lock can be guaranteed after this fix" actually means, but again,
that's probably me, and it's fine.

However,

> Signed-off-by: Quan Xu <quan.xu@intel.com>
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>
This still stands **only** if the very first sentence "pcidevs_lock should
be held with interrupt enabled" is changed to "pcidevs_lock doesn't require
interrupts to be disabled while being acquired".

Regards,
Dario
CC Kevin,

On March 09, 2016 11:00pm, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-03-09 at 21:17 +0800, Quan Xu wrote:
> > pcidevs_lock should be held with interrupt enabled.
> >
> There's a message from Jan when he says:
> <<Well, I'd say something like "pcidevs_lock doesn't require interrupts to be
> disabled while being acquired".>>
>
> :-O
>
> > However there remains
> > an exception in AMD IOMMU code, where the lock is acquired with
> > interrupt disabled. This inconsistency might lead to deadlock.
> >
> I appreciate that 'might' is weaker than 'can'. Personally, I still don't find
> this correct, or at least clear enough, as far as this patch is concerned, but
> I won't be in the way because of this.
>
> > The fix is straightforward to use spin_lock instead. Also interrupt
> > has been enabled when this function is invoked, so we're sure
> > consistency around pcidevs_lock can be guaranteed after this fix.
> >
> And I also can't really get what "so we're sure consistency around pcidevs_lock
> can be guaranteed after this fix" actually means, but again, that's probably me,
> and it's fine.
>
> However,
>
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> > Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> >
> This still stands **only** if the very first sentence "pcidevs_lock should be
> held with interrupt enabled" is changed to "pcidevs_lock doesn't require
> interrupts to be disabled while being acquired".
>

Dario, I am open for this change :). When I read:

 1. (look at "Lesson 3: spinlocks revisited.")
    https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
 2. the comment inside check_lock(), in xen/common/spinlock.c, in Xen's codebase.
 3. the "Linux Device Drivers, 3rd Edition" book,
    http://www.makelinux.net/ldd3/chp-5-sect-5

I found that:
 - "We partition locks into IRQ-safe (__always__ held with IRQs disabled) and
   IRQ-unsafe (__always__ held with IRQs enabled) types".
   It looks like Kevin's description is better.

I also found that you mentioned in this thread:
"... __except for very special situations__". If that is the case, I think
your description is better.

Thanks for your time..:):)

Quan
Hi Quan,

Can I ask a dumb question?

On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> pcidevs_lock should be held with interrupt enabled. However there remains
> an exception in AMD IOMMU code, where the lock is acquired with interrupt
> disabled. This inconsistency might lead to deadlock.

Why will this inconsistency lead to deadlock?

I understand the difference between spin_lock_irqsave(), which disables
interrupts, and spin_lock(), which allows them; however, I didn't get why
using spin_lock_irqsave() in place of spin_lock() could potentially lead
to a deadlock?

Would you mind pointing me to somewhere I can find the reason, or
enlightening me?

Thank you very much!

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > pcidevs_lock should be held with interrupt enabled. However there
> > remains an exception in AMD IOMMU code, where the lock is acquired
> > with interrupt disabled. This inconsistency might lead to deadlock.
>
> Why will this inconsistency lead to deadlock?
> I understand the difference between spin_lock_irqsave(), which disables
> interrupts, and spin_lock(), which allows them; however, I didn't get why
> using spin_lock_irqsave() in place of spin_lock() could potentially lead
> to a deadlock?

For the above 2 questions:
Mixing is just one of the _possible_ scenarios (IMO, not 100% leading to a
deadlock):
 - where an interrupt tries to lock an already locked variable. This is ok
   if the other interrupt happens on another CPU, but it is _not_ ok if the
   interrupt happens on the same CPU that already holds the lock.

I got it from the below 2 points:

1). As Jan mentioned, the implication from disabling interrupts while
acquiring a lock is that the lock is also being acquired by some interrupt
handler. If you mix acquire types, the one not disabling interrupts is prone
to be interrupted, and the interrupt may try to get hold of the lock the
same CPU already owns.

2). Comment inside check_lock():
we partition locks into IRQ-safe (always held with IRQs disabled) and
IRQ-unsafe (always held with IRQs enabled) types. The convention for every
lock must be consistently observed, else we can deadlock in IRQ-context
rendezvous functions (__a rendezvous which gets every CPU into IRQ context
before any CPU is released from the rendezvous__).

If we can mix IRQ-disabled and IRQ-enabled callers, the following can happen:
 * Lock is held by CPU A, with IRQs enabled
 * CPU B is spinning on same lock, with IRQs disabled
 * Rendezvous starts -- CPU A takes interrupt and enters rendezvous spin
 * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never exit
   the rendezvous, and will hence never release the lock.

To guard against this subtle bug we latch the IRQ safety of every spinlock
in the system, on first use.

> Would you mind pointing me to somewhere I can find the reason, or
> enlightening me?
>
Some reference material:
 * (look at "Lesson 3: spinlocks revisited.")
   https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
 * the comment inside check_lock(), in xen/common/spinlock.c, in Xen's codebase.
 * the "Linux Device Drivers, 3rd Edition" book,
   http://www.makelinux.net/ldd3/chp-5-sect-5

Quan
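[To make the same-CPU scenario described above concrete, here is a minimal
sketch assuming Xen's spinlock API (xen/spinlock.h); the lock and both
functions are hypothetical, not code from the patch under discussion.]

    /*
     * Minimal sketch of the same-CPU scenario above; assumes Xen's spinlock
     * API (xen/spinlock.h). The lock and both functions are hypothetical.
     */
    static DEFINE_SPINLOCK(example_lock);

    /* Runs in IRQ context (e.g. called from an interrupt handler). */
    static void example_irq_path(void)
    {
        spin_lock(&example_lock);   /* spins forever if the interrupted CPU
                                     * already holds example_lock */
        /* ... */
        spin_unlock(&example_lock);
    }

    /* Runs in normal (non-IRQ) context. */
    static void example_normal_path(void)
    {
        unsigned long flags;

        /*
         * WRONG if example_irq_path() can run on this CPU: an interrupt
         * arriving inside the critical section spins on a lock this very
         * CPU holds -- the same-CPU deadlock described above.
         */
        spin_lock(&example_lock);
        /* ... critical section ... */
        spin_unlock(&example_lock);

        /*
         * SAFE: with IRQs disabled while the lock is held, the interrupt
         * is deferred until after spin_unlock_irqrestore().
         */
        spin_lock_irqsave(&example_lock, flags);
        /* ... critical section ... */
        spin_unlock_irqrestore(&example_lock, flags);
    }

[The irqsave/irqrestore pair only matters if the lock really can be taken
from IRQ context, which is exactly the question the thread turns on for
pcidevs_lock.]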
On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> >
> > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > >
> > > pcidevs_lock should be held with interrupt enabled. However there
> > > remains an exception in AMD IOMMU code, where the lock is acquired
> > > with interrupt disabled. This inconsistency might lead to deadlock.
> > Why will this inconsistency lead to deadlock?
> > I understand the difference between spin_lock_irqsave(), which disables
> > interrupts, and spin_lock(), which allows them; however, I didn't get
> > why using spin_lock_irqsave() in place of spin_lock() could potentially
> > lead to a deadlock?
>
> 1). As Jan mentioned, the implication from disabling interrupts while
> acquiring a lock is that the lock is also being acquired by some
> interrupt handler.
> If you mix acquire types, the one not disabling interrupts is prone
> to be interrupted, and the interrupt may try to get hold of the lock
> the same CPU already owns.
>
The key issue is whether or not a lock can be acquired from IRQ context
(i.e., in an interrupt handler, or a function called by that, as a
result of an interrupt occurring).

For locks that can, the IRQ-disabling variants must be used *everywhere*
the lock is taken (so, e.g., not only when it is actually taken from IRQ
context, just *always*!).

If that rule is not honored, we are ok if the lock is taken on CPUA, and
the interrupt handled on CPUB:

    CPUA                CPUB
    .                   .
    .                   .
    spin_lock(l)        .
    .                   .
    .                   . <-- Interrupt!
    .                   .
    .                   spin_lock(l); //spins on l
    .                   x             //spins on l
    .                   x             //spins on l
    spin_unlock(l)      .             //takes l
    .                   .
    .                   spin_unlock(l);
    .                   . <--
    .                   .

but the following can happen, if the interrupt is delivered to the CPU
that has already taken the lock:

    CPU
    .
    .
[1] spin_lock(l); //takes l
    .
    . <-- Interrupt!
    .
    spin_lock(l) // CPU deadlocks

If, in the latter case, spin_lock_irq(l) were used at [1], things would
have been fine, as the interrupt wouldn't have occurred until l were
released:

    CPU
    .
    .
    spin_lock_irq(l)    //takes l
    .
    .                   <---------------- Interrupt!
    .                   -                 //IRQs are disabled
    .                   -                 //IRQs are disabled
    .                   -                 //IRQs are disabled
    spin_unlock_irq(l)  .                 //IRQs re-enabled
    .                   spin_lock_irq(l); //takes l
    .                   .
    .                   spin_unlock_irq(l);
    .                   <-----------------
    .
    .

Note that whether spin_lock_irq() or spin_lock_irqsave() should be used
is completely independent from this, and must be decided according to
whether IRQs are already disabled or not when taking the lock. And (quite
obviously, but since we're here) it is fine to mix spin_lock_irq() and
spin_lock_irqsave(), as they both disable interrupts, which is what
matters.

So, now, if we know for sure that a lock is _never_ever_ever_ taken from
interrupt context, can we mix spin_lock() and spin_lock_irq() on it (for
whatever reason)? Well, as far as the above reasoning is concerned, yes.
In fact, the deadlock arises because IRQs interrupt asynchronously what
the CPU is doing, and that can happen when the CPU has taken the lock
already. But if the 'asynchronous' part goes away, we really don't care
whether a lock is taken at time t1 with IRQs enabled, and at time t2 with
IRQs disabled, don't you think?

Well, here is where what the comment inside check_lock() describes comes
into play. As quoted by Quan already:

> 2). Comment inside check_lock():
> we partition locks into IRQ-safe (always held with IRQs disabled) and
> IRQ-unsafe (always held with IRQs enabled) types. The convention for
> every lock must be consistently observed else we can deadlock in
> IRQ-context rendezvous functions (__a rendezvous which gets every CPU
> into IRQ context before any CPU is released from the rendezvous__).
> If we can mix IRQ-disabled and IRQ-enabled callers, the following can
> happen:
>  * Lock is held by CPU A, with IRQs enabled
>  * CPU B is spinning on same lock, with IRQs disabled
>  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous spin
>  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never exit
>    the rendezvous, and will hence never release the lock.
>
> To guard against this subtle bug we latch the IRQ safety of every
> spinlock in the system, on first use.
>
This is a very good safety measure, but it can be quite a restriction in
some (luckily limited) cases. And that's why it is possible -- although
absolutely discouraged -- to relax it. See, for an example of this, the
comment in start_secondary(), in xen/arch/x86/smpboot.c:

    ...
    /*
     * Just as during early bootstrap, it is convenient here to disable
     * spinlock checking while we have IRQs disabled. This allows us to
     * acquire IRQ-unsafe locks when it would otherwise be disallowed.
     *
     * It is safe because the race we are usually trying to avoid involves
     * a group of CPUs rendezvousing in an IPI handler, where one cannot
     * join because it is spinning with IRQs disabled waiting to acquire a
     * lock held by another in the rendezvous group (the lock must be an
     * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
     * hence had IRQs enabled). This is a deadlock scenario.
     *
     * However, no CPU can be involved in rendezvous until it is online,
     * hence no such group can be waiting for this CPU until it is
     * visible in cpu_online_map. Hence such a deadlock is not possible.
     */
    spin_debug_disable();
    ...

Just FTR, at least as far as I can remember, Linux does not enforce
anything like that.

Hope this clarifies things.

Regards,
Dario
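[The latching behaviour referred to in that comment can be pictured roughly
as follows; this is only a conceptual sketch with made-up names, not the
actual check_lock() code in xen/common/spinlock.c.]

    /*
     * Conceptual sketch of the IRQ-safety latching described above; the
     * structure and names are made up, this is not the real check_lock().
     */
    struct lock_debug_sketch {
        int irq_safe;   /* -1: not latched yet, 0: IRQ-unsafe, 1: IRQ-safe */
    };

    static void check_lock_sketch(struct lock_debug_sketch *debug)
    {
        int irq_safe = !local_irq_is_enabled();

        if ( debug->irq_safe == -1 )
            debug->irq_safe = irq_safe;          /* latch on first use */
        else
            BUG_ON(debug->irq_safe != irq_safe); /* mixed IRQ-safe/unsafe use */
    }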
On March 11, 2016 6:36pm, <Dario Faggioli> wrote:
> On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> > On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> > >
> > > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > > >
> > > > pcidevs_lock should be held with interrupt enabled. However there
> > > > remains an exception in AMD IOMMU code, where the lock is acquired
> > > > with interrupt disabled. This inconsistency might lead to deadlock.
> > > Why will this inconsistency lead to deadlock?
> > > I understand the difference between spin_lock_irqsave(), which disables
> > > interrupts, and spin_lock(), which allows them; however, I didn't get
> > > why using spin_lock_irqsave() in place of spin_lock() could potentially
> > > lead to a deadlock?
> >
> > 1). As Jan mentioned, the implication from disabling interrupts while
> > acquiring a lock is that the lock is also being acquired by some
> > interrupt handler.
> > If you mix acquire types, the one not disabling interrupts is prone
> > to be interrupted, and the interrupt may try to get hold of the lock
> > the same CPU already owns.
> >
> The key issue is whether or not a lock can be acquired from IRQ context
> (i.e., in an interrupt handler, or a function called by that, as a result
> of an interrupt occurring).
>
> For locks that can, the IRQ-disabling variants must be used *everywhere*
> the lock is taken (so, e.g., not only when it is actually taken from IRQ
> context, just *always*!).
>
> If that rule is not honored, we are ok if the lock is taken on CPUA, and
> the interrupt handled on CPUB:
>
>     CPUA                CPUB
>     .                   .
>     .                   .
>     spin_lock(l)        .
>     .                   .
>     .                   . <-- Interrupt!
>     .                   .
>     .                   spin_lock(l); //spins on l
>     .                   x             //spins on l
>     .                   x             //spins on l
>     spin_unlock(l)      .             //takes l
>     .                   .
>     .                   spin_unlock(l);
>     .                   . <--
>     .                   .
>
> but the following can happen, if the interrupt is delivered to the CPU
> that has already taken the lock:
>
>     CPU
>     .
>     .
> [1] spin_lock(l); //takes l
>     .
>     . <-- Interrupt!
>     .
>     spin_lock(l) // CPU deadlocks
>
> If, in the latter case, spin_lock_irq(l) were used at [1], things would
> have been fine, as the interrupt wouldn't have occurred until l were
> released:
>
>     CPU
>     .
>     .
>     spin_lock_irq(l)    //takes l
>     .
>     .                   <---------------- Interrupt!
>     .                   -                 //IRQs are disabled
>     .                   -                 //IRQs are disabled
>     .                   -                 //IRQs are disabled
>     spin_unlock_irq(l)  .                 //IRQs re-enabled
>     .                   spin_lock_irq(l); //takes l
>     .                   .
>     .                   spin_unlock_irq(l);
>     .                   <-----------------
>     .
>
> Note that whether spin_lock_irq() or spin_lock_irqsave() should be used
> is completely independent from this, and must be decided according to
> whether IRQs are already disabled or not when taking the lock. And (quite
> obviously, but since we're here) it is fine to mix spin_lock_irq() and
> spin_lock_irqsave(), as they both disable interrupts, which is what
> matters.
>
> So, now, if we know for sure that a lock is _never_ever_ever_ taken from
> interrupt context, can we mix spin_lock() and spin_lock_irq() on it (for
> whatever reason)? Well, as far as the above reasoning is concerned, yes.
>

I appreciate Dario's explanation.
btw, be careful of spin_lock_irq(), which includes
'ASSERT(local_irq_is_enabled())'..

> In fact, the deadlock arises because IRQs interrupt asynchronously what
> the CPU is doing, and that can happen when the CPU has taken the lock
> already. But if the 'asynchronous' part goes away, we really don't care
> whether a lock is taken at time t1 with IRQs enabled, and at time t2 with
> IRQs disabled, don't you think?
>

Yes.
Consistency may be helpful to avoid some easy-to-avoid lock errors.
Moreover, without my fix, I think it would not lead to deadlock, as
pcidevs_lock is not being taken in IRQ context. Right?

For deadlock, I think the key problems are:
 - a lock can be acquired from IRQ context;
 - the interrupt is delivered to the _same_CPU_ that already holds the lock.

Quan

> Well, here is where what the comment inside check_lock() describes comes
> into play. As quoted by Quan already:
>
> > 2). Comment inside check_lock():
> > we partition locks into IRQ-safe (always held with IRQs disabled) and
> > IRQ-unsafe (always held with IRQs enabled) types. The convention for
> > every lock must be consistently observed else we can deadlock in
> > IRQ-context rendezvous functions (__a rendezvous which gets every CPU
> > into IRQ context before any CPU is released from the rendezvous__).
> > If we can mix IRQ-disabled and IRQ-enabled callers, the following can
> > happen:
> >  * Lock is held by CPU A, with IRQs enabled
> >  * CPU B is spinning on same lock, with IRQs disabled
> >  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous spin
> >  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never exit
> >    the rendezvous, and will hence never release the lock.
> >
> > To guard against this subtle bug we latch the IRQ safety of every
> > spinlock in the system, on first use.
> >
> This is a very good safety measure, but it can be quite a restriction in
> some (luckily limited) cases. And that's why it is possible -- although
> absolutely discouraged -- to relax it. See, for an example of this, the
> comment in start_secondary(), in xen/arch/x86/smpboot.c:
>
>     ...
>     /*
>      * Just as during early bootstrap, it is convenient here to disable
>      * spinlock checking while we have IRQs disabled. This allows us to
>      * acquire IRQ-unsafe locks when it would otherwise be disallowed.
>      *
>      * It is safe because the race we are usually trying to avoid involves
>      * a group of CPUs rendezvousing in an IPI handler, where one cannot
>      * join because it is spinning with IRQs disabled waiting to acquire a
>      * lock held by another in the rendezvous group (the lock must be an
>      * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
>      * hence had IRQs enabled). This is a deadlock scenario.
>      *
>      * However, no CPU can be involved in rendezvous until it is online,
>      * hence no such group can be waiting for this CPU until it is
>      * visible in cpu_online_map. Hence such a deadlock is not possible.
>      */
>     spin_debug_disable();
>     ...
>
> Just FTR, at least as far as I can remember, Linux does not enforce
> anything like that.
>
> Hope this clarifies things.
On Fri, 2016-03-11 at 12:36 +0000, Xu, Quan wrote:
> On March 11, 2016 6:36pm, <Dario Faggioli> wrote:
> > On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> >
> > So, now, if we know for sure that a lock is _never_ever_ever_ taken
> > from interrupt context, can we mix spin_lock() and spin_lock_irq() on
> > it (for whatever reason)? Well, as far as the above reasoning is
> > concerned, yes.
> >
> I appreciate Dario's explanation.
> btw, be careful of spin_lock_irq(), which includes
> 'ASSERT(local_irq_is_enabled())'..
>
It does. What about it?

That is exactly what marks the difference between spin_lock_irq() and
spin_lock_irqsave(). In fact, the former should not be used if interrupts
are already disabled because then, when calling the corresponding
spin_unlock_irq(), they'd be re-enabled while instead they shouldn't.

But as said, from the point of view of preventing deadlock, for locks
taken both from inside and outside IRQ context, they're equivalent.

> > In fact, the deadlock arises because IRQs interrupt asynchronously
> > what the CPU is doing, and that can happen when the CPU has taken the
> > lock already. But if the 'asynchronous' part goes away, we really
> > don't care whether a lock is taken at time t1 with IRQs enabled, and
> > at time t2 with IRQs disabled, don't you think?
> >
> Yes.
> Consistency may be helpful to avoid some easy-to-avoid lock errors.
> Moreover, without my fix, I think it would not lead to deadlock, as
> pcidevs_lock is not being taken in IRQ context. Right?
>
> For deadlock, I think the key problems are:
>  - a lock can be acquired from IRQ context;
>  - the interrupt is delivered to the _same_CPU_ that already holds the lock.
>
In your case, pcidevs_lock is certainly not being taken from both inside
and outside IRQ context. If it were, using spin_lock() --as it happens
basically everywhere in the code-- would be wrong, and using
spin_lock_irq[save]() --as it happens in the only case you're patching--
would be what would be right!

Regards,
Dario
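[A minimal sketch of the distinction just described, using Xen's spinlock
API; the lock and the two callers are hypothetical.]

    /* Sketch only: demo_lock and both callers are hypothetical. */
    static DEFINE_SPINLOCK(demo_lock);

    /* Caller that knows IRQs are currently enabled. */
    static void caller_irqs_enabled(void)
    {
        /*
         * spin_lock_irq() asserts that IRQs are enabled on entry, and the
         * matching spin_unlock_irq() unconditionally re-enables them.
         */
        spin_lock_irq(&demo_lock);
        /* ... */
        spin_unlock_irq(&demo_lock);
    }

    /* Caller that may be reached with IRQs already disabled. */
    static void caller_irq_state_unknown(void)
    {
        unsigned long flags;

        /*
         * spin_lock_irqsave() saves the current IRQ state, and
         * spin_unlock_irqrestore() restores exactly that state, so the
         * pair is safe whether or not IRQs were already disabled.
         */
        spin_lock_irqsave(&demo_lock, flags);
        /* ... */
        spin_unlock_irqrestore(&demo_lock, flags);
    }

[Either way interrupts are disabled while the lock is held, which is why,
as noted above, mixing these two variants on the same lock is fine.]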
Hi Quan and Dario,

On Fri, Mar 11, 2016 at 5:35 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
>> On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
>> >
>> > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
>> > >
>> > > pcidevs_lock should be held with interrupt enabled. However there
>> > > remains an exception in AMD IOMMU code, where the lock is acquired
>> > > with interrupt disabled. This inconsistency might lead to deadlock.
>> > Why will this inconsistency lead to deadlock?
>> > I understand the difference between spin_lock_irqsave(), which disables
>> > interrupts, and spin_lock(), which allows them; however, I didn't get
>> > why using spin_lock_irqsave() in place of spin_lock() could potentially
>> > lead to a deadlock?
>>
>> 1). As Jan mentioned, the implication from disabling interrupts while
>> acquiring a lock is that the lock is also being acquired by some
>> interrupt handler.
>> If you mix acquire types, the one not disabling interrupts is prone
>> to be interrupted, and the interrupt may try to get hold of the lock
>> the same CPU already owns.
>>
> The key issue is whether or not a lock can be acquired from IRQ context
> (i.e., in an interrupt handler, or a function called by that, as a
> result of an interrupt occurring).
>
> For locks that can, the IRQ-disabling variants must be used *everywhere*
> the lock is taken (so, e.g., not only when it is actually taken from IRQ
> context, just *always*!).
>
> If that rule is not honored, we are ok if the lock is taken on CPUA,
> and the interrupt handled on CPUB:
>
>     CPUA                CPUB
>     .                   .
>     .                   .
>     spin_lock(l)        .
>     .                   .
>     .                   . <-- Interrupt!
>     .                   .
>     .                   spin_lock(l); //spins on l
>     .                   x             //spins on l
>     .                   x             //spins on l
>     spin_unlock(l)      .             //takes l
>     .                   .
>     .                   spin_unlock(l);
>     .                   . <--
>     .                   .
>
> but the following can happen, if the interrupt is delivered to the CPU
> that has already taken the lock:
>
>     CPU
>     .
>     .
> [1] spin_lock(l); //takes l
>     .
>     . <-- Interrupt!
>     .
>     spin_lock(l) // CPU deadlocks
>
> If, in the latter case, spin_lock_irq(l) were used at [1], things would
> have been fine, as the interrupt wouldn't have occurred until l were
> released:
>
>     CPU
>     .
>     .
>     spin_lock_irq(l)    //takes l
>     .
>     .                   <---------------- Interrupt!
>     .                   -                 //IRQs are disabled
>     .                   -                 //IRQs are disabled
>     .                   -                 //IRQs are disabled
>     spin_unlock_irq(l)  .                 //IRQs re-enabled
>     .                   spin_lock_irq(l); //takes l
>     .                   .
>     .                   spin_unlock_irq(l);
>     .                   <-----------------
>     .
>
> Note that whether spin_lock_irq() or spin_lock_irqsave() should be used
> is completely independent from this, and must be decided according to
> whether IRQs are already disabled or not when taking the lock. And (quite
> obviously, but since we're here) it is fine to mix spin_lock_irq() and
> spin_lock_irqsave(), as they both disable interrupts, which is what
> matters.
>
> So, now, if we know for sure that a lock is _never_ever_ever_ taken
> from interrupt context, can we mix spin_lock() and spin_lock_irq() on
> it (for whatever reason)? Well, as far as the above reasoning is
> concerned, yes.
>
> In fact, the deadlock arises because IRQs interrupt asynchronously what
> the CPU is doing, and that can happen when the CPU has taken the lock
> already. But if the 'asynchronous' part goes away, we really don't care
> whether a lock is taken at time t1 with IRQs enabled, and at time t2
> with IRQs disabled, don't you think?
>
> Well, here is where what the comment inside check_lock() describes
> comes into play. As quoted by Quan already:
>
>> 2). Comment inside check_lock():
>> we partition locks into IRQ-safe (always held with IRQs disabled) and
>> IRQ-unsafe (always held with IRQs enabled) types. The convention for
>> every lock must be consistently observed else we can deadlock in
>> IRQ-context rendezvous functions (__a rendezvous which gets every CPU
>> into IRQ context before any CPU is released from the rendezvous__).
>> If we can mix IRQ-disabled and IRQ-enabled callers, the following can
>> happen:
>>  * Lock is held by CPU A, with IRQs enabled
>>  * CPU B is spinning on same lock, with IRQs disabled
>>  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous spin
>>  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never exit
>>    the rendezvous, and will hence never release the lock.
>>
>> To guard against this subtle bug we latch the IRQ safety of every
>> spinlock in the system, on first use.
>>
> This is a very good safety measure, but it can be quite a restriction
> in some (luckily limited) cases. And that's why it is possible --
> although absolutely discouraged -- to relax it. See, for an example of
> this, the comment in start_secondary(), in xen/arch/x86/smpboot.c:
>
>     ...
>     /*
>      * Just as during early bootstrap, it is convenient here to disable
>      * spinlock checking while we have IRQs disabled. This allows us to
>      * acquire IRQ-unsafe locks when it would otherwise be disallowed.
>      *
>      * It is safe because the race we are usually trying to avoid involves
>      * a group of CPUs rendezvousing in an IPI handler, where one cannot
>      * join because it is spinning with IRQs disabled waiting to acquire a
>      * lock held by another in the rendezvous group (the lock must be an
>      * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
>      * hence had IRQs enabled). This is a deadlock scenario.
>      *
>      * However, no CPU can be involved in rendezvous until it is online,
>      * hence no such group can be waiting for this CPU until it is
>      * visible in cpu_online_map. Hence such a deadlock is not possible.
>      */
>     spin_debug_disable();
>     ...
>
> Just FTR, at least as far as I can remember, Linux does not enforce
> anything like that.
>
> Hope this clarifies things.

Thank you very much for your explanation and education! They are really
helpful! :-D

Let me summarize: ;-)

                                     | spin_lock | spin_lock_irq | spin_lock_irqsave
    Can run in irq context?          |    No     |      Yes      |        Yes
    Can run in irq disabled context? |    No     |      No       |        Yes

Why may deadlock occur if we mix spin_lock and spin_lock_irq(save)?
If we mix spin_lock and spin_lock_irq(save), and a group of CPUs is
rendezvousing in an IPI handler, we will have deadlock. Because the CPU A
that takes spin_lock will wait for the rendezvousing condition to be
satisfied, while the CPU B that takes the spin_lock_irq(save) will not
enter into the rendezvousing condition (since it disables the interrupt).
Then,
CPU A waits for CPU B to handle the IPI to get out of the rendezvousing
condition (kind of synchronous point), which won't happen until B gets
the spin_lock.
CPU B waits for CPU A to release the spin_lock, which won't happen until
A gets out of the rendezvousing condition.

Are my understanding and summary correct?

Thanks and Best Regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
>
>> In fact, the deadlock arises because IRQs interrupt asynchronously what
>> the CPU is doing, and that can happen when the CPU has taken the lock
>> already. But if the 'asynchronous' part goes away, we really don't care
>> whether a lock is taken at time t1 with IRQs enabled, and at time t2
>> with IRQs disabled, don't you think?
>>
>
> Yes.
> Consistency may be helpful to avoid some easy-to-avoid lock errors.
> Moreover, without my fix, I think it would not lead to deadlock, as
> pcidevs_lock is not being taken in IRQ context. Right?

I think without your fix, the deadlock may still happen due to the
rendezvous condition.

             CPU A                         | CPU B                 | CPU C
    Step 1 | spin_lock                     |                       |
    Step 2 |                               | spin_lock_irq         |
    Step 3 |                               | wait for A to unlock  |
    Step 4 |                               |                       | send rendezvous IPI to A and B
    Step 5 | receive IPI                   | wait for A to unlock  |
    Step 6 | wait for B to handle the IPI  | wait for A to unlock  |
    Step 7 | spin_unlock                   |                       |

Deadlock occurs at Step 6, IMO.

Unless we can prove that rendezvous won't happen while spin_lock_irqsave
is taken, we have the deadlock hazard.

Actually, I'm not quite sure when the rendezvous condition may happen in
this situation. So probably, in reality, we don't have the rendezvous
condition.

>
> For deadlock, I think the key problems are:
>  - a lock can be acquired from IRQ context;
>  - the interrupt is delivered to the _same_CPU_ that already holds the lock.
>

This is one type of deadlock, not the one due to the rendezvous condition,
I think. :-)

Thanks and Best Regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
On Fri, 2016-03-11 at 09:49 -0500, Meng Xu wrote:
> > Yes.
> > Consistency may be helpful to avoid some easy-to-avoid lock errors.
> > Moreover, without my fix, I think it would not lead to deadlock, as
> > pcidevs_lock is not being taken in IRQ context. Right?
> I think without your fix, the deadlock may still happen due to the
> rendezvous condition.
>
>              CPU A                         | CPU B                 | CPU C
>     Step 1 | spin_lock                     |                       |
>     Step 2 |                               | spin_lock_irq         |
>     Step 3 |                               | wait for A to unlock  |
>     Step 4 |                               |                       | send rendezvous IPI to A and B
>     Step 5 | receive IPI                   | wait for A to unlock  |
>     Step 6 | wait for B to handle the IPI  | wait for A to unlock  |
>     Step 7 | spin_unlock                   |                       |
>
> Deadlock occurs at Step 6, IMO.
>
> Unless we can prove that rendezvous won't happen while
> spin_lock_irqsave is taken, we have the deadlock hazard.
>
Yes. But, in the case of Quan's patch (without it, I mean), have you
seen where in the code it is that we use spin_lock_irqsave()?

It's inside a function that is called during Xen boot, whose callchain
starts with iommu_setup(), from __start_xen(). Here's a (big, sorry)
code snippet of what is around iommu_setup():

    ...
    init_idle_domain();

    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
                                           &this_cpu(stubs).mfn);
    BUG_ON(!this_cpu(stubs.addr));

    trap_init();
    rcu_init();

    early_time_init();

    arch_init_memory();

    alternative_instructions();

    local_irq_enable();

    pt_pci_init();

    vesa_mtrr_init();

    acpi_mmcfg_init();

    early_msi_init();

    iommu_setup();    /* setup iommu if available */

    smp_prepare_cpus(max_cpus);

    spin_debug_enable();

    /*
     * Initialise higher-level timer functions. We do this fairly late
     * (after interrupts got enabled) because the time bases and scale
     * factors need to be updated regularly.
     */
    init_xen_time();
    initialize_keytable();
    console_init_postirq();

    system_state = SYS_STATE_smp_boot;

    do_presmp_initcalls();

    for_each_present_cpu ( i )
    {
        /* Set up cpu_to_node[]. */
        srat_detect_node(i);
        /* Set up node_to_cpumask based on cpu_to_node[]. */
        numa_add_cpu(i);

        if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
        {
            int ret = cpu_up(i);
            if ( ret != 0 )
                printk("Failed to bring up CPU %u (error %d)\n", i, ret);
        }
    }
    printk("Brought up %ld CPUs\n", (long)num_online_cpus());
    ...

As you can see, it is only *after* iommu_setup() that we call functions
like smp_prepare_cpus(), do_presmp_initcalls(), and then the loop that
waits for all the present CPUs to come online.

What that means is that, at iommu_setup() time, there still is only one
CPU online, and there is not much chance that one single CPU deadlocks
in a rendezvous!

Honestly, the biggest issue that I think Quan's patch solves, is that
if we ever want/manage to move spin_debug_enable() up above it, then
the BUG_ON in check_lock() would trigger the first time that
pcidevs_lock would be taken with interrupts enabled.

Until then, the code is technically fine, and, as a matter of fact, I
think that removing the locking from that particular instance would be
an equally effective fix!

All that being said, consistency is indeed important, and for the sake
of it and for other reasons too, even if, strictly speaking, there isn't
any actual buggy behavior to be fixed here, it is worthwhile conforming
to a locking pattern that is consistent with the rules that we set
ourselves, unless there's specific reasons not to.

Regards,
Dario
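[For reference, the "equally effective fix" mentioned above would amount to
something like the following inside set_iommu_interrupt_handler() (compare
the patch at the end of the thread); this is only a sketch of that remark,
not a change that was proposed or applied.]

    /*
     * Sketch of the alternative only: drop the locking around the lookup
     * entirely, relying on the fact that only the boot CPU is online at
     * iommu_setup() time, so nothing can race with it. NOT the fix that
     * was actually submitted.
     */
    iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                  PCI_DEVFN2(iommu->bdf));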
On Fri, 2016-03-11 at 09:41 -0500, Meng Xu wrote:
> Thank you very much for your explanation and education! They are
> really helpful! :-D
>
> Let me summarize: ;-)
>
> | spin_lock |
> spin_lock_irq | spin_lock_irqsave
>
> Can run in irq context? | No | Yes
> | Yes
>
> Can run in irq disabled context? |
> No | No | Yes
>
The table came out mangled, at least in my mail reader. For what I can
see, I think I see the point you're trying to make, and it's hard to say
whether the table itself is right or wrong... Probably, a table like
this is just not the best way with which to try to represent things.

For instance, you seem to be saying that spin_lock() can't be used if
interrupts are disabled, which is not true. For instance, it is
perfectly fine to do the following:

    CPU:
    .
    spin_lock_irq(l1);
    .
    .
[1] spin_lock(l2);
    .
    .
[2] spin_unlock(l2);
    .
    .
    spin_unlock_irq(l1);
    .

Even if l2 is also taken in an interrupt handler. In fact, what we want
is to make sure that such an interrupt handler that may take l2 does not
interrupt the CPU in the [1]--[2] time frame... But that can't happen
because interrupts are already disabled, so just using spin_lock() is
ok, and should be done, as that's cheaper than spin_lock_irqsave().

Fact is, what really matters is whether or not a lock is taken both
inside and outside of IRQ context. As a matter of fact, it is mostly the
case that, if a lock is ever taken inside an interrupt handler, then
spin_lock_irqsave() is what you want... but this does not justify coming
up with a table like the one above and claiming it to be general.

> Why may deadlock occur if we mix spin_lock and spin_lock_irq(save)?
> If we mix spin_lock and spin_lock_irq(save), and a group of CPUs is
> rendezvousing in an IPI handler, we will have deadlock. Because the
> CPU A that takes spin_lock will wait for the rendezvousing condition
> to be satisfied, while the CPU B that takes the spin_lock_irq(save)
> will not enter into the rendezvousing condition (since it disables the
> interrupt). Then,
> CPU A waits for CPU B to handle the IPI to get out of the rendezvousing
> condition (kind of synchronous point), which won't happen until B gets
> the spin_lock.
> CPU B waits for CPU A to release the spin_lock, which won't happen
> until A gets out of the rendezvousing condition.
>
> Are my understanding and summary correct?
>
Yes, this looks like a decent explanation of the rendezvous-related
spinlock problem... although I prefer the wording that we already have
in check_lock() and in start_secondary(). :-D

Regards,
Dario
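[A minimal sketch of the nested pattern in the diagram above, again using
Xen's spinlock API; both locks and the function are hypothetical.]

    /* Sketch only: l1, l2 and the function are hypothetical. */
    static DEFINE_SPINLOCK(l1);
    static DEFINE_SPINLOCK(l2);   /* l2 is also taken from an interrupt handler */

    static void nested_example(void)
    {
        spin_lock_irq(&l1);       /* IRQs are now disabled on this CPU */

        /*
         * Plain spin_lock() on l2 is fine here, even though l2 is also
         * used in IRQ context: the handler cannot run on this CPU until
         * IRQs are re-enabled below, and spin_lock() is cheaper than
         * spin_lock_irqsave().
         */
        spin_lock(&l2);
        /* ... */
        spin_unlock(&l2);

        spin_unlock_irq(&l1);     /* IRQs re-enabled */
    }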
On Fri, Mar 11, 2016 at 10:55 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Fri, 2016-03-11 at 09:49 -0500, Meng Xu wrote:
>> > Yes.
>> > Consistency may be helpful to avoid some easy-to-avoid lock errors.
>> > Moreover, without my fix, I think it would not lead to deadlock, as
>> > pcidevs_lock is not being taken in IRQ context. Right?
>> I think without your fix, the deadlock may still happen due to the
>> rendezvous condition.
>>
>>              CPU A                         | CPU B                 | CPU C
>>     Step 1 | spin_lock                     |                       |
>>     Step 2 |                               | spin_lock_irq         |
>>     Step 3 |                               | wait for A to unlock  |
>>     Step 4 |                               |                       | send rendezvous IPI to A and B
>>     Step 5 | receive IPI                   | wait for A to unlock  |
>>     Step 6 | wait for B to handle the IPI  | wait for A to unlock  |
>>     Step 7 | spin_unlock                   |                       |
>>
>> Deadlock occurs at Step 6, IMO.
>>
>> Unless we can prove that rendezvous won't happen while
>> spin_lock_irqsave is taken, we have the deadlock hazard.
>>
> Yes. But, in the case of Quan's patch (without it, I mean), have you
> seen where in the code it is that we use spin_lock_irqsave()?
>
> It's inside a function that is called during Xen boot, whose callchain
> starts with iommu_setup(), from __start_xen(). Here's a (big, sorry)
> code snippet of what is around iommu_setup():
>
>     ...
>     init_idle_domain();
>
>     this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
>                                            &this_cpu(stubs).mfn);
>     BUG_ON(!this_cpu(stubs.addr));
>
>     trap_init();
>     rcu_init();
>
>     early_time_init();
>
>     arch_init_memory();
>
>     alternative_instructions();
>
>     local_irq_enable();
>
>     pt_pci_init();
>
>     vesa_mtrr_init();
>
>     acpi_mmcfg_init();
>
>     early_msi_init();
>
>     iommu_setup();    /* setup iommu if available */
>
>     smp_prepare_cpus(max_cpus);
>
>     spin_debug_enable();
>
>     /*
>      * Initialise higher-level timer functions. We do this fairly late
>      * (after interrupts got enabled) because the time bases and scale
>      * factors need to be updated regularly.
>      */
>     init_xen_time();
>     initialize_keytable();
>     console_init_postirq();
>
>     system_state = SYS_STATE_smp_boot;
>
>     do_presmp_initcalls();
>
>     for_each_present_cpu ( i )
>     {
>         /* Set up cpu_to_node[]. */
>         srat_detect_node(i);
>         /* Set up node_to_cpumask based on cpu_to_node[]. */
>         numa_add_cpu(i);
>
>         if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
>         {
>             int ret = cpu_up(i);
>             if ( ret != 0 )
>                 printk("Failed to bring up CPU %u (error %d)\n", i, ret);
>         }
>     }
>     printk("Brought up %ld CPUs\n", (long)num_online_cpus());
>     ...
>
> As you can see, it is only *after* iommu_setup() that we call functions
> like smp_prepare_cpus(), do_presmp_initcalls(), and then the loop that
> waits for all the present CPUs to come online.
>
> What that means is that, at iommu_setup() time, there still is only one
> CPU online, and there is not much chance that one single CPU deadlocks
> in a rendezvous!
>
> Honestly, the biggest issue that I think Quan's patch solves, is that
> if we ever want/manage to move spin_debug_enable() up above it, then
> the BUG_ON in check_lock() would trigger the first time that
> pcidevs_lock would be taken with interrupts enabled.

Right! I got it now.. :-)
The lock is initialized as SPIN_LOCK_UNLOCKED. check_lock() will trigger
the BUG_ON when it is used in an interrupt-disabled situation.

> Until then, the code is technically fine, and, as a matter of fact, I
> think that removing the locking from that particular instance would be
> an equally effective fix!
>
> All that being said, consistency is indeed important, and for the sake
> of it and for other reasons too, even if, strictly speaking, there isn't
> any actual buggy behavior to be fixed here, it is worthwhile conforming
> to a locking pattern that is consistent with the rules that we set
> ourselves, unless there's specific reasons not to.

I see. Even though no deadlock may happen, consistency can be the reason
to modify. :-)
But it's good to know the real reason for making the change, which could
be avoiding the deadlock, consistency, or both. It seems to me that this
is more for consistency, and to avoid the potential deadlock (although I'm
not so sure it will eventually happen, it could be a hazard).

Thank you, Dario and Quan, very much for your explanation! :-) They are
really helpful! :-D

BTW, I'm sorry for the messed-up format of the table. Let me reformat it
here:

    Condition 1 (C1): Can run in irq context?
    Condition 2 (C2): Can run in irq disabled context?

       | spin_lock | spin_lock_irq | spin_lock_irqsave
    C1 |    No     |      Yes      |        Yes
    C2 |    No     |      No       |        Yes

It may be too fast (incorrect) to get the generalized conclusion, but it
should give some overview of the differences between these three locks.
Just for reference later. ;-)

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index d90a2d2..a400497 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -778,7 +778,6 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
     hw_irq_controller *handler;
-    unsigned long flags;
     u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
@@ -788,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock_irqsave(&pcidevs_lock, flags);
+    spin_lock(&pcidevs_lock);
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock_irqrestore(&pcidevs_lock, flags);
+    spin_unlock(&pcidevs_lock);
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",