diff mbox

[v2,1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.

Message ID 1457492889-36625-2-git-send-email-quan.xu@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Quan Xu March 9, 2016, 3:08 a.m. UTC
When iommu_setup() is called in __start_xen(), interrupts have
already been enabled, and nothing disables (without reenabling)
them again in the path that leads to calling
set_iommu_interrupt_handler(). There's therefore no risk of them
being disabled and needing remaining so, and hence no need for
saving and restoring the flags. This means that, even if interrupt
would need to be disabled when taking pcidevs_lock, spin_lock_irq()
and spin_unlock_irq() could be used.

Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls
to pci_get_pdev(), which does not require interrupts to be disabled
during its execution. In fact, pcidevs_lock is always (except for
this case) taken without disabling interrupts, and disabling them in
this case would make the BUG_ON() in check_lock() trigger, if it
wasn't for the fact that spinlock debugging checks are still
disabled, during initialization, when iommu_setup() (which then end
up calling set_iommu_interrupt_handler()) is called.

In order to fix this, we can use just plain spin_lock() and spin_unlock(),
instead of spin_lock_irqsave() and spin_unlock_irqrestore().

Signed-off-by: Quan Xu <quan.xu@intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/drivers/passthrough/amd/iommu_init.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Tian, Kevin March 9, 2016, 5:19 a.m. UTC | #1
> From: Quan Xu

> Sent: Wednesday, March 09, 2016 11:08 AM


Hi, Quan, sorry that I still didn't quite get below description.

> 

> When iommu_setup() is called in __start_xen(), interrupts have

> already been enabled, and nothing disables (without reenabling)

> them again in the path that leads to calling

> set_iommu_interrupt_handler(). There's therefore no risk of them

> being disabled and needing remaining so, and hence no need for


no risk of them being 'enabled' since no one disables them again?

> saving and restoring the flags. This means that, even if interrupt

> would need to be disabled when taking pcidevs_lock, spin_lock_irq()

> and spin_unlock_irq() could be used.


I didn't see how this explanation relates to below change. You are
changing from spin_lock_irqsave to spin_lock. But here you explains
the reason to spin_lock_irq...

> 

> Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls

> to pci_get_pdev(), which does not require interrupts to be disabled

> during its execution. In fact, pcidevs_lock is always (except for

> this case) taken without disabling interrupts, and disabling them in

> this case would make the BUG_ON() in check_lock() trigger, if it

> wasn't for the fact that spinlock debugging checks are still

> disabled, during initialization, when iommu_setup() (which then end

> up calling set_iommu_interrupt_handler()) is called.


The key problem is that we need consistent lock usage in all places 
(either all in IRQ-safe or all in IRQ-unsafe), regardless of whether 
check_lock is activated or not (which is just a debug method to help
identify such inconsistency problem). I think it should be clear 
enough to state that pci_get_pdev doesn't require holding lock with
interrupt disabled (so we should use spin_lock in this AMD case),
and add the fact that when this function is invoked the interrupt
is indeed enabled.


> 

> In order to fix this, we can use just plain spin_lock() and spin_unlock(),

> instead of spin_lock_irqsave() and spin_unlock_irqrestore().


What about revise like below?
--

pcidevs_lock should be held with interrupt enabled. However there
remains an exception in AMD IOMMU code, where the lock is
acquired with interrupt disabled. This inconsistency can lead to 
deadlock. 

The fix is straightforward to use spin_lock instead. Also interrupt
has been enabled when this function is invoked, so we're sure
consistency around pcidevs_lock can be guaranteed after this fix.
--

> 

> Signed-off-by: Quan Xu <quan.xu@intel.com>

> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

> CC: Dario Faggioli <dario.faggioli@citrix.com>

> CC: Jan Beulich <jbeulich@suse.com>

> ---

>  xen/drivers/passthrough/amd/iommu_init.c | 5 ++---

>  1 file changed, 2 insertions(+), 3 deletions(-)

> 

> diff --git a/xen/drivers/passthrough/amd/iommu_init.c

> b/xen/drivers/passthrough/amd/iommu_init.c

> index d90a2d2..a400497 100644

> --- a/xen/drivers/passthrough/amd/iommu_init.c

> +++ b/xen/drivers/passthrough/amd/iommu_init.c

> @@ -778,7 +778,6 @@ static bool_t __init set_iommu_interrupt_handler(struct

> amd_iommu *iommu)

>  {

>      int irq, ret;

>      hw_irq_controller *handler;

> -    unsigned long flags;

>      u16 control;

> 

>      irq = create_irq(NUMA_NO_NODE);

> @@ -788,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct

> amd_iommu *iommu)

>          return 0;

>      }

> 

> -    spin_lock_irqsave(&pcidevs_lock, flags);

> +    spin_lock(&pcidevs_lock);

>      iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),

>                                    PCI_DEVFN2(iommu->bdf));

> -    spin_unlock_irqrestore(&pcidevs_lock, flags);

> +    spin_unlock(&pcidevs_lock);

>      if ( !iommu->msi.dev )

>      {

>          AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",

> --

> 1.9.1

> 

> 

> _______________________________________________

> Xen-devel mailing list

> Xen-devel@lists.xen.org

> http://lists.xen.org/xen-devel
Quan Xu March 9, 2016, 7:31 a.m. UTC | #2
On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > From: Quan Xu

> > Sent: Wednesday, March 09, 2016 11:08 AM

> > When iommu_setup() is called in __start_xen(), interrupts have already

> > been enabled, and nothing disables (without reenabling) them again in

> > the path that leads to calling set_iommu_interrupt_handler(). There's

> > therefore no risk of them being disabled and needing remaining so, and

> > hence no need for

> 

> no risk of them being 'enabled' since no one disables them again?

> 


Yes,

> > saving and restoring the flags. This means that, even if interrupt

> > would need to be disabled when taking pcidevs_lock, spin_lock_irq()

> > and spin_unlock_irq() could be used.

> 

> I didn't see how this explanation relates to below change. You are changing

> from spin_lock_irqsave to spin_lock. But here you explains the reason to

> spin_lock_irq...

> 

Yes, you are right. I think I'd better remove or add a '()' for:

   "This means that, even if interrupt
    would need to be disabled when taking pcidevs_lock, spin_lock_irq()
    and spin_unlock_irq() could be used."

> >

> > Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls to

> > pci_get_pdev(), which does not require interrupts to be disabled

> > during its execution. In fact, pcidevs_lock is always (except for this

> > case) taken without disabling interrupts, and disabling them in this

> > case would make the BUG_ON() in check_lock() trigger, if it wasn't for

> > the fact that spinlock debugging checks are still disabled, during

> > initialization, when iommu_setup() (which then end up calling

> > set_iommu_interrupt_handler()) is called.

> 

> The key problem is that we need consistent lock usage in all places (either all in

> IRQ-safe or all in IRQ-unsafe), regardless of whether check_lock is activated or

> not (which is just a debug method to help identify such inconsistency problem).



IMO, this is not the key problem,  __Wait for Dario's / Jan's opinions__.


> I think it should be clear enough to state that pci_get_pdev doesn't require

> holding lock with interrupt disabled (so we should use spin_lock in this AMD

> case), and add the fact that when this function is invoked the interrupt is indeed

> enabled.

> 

> 

> >

> > In order to fix this, we can use just plain spin_lock() and

> > spin_unlock(), instead of spin_lock_irqsave() and spin_unlock_irqrestore().

> 

> What about revise like below?

> --

> 

> pcidevs_lock should be held with interrupt enabled.


I am not sure for this point.


> However there remains an

> exception in AMD IOMMU code, where the lock is acquired with interrupt

> disabled. This inconsistency can lead to deadlock.

> 

> The fix is straightforward to use spin_lock instead. Also interrupt has been

> enabled when this function is invoked, so we're sure consistency around

> pcidevs_lock can be guaranteed after this fix.


I really appreciate you send out a revised one, but I think it is not only for consistency.
 __Wait for Dario's / Jan's opinions__.

To be honest, to me, __my_changlog_ is complicated.


Quan
Jan Beulich March 9, 2016, 10:09 a.m. UTC | #3
>>> "Xu, Quan" <quan.xu@intel.com> 03/09/16 8:32 AM >>>
>On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
>> > Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls to
>> > pci_get_pdev(), which does not require interrupts to be disabled
>> > during its execution. In fact, pcidevs_lock is always (except for this
>> > case) taken without disabling interrupts, and disabling them in this
>> > case would make the BUG_ON() in check_lock() trigger, if it wasn't for
>> > the fact that spinlock debugging checks are still disabled, during
>> > initialization, when iommu_setup() (which then end up calling
>> > set_iommu_interrupt_handler()) is called.
>> 
>> The key problem is that we need consistent lock usage in all places (either all in
>> IRQ-safe or all in IRQ-unsafe), regardless of whether check_lock is activated or
>> not (which is just a debug method to help identify such inconsistency problem).
>
>IMO, this is not the key problem,  __Wait for Dario's / Jan's opinions__.

The inconsistency is one way of look at the problem, so with that it is
kind of "key".

>> What about revise like below?
>> --
>> 
>> pcidevs_lock should be held with interrupt enabled.
>
>I am not sure for this point.

Well, I'd say something like "pcidevs_lock doesn't require interrupts to be
disabled while being acquired".

>> However there remains an
>> exception in AMD IOMMU code, where the lock is acquired with interrupt
>> disabled. This inconsistency can lead to deadlock.
>> 
>> The fix is straightforward to use spin_lock instead. Also interrupt has been
>> enabled when this function is invoked, so we're sure consistency around
>> pcidevs_lock can be guaranteed after this fix.
>
>I really appreciate you send out a revised one, but I think it is not only for consistency.
>__Wait for Dario's / Jan's opinions__.
>
>To be honest, to me, __my_changlog_ is complicated.

I think Kevin's text above is okay. Perhaps weaken the "can lead to a
deadlock" slightly, because that's just a theoretical concern, not one that's
possible in practice on that code path.

Jan
Dario Faggioli March 9, 2016, 10:24 a.m. UTC | #4
On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:
> On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > 
> > > When iommu_setup() is called in __start_xen(), interrupts have
> > > already
> > > been enabled, and nothing disables (without reenabling) them
> > > again in
> > > the path that leads to calling set_iommu_interrupt_handler().
> > > There's
> > > therefore no risk of them being disabled and needing remaining
> > > so, and
> > > hence no need for
> > no risk of them being 'enabled' since no one disables them again?
> > 
> Yes,
> 
Reason why one use _irqsave() when locking is because he doesn't know
whether interrupt are disabled or not, and wants that, whatever the
status is (enabled or disabled), it remains unchanged upon unlock
(which, therefore, does the _irqrestore()).

If we are absolutely sure that they are enabled, i.e., there is no
_risk_ that they are disabled, it would be ok to just use _irq() (if
disabling interrupt is necessary, which is not in this case, but that's
another thing) and avoid saving the flags.

That's how I read the original description (which, of course, does not
mean it can't be simplified).

> > > saving and restoring the flags. This means that, even if
> > > interrupt
> > > would need to be disabled when taking pcidevs_lock,
> > > spin_lock_irq()
> > > and spin_unlock_irq() could be used.
> > I didn't see how this explanation relates to below change. You are
> > changing
> > from spin_lock_irqsave to spin_lock. But here you explains the
> > reason to
> > spin_lock_irq...
> > 
>
Exactly. We have spin_lock_irqsave() that does two things:
 - disables interrupts,
 - saves and restores the flags.

We think that:
 - there's no need to save flags, even if disabling interrupts were 
   necessary,
 - there is no need to disable interrupts.

And therefore, we are changing to spin_lock() that:
 - doesn't disable interrupts,
 - doesn't save and restore the flags.

So, basically, switching spinlock variant is basically _2_ changes. I
do think that it is worth touching in the changelog why _both_ are ok.

> Yes, you are right. I think I'd better remove or add a '()' for:
> 
>    "This means that, even if interrupt
>     would need to be disabled when taking pcidevs_lock,
> spin_lock_irq()
>     and spin_unlock_irq() could be used."
> 
Yeah, that makes it more accurate, but even longer, while I think the
changelog could use some shortening and simplification.

> > > Inside set_iommu_interrupt_handler(), pcidevs_lock serializes
> > > calls to
> > > pci_get_pdev(), which does not require interrupts to be disabled
> > > during its execution. In fact, pcidevs_lock is always (except for
> > > this
> > > case) taken without disabling interrupts, and disabling them in
> > > this
> > > case would make the BUG_ON() in check_lock() trigger, if it
> > > wasn't for
> > > the fact that spinlock debugging checks are still disabled,
> > > during
> > > initialization, when iommu_setup() (which then end up calling
> > > set_iommu_interrupt_handler()) is called.
> > The key problem is that we need consistent lock usage in all places
> > (either all in
> > IRQ-safe or all in IRQ-unsafe), regardless of whether check_lock is
> > activated or
> > not (which is just a debug method to help identify such
> > inconsistency problem).
> 
It is, and, during boot, is disabled for a reason, which is that we
allow mixed usage, albeit only in super special circumstances (like, in
fact, early boot). So I agree that check_lock() is just a sanity
checking mechanism, but let's not state anything incorrect (as Jan
requested in the first place).

> IMO, this is not the key problem,  __Wait for Dario's / Jan's
> opinions__.
> 
Well, it is a way to describe at least some of the aspects of the key
problem, I guess, just not the one that I think I would stress the
most.

> > I think it should be clear enough to state that pci_get_pdev
> > doesn't require
> > holding lock with interrupt disabled (so we should use spin_lock in
> > this AMD
> > case), and add the fact that when this function is invoked the
> > interrupt is indeed
> > enabled.
> > 
I totally agree with this description! (Can we use that as
changelog? :-) )

> > > In order to fix this, we can use just plain spin_lock() and
> > > spin_unlock(), instead of spin_lock_irqsave() and
> > > spin_unlock_irqrestore().
> > What about revise like below?
> > --
> > 
> > pcidevs_lock should be held with interrupt enabled.
> I am not sure for this point.
> 
Well, it is true that it should. Altough, I think it's more accurate to
say that, as Kevin says above, it "doesn't require being held with
interrupt disabled".

> > However there remains an
> > exception in AMD IOMMU code, where the lock is acquired with
> > interrupt
> > disabled. This inconsistency can lead to deadlock.
> > 
Can it? In the case of the occurrence being changed by this patch, I
don't think it can.

> > The fix is straightforward to use spin_lock instead. Also interrupt
> > has been
> > enabled when this function is invoked, so we're sure consistency
> > around
> > pcidevs_lock can be guaranteed after this fix.
> I really appreciate you send out a revised one, but I think it is not
> only for consistency.
>  __Wait for Dario's / Jan's opinions__.
> 
> To be honest, to me, __my_changlog_ is complicated.
> 
It is complicated. I think it is a detailed, and IMO correct,
description of the reason why the patch is ok. That is indeed the
purpose of a changelog (especially for these kind of patches), but it
certainly could be summarized/simplified a bit.

I guess I'm also giving it a try, using what Kevin wrote in the middle
of his email as a basis (with a small addition about consistency at the
end).

"pci_get_pdev() doesn't require interrupts to be disabled when taking
the pcidevs_lock, which protects its execution. So, spin_lock() can be
used for that, and that is what is done almost everywhere.

Currenlty, there is an exception, in early boot IOMMU initialization on
AMD systems, where spin_lock_irqsave() and restore are used. However,
since, in that case too:
 - we don't need to disable interrupts (for same reasons stated above),
 - interrupts are enabled already, so there is no need to save and
   restore flags,
just change it into using spin_lock(), as everywhere.

Note that, mixing interrupt disabled and enabled spinlocks is something
we disallow, except for very special situations. And since this one
does not qualify as such, using IRQ disabling variants can be
considered a bug (which manages to not trigger the safety checking we
have in place only because they're not yet enabled)."

Regards,
Dario
Quan Xu March 9, 2016, 12:52 p.m. UTC | #5
On March 09, 2016 6:25pm, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:

> > On March 09, 2016 1:19pm, <Tian, Kevin> wrote:

> > >

> > > > When iommu_setup() is called in __start_xen(), interrupts have

> > > > already been enabled, and nothing disables (without reenabling)

> > > > them again in the path that leads to calling

> > > > set_iommu_interrupt_handler().

> > > > There's

> > > > therefore no risk of them being disabled and needing remaining so,

> > > > and hence no need for

> > > no risk of them being 'enabled' since no one disables them again?

> > >

> > Yes,

> >

> Reason why one use _irqsave() when locking is because he doesn't know

> whether interrupt are disabled or not, and wants that, whatever the status is

> (enabled or disabled), it remains unchanged upon unlock (which, therefore,

> does the _irqrestore()).

> 

> If we are absolutely sure that they are enabled, i.e., there is no _risk_ that they

> are disabled, it would be ok to just use _irq() (if disabling interrupt is necessary,

> which is not in this case, but that's another thing) and avoid saving the flags.

> 

> That's how I read the original description (which, of course, does not mean it

> can't be simplified).

> 

Dario, thanks for your share.
I appreciate you always share some knowledge. :):)
btw, the "Linux Device Drivers, 3rd Edition" book also describe it clearly, http://www.makelinux.net/ldd3/chp-5-sect-5 

> > > I think it should be clear enough to state that pci_get_pdev doesn't

> > > require holding lock with interrupt disabled (so we should use

> > > spin_lock in this AMD case), and add the fact that when this

> > > function is invoked the interrupt is indeed enabled.

> > >

> I totally agree with this description! (Can we use that as changelog? :-) )

> 


Of course. :)

> > > > In order to fix this, we can use just plain spin_lock() and

> > > > spin_unlock(), instead of spin_lock_irqsave() and

> > > > spin_unlock_irqrestore().

> > > What about revise like below?

> > > --

> > >

> > > pcidevs_lock should be held with interrupt enabled.

> > I am not sure for this point.

> >

> Well, it is true that it should. Altough, I think it's more accurate to say that, as

> Kevin says above, it "doesn't require being held with interrupt disabled".

> 

> > > However there remains an

> > > exception in AMD IOMMU code, where the lock is acquired with

> > > interrupt disabled. This inconsistency can lead to deadlock.

> > >

> Can it? In the case of the occurrence being changed by this patch, I don't think it

> can.


Before this patch, it might. As Jan mentioned, that's just a theoretical concern: 
 -spin_lock_irqsave() disables interrupts (on the local processor only) before taking the spinlock. 
  Supposed in MP system (pCPU1, pCPU2, ...), when the pCPU1 call the spin_lock_irqsave(),
  It does only disable interrupts for pCPU1, and _not_ for other pCPUn.
 -Then, it is mixing interrupt disabled and enabled spinlocks.

> 

> > > The fix is straightforward to use spin_lock instead. Also interrupt

> > > has been enabled when this function is invoked, so we're sure

> > > consistency around pcidevs_lock can be guaranteed after this fix.

> > I really appreciate you send out a revised one, but I think it is not

> > only for consistency.

> >  __Wait for Dario's / Jan's opinions__.

> >

> > To be honest, to me, __my_changlog_ is complicated.

> >

> It is complicated. I think it is a detailed, and IMO correct, description of the

> reason why the patch is ok. That is indeed the purpose of a changelog

> (especially for these kind of patches), but it certainly could be

> summarized/simplified a bit.

> 

> I guess I'm also giving it a try, using what Kevin wrote in the middle of his email

> as a basis (with a small addition about consistency at the end).

> 

> "pci_get_pdev() doesn't require interrupts to be disabled when taking the

> pcidevs_lock, which protects its execution. So, spin_lock() can be used for that,

> and that is what is done almost everywhere.

> 

> Currenlty, there is an exception, in early boot IOMMU initialization on AMD

> systems, where spin_lock_irqsave() and restore are used. However, since, in

> that case too:

>  - we don't need to disable interrupts (for same reasons stated above),

>  - interrupts are enabled already, so there is no need to save and

>    restore flags,

> just change it into using spin_lock(), as everywhere.

> 

> Note that, mixing interrupt disabled and enabled spinlocks is something we

> disallow,


Dario, could you share something in detail?
Quan

> except for very special situations.
Dario Faggioli March 9, 2016, 1:19 p.m. UTC | #6
On Wed, 2016-03-09 at 12:52 +0000, Xu, Quan wrote:
> On March 09, 2016 6:25pm, <dario.faggioli@citrix.com> wrote:
> > On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:
> > > On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > > > 
> > If we are absolutely sure that they are enabled, i.e., there is no
> > _risk_ that they
> > are disabled, it would be ok to just use _irq() (if disabling
> > interrupt is necessary,
> > which is not in this case, but that's another thing) and avoid
> > saving the flags.
> > 
> Dario, thanks for your share.
> I appreciate you always share some knowledge. :):)
> btw, the "Linux Device Drivers, 3rd Edition" book also describe it
> clearly, http://www.makelinux.net/ldd3/chp-5-sect-5 
> 
Yes, with respect to that, us and Linux are similar, and the concerns
on whether interrupts should be disabled or not when taking a spinlock
are generic, and can be applied to any OS/hypervisor.

AFAICR, Linux does not have any check in place similar to our
check_lock(), but that does not mean much.

> > > > However there remains an
> > > > exception in AMD IOMMU code, where the lock is acquired with
> > > > interrupt disabled. This inconsistency can lead to deadlock.
> > > > 
> > Can it? In the case of the occurrence being changed by this patch,
> > I don't think it
> > can.
> Before this patch, it might. As Jan mentioned, that's just a
> theoretical concern: 
>  -spin_lock_irqsave() disables interrupts (on the local processor
> only) before taking the spinlock. 
>   Supposed in MP system (pCPU1, pCPU2, ...), when the pCPU1 call the
> spin_lock_irqsave(),
>   It does only disable interrupts for pCPU1, and _not_ for other
> pCPUn.
>  -Then, it is mixing interrupt disabled and enabled spinlocks.
> 
I was commenting on the "can lead to deadlock" part, which is something
that, in general, we risk if we mix, but that can't possibly occur in
this specific case. This is also what Jan is saying, and the reason why
is also asking to mitigate this exact claim... So, I'm not sure what
your point is here...

> > Note that, mixing interrupt disabled and enabled spinlocks is
> > something we
> > disallow,
>
> Dario, could you share something in detail?
>
Sorry, I again don't understand... share something about what? I was
proposing myself a text to be used as changelog, which I'm pasting
again here below, for completeness/clarity.

This sentence you seem to be asking about, is what we've been
"debating", in this thread about the changelog, since almost the
beginning: it's bad to mix, because it leads to inconsistencies can are
bad because in general it would trigger BUG_ON, in debug builds, and
cause deadlock, in non-debug builds, even if in this case neither
happen. What is that you need more insights about?

Here it is (again) what I would use as commit message:
---
pci_get_pdev() doesn't require interrupts to be disabled when taking
the pcidevs_lock, which protects its execution. So, spin_lock() can be
used for that, and that is what is done almost everywhere.

Currenlty, there is an exception, in early boot IOMMU initialization on
AMD systems, where spin_lock_irqsave() and restore are used. However,
since, in that case too:
 - we don't need to disable interrupts (for same reasons stated above),
 - interrupts are enabled already, so there is no need to save and
   restore flags,
just change it into using spin_lock(), as everywhere.

Note that, mixing interrupt disabled and enabled spinlocks is something
we disallow, except for very special situations. And since this one
does not qualify as such, using IRQ disabling variants can be
considered a bug (which manages to not trigger the safety checking we
have in place only because they're not yet enabled).
---

Regards,
Dario
Quan Xu March 9, 2016, 1:46 p.m. UTC | #7
> -----Original Message-----

> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]

> Sent: Wednesday, March 09, 2016 9:20 PM

> To: Xu, Quan

> Cc: Suravee Suthikulpanit; xen-devel@lists.xen.org; Jan Beulich; Tian, Kevin

> Subject: Re: [Xen-devel] [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in

> AMD IOMMU initialization.

> 

> On Wed, 2016-03-09 at 12:52 +0000, Xu, Quan wrote:

> > On March 09, 2016 6:25pm, <dario.faggioli@citrix.com> wrote:

> > > On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:

> > > > On March 09, 2016 1:19pm, <Tian, Kevin> wrote:

> > > > >

> > > If we are absolutely sure that they are enabled, i.e., there is no

> > > _risk_ that they are disabled, it would be ok to just use _irq() (if

> > > disabling interrupt is necessary, which is not in this case, but

> > > that's another thing) and avoid saving the flags.

> > >

> > Dario, thanks for your share.

> > I appreciate you always share some knowledge. :):) btw, the "Linux

> > Device Drivers, 3rd Edition" book also describe it clearly,

> > http://www.makelinux.net/ldd3/chp-5-sect-5

> >

> Yes, with respect to that, us and Linux are similar, and the concerns on whether

> interrupts should be disabled or not when taking a spinlock are generic, and can

> be applied to any OS/hypervisor.

> 

> AFAICR, Linux does not have any check in place similar to our check_lock(), but

> that does not mean much.

> 

> > > > > However there remains an

> > > > > exception in AMD IOMMU code, where the lock is acquired with

> > > > > interrupt disabled. This inconsistency can lead to deadlock.

> > > > >

> > > Can it? In the case of the occurrence being changed by this patch, I

> > > don't think it can.

> > Before this patch, it might. As Jan mentioned, that's just a

> > theoretical concern:

> >  -spin_lock_irqsave() disables interrupts (on the local processor

> > only) before taking the spinlock.

> >   Supposed in MP system (pCPU1, pCPU2, ...), when the pCPU1 call the

> > spin_lock_irqsave(),

> >   It does only disable interrupts for pCPU1, and _not_ for other

> > pCPUn.

> >  -Then, it is mixing interrupt disabled and enabled spinlocks.

> >

> I was commenting on the "can lead to deadlock" part, which is something that,

> in general, we risk if we mix, but that can't possibly occur in this specific case.

> This is also what Jan is saying, and the reason why is also asking to mitigate this

> exact claim... So, I'm not sure what your point is here...

> 

> > > Note that, mixing interrupt disabled and enabled spinlocks is

> > > something we disallow,

> >

> > Dario, could you share something in detail?

> >

> Sorry, I again don't understand... share something about what? I was proposing

> myself a text to be used as changelog, which I'm pasting again here below, for

> completeness/clarity.

> 


Now I am still not clear for this point- "this inconsistency might lead to deadlock".
I think it is similar to 'mixing interrupt disabled and enabled spinlocks is something we disallow'.
I hope you can give me an example about how to lead to deadlock. 

Quan
Jan Beulich March 9, 2016, 1:55 p.m. UTC | #8
>>> On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> Now I am still not clear for this point- "this inconsistency might lead to 
> deadlock".
> I think it is similar to 'mixing interrupt disabled and enabled spinlocks is 
> something we disallow'.
> I hope you can give me an example about how to lead to deadlock. 

The implication from disabling interrupts while acquiring a lock
is that the lock is also being acquired by some interrupt
handler. If you mix acquire types, the one not disabling
interrupts is prone to be interrupted, and the interrupt trying
to get hold of the lock the same CPU already owns.

Jan
Dario Faggioli March 9, 2016, 2:45 p.m. UTC | #9
On Wed, 2016-03-09 at 06:55 -0700, Jan Beulich wrote:
> > 
> > > > On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> > Now I am still not clear for this point- "this inconsistency might
> > lead to 
> > deadlock".
> > I think it is similar to 'mixing interrupt disabled and enabled
> > spinlocks is 
> > something we disallow'.
> > I hope you can give me an example about how to lead to deadlock. 
> The implication from disabling interrupts while acquiring a lock
> is that the lock is also being acquired by some interrupt
> handler. If you mix acquire types, the one not disabling
> interrupts is prone to be interrupted, and the interrupt trying
> to get hold of the lock the same CPU already owns.
> 
Exactly.

There are a few other nice writeup online as well.

The most famous one, I guess, is this one from Linus (look at "Lesson
3: spinlocks revisited.")
https://www.kernel.org/doc/Documentation/locking/spinlocks.txt

And, of course, there's the comment inside check_lock(), in
xen/common/spinlock.c, in Xen's codebase, where another example of how
it could be dangerous to mix, even if multiple cpus are involved.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Quan Xu March 10, 2016, 3:21 a.m. UTC | #10
On March 09, 2016 9:56pm, <JBeulich@suse.com> wrote:
> >>> On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> > Now I am still not clear for this point- "this inconsistency might
> > lead to deadlock".
> > I think it is similar to 'mixing interrupt disabled and enabled
> > spinlocks is something we disallow'.
> > I hope you can give me an example about how to lead to deadlock.
> 
> The implication from disabling interrupts while acquiring a lock is that the lock is
> also being acquired by some interrupt handler. If you mix acquire types, the one
> not disabling interrupts is prone to be interrupted, and the interrupt trying to get
> hold of the lock the same CPU already owns.
> 

Jan, thanks.
Now I got it. btw, I also find a good explanation from the comment, inside check_lock(), in xen/common/spinlock.c.
Quan
Quan Xu March 10, 2016, 5:36 a.m. UTC | #11
On March 09, 2016 10:45pm, Dario Faggioli wrote:
> On Wed, 2016-03-09 at 06:55 -0700, Jan Beulich wrote:

> > >

> > > > > On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:

> > > Now I am still not clear for this point- "this inconsistency might

> > > lead to deadlock".

> > > I think it is similar to 'mixing interrupt disabled and enabled

> > > spinlocks is something we disallow'.

> > > I hope you can give me an example about how to lead to deadlock.

> > The implication from disabling interrupts while acquiring a lock is

> > that the lock is also being acquired by some interrupt handler. If you

> > mix acquire types, the one not disabling interrupts is prone to be

> > interrupted, and the interrupt trying to get hold of the lock the same

> > CPU already owns.

> >

> Exactly.

> 

> There are a few other nice writeup online as well.

> 

> The most famous one, I guess, is this one from Linus (look at "Lesson

> 3: spinlocks revisited.")

> https://www.kernel.org/doc/Documentation/locking/spinlocks.txt

> 

> And, of course, there's the comment inside check_lock(), in

> xen/common/spinlock.c, in Xen's codebase, where another example of how it

> could be dangerous to mix, even if multiple cpus are involved.

> 

Dario, thanks! You know, it helped me a lot.

Quan
diff mbox

Patch

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index d90a2d2..a400497 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -778,7 +778,6 @@  static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
     hw_irq_controller *handler;
-    unsigned long flags;
     u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
@@ -788,10 +787,10 @@  static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock_irqsave(&pcidevs_lock, flags);
+    spin_lock(&pcidevs_lock);
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock_irqrestore(&pcidevs_lock, flags);
+    spin_unlock(&pcidevs_lock);
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",