diff mbox series

vpci/msix: exit early if MSI-X is disabled

Message ID 20201201174014.27878-1-roger.pau@citrix.com
State New

Commit Message

Roger Pau Monné Dec. 1, 2020, 5:40 p.m. UTC
Do not attempt to mask an MSI-X entry if MSI-X is not enabled. Else it
will lead to hitting the following assert on debug builds:

(XEN) Panic on CPU 13:
(XEN) Assertion 'entry->arch.pirq != INVALID_PIRQ' failed at vmsi.c:843

In order to fix it exit early from the switch in msix_write if MSI-X
is not enabled.

Fixes: d6281be9d0145 ('vpci/msix: add MSI-X handlers')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/vpci/msix.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
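
The effect of the change on the vector control write path can be sketched with a small standalone model. This is only an illustration, not the Xen code: the `model_*` types, the `ACT_*` values and both function names are hypothetical, and the model assumes that the else branch of the original condition is what calls vpci_msix_arch_mask_entry().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vPCI MSI-X state; not the Xen structures. */
struct model_msix { bool enabled; bool masked; };
struct model_entry { bool masked; bool updated; };

/* What the write handler would do next, in this simplified model. */
enum model_action { ACT_NONE, ACT_UPDATE_ENTRY, ACT_ARCH_MASK };

/*
 * Before the patch: any write that does not take the update path is
 * assumed to fall through to the arch mask hook, even with MSI-X disabled.
 */
static enum model_action vector_ctrl_write_old(const struct model_msix *msix,
                                               struct model_entry *entry,
                                               bool new_masked)
{
    entry->masked = new_masked;
    if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
        return ACT_UPDATE_ENTRY;
    return ACT_ARCH_MASK;
}

/* After the patch: record the mask bit, then exit early if MSI-X is disabled. */
static enum model_action vector_ctrl_write_new(const struct model_msix *msix,
                                               struct model_entry *entry,
                                               bool new_masked)
{
    entry->masked = new_masked;
    if ( !msix->enabled )
        return ACT_NONE;
    if ( !new_masked && !msix->masked && entry->updated )
        return ACT_UPDATE_ENTRY;
    return ACT_ARCH_MASK;
}
```

In both variants the software mask bit is recorded before any early exit, so the guest-visible state is not lost; only the call into the arch hook is skipped while MSI-X is disabled.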

Comments

Jan Beulich Dec. 2, 2020, 8:38 a.m. UTC | #1
On 01.12.2020 18:40, Roger Pau Monne wrote:
> Do not attempt to mask an MSI-X entry if MSI-X is not enabled. Else it
> will lead to hitting the following assert on debug builds:
> 
> (XEN) Panic on CPU 13:
> (XEN) Assertion 'entry->arch.pirq != INVALID_PIRQ' failed at vmsi.c:843

Since the line number is only of limited use, I'd like to see the
function name (vpci_msix_arch_mask_entry()) also added here; easily
done while committing, if the question further down can be resolved
without code change.

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -357,7 +357,11 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>           * so that it picks the new state.
>           */
>          entry->masked = new_masked;
> -        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
> +
> +        if ( !msix->enabled )
> +            break;
> +
> +        if ( !new_masked && !msix->masked && entry->updated )
>          {
>              /*
>               * If MSI-X is enabled, the function mask is not active, the entry

What about a "disabled" -> "enabled-but-masked" transition? This,
afaict, similarly won't trigger setting up of entries from
control_write(), and hence I'd expect the ASSERT() to similarly
trigger when subsequently an entry's mask bit gets altered.

I'd also be fine making this further adjustment, if you agree,
but the one thing I haven't been able to fully convince myself of
is that there's then still no need to set ->updated to true.

Jan
Jan Beulich Dec. 3, 2020, 1:40 p.m. UTC | #2
On 02.12.2020 09:38, Jan Beulich wrote:
> On 01.12.2020 18:40, Roger Pau Monne wrote:
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -357,7 +357,11 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>>           * so that it picks the new state.
>>           */
>>          entry->masked = new_masked;
>> -        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
>> +
>> +        if ( !msix->enabled )
>> +            break;
>> +
>> +        if ( !new_masked && !msix->masked && entry->updated )
>>          {
>>              /*
>>               * If MSI-X is enabled, the function mask is not active, the entry
> 
> What about a "disabled" -> "enabled-but-masked" transition? This,
> afaict, similarly won't trigger setting up of entries from
> control_write(), and hence I'd expect the ASSERT() to similarly
> trigger when subsequently an entry's mask bit gets altered.
> 
> I'd also be fine making this further adjustment, if you agree,
> but the one thing I haven't been able to fully convince myself of
> is that there's then still no need to set ->updated to true.

I've taken another look. I think setting ->updated (or something
equivalent) is needed in that case, in order to not lose the
setting of the entry mask bit. However, this would only defer the
problem to control_write(): This would now need to call
vpci_msix_arch_mask_entry() under suitable conditions, but avoid
calling it when the entry is disabled or was never set up. No
matter whether making the setting of ->updated conditional, or
adding a conditional call in update_entry(), we'd need to
evaluate whether the entry is currently disabled. Imo, instead of
introducing a new arch hook for this, it's easier to make
vpci_msix_arch_mask_entry() tolerate getting called on a disabled
entry. Below my proposed alternative change.

While writing the description I started wondering why we require
address or data fields to have got written before the first
unmask. I don't think the hardware imposes such a requirement;
zeros would be used instead, whatever this means. Let's not
forget that it's only the primary purpose of MSI/MSI-X to
trigger interrupts. Forcing the writes to go elsewhere in
memory is not forbidden from all I know, and could be used by a
driver. IOW I think ->updated should start out as set to true.
But of course vpci_msi_update() then would need to check the
upper address bits and avoid setting up an interrupt if they're
not 0xfee. And further arrangements would be needed to have the
guest requested write actually get carried out correctly.

Jan

x86/vPCI: tolerate (un)masking a disabled MSI-X entry

None of the four reasons causing vpci_msix_arch_mask_entry() to get
called (there's just a single call site) are impossible or illegal prior
to an entry actually having got set up:
- the entry may remain masked (in this case, however, a prior masked ->
  unmasked transition would already not have worked),
- MSI-X may not be enabled,
- the global mask bit may be set,
- the entry may not otherwise have been updated.
Hence the function asserting that the entry was previously set up was
simply wrong. Since the caller tracks the masked state (and setting up
of an entry would only be effected when that software bit is clear),
it's okay to skip both masking and unmasking requests in this case.

Fixes: d6281be9d0145 ('vpci/msix: add MSI-X handlers')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -840,8 +840,8 @@ void vpci_msi_arch_print(const struct vp
 void vpci_msix_arch_mask_entry(struct vpci_msix_entry *entry,
                                const struct pci_dev *pdev, bool mask)
 {
-    ASSERT(entry->arch.pirq != INVALID_PIRQ);
-    vpci_mask_pirq(pdev->domain, entry->arch.pirq, mask);
+    if ( entry->arch.pirq != INVALID_PIRQ )
+        vpci_mask_pirq(pdev->domain, entry->arch.pirq, mask);
 }
 
 int vpci_msix_arch_enable_entry(struct vpci_msix_entry *entry,
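
The behaviour of the tolerant variant above can be illustrated with a standalone sketch. Everything here is a hypothetical model rather than Xen code: `MODEL_INVALID_PIRQ` stands in for INVALID_PIRQ, `model_mask_pirq()` stands in for vpci_mask_pirq(), and a counter records whether the request reached it.

```c
#include <stdbool.h>

#define MODEL_INVALID_PIRQ (-1)   /* hypothetical stand-in for INVALID_PIRQ */

/* Hypothetical per-entry arch state: only the bound pirq matters here. */
struct model_entry_arch { int pirq; };

/* Counts how many (un)mask requests actually reached the pirq layer. */
static unsigned int model_mask_calls;

/* Stand-in for vpci_mask_pirq(): just records that it was called. */
static void model_mask_pirq(int pirq, bool mask)
{
    (void)pirq;
    (void)mask;
    model_mask_calls++;
}

/*
 * Tolerant variant: an entry that was never set up has no pirq bound,
 * so the request is silently skipped instead of tripping an assertion.
 */
static void model_arch_mask_entry(const struct model_entry_arch *arch,
                                  bool mask)
{
    if ( arch->pirq != MODEL_INVALID_PIRQ )
        model_mask_pirq(arch->pirq, mask);
}
```

Skipping the request is safe in this model for the reason given in the description: the caller keeps track of the masked state itself, so nothing is lost when the entry has no pirq yet.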
Roger Pau Monné Dec. 6, 2020, 11:15 a.m. UTC | #3
Sorry, slightly sleep deprived, hope the reply below makes sense.

On Thu, Dec 03, 2020 at 02:40:28PM +0100, Jan Beulich wrote:
> On 02.12.2020 09:38, Jan Beulich wrote:
> > On 01.12.2020 18:40, Roger Pau Monne wrote:
> >> --- a/xen/drivers/vpci/msix.c
> >> +++ b/xen/drivers/vpci/msix.c
> >> @@ -357,7 +357,11 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
> >>           * so that it picks the new state.
> >>           */
> >>          entry->masked = new_masked;
> >> -        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
> >> +
> >> +        if ( !msix->enabled )
> >> +            break;
> >> +
> >> +        if ( !new_masked && !msix->masked && entry->updated )
> >>          {
> >>              /*
> >>               * If MSI-X is enabled, the function mask is not active, the entry
> > 
> > What about a "disabled" -> "enabled-but-masked" transition? This,
> > afaict, similarly won't trigger setting up of entries from
> > control_write(), and hence I'd expect the ASSERT() to similarly
> > trigger when subsequently an entry's mask bit gets altered.

This would only happen if the user hasn't written to the entry address
or data fields since initialization, or else the updated field would be
set and then when clearing the entry mask bit in
PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET the entry will be properly set up.

> > I'd also be fine making this further adjustment, if you agree,
> > but the one thing I haven't been able to fully convince myself of
> > is that there's then still no need to set ->updated to true.
> 
> I've taken another look. I think setting ->updated (or something
> equivalent) is needed in that case, in order to not lose the
> setting of the entry mask bit. However, this would only defer the
> problem to control_write(): This would now need to call
> vpci_msix_arch_mask_entry() under suitable conditions, but avoid
> calling it when the entry is disabled or was never set up.

If the entry is masked control_write won't call update_entry, leaving
the entry updated bit as-is, thus deferring the call to update_entry
to further updates in PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET. I think this
is all fine.

> No
> matter whether making the setting of ->updated conditional, or
> adding a conditional call in update_entry(), we'd need to
> evaluate whether the entry is currently disabled. Imo, instead of
> introducing a new arch hook for this, it's easier to make
> vpci_msix_arch_mask_entry() tolerate getting called on a disabled
> entry. Below my proposed alternative change.

I think just setting the updated bit for all entries at initialization
would solve this, as this would then force a call to update_entry when
an entry is unmasked (by writes to PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET).

In any case the assert in vpci_msix_arch_mask_entry is a logic check,
IIRC calling it with an invalid pirq will just result in the function
being a no op as domain_spin_lock_irq_desc will return NULL.

> 
> While writing the description I started wondering why we require
> address or data fields to have got written before the first
> unmask. I don't think the hardware imposes such a requirement;
> zeros would be used instead, whatever this means. Let's not
> forget that it's only the primary purpose of MSI/MSI-X to
> trigger interrupts. Forcing the writes to go elsewhere in
> memory is not forbidden from all I know, and could be used by a
> driver. IOW I think ->updated should start out as set to true.
> But of course vpci_msi_update() then would need to check the
> upper address bits and avoid setting up an interrupt if they're
> not 0xfee. And further arrangements would be needed to have the
> guest requested write actually get carried out correctly.

Seems correct, albeit adding such logic seems to complicate the code
and expand the attack surface. IMO I wouldn't implement this unless we
know there's a real use case for this.

Thanks, Roger.
Jan Beulich Dec. 7, 2020, 8:19 a.m. UTC | #4
On 06.12.2020 12:15, Roger Pau Monné wrote:
> On Thu, Dec 03, 2020 at 02:40:28PM +0100, Jan Beulich wrote:
>> On 02.12.2020 09:38, Jan Beulich wrote:
>>> On 01.12.2020 18:40, Roger Pau Monne wrote:
>>>> --- a/xen/drivers/vpci/msix.c
>>>> +++ b/xen/drivers/vpci/msix.c
>>>> @@ -357,7 +357,11 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>>>>           * so that it picks the new state.
>>>>           */
>>>>          entry->masked = new_masked;
>>>> -        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
>>>> +
>>>> +        if ( !msix->enabled )
>>>> +            break;
>>>> +
>>>> +        if ( !new_masked && !msix->masked && entry->updated )
>>>>          {
>>>>              /*
>>>>               * If MSI-X is enabled, the function mask is not active, the entry
>>>
>>> What about a "disabled" -> "enabled-but-masked" transition? This,
>>> afaict, similarly won't trigger setting up of entries from
>>> control_write(), and hence I'd expect the ASSERT() to similarly
>>> trigger when subsequently an entry's mask bit gets altered.
> 
> This would only happen if the user hasn't written to the entry address
> or data fields since initialization, or else the updated field would be
> set and then when clearing the entry mask bit in
> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET the entry will be properly set up.

Right, but where's the difference to writes here happening when
!msix->enabled? All I'm saying is that all possible cases leading
to the "else" of this "if" need to be equally considered. Hence
my alternative patch.

>>> I'd also be fine making this further adjustment, if you agree,
>>> but the one thing I haven't been able to fully convince myself of
>>> is that there's then still no need to set ->updated to true.
>>
>> I've taken another look. I think setting ->updated (or something
>> equivalent) is needed in that case, in order to not lose the
>> setting of the entry mask bit. However, this would only defer the
>> problem to control_write(): This would now need to call
>> vpci_msix_arch_mask_entry() under suitable conditions, but avoid
>> calling it when the entry is disabled or was never set up.
> 
> If the entry is masked control_write won't call update_entry, leaving
> the entry updated bit as-is, thus deferring the call to update_entry
> to further updates in PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET. I think this
> is all fine.

It is under the assumption that msix_write() behaves correctly.
What I was saying is that there might appear to be a need to
set ->updated in msix_write() (to make sure the mask bit change
won't get lost), at which point the logic in control_write()
would need adjustment. Which I find undesirable.

>> No
>> matter whether making the setting of ->updated conditional, or
>> adding a conditional call in update_entry(), we'd need to
>> evaluate whether the entry is currently disabled. Imo, instead of
>> introducing a new arch hook for this, it's easier to make
>> vpci_msix_arch_mask_entry() tolerate getting called on a disabled
>> entry. Below my proposed alternative change.
> 
> I think just setting the updated bit for all entries at initialization
> would solve this, as this would then force a call to update_entry when
> an entry is unmasked (by writes to PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET).

I don't see this being a solution - we'd still end up calling
vpci_msix_arch_mask_entry() when !msix->enabled or msix->masked.

> In any case the assert in vpci_msix_arch_mask_entry is a logic check,
> IIRC calling it with an invalid pirq will just result in the function
> being a no op as domain_spin_lock_irq_desc will return NULL.

Ah yes, pirq_info() would return NULL here. Am I reading this as a
suggestion to simply drop the ASSERT(), instead of replacing it by
an if()? It would feel slightly more robust to me to keep the if().

>> While writing the description I started wondering why we require
>> address or data fields to have got written before the first
>> unmask. I don't think the hardware imposes such a requirement;
>> zeros would be used instead, whatever this means. Let's not
>> forget that it's only the primary purpose of MSI/MSI-X to
>> trigger interrupts. Forcing the writes to go elsewhere in
>> memory is not forbidden from all I know, and could be used by a
>> driver. IOW I think ->updated should start out as set to true.
>> But of course vpci_msi_update() then would need to check the
>> upper address bits and avoid setting up an interrupt if they're
>> not 0xfee. And further arrangements would be needed to have the
>> guest requested write actually get carried out correctly.
> 
> Seems correct, albeit adding such logic seems to complicate the code
> and expand the attack surface. IMO I wouldn't implement this unless we
> know there's a real use case for this.

I wasn't meaning to suggest we implement any of this without need.
I was, however, thinking we ought to at least check the high 12
address bits, and avoid trying to interpret the low 20 ones if
they don't match. Let me add a patch to this effect to the small
series that I've already accumulated anyway.

Jan
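
The address check mentioned at the end of the thread could look roughly like the sketch below. It is not taken from any posted patch: the function and macro names are hypothetical, and it assumes the guest-written low and high address words have been combined into one 64-bit value, so that an interrupt message is one whose upper 32 bits are zero and whose bits 31:20 equal 0xfee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 31:20 of an x86 MSI address that targets the interrupt range. */
#define MODEL_MSI_ADDR_PREFIX 0xfeeu

/*
 * Returns true only if the address lies in 0xfee00000-0xfeefffff.
 * Shifting the full 64-bit value also rejects any non-zero upper half,
 * so the low 20 bits are only worth decoding when this returns true.
 */
static bool model_msi_addr_is_interrupt(uint64_t addr)
{
    return (addr >> 20) == MODEL_MSI_ADDR_PREFIX;
}
```

A caller would test this first and skip interrupt setup when it returns false, leaving the question of how to handle such writes open, as in the discussion above.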
Patch

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 64dd0a929c..93902ba7db 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -357,7 +357,11 @@  static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
          * so that it picks the new state.
          */
         entry->masked = new_masked;
-        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
+
+        if ( !msix->enabled )
+            break;
+
+        if ( !new_masked && !msix->masked && entry->updated )
         {
             /*
              * If MSI-X is enabled, the function mask is not active, the entry