
[kernel] vfio: Print message about masking INTx only when it is known to be broken

Message ID 20180312164426.68b3c422@w520.home (mailing list archive)
State New, archived

Commit Message

Alex Williamson March 12, 2018, 10:44 p.m. UTC
On Fri,  9 Mar 2018 12:17:36 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> Commit 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable
> of enabling it") is causing 'Masking broken INTx support' messages
> every time when a PCI without INTx support is enabled. This message is
> intended to appear when a device known for broken INTx is being enabled.
> However the message also appears on IOV virtual functions where INTx
> cannot possibly be enabled so saying that the "broken" support is masked
> is not correct.
> 
> This changes the message to only appear when the device advertises INTx
> (via PCI_INTERRUPT_PIN != 0) but does not implement PCI 2.3 INTx masking.
> 
> Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  drivers/vfio/pci/vfio_pci.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b0f7594..7f2ec47 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -245,7 +245,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>  
>  	if (likely(!nointxmask)) {
>  		if (vfio_pci_nointx(pdev)) {
> -			dev_info(&pdev->dev, "Masking broken INTx support\n");
> +			u8 pin = 0;
> +
> +			pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN,
> +					     &pin);
> +			if (pin)
> +				dev_info(&pdev->dev,
> +					 "Masking broken INTx support\n");
>  			vdev->nointx = true;
>  			pci_intx(pdev, 0);
>  		} else


Why don't we fix it at the location of the bug rather than after the
fact?  I don't see any reason to invoke the nointx handling for devices
that don't actually support INTx.  Thanks,

Alex

Comments

Alexey Kardashevskiy March 13, 2018, 2:38 a.m. UTC | #1
On 13/3/18 9:44 am, Alex Williamson wrote:
> On Fri,  9 Mar 2018 12:17:36 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> Commit 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable
>> of enabling it") is causing 'Masking broken INTx support' messages
>> every time when a PCI without INTx support is enabled. This message is
>> intended to appear when a device known for broken INTx is being enabled.
>> However the message also appears on IOV virtual functions where INTx
>> cannot possibly be enabled so saying that the "broken" support is masked
>> is not correct.
>>
>> This changes the message to only appear when the device advertises INTx
>> (via PCI_INTERRUPT_PIN != 0) but does not implement PCI 2.3 INTx masking.
>>
>> Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index b0f7594..7f2ec47 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -245,7 +245,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>  
>>  	if (likely(!nointxmask)) {
>>  		if (vfio_pci_nointx(pdev)) {
>> -			dev_info(&pdev->dev, "Masking broken INTx support\n");
>> +			u8 pin = 0;
>> +
>> +			pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN,
>> +					     &pin);
>> +			if (pin)
>> +				dev_info(&pdev->dev,
>> +					 "Masking broken INTx support\n");
>>  			vdev->nointx = true;
>>  			pci_intx(pdev, 0);
>>  		} else
> 
> 
> Why don't we fix it at the location of the bug rather than after the
> fact?  I don't see any reason to invoke the nointx handling for devices
> that don't actually support INTx.  Thanks,


We can do that too, this just means that we will keep doing v2.3 masking
tests and possibly other things for SRIOV VFs when we do not need to, as
pdev->irq == 0 anyway. Not a big deal though (it was not a problem before);
are you going to post this as a patch, or do you want me to do it? Thanks,



> 
> Alex
> 
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index ad18ed266dc0..34d463a0b4a5 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -192,6 +192,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev);
>   */
>  static bool vfio_pci_nointx(struct pci_dev *pdev)
>  {
> +	u8 pin;
> +
>  	switch (pdev->vendor) {
>  	case PCI_VENDOR_ID_INTEL:
>  		switch (pdev->device) {
> @@ -207,7 +209,9 @@ static bool vfio_pci_nointx(struct pci_dev *pdev)
>  		}
>  	}
>  
> -	if (!pdev->irq)
> +	pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
> +
> +	if (pin && !pdev->irq)
>  		return true;
>  
>  	return false;
>
Alex Williamson March 15, 2018, 3:02 a.m. UTC | #2
On Tue, 13 Mar 2018 13:38:46 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 13/3/18 9:44 am, Alex Williamson wrote:
> > On Fri,  9 Mar 2018 12:17:36 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> Commit 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable
> >> of enabling it") is causing 'Masking broken INTx support' messages
> >> every time when a PCI without INTx support is enabled. This message is
> >> intended to appear when a device known for broken INTx is being enabled.
> >> However the message also appears on IOV virtual functions where INTx
> >> cannot possibly be enabled so saying that the "broken" support is masked
> >> is not correct.
> >>
> >> This changes the message to only appear when the device advertises INTx
> >> (via PCI_INTERRUPT_PIN != 0) but does not implement PCI 2.3 INTx masking.
> >>
> >> Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  drivers/vfio/pci/vfio_pci.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index b0f7594..7f2ec47 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -245,7 +245,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
> >>  
> >>  	if (likely(!nointxmask)) {
> >>  		if (vfio_pci_nointx(pdev)) {
> >> -			dev_info(&pdev->dev, "Masking broken INTx support\n");
> >> +			u8 pin = 0;
> >> +
> >> +			pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN,
> >> +					     &pin);
> >> +			if (pin)
> >> +				dev_info(&pdev->dev,
> >> +					 "Masking broken INTx support\n");
> >>  			vdev->nointx = true;
> >>  			pci_intx(pdev, 0);
> >>  		} else  
> > 
> > 
> > Why don't we fix it at the location of the bug rather than after the
> > fact?  I don't see any reason to invoke the nointx handling for devices
> > that don't actually support INTx.  Thanks,  
> 
> 
> We can do that too, this just means that we will keep doing v2.3 masking
> tests and possibly other things for SRIOV VFs when we do not need to, as
> pdev->irq == 0 anyway. Not a big deal though (it was not a problem before);
> are you going to post this as a patch, or do you want me to do it? Thanks,

I tend to prefer keeping nointx as a special case, whereas I don't
consider a device not supporting intx, such as a vf, to be a special
case.  The user of the device can detect that the device doesn't
support intx and we don't need to worry about the device triggering
it.  The nointx case is trying to emulate that in software.  Despite
the commit subject of 2170dd04316e, I think the intent of that commit
is to invoke the nointx behavior if the device supports intx, but
something in the chipset/platform/whatever has made it unconfigurable.

Now that you mention v2.3 stuff, I further question the validity of the
original patch.  The nointx handling depends on DisINTx support being
only half broken, that we can mask it, but not detect a pending
interrupt.  OTOH, pdev->irq being NULL precludes our ability to
configure an irq handler, but what does it imply about the device's
ability to cause chaos by triggering the interrupt line regardless?
The nointx case makes quite a few pci_intx(pdev, 0) calls to stop the
device from pulling intx, which is why I don't want to use it for
devices not supporting intx.  Should we only be invoking the nointx path
if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
the device passes the v2.3 masking test (which is cached, not retested
now, btw)?  Thanks,

Alex
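
A minimal sketch of what the combined a)/b)/c) gate could look like.  This
is illustrative only, not one of the posted patches; it assumes the PCI
core's cached masking-test result is visible as pdev->broken_intx_masking
and omits the per-device quirk list that the real vfio_pci_nointx() starts
with:

#include <linux/pci.h>

/* Sketch only: combine the three conditions discussed above. */
static bool vfio_pci_nointx_sketch(struct pci_dev *pdev)
{
	u8 pin = 0;

	/* (a) the device advertises INTx via the interrupt pin register */
	pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
	if (!pin)
		return false;

	/* (b) the platform provided no INTx routing/irq for it */
	if (pdev->irq)
		return false;

	/* (c) DisINTx masking is believed to work -- assumed here to be
	 * the cached result of the PCI 2.3 masking test */
	return !pdev->broken_intx_masking;
}
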
Alexey Kardashevskiy March 18, 2018, 2:05 p.m. UTC | #3
On 15/3/18 2:02 pm, Alex Williamson wrote:
> On Tue, 13 Mar 2018 13:38:46 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 13/3/18 9:44 am, Alex Williamson wrote:
>>> On Fri,  9 Mar 2018 12:17:36 +1100
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>   
>>>> Commit 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable
>>>> of enabling it") is causing 'Masking broken INTx support' messages
>>>> every time when a PCI without INTx support is enabled. This message is
>>>> intended to appear when a device known for broken INTx is being enabled.
>>>> However the message also appears on IOV virtual functions where INTx
>>>> cannot possibly be enabled so saying that the "broken" support is masked
>>>> is not correct.
>>>>
>>>> This changes the message to only appear when the device advertises INTx
>>>> (via PCI_INTERRUPT_PIN != 0) but does not implement PCI 2.3 INTx masking.
>>>>
>>>> Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>  drivers/vfio/pci/vfio_pci.c | 8 +++++++-
>>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>>> index b0f7594..7f2ec47 100644
>>>> --- a/drivers/vfio/pci/vfio_pci.c
>>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>>> @@ -245,7 +245,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>>>  
>>>>  	if (likely(!nointxmask)) {
>>>>  		if (vfio_pci_nointx(pdev)) {
>>>> -			dev_info(&pdev->dev, "Masking broken INTx support\n");
>>>> +			u8 pin = 0;
>>>> +
>>>> +			pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN,
>>>> +					     &pin);
>>>> +			if (pin)
>>>> +				dev_info(&pdev->dev,
>>>> +					 "Masking broken INTx support\n");
>>>>  			vdev->nointx = true;
>>>>  			pci_intx(pdev, 0);
>>>>  		} else  
>>>
>  Should we only be invoking the nointx path
> if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
> the device passes the v2.3 masking test (which is cached, not retested
> now, btw)?  

Yes we should, and yes I missed that last bit about caching the result of
the test which is performed anyway in the common pci code. Thanks,
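
For reference, the PCI 2.3 masking test mentioned here boils down to
toggling the DisINTx bit and checking whether it sticks.  A simplified
sketch of that probe (not the PCI core's exact code, which also handles
locking and caches the result at enumeration time):

#include <linux/pci.h>

/* Simplified sketch of the PCI 2.3 DisINTx masking probe. */
static bool intx_mask_supported_sketch(struct pci_dev *pdev)
{
	u16 orig, toggled, readback;

	pci_read_config_word(pdev, PCI_COMMAND, &orig);
	toggled = orig ^ PCI_COMMAND_INTX_DISABLE;

	pci_write_config_word(pdev, PCI_COMMAND, toggled);
	pci_read_config_word(pdev, PCI_COMMAND, &readback);
	pci_write_config_word(pdev, PCI_COMMAND, orig);	/* restore */

	return readback == toggled;	/* the bit sticks => masking works */
}
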
Alex Williamson March 19, 2018, 2:15 a.m. UTC | #4
On Mon, 19 Mar 2018 01:05:27 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 15/3/18 2:02 pm, Alex Williamson wrote:
> > On Tue, 13 Mar 2018 13:38:46 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> On 13/3/18 9:44 am, Alex Williamson wrote:  
> >>> On Fri,  9 Mar 2018 12:17:36 +1100
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>     
> >>>> Commit 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable
> >>>> of enabling it") is causing 'Masking broken INTx support' messages
> >>>> every time when a PCI without INTx support is enabled. This message is
> >>>> intended to appear when a device known for broken INTx is being enabled.
> >>>> However the message also appears on IOV virtual functions where INTx
> >>>> cannot possibly be enabled so saying that the "broken" support is masked
> >>>> is not correct.
> >>>>
> >>>> This changes the message to only appear when the device advertises INTx
> >>>> (via PCI_INTERRUPT_PIN != 0) but does not implement PCI 2.3 INTx masking.
> >>>>
> >>>> Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
> >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>> ---
> >>>>  drivers/vfio/pci/vfio_pci.c | 8 +++++++-
> >>>>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >>>> index b0f7594..7f2ec47 100644
> >>>> --- a/drivers/vfio/pci/vfio_pci.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci.c
> >>>> @@ -245,7 +245,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
> >>>>  
> >>>>  	if (likely(!nointxmask)) {
> >>>>  		if (vfio_pci_nointx(pdev)) {
> >>>> -			dev_info(&pdev->dev, "Masking broken INTx support\n");
> >>>> +			u8 pin = 0;
> >>>> +
> >>>> +			pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN,
> >>>> +					     &pin);
> >>>> +			if (pin)
> >>>> +				dev_info(&pdev->dev,
> >>>> +					 "Masking broken INTx support\n");
> >>>>  			vdev->nointx = true;
> >>>>  			pci_intx(pdev, 0);
> >>>>  		} else    
> >>>  
> >  Should we only be invoking the nointx path
> > if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
> > the device passes the v2.3 masking test (which is cached, not retested
> > now, btw)?    
> 
> Yes we should, and yes I missed that last bit about caching the result of
> the test which is performed anyway in the common pci code. Thanks,

I spent a while considering whether we should include c) in the patch I
posted, but I couldn't get past the question of what we'd do with the
device if it doesn't pass the INTx masking test.  We'd want to emulate
the INTx pin register and report no INTx irq info, as the nointx path
does, then what?  If writing DisINTx doesn't do anything, we need to
decide whether lack of INTx routing means the device can harmlessly
pull INTx, or if pulling INTx is not harmless, then we'd need to ban
the device from being assigned to the user.  Since we don't know of any
problems and it seems sort of reasonable to assume that if there's no
INTx routing that we can tug it all we want (because if someone is
listening, it's a platform bug that it's not routed), I left it alone
in my follow-up patch.  Let me know if you have other thoughts.  Thanks,

Alex
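
What "emulate the INTx pin register and report no INTx irq info" amounts
to in the nointx path, roughly.  These are fragments for illustration;
the field and constant names follow vfio-pci's internals, but this is not
the literal vfio_pci_config.c / vfio_pci_get_irq_count() code:

/* The virtualized config space the user reads has the pin register
 * forced to zero, so the device appears not to implement INTx at all. */
if (vdev->nointx)
	vdev->vconfig[PCI_INTERRUPT_PIN] = 0;

/* ...and the INTx irq index reports a count of zero interrupts. */
if (irq_type == VFIO_PCI_INTX_IRQ_INDEX)
	return vdev->nointx ? 0 : 1;
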
Alexey Kardashevskiy March 22, 2018, 4:02 a.m. UTC | #5
On 19/3/18 1:15 pm, Alex Williamson wrote:
> On Mon, 19 Mar 2018 01:05:27 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 15/3/18 2:02 pm, Alex Williamson wrote:
>>> On Tue, 13 Mar 2018 13:38:46 +1100
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>   
>>>> On 13/3/18 9:44 am, Alex Williamson wrote:  
>>>>> On Fri,  9 Mar 2018 12:17:36 +1100
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>     
>>>>>> Commit 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable
>>>>>> of enabling it") is causing 'Masking broken INTx support' messages
>>>>>> every time when a PCI without INTx support is enabled. This message is
>>>>>> intended to appear when a device known for broken INTx is being enabled.
>>>>>> However the message also appears on IOV virtual functions where INTx
>>>>>> cannot possibly be enabled so saying that the "broken" support is masked
>>>>>> is not correct.
>>>>>>
>>>>>> This changes the message to only appear when the device advertises INTx
>>>>>> (via PCI_INTERRUPT_PIN != 0) but does not implement PCI 2.3 INTx masking.
>>>>>>
>>>>>> Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>  drivers/vfio/pci/vfio_pci.c | 8 +++++++-
>>>>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>>>>> index b0f7594..7f2ec47 100644
>>>>>> --- a/drivers/vfio/pci/vfio_pci.c
>>>>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>>>>> @@ -245,7 +245,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>>>>>  
>>>>>>  	if (likely(!nointxmask)) {
>>>>>>  		if (vfio_pci_nointx(pdev)) {
>>>>>> -			dev_info(&pdev->dev, "Masking broken INTx support\n");
>>>>>> +			u8 pin = 0;
>>>>>> +
>>>>>> +			pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN,
>>>>>> +					     &pin);
>>>>>> +			if (pin)
>>>>>> +				dev_info(&pdev->dev,
>>>>>> +					 "Masking broken INTx support\n");
>>>>>>  			vdev->nointx = true;
>>>>>>  			pci_intx(pdev, 0);
>>>>>>  		} else    
>>>>>  
>>>  Should we only be invoking the nointx path
>>> if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
>>> the device passes the v2.3 masking test (which is cached, not retested
>>> now, btw)?    
>>
>> Yes we should, and yes I missed that last bit about caching the result of
>> the test which is performed anyway in the common pci code. Thanks,
> 
> I spent a while considering whether we should include c) in the patch I
> posted, but I couldn't get past the question of what we'd do with the
> device if it doesn't pass the INTx masking test.  We'd want to emulate
> the INTx pin register and report no INTx irq info, as the nointx path
> does, then what?  If writing DisINTx doesn't do anything, we need to
> decide whether lack of INTx routing means the device can harmlessly
> pull INTx, or if pulling INTx is not harmless, then we'd need to ban
> the device from being assigned to the user.  Since we don't know of any
> problems and it seems sort of reasonable to assume that if there's no
> INTx routing that we can tug it all we want (because if someone is
> listening, it's a platform bug that it's not routed), I left it alone
> in my follow-up patch.  Let me know if you have other thoughts.  Thanks,


Well, pulling INTx is nasty anyway as a device may die afterwards and the
specific INTx line (or a latch in PCIe bridge) will be blocked for the
devices sharing the same line, and DisINTx does not change that. Our other
hypervisor (pHyp) simply prohibits INTx in order to avoid dealing with this
(this is why 2170dd04316e was made). So not enforcing nointx for
DisINTx-incapable devices does not seem much worse than enabling INTx for
pass through itself.
Alex Williamson March 22, 2018, 4:40 a.m. UTC | #6
On Thu, 22 Mar 2018 15:02:09 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 19/3/18 1:15 pm, Alex Williamson wrote:
> > On Mon, 19 Mar 2018 01:05:27 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> On 15/3/18 2:02 pm, Alex Williamson wrote:  
> >>> On Tue, 13 Mar 2018 13:38:46 +1100
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>     
> >>>> On 13/3/18 9:44 am, Alex Williamson wrote:       
> >>>  Should we only be invoking the nointx path
> >>> if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
> >>> the device passes the v2.3 masking test (which is cached, not retested
> >>> now, btw)?      
> >>
> >> Yes we should, and yes I missed that last bit about caching the result of
> >> the test which is performed anyway in the common pci code. Thanks,  
> > 
> > I spent a while considering whether we should include c) in the patch I
> > posted, but I couldn't get past the question of what we'd do with the
> > device if it doesn't pass the INTx masking test.  We'd want to emulate
> > the INTx pin register and report no INTx irq info, as the nointx path
> > does, then what?  If writing DisINTx doesn't do anything, we need to
> > decide whether lack of INTx routing means the device can harmlessly
> > pull INTx, or if pulling INTx is not harmless, then we'd need to ban
> > the device from being assigned to the user.  Since we don't know of any
> > problems and it seems sort of reasonable to assume that if there's no
> > INTx routing that we can tug it all we want (because if someone is
> > listening, it's a platform bug that it's not routed), I left it alone
> > in my follow-up patch.  Let me know if you have other thoughts.  Thanks,  
> 
> 
> Well, pulling INTx is nasty anyway as a device may die afterwards and the
> specific INTx line (or a latch in PCIe bridge) will be blocked for the
> devices sharing the same line, and DisINTx does not change that.

"... a device may die afterwards"?  What does that mean?  In normal
operation, INTx fires, DisINTx gets set, the interrupt is forwarded to
the user, the user services and unmasks the interrupt, rinse and
repeat.  If the device somehow gets into a locked state where it's
pulling INTx and disregarding DisINTx, a) this is broken hardware, b)
the spurious interrupt handler will mask the interrupt and switch to
polling, penalizing everyone sharing that line.  I don't know that
there's a model where we can factor in catastrophic hardware failure
into the equation.
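
A handler-shaped sketch of that flow, purely as an illustration -- the
real vfio_intx_handler() keeps its own masking state under a spinlock
instead of calling pci_check_and_mask_intx(), and the eventfd below
stands in for the user's trigger:

#include <linux/eventfd.h>
#include <linux/interrupt.h>
#include <linux/pci.h>

/* Sketch: INTx fires, DisINTx gets set so the (possibly shared) line is
 * released, the event is forwarded to the user, and the user unmasks the
 * device again via VFIO_DEVICE_SET_IRQS once it has serviced it. */
static irqreturn_t intx_flow_sketch(int irq, void *arg)
{
	struct pci_dev *pdev = arg;		/* real code passes a vfio context */
	struct eventfd_ctx *trigger = NULL;	/* the user's eventfd (sketch) */

	/* Is this device asserting INTx?  If so, mask it with DisINTx. */
	if (!pci_check_and_mask_intx(pdev))
		return IRQ_NONE;		/* shared line, not our device */

	if (trigger)
		eventfd_signal(trigger, 1);	/* forward to userspace */

	return IRQ_HANDLED;
}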

> Our other
> hypervisor (pHyp) simply prohibits INTx in order to avoid dealing with this
> (this is why 2170dd04316e was made). So not enforcing nointx for
> DisINTx-incapable devices does not seem much worse than enabling INTx for
> pass through itself.

Too. Many. Negatives.  If a device shares an INTx line with other
devices then it certainly needs to support DisINTx in order to play
nicely with userspace drivers.  We require exclusive interrupts for
devices that do not support DisINTx.  If the hypervisor is choosing to
avoid dealing with that problem by not providing routing for INTx
(effectively a paravirt please don't use this interface), then it seems
like it's not fit to support userspace drivers.  It's depending on the
cooperation of the user.  We can't prevent the user from making the card
pull INTx, we can only discourage them from using it, which is not
sufficient.  Therefore if we really do need to block use of such
devices (no INTx routing, no DisINTx support) from using vfio, my patch
is insufficient and I think the best approach at this stage is to revert
2170dd04316e and you can rehash a solution for pHyp.  Thanks,

Alex
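
A fragment-level sketch of the "exclusive interrupt unless DisINTx works"
policy described above; the names mirror vfio_pci_intrs.c but the snippet
is illustrative rather than the exact code:

	/* Only allow the INTx line to be shared when DisINTx masking works
	 * (vdev->pci_2_3); otherwise insist on an exclusive interrupt so an
	 * unmaskable device cannot disrupt others sharing the line. */
	unsigned long irqflags = vdev->pci_2_3 ? IRQF_SHARED : 0;

	ret = request_irq(pdev->irq, vfio_intx_handler, irqflags,
			  "vfio-intx", vdev);
	if (ret)	/* without IRQF_SHARED this fails if the line is busy */
		return ret;
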
Alexey Kardashevskiy March 22, 2018, 7:32 a.m. UTC | #7
On 22/3/18 3:40 pm, Alex Williamson wrote:
> On Thu, 22 Mar 2018 15:02:09 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 19/3/18 1:15 pm, Alex Williamson wrote:
>>> On Mon, 19 Mar 2018 01:05:27 +1100
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>   
>>>> On 15/3/18 2:02 pm, Alex Williamson wrote:  
>>>>> On Tue, 13 Mar 2018 13:38:46 +1100
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>     
>>>>>> On 13/3/18 9:44 am, Alex Williamson wrote:       
>>>>>  Should we only be invoking the nointx path
>>>>> if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
>>>>> the device passes the v2.3 masking test (which is cached, not retested
>>>>> now, btw)?      
>>>>
>>>> Yes we should, and yes I missed that last bit about caching the result of
>>>> the test which is performed anyway in the common pci code. Thanks,  
>>>
>>> I spent a while considering whether we should include c) in the patch I
>>> posted, but I couldn't get past the question of what we'd do with the
>>> device if it doesn't pass the INTx masking test.  We'd want to emulate
>>> the INTx pin register and report no INTx irq info, as the nointx path
>>> does, then what?  If writing DisINTx doesn't do anything, we need to
>>> decide whether lack of INTx routing means the device can harmlessly
>>> pull INTx, or if pulling INTx is not harmless, then we'd need to ban
>>> the device from being assigned to the user.  Since we don't know of any
>>> problems and it seems sort of reasonable to assume that if there's no
>>> INTx routing that we can tug it all we want (because if someone is
>>> listening, it's a platform bug that it's not routed), I left it alone
>>> in my follow-up patch.  Let me know if you have other thoughts.  Thanks,  
>>
>>
>> Well, pulling INTx is nasty anyway as a device may die afterwards and the
>> specific INTx line (or a latch in PCIe bridge) will be blocked for the
>> devices sharing the same line, and DisINTx does not change that.
> 
> "... a device may die afterwards"?  What does that mean?  

I meant the device could be blocked by EEH after raising INTx.

> In normal
> operation, INTx fires, DisINTx gets set, the interrupt is forwarded to
> the user, the user services and unmasks the interrupt, rinse and
> repeat.  If the device somehow gets into a locked state where it's
> pulling INTx and disregarding DisINTx, a) this is broken hardware, b)
> the spurious interrupt handler will mask the interrupt and switch to
> polling, penalizing everyone sharing that line.  I don't know that
> there's a model where we can factor in catastrophic hardware failure
> into the equation.
>>> Our other
>> hypervisor (pHyp) simply prohibits INTx in order to avoid dealing with this
>> (this is why 2170dd04316e was made). So not enforcing nointx for
>> DisINTx-incapable devices does not seem much worse than enabling INTx for
>> pass through itself.
> 
> Too. Many. Negatives.  If a device shares an INTx line with other
> devices then it certainly needs to support DisINTx in order to play
> nicely with userspace drivers.  We require exclusive interrupts for
> devices that do not support DisINTx.  If the hypervisor is choosing to
> avoid dealing with that problem by not providing routing for INTx
> (effectively a paravirt please don't use this interface), then it seems
> like it's not fit to support userspace drivers. It's depending on the
> cooperation of the user. We can't prevent the user from making the card
> pull INTx, we can only discourage them from using it, which is not
> sufficient. Therefore if we really do need to block use of such
> devices (no INTx routing, no DisINTx support) from using vfio, my patch
> is insufficient and I think the best approach at this stage is to revert
> 2170dd04316e and you can rehash a solution for pHyp.  Thanks,

No, I do not want to block INTx and I am happy with "[PATCH v2] vfio/pci:
Quiet broken INTx whining when INTx is unsupported by device". I am missing
the point here, sorry, no-INTx-routing + no-DisINTx + no-MSI(X) - I
understand why we would want to block these but if a device cannot do INTx
nicely but still can do MSI(X) - why would we want to block it?
Alex Williamson March 22, 2018, 1:13 p.m. UTC | #8
On Thu, 22 Mar 2018 18:32:56 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 22/3/18 3:40 pm, Alex Williamson wrote:
> > On Thu, 22 Mar 2018 15:02:09 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> On 19/3/18 1:15 pm, Alex Williamson wrote:  
> >>> On Mon, 19 Mar 2018 01:05:27 +1100
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>     
> >>>> On 15/3/18 2:02 pm, Alex Williamson wrote:    
> >>>>> On Tue, 13 Mar 2018 13:38:46 +1100
> >>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>>>       
> >>>>>> On 13/3/18 9:44 am, Alex Williamson wrote:         
> >>>>>  Should we only be invoking the nointx path
> >>>>> if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
> >>>>> the device passes the v2.3 masking test (which is cached, not retested
> >>>>> now, btw)?        
> >>>>
> >>>> Yes we should, and yes I missed that last bit about caching the result of
> >>>> the test which is performed anyway in the common pci code. Thanks,    
> >>>
> >>> I spent a while considering whether we should include c) in the patch I
> >>> posted, but I couldn't get past the question of what we'd do with the
> >>> device if it doesn't pass the INTx masking test.  We'd want to emulate
> >>> the INTx pin register and report no INTx irq info, as the nointx path
> >>> does, then what?  If writing DisINTx doesn't do anything, we need to
> >>> decide whether lack of INTx routing means the device can harmlessly
> >>> pull INTx, or if pulling INTx is not harmless, then we'd need to ban
> >>> the device from being assigned to the user.  Since we don't know of any
> >>> problems and it seems sort of reasonable to assume that if there's no
> >>> INTx routing that we can tug it all we want (because if someone is
> >>> listening, it's a platform bug that it's not routed), I left it alone
> >>> in my follow-up patch.  Let me know if you have other thoughts.  Thanks,    
> >>
> >>
> >> Well, pulling INTx is nasty anyway as a device may die afterwards and the
> >> specific INTx line (or a latch in PCIe bridge) will be blocked for the
> >> devices sharing the same line, and DisINTx does not change that.  
> > 
> > "... a device may die afterwards"?  What does that mean?    
> 
> I meant the device could be blocked by EEH after raising INTx.
> 
> > In normal
> > operation, INTx fires, DisINTx gets set, the interrupt is forwarded to
> > the user, the user services and unmasks the interrupt, rinse and
> > repeat.  If the device somehow gets into a locked state where it's
> > pulling INTx and disregarding DisINTx, a) this is broken hardware, b)
> > the spurious interrupt handler will mask the interrupt and switch to
> > polling, penalizing everyone sharing that line.  I don't know that
> > there's a model where we can factor in catastrophic hardware failure
> > into the equation.  
> >>> Our other  
> >> hypervisor (pHyp) simply prohibits INTx in order to avoid dealing with this
> >> (this is why 2170dd04316e was made). So not enforcing nointx for
> >> DisINTx-incapable devices does not seem much worse than enabling INTx for
> >> pass through itself.  
> > 
> > Too. Many. Negatives.  If a device shares an INTx line with other
> > devices then it certainly needs to support DisINTx in order to play
> > nicely with userspace drivers.  We require exclusive interrupts for
> > devices that do not support DisINTx.  If the hypervisor is choosing to
> > avoid dealing with that problem by not providing routing for INTx
> > (effectively a paravirt please don't use this interface), then it seems
> > like it's not fit to support userspace drivers. It's depending on the
> > cooperation of the user. We can't prevent the user from making the card
> > pull INTx, we can only discourage them from using it, which is not
> > sufficient. Therefore if we really do need to block use of such
> > devices (no INTx routing, no DisINTx support) from using vfio, my patch
> > is insufficient and I think the best approach at this stage is to revert
> > 2170dd04316e and you can rehash a solution for pHyp.  Thanks,  
> 
> No, I do not want to block INTx and I am happy with "[PATCH v2] vfio/pci:
> Quiet broken INTx whining when INTx is unsupported by device". I am missing
> the point here, sorry, no-INTx-routing + no-DisINTx + no-MSI(X) - I
> understand why we would want to block these but if a device cannot do INTx
> nicely but still can do MSI(X) - why would we want to block it?

Because "cannot do INTx nicely" means one of:

a) device can assert INTx all it wants and nobody cares (ie. it's not
connected or routed)

b) device INTx is connected or routed at the pHyp level, the device can
cause badness by asserting INTx, we are only advising the user not to
use INTx

The assumption in my version of the patch is a), but you seem to have
clarified that the situation is really b), in which case we have no
solution for devices which do not support DisINTx, they should be
disallowed from use through vfio.  The user has the ability to cause
the device to assert INTx regardless of whether we report a pin
register, report INTx irq info, or support it through SET_IRQ, it
cannot be masked at the device, the pHyp hypervisor has not provided
routing, so it cannot be masked at an interrupt controller, and
apparently we cannot presume that asserting INTx is harmless, as I'd
hoped.  What prevents a user from exploiting INTx on a device that does
not support DisINTx when running on pHyp?  Thanks,

Alex
Alex Williamson March 22, 2018, 9:30 p.m. UTC | #9
On Thu, 22 Mar 2018 21:09:09 +0000
Casey Leedom <leedom@chelsio.com> wrote:

> | From: Alex Williamson <alex.williamson@redhat.com>
> | Sent: Thursday, March 22, 2018 6:13 AM
> | ...
> | On Thu, 22 Mar 2018 18:32:56 +1100
> | Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> | > ...
> | > No, I do not want to block INTx and I am happy with "[PATCH v2]
> | > vfio/pci: Quiet broken INTx whining when INTx is unsupported by
> | > device". I am missing the point here, sorry, no-INTx-routing +
> | > no-DisINTx + no-MSI(X) - I understand why we would want to block
> | > these but if a device cannot do INTx nicely but still can do
> | > MSI(X) - why would we want to block it?
> | 
> | Because "cannot do INTx nicely" means one of:
> | 
> | a) device can assert INTx all it wants and nobody cares (ie. it's not
> | connected or routed)
> | 
> | b) device INTx is connected or routed at the pHyp level, the device can
> | cause badness by asserting INTx, we are only advising the user not to
> | use INTx
> | 
> | The assumption in my version of the patch is a), but you seem to have
> | clarified that the situation is really b), in which case we have no
> | solution for devices which do not support DisINTx, they should be
> | disallowed from use through vfio.  The user has the ability to cause
> | the device to assert INTx regardless of whether we report a pin
> | register, report INTx irq info, or support it through SET_IRQ, it
> | cannot be masked at the device, the pHyp hypervisor has not provided
> | routing, so it cannot be masked at an interrupt controller, and
> | apparently we cannot presume that asserting INTx is harmless, as I'd
> | hoped.  What prevents a user from exploiting INTx on a device that does
> | not support DisINTx when running on pHyp?  Thanks,
> 
> I must remind everyone that Single Root I/O Virtualization Virtual Functions
> may _NOT_ implement Legacy Pin INTx Interrupts as per the SR-IOV
> specification.  I.e. VFs cannot and must not implement INTx ... and they

That's not under debate, VFs are fine: they cannot assert INTx, and the
original commit was never intended to affect them.  The question is about
this pHyp platform, where we might be assigning a traditional PCIe
endpoint that does support INTx but where the hypervisor (which we're
running on) does not expose INTx routing; therefore we have an INTx pin
but no pdev->irq, and we need to guess whether the device asserting INTx
can cause bad things.  If we think the device supports DisINTx, we can
limit its ability to assert INTx; if not, the user can make the device
assert INTx regardless of what we expose to it.

> were the entire original motivation for allowing PCI Devices to be attached
> to Virtual Machines ...

That's debatable.  Obviously aspects of VFs make them much preferred
for assignment, but device assignment would be here with or without
them and we get to deal with supporting all flavors of devices.  Thanks,

Alex

Patch

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ad18ed266dc0..34d463a0b4a5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -192,6 +192,8 @@  static void vfio_pci_disable(struct vfio_pci_device *vdev);
  */
 static bool vfio_pci_nointx(struct pci_dev *pdev)
 {
+	u8 pin;
+
 	switch (pdev->vendor) {
 	case PCI_VENDOR_ID_INTEL:
 		switch (pdev->device) {
@@ -207,7 +209,9 @@  static bool vfio_pci_nointx(struct pci_dev *pdev)
 		}
 	}
 
-	if (!pdev->irq)
+	pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
+
+	if (pin && !pdev->irq)
 		return true;
 
 	return false;
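
For readability, vfio_pci_nointx() with this patch applied reads roughly
as follows (the device quirk entries from the unchanged context lines are
summarized in a comment):

static bool vfio_pci_nointx(struct pci_dev *pdev)
{
	u8 pin;

	switch (pdev->vendor) {
	case PCI_VENDOR_ID_INTEL:
		switch (pdev->device) {
		/* devices with known-broken INTx are handled here (elided) */
		default:
			break;
		}
	}

	/* Only report "nointx" when the device claims an INTx pin but the
	 * platform provided no interrupt line for it. */
	pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);

	if (pin && !pdev->irq)
		return true;

	return false;
}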