diff mbox series

[v2,2/3] virito_pci: add timeout to reset device operation

Message ID 20210408081109.56537-2-mgurtovoy@nvidia.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/3] virtio: update reset callback to return status | expand

Commit Message

Max Gurtovoy April 8, 2021, 8:11 a.m. UTC
According to the spec after writing 0 to device_status, the driver MUST
wait for a read of device_status to return 0 before reinitializing the
device. In case we have a device that won't return 0, the reset
operation will loop forever and cause the host/vm to stuck. Set timeout
for 3 minutes before giving up on the device.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/virtio/virtio_pci_modern.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jason Wang April 8, 2021, 9:01 a.m. UTC | #1
在 2021/4/8 下午4:11, Max Gurtovoy 写道:
> According to the spec after writing 0 to device_status, the driver MUST
> wait for a read of device_status to return 0 before reinitializing the
> device. In case we have a device that won't return 0, the reset
> operation will loop forever and cause the host/vm to stuck. Set timeout
> for 3 minutes before giving up on the device.
>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>   drivers/virtio/virtio_pci_modern.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index cc3412a96a17..dcee616e8d21 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -162,6 +162,7 @@ static int vp_reset(struct virtio_device *vdev)
>   {
>   	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> +	unsigned long timeout = jiffies + msecs_to_jiffies(180000);
>   
>   	/* 0 status means a reset. */
>   	vp_modern_set_status(mdev, 0);
> @@ -169,9 +170,16 @@ static int vp_reset(struct virtio_device *vdev)
>   	 * device_status to return 0 before reinitializing the device.
>   	 * This will flush out the status write, and flush in device writes,
>   	 * including MSI-X interrupts, if any.
> +	 * Set a timeout before giving up on the device.
>   	 */
> -	while (vp_modern_get_status(mdev))
> +	while (vp_modern_get_status(mdev)) {
> +		if (time_after(jiffies, timeout)) {


What happens if the device finish the rest after the timeout?

Thanks


> +			dev_err(&vdev->dev, "virtio: device not ready. "
> +				"Aborting. Try again later\n");
> +			return -EAGAIN;
> +		}
>   		msleep(1);
> +	}
>   	/* Flush pending VQ/configuration callbacks. */
>   	vp_synchronize_vectors(vdev);
>   	return 0;
Max Gurtovoy April 8, 2021, 9:44 a.m. UTC | #2
On 4/8/2021 12:01 PM, Jason Wang wrote:
>
> 在 2021/4/8 下午4:11, Max Gurtovoy 写道:
>> According to the spec after writing 0 to device_status, the driver MUST
>> wait for a read of device_status to return 0 before reinitializing the
>> device. In case we have a device that won't return 0, the reset
>> operation will loop forever and cause the host/vm to stuck. Set timeout
>> for 3 minutes before giving up on the device.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   drivers/virtio/virtio_pci_modern.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/virtio/virtio_pci_modern.c 
>> b/drivers/virtio/virtio_pci_modern.c
>> index cc3412a96a17..dcee616e8d21 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -162,6 +162,7 @@ static int vp_reset(struct virtio_device *vdev)
>>   {
>>       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>> +    unsigned long timeout = jiffies + msecs_to_jiffies(180000);
>>         /* 0 status means a reset. */
>>       vp_modern_set_status(mdev, 0);
>> @@ -169,9 +170,16 @@ static int vp_reset(struct virtio_device *vdev)
>>        * device_status to return 0 before reinitializing the device.
>>        * This will flush out the status write, and flush in device 
>> writes,
>>        * including MSI-X interrupts, if any.
>> +     * Set a timeout before giving up on the device.
>>        */
>> -    while (vp_modern_get_status(mdev))
>> +    while (vp_modern_get_status(mdev)) {
>> +        if (time_after(jiffies, timeout)) {
>
>
> What happens if the device finish the rest after the timeout?


The driver will set VIRTIO_CONFIG_S_FAILED and one can re-probe it later 
on (e.g by re-scanning the pci bus).


>
> Thanks
>
>
>> +            dev_err(&vdev->dev, "virtio: device not ready. "
>> +                "Aborting. Try again later\n");
>> +            return -EAGAIN;
>> +        }
>>           msleep(1);
>> +    }
>>       /* Flush pending VQ/configuration callbacks. */
>>       vp_synchronize_vectors(vdev);
>>       return 0;
>
Jason Wang April 8, 2021, 12:45 p.m. UTC | #3
在 2021/4/8 下午5:44, Max Gurtovoy 写道:
>
> On 4/8/2021 12:01 PM, Jason Wang wrote:
>>
>> 在 2021/4/8 下午4:11, Max Gurtovoy 写道:
>>> According to the spec after writing 0 to device_status, the driver MUST
>>> wait for a read of device_status to return 0 before reinitializing the
>>> device. In case we have a device that won't return 0, the reset
>>> operation will loop forever and cause the host/vm to stuck. Set timeout
>>> for 3 minutes before giving up on the device.
>>>
>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>> ---
>>>   drivers/virtio/virtio_pci_modern.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/virtio/virtio_pci_modern.c 
>>> b/drivers/virtio/virtio_pci_modern.c
>>> index cc3412a96a17..dcee616e8d21 100644
>>> --- a/drivers/virtio/virtio_pci_modern.c
>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>> @@ -162,6 +162,7 @@ static int vp_reset(struct virtio_device *vdev)
>>>   {
>>>       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>>> +    unsigned long timeout = jiffies + msecs_to_jiffies(180000);
>>>         /* 0 status means a reset. */
>>>       vp_modern_set_status(mdev, 0);
>>> @@ -169,9 +170,16 @@ static int vp_reset(struct virtio_device *vdev)
>>>        * device_status to return 0 before reinitializing the device.
>>>        * This will flush out the status write, and flush in device 
>>> writes,
>>>        * including MSI-X interrupts, if any.
>>> +     * Set a timeout before giving up on the device.
>>>        */
>>> -    while (vp_modern_get_status(mdev))
>>> +    while (vp_modern_get_status(mdev)) {
>>> +        if (time_after(jiffies, timeout)) {
>>
>>
>> What happens if the device finish the rest after the timeout?
>
>
> The driver will set VIRTIO_CONFIG_S_FAILED and one can re-probe it 
> later on (e.g by re-scanning the pci bus).


Ok, so do we need the flush through vp_synchronize_vectors() here?

Thanks


>
>
>>
>> Thanks
>>
>>
>>> +            dev_err(&vdev->dev, "virtio: device not ready. "
>>> +                "Aborting. Try again later\n");
>>> +            return -EAGAIN;
>>> +        }
>>>           msleep(1);
>>> +    }
>>>       /* Flush pending VQ/configuration callbacks. */
>>>       vp_synchronize_vectors(vdev);
>>>       return 0;
>>
>
Max Gurtovoy April 8, 2021, 12:57 p.m. UTC | #4
On 4/8/2021 3:45 PM, Jason Wang wrote:
>
> 在 2021/4/8 下午5:44, Max Gurtovoy 写道:
>>
>> On 4/8/2021 12:01 PM, Jason Wang wrote:
>>>
>>> 在 2021/4/8 下午4:11, Max Gurtovoy 写道:
>>>> According to the spec after writing 0 to device_status, the driver 
>>>> MUST
>>>> wait for a read of device_status to return 0 before reinitializing the
>>>> device. In case we have a device that won't return 0, the reset
>>>> operation will loop forever and cause the host/vm to stuck. Set 
>>>> timeout
>>>> for 3 minutes before giving up on the device.
>>>>
>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> ---
>>>>   drivers/virtio/virtio_pci_modern.c | 10 +++++++++-
>>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_pci_modern.c 
>>>> b/drivers/virtio/virtio_pci_modern.c
>>>> index cc3412a96a17..dcee616e8d21 100644
>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>> @@ -162,6 +162,7 @@ static int vp_reset(struct virtio_device *vdev)
>>>>   {
>>>>       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>>       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>>>> +    unsigned long timeout = jiffies + msecs_to_jiffies(180000);
>>>>         /* 0 status means a reset. */
>>>>       vp_modern_set_status(mdev, 0);
>>>> @@ -169,9 +170,16 @@ static int vp_reset(struct virtio_device *vdev)
>>>>        * device_status to return 0 before reinitializing the device.
>>>>        * This will flush out the status write, and flush in device 
>>>> writes,
>>>>        * including MSI-X interrupts, if any.
>>>> +     * Set a timeout before giving up on the device.
>>>>        */
>>>> -    while (vp_modern_get_status(mdev))
>>>> +    while (vp_modern_get_status(mdev)) {
>>>> +        if (time_after(jiffies, timeout)) {
>>>
>>>
>>> What happens if the device finish the rest after the timeout?
>>
>>
>> The driver will set VIRTIO_CONFIG_S_FAILED and one can re-probe it 
>> later on (e.g by re-scanning the pci bus).
>
>
> Ok, so do we need the flush through vp_synchronize_vectors() here?

If the device didn't write 0 to status I guess we don't need that.

The device shouldn't raise any interrupt before negotiation finish 
successfully.

MST, is that correct ?

>
> Thanks
>
>
>>
>>
>>>
>>> Thanks
>>>
>>>
>>>> + dev_err(&vdev->dev, "virtio: device not ready. "
>>>> +                "Aborting. Try again later\n");
>>>> +            return -EAGAIN;
>>>> +        }
>>>>           msleep(1);
>>>> +    }
>>>>       /* Flush pending VQ/configuration callbacks. */
>>>>       vp_synchronize_vectors(vdev);
>>>>       return 0;
>>>
>>
>
Jason Wang April 8, 2021, 1:18 p.m. UTC | #5
在 2021/4/8 下午8:57, Max Gurtovoy 写道:
>
> On 4/8/2021 3:45 PM, Jason Wang wrote:
>>
>> 在 2021/4/8 下午5:44, Max Gurtovoy 写道:
>>>
>>> On 4/8/2021 12:01 PM, Jason Wang wrote:
>>>>
>>>> 在 2021/4/8 下午4:11, Max Gurtovoy 写道:
>>>>> According to the spec after writing 0 to device_status, the driver 
>>>>> MUST
>>>>> wait for a read of device_status to return 0 before reinitializing 
>>>>> the
>>>>> device. In case we have a device that won't return 0, the reset
>>>>> operation will loop forever and cause the host/vm to stuck. Set 
>>>>> timeout
>>>>> for 3 minutes before giving up on the device.
>>>>>
>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>> ---
>>>>>   drivers/virtio/virtio_pci_modern.c | 10 +++++++++-
>>>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_pci_modern.c 
>>>>> b/drivers/virtio/virtio_pci_modern.c
>>>>> index cc3412a96a17..dcee616e8d21 100644
>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>> @@ -162,6 +162,7 @@ static int vp_reset(struct virtio_device *vdev)
>>>>>   {
>>>>>       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>>>       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>>>>> +    unsigned long timeout = jiffies + msecs_to_jiffies(180000);
>>>>>         /* 0 status means a reset. */
>>>>>       vp_modern_set_status(mdev, 0);
>>>>> @@ -169,9 +170,16 @@ static int vp_reset(struct virtio_device *vdev)
>>>>>        * device_status to return 0 before reinitializing the device.
>>>>>        * This will flush out the status write, and flush in device 
>>>>> writes,
>>>>>        * including MSI-X interrupts, if any.
>>>>> +     * Set a timeout before giving up on the device.
>>>>>        */
>>>>> -    while (vp_modern_get_status(mdev))
>>>>> +    while (vp_modern_get_status(mdev)) {
>>>>> +        if (time_after(jiffies, timeout)) {
>>>>
>>>>
>>>> What happens if the device finish the rest after the timeout?
>>>
>>>
>>> The driver will set VIRTIO_CONFIG_S_FAILED and one can re-probe it 
>>> later on (e.g by re-scanning the pci bus).
>>
>>
>> Ok, so do we need the flush through vp_synchronize_vectors() here?
>
> If the device didn't write 0 to status I guess we don't need that.
>
> The device shouldn't raise any interrupt before negotiation finish 
> successfully.


The reset could be triggered in other places like driver removing.

Thanks


>
> MST, is that correct ?
>
>>
>> Thanks
>>
>>
>>>
>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> + dev_err(&vdev->dev, "virtio: device not ready. "
>>>>> +                "Aborting. Try again later\n");
>>>>> +            return -EAGAIN;
>>>>> +        }
>>>>>           msleep(1);
>>>>> +    }
>>>>>       /* Flush pending VQ/configuration callbacks. */
>>>>>       vp_synchronize_vectors(vdev);
>>>>>       return 0;
>>>>
>>>
>>
>
diff mbox series

Patch

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index cc3412a96a17..dcee616e8d21 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -162,6 +162,7 @@  static int vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+	unsigned long timeout = jiffies + msecs_to_jiffies(180000);
 
 	/* 0 status means a reset. */
 	vp_modern_set_status(mdev, 0);
@@ -169,9 +170,16 @@  static int vp_reset(struct virtio_device *vdev)
 	 * device_status to return 0 before reinitializing the device.
 	 * This will flush out the status write, and flush in device writes,
 	 * including MSI-X interrupts, if any.
+	 * Set a timeout before giving up on the device.
 	 */
-	while (vp_modern_get_status(mdev))
+	while (vp_modern_get_status(mdev)) {
+		if (time_after(jiffies, timeout)) {
+			dev_err(&vdev->dev, "virtio: device not ready. "
+				"Aborting. Try again later\n");
+			return -EAGAIN;
+		}
 		msleep(1);
+	}
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 	return 0;