diff mbox series

[3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK

Message ID 1612614564-4220-3-git-send-email-si-wei.liu@oracle.com (mailing list archive)
State Not Applicable
Series [1/3] mlx5_vdpa: should exclude header length and fcs from mtu

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Si-Wei Liu Feb. 6, 2021, 12:29 p.m. UTC
While a virtq is stopped, get_vq_state() is supposed to be called to
sync with the latest internal avail_index from the device. The saved
avail_index is used to restore the virtq once the device is started.
Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset
the saved avail_index; however, the index gets cleared before
get_vq_state() tries to read it. This can cause consistency problems
when the virtq is restarted, e.g. through a series of link down and
link up events. Defer the clearing of avail_index until the device is
about to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again
in set_status().

Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
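
The ordering problem described in the commit message can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct toy_dev`, `toy_set_status()`, `toy_get_vq_state()` and the `defer_clear` switch are hypothetical names that do not exist in the driver, and the model keeps only the two branches of set_status() that the patch touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DRIVER_OK 0x4 /* same value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical stand-in for the driver's per-virtqueue software state. */
struct toy_dev {
	uint16_t saved_avail_idx; /* index remembered while the virtq is stopped */
	uint8_t status;
	bool defer_clear;         /* false: clear on reset; true: clear at DRIVER_OK */
};

/* Models only the two branches of set_status() that the patch touches. */
static void toy_set_status(struct toy_dev *d, uint8_t status)
{
	if (!status) {
		/* Device reset: the unpatched ordering clears the index here. */
		if (!d->defer_clear)
			d->saved_avail_idx = 0;
		d->status = 0;
		return;
	}

	if (((status ^ d->status) & TOY_DRIVER_OK) && (status & TOY_DRIVER_OK)) {
		/* DRIVER_OK newly set: the patched ordering clears the index here. */
		if (d->defer_clear)
			d->saved_avail_idx = 0;
	}
	d->status = status;
}

/* Stand-in for get_vq_state(): report the saved index. */
static uint16_t toy_get_vq_state(const struct toy_dev *d)
{
	return d->saved_avail_idx;
}

/* Reset a running device, then read the index back as a caller would. */
static uint16_t index_read_after_reset(bool defer_clear, uint16_t idx)
{
	struct toy_dev d = {
		.saved_avail_idx = idx,
		.status = TOY_DRIVER_OK,
		.defer_clear = defer_clear,
	};

	assert(idx != 0); /* a zero index would hide the difference */
	toy_set_status(&d, 0);
	return toy_get_vq_state(&d);
}
```

Calling index_read_after_reset() with defer_clear set to false models the ordering the commit message complains about: the read after reset sees a cleared index. With defer_clear set to true the saved index survives until DRIVER_OK is set again.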

Comments

Jason Wang Feb. 8, 2021, 4:38 a.m. UTC | #1
On 2021/2/6 8:29 PM, Si-Wei Liu wrote:
> While virtq is stopped,  get_vq_state() is supposed to
> be  called to  get  sync'ed  with  the latest internal
> avail_index from device. The saved avail_index is used
> to restate  the virtq  once device is started.  Commit
> b35ccebe3ef7 introduced the clear_virtqueues() routine
> to  reset  the saved  avail_index,  however, the index
> gets cleared a bit earlier before get_vq_state() tries
> to read it. This would cause consistency problems when
> virtq is restarted, e.g. through a series of link down
> and link up events. We  could  defer  the  clearing of
> avail_index  to  until  the  device  is to be started,
> i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
> set_status().
>
> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>


Acked-by: Jason Wang <jasowang@redhat.com>


Eli Cohen Feb. 8, 2021, 5:48 a.m. UTC | #2
On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote:
> While virtq is stopped,  get_vq_state() is supposed to
> be  called to  get  sync'ed  with  the latest internal
> avail_index from device. The saved avail_index is used
> to restate  the virtq  once device is started.  Commit
> b35ccebe3ef7 introduced the clear_virtqueues() routine
> to  reset  the saved  avail_index,  however, the index
> gets cleared a bit earlier before get_vq_state() tries
> to read it. This would cause consistency problems when
> virtq is restarted, e.g. through a series of link down
> and link up events. We  could  defer  the  clearing of
> avail_index  to  until  the  device  is to be started,
> i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
> set_status().


Not sure I understand the scenario. You are talking about reset of the
device followed by up/down events on the interface. How can you trigger
this?

Si-Wei Liu Feb. 9, 2021, 1:40 a.m. UTC | #3
On 2/7/2021 9:48 PM, Eli Cohen wrote:
> On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote:
>> While virtq is stopped,  get_vq_state() is supposed to
>> be  called to  get  sync'ed  with  the latest internal
>> avail_index from device. The saved avail_index is used
>> to restate  the virtq  once device is started.  Commit
>> b35ccebe3ef7 introduced the clear_virtqueues() routine
>> to  reset  the saved  avail_index,  however, the index
>> gets cleared a bit earlier before get_vq_state() tries
>> to read it. This would cause consistency problems when
>> virtq is restarted, e.g. through a series of link down
>> and link up events. We  could  defer  the  clearing of
>> avail_index  to  until  the  device  is to be started,
>> i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
>> set_status().
>
> Not sure I understand the scenario. You are talking about reset of the
> device followed by up/down events on the interface. How can you trigger
> this?
Currently it's not possible to trigger link up/down events with upstream 
QEMU due to the lack of a config/control interrupt implementation. Live 
migration could be another scenario that cannot be satisfied because of 
inconsistent queue state. Both share the same root cause, as captured 
here.

-Siwei

Jason Wang Feb. 9, 2021, 3:37 a.m. UTC | #4
On 2021/2/6 8:29 PM, Si-Wei Liu wrote:
> While virtq is stopped,  get_vq_state() is supposed to
> be  called to  get  sync'ed  with  the latest internal
> avail_index from device. The saved avail_index is used
> to restate  the virtq  once device is started.  Commit
> b35ccebe3ef7 introduced the clear_virtqueues() routine
> to  reset  the saved  avail_index,  however, the index
> gets cleared a bit earlier before get_vq_state() tries
> to read it. This would cause consistency problems when
> virtq is restarted, e.g. through a series of link down
> and link up events. We  could  defer  the  clearing of
> avail_index  to  until  the  device  is to be started,
> i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
> set_status().
>
> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index aa6f8cd..444ab58 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
>   	if (!status) {
>   		mlx5_vdpa_info(mvdev, "performing device reset\n");
>   		teardown_driver(ndev);
> -		clear_virtqueues(ndev);
>   		mlx5_vdpa_destroy_mr(&ndev->mvdev);
>   		ndev->mvdev.status = 0;
>   		++mvdev->generation;
> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
>   
>   	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
>   		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +			clear_virtqueues(ndev);


Rethinking this: as mentioned in another thread, this in fact breaks 
set_vq_state().  (See vhost_virtqueue_start() -> 
vhost_vdpa_set_vring_base() in the QEMU code.)

The issue is that the avail idx is forgotten; we need to keep it.

Thanks


>   			err = setup_driver(ndev);
>   			if (err) {
>   				mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
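
The concern raised in this reply can be sketched as a toy model, under two assumptions that the thread itself does not confirm: that set_vq_state() only records the index in driver software state, and that the DRIVER_OK path creates the hardware queue from that software copy. All names below (`struct toy_vq`, `toy_set_vq_state()`, `toy_driver_ok()`) are hypothetical; whether the real clear_virtqueues() behaves this way is what the follow-up messages discuss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state; field names are illustrative only. */
struct toy_vq {
	uint16_t sw_avail_idx; /* software copy written by set_vq_state() */
	uint16_t hw_avail_idx; /* index the "hardware" queue is created with */
};

/* Assumption: set_vq_state() only records the index in software. */
static void toy_set_vq_state(struct toy_vq *vq, uint16_t avail_idx)
{
	vq->sw_avail_idx = avail_idx;
}

/* Assumption: DRIVER_OK creates the hardware queue from the software copy. */
static void toy_driver_ok(struct toy_vq *vq, bool clear_first)
{
	if (clear_first)
		vq->sw_avail_idx = 0; /* stand-in for clear_virtqueues() */
	vq->hw_avail_idx = vq->sw_avail_idx;
}

/* Replays the start order: restore the vring base, then set DRIVER_OK. */
static uint16_t hw_index_after_start(uint16_t restored_idx, bool clear_first)
{
	struct toy_vq vq = { 0 };

	toy_set_vq_state(&vq, restored_idx);
	toy_driver_ok(&vq, clear_first);
	return vq.hw_avail_idx;
}
```

In this model, clearing at DRIVER_OK after the vring base has been restored discards the restored index, while skipping the clear carries it through to the queue.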
Si-Wei Liu Feb. 10, 2021, 12:26 a.m. UTC | #5
On 2/8/2021 7:37 PM, Jason Wang wrote:
>
> On 2021/2/6 8:29 PM, Si-Wei Liu wrote:
>> While virtq is stopped,  get_vq_state() is supposed to
>> be  called to  get  sync'ed  with  the latest internal
>> avail_index from device. The saved avail_index is used
>> to restate  the virtq  once device is started.  Commit
>> b35ccebe3ef7 introduced the clear_virtqueues() routine
>> to  reset  the saved  avail_index,  however, the index
>> gets cleared a bit earlier before get_vq_state() tries
>> to read it. This would cause consistency problems when
>> virtq is restarted, e.g. through a series of link down
>> and link up events. We  could  defer  the  clearing of
>> avail_index  to  until  the  device  is to be started,
>> i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
>> set_status().
>>
>> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index 
>> after change map")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index aa6f8cd..444ab58 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct 
>> vdpa_device *vdev, u8 status)
>>       if (!status) {
>>           mlx5_vdpa_info(mvdev, "performing device reset\n");
>>           teardown_driver(ndev);
>> -        clear_virtqueues(ndev);
>>           mlx5_vdpa_destroy_mr(&ndev->mvdev);
>>           ndev->mvdev.status = 0;
>>           ++mvdev->generation;
>> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct 
>> vdpa_device *vdev, u8 status)
>>         if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
>>           if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>> +            clear_virtqueues(ndev);
>
>
> Rethink about this. As mentioned in another thread, this in fact 
> breaks set_vq_state().  (See vhost_virtqueue_start() -> 
> vhost_vdpa_set_vring_base() in qemu codes).
I assume that the clearing for vhost-vdpa would be done via the 
following path (QEMU code):

vhost_dev_start()->vhost_vdpa_dev_start()->vhost_vdpa_call(status | 
VIRTIO_CONFIG_S_DRIVER_OK)

which is _after_ vhost_virtqueue_start() gets called in 
vhost_dev_start() to restore the avail_idx to h/w. What am I missing here?

-Siwei


>
> The issue is that the avail idx is forgot, we need keep it.
>
> Thanks
>
>
>>               err = setup_driver(ndev);
>>               if (err) {
>>                   mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
>
Jason Wang Feb. 10, 2021, 4 a.m. UTC | #6
On 2021/2/10 8:26 AM, Si-Wei Liu wrote:
>
>
> On 2/8/2021 7:37 PM, Jason Wang wrote:
>>
>> On 2021/2/6 8:29 PM, Si-Wei Liu wrote:
>>> While virtq is stopped,  get_vq_state() is supposed to
>>> be  called to  get  sync'ed  with  the latest internal
>>> avail_index from device. The saved avail_index is used
>>> to restate  the virtq  once device is started.  Commit
>>> b35ccebe3ef7 introduced the clear_virtqueues() routine
>>> to  reset  the saved  avail_index,  however, the index
>>> gets cleared a bit earlier before get_vq_state() tries
>>> to read it. This would cause consistency problems when
>>> virtq is restarted, e.g. through a series of link down
>>> and link up events. We  could  defer  the  clearing of
>>> avail_index  to  until  the  device  is to be started,
>>> i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
>>> set_status().
>>>
>>> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index 
>>> after change map")
>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> ---
>>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index aa6f8cd..444ab58 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct 
>>> vdpa_device *vdev, u8 status)
>>>       if (!status) {
>>>           mlx5_vdpa_info(mvdev, "performing device reset\n");
>>>           teardown_driver(ndev);
>>> -        clear_virtqueues(ndev);
>>>           mlx5_vdpa_destroy_mr(&ndev->mvdev);
>>>           ndev->mvdev.status = 0;
>>>           ++mvdev->generation;
>>> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct 
>>> vdpa_device *vdev, u8 status)
>>>         if ((status ^ ndev->mvdev.status) & 
>>> VIRTIO_CONFIG_S_DRIVER_OK) {
>>>           if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>>> +            clear_virtqueues(ndev);
>>
>>
>> Rethink about this. As mentioned in another thread, this in fact 
>> breaks set_vq_state().  (See vhost_virtqueue_start() -> 
>> vhost_vdpa_set_vring_base() in qemu codes).
> I assume that the clearing for vhost-vdpa would be done via (qemu code),
>
> vhost_dev_start()->vhost_vdpa_dev_start()->vhost_vdpa_call(status | 
> VIRTIO_CONFIG_S_DRIVER_OK)
>
> which is _after_ vhost_virtqueue_start() gets called to restore the 
> avail_idx to h/w in vhost_dev_start(). What am I missing here?
>
> -Siwei


I think not. I thought clear_virtqueues() would clear the hardware index, 
but it looks like it does not. (I guess we need a better name than 
clear_virtqueues(); from the name, it looks like it will clear the 
hardware state.)

Thanks


>
>
>>
>> The issue is that the avail idx is forgot, we need keep it.
>>
>> Thanks
>>
>>
>>>               err = setup_driver(ndev);
>>>               if (err) {
>>>                   mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
>>
>

Patch

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index aa6f8cd..444ab58 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1785,7 +1785,6 @@  static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
 	if (!status) {
 		mlx5_vdpa_info(mvdev, "performing device reset\n");
 		teardown_driver(ndev);
-		clear_virtqueues(ndev);
 		mlx5_vdpa_destroy_mr(&ndev->mvdev);
 		ndev->mvdev.status = 0;
 		++mvdev->generation;
@@ -1794,6 +1793,7 @@  static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
 
 	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
 		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+			clear_virtqueues(ndev);
 			err = setup_driver(ndev);
 			if (err) {
 				mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
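
The condition guarding the second hunk, `(status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK`, is an XOR edge test: it is non-zero only when the DRIVER_OK bit differs between the old and the new status. A minimal stand-alone sketch of that test follows; the enum and function names are hypothetical and only the bit value 0x4 is taken from the virtio definition of VIRTIO_CONFIG_S_DRIVER_OK.

```c
#include <stdint.h>

#define TOY_DRIVER_OK 0x4 /* value of VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical result type: what happened to the DRIVER_OK bit. */
enum toy_transition { TOY_UNCHANGED, TOY_STARTED, TOY_STOPPED };

/* Classify a status write by comparing the old and new DRIVER_OK bit. */
static enum toy_transition toy_driver_ok_transition(uint8_t old_status,
						    uint8_t new_status)
{
	/* XOR leaves a 1 only in bit positions that changed. */
	if (!((new_status ^ old_status) & TOY_DRIVER_OK))
		return TOY_UNCHANGED;

	/* The bit changed; the new value tells us the direction. */
	return (new_status & TOY_DRIVER_OK) ? TOY_STARTED : TOY_STOPPED;
}
```

With the patch applied, clear_virtqueues() runs only on the branch this sketch labels TOY_STARTED, that is, when DRIVER_OK goes from clear to set.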