[v3,1/1] vfio-ccw: Prevent quiesce function going into an infinite loop
diff mbox series

Message ID 4d5a4b98ab1b41ac6131b5c36de18b76c5d66898.1555449329.git.alifm@linux.ibm.com
State New
Headers show
Series
  • [v3,1/1] vfio-ccw: Prevent quiesce function going into an infinite loop
Related show

Commit Message

Farhan Ali April 16, 2019, 9:23 p.m. UTC
The quiesce function calls cio_cancel_halt_clear() and if we
get an -EBUSY we go into a loop where we:
	- wait for any interrupts
	- flush all I/O in the workqueue
	- retry cio_cancel_halt_clear

During the period where we are waiting for interrupts or
flushing all I/O, the channel subsystem could have completed
a halt/clear action and turned off the corresponding activity
control bits in the subchannel status word. This means the next
time we call cio_cancel_halt_clear(), we will again start by
calling cancel subchannel and so we can be stuck between calling
cancel and halt forever.

Rather than calling cio_cancel_halt_clear() immediately after
waiting, let's try to disable the subchannel. If we succeed in
disabling the subchannel then we know nothing else can happen
with the device.

Suggested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
ChangeLog:
v2 -> v3
   - Log an error message when cio_cancel_halt_clear
     returns EIO and break out of the loop.
   
   - Did not include past change log as the other patches
     of the original series have been queued by Conny.
     Old series (v2) can be found here:
     https://marc.info/?l=kvm&m=155475754101769&w=2

 drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

Comments

Cornelia Huck April 17, 2019, 9:03 a.m. UTC | #1
On Tue, 16 Apr 2019 17:23:14 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> The quiesce function calls cio_cancel_halt_clear() and if we
> get an -EBUSY we go into a loop where we:
> 	- wait for any interrupts
> 	- flush all I/O in the workqueue
> 	- retry cio_cancel_halt_clear
> 
> During the period where we are waiting for interrupts or
> flushing all I/O, the channel subsystem could have completed
> a halt/clear action and turned off the corresponding activity
> control bits in the subchannel status word. This means the next
> time we call cio_cancel_halt_clear(), we will again start by
> calling cancel subchannel and so we can be stuck between calling
> cancel and halt forever.
> 
> Rather than calling cio_cancel_halt_clear() immediately after
> waiting, let's try to disable the subchannel. If we succeed in
> disabling the subchannel then we know nothing else can happen
> with the device.
> 
> Suggested-by: Eric Farman <farman@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> ChangeLog:
> v2 -> v3
>    - Log an error message when cio_cancel_halt_clear
>      returns EIO and break out of the loop.
>    
>    - Did not include past change log as the other patches
>      of the original series have been queued by Conny.
>      Old series (v2) can be found here:
>      https://marc.info/?l=kvm&m=155475754101769&w=2
> 
>  drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 78517aa..66a66ac 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -43,26 +43,30 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
>  	if (ret != -EBUSY)
>  		goto out_unlock;
>  
> +	iretry = 255;
>  	do {
> -		iretry = 255;
>  
>  		ret = cio_cancel_halt_clear(sch, &iretry);
> -		while (ret == -EBUSY) {
> -			/*
> -			 * Flush all I/O and wait for
> -			 * cancel/halt/clear completion.
> -			 */
> -			private->completion = &completion;
> -			spin_unlock_irq(sch->lock);
>  
> -			wait_for_completion_timeout(&completion, 3*HZ);
> +		if (ret == -EIO) {
> +			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
> +			       sch->schid.ssid, sch->schid.sch_no);

What about using
	dev_err(&sch->dev, "could not quiesce");
instead?

(Can make that change while applying, no need to resend for that.)

> +			break;
> +		}
> +
> +		/*
> +		 * Flush all I/O and wait for
> +		 * cancel/halt/clear completion.
> +		 */
> +		private->completion = &completion;
> +		spin_unlock_irq(sch->lock);
>  
> -			private->completion = NULL;
> -			flush_workqueue(vfio_ccw_work_q);
> -			spin_lock_irq(sch->lock);
> -			ret = cio_cancel_halt_clear(sch, &iretry);
> -		};
> +		if (ret == -EBUSY)
> +			wait_for_completion_timeout(&completion, 3*HZ);
>  
> +		private->completion = NULL;
> +		flush_workqueue(vfio_ccw_work_q);
> +		spin_lock_irq(sch->lock);
>  		ret = cio_disable_subchannel(sch);
>  	} while (ret == -EBUSY);
>  out_unlock:

Otherwise, looks good to me. Will queue when I get some ack/r-b.
Eric Farman April 17, 2019, 1:58 p.m. UTC | #2
On 4/17/19 5:03 AM, Cornelia Huck wrote:
> On Tue, 16 Apr 2019 17:23:14 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> The quiesce function calls cio_cancel_halt_clear() and if we
>> get an -EBUSY we go into a loop where we:
>> 	- wait for any interrupts
>> 	- flush all I/O in the workqueue
>> 	- retry cio_cancel_halt_clear
>>
>> During the period where we are waiting for interrupts or
>> flushing all I/O, the channel subsystem could have completed
>> a halt/clear action and turned off the corresponding activity
>> control bits in the subchannel status word. This means the next
>> time we call cio_cancel_halt_clear(), we will again start by
>> calling cancel subchannel and so we can be stuck between calling
>> cancel and halt forever.
>>
>> Rather than calling cio_cancel_halt_clear() immediately after
>> waiting, let's try to disable the subchannel. If we succeed in
>> disabling the subchannel then we know nothing else can happen
>> with the device.
>>
>> Suggested-by: Eric Farman <farman@linux.ibm.com>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> ChangeLog:
>> v2 -> v3
>>     - Log an error message when cio_cancel_halt_clear
>>       returns EIO and break out of the loop.
>>     
>>     - Did not include past change log as the other patches
>>       of the original series have been queued by Conny.
>>       Old series (v2) can be found here:
>>       https://marc.info/?l=kvm&m=155475754101769&w=2
>>
>>   drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++--------------
>>   1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>> index 78517aa..66a66ac 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -43,26 +43,30 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
>>   	if (ret != -EBUSY)
>>   		goto out_unlock;
>>   
>> +	iretry = 255;
>>   	do {
>> -		iretry = 255;
>>   
>>   		ret = cio_cancel_halt_clear(sch, &iretry);
>> -		while (ret == -EBUSY) {
>> -			/*
>> -			 * Flush all I/O and wait for
>> -			 * cancel/halt/clear completion.
>> -			 */
>> -			private->completion = &completion;
>> -			spin_unlock_irq(sch->lock);
>>   
>> -			wait_for_completion_timeout(&completion, 3*HZ);
>> +		if (ret == -EIO) {
>> +			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>> +			       sch->schid.ssid, sch->schid.sch_no);
> 
> What about using
> 	dev_err(&sch->dev, "could not quiesce");
> instead?

+1

> 
> (Can make that change while applying, no need to resend for that.)
> 
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * Flush all I/O and wait for
>> +		 * cancel/halt/clear completion.
>> +		 */
>> +		private->completion = &completion;
>> +		spin_unlock_irq(sch->lock);
>>   
>> -			private->completion = NULL;
>> -			flush_workqueue(vfio_ccw_work_q);
>> -			spin_lock_irq(sch->lock);
>> -			ret = cio_cancel_halt_clear(sch, &iretry);
>> -		};
>> +		if (ret == -EBUSY)
>> +			wait_for_completion_timeout(&completion, 3*HZ);
>>   
>> +		private->completion = NULL;
>> +		flush_workqueue(vfio_ccw_work_q);
>> +		spin_lock_irq(sch->lock);
>>   		ret = cio_disable_subchannel(sch);
>>   	} while (ret == -EBUSY);
>>   out_unlock:
> 
> Otherwise, looks good to me. Will queue when I get some ack/r-b.
> 

I like it, but I feel weird giving an r-b to something I suggested:

Acked-by: Eric Farman <farman@linux.ibm.com>
Farhan Ali April 17, 2019, 2:02 p.m. UTC | #3
On 04/17/2019 05:03 AM, Cornelia Huck wrote:
> On Tue, 16 Apr 2019 17:23:14 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> The quiesce function calls cio_cancel_halt_clear() and if we
>> get an -EBUSY we go into a loop where we:
>> 	- wait for any interrupts
>> 	- flush all I/O in the workqueue
>> 	- retry cio_cancel_halt_clear
>>
>> During the period where we are waiting for interrupts or
>> flushing all I/O, the channel subsystem could have completed
>> a halt/clear action and turned off the corresponding activity
>> control bits in the subchannel status word. This means the next
>> time we call cio_cancel_halt_clear(), we will again start by
>> calling cancel subchannel and so we can be stuck between calling
>> cancel and halt forever.
>>
>> Rather than calling cio_cancel_halt_clear() immediately after
>> waiting, let's try to disable the subchannel. If we succeed in
>> disabling the subchannel then we know nothing else can happen
>> with the device.
>>
>> Suggested-by: Eric Farman <farman@linux.ibm.com>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> ChangeLog:
>> v2 -> v3
>>     - Log an error message when cio_cancel_halt_clear
>>       returns EIO and break out of the loop.
>>     
>>     - Did not include past change log as the other patches
>>       of the original series have been queued by Conny.
>>       Old series (v2) can be found here:
>>       https://marc.info/?l=kvm&m=155475754101769&w=2
>>
>>   drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++--------------
>>   1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>> index 78517aa..66a66ac 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -43,26 +43,30 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
>>   	if (ret != -EBUSY)
>>   		goto out_unlock;
>>   
>> +	iretry = 255;
>>   	do {
>> -		iretry = 255;
>>   
>>   		ret = cio_cancel_halt_clear(sch, &iretry);
>> -		while (ret == -EBUSY) {
>> -			/*
>> -			 * Flush all I/O and wait for
>> -			 * cancel/halt/clear completion.
>> -			 */
>> -			private->completion = &completion;
>> -			spin_unlock_irq(sch->lock);
>>   
>> -			wait_for_completion_timeout(&completion, 3*HZ);
>> +		if (ret == -EIO) {
>> +			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>> +			       sch->schid.ssid, sch->schid.sch_no);
> 
> What about using
> 	dev_err(&sch->dev, "could not quiesce");
> instead?
> 
> (Can make that change while applying, no need to resend for that.)

Sure, the change is fine with me, dev_err would be more appropriate.

> 
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * Flush all I/O and wait for
>> +		 * cancel/halt/clear completion.
>> +		 */
>> +		private->completion = &completion;
>> +		spin_unlock_irq(sch->lock);
>>   
>> -			private->completion = NULL;
>> -			flush_workqueue(vfio_ccw_work_q);
>> -			spin_lock_irq(sch->lock);
>> -			ret = cio_cancel_halt_clear(sch, &iretry);
>> -		};
>> +		if (ret == -EBUSY)
>> +			wait_for_completion_timeout(&completion, 3*HZ);
>>   
>> +		private->completion = NULL;
>> +		flush_workqueue(vfio_ccw_work_q);
>> +		spin_lock_irq(sch->lock);
>>   		ret = cio_disable_subchannel(sch);
>>   	} while (ret == -EBUSY);
>>   out_unlock:
> 
> Otherwise, looks good to me. Will queue when I get some ack/r-b.
> 
> Thanks again for reviewing :).

Thanks
Farhan
Halil Pasic April 17, 2019, 3:13 p.m. UTC | #4
On Wed, 17 Apr 2019 09:58:24 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> 
> 
> On 4/17/19 5:03 AM, Cornelia Huck wrote:
> > On Tue, 16 Apr 2019 17:23:14 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> > 
> >> The quiesce function calls cio_cancel_halt_clear() and if we
> >> get an -EBUSY we go into a loop where we:
> >> 	- wait for any interrupts
> >> 	- flush all I/O in the workqueue
> >> 	- retry cio_cancel_halt_clear
> >>
> >> During the period where we are waiting for interrupts or
> >> flushing all I/O, the channel subsystem could have completed
> >> a halt/clear action and turned off the corresponding activity
> >> control bits in the subchannel status word. This means the next
> >> time we call cio_cancel_halt_clear(), we will again start by
> >> calling cancel subchannel and so we can be stuck between calling
> >> cancel and halt forever.
> >>
> >> Rather than calling cio_cancel_halt_clear() immediately after
> >> waiting, let's try to disable the subchannel. If we succeed in
> >> disabling the subchannel then we know nothing else can happen
> >> with the device.
> >>
> >> Suggested-by: Eric Farman <farman@linux.ibm.com>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >> ChangeLog:
> >> v2 -> v3
> >>     - Log an error message when cio_cancel_halt_clear
> >>       returns EIO and break out of the loop.
> >>     
> >>     - Did not include past change log as the other patches
> >>       of the original series have been queued by Conny.
> >>       Old series (v2) can be found here:
> >>       https://marc.info/?l=kvm&m=155475754101769&w=2
> >>
> >>   drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++--------------
> >>   1 file changed, 18 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >> index 78517aa..66a66ac 100644
> >> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >> @@ -43,26 +43,30 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
> >>   	if (ret != -EBUSY)
> >>   		goto out_unlock;
> >>   
> >> +	iretry = 255;
> >>   	do {
> >> -		iretry = 255;
> >>   
> >>   		ret = cio_cancel_halt_clear(sch, &iretry);
> >> -		while (ret == -EBUSY) {
> >> -			/*
> >> -			 * Flush all I/O and wait for
> >> -			 * cancel/halt/clear completion.
> >> -			 */
> >> -			private->completion = &completion;
> >> -			spin_unlock_irq(sch->lock);
> >>   
> >> -			wait_for_completion_timeout(&completion, 3*HZ);
> >> +		if (ret == -EIO) {
> >> +			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
> >> +			       sch->schid.ssid, sch->schid.sch_no);
> > 
> > What about using
> > 	dev_err(&sch->dev, "could not quiesce");
> > instead?
> 
> +1
> 
> > 
> > (Can make that change while applying, no need to resend for that.)
> > 
> >> +			break;
> >> +		}
> >> +
> >> +		/*
> >> +		 * Flush all I/O and wait for
> >> +		 * cancel/halt/clear completion.
> >> +		 */
> >> +		private->completion = &completion;
> >> +		spin_unlock_irq(sch->lock);
> >>   
> >> -			private->completion = NULL;
> >> -			flush_workqueue(vfio_ccw_work_q);
> >> -			spin_lock_irq(sch->lock);
> >> -			ret = cio_cancel_halt_clear(sch, &iretry);
> >> -		};
> >> +		if (ret == -EBUSY)
> >> +			wait_for_completion_timeout(&completion, 3*HZ);
> >>   
> >> +		private->completion = NULL;
> >> +		flush_workqueue(vfio_ccw_work_q);
> >> +		spin_lock_irq(sch->lock);
> >>   		ret = cio_disable_subchannel(sch);
> >>   	} while (ret == -EBUSY);
> >>   out_unlock:
> > 
> > Otherwise, looks good to me. Will queue when I get some ack/r-b.
> > 
> 
> I like it, but I feel weird giving an r-b to something I suggested:
> 
> Acked-by: Eric Farman <farman@linux.ibm.com>
> 

I think r-b is fine. You did verify both the design and the
implementation I guess. So I don't see why not.

How urgent is this. I could give this some love till the end of the
week. Should I @Connie,@Farhan?

I was mostly ignoring these patches so I can't capitalize on my
understanding from reviewing the previous versions and need some time to
say something about it.

Regards,
Halil
Farhan Ali April 17, 2019, 3:18 p.m. UTC | #5
On 04/17/2019 11:13 AM, Halil Pasic wrote:
>>> Otherwise, looks good to me. Will queue when I get some ack/r-b.
>>>
>> I like it, but I feel weird giving an r-b to something I suggested:
>>
>> Acked-by: Eric Farman<farman@linux.ibm.com>
>>
> I think r-b is fine. You did verify both the design and the
> implementation I guess. So I don't see why not.
> 
> How urgent is this. I could give this some love till the end of the
> week. Should I @Connie,@Farhan?

Having more people review it is always a good thing :)

This patch (and other patches in the original series) are queued for 
5.2, and 5.1 is not released yet. I would say we have some time.

> 
> I was mostly ignoring these patches so I can't capitalize on my
> understanding from reviewing the previous versions and need some time to
> say something about it.
> 
> Regards,
> Halil
>
Cornelia Huck April 18, 2019, 2:36 p.m. UTC | #6
On Wed, 17 Apr 2019 17:13:11 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 17 Apr 2019 09:58:24 -0400
> Eric Farman <farman@linux.ibm.com> wrote:

> > I like it, but I feel weird giving an r-b to something I suggested:
> > 
> > Acked-by: Eric Farman <farman@linux.ibm.com>
> >   
> 
> I think r-b is fine. You did verify both the design and the
> implementation I guess. So I don't see why not.

+1, an r-b by a suggestor is completely fine :)

> 
> How urgent is this. I could give this some love till the end of the
> week. Should I @Connie,@Farhan?

I plan to queue this early next week, so I can send a pull request mid
next week.
Halil Pasic April 19, 2019, 8:12 p.m. UTC | #7
On Wed, 17 Apr 2019 11:18:19 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> 
> 
> On 04/17/2019 11:13 AM, Halil Pasic wrote:
> >>> Otherwise, looks good to me. Will queue when I get some ack/r-b.
> >>>
> >> I like it, but I feel weird giving an r-b to something I suggested:
> >>
> >> Acked-by: Eric Farman<farman@linux.ibm.com>
> >>
> > I think r-b is fine. You did verify both the design and the
> > implementation I guess. So I don't see why not.
> > 
> > How urgent is this. I could give this some love till the end of the
> > week. Should I @Connie,@Farhan?
> 
> Having more people review it is always a good thing :)
> 

Hi Farhan,

I was starring at this code for about an hour if not more and could not
figure out the intentions/ideas behind it. That is not a fault of your
patch, but I can't say that I understand neither the before nor the
after.

What understand this patch basically does is make us call
cio_disable_subchannel() more often. That is what you point out in your
commit message as well. But I fail to see how does this achieve what the
summary line promises:  'Prevent quiesce function going into an infinite
loop'.

Sorry, I can't r-b this. Maybe you can help me gain an understanding of
this code offline.

I guess, the approval of the people who actually understand what it is
going on (i.e. Connie and Eric) will have to suffice.

Regards,
Halil
Farhan Ali April 22, 2019, 2:01 p.m. UTC | #8
On 04/19/2019 04:12 PM, Halil Pasic wrote:
> On Wed, 17 Apr 2019 11:18:19 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>>
>>
>> On 04/17/2019 11:13 AM, Halil Pasic wrote:
>>>>> Otherwise, looks good to me. Will queue when I get some ack/r-b.
>>>>>
>>>> I like it, but I feel weird giving an r-b to something I suggested:
>>>>
>>>> Acked-by: Eric Farman<farman@linux.ibm.com>
>>>>
>>> I think r-b is fine. You did verify both the design and the
>>> implementation I guess. So I don't see why not.
>>>
>>> How urgent is this. I could give this some love till the end of the
>>> week. Should I @Connie,@Farhan?
>>
>> Having more people review it is always a good thing :)
>>
> 
> Hi Farhan,
> 
> I was starring at this code for about an hour if not more and could not
> figure out the intentions/ideas behind it. That is not a fault of your
> patch, but I can't say that I understand neither the before nor the
> after.
> 
> What understand this patch basically does is make us call
> cio_disable_subchannel() more often. That is what you point out in your
> commit message as well. But I fail to see how does this achieve what the
> summary line promises:  'Prevent quiesce function going into an infinite
> loop'.
> 

The main problem with the previous way, was we were calling 
cio_cancel_halt_clear and then waiting and then calling it again.

So if cio_cancel_halt_clear returned EBUSY we would always be stuck in 
the first loop. Now a problem can occur when cancel subchannel returns 
EINVAL (cc 2) and so we try to do halt subchannel. cio_cancel_halt_clear 
will return EBUSY for a successful halt subchannel as well. And so back 
in the quiesce function we will wait and if the halt succeeds, the 
channel subsystem will clear the halt pending bit in the activity 
control field of SCSW. This means the next time we try 
cio_cancel_halt_clear we will again start by calling cancel subchannel, 
which could again return EINVAL....

We would be stuck in an infinite loop. One way to prevent this is to 
call cio_disable_subchannel right after calling cio_cancel_halt_clear,
if we can successfully disable the subchannel then we are sure the 
device is quiesced.


> Sorry, I can't r-b this. Maybe you can help me gain an understanding of
> this code offline.

I hope the above explanation helps.

> 
> I guess, the approval of the people who actually understand what it is
> going on (i.e. Connie and Eric) will have to suffice.
> 
> Regards,
> Halil
> 
>
Halil Pasic April 23, 2019, 5:42 p.m. UTC | #9
On Mon, 22 Apr 2019 10:01:46 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> 
> 
> On 04/19/2019 04:12 PM, Halil Pasic wrote:
> > On Wed, 17 Apr 2019 11:18:19 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> > 
> >>
> >>
> >> On 04/17/2019 11:13 AM, Halil Pasic wrote:
> >>>>> Otherwise, looks good to me. Will queue when I get some ack/r-b.
> >>>>>
> >>>> I like it, but I feel weird giving an r-b to something I suggested:
> >>>>
> >>>> Acked-by: Eric Farman<farman@linux.ibm.com>
> >>>>
> >>> I think r-b is fine. You did verify both the design and the
> >>> implementation I guess. So I don't see why not.
> >>>
> >>> How urgent is this. I could give this some love till the end of the
> >>> week. Should I @Connie,@Farhan?
> >>
> >> Having more people review it is always a good thing :)
> >>
> > 
> > Hi Farhan,
> > 
> > I was starring at this code for about an hour if not more and could not
> > figure out the intentions/ideas behind it. That is not a fault of your
> > patch, but I can't say that I understand neither the before nor the
> > after.
> > 
> > What understand this patch basically does is make us call
> > cio_disable_subchannel() more often. That is what you point out in your
> > commit message as well. But I fail to see how does this achieve what the
> > summary line promises:  'Prevent quiesce function going into an infinite
> > loop'.
> > 
> 

> The main problem with the previous way, was we were calling 
> cio_cancel_halt_clear and then waiting and then calling it again.

Thanks for explaining this to me offline ;).

> 
> So if cio_cancel_halt_clear returned EBUSY we would always be stuck in 
> the first loop. Now a problem can occur when cancel subchannel returns 
> EINVAL (cc 2) and so we try to do halt subchannel. cio_cancel_halt_clear 
> will return EBUSY for a successful halt subchannel as well. And so back 
> in the quiesce function we will wait and if the halt succeeds, the 
> channel subsystem will clear the halt pending bit in the activity 
> control field of SCSW. This means the next time we try 
> cio_cancel_halt_clear we will again start by calling cancel subchannel, 
> which could again return EINVAL....
> 


So in other words (one of) the problematic scenario(s) is the following.
1. xsch() returns cc 2 because there is not start pending and is not
suspended (i.e. there is nothing to cancel)
2. we proceed to hsch() which completes successfully and
cio_cancel_halt_clear() returns EBUSY
3. We wait for the interrupt and halt pending gets cleared because
a halt signal was issued to the device (e.g. halt worked fine).
4. We call cio_cancel_halt_clear() again *despite the fact that halt
already accomplished what we were hoping for*.
5. xsch() returns cc 2 again, because hopefully no new ssch() caused
the subchannel to have a start/resume pending. So we progress to 2.

Seems the current code is broken if the cancel() in
cio_cancel_halt_clear() returns EINVAL (cc 2), and it returning EBUSY
(cc 1) is also suspicious (because we don't do a tsch()).

> We would be stuck in an infinite loop. One way to prevent this is to 
> call cio_disable_subchannel right after calling cio_cancel_halt_clear,
> if we can successfully disable the subchannel then we are sure the 
> device is quiesced.
> 


Yes your change helps with cc 2 form chancel(). I'm not sure about cc 1
though -- I'm not sure cc = 1 is even possible. 

So now that I kind of understand what is this patch trying to accomplish,
I think I can help with some stuff.

Let us have a look at the commit message first:



> Subject: [PATCH v3 1/1] vfio-ccw: Prevent quiesce function going into
> 
> The quiesce function calls cio_cancel_halt_clear() and if we
> get an -EBUSY we go into a loop where we:
> 	- wait for any interrupts
> 	- flush all I/O in the workqueue

We don't actually flush any I/O. What may sit on the workqueue is the
'tell QEMU about an IRB' work.

> 	- retry cio_cancel_halt_clear
> 
> During the period where we are waiting for interrupts or
> flushing all I/O, the channel subsystem could have completed
> a halt/clear action and turned off the corresponding activity
> control bits in the subchannel status word.

The very idea behind the completion is to wait until we get
an interrupt.

> This means the next
> time we call cio_cancel_halt_clear(), we will again start by
> calling cancel subchannel and so we can be stuck between calling
> cancel and halt forever.
> 

Actually very likely if we had a cancel() returned EINVAL previously.

> Rather than calling cio_cancel_halt_clear() immediately after
> waiting, let's try to disable the subchannel. If we succeed in
> disabling the subchannel then we know nothing else can happen
> with the device.

One thing I'm confused about is, that we don't seem to prevent
new I/O being submitted. That is we could still loop indefinitely
if we get new IO after the 'kill I/O on the subchannel' is done but
before the msch() with the disable is issued.

The 'flush all I/O' parts in the commit message and in the code make
this even more confusing.

Another thing that I don't quite understand is injecting interrupts
into QEMU for stuff that is actually not guest initiated.

Furthermore I find how cio_cancel_halt_clear() quite confusing. We
usually return EBUSY to say that the function was suppressed because,
well, the resource is busy. For cio_cancel_halt_clear() EBUSY means:
* either a xsch() was effectively suppressed because status pending
* or an hsch() or csch() was actually successfully executed, and the
client code is supposed to wait for the assync clear/halt function
to complete. This gets even more confusing when one reads:

/**                                                                             
 * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt        
 * and clear ordinally if subchannel is valid.                                  
 * @sch: subchannel on which to perform the cancel_halt_clear operation         
 * @iretry: the number of the times remained to retry the next operation        
 *                                                                              
 * This should be called repeatedly since halt/clear are asynchronous

This is simply not true. Because halt/clear is async, we may not know
if we managed to accomplish canceling the running I/O. But the next
call of this function won't detect and say 'yep it worked, we are good
now' and return 0. This is the responsibility of the caller.

If it were like that, the current code would have been fine!
           
 * operations. We do one try with cio_cancel, three tries with cio_halt,        
 * 255 tries with cio_clear. The caller should initialize @iretry with          
 * the value 255 for its first call to this, and keep using the same            
 * @iretry in the subsequent calls until it gets a non -EBUSY return.           
 *                                                                              
 * Returns 0 if device now idle, -ENODEV for device not operational,            
 * -EBUSY if an interrupt is expected (either from halt/clear or from a         
 * status pending), and -EIO if out of retries.                                 
 */                                                                             
int cio_cancel_halt_clear(struct subchannel *sch, int *iretry

TL;DR:

I welcome  this batch (with an r-b) but I would like the commit message
and the comment changed so that the misleading 'flush all I/O in the
workqueue'.

I think 'vfio-ccw: fix cio_cancel_halt_clear() usage' would reflect the
content of this patch better, because reasoning about the upper limit,
and what happens if this upper limit is hit is not what this patch is
about. It is about a client code bug that rendered iretry ineffective.

Regards,
Halil
Farhan Ali April 23, 2019, 7:41 p.m. UTC | #10
On 04/23/2019 01:42 PM, Halil Pasic wrote:
> One thing I'm confused about is, that we don't seem to prevent
> new I/O being submitted. That is we could still loop indefinitely
> if we get new IO after the 'kill I/O on the subchannel' is done but
> before the msch() with the disable is issued.

So the quiesce function will be called in the remove, release functions 
and also in the mdev reset callback via an ioctl VFIO_DEVICE_RESET.

Now the release function is invoked in cases when we hot unplug the 
device or the guest is gone (or anything that will close the vfio mdev 
file descriptor, I believe). In such scenarios it's really the userspace 
which is asking to release the device. Similar for remove, where the 
user has to explicitly write to the remove file for the mdev to invoke 
it. Under normal conditions no sane userspace should be doing 
release/remove while there are still on going I/Os :)

Me and Conny had some discussion on this in v1 of this patch:
https://marc.info/?l=kvm&m=155437117823248&w=2

> 
> The 'flush all I/O' parts in the commit message and in the code make
> this even more confusing.

Maybe...if it's too confusing it could be fixed, but IMHO I don't think 
it's a dealbreaker. If anyone else thinks otherwise, I can go ahead and 
change it.

> 
> Another thing that I don't quite understand is injecting interrupts
> into QEMU for stuff that is actually not guest initiated.

As mentioned above under normal conditions we shouldn't be doing 
quiesce. But wouldn't those interrupts just be unsolicited interrupts 
for the guest?

> 
> Furthermore I find how cio_cancel_halt_clear() quite confusing. We
> usually return EBUSY to say that the function was suppressed because,
> well, the resource is busy. For cio_cancel_halt_clear() EBUSY means:
> * either a xsch() was effectively suppressed because status pending
> * or an hsch() or csch() was actually successfully executed, and the
> client code is supposed to wait for the assync clear/halt function
> to complete. This gets even more confusing when one reads:
> 
> /**
>   * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
>   * and clear ordinally if subchannel is valid.
>   * @sch: subchannel on which to perform the cancel_halt_clear operation
>   * @iretry: the number of the times remained to retry the next operation
>   *
>   * This should be called repeatedly since halt/clear are asynchronous
> 
> This is simply not true. Because halt/clear is async, we may not know
> if we managed to accomplish canceling the running I/O. But the next
> call of this function won't detect and say 'yep it worked, we are good
> now' and return 0. This is the responsibility of the caller.
> 
> If it were like that, the current code would have been fine!
>             
>   * operations. We do one try with cio_cancel, three tries with cio_halt,
>   * 255 tries with cio_clear. The caller should initialize @iretry with
>   * the value 255 for its first call to this, and keep using the same
>   * @iretry in the subsequent calls until it gets a non -EBUSY return.
>   *
>   * Returns 0 if device now idle, -ENODEV for device not operational,
>   * -EBUSY if an interrupt is expected (either from halt/clear or from a
>   * status pending), and -EIO if out of retries.
>   */
> int cio_cancel_halt_clear(struct subchannel *sch, int *iretry
> 
> TL;DR:
> 
> I welcome  this batch (with an r-b) but I would like the commit message
> and the comment changed so that the misleading 'flush all I/O in the
> workqueue'.
> 
> I think 'vfio-ccw: fix cio_cancel_halt_clear() usage' would reflect the
> content of this patch better, because reasoning about the upper limit,
> and what happens if this upper limit is hit is not what this patch is
> about. It is about a client code bug that rendered iretry ineffective.
> 

I politely disagree with the change in subject line. I think the current 
subject line describe what we are trying to prevent with this patch. But 
again if anyone else feels otherwise, I will go ahead and change :)

Thanks
Farhan


> Regards,
> Halil
> 
>
Eric Farman April 23, 2019, 8:37 p.m. UTC | #11
On 4/23/19 3:41 PM, Farhan Ali wrote:
> 
> 
> On 04/23/2019 01:42 PM, Halil Pasic wrote:

...snip...

>> TL;DR:
>>
>> I welcome  this batch (with an r-b) but I would like the commit message
>> and the comment changed so that the misleading 'flush all I/O in the
>> workqueue'.
>>
>> I think 'vfio-ccw: fix cio_cancel_halt_clear() usage' would reflect the
>> content of this patch better, because reasoning about the upper limit,
>> and what happens if this upper limit is hit is not what this patch is
>> about. It is about a client code bug that rendered iretry ineffective.
>>
> 
> I politely disagree with the change in subject line. I think the current 
> subject line describe what we are trying to prevent with this patch. But 
> again if anyone else feels otherwise, I will go ahead and change :)

I think the entire patch is fine as-is.

Reviewed-by: Eric Farman <farman@linux.ibm.com>

> 
> Thanks
> Farhan
> 
> 
>> Regards,
>> Halil
>>
>>
Cornelia Huck April 24, 2019, 7:09 a.m. UTC | #12
On Tue, 23 Apr 2019 15:41:34 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 04/23/2019 01:42 PM, Halil Pasic wrote:
> > One thing I'm confused about is, that we don't seem to prevent
> > new I/O being submitted. That is we could still loop indefinitely
> > if we get new IO after the 'kill I/O on the subchannel' is done but
> > before the msch() with the disable is issued.  
> 
> So the quiesce function will be called in the remove, release functions 
> and also in the mdev reset callback via an ioctl VFIO_DEVICE_RESET.
> 
> Now the release function is invoked in cases when we hot unplug the 
> device or the guest is gone (or anything that will close the vfio mdev 
> file descriptor, I believe). In such scenarios it's really the userspace 
> which is asking to release the device. Similar for remove, where the 
> user has to explicitly write to the remove file for the mdev to invoke 
> it. Under normal conditions no sane userspace should be doing 
> release/remove while there are still on going I/Os :)
> 
> Me and Conny had some discussion on this in v1 of this patch:
> https://marc.info/?l=kvm&m=155437117823248&w=2
> 
> > 
> > The 'flush all I/O' parts in the commit message and in the code make
> > this even more confusing.  
> 
> Maybe...if it's too confusing it could be fixed, but IMHO I don't think 
> it's a dealbreaker. If anyone else thinks otherwise, I can go ahead and 
> change it.

I think it's fine -- I wasn't confused.

> 
> > 
> > Another thing that I don't quite understand is injecting interrupts
> > into QEMU for stuff that is actually not guest initiated.  
> 
> As mentioned above under normal conditions we shouldn't be doing 
> quiesce. But wouldn't those interrupts just be unsolicited interrupts 
> for the guest?

Yes, you simply cannot keep an enabled subchannel from getting status
pending with unsolicited status.

> 
> > 
> > Furthermore I find how cio_cancel_halt_clear() quite confusing. We

Well, that's a problem (if any) with the common I/O layer and beyond
the scope of this patch.

> > TL;DR:
> > 
> > I welcome  this batch (with an r-b) but I would like the commit message

So, what does this sentence mean? Confused.

> > and the comment changed so that the misleading 'flush all I/O in the
> > workqueue'.
> > 
> > I think 'vfio-ccw: fix cio_cancel_halt_clear() usage' would reflect the
> > content of this patch better, because reasoning about the upper limit,
> > and what happens if this upper limit is hit is not what this patch is
> > about. It is about a client code bug that rendered iretry ineffective.
> >   
> 
> I politely disagree with the change in subject line. I think the current 
> subject line describe what we are trying to prevent with this patch. But 
> again if anyone else feels otherwise, I will go ahead and change :)

No, I agree that the subject line is completely fine.
Halil Pasic April 24, 2019, 10:02 a.m. UTC | #13
On Wed, 24 Apr 2019 09:09:15 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 23 Apr 2019 15:41:34 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 04/23/2019 01:42 PM, Halil Pasic wrote:
> > > One thing I'm confused about is, that we don't seem to prevent
> > > new I/O being submitted. That is we could still loop indefinitely
> > > if we get new IO after the 'kill I/O on the subchannel' is done but
> > > before the msch() with the disable is issued.  
> > 
> > So the quiesce function will be called in the remove, release functions 
> > and also in the mdev reset callback via an ioctl VFIO_DEVICE_RESET.
> > 
> > Now the release function is invoked in cases when we hot unplug the 
> > device or the guest is gone (or anything that will close the vfio mdev 
> > file descriptor, I believe). In such scenarios it's really the userspace 
> > which is asking to release the device. Similar for remove, where the 
> > user has to explicitly write to the remove file for the mdev to invoke 
> > it. Under normal conditions no sane userspace should be doing 
> > release/remove while there are still on going I/Os :)
> > 
> > Me and Conny had some discussion on this in v1 of this patch:
> > https://marc.info/?l=kvm&m=155437117823248&w=2
> > 
> > > 
> > > The 'flush all I/O' parts in the commit message and in the code make
> > > this even more confusing.  
> > 
> > Maybe...if it's too confusing it could be fixed, but IMHO I don't think 
> > it's a dealbreaker. If anyone else thinks otherwise, I can go ahead and 
> > change it.
> 
> I think it's fine -- I wasn't confused.
> 

What I/O is flushed in the workqueue? I guess it is about the line

flush_workqueue(vfio_ccw_work_q);

But there is no I/O that can be flushed in vfio_ccw_work_q, but the
bottom half of the interrupt handler if you like. 

> > 
> > > 
> > > Another thing that I don't quite understand is injecting interrupts
> > > into QEMU for stuff that is actually not guest initiated.  
> > 
> > As mentioned above under normal conditions we shouldn't be doing 
> > quiesce. But wouldn't those interrupts just be unsolicited interrupts 
> > for the guest?
> 
> Yes, you simply cannot keep an enabled subchannel from getting status
> pending with unsolicited status.
> 

I don't think a status that results from a halt subchannel can be called
unsolicited. For example if no halt signal was issued, the halt remains
pending. IMHO it is, form a guest perspective to see a halt pending in an
IRB without having a HSCH executed.

> > 
> > > 
> > > Furthermore I find how cio_cancel_halt_clear() quite confusing. We
> 
> Well, that's a problem (if any) with the common I/O layer and beyond
> the scope of this patch.
> 
> > > TL;DR:
> > > 
> > > I welcome  this batch (with an r-b) but I would like the commit message
> 
> So, what does this sentence mean? Confused.
> 

s/batch/patch/ and the sentence misses 'is gone' at the very end.

> > > and the comment changed so that the misleading 'flush all I/O in the
> > > workqueue'.
> > > 
> > > I think 'vfio-ccw: fix cio_cancel_halt_clear() usage' would reflect the
> > > content of this patch better, because reasoning about the upper limit,
> > > and what happens if this upper limit is hit is not what this patch is
> > > about. It is about a client code bug that rendered iretry ineffective.
> > >   
> > 
> > I politely disagree with the change in subject line. I think the current 
> > subject line describe what we are trying to prevent with this patch. But 
> > again if anyone else feels otherwise, I will go ahead and change :)
> 
> No, I agree that the subject line is completely fine.
> 

This is the 'infinite' loop in question.

                iretry = 255;                                                                            
                                                                                                         
                ret = cio_cancel_halt_clear(sch, &iretry);                                               
                while (ret == -EBUSY) {                                                                  
                        /*                                                                               
                         * Flush all I/O and wait for                                                    
                         * cancel/halt/clear completion.                                                 
                         */                                                                              
                        private->completion = &completion;                                               
                        spin_unlock_irq(sch->lock);                                                      
                                                                                                         
                        wait_for_completion_timeout(&completion, 3*HZ);                                  
                                                                                                         
                        private->completion = NULL;                                                      
                        flush_workqueue(vfio_ccw_work_q);                                                
                        spin_lock_irq(sch->lock);                                                        
                        ret = cio_cancel_halt_clear(sch, &iretry);                                       
                };

Considering the documentation of cio_cancel_halt_clear() and without
fully understanding the implementation of cio_cancel_halt_clear() it
looks IMHO limited to 255 retries. But it is not. Because
cio_cancel_halt_clear() is used incorrectly.

Regarding the changed code, sorry, I missed the break on -EIO. With that
the new loop should indeed be limited to ~255 iterations.

Acked-by: Halil Pasic <pasic@linux.ibm.com>

Regards,
Halil
Halil Pasic April 24, 2019, 10:21 a.m. UTC | #14
On Tue, 23 Apr 2019 15:41:34 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> 
> 
> On 04/23/2019 01:42 PM, Halil Pasic wrote:
> > One thing I'm confused about is, that we don't seem to prevent
> > new I/O being submitted. That is we could still loop indefinitely
> > if we get new IO after the 'kill I/O on the subchannel' is done but
> > before the msch() with the disable is issued.
> 
> So the quiesce function will be called in the remove, release functions 
> and also in the mdev reset callback via an ioctl VFIO_DEVICE_RESET.
> 
> Now the release function is invoked in cases when we hot unplug the 
> device or the guest is gone (or anything that will close the vfio mdev 
> file descriptor, I believe). In such scenarios it's really the userspace 
> which is asking to release the device. Similar for remove, where the 
> user has to explicitly write to the remove file for the mdev to invoke 
> it. Under normal conditions no sane userspace should be doing 
> release/remove while there are still on going I/Os :)

So you say userspace has to take care of it. Is this documented
somewhere? Does QEMU actually take care of it?

> 
> Me and Conny had some discussion on this in v1 of this patch:
> https://marc.info/?l=kvm&m=155437117823248&w=2
> 
> > 
> > The 'flush all I/O' parts in the commit message and in the code make
> > this even more confusing.
> 
> Maybe...if it's too confusing it could be fixed, but IMHO I don't think 
> it's a dealbreaker. If anyone else thinks otherwise, I can go ahead and 
> change it.
> 

I responded to Connie. No it is not a dealbreaker. But I prefer not
disseminating false information.

[..]

Regarding the rest of the points from her, I responded to Connie's
follow up as well.

Cheers,
Halil
Cornelia Huck April 24, 2019, 4:35 p.m. UTC | #15
On Tue, 16 Apr 2019 17:23:14 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> The quiesce function calls cio_cancel_halt_clear() and if we
> get an -EBUSY we go into a loop where we:
> 	- wait for any interrupts
> 	- flush all I/O in the workqueue
> 	- retry cio_cancel_halt_clear
> 
> During the period where we are waiting for interrupts or
> flushing all I/O, the channel subsystem could have completed
> a halt/clear action and turned off the corresponding activity
> control bits in the subchannel status word. This means the next
> time we call cio_cancel_halt_clear(), we will again start by
> calling cancel subchannel and so we can be stuck between calling
> cancel and halt forever.
> 
> Rather than calling cio_cancel_halt_clear() immediately after
> waiting, let's try to disable the subchannel. If we succeed in
> disabling the subchannel then we know nothing else can happen
> with the device.
> 
> Suggested-by: Eric Farman <farman@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> ChangeLog:
> v2 -> v3
>    - Log an error message when cio_cancel_halt_clear
>      returns EIO and break out of the loop.
>    
>    - Did not include past change log as the other patches
>      of the original series have been queued by Conny.
>      Old series (v2) can be found here:
>      https://marc.info/?l=kvm&m=155475754101769&w=2
> 
>  drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)

Thanks, applied.

Patch
diff mbox series

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 78517aa..66a66ac 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -43,26 +43,30 @@  int vfio_ccw_sch_quiesce(struct subchannel *sch)
 	if (ret != -EBUSY)
 		goto out_unlock;
 
+	iretry = 255;
 	do {
-		iretry = 255;
 
 		ret = cio_cancel_halt_clear(sch, &iretry);
-		while (ret == -EBUSY) {
-			/*
-			 * Flush all I/O and wait for
-			 * cancel/halt/clear completion.
-			 */
-			private->completion = &completion;
-			spin_unlock_irq(sch->lock);
 
-			wait_for_completion_timeout(&completion, 3*HZ);
+		if (ret == -EIO) {
+			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
+			       sch->schid.ssid, sch->schid.sch_no);
+			break;
+		}
+
+		/*
+		 * Flush all I/O and wait for
+		 * cancel/halt/clear completion.
+		 */
+		private->completion = &completion;
+		spin_unlock_irq(sch->lock);
 
-			private->completion = NULL;
-			flush_workqueue(vfio_ccw_work_q);
-			spin_lock_irq(sch->lock);
-			ret = cio_cancel_halt_clear(sch, &iretry);
-		};
+		if (ret == -EBUSY)
+			wait_for_completion_timeout(&completion, 3*HZ);
 
+		private->completion = NULL;
+		flush_workqueue(vfio_ccw_work_q);
+		spin_lock_irq(sch->lock);
 		ret = cio_disable_subchannel(sch);
 	} while (ret == -EBUSY);
 out_unlock: