diff mbox series

[RFC,v1,1/1] vfio-ccw: Don't call cp_free if we are processing a channel program

Message ID 46dc0cbdcb8a414d70b7807fceb1cca6229408d5.1561055076.git.alifm@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [RFC,v1,1/1] vfio-ccw: Don't call cp_free if we are processing a channel program | expand

Commit Message

Farhan Ali June 20, 2019, 7:40 p.m. UTC
There is a small window where it's possible that an interrupt can
arrive and can call cp_free, while we are still processing a channel
program (i.e allocating memory, pinnging pages, translating
addresses etc). This can lead to allocating and freeing at the same
time and can cause memory corruption.

Let's not call cp_free if we are currently processing a channel program.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---

I have been running my test overnight with this patch and I haven't
seen the stack traces that I mentioned about earlier. I would like
to get some reviews on this and also if this is the right thing to
do?

Thanks
Farhan

 drivers/s390/cio/vfio_ccw_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Farman June 20, 2019, 8:27 p.m. UTC | #1
On 6/20/19 3:40 PM, Farhan Ali wrote:
> There is a small window where it's possible that an interrupt can
> arrive and can call cp_free, while we are still processing a channel
> program (i.e allocating memory, pinnging pages, translating

s/pinnging/pinning/

> addresses etc). This can lead to allocating and freeing at the same
> time and can cause memory corruption.
> 
> Let's not call cp_free if we are currently processing a channel program.

The check around this cp_free() call is for a solicited interrupt, so
it's presumably in response to a SSCH we issued.  But if we're still
processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
what is this interrupt for?  Do the contents of irb.cpa provide any
clues, perhaps if it's in the current cp or for someone else?

> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> 
> I have been running my test overnight with this patch and I haven't
> seen the stack traces that I mentioned about earlier. I would like
> to get some reviews on this and also if this is the right thing to
> do?
> 
> Thanks
> Farhan
> 
>  drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 66a66ac..61ece3f 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);

As I alluded earlier, do we know this irb is for this cp?  If no, what
does this function end up putting in the scsw?

> -		if (is_final)
> +		if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)

In looking at how we set this state, and how we exit it, I see we do:

if SSCH got CC0, CP_PROCESSING -> CP_PENDING
if SSCH got !CC0, CP_PROCESSING -> IDLE

While the first scenario happens immediately after the SSCH instruction,
I guess it could be just tiny enough, like the io_trigger FSM patch I
sent a few weeks ago.

Meanwhile, the latter happens way after we return from the jump table.
So that scenario leaves considerable time for such an interrupt to
occur, though I don't understand why it would if we got a CC(1-3) on the
SSCH.

And anyway, the return from fsm_io_helper() in that case will also call
cp_free().  So why does the cp->initialized check provide protection
from a double-free in that direction, but not here?  I'm confused.

>  			cp_free(&private->cp);
>  	}
>  	mutex_lock(&private->io_mutex);
>
Halil Pasic June 21, 2019, 2 p.m. UTC | #2
On Thu, 20 Jun 2019 17:07:09 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> There is a small window where it's possible that an interrupt can
> arrive and can call cp_free, while we are still processing a channel
> program (i.e allocating memory, pinnging pages, translating
> addresses etc). This can lead to allocating and freeing at the same
> time and can cause memory corruption.
> 
> Let's not call cp_free if we are currently processing a channel program.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> 
> I have been running my test overnight with this patch and I haven't
> seen the stack traces that I mentioned about earlier. I would like
> to get some reviews on this and also if this is the right thing to
> do?
> 
> Thanks
> Farhan
> 
>  drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 66a66ac..61ece3f 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final)
> +		if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)

How is access to private->state correctly synchronized? And don't we
expect private->state == VFIO_CCW_STATE_CP_PENDING in case the cp was
submitted successfully with a ssch() and is done now (one way or the
other)?

Does this have something to do with 71189f2 "vfio-ccw: make it safe to
access channel programs" (Cornelia Huck, 2019-01-21)?

Regards,
Halil

>  			cp_free(&private->cp);
>  	}
>  	mutex_lock(&private->io_mutex);
Farhan Ali June 21, 2019, 2:17 p.m. UTC | #3
On 06/20/2019 04:27 PM, Eric Farman wrote:
> 
> 
> On 6/20/19 3:40 PM, Farhan Ali wrote:
>> There is a small window where it's possible that an interrupt can
>> arrive and can call cp_free, while we are still processing a channel
>> program (i.e allocating memory, pinnging pages, translating
> 
> s/pinnging/pinning/
> 
>> addresses etc). This can lead to allocating and freeing at the same
>> time and can cause memory corruption.
>>
>> Let's not call cp_free if we are currently processing a channel program.
> 
> The check around this cp_free() call is for a solicited interrupt, so
> it's presumably in response to a SSCH we issued.  But if we're still
> processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
> what is this interrupt for?  Do the contents of irb.cpa provide any
> clues, perhaps if it's in the current cp or for someone else?
> 

I don't think the interrupt is in response to an ssch but rather due to 
an csch/hsch.

>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>
>> I have been running my test overnight with this patch and I haven't
>> seen the stack traces that I mentioned about earlier. I would like
>> to get some reviews on this and also if this is the right thing to
>> do?
>>
>> Thanks
>> Farhan
>>
>>   drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>> index 66a66ac..61ece3f 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>   		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>   	if (scsw_is_solicited(&irb->scsw)) {
>>   		cp_update_scsw(&private->cp, &irb->scsw);
> 
> As I alluded earlier, do we know this irb is for this cp?  If no, what
> does this function end up putting in the scsw?
> 
>> -		if (is_final)
>> +		if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)
> 
> In looking at how we set this state, and how we exit it, I see we do:
> 
> if SSCH got CC0, CP_PROCESSING -> CP_PENDING
> if SSCH got !CC0, CP_PROCESSING -> IDLE
> 
> While the first scenario happens immediately after the SSCH instruction,
> I guess it could be just tiny enough, like the io_trigger FSM patch I
> sent a few weeks ago.
> 
> Meanwhile, the latter happens way after we return from the jump table.
> So that scenario leaves considerable time for such an interrupt to
> occur, though I don't understand why it would if we got a CC(1-3) on the
> SSCH.
> 
> And anyway, the return from fsm_io_helper() in that case will also call
> cp_free().  So why does the cp->initialized check provide protection
> from a double-free in that direction, but not here?  I'm confused.

I have a theory where I think it's possible to have 2 different threads 
executing cp_free

If we start with private->state == IDLE and the guest issues a 
clear/halt and then an ssch

- clear/halt will be issued to hardware, and if succeeds we will return 
cc=0 to guest

- the guest can then issue ssch

- we get an interrupt for csch/hsch and we queue the interrupt in the 
workqueue

- we start processing the ssch and then at the same time another cpu 
could be working on the
interrupt


Thread 1                                        Thread 2
--------                                        --------

fsm_io_request                                  vfio_ccw_sch_io_todo 

     cp_init                                         cp_free
     cp_prefetch
     fsm_io_helper
         cp_free



The test that I am trying is with a guest running an fio workload, while 
at the same time stressing the error recovery path in the guest. So 
there is a lot of ssch and lot of csch.

Of course I don't think my patch completely solves the problem, I think 
it just makes the window narrower. I just wanted to get a discussion 
started :)


Now that I am thinking more about it, I think we might have to protect 
cp with it's own mutex.

Thanks
Farhan


> 
>>   			cp_free(&private->cp);
>>   	}
>>   	mutex_lock(&private->io_mutex);
>>
>
Farhan Ali June 21, 2019, 2:26 p.m. UTC | #4
Hey Halil,

On 06/21/2019 10:00 AM, Halil Pasic wrote:
> On Thu, 20 Jun 2019 17:07:09 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> There is a small window where it's possible that an interrupt can
>> arrive and can call cp_free, while we are still processing a channel
>> program (i.e allocating memory, pinnging pages, translating
>> addresses etc). This can lead to allocating and freeing at the same
>> time and can cause memory corruption.
>>
>> Let's not call cp_free if we are currently processing a channel program.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>
>> I have been running my test overnight with this patch and I haven't
>> seen the stack traces that I mentioned about earlier. I would like
>> to get some reviews on this and also if this is the right thing to
>> do?
>>
>> Thanks
>> Farhan
>>
>>   drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>> index 66a66ac..61ece3f 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>   		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>   	if (scsw_is_solicited(&irb->scsw)) {
>>   		cp_update_scsw(&private->cp, &irb->scsw);
>> -		if (is_final)
>> +		if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)
> 
> How is access to private->state correctly synchronized? And don't we
> expect private->state == VFIO_CCW_STATE_CP_PENDING in case the cp was
> submitted successfully with a ssch() and is done now (one way or the
> other)?

I think the interrupt that we are processing could be for a csch/hsch 
and not an ssch. So we could have one thread handling an ssch and 
another thread handling a csch/hsch interrupt.

> 
> Does this have something to do with 71189f2 "vfio-ccw: make it safe to
> access channel programs" (Cornelia Huck, 2019-01-21)?

It's related. Though that patch handles freeing cp when we have 
successfully issued a ssch and then issue a csch/hsch. But it leaves a 
tiny window where if the reverse happens where we get an csch/hsch 
first, and then ssch there could be a scenario where we can end up 
calling cp_free in 2 different threads.

I have responded to Eric's email explaining further, so if you could 
kindly refer to that and continue the discussion there because I think 
he had similar questions and concern as you :)

Thanks
Farhan

>
> Regards,
> Halil
> 
>>   			cp_free(&private->cp);
>>   	}
>>   	mutex_lock(&private->io_mutex);
> 
>
Eric Farman June 21, 2019, 5:40 p.m. UTC | #5
On 6/21/19 10:17 AM, Farhan Ali wrote:
> 
> 
> On 06/20/2019 04:27 PM, Eric Farman wrote:
>>
>>
>> On 6/20/19 3:40 PM, Farhan Ali wrote:
>>> There is a small window where it's possible that an interrupt can
>>> arrive and can call cp_free, while we are still processing a channel
>>> program (i.e allocating memory, pinnging pages, translating
>>
>> s/pinnging/pinning/
>>
>>> addresses etc). This can lead to allocating and freeing at the same
>>> time and can cause memory corruption.
>>>
>>> Let's not call cp_free if we are currently processing a channel program.
>>
>> The check around this cp_free() call is for a solicited interrupt, so
>> it's presumably in response to a SSCH we issued.  But if we're still
>> processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
>> what is this interrupt for?  Do the contents of irb.cpa provide any
>> clues, perhaps if it's in the current cp or for someone else?
>>
> 
> I don't think the interrupt is in response to an ssch but rather due to
> an csch/hsch.
> 
>>>
>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>> ---
>>>
>>> I have been running my test overnight with this patch and I haven't
>>> seen the stack traces that I mentioned about earlier. I would like
>>> to get some reviews on this and also if this is the right thing to
>>> do?
>>>
>>> Thanks
>>> Farhan
>>>
>>>   drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
>>> b/drivers/s390/cio/vfio_ccw_drv.c
>>> index 66a66ac..61ece3f 100644
>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
>>> *work)
>>>                (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>       if (scsw_is_solicited(&irb->scsw)) {
>>>           cp_update_scsw(&private->cp, &irb->scsw);
>>
>> As I alluded earlier, do we know this irb is for this cp?  If no, what
>> does this function end up putting in the scsw?
>>
>>> -        if (is_final)
>>> +        if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)
>>
>> In looking at how we set this state, and how we exit it, I see we do:
>>
>> if SSCH got CC0, CP_PROCESSING -> CP_PENDING
>> if SSCH got !CC0, CP_PROCESSING -> IDLE
>>
>> While the first scenario happens immediately after the SSCH instruction,
>> I guess it could be just tiny enough, like the io_trigger FSM patch I
>> sent a few weeks ago.
>>
>> Meanwhile, the latter happens way after we return from the jump table.
>> So that scenario leaves considerable time for such an interrupt to
>> occur, though I don't understand why it would if we got a CC(1-3) on the
>> SSCH.
>>
>> And anyway, the return from fsm_io_helper() in that case will also call
>> cp_free().  So why does the cp->initialized check provide protection
>> from a double-free in that direction, but not here?  I'm confused.
> 
> I have a theory where I think it's possible to have 2 different threads
> executing cp_free
> 
> If we start with private->state == IDLE and the guest issues a
> clear/halt and then an ssch
> 
> - clear/halt will be issued to hardware, and if succeeds we will return
> cc=0 to guest
> 
> - the guest can then issue ssch

It can issue whatever it wants, but shouldn't the SSCH get a CC2 until
the halt/clear pending state is cleared?  Hrm, fsm_io_request() doesn't
look.  Rather, it (fsm_io_helper()) relies on the CC2 to be signalled
from the SSCH issued to the device.  There's a lot of stuff that happens
before we get to that point.

I'm wondering if there's a way we could/should return the SSCH here
before we do any processing.  After all, until the interrupt on the
workqueue is processed, we are busy.

> 
> - we get an interrupt for csch/hsch and we queue the interrupt in the
> workqueue
> 
> - we start processing the ssch and then at the same time another cpu
> could be working on the
> interrupt>
> 
> Thread 1                                        Thread 2
> --------                                        --------
> 
> fsm_io_request                                  vfio_ccw_sch_io_todo
>     cp_init                                         cp_free
>     cp_prefetch
>     fsm_io_helper
>         cp_free
> 
> 
> 
> The test that I am trying is with a guest running an fio workload, while
> at the same time stressing the error recovery path in the guest. So
> there is a lot of ssch and lot of csch.
> 
> Of course I don't think my patch completely solves the problem, I think
> it just makes the window narrower. I just wanted to get a discussion
> started :)
> 
> 
> Now that I am thinking more about it, I think we might have to protect
> cp with it's own mutex.

That seems like a big hammer, and I wonder if the existing SCHIB/FSM/CP
state data doesn't provide that information to us.  But I gotta wander
around some code before I say.

> 
> Thanks
> Farhan
> 
> 
>>
>>>               cp_free(&private->cp);
>>>       }
>>>       mutex_lock(&private->io_mutex);
>>>
>>
>
Farhan Ali June 21, 2019, 6:34 p.m. UTC | #6
On 06/21/2019 01:40 PM, Eric Farman wrote:
> 
> 
> On 6/21/19 10:17 AM, Farhan Ali wrote:
>>
>>
>> On 06/20/2019 04:27 PM, Eric Farman wrote:
>>>
>>>
>>> On 6/20/19 3:40 PM, Farhan Ali wrote:
>>>> There is a small window where it's possible that an interrupt can
>>>> arrive and can call cp_free, while we are still processing a channel
>>>> program (i.e allocating memory, pinnging pages, translating
>>>
>>> s/pinnging/pinning/
>>>
>>>> addresses etc). This can lead to allocating and freeing at the same
>>>> time and can cause memory corruption.
>>>>
>>>> Let's not call cp_free if we are currently processing a channel program.
>>>
>>> The check around this cp_free() call is for a solicited interrupt, so
>>> it's presumably in response to a SSCH we issued.  But if we're still
>>> processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
>>> what is this interrupt for?  Do the contents of irb.cpa provide any
>>> clues, perhaps if it's in the current cp or for someone else?
>>>
>>
>> I don't think the interrupt is in response to an ssch but rather due to
>> an csch/hsch.
>>
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> ---
>>>>
>>>> I have been running my test overnight with this patch and I haven't
>>>> seen the stack traces that I mentioned about earlier. I would like
>>>> to get some reviews on this and also if this is the right thing to
>>>> do?
>>>>
>>>> Thanks
>>>> Farhan
>>>>
>>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
>>>> b/drivers/s390/cio/vfio_ccw_drv.c
>>>> index 66a66ac..61ece3f 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
>>>> *work)
>>>>                 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>        if (scsw_is_solicited(&irb->scsw)) {
>>>>            cp_update_scsw(&private->cp, &irb->scsw);
>>>
>>> As I alluded earlier, do we know this irb is for this cp?  If no, what
>>> does this function end up putting in the scsw?
>>>
>>>> -        if (is_final)
>>>> +        if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)
>>>
>>> In looking at how we set this state, and how we exit it, I see we do:
>>>
>>> if SSCH got CC0, CP_PROCESSING -> CP_PENDING
>>> if SSCH got !CC0, CP_PROCESSING -> IDLE
>>>
>>> While the first scenario happens immediately after the SSCH instruction,
>>> I guess it could be just tiny enough, like the io_trigger FSM patch I
>>> sent a few weeks ago.
>>>
>>> Meanwhile, the latter happens way after we return from the jump table.
>>> So that scenario leaves considerable time for such an interrupt to
>>> occur, though I don't understand why it would if we got a CC(1-3) on the
>>> SSCH.
>>>
>>> And anyway, the return from fsm_io_helper() in that case will also call
>>> cp_free().  So why does the cp->initialized check provide protection
>>> from a double-free in that direction, but not here?  I'm confused.
>>
>> I have a theory where I think it's possible to have 2 different threads
>> executing cp_free
>>
>> If we start with private->state == IDLE and the guest issues a
>> clear/halt and then an ssch
>>
>> - clear/halt will be issued to hardware, and if succeeds we will return
>> cc=0 to guest
>>
>> - the guest can then issue ssch
> 
> It can issue whatever it wants, but shouldn't the SSCH get a CC2 until
> the halt/clear pending state is cleared?  Hrm, fsm_io_request() doesn't
> look.  Rather, it (fsm_io_helper()) relies on the CC2 to be signalled
> from the SSCH issued to the device.  There's a lot of stuff that happens
> before we get to that point.

Yes, and stuff that happens is memory allocation, pinning and other 
things which can take time.

> 
> I'm wondering if there's a way we could/should return the SSCH here
> before we do any processing.  After all, until the interrupt on the
> workqueue is processed, we are busy.
> 

you mean return the csch/hsch before processing the ssch? Maybe we could 
extend CP_PENDING state, to just PENDING and use that to reject any ssch 
if we have a pending csch/hsch?

>>
>> - we get an interrupt for csch/hsch and we queue the interrupt in the
>> workqueue
>>
>> - we start processing the ssch and then at the same time another cpu
>> could be working on the
>> interrupt>
>>
>> Thread 1                                        Thread 2
>> --------                                        --------
>>
>> fsm_io_request                                  vfio_ccw_sch_io_todo
>>      cp_init                                         cp_free
>>      cp_prefetch
>>      fsm_io_helper
>>          cp_free
>>
>>
>>
>> The test that I am trying is with a guest running an fio workload, while
>> at the same time stressing the error recovery path in the guest. So
>> there is a lot of ssch and lot of csch.
>>
>> Of course I don't think my patch completely solves the problem, I think
>> it just makes the window narrower. I just wanted to get a discussion
>> started :)
>>
>>
>> Now that I am thinking more about it, I think we might have to protect
>> cp with it's own mutex.
> 
> That seems like a big hammer, and I wonder if the existing SCHIB/FSM/CP
> state data doesn't provide that information to us.  But I gotta wander
> around some code before I say.

Any ideas are welcome :)

> 
>>
>> Thanks
>> Farhan
>>
>>
>>>
>>>>                cp_free(&private->cp);
>>>>        }
>>>>        mutex_lock(&private->io_mutex);
>>>>
>>>
>>
>
Cornelia Huck June 24, 2019, 9:42 a.m. UTC | #7
On Fri, 21 Jun 2019 14:34:10 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 06/21/2019 01:40 PM, Eric Farman wrote:
> > 
> > 
> > On 6/21/19 10:17 AM, Farhan Ali wrote:  
> >>
> >>
> >> On 06/20/2019 04:27 PM, Eric Farman wrote:  
> >>>
> >>>
> >>> On 6/20/19 3:40 PM, Farhan Ali wrote:  
> >>>> There is a small window where it's possible that an interrupt can
> >>>> arrive and can call cp_free, while we are still processing a channel
> >>>> program (i.e allocating memory, pinnging pages, translating  
> >>>
> >>> s/pinnging/pinning/
> >>>  
> >>>> addresses etc). This can lead to allocating and freeing at the same
> >>>> time and can cause memory corruption.
> >>>>
> >>>> Let's not call cp_free if we are currently processing a channel program.  
> >>>
> >>> The check around this cp_free() call is for a solicited interrupt, so
> >>> it's presumably in response to a SSCH we issued.  But if we're still
> >>> processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
> >>> what is this interrupt for?  Do the contents of irb.cpa provide any
> >>> clues, perhaps if it's in the current cp or for someone else?
> >>>  
> >>
> >> I don't think the interrupt is in response to an ssch but rather due to
> >> an csch/hsch.

The solicited check only checks if it is solicited. It can be for any
channel I/O instruction that causes an interrupt... we probably should
adapt the check.

> >>  
> >>>>
> >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>> ---
> >>>>
> >>>> I have been running my test overnight with this patch and I haven't
> >>>> seen the stack traces that I mentioned about earlier. I would like
> >>>> to get some reviews on this and also if this is the right thing to
> >>>> do?
> >>>>
> >>>> Thanks
> >>>> Farhan
> >>>>
> >>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> >>>> b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> index 66a66ac..61ece3f 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
> >>>> *work)
> >>>>                 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>        if (scsw_is_solicited(&irb->scsw)) {
> >>>>            cp_update_scsw(&private->cp, &irb->scsw);  
> >>>
> >>> As I alluded earlier, do we know this irb is for this cp?  If no, what
> >>> does this function end up putting in the scsw?

Yes, I think this also needs to check whether we have at least a prior
start function around. (We use the orb provided by the guest; maybe we
should check if that intparm is set in the irb?)

> >>>  
> >>>> -        if (is_final)
> >>>> +        if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)  
> >>>
> >>> In looking at how we set this state, and how we exit it, I see we do:
> >>>
> >>> if SSCH got CC0, CP_PROCESSING -> CP_PENDING
> >>> if SSCH got !CC0, CP_PROCESSING -> IDLE
> >>>
> >>> While the first scenario happens immediately after the SSCH instruction,
> >>> I guess it could be just tiny enough, like the io_trigger FSM patch I
> >>> sent a few weeks ago.
> >>>
> >>> Meanwhile, the latter happens way after we return from the jump table.
> >>> So that scenario leaves considerable time for such an interrupt to
> >>> occur, though I don't understand why it would if we got a CC(1-3) on the
> >>> SSCH.
> >>>
> >>> And anyway, the return from fsm_io_helper() in that case will also call
> >>> cp_free().  So why does the cp->initialized check provide protection
> >>> from a double-free in that direction, but not here?  I'm confused.  
> >>
> >> I have a theory where I think it's possible to have 2 different threads
> >> executing cp_free
> >>
> >> If we start with private->state == IDLE and the guest issues a
> >> clear/halt and then an ssch
> >>
> >> - clear/halt will be issued to hardware, and if succeeds we will return
> >> cc=0 to guest
> >>
> >> - the guest can then issue ssch  
> > 
> > It can issue whatever it wants, but shouldn't the SSCH get a CC2 until
> > the halt/clear pending state is cleared?  Hrm, fsm_io_request() doesn't
> > look.  Rather, it (fsm_io_helper()) relies on the CC2 to be signalled
> > from the SSCH issued to the device.  There's a lot of stuff that happens
> > before we get to that point.  
> 
> Yes, and stuff that happens is memory allocation, pinning and other 
> things which can take time.
> 
> > 
> > I'm wondering if there's a way we could/should return the SSCH here
> > before we do any processing.  After all, until the interrupt on the
> > workqueue is processed, we are busy.
> >   
> 
> you mean return the csch/hsch before processing the ssch? Maybe we could 
> extend CP_PENDING state, to just PENDING and use that to reject any ssch 
> if we have a pending csch/hsch?

I'd probably not rely on the state for this. Maybe we could look at the
work queue? But it might be that the check for the intparm I suggested
above is already enough.

> 
> >>
> >> - we get an interrupt for csch/hsch and we queue the interrupt in the
> >> workqueue
> >>
> >> - we start processing the ssch and then at the same time another cpu
> >> could be working on the  
> >> interrupt>  
> >>
> >> Thread 1                                        Thread 2
> >> --------                                        --------
> >>
> >> fsm_io_request                                  vfio_ccw_sch_io_todo
> >>      cp_init                                         cp_free
> >>      cp_prefetch
> >>      fsm_io_helper
> >>          cp_free
> >>
> >>
> >>
> >> The test that I am trying is with a guest running an fio workload, while
> >> at the same time stressing the error recovery path in the guest. So
> >> there is a lot of ssch and lot of csch.
> >>
> >> Of course I don't think my patch completely solves the problem, I think
> >> it just makes the window narrower. I just wanted to get a discussion
> >> started :)
> >>
> >>
> >> Now that I am thinking more about it, I think we might have to protect
> >> cp with it's own mutex.  
> > 
> > That seems like a big hammer, and I wonder if the existing SCHIB/FSM/CP
> > state data doesn't provide that information to us.  But I gotta wander
> > around some code before I say.  
> 
> Any ideas are welcome :)

See above :) That certainly looks like a much smaller hammer.

> 
> >   
> >>
> >> Thanks
> >> Farhan
> >>
> >>  
> >>>  
> >>>>                cp_free(&private->cp);
> >>>>        }
> >>>>        mutex_lock(&private->io_mutex);
> >>>>  
> >>>  
> >>  
> >
Cornelia Huck June 24, 2019, 10:05 a.m. UTC | #8
On Mon, 24 Jun 2019 11:42:31 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 21 Jun 2019 14:34:10 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 06/21/2019 01:40 PM, Eric Farman wrote:
> > > 
> > > 
> > > On 6/21/19 10:17 AM, Farhan Ali wrote:  
> > >>
> > >>
> > >> On 06/20/2019 04:27 PM, Eric Farman wrote:  
> > >>>
> > >>>
> > >>> On 6/20/19 3:40 PM, Farhan Ali wrote:  
> > >>>> There is a small window where it's possible that an interrupt can
> > >>>> arrive and can call cp_free, while we are still processing a channel
> > >>>> program (i.e allocating memory, pinnging pages, translating  
> > >>>
> > >>> s/pinnging/pinning/
> > >>>  
> > >>>> addresses etc). This can lead to allocating and freeing at the same
> > >>>> time and can cause memory corruption.
> > >>>>
> > >>>> Let's not call cp_free if we are currently processing a channel program.  
> > >>>
> > >>> The check around this cp_free() call is for a solicited interrupt, so
> > >>> it's presumably in response to a SSCH we issued.  But if we're still
> > >>> processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
> > >>> what is this interrupt for?  Do the contents of irb.cpa provide any
> > >>> clues, perhaps if it's in the current cp or for someone else?
> > >>>  
> > >>
> > >> I don't think the interrupt is in response to an ssch but rather due to
> > >> an csch/hsch.
> 
> The solicited check only checks if it is solicited. It can be for any
> channel I/O instruction that causes an interrupt... we probably should
> adapt the check.
> 
> > >>  
> > >>>>
> > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > >>>> ---
> > >>>>
> > >>>> I have been running my test overnight with this patch and I haven't
> > >>>> seen the stack traces that I mentioned about earlier. I would like
> > >>>> to get some reviews on this and also if this is the right thing to
> > >>>> do?
> > >>>>
> > >>>> Thanks
> > >>>> Farhan
> > >>>>
> > >>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> > >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>
> > >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > >>>> b/drivers/s390/cio/vfio_ccw_drv.c
> > >>>> index 66a66ac..61ece3f 100644
> > >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> > >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > >>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
> > >>>> *work)
> > >>>>                 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> > >>>>        if (scsw_is_solicited(&irb->scsw)) {
> > >>>>            cp_update_scsw(&private->cp, &irb->scsw);  
> > >>>
> > >>> As I alluded earlier, do we know this irb is for this cp?  If no, what
> > >>> does this function end up putting in the scsw?
> 
> Yes, I think this also needs to check whether we have at least a prior
> start function around. (We use the orb provided by the guest; maybe we
> should check if that intparm is set in the irb?)

Hrm; not so easy as we always set the intparm to the address of the
subchannel structure... 

Maybe check if we have have one of the conditions of the large table
16-6 and correlate to the ccw address? Or is it enough to check the
function control? (Don't remember when the hardware resets it.)

> 
> > >>>  
> > >>>> -        if (is_final)
> > >>>> +        if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)  
> > >>>
> > >>> In looking at how we set this state, and how we exit it, I see we do:
> > >>>
> > >>> if SSCH got CC0, CP_PROCESSING -> CP_PENDING
> > >>> if SSCH got !CC0, CP_PROCESSING -> IDLE
> > >>>
> > >>> While the first scenario happens immediately after the SSCH instruction,
> > >>> I guess it could be just tiny enough, like the io_trigger FSM patch I
> > >>> sent a few weeks ago.
> > >>>
> > >>> Meanwhile, the latter happens way after we return from the jump table.
> > >>> So that scenario leaves considerable time for such an interrupt to
> > >>> occur, though I don't understand why it would if we got a CC(1-3) on the
> > >>> SSCH.
> > >>>
> > >>> And anyway, the return from fsm_io_helper() in that case will also call
> > >>> cp_free().  So why does the cp->initialized check provide protection
> > >>> from a double-free in that direction, but not here?  I'm confused.  
> > >>
> > >> I have a theory where I think it's possible to have 2 different threads
> > >> executing cp_free
> > >>
> > >> If we start with private->state == IDLE and the guest issues a
> > >> clear/halt and then an ssch
> > >>
> > >> - clear/halt will be issued to hardware, and if succeeds we will return
> > >> cc=0 to guest
> > >>
> > >> - the guest can then issue ssch  
> > > 
> > > It can issue whatever it wants, but shouldn't the SSCH get a CC2 until
> > > the halt/clear pending state is cleared?  Hrm, fsm_io_request() doesn't
> > > look.  Rather, it (fsm_io_helper()) relies on the CC2 to be signalled
> > > from the SSCH issued to the device.  There's a lot of stuff that happens
> > > before we get to that point.  
> > 
> > Yes, and stuff that happens is memory allocation, pinning and other 
> > things which can take time.
> > 
> > > 
> > > I'm wondering if there's a way we could/should return the SSCH here
> > > before we do any processing.  After all, until the interrupt on the
> > > workqueue is processed, we are busy.
> > >   
> > 
> > you mean return the csch/hsch before processing the ssch? Maybe we could 
> > extend CP_PENDING state, to just PENDING and use that to reject any ssch 
> > if we have a pending csch/hsch?
> 
> I'd probably not rely on the state for this. Maybe we could look at the
> work queue? But it might be that the check for the intparm I suggested
> above is already enough.
> 
> > 
> > >>
> > >> - we get an interrupt for csch/hsch and we queue the interrupt in the
> > >> workqueue
> > >>
> > >> - we start processing the ssch and then at the same time another cpu
> > >> could be working on the  
> > >> interrupt>  
> > >>
> > >> Thread 1                                        Thread 2
> > >> --------                                        --------
> > >>
> > >> fsm_io_request                                  vfio_ccw_sch_io_todo
> > >>      cp_init                                         cp_free
> > >>      cp_prefetch
> > >>      fsm_io_helper
> > >>          cp_free
> > >>
> > >>
> > >>
> > >> The test that I am trying is with a guest running an fio workload, while
> > >> at the same time stressing the error recovery path in the guest. So
> > >> there is a lot of ssch and lot of csch.
> > >>
> > >> Of course I don't think my patch completely solves the problem, I think
> > >> it just makes the window narrower. I just wanted to get a discussion
> > >> started :)
> > >>
> > >>
> > >> Now that I am thinking more about it, I think we might have to protect
> > >> cp with it's own mutex.  
> > > 
> > > That seems like a big hammer, and I wonder if the existing SCHIB/FSM/CP
> > > state data doesn't provide that information to us.  But I gotta wander
> > > around some code before I say.  
> > 
> > Any ideas are welcome :)
> 
> See above :) That certainly looks like a much smaller hammer.
> 
> > 
> > >   
> > >>
> > >> Thanks
> > >> Farhan
> > >>
> > >>  
> > >>>  
> > >>>>                cp_free(&private->cp);
> > >>>>        }
> > >>>>        mutex_lock(&private->io_mutex);
> > >>>>  
> > >>>  
> > >>  
> > >   
>
Halil Pasic June 24, 2019, 11:31 a.m. UTC | #9
On Mon, 24 Jun 2019 11:42:31 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> > > It can issue whatever it wants, but shouldn't the SSCH get a CC2 until
> > > the halt/clear pending state is cleared?  Hrm, fsm_io_request() doesn't
> > > look.  Rather, it (fsm_io_helper()) relies on the CC2 to be signalled
> > > from the SSCH issued to the device.  There's a lot of stuff that happens
> > > before we get to that point.

Yes CC2 would be the correct thing to do in this situation. Doesn't QEMU
do some sort of logic of this kind already. AFAIR QEMU should reject the
SSCH because it knows that the halt/clear function is in progress or
pending. Or am I worng?

Even if QEMU does it, the kernel must not rely on QEMU or
whatever userspace doing it correctly. What I'm trying to say, if QEMU
can do it vfio_ccw should do it as well.

I'm almost always in favor of sticking close to what PoP says.

    
> > 
> > Yes, and stuff that happens is memory allocation, pinning and other 
> > things which can take time.
> >   
> > > 
> > > I'm wondering if there's a way we could/should return the SSCH here
> > > before we do any processing.  After all, until the interrupt on the
> > > workqueue is processed, we are busy.
> > >     
> > 
> > you mean return the csch/hsch before processing the ssch? Maybe we could 
> > extend CP_PENDING state, to just PENDING and use that to reject any ssch 
> > if we have a pending csch/hsch?  
> 
> I'd probably not rely on the state for this. Maybe we could look at the
> work queue? But it might be that the check for the intparm I suggested
> above is already enough.

PoP says function control bits are what matter here:
"""
Condition code 2 is set, and no other action is
taken, when a start, halt, or clear function is currently
in progress at the subchannel (see “Function Control
(FC)” on page 13).
"""

Regards,
Halil
Cornelia Huck June 24, 2019, 11:46 a.m. UTC | #10
On Mon, 24 Jun 2019 12:05:14 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 24 Jun 2019 11:42:31 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 21 Jun 2019 14:34:10 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> > > On 06/21/2019 01:40 PM, Eric Farman wrote:  
> > > > 
> > > > 
> > > > On 6/21/19 10:17 AM, Farhan Ali wrote:    
> > > >>
> > > >>
> > > >> On 06/20/2019 04:27 PM, Eric Farman wrote:    
> > > >>>
> > > >>>
> > > >>> On 6/20/19 3:40 PM, Farhan Ali wrote:    

> > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > > >>>> b/drivers/s390/cio/vfio_ccw_drv.c
> > > >>>> index 66a66ac..61ece3f 100644
> > > >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> > > >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > >>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
> > > >>>> *work)
> > > >>>>                 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> > > >>>>        if (scsw_is_solicited(&irb->scsw)) {
> > > >>>>            cp_update_scsw(&private->cp, &irb->scsw);    
> > > >>>
> > > >>> As I alluded earlier, do we know this irb is for this cp?  If no, what
> > > >>> does this function end up putting in the scsw?  
> > 
> > Yes, I think this also needs to check whether we have at least a prior
> > start function around. (We use the orb provided by the guest; maybe we
> > should check if that intparm is set in the irb?)  
> 
> Hrm; not so easy as we always set the intparm to the address of the
> subchannel structure... 
> 
> Maybe check if we have have one of the conditions of the large table
> 16-6 and correlate to the ccw address? Or is it enough to check the
> function control? (Don't remember when the hardware resets it.)

Nope, we cannot look at the function control, as csch clears any set
start function bit :( (see "Function Control", pg 16-13)

I think this problem mostly boils down to "csch clears pending status;
therefore, we may only get one interrupt, even though there had been a
start function going on". If we only go with what the hardware gives
us, I don't see a way to distinguish "clear with a prior start" from
"clear only". Maybe we want to track an "issued" status in the cp?
Cornelia Huck June 24, 2019, 12:07 p.m. UTC | #11
On Mon, 24 Jun 2019 13:46:22 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 24 Jun 2019 12:05:14 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Mon, 24 Jun 2019 11:42:31 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> > > On Fri, 21 Jun 2019 14:34:10 -0400
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > >     
> > > > On 06/21/2019 01:40 PM, Eric Farman wrote:    
> > > > > 
> > > > > 
> > > > > On 6/21/19 10:17 AM, Farhan Ali wrote:      
> > > > >>
> > > > >>
> > > > >> On 06/20/2019 04:27 PM, Eric Farman wrote:      
> > > > >>>
> > > > >>>
> > > > >>> On 6/20/19 3:40 PM, Farhan Ali wrote:      
> 
> > > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > > > >>>> b/drivers/s390/cio/vfio_ccw_drv.c
> > > > >>>> index 66a66ac..61ece3f 100644
> > > > >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> > > > >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > > >>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
> > > > >>>> *work)
> > > > >>>>                 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> > > > >>>>        if (scsw_is_solicited(&irb->scsw)) {
> > > > >>>>            cp_update_scsw(&private->cp, &irb->scsw);      
> > > > >>>
> > > > >>> As I alluded earlier, do we know this irb is for this cp?  If no, what
> > > > >>> does this function end up putting in the scsw?    
> > > 
> > > Yes, I think this also needs to check whether we have at least a prior
> > > start function around. (We use the orb provided by the guest; maybe we
> > > should check if that intparm is set in the irb?)    
> > 
> > Hrm; not so easy as we always set the intparm to the address of the
> > subchannel structure... 
> > 
> > Maybe check if we have have one of the conditions of the large table
> > 16-6 and correlate to the ccw address? Or is it enough to check the
> > function control? (Don't remember when the hardware resets it.)  
> 
> Nope, we cannot look at the function control, as csch clears any set
> start function bit :( (see "Function Control", pg 16-13)
> 
> I think this problem mostly boils down to "csch clears pending status;
> therefore, we may only get one interrupt, even though there had been a
> start function going on". If we only go with what the hardware gives
> us, I don't see a way to distinguish "clear with a prior start" from
> "clear only". Maybe we want to track an "issued" status in the cp?

Sorry for replying to myself again :), but maybe we should simply call
cp_free() if we got cc 0 from a csch? Any start function has been
terminated at the subchannel during successful execution of csch, and
cp_free does nothing if !cp->initialized, so we should hopefully be
safe there as well. We can then add a check for the start function in
the function control in the check above and should be fine, I think.
Farhan Ali June 24, 2019, 2:44 p.m. UTC | #12
On 06/24/2019 08:07 AM, Cornelia Huck wrote:
> On Mon, 24 Jun 2019 13:46:22 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Mon, 24 Jun 2019 12:05:14 +0200
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> On Mon, 24 Jun 2019 11:42:31 +0200
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>    
>>>> On Fri, 21 Jun 2019 14:34:10 -0400
>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>      
>>>>> On 06/21/2019 01:40 PM, Eric Farman wrote:
>>>>>>
>>>>>>
>>>>>> On 6/21/19 10:17 AM, Farhan Ali wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 06/20/2019 04:27 PM, Eric Farman wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/20/19 3:40 PM, Farhan Ali wrote:
>>
>>>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>> b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>> index 66a66ac..61ece3f 100644
>>>>>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
>>>>>>>>> *work)
>>>>>>>>>                  (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>>>>>>         if (scsw_is_solicited(&irb->scsw)) {
>>>>>>>>>             cp_update_scsw(&private->cp, &irb->scsw);
>>>>>>>>
>>>>>>>> As I alluded earlier, do we know this irb is for this cp?  If no, what
>>>>>>>> does this function end up putting in the scsw?
>>>>
>>>> Yes, I think this also needs to check whether we have at least a prior
>>>> start function around. (We use the orb provided by the guest; maybe we
>>>> should check if that intparm is set in the irb?)
>>>
>>> Hrm; not so easy as we always set the intparm to the address of the
>>> subchannel structure...
>>>
>>> Maybe check if we have have one of the conditions of the large table
>>> 16-6 and correlate to the ccw address? Or is it enough to check the
>>> function control? (Don't remember when the hardware resets it.)
>>
>> Nope, we cannot look at the function control, as csch clears any set
>> start function bit :( (see "Function Control", pg 16-13)
>>
>> I think this problem mostly boils down to "csch clears pending status;
>> therefore, we may only get one interrupt, even though there had been a
>> start function going on". If we only go with what the hardware gives
>> us, I don't see a way to distinguish "clear with a prior start" from
>> "clear only". Maybe we want to track an "issued" status in the cp?
> 
> Sorry for replying to myself again :), but maybe we should simply call
> cp_free() if we got cc 0 from a csch? Any start function has been
> terminated at the subchannel during successful execution of csch, and
> cp_free does nothing if !cp->initialized, so we should hopefully be
> safe there as well. We can then add a check for the start function in
> the function control in the check above and should be fine, I think.
> 
> 

So you mean not call cp_free in vfio_ccw_sch_io_todo, and instead call 
cp_free for a cc=0 for csch (and hsch) ?

Won't we end up with memory leak for a successful for ssch then?

But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am 
not sure if your suggestion will fix the problem. The problem here is 
that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at 
the same time we are handling an ssch request. So depending on the order 
of the operations we could still end up calling cp_free from both from 
threads (i refer to the threads I mentioned in response to Eric's 
earlier email).

Another thing that concerns me is that vfio-ccw can also issue csch/hsch 
in the quiesce path, independently of what the guest issues. So in that 
case we could have a similar scenario to processing an ssch request and 
issuing halt/clear in parallel. But maybe I am being paranoid :)

Thanks
Farhan
Cornelia Huck June 24, 2019, 3:09 p.m. UTC | #13
On Mon, 24 Jun 2019 10:44:17 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 06/24/2019 08:07 AM, Cornelia Huck wrote:
> > On Mon, 24 Jun 2019 13:46:22 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> >> On Mon, 24 Jun 2019 12:05:14 +0200
> >> Cornelia Huck <cohuck@redhat.com> wrote:
> >>  
> >>> On Mon, 24 Jun 2019 11:42:31 +0200
> >>> Cornelia Huck <cohuck@redhat.com> wrote:
> >>>      
> >>>> On Fri, 21 Jun 2019 14:34:10 -0400
> >>>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>>        
> >>>>> On 06/21/2019 01:40 PM, Eric Farman wrote:  
> >>>>>>
> >>>>>>
> >>>>>> On 6/21/19 10:17 AM, Farhan Ali wrote:  
> >>>>>>>
> >>>>>>>
> >>>>>>> On 06/20/2019 04:27 PM, Eric Farman wrote:  
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 6/20/19 3:40 PM, Farhan Ali wrote:  
> >>  
> >>>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>>>>> b/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>>>>> index 66a66ac..61ece3f 100644
> >>>>>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>>>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
> >>>>>>>>> *work)
> >>>>>>>>>                  (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>>>>>>         if (scsw_is_solicited(&irb->scsw)) {
> >>>>>>>>>             cp_update_scsw(&private->cp, &irb->scsw);  
> >>>>>>>>
> >>>>>>>> As I alluded earlier, do we know this irb is for this cp?  If no, what
> >>>>>>>> does this function end up putting in the scsw?  
> >>>>
> >>>> Yes, I think this also needs to check whether we have at least a prior
> >>>> start function around. (We use the orb provided by the guest; maybe we
> >>>> should check if that intparm is set in the irb?)  
> >>>
> >>> Hrm; not so easy as we always set the intparm to the address of the
> >>> subchannel structure...
> >>>
> >>> Maybe check if we have have one of the conditions of the large table
> >>> 16-6 and correlate to the ccw address? Or is it enough to check the
> >>> function control? (Don't remember when the hardware resets it.)  
> >>
> >> Nope, we cannot look at the function control, as csch clears any set
> >> start function bit :( (see "Function Control", pg 16-13)
> >>
> >> I think this problem mostly boils down to "csch clears pending status;
> >> therefore, we may only get one interrupt, even though there had been a
> >> start function going on". If we only go with what the hardware gives
> >> us, I don't see a way to distinguish "clear with a prior start" from
> >> "clear only". Maybe we want to track an "issued" status in the cp?  
> > 
> > Sorry for replying to myself again :), but maybe we should simply call
> > cp_free() if we got cc 0 from a csch? Any start function has been
> > terminated at the subchannel during successful execution of csch, and
> > cp_free does nothing if !cp->initialized, so we should hopefully be
> > safe there as well. We can then add a check for the start function in
> > the function control in the check above and should be fine, I think.
> > 
> >   
> 
> So you mean not call cp_free in vfio_ccw_sch_io_todo, and instead call 
> cp_free for a cc=0 for csch (and hsch) ?
> 
> Won't we end up with memory leak for a successful for ssch then?

No; both:

- free if cc=0 for csch (as this clears the status; hsch doesn't)
- free in _todo if the start function is set in the irb and the status
  is final

> 
> But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am 
> not sure if your suggestion will fix the problem. The problem here is 
> that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at 
> the same time we are handling an ssch request. So depending on the order 
> of the operations we could still end up calling cp_free from both from 
> threads (i refer to the threads I mentioned in response to Eric's 
> earlier email).

What I don't see is why this is a problem with ->initialized; wasn't
the problem that we misinterpreted an interrupt for csch as one for a
not-yet-issued ssch?

> 
> Another thing that concerns me is that vfio-ccw can also issue csch/hsch 
> in the quiesce path, independently of what the guest issues. So in that 
> case we could have a similar scenario to processing an ssch request and 
> issuing halt/clear in parallel. But maybe I am being paranoid :)

I think the root problem is really trying to clear a cp while another
thread is trying to set it up. Should we maybe use something like rcu?
Farhan Ali June 24, 2019, 3:24 p.m. UTC | #14
On 06/24/2019 11:09 AM, Cornelia Huck wrote:
> On Mon, 24 Jun 2019 10:44:17 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 06/24/2019 08:07 AM, Cornelia Huck wrote:
>>> On Mon, 24 Jun 2019 13:46:22 +0200
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>    
>>>> On Mon, 24 Jun 2019 12:05:14 +0200
>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>   
>>>>> On Mon, 24 Jun 2019 11:42:31 +0200
>>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>>       
>>>>>> On Fri, 21 Jun 2019 14:34:10 -0400
>>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>>         
>>>>>>> On 06/21/2019 01:40 PM, Eric Farman wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/21/19 10:17 AM, Farhan Ali wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/20/2019 04:27 PM, Eric Farman wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/20/19 3:40 PM, Farhan Ali wrote:
>>>>   
>>>>>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>>>> b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>>>> index 66a66ac..61ece3f 100644
>>>>>>>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>>>>>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct
>>>>>>>>>>> *work)
>>>>>>>>>>>                   (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>>>>>>>>          if (scsw_is_solicited(&irb->scsw)) {
>>>>>>>>>>>              cp_update_scsw(&private->cp, &irb->scsw);
>>>>>>>>>>
>>>>>>>>>> As I alluded earlier, do we know this irb is for this cp?  If no, what
>>>>>>>>>> does this function end up putting in the scsw?
>>>>>>
>>>>>> Yes, I think this also needs to check whether we have at least a prior
>>>>>> start function around. (We use the orb provided by the guest; maybe we
>>>>>> should check if that intparm is set in the irb?)
>>>>>
>>>>> Hrm; not so easy as we always set the intparm to the address of the
>>>>> subchannel structure...
>>>>>
>>>>> Maybe check if we have have one of the conditions of the large table
>>>>> 16-6 and correlate to the ccw address? Or is it enough to check the
>>>>> function control? (Don't remember when the hardware resets it.)
>>>>
>>>> Nope, we cannot look at the function control, as csch clears any set
>>>> start function bit :( (see "Function Control", pg 16-13)
>>>>
>>>> I think this problem mostly boils down to "csch clears pending status;
>>>> therefore, we may only get one interrupt, even though there had been a
>>>> start function going on". If we only go with what the hardware gives
>>>> us, I don't see a way to distinguish "clear with a prior start" from
>>>> "clear only". Maybe we want to track an "issued" status in the cp?
>>>
>>> Sorry for replying to myself again :), but maybe we should simply call
>>> cp_free() if we got cc 0 from a csch? Any start function has been
>>> terminated at the subchannel during successful execution of csch, and
>>> cp_free does nothing if !cp->initialized, so we should hopefully be
>>> safe there as well. We can then add a check for the start function in
>>> the function control in the check above and should be fine, I think.
>>>
>>>    
>>
>> So you mean not call cp_free in vfio_ccw_sch_io_todo, and instead call
>> cp_free for a cc=0 for csch (and hsch) ?
>>
>> Won't we end up with memory leak for a successful for ssch then?
> 
> No; both:
> 
> - free if cc=0 for csch (as this clears the status; hsch doesn't)
> - free in _todo if the start function is set in the irb and the status
>    is final
> 
>>
>> But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am
>> not sure if your suggestion will fix the problem. The problem here is
>> that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at
>> the same time we are handling an ssch request. So depending on the order
>> of the operations we could still end up calling cp_free from both from
>> threads (i refer to the threads I mentioned in response to Eric's
>> earlier email).
> 
> What I don't see is why this is a problem with ->initialized; wasn't
> the problem that we misinterpreted an interrupt for csch as one for a
> not-yet-issued ssch?
> 

It's the order in which we do things, which could cause the problem. 
Since we queue interrupt handling in the workqueue, we could delay 
processing the csch interrupt. During this delay if ssch comes through, 
we might have already set ->initialized to true.

So when we get around to handling the interrupt in io_todo, we would go 
ahead and call cp_free. This would cause the problem of freeing the 
ccwchain list while we might be adding to it.

>>
>> Another thing that concerns me is that vfio-ccw can also issue csch/hsch
>> in the quiesce path, independently of what the guest issues. So in that
>> case we could have a similar scenario to processing an ssch request and
>> issuing halt/clear in parallel. But maybe I am being paranoid :)
> 
> I think the root problem is really trying to clear a cp while another
> thread is trying to set it up. Should we maybe use something like rcu?
> 
> 

Yes, this is the root problem. I am not too familiar with rcu locking, 
but what would be the benefit over a traditional mutex?

Thanks
Farhan
Cornelia Huck June 27, 2019, 9:14 a.m. UTC | #15
On Mon, 24 Jun 2019 11:24:16 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 06/24/2019 11:09 AM, Cornelia Huck wrote:
> > On Mon, 24 Jun 2019 10:44:17 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:

> >> But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am
> >> not sure if your suggestion will fix the problem. The problem here is
> >> that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at
> >> the same time we are handling an ssch request. So depending on the order
> >> of the operations we could still end up calling cp_free from both from
> >> threads (i refer to the threads I mentioned in response to Eric's
> >> earlier email).  
> > 
> > What I don't see is why this is a problem with ->initialized; wasn't
> > the problem that we misinterpreted an interrupt for csch as one for a
> > not-yet-issued ssch?
> >   
> 
> It's the order in which we do things, which could cause the problem. 
> Since we queue interrupt handling in the workqueue, we could delay 
> processing the csch interrupt. During this delay if ssch comes through, 
> we might have already set ->initialized to true.
> 
> So when we get around to handling the interrupt in io_todo, we would go 
> ahead and call cp_free. This would cause the problem of freeing the 
> ccwchain list while we might be adding to it.
> 
> >>
> >> Another thing that concerns me is that vfio-ccw can also issue csch/hsch
> >> in the quiesce path, independently of what the guest issues. So in that
> >> case we could have a similar scenario to processing an ssch request and
> >> issuing halt/clear in parallel. But maybe I am being paranoid :)  
> > 
> > I think the root problem is really trying to clear a cp while another
> > thread is trying to set it up. Should we maybe use something like rcu?
> > 
> >   
> 
> Yes, this is the root problem. I am not too familiar with rcu locking, 
> but what would be the benefit over a traditional mutex?

I don't quite remember what I had been envisioning at the time (sorry,
the heat seems to make my brain a bit slushy :/), but I think we might
have two copies of the cp and use an rcu-ed pointer in the private
structure to point to one of the copies. If we make sure we've
synchronized on the pointer at interrupt time, we should be able to
free the old one in _todo and act on the new on when doing ssch. And
yes, I realize that this is awfully vague :)
Farhan Ali June 28, 2019, 1:05 p.m. UTC | #16
On 06/27/2019 05:14 AM, Cornelia Huck wrote:
> On Mon, 24 Jun 2019 11:24:16 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 06/24/2019 11:09 AM, Cornelia Huck wrote:
>>> On Mon, 24 Jun 2019 10:44:17 -0400
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>>>> But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am
>>>> not sure if your suggestion will fix the problem. The problem here is
>>>> that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at
>>>> the same time we are handling an ssch request. So depending on the order
>>>> of the operations we could still end up calling cp_free from both from
>>>> threads (i refer to the threads I mentioned in response to Eric's
>>>> earlier email).
>>>
>>> What I don't see is why this is a problem with ->initialized; wasn't
>>> the problem that we misinterpreted an interrupt for csch as one for a
>>> not-yet-issued ssch?
>>>    
>>
>> It's the order in which we do things, which could cause the problem.
>> Since we queue interrupt handling in the workqueue, we could delay
>> processing the csch interrupt. During this delay if ssch comes through,
>> we might have already set ->initialized to true.
>>
>> So when we get around to handling the interrupt in io_todo, we would go
>> ahead and call cp_free. This would cause the problem of freeing the
>> ccwchain list while we might be adding to it.
>>
>>>>
>>>> Another thing that concerns me is that vfio-ccw can also issue csch/hsch
>>>> in the quiesce path, independently of what the guest issues. So in that
>>>> case we could have a similar scenario to processing an ssch request and
>>>> issuing halt/clear in parallel. But maybe I am being paranoid :)
>>>
>>> I think the root problem is really trying to clear a cp while another
>>> thread is trying to set it up. Should we maybe use something like rcu?
>>>
>>>    
>>
>> Yes, this is the root problem. I am not too familiar with rcu locking,
>> but what would be the benefit over a traditional mutex?
> 
> I don't quite remember what I had been envisioning at the time (sorry,
> the heat seems to make my brain a bit slushy :/), but I think we might
> have two copies of the cp and use an rcu-ed pointer in the private
> structure to point to one of the copies. If we make sure we've
> synchronized on the pointer at interrupt time, we should be able to
> free the old one in _todo and act on the new on when doing ssch. And
> yes, I realize that this is awfully vague :)
> 
> 

Sorry for the delayed response. I was trying out few ideas, and I think 
the simplest one for me that worked and that makes sense is to 
explicitly add the check to see if the state == CP_PENDING when trying 
to free the cp (as mentioned by Halil in a separate thread).

When we are in the CP_PENDING state then we know for sure that we have a 
currently allocated cp and no other thread is working on it. So in the 
interrupt context, it should be okay to free cp.

I have prototyped with the mutex, but the code becomes too hairy. I 
looked into the rcu api and from what I understand about rcu it would 
provide advantage if we more readers than updaters. But in our case we 
really have 2 updaters, updating the cp at the same time.

In the meantime I also have some minor fixes while going over the code 
again :). I will post a v2 soon for review.

Thanks
Farhan
diff mbox series

Patch

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 66a66ac..61ece3f 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -88,7 +88,7 @@  static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final)
+		if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)
 			cp_free(&private->cp);
 	}
 	mutex_lock(&private->io_mutex);