[RFC,v1,01/10] s390/cio: Fix cleanup of pfn_array alloc failure
diff mbox series

Message ID 20181109023937.96105-2-farman@linux.ibm.com
State Mainlined
Headers show
Series
  • vfio-ccw channel program rework
Related show

Commit Message

Eric Farman Nov. 9, 2018, 2:39 a.m. UTC
If pfn_array_alloc fails somehow, we need to release the pfn_array_table
that was malloc'd earlier.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Cornelia Huck Nov. 9, 2018, 10:45 a.m. UTC | #1
On Fri,  9 Nov 2018 03:39:28 +0100
Eric Farman <farman@linux.ibm.com> wrote:

> If pfn_array_alloc fails somehow, we need to release the pfn_array_table
> that was malloc'd earlier.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index fd77e46eb3b2..ef5ab45d94b3 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
>  
>  	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, ccw->count);
>  	if (ret < 0)
> -		goto out_init;
> +		goto out_unpin;
>  
>  	/* Translate this direct ccw to a idal ccw. */
>  	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA | GFP_KERNEL);

It's a bit confusing that this will also call vfio_unpin_pages() even
though there should not be any pinned pages at that point in time; but
from what I see, it should not hurt, so this patch is fine.
Eric Farman Nov. 9, 2018, 2:49 p.m. UTC | #2
On 11/09/2018 05:45 AM, Cornelia Huck wrote:
> On Fri,  9 Nov 2018 03:39:28 +0100
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> If pfn_array_alloc fails somehow, we need to release the pfn_array_table
>> that was malloc'd earlier.
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>> index fd77e46eb3b2..ef5ab45d94b3 100644
>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
>>   
>>   	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, ccw->count);
>>   	if (ret < 0)
>> -		goto out_init;
>> +		goto out_unpin;
>>   
>>   	/* Translate this direct ccw to a idal ccw. */
>>   	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
> 
> It's a bit confusing that this will also call vfio_unpin_pages() even
> though there should not be any pinned pages at that point in time; but
> from what I see, it should not hurt, so this patch is fine.
> 

Yeah, confusing indeed.  The problem today is that we don't undo the 
pfn_array_table_init() call that happened prior to this error, and so 
there would be a leak of the pat->pat_pa memory.  So going to out_init 
is certainly not right.

In the pfn_array patches later, I do conditionally call 
vfio_unpin_pages() based on the contents of pfn_array->pa_iova, to avoid 
the scenario you describe.

  - Eric
Halil Pasic Nov. 9, 2018, 5:01 p.m. UTC | #3
On Fri, 9 Nov 2018 09:49:22 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> 
> 
> On 11/09/2018 05:45 AM, Cornelia Huck wrote:
> > On Fri,  9 Nov 2018 03:39:28 +0100
> > Eric Farman <farman@linux.ibm.com> wrote:
> > 
> >> If pfn_array_alloc fails somehow, we need to release the
> >> pfn_array_table that was malloc'd earlier.
> >>
> >> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_cp.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> >> b/drivers/s390/cio/vfio_ccw_cp.c index fd77e46eb3b2..ef5ab45d94b3
> >> 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c
> >> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> >> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct
> >> ccwchain *chain, 
> >>   	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> >> ccw->cda, ccw->count); if (ret < 0)
> >> -		goto out_init;
> >> +		goto out_unpin;
> >>   
> >>   	/* Translate this direct ccw to a idal ccw. */
> >>   	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
> >> GFP_KERNEL);
> > 
> > It's a bit confusing that this will also call vfio_unpin_pages()
> > even though there should not be any pinned pages at that point in
> > time; but from what I see, it should not hurt, so this patch is
> > fine.
> > 
> 
> Yeah, confusing indeed.  The problem today is that we don't undo the 
> pfn_array_table_init() call that happened prior to this error, and so 
> there would be a leak of the pat->pat_pa memory.  So going to
> out_init is certainly not right.
> 
> In the pfn_array patches later, I do conditionally call 
> vfio_unpin_pages() based on the contents of pfn_array->pa_iova, to
> avoid the scenario you describe.
> 
>   - Eric
> 

Quite confusing and generally awful. I will try to figure out the
before-after on a series level, and then consider the individual
patches in detail. The in between states are predestined to look awful
because of the current state.

Regards,
Halil
Farhan Ali Nov. 9, 2018, 9:19 p.m. UTC | #4
On 11/09/2018 12:01 PM, Halil Pasic wrote:
> On Fri, 9 Nov 2018 09:49:22 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>>
>>
>> On 11/09/2018 05:45 AM, Cornelia Huck wrote:
>>> On Fri,  9 Nov 2018 03:39:28 +0100
>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>
>>>> If pfn_array_alloc fails somehow, we need to release the
>>>> pfn_array_table that was malloc'd earlier.
>>>>
>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
>>>> b/drivers/s390/cio/vfio_ccw_cp.c index fd77e46eb3b2..ef5ab45d94b3
>>>> 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct
>>>> ccwchain *chain,
>>>>    	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
>>>> ccw->cda, ccw->count); if (ret < 0)
>>>> -		goto out_init;
>>>> +		goto out_unpin;
>>>>    
>>>>    	/* Translate this direct ccw to a idal ccw. */
>>>>    	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
>>>> GFP_KERNEL);
>>>
>>> It's a bit confusing that this will also call vfio_unpin_pages()
>>> even though there should not be any pinned pages at that point in
>>> time; but from what I see, it should not hurt, so this patch is
>>> fine.
>>>
>>
>> Yeah, confusing indeed.  The problem today is that we don't undo the
>> pfn_array_table_init() call that happened prior to this error, and so
>> there would be a leak of the pat->pat_pa memory.  So going to
>> out_init is certainly not right.
>>
>> In the pfn_array patches later, I do conditionally call
>> vfio_unpin_pages() based on the contents of pfn_array->pa_iova, to
>> avoid the scenario you describe.
>>
>>    - Eric
>>
> 
> Quite confusing and generally awful. I will try to figure out the
> before-after on a series level, and then consider the individual
> patches in detail. The in between states are predestined to look awful
> because of the current state.
> 
> Regards,
> Halil
> 
> 
> 
The fix is correct but yeah we are unpinning pages when we shouldn't 
have anymore pinned pages.
Maybe add an extra check in pfn_array_unpin_free to see if pa_iova_pfn 
is null? I don't know if it would make it even more confusing :)

Thanks
Farhan
Eric Farman Nov. 11, 2018, 6:13 p.m. UTC | #5
On 11/09/2018 04:19 PM, Farhan Ali wrote:
> 
> 
> On 11/09/2018 12:01 PM, Halil Pasic wrote:
>> On Fri, 9 Nov 2018 09:49:22 -0500
>> Eric Farman <farman@linux.ibm.com> wrote:
>>
>>>
>>>
>>> On 11/09/2018 05:45 AM, Cornelia Huck wrote:
>>>> On Fri,  9 Nov 2018 03:39:28 +0100
>>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>>
>>>>> If pfn_array_alloc fails somehow, we need to release the
>>>>> pfn_array_table that was malloc'd earlier.
>>>>>
>>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>>> ---
>>>>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
>>>>> b/drivers/s390/cio/vfio_ccw_cp.c index fd77e46eb3b2..ef5ab45d94b3
>>>>> 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>>> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct
>>>>> ccwchain *chain,
>>>>>        ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
>>>>> ccw->cda, ccw->count); if (ret < 0)
>>>>> -        goto out_init;
>>>>> +        goto out_unpin;
>>>>>        /* Translate this direct ccw to a idal ccw. */
>>>>>        idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
>>>>> GFP_KERNEL);
>>>>
>>>> It's a bit confusing that this will also call vfio_unpin_pages()
>>>> even though there should not be any pinned pages at that point in
>>>> time; but from what I see, it should not hurt, so this patch is
>>>> fine.
>>>>
>>>
>>> Yeah, confusing indeed.  The problem today is that we don't undo the
>>> pfn_array_table_init() call that happened prior to this error, and so
>>> there would be a leak of the pat->pat_pa memory.  So going to
>>> out_init is certainly not right.
>>>
>>> In the pfn_array patches later, I do conditionally call
>>> vfio_unpin_pages() based on the contents of pfn_array->pa_iova, to
>>> avoid the scenario you describe.
>>>
>>>    - Eric
>>>
>>
>> Quite confusing and generally awful. I will try to figure out the
>> before-after on a series level, and then consider the individual
>> patches in detail. The in between states are predestined to look awful
>> because of the current state.
>>
>> Regards,
>> Halil
>>
>>
>>
> The fix is correct but yeah we are unpinning pages when we shouldn't 
> have anymore pinned pages.
> Maybe add an extra check in pfn_array_unpin_free to see if pa_iova_pfn 
> is null? I don't know if it would make it even more confusing :)

I know I put this check in later, but I think it's a 
belts-and-suspenders situation.  If pfn_array_alloc_pin failed, then we 
took one of three error exits:

1) pa->pa_nr is set to zero, based on the input length
2) pa->pa_iova_pfn is zero, because of -ENOMEM
3) vfio_pin_pages() failed somehow, so on cleanup we set pa->pa_nr to 
zero, kfree(pa->pa_iova_pfn), and set pa->pa_iova_pfn to NULL

Since both pa_nr (npages) and pa_iova_pfn (user_pfn) are zero/NULL, the 
unpin routine will exit out early with -EINVAL.  We don't check the 
return code from vfio_unpin_pages, but that's fine since we're already 
in an error path.

I'm not opposed to a check here, so can spin a v2 of this patch if 
desired.  But I'm not as concerned about it with the state of the code 
on this patch.

  - Eric
Cornelia Huck Nov. 12, 2018, 10 a.m. UTC | #6
On Sun, 11 Nov 2018 13:13:48 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 11/09/2018 04:19 PM, Farhan Ali wrote:
> > 
> > 
> > On 11/09/2018 12:01 PM, Halil Pasic wrote:  
> >> On Fri, 9 Nov 2018 09:49:22 -0500
> >> Eric Farman <farman@linux.ibm.com> wrote:
> >>  
> >>>
> >>>
> >>> On 11/09/2018 05:45 AM, Cornelia Huck wrote:  
> >>>> On Fri,  9 Nov 2018 03:39:28 +0100
> >>>> Eric Farman <farman@linux.ibm.com> wrote:
> >>>>  
> >>>>> If pfn_array_alloc fails somehow, we need to release the
> >>>>> pfn_array_table that was malloc'd earlier.
> >>>>>
> >>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> >>>>> ---
> >>>>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
> >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> >>>>> b/drivers/s390/cio/vfio_ccw_cp.c index fd77e46eb3b2..ef5ab45d94b3
> >>>>> 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c
> >>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> >>>>> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct
> >>>>> ccwchain *chain,
> >>>>>        ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> >>>>> ccw->cda, ccw->count); if (ret < 0)
> >>>>> -        goto out_init;
> >>>>> +        goto out_unpin;
> >>>>>        /* Translate this direct ccw to a idal ccw. */
> >>>>>        idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
> >>>>> GFP_KERNEL);  
> >>>>
> >>>> It's a bit confusing that this will also call vfio_unpin_pages()
> >>>> even though there should not be any pinned pages at that point in
> >>>> time; but from what I see, it should not hurt, so this patch is
> >>>> fine.
> >>>>  
> >>>
> >>> Yeah, confusing indeed.  The problem today is that we don't undo the
> >>> pfn_array_table_init() call that happened prior to this error, and so
> >>> there would be a leak of the pat->pat_pa memory.  So going to
> >>> out_init is certainly not right.
> >>>
> >>> In the pfn_array patches later, I do conditionally call
> >>> vfio_unpin_pages() based on the contents of pfn_array->pa_iova, to
> >>> avoid the scenario you describe.
> >>>
> >>>    - Eric
> >>>  
> >>
> >> Quite confusing and generally awful. I will try to figure out the
> >> before-after on a series level, and then consider the individual
> >> patches in detail. The in between states are predestined to look awful
> >> because of the current state.

It's probably better to consider the first two, bug-fixing patches
separately (the complete series is probably not 4.20 material.)

> >>
> >> Regards,
> >> Halil
> >>
> >>
> >>  
> > The fix is correct but yeah we are unpinning pages when we shouldn't 
> > have anymore pinned pages.
> > Maybe add an extra check in pfn_array_unpin_free to see if pa_iova_pfn 
> > is null? I don't know if it would make it even more confusing :)  
> 
> I know I put this check in later, but I think it's a 
> belts-and-suspenders situation.  If pfn_array_alloc_pin failed, then we 
> took one of three error exits:
> 
> 1) pa->pa_nr is set to zero, based on the input length
> 2) pa->pa_iova_pfn is zero, because of -ENOMEM
> 3) vfio_pin_pages() failed somehow, so on cleanup we set pa->pa_nr to 
> zero, kfree(pa->pa_iova_pfn), and set pa->pa_iova_pfn to NULL
> 
> Since both pa_nr (npages) and pa_iova_pfn (user_pfn) are zero/NULL, the 
> unpin routine will exit out early with -EINVAL.  We don't check the 
> return code from vfio_unpin_pages, but that's fine since we're already 
> in an error path.

Yes, that was my reasoning as well.

> 
> I'm not opposed to a check here, so can spin a v2 of this patch if 
> desired.  But I'm not as concerned about it with the state of the code 
> on this patch.

I'd be happy to merge this patch as-is, but if people think a check
makes things clearer, I'd be happy to merge an updated patch as well.
Pierre Morel Nov. 12, 2018, 10:28 a.m. UTC | #7
On 11/11/2018 19:13, Eric Farman wrote:
> 
> 
> On 11/09/2018 04:19 PM, Farhan Ali wrote:
>>
>>
>> On 11/09/2018 12:01 PM, Halil Pasic wrote:
>>> On Fri, 9 Nov 2018 09:49:22 -0500
>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>
>>>>
>>>>
>>>> On 11/09/2018 05:45 AM, Cornelia Huck wrote:
>>>>> On Fri,  9 Nov 2018 03:39:28 +0100
>>>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>>>
>>>>>> If pfn_array_alloc fails somehow, we need to release the
>>>>>> pfn_array_table that was malloc'd earlier.
>>>>>>
>>>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>>>> ---
>>>>>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> b/drivers/s390/cio/vfio_ccw_cp.c index fd77e46eb3b2..ef5ab45d94b3
>>>>>> 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct
>>>>>> ccwchain *chain,
>>>>>>        ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
>>>>>> ccw->cda, ccw->count); if (ret < 0)
>>>>>> -        goto out_init;
>>>>>> +        goto out_unpin;
>>>>>>        /* Translate this direct ccw to a idal ccw. */
>>>>>>        idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
>>>>>> GFP_KERNEL);
>>>>>
>>>>> It's a bit confusing that this will also call vfio_unpin_pages()
>>>>> even though there should not be any pinned pages at that point in
>>>>> time; but from what I see, it should not hurt, so this patch is
>>>>> fine.
>>>>>
>>>>
>>>> Yeah, confusing indeed.  The problem today is that we don't undo the
>>>> pfn_array_table_init() call that happened prior to this error, and so
>>>> there would be a leak of the pat->pat_pa memory.  So going to
>>>> out_init is certainly not right.
>>>>
>>>> In the pfn_array patches later, I do conditionally call
>>>> vfio_unpin_pages() based on the contents of pfn_array->pa_iova, to
>>>> avoid the scenario you describe.
>>>>
>>>>    - Eric
>>>>
>>>
>>> Quite confusing and generally awful. I will try to figure out the
>>> before-after on a series level, and then consider the individual
>>> patches in detail. The in between states are predestined to look awful
>>> because of the current state.
>>>
>>> Regards,
>>> Halil
>>>
>>>
>>>
>> The fix is correct but yeah we are unpinning pages when we shouldn't 
>> have anymore pinned pages.
>> Maybe add an extra check in pfn_array_unpin_free to see if pa_iova_pfn 
>> is null? I don't know if it would make it even more confusing :)
> 
> I know I put this check in later, but I think it's a 
> belts-and-suspenders situation.  If pfn_array_alloc_pin failed, then we 
> took one of three error exits:
> 
> 1) pa->pa_nr is set to zero, based on the input length
> 2) pa->pa_iova_pfn is zero, because of -ENOMEM
> 3) vfio_pin_pages() failed somehow, so on cleanup we set pa->pa_nr to 
> zero, kfree(pa->pa_iova_pfn), and set pa->pa_iova_pfn to NULL
> 
> Since both pa_nr (npages) and pa_iova_pfn (user_pfn) are zero/NULL, the 
> unpin routine will exit out early with -EINVAL.  We don't check the 
> return code from vfio_unpin_pages, but that's fine since we're already 
> in an error path.
> 
> I'm not opposed to a check here, so can spin a v2 of this patch if 
> desired.  But I'm not as concerned about it with the state of the code 
> on this patch.
> 
>   - Eric

Hi Eric,

I think the problem here comes from the pfn_array_table_unpin_free() 
doing both unpin and free.

I would prefer that you change the  pfn_array_table_init() to return the 
pointer, so you can free the pointer in the caller like:


         p = pfn_array_table_init(pat, 1);
         if (!p)
                 goto out_init;

         ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, 
ccw->count);
         if (ret < 0)
                 goto out_free;
...

out_unpin:
         pfn_array_table_unpin_free(pat, cp->mdev);
out_free:
	pfn_array_table_free(p);
out_init:
         ccw->cda = 0;
         return ret;
}


And modify the pfn_array_table_unpin_free() to pfn_array_table_unpin()
and add the freeing in pfn_array_table_free(p).


Something like that with a correct return code handle.

Which would make the code more logical and readable.

What do you think?

Regards,
Pierre
Cornelia Huck Nov. 12, 2018, 10:32 a.m. UTC | #8
On Mon, 12 Nov 2018 11:28:38 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> Hi Eric,
> 
> I think the problem here comes from the pfn_array_table_unpin_free() 
> doing both unpin and free.
> 
> I would prefer that you change the  pfn_array_table_init() to return the 
> pointer, so you can free the pointer in the caller like:
> 
> 
>          p = pfn_array_table_init(pat, 1);
>          if (!p)
>                  goto out_init;
> 
>          ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, 
> ccw->count);
>          if (ret < 0)
>                  goto out_free;
> ...
> 
> out_unpin:
>          pfn_array_table_unpin_free(pat, cp->mdev);
> out_free:
> 	pfn_array_table_free(p);
> out_init:
>          ccw->cda = 0;
>          return ret;
> }
> 
> 
> And modify the pfn_array_table_unpin_free() to pfn_array_table_unpin()
> and add the freeing in pfn_array_table_free(p).
> 
> 
> Something like that with a correct return code handle.
> 
> Which would make the code more logical and readable.
> 
> What do you think?

While I agree that this would improve the code, I'm not sure how much
of it remains at the end of this series (I haven't read it completely
yet.) IOW, is this a code change that would get kicked out again anyway?
Pierre Morel Nov. 12, 2018, 10:41 a.m. UTC | #9
On 12/11/2018 11:32, Cornelia Huck wrote:
> On Mon, 12 Nov 2018 11:28:38 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> Hi Eric,
>>
>> I think the problem here comes from the pfn_array_table_unpin_free()
>> doing both unpin and free.
>>
>> I would prefer that you change the  pfn_array_table_init() to return the
>> pointer, so you can free the pointer in the caller like:
>>
>>
>>           p = pfn_array_table_init(pat, 1);
>>           if (!p)
>>                   goto out_init;
>>
>>           ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda,
>> ccw->count);
>>           if (ret < 0)
>>                   goto out_free;
>> ...
>>
>> out_unpin:
>>           pfn_array_table_unpin_free(pat, cp->mdev);
>> out_free:
>> 	pfn_array_table_free(p);
>> out_init:
>>           ccw->cda = 0;
>>           return ret;
>> }
>>
>>
>> And modify the pfn_array_table_unpin_free() to pfn_array_table_unpin()
>> and add the freeing in pfn_array_table_free(p).
>>
>>
>> Something like that with a correct return code handle.
>>
>> Which would make the code more logical and readable.
>>
>> What do you think?
> 
> While I agree that this would improve the code, I'm not sure how much
> of it remains at the end of this series (I haven't read it completely
> yet.) IOW, is this a code change that would get kicked out again anyway?
> 

At the end,pfn_array_alloc_pin() is decoupled in pfn_array_alloc() and 
pfn_array_pin() which is a good thing but but the pfn_array_unpin_free() 
survives as unpin + free.
Cornelia Huck Nov. 12, 2018, 10:59 a.m. UTC | #10
On Mon, 12 Nov 2018 11:41:50 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 12/11/2018 11:32, Cornelia Huck wrote:
> > On Mon, 12 Nov 2018 11:28:38 +0100
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> >   
> >> Hi Eric,
> >>
> >> I think the problem here comes from the pfn_array_table_unpin_free()
> >> doing both unpin and free.
> >>
> >> I would prefer that you change the  pfn_array_table_init() to return the
> >> pointer, so you can free the pointer in the caller like:
> >>
> >>
> >>           p = pfn_array_table_init(pat, 1);
> >>           if (!p)
> >>                   goto out_init;
> >>
> >>           ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda,
> >> ccw->count);
> >>           if (ret < 0)
> >>                   goto out_free;
> >> ...
> >>
> >> out_unpin:
> >>           pfn_array_table_unpin_free(pat, cp->mdev);
> >> out_free:
> >> 	pfn_array_table_free(p);
> >> out_init:
> >>           ccw->cda = 0;
> >>           return ret;
> >> }
> >>
> >>
> >> And modify the pfn_array_table_unpin_free() to pfn_array_table_unpin()
> >> and add the freeing in pfn_array_table_free(p).
> >>
> >>
> >> Something like that with a correct return code handle.
> >>
> >> Which would make the code more logical and readable.
> >>
> >> What do you think?  
> > 
> > While I agree that this would improve the code, I'm not sure how much
> > of it remains at the end of this series (I haven't read it completely
> > yet.) IOW, is this a code change that would get kicked out again anyway?
> >   
> 
> At the end,pfn_array_alloc_pin() is decoupled in pfn_array_alloc() and 
> pfn_array_pin() which is a good thing but but the pfn_array_unpin_free() 
> survives as unpin + free.

OK, this seems like a good idea to me, then :)
Halil Pasic Nov. 12, 2018, 12:38 p.m. UTC | #11
On Mon, 12 Nov 2018 11:00:55 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> From: Cornelia Huck <cohuck@redhat.com>
> To: Eric Farman <farman@linux.ibm.com>
> Cc: Farhan Ali <alifm@linux.ibm.com>, Halil Pasic
> <pasic@linux.ibm.com>, Pierre Morel <pmorel@linux.ibm.com>,
> linux-s390@vger.kernel.org, kvm@vger.kernel.org, "Jason J . Herne"
> <jjherne@linux.ibm.com> Subject: Re: [RFC PATCH v1 01/10] s390/cio:
> Fix cleanup of pfn_array alloc failure Date: Mon, 12 Nov 2018
> 11:00:55 +0100 Sender: linux-s390-owner@vger.kernel.org Organization:
> Red Hat GmbH
> 
> On Sun, 11 Nov 2018 13:13:48 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > On 11/09/2018 04:19 PM, Farhan Ali wrote:  
> > > 
> > > 
> > > On 11/09/2018 12:01 PM, Halil Pasic wrote:    
> > >> On Fri, 9 Nov 2018 09:49:22 -0500
> > >> Eric Farman <farman@linux.ibm.com> wrote:
> > >>    
> > >>>
> > >>>
> > >>> On 11/09/2018 05:45 AM, Cornelia Huck wrote:    
> > >>>> On Fri,  9 Nov 2018 03:39:28 +0100
> > >>>> Eric Farman <farman@linux.ibm.com> wrote:
> > >>>>    
> > >>>>> If pfn_array_alloc fails somehow, we need to release the
> > >>>>> pfn_array_table that was malloc'd earlier.
> > >>>>>
> > >>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>

Acked-by: Halil Pasic <pasic@linux.ibm.com>

> > >>>>> ---
> > >>>>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
> > >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>
> > >>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> > >>>>> b/drivers/s390/cio/vfio_ccw_cp.c index
> > >>>>> fd77e46eb3b2..ef5ab45d94b3 100644 ---
> > >>>>> a/drivers/s390/cio/vfio_ccw_cp.c +++
> > >>>>> b/drivers/s390/cio/vfio_ccw_cp.c @@ -528,7 +528,7 @@ static
> > >>>>> int ccwchain_fetch_direct(struct ccwchain *chain,
> > >>>>>        ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> > >>>>> ccw->cda, ccw->count); if (ret < 0)
> > >>>>> -        goto out_init;
> > >>>>> +        goto out_unpin;
> > >>>>>        /* Translate this direct ccw to a idal ccw. */
> > >>>>>        idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
> > >>>>> GFP_KERNEL);    
> > >>>>
> > >>>> It's a bit confusing that this will also call
> > >>>> vfio_unpin_pages() even though there should not be any pinned
> > >>>> pages at that point in time; but from what I see, it should
> > >>>> not hurt, so this patch is fine.
> > >>>>    
> > >>>
> > >>> Yeah, confusing indeed.  The problem today is that we don't
> > >>> undo the pfn_array_table_init() call that happened prior to
> > >>> this error, and so there would be a leak of the pat->pat_pa
> > >>> memory.  So going to out_init is certainly not right.
> > >>>
> > >>> In the pfn_array patches later, I do conditionally call
> > >>> vfio_unpin_pages() based on the contents of pfn_array->pa_iova,
> > >>> to avoid the scenario you describe.
> > >>>
> > >>>    - Eric
> > >>>    
> > >>
> > >> Quite confusing and generally awful. I will try to figure out the
> > >> before-after on a series level, and then consider the individual
> > >> patches in detail. The in between states are predestined to look
> > >> awful because of the current state.  
> 
> It's probably better to consider the first two, bug-fixing patches
> separately (the complete series is probably not 4.20 material.)

I second that! From what I've seen up until now, it will take time to
properly review the complete series. The two bugfixes in turn are easy
to understand.

Halil
Cornelia Huck Nov. 12, 2018, 1:23 p.m. UTC | #12
On Mon, 12 Nov 2018 13:38:46 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 12 Nov 2018 11:00:55 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > From: Cornelia Huck <cohuck@redhat.com>
> > To: Eric Farman <farman@linux.ibm.com>
> > Cc: Farhan Ali <alifm@linux.ibm.com>, Halil Pasic
> > <pasic@linux.ibm.com>, Pierre Morel <pmorel@linux.ibm.com>,
> > linux-s390@vger.kernel.org, kvm@vger.kernel.org, "Jason J . Herne"
> > <jjherne@linux.ibm.com> Subject: Re: [RFC PATCH v1 01/10] s390/cio:
> > Fix cleanup of pfn_array alloc failure Date: Mon, 12 Nov 2018
> > 11:00:55 +0100 Sender: linux-s390-owner@vger.kernel.org Organization:
> > Red Hat GmbH
> > 
> > On Sun, 11 Nov 2018 13:13:48 -0500
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> > > On 11/09/2018 04:19 PM, Farhan Ali wrote:    
> > > > 
> > > > 
> > > > On 11/09/2018 12:01 PM, Halil Pasic wrote:      
> > > >> On Fri, 9 Nov 2018 09:49:22 -0500
> > > >> Eric Farman <farman@linux.ibm.com> wrote:
> > > >>      
> > > >>>
> > > >>>
> > > >>> On 11/09/2018 05:45 AM, Cornelia Huck wrote:      
> > > >>>> On Fri,  9 Nov 2018 03:39:28 +0100
> > > >>>> Eric Farman <farman@linux.ibm.com> wrote:
> > > >>>>      
> > > >>>>> If pfn_array_alloc fails somehow, we need to release the
> > > >>>>> pfn_array_table that was malloc'd earlier.
> > > >>>>>
> > > >>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>  
> 
> Acked-by: Halil Pasic <pasic@linux.ibm.com>
> 
> > > >>>>> ---
> > > >>>>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
> > > >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>>>
> > > >>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> > > >>>>> b/drivers/s390/cio/vfio_ccw_cp.c index
> > > >>>>> fd77e46eb3b2..ef5ab45d94b3 100644 ---
> > > >>>>> a/drivers/s390/cio/vfio_ccw_cp.c +++
> > > >>>>> b/drivers/s390/cio/vfio_ccw_cp.c @@ -528,7 +528,7 @@ static
> > > >>>>> int ccwchain_fetch_direct(struct ccwchain *chain,
> > > >>>>>        ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> > > >>>>> ccw->cda, ccw->count); if (ret < 0)
> > > >>>>> -        goto out_init;
> > > >>>>> +        goto out_unpin;
> > > >>>>>        /* Translate this direct ccw to a idal ccw. */
> > > >>>>>        idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA |
> > > >>>>> GFP_KERNEL);      
> > > >>>>
> > > >>>> It's a bit confusing that this will also call
> > > >>>> vfio_unpin_pages() even though there should not be any pinned
> > > >>>> pages at that point in time; but from what I see, it should
> > > >>>> not hurt, so this patch is fine.
> > > >>>>      
> > > >>>
> > > >>> Yeah, confusing indeed.  The problem today is that we don't
> > > >>> undo the pfn_array_table_init() call that happened prior to
> > > >>> this error, and so there would be a leak of the pat->pat_pa
> > > >>> memory.  So going to out_init is certainly not right.
> > > >>>
> > > >>> In the pfn_array patches later, I do conditionally call
> > > >>> vfio_unpin_pages() based on the contents of pfn_array->pa_iova,
> > > >>> to avoid the scenario you describe.
> > > >>>
> > > >>>    - Eric
> > > >>>      
> > > >>
> > > >> Quite confusing and generally awful. I will try to figure out the
> > > >> before-after on a series level, and then consider the individual
> > > >> patches in detail. The in between states are predestined to look
> > > >> awful because of the current state.    
> > 
> > It's probably better to consider the first two, bug-fixing patches
> > separately (the complete series is probably not 4.20 material.)  
> 
> I second that! From what I've seen up until now, it will take time to
> properly review the complete series. The two bugfixes in turn are easy
> to understand.

So we'll go with this patch and defer any further rework to a v2 of the
complete series, ok?

I'll queue this patch, then; unless someone complains, I'll send a pull
request for vfio-ccw tomorrow.
Cornelia Huck Nov. 12, 2018, 1:25 p.m. UTC | #13
On Fri,  9 Nov 2018 03:39:28 +0100
Eric Farman <farman@linux.ibm.com> wrote:

> If pfn_array_alloc fails somehow, we need to release the pfn_array_table
> that was malloc'd earlier.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index fd77e46eb3b2..ef5ab45d94b3 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -528,7 +528,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
>  
>  	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, ccw->count);
>  	if (ret < 0)
> -		goto out_init;
> +		goto out_unpin;
>  
>  	/* Translate this direct ccw to a idal ccw. */
>  	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA | GFP_KERNEL);

Thanks, applied.
Eric Farman Nov. 12, 2018, 2:04 p.m. UTC | #14
On 11/12/2018 05:59 AM, Cornelia Huck wrote:
> On Mon, 12 Nov 2018 11:41:50 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 12/11/2018 11:32, Cornelia Huck wrote:
>>> On Mon, 12 Nov 2018 11:28:38 +0100
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>    
>>>> Hi Eric,
>>>>
>>>> I think the problem here comes from the pfn_array_table_unpin_free()
>>>> doing both unpin and free.
>>>>
>>>> I would prefer that you change the  pfn_array_table_init() to return the
>>>> pointer, so you can free the pointer in the caller like:
>>>>
>>>>
>>>>            p = pfn_array_table_init(pat, 1);
>>>>            if (!p)
>>>>                    goto out_init;
>>>>
>>>>            ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda,
>>>> ccw->count);
>>>>            if (ret < 0)
>>>>                    goto out_free;
>>>> ...
>>>>
>>>> out_unpin:
>>>>            pfn_array_table_unpin_free(pat, cp->mdev);
>>>> out_free:
>>>> 	pfn_array_table_free(p);
>>>> out_init:
>>>>            ccw->cda = 0;
>>>>            return ret;
>>>> }
>>>>
>>>>
>>>> And modify the pfn_array_table_unpin_free() to pfn_array_table_unpin()
>>>> and add the freeing in pfn_array_table_free(p).
>>>>
>>>>
>>>> Something like that with a correct return code handle.
>>>>
>>>> Which would make the code more logical and readable.
>>>>
>>>> What do you think?
>>>
>>> While I agree that this would improve the code, I'm not sure how much
>>> of it remains at the end of this series (I haven't read it completely
>>> yet.) IOW, is this a code change that would get kicked out again anyway?
>>>    
>>
>> At the end,pfn_array_alloc_pin() is decoupled in pfn_array_alloc() and
>> pfn_array_pin() which is a good thing but but the pfn_array_unpin_free()
>> survives as unpin + free.

And don't forget, pfn_array_table_unpin_free() goes away entirely.  I do 
add a non-zero iova in patch 8.

> 
> OK, this seems like a good idea to me, then :)
>
Eric Farman Nov. 12, 2018, 2:16 p.m. UTC | #15
On 11/12/2018 09:04 AM, Eric Farman wrote:
> 
> 
> On 11/12/2018 05:59 AM, Cornelia Huck wrote:
>> On Mon, 12 Nov 2018 11:41:50 +0100
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>
>>> On 12/11/2018 11:32, Cornelia Huck wrote:
>>>> On Mon, 12 Nov 2018 11:28:38 +0100
>>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>>> Hi Eric,
>>>>>
>>>>> I think the problem here comes from the pfn_array_table_unpin_free()
>>>>> doing both unpin and free.
>>>>>
>>>>> I would prefer that you change the  pfn_array_table_init() to 
>>>>> return the
>>>>> pointer, so you can free the pointer in the caller like:
>>>>>
>>>>>
>>>>>            p = pfn_array_table_init(pat, 1);
>>>>>            if (!p)
>>>>>                    goto out_init;
>>>>>
>>>>>            ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda,
>>>>> ccw->count);
>>>>>            if (ret < 0)
>>>>>                    goto out_free;
>>>>> ...
>>>>>
>>>>> out_unpin:
>>>>>            pfn_array_table_unpin_free(pat, cp->mdev);
>>>>> out_free:
>>>>>     pfn_array_table_free(p);
>>>>> out_init:
>>>>>            ccw->cda = 0;
>>>>>            return ret;
>>>>> }
>>>>>
>>>>>
>>>>> And modify the pfn_array_table_unpin_free() to pfn_array_table_unpin()
>>>>> and add the freeing in pfn_array_table_free(p).
>>>>>
>>>>>
>>>>> Something like that with a correct return code handle.
>>>>>
>>>>> Which would make the code more logical and readable.
>>>>>
>>>>> What do you think?
>>>>
>>>> While I agree that this would improve the code, I'm not sure how much
>>>> of it remains at the end of this series (I haven't read it completely
>>>> yet.) IOW, is this a code change that would get kicked out again 
>>>> anyway?
>>>
>>> At the end,pfn_array_alloc_pin() is decoupled in pfn_array_alloc() and
>>> pfn_array_pin() which is a good thing but but the pfn_array_unpin_free()
>>> survives as unpin + free.
> 
> And don't forget, pfn_array_table_unpin_free() goes away entirely.  I do 
> add a 

check for a

> non-zero iova in patch 8.
> 

Sorry.  Coffee hasn't kicked in yet.  :)

>>
>> OK, this seems like a good idea to me, then :)
>>

Patch
diff mbox series

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index fd77e46eb3b2..ef5ab45d94b3 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -528,7 +528,7 @@  static int ccwchain_fetch_direct(struct ccwchain *chain,
 
 	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, ccw->count);
 	if (ret < 0)
-		goto out_init;
+		goto out_unpin;
 
 	/* Translate this direct ccw to a idal ccw. */
 	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA | GFP_KERNEL);