diff mbox series

[v4,3/8] ide: account UNMAP (TRIM) operations

Message ID 1534844779-118784-4-git-send-email-anton.nefedov@virtuozzo.com (mailing list archive)
State New, archived
Headers show
Series discard blockstats | expand

Commit Message

Anton Nefedov Aug. 21, 2018, 9:46 a.m. UTC
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 hw/ide/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Kevin Wolf Oct. 4, 2018, 3:33 p.m. UTC | #1
Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  hw/ide/core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 2c62efc..352429b 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>      TrimAIOCB *iocb = opaque;
>      IDEState *s = iocb->s;
>  
> +    if (iocb->i >= 0) {
> +        if (ret >= 0) {
> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> +        } else {
> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> +        }
> +    }
> +
>      if (ret >= 0) {
>          while (iocb->j < iocb->qiov->niov) {
>              int j = iocb->j;
> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>                      goto done;
>                  }
>  
> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> +
>                  /* Got an entry! Submit and exit.  */
>                  iocb->aiocb = blk_aio_pdiscard(s->blk,
>                                                 sector << BDRV_SECTOR_BITS,
> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      }
>  
>      if (ret == -EINVAL) {
> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);

This looks wrong to me, ide_dma_cb() is not only called for unmap, but
also for reads and writes, and each of them could return -EINVAL.

Also, -EINVAL doesn't necessarily mean that the guest driver did
something wrong, it could also be the result of a host problem.
Therefore, it isn't right to call block_acct_invalid() here - especially
since the request may already have been accounted for as either done or
failed in ide_issue_trim_cb().

Instead, I think it would be better to immediately account for invalid
requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
know for sure that indeed !ide_sect_range_ok() is the cause for the
-EINVAL return code.

>          ide_dma_error(s);
>          return;
>      }

Kevin
Anton Nefedov Oct. 8, 2018, 2:38 p.m. UTC | #2
On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   hw/ide/core.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 2c62efc..352429b 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>       TrimAIOCB *iocb = opaque;
>>       IDEState *s = iocb->s;
>>   
>> +    if (iocb->i >= 0) {
>> +        if (ret >= 0) {
>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>> +        } else {
>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>> +        }
>> +    }
>> +
>>       if (ret >= 0) {
>>           while (iocb->j < iocb->qiov->niov) {
>>               int j = iocb->j;
>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>                       goto done;
>>                   }
>>   
>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>> +
>>                   /* Got an entry! Submit and exit.  */
>>                   iocb->aiocb = blk_aio_pdiscard(s->blk,
>>                                                  sector << BDRV_SECTOR_BITS,
>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>       }
>>   
>>       if (ret == -EINVAL) {
>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> 
> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> also for reads and writes, and each of them could return -EINVAL.
> 

Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(

> Also, -EINVAL doesn't necessarily mean that the guest driver did
> something wrong, it could also be the result of a host problem.
> Therefore, it isn't right to call block_acct_invalid() here - especially
> since the request may already have been accounted for as either done or
> failed in ide_issue_trim_cb().
> 

Couldn't be accounted done with such retcode;
and it seems I shouldnt do block_acct_failed() there anyway - or it's
accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()

But if EINVAL (from further layers) should not be accounted as an
invalid op, then it should be accounted failed instead, the thing that
current code does not do.
(and which brings us back to possible double-accounting if we account
invalid in ide_issue_trim_cb() )

> Instead, I think it would be better to immediately account for invalid
> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> know for sure that indeed !ide_sect_range_ok() is the cause for the
> -EINVAL return code.
> 
So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
acct_failed there, and filter off TRIM commands in the common
accounting.

/Anton
Kevin Wolf Oct. 8, 2018, 3:03 p.m. UTC | #3
Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> > Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> >> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >> Reviewed-by: Alberto Garcia <berto@igalia.com>
> >> ---
> >>   hw/ide/core.c | 12 ++++++++++++
> >>   1 file changed, 12 insertions(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 2c62efc..352429b 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>       TrimAIOCB *iocb = opaque;
> >>       IDEState *s = iocb->s;
> >>   
> >> +    if (iocb->i >= 0) {
> >> +        if (ret >= 0) {
> >> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> >> +        } else {
> >> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> >> +        }
> >> +    }
> >> +
> >>       if (ret >= 0) {
> >>           while (iocb->j < iocb->qiov->niov) {
> >>               int j = iocb->j;
> >> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>                       goto done;
> >>                   }
> >>   
> >> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> >> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> >> +
> >>                   /* Got an entry! Submit and exit.  */
> >>                   iocb->aiocb = blk_aio_pdiscard(s->blk,
> >>                                                  sector << BDRV_SECTOR_BITS,
> >> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>       }
> >>   
> >>       if (ret == -EINVAL) {
> >> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> > 
> > This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> > also for reads and writes, and each of them could return -EINVAL.
> > 
> 
> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
> 
> > Also, -EINVAL doesn't necessarily mean that the guest driver did
> > something wrong, it could also be the result of a host problem.
> > Therefore, it isn't right to call block_acct_invalid() here - especially
> > since the request may already have been accounted for as either done or
> > failed in ide_issue_trim_cb().
> > 
> 
> Couldn't be accounted done with such retcode;
> and it seems I shouldnt do block_acct_failed() there anyway - or it's
> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
> 
> But if EINVAL (from further layers) should not be accounted as an
> invalid op, then it should be accounted failed instead, the thing that
> current code does not do.
> (and which brings us back to possible double-accounting if we account
> invalid in ide_issue_trim_cb() )

Yes, commit caeadbc8ba4 was already wrong in assuming that there is
only one possible source for -EINVAL.

> > Instead, I think it would be better to immediately account for invalid
> > requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> > know for sure that indeed !ide_sect_range_ok() is the cause for the
> > -EINVAL return code.
> > 
> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
> acct_failed there, and filter off TRIM commands in the common
> accounting.

blk_aio_discard() can fail with -EINVAL, too, so getting this error code
from a TRIM command doesn't mean anything. It can still have multiple
possible sources.

Maybe we just need to remember somewhere whether we already accounted
for a request (maybe an additional field in BlockAcctCookie? Or change
the type to BLOCK_ACCT_ALREADY_ACCOUNTED?) and then make an additional
block_account_one_io() call a no-op for such requests.

Kevin
Anton Nefedov Oct. 8, 2018, 3:25 p.m. UTC | #4
On 8/10/2018 6:03 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>> ---
>>>>    hw/ide/core.c | 12 ++++++++++++
>>>>    1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>> index 2c62efc..352429b 100644
>>>> --- a/hw/ide/core.c
>>>> +++ b/hw/ide/core.c
>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>        TrimAIOCB *iocb = opaque;
>>>>        IDEState *s = iocb->s;
>>>>    
>>>> +    if (iocb->i >= 0) {
>>>> +        if (ret >= 0) {
>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> +        } else {
>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>> +        }
>>>> +    }
>>>> +
>>>>        if (ret >= 0) {
>>>>            while (iocb->j < iocb->qiov->niov) {
>>>>                int j = iocb->j;
>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>                        goto done;
>>>>                    }
>>>>    
>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>>>> +
>>>>                    /* Got an entry! Submit and exit.  */
>>>>                    iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>                                                   sector << BDRV_SECTOR_BITS,
>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>        }
>>>>    
>>>>        if (ret == -EINVAL) {
>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>
>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>> also for reads and writes, and each of them could return -EINVAL.
>>>
>>
>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>
>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>> something wrong, it could also be the result of a host problem.
>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>> since the request may already have been accounted for as either done or
>>> failed in ide_issue_trim_cb().
>>>
>>
>> Couldn't be accounted done with such retcode;
>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>
>> But if EINVAL (from further layers) should not be accounted as an
>> invalid op, then it should be accounted failed instead, the thing that
>> current code does not do.
>> (and which brings us back to possible double-accounting if we account
>> invalid in ide_issue_trim_cb() )
> 
> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
> only one possible source for -EINVAL.
> 
>>> Instead, I think it would be better to immediately account for invalid
>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>> -EINVAL return code.
>>>
>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>> acct_failed there, and filter off TRIM commands in the common
>> accounting.
> 
> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
> from a TRIM command doesn't mean anything. It can still have multiple
> possible sources.
> 

I meant that common ide_dma_cb() should account EINVAL (along with other
errors) as failed, unless it's TRIM, which means it's already
accounted (either invalid or failed)

> Maybe we just need to remember somewhere whether we already accounted
> for a request (maybe an additional field in BlockAcctCookie? Or change
> the type to BLOCK_ACCT_ALREADY_ACCOUNTED?) and then make an additional
> block_account_one_io() call a no-op for such requests.
>  > Kevin
> 

Maybe even resetting to BLOCK_ACCT_NONE == 0. It should also protect
from accounting uninitialized cookie.

/Anton
Kevin Wolf Oct. 8, 2018, 3:46 p.m. UTC | #5
Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
> 
> 
> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
> > Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
> >> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> >>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> >>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
> >>>> ---
> >>>>    hw/ide/core.c | 12 ++++++++++++
> >>>>    1 file changed, 12 insertions(+)
> >>>>
> >>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >>>> index 2c62efc..352429b 100644
> >>>> --- a/hw/ide/core.c
> >>>> +++ b/hw/ide/core.c
> >>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>        TrimAIOCB *iocb = opaque;
> >>>>        IDEState *s = iocb->s;
> >>>>    
> >>>> +    if (iocb->i >= 0) {
> >>>> +        if (ret >= 0) {
> >>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>>> +        } else {
> >>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> >>>> +        }
> >>>> +    }
> >>>> +
> >>>>        if (ret >= 0) {
> >>>>            while (iocb->j < iocb->qiov->niov) {
> >>>>                int j = iocb->j;
> >>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>                        goto done;
> >>>>                    }
> >>>>    
> >>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> >>>> +
> >>>>                    /* Got an entry! Submit and exit.  */
> >>>>                    iocb->aiocb = blk_aio_pdiscard(s->blk,
> >>>>                                                   sector << BDRV_SECTOR_BITS,
> >>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>>>        }
> >>>>    
> >>>>        if (ret == -EINVAL) {
> >>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> >>>
> >>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> >>> also for reads and writes, and each of them could return -EINVAL.
> >>>
> >>
> >> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
> >>
> >>> Also, -EINVAL doesn't necessarily mean that the guest driver did
> >>> something wrong, it could also be the result of a host problem.
> >>> Therefore, it isn't right to call block_acct_invalid() here - especially
> >>> since the request may already have been accounted for as either done or
> >>> failed in ide_issue_trim_cb().
> >>>
> >>
> >> Couldn't be accounted done with such retcode;
> >> and it seems I shouldnt do block_acct_failed() there anyway - or it's
> >> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
> >>
> >> But if EINVAL (from further layers) should not be accounted as an
> >> invalid op, then it should be accounted failed instead, the thing that
> >> current code does not do.
> >> (and which brings us back to possible double-accounting if we account
> >> invalid in ide_issue_trim_cb() )
> > 
> > Yes, commit caeadbc8ba4 was already wrong in assuming that there is
> > only one possible source for -EINVAL.
> > 
> >>> Instead, I think it would be better to immediately account for invalid
> >>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> >>> know for sure that indeed !ide_sect_range_ok() is the cause for the
> >>> -EINVAL return code.
> >>>
> >> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
> >> acct_failed there, and filter off TRIM commands in the common
> >> accounting.
> > 
> > blk_aio_discard() can fail with -EINVAL, too, so getting this error code
> > from a TRIM command doesn't mean anything. It can still have multiple
> > possible sources.
> > 
> 
> I meant that common ide_dma_cb() should account EINVAL (along with other
> errors) as failed, unless it's TRIM, which means it's already
> accounted (either invalid or failed)

Oh, you would already account for failure in ide_issue_trim_cb(), too,
but only if it's EINVAL? That feels like it would complicate the code
quite a bit.

And actually, didn't commit caeadbc8ba4 break werror=stop for requests
returning -EINVAL because we don't call ide_handle_rw_error() any more?

> > Maybe we just need to remember somewhere whether we already accounted
> > for a request (maybe an additional field in BlockAcctCookie? Or change
> > the type to BLOCK_ACCT_ALREADY_ACCOUNTED?) and then make an additional
> > block_account_one_io() call a no-op for such requests.
> 
> Maybe even resetting to BLOCK_ACCT_NONE == 0. It should also protect
> from accounting uninitialized cookie.

That sounds good to me.

Kevin
Anton Nefedov Oct. 8, 2018, 4:04 p.m. UTC | #6
On 8/10/2018 6:46 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
>>
>>
>> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
>>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>>>> ---
>>>>>>     hw/ide/core.c | 12 ++++++++++++
>>>>>>     1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>>>> index 2c62efc..352429b 100644
>>>>>> --- a/hw/ide/core.c
>>>>>> +++ b/hw/ide/core.c
>>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>         TrimAIOCB *iocb = opaque;
>>>>>>         IDEState *s = iocb->s;
>>>>>>     
>>>>>> +    if (iocb->i >= 0) {
>>>>>> +        if (ret >= 0) {
>>>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> +        } else {
>>>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>         if (ret >= 0) {
>>>>>>             while (iocb->j < iocb->qiov->niov) {
>>>>>>                 int j = iocb->j;
>>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>                         goto done;
>>>>>>                     }
>>>>>>     
>>>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>>>>>> +
>>>>>>                     /* Got an entry! Submit and exit.  */
>>>>>>                     iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>>>                                                    sector << BDRV_SECTOR_BITS,
>>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>>>         }
>>>>>>     
>>>>>>         if (ret == -EINVAL) {
>>>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>>>
>>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>>>> also for reads and writes, and each of them could return -EINVAL.
>>>>>
>>>>
>>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>>>
>>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>>>> something wrong, it could also be the result of a host problem.
>>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>>>> since the request may already have been accounted for as either done or
>>>>> failed in ide_issue_trim_cb().
>>>>>
>>>>
>>>> Couldn't be accounted done with such retcode;
>>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>>>
>>>> But if EINVAL (from further layers) should not be accounted as an
>>>> invalid op, then it should be accounted failed instead, the thing that
>>>> current code does not do.
>>>> (and which brings us back to possible double-accounting if we account
>>>> invalid in ide_issue_trim_cb() )
>>>
>>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
>>> only one possible source for -EINVAL.
>>>
>>>>> Instead, I think it would be better to immediately account for invalid
>>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>>>> -EINVAL return code.
>>>>>
>>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>>>> acct_failed there, and filter off TRIM commands in the common
>>>> accounting.
>>>
>>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
>>> from a TRIM command doesn't mean anything. It can still have multiple
>>> possible sources.
>>>
>>
>> I meant that common ide_dma_cb() should account EINVAL (along with other
>> errors) as failed, unless it's TRIM, which means it's already
>> accounted (either invalid or failed)
> 
> Oh, you would already account for failure in ide_issue_trim_cb(), too,
> but only if it's EINVAL? That feels like it would complicate the code
> quite a bit.
> 

No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
for TRIM.
Then common path (ide_dma_cb()) does not account TRIM operations at all
regardless of the error code. No need to check for TRIM specifically if
we have BLOCK_ACCT_NONE.

> And actually, didn't commit caeadbc8ba4 break werror=stop for requests
> returning -EINVAL because we don't call ide_handle_rw_error() any more?
> 

Yes.

Read/write ignore werror=stop for invalid range case (not for EINVAL).
I wonder if it's crucial to ignore it for TRIM too, otherwise we could
just remove this chunk

      if (ret == -EINVAL) {
          ide_dma_error(s);
          return;
      }
Kevin Wolf Oct. 8, 2018, 4:43 p.m. UTC | #7
Am 08.10.2018 um 18:04 hat Anton Nefedov geschrieben:
> 
> 
> On 8/10/2018 6:46 PM, Kevin Wolf wrote:
> > Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
> >>
> >>
> >> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
> >>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
> >>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> >>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> >>>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
> >>>>>> ---
> >>>>>>     hw/ide/core.c | 12 ++++++++++++
> >>>>>>     1 file changed, 12 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >>>>>> index 2c62efc..352429b 100644
> >>>>>> --- a/hw/ide/core.c
> >>>>>> +++ b/hw/ide/core.c
> >>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>>>         TrimAIOCB *iocb = opaque;
> >>>>>>         IDEState *s = iocb->s;
> >>>>>>     
> >>>>>> +    if (iocb->i >= 0) {
> >>>>>> +        if (ret >= 0) {
> >>>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>>>>> +        } else {
> >>>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> >>>>>> +        }
> >>>>>> +    }
> >>>>>> +
> >>>>>>         if (ret >= 0) {
> >>>>>>             while (iocb->j < iocb->qiov->niov) {
> >>>>>>                 int j = iocb->j;
> >>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>>>                         goto done;
> >>>>>>                     }
> >>>>>>     
> >>>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> >>>>>> +
> >>>>>>                     /* Got an entry! Submit and exit.  */
> >>>>>>                     iocb->aiocb = blk_aio_pdiscard(s->blk,
> >>>>>>                                                    sector << BDRV_SECTOR_BITS,
> >>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>>>>>         }
> >>>>>>     
> >>>>>>         if (ret == -EINVAL) {
> >>>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> >>>>>
> >>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> >>>>> also for reads and writes, and each of them could return -EINVAL.
> >>>>>
> >>>>
> >>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
> >>>>
> >>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
> >>>>> something wrong, it could also be the result of a host problem.
> >>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
> >>>>> since the request may already have been accounted for as either done or
> >>>>> failed in ide_issue_trim_cb().
> >>>>>
> >>>>
> >>>> Couldn't be accounted done with such retcode;
> >>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
> >>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
> >>>>
> >>>> But if EINVAL (from further layers) should not be accounted as an
> >>>> invalid op, then it should be accounted failed instead, the thing that
> >>>> current code does not do.
> >>>> (and which brings us back to possible double-accounting if we account
> >>>> invalid in ide_issue_trim_cb() )
> >>>
> >>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
> >>> only one possible source for -EINVAL.
> >>>
> >>>>> Instead, I think it would be better to immediately account for invalid
> >>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> >>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
> >>>>> -EINVAL return code.
> >>>>>
> >>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
> >>>> acct_failed there, and filter off TRIM commands in the common
> >>>> accounting.
> >>>
> >>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
> >>> from a TRIM command doesn't mean anything. It can still have multiple
> >>> possible sources.
> >>>
> >>
> >> I meant that common ide_dma_cb() should account EINVAL (along with other
> >> errors) as failed, unless it's TRIM, which means it's already
> >> accounted (either invalid or failed)
> > 
> > Oh, you would already account for failure in ide_issue_trim_cb(), too,
> > but only if it's EINVAL? That feels like it would complicate the code
> > quite a bit.
> > 
> 
> No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
> for TRIM.
> Then common path (ide_dma_cb()) does not account TRIM operations at all
> regardless of the error code. No need to check for TRIM specifically if
> we have BLOCK_ACCT_NONE.
> 
> > And actually, didn't commit caeadbc8ba4 break werror=stop for requests
> > returning -EINVAL because we don't call ide_handle_rw_error() any more?
> > 
> 
> Yes.
> 
> Read/write ignore werror=stop for invalid range case (not for EINVAL).
> I wonder if it's crucial to ignore it for TRIM too, otherwise we could
> just remove this chunk
> 
>       if (ret == -EINVAL) {
>           ide_dma_error(s);
>           return;
>       }

Ah, right, I forgot about this.

It is actually desirable to avoid stopping for invalid requests because
we should only stop for host errors. An invalid request is a guest error
and stopping the VM will do nothing to fix it. (Resuming the VM would
immediately fail again, so the VM would be locked in paused state.)

Kevin
Anton Nefedov Oct. 17, 2018, 3:32 p.m. UTC | #8
On 8/10/2018 7:43 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 18:04 hat Anton Nefedov geschrieben:
>>
>>
>> On 8/10/2018 6:46 PM, Kevin Wolf wrote:
>>> Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
>>>>
>>>>
>>>> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
>>>>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>>>>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>>>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>>>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>>>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>>>>>> ---
>>>>>>>>      hw/ide/core.c | 12 ++++++++++++
>>>>>>>>      1 file changed, 12 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>>>>>> index 2c62efc..352429b 100644
>>>>>>>> --- a/hw/ide/core.c
>>>>>>>> +++ b/hw/ide/core.c
>>>>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>>>          TrimAIOCB *iocb = opaque;
>>>>>>>>          IDEState *s = iocb->s;
>>>>>>>>      
>>>>>>>> +    if (iocb->i >= 0) {
>>>>>>>> +        if (ret >= 0) {
>>>>>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>>> +        } else {
>>>>>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          if (ret >= 0) {
>>>>>>>>              while (iocb->j < iocb->qiov->niov) {
>>>>>>>>                  int j = iocb->j;
>>>>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>>>                          goto done;
>>>>>>>>                      }
>>>>>>>>      
>>>>>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>>>>>>>> +
>>>>>>>>                      /* Got an entry! Submit and exit.  */
>>>>>>>>                      iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>>>>>                                                     sector << BDRV_SECTOR_BITS,
>>>>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>>>>>          }
>>>>>>>>      
>>>>>>>>          if (ret == -EINVAL) {
>>>>>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>>>>>
>>>>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>>>>>> also for reads and writes, and each of them could return -EINVAL.
>>>>>>>
>>>>>>
>>>>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>>>>>
>>>>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>>>>>> something wrong, it could also be the result of a host problem.
>>>>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>>>>>> since the request may already have been accounted for as either done or
>>>>>>> failed in ide_issue_trim_cb().
>>>>>>>
>>>>>>
>>>>>> Couldn't be accounted done with such retcode;
>>>>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>>>>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>>>>>
>>>>>> But if EINVAL (from further layers) should not be accounted as an
>>>>>> invalid op, then it should be accounted failed instead, the thing that
>>>>>> current code does not do.
>>>>>> (and which brings us back to possible double-accounting if we account
>>>>>> invalid in ide_issue_trim_cb() )
>>>>>
>>>>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
>>>>> only one possible source for -EINVAL.
>>>>>
>>>>>>> Instead, I think it would be better to immediately account for invalid
>>>>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>>>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>>>>>> -EINVAL return code.
>>>>>>>
>>>>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>>>>>> acct_failed there, and filter off TRIM commands in the common
>>>>>> accounting.
>>>>>
>>>>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
>>>>> from a TRIM command doesn't mean anything. It can still have multiple
>>>>> possible sources.
>>>>>
>>>>
>>>> I meant that common ide_dma_cb() should account EINVAL (along with other
>>>> errors) as failed, unless it's TRIM, which means it's already
>>>> accounted (either invalid or failed)
>>>
>>> Oh, you would already account for failure in ide_issue_trim_cb(), too,
>>> but only if it's EINVAL? That feels like it would complicate the code
>>> quite a bit.
>>>
>>
>> No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
>> for TRIM.
>> Then common path (ide_dma_cb()) does not account TRIM operations at all
>> regardless of the error code. No need to check for TRIM specifically if
>> we have BLOCK_ACCT_NONE.
>>
>>> And actually, didn't commit caeadbc8ba4 break werror=stop for requests
>>> returning -EINVAL because we don't call ide_handle_rw_error() any more?
>>>
>>
>> Yes.
>>
>> Read/write ignore werror=stop for invalid range case (not for EINVAL).
>> I wonder if it's crucial to ignore it for TRIM too, otherwise we could
>> just remove this chunk
>>
>>        if (ret == -EINVAL) {
>>            ide_dma_error(s);
>>            return;
>>        }
> 
> Ah, right, I forgot about this.
> 
> It is actually desirable to avoid stopping for invalid requests because
> we should only stop for host errors. An invalid request is a guest error
> and stopping the VM will do nothing to fix it. (Resuming the VM would
> immediately fail again, so the VM would be locked in paused state.)
> 
> Kevin
> 

I can't come up with a perfect solution of this.

It's hard to reach TRIM sector-ranges from common ide_dma_cb() (which
only has QEMUSGList) and it's hard to accurately report range invalidity
to ide_dma_cb() with a single retval.

I see the following options for invalid range trim:

   1. distinguish range invalidity in ide_dma_cb() with some unused
      errcode instead of -EINVAL
      - does one exist?
   2. use some new global flag on IDEState
      - every callback (ide_dma_cb, pmac_ide_transfer_cb) must check
        and clear that
   3. parse SGList and check range invalidity separately in ide_dma_cb()
      - somewhat too intrusive change

or put up with it:
   4. ignore (account as invalid but do not abort)
   5. treat as a host error i.e. call ide_handle_rw_error() and stop VM
      if werror=stop
   6. keep -EINVAL (as done now) and potentially ignore configured error
      action for the host returned -EINVAL

What do you think?

/Anton
diff mbox series

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2c62efc..352429b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -440,6 +440,14 @@  static void ide_issue_trim_cb(void *opaque, int ret)
     TrimAIOCB *iocb = opaque;
     IDEState *s = iocb->s;
 
+    if (iocb->i >= 0) {
+        if (ret >= 0) {
+            block_acct_done(blk_get_stats(s->blk), &s->acct);
+        } else {
+            block_acct_failed(blk_get_stats(s->blk), &s->acct);
+        }
+    }
+
     if (ret >= 0) {
         while (iocb->j < iocb->qiov->niov) {
             int j = iocb->j;
@@ -461,6 +469,9 @@  static void ide_issue_trim_cb(void *opaque, int ret)
                     goto done;
                 }
 
+                block_acct_start(blk_get_stats(s->blk), &s->acct,
+                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
+
                 /* Got an entry! Submit and exit.  */
                 iocb->aiocb = blk_aio_pdiscard(s->blk,
                                                sector << BDRV_SECTOR_BITS,
@@ -845,6 +856,7 @@  static void ide_dma_cb(void *opaque, int ret)
     }
 
     if (ret == -EINVAL) {
+        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
         ide_dma_error(s);
         return;
     }