diff mbox

[2/7] ide: account UNMAP (TRIM) operations

Message ID 1511196664-85304-3-git-send-email-anton.nefedov@virtuozzo.com (mailing list archive)
State New, archived
Headers show

Commit Message

Anton Nefedov Nov. 20, 2017, 4:50 p.m. UTC
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 hw/ide/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Alberto Garcia Dec. 5, 2017, 3:21 p.m. UTC | #1
On Mon 20 Nov 2017 05:50:59 PM CET, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  hw/ide/core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 471d0c9..2e4dea7 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -389,6 +389,7 @@ typedef struct TrimAIOCB {
>      QEMUIOVector *qiov;
>      BlockAIOCB *aiocb;
>      int i, j;
> +    BlockAcctCookie acct;
>  } TrimAIOCB;
>  
>  static void trim_aio_cancel(BlockAIOCB *acb)
> @@ -426,6 +427,14 @@ static void ide_trim_bh_cb(void *opaque)
>  static void ide_issue_trim_cb(void *opaque, int ret)
>  {
>      TrimAIOCB *iocb = opaque;
> +    if (iocb->i >= 0) {
> +        if (ret >= 0) {
> +            block_acct_done(blk_get_stats(iocb->blk), &iocb->acct);
> +        } else {
> +            block_acct_failed(blk_get_stats(iocb->blk), &iocb->acct);
> +        }
> +    }

This part looks fine, but don't you also need to account for invalid
requests (in ide_dma_cb() or somewhere else) ?

Berto
Anton Nefedov Dec. 5, 2017, 5:14 p.m. UTC | #2
On 5/12/2017 6:21 PM, Alberto Garcia wrote:
> On Mon 20 Nov 2017 05:50:59 PM CET, Anton Nefedov wrote:
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   hw/ide/core.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 471d0c9..2e4dea7 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -389,6 +389,7 @@ typedef struct TrimAIOCB {
>>       QEMUIOVector *qiov;
>>       BlockAIOCB *aiocb;
>>       int i, j;
>> +    BlockAcctCookie acct;
>>   } TrimAIOCB;
>>   
>>   static void trim_aio_cancel(BlockAIOCB *acb)
>> @@ -426,6 +427,14 @@ static void ide_trim_bh_cb(void *opaque)
>>   static void ide_issue_trim_cb(void *opaque, int ret)
>>   {
>>       TrimAIOCB *iocb = opaque;
>> +    if (iocb->i >= 0) {
>> +        if (ret >= 0) {
>> +            block_acct_done(blk_get_stats(iocb->blk), &iocb->acct);
>> +        } else {
>> +            block_acct_failed(blk_get_stats(iocb->blk), &iocb->acct);
>> +        }
>> +    }
> 
> This part looks fine, but don't you also need to account for invalid
> requests (in ide_dma_cb() or somewhere else) ?
> 
> Berto
> 

Good point; in fact, the TRIM sector range is never checked.
(well it should be, down at the block layer, and then counted as error).

The motivation was:

     commit d66168ed687325aa6d338ce3a3cff18ce3098ed6
     Author: Michael Tokarev <mjt@tls.msk.ru>
     Date:   Wed Aug 13 11:23:31 2014 +0400

     ide: only constrain read/write requests to drive size, not other types

     Commit 58ac321135a introduced a check to ide dma processing which
     constrains all requests to drive size.  However, apparently, some
     valid requests (like TRIM) does not fit in this constraint, and
     fails in 2.1.  So check the range only for reads and writes.


It seems like the removed check was at the wrong place (trim request has
to be parsed first to get offset and nbytes).
Probably it should be put to ide_issue_trim_cb() instead.

cc John (should have done it earlier)

/Anton
John Snow Dec. 6, 2017, 10:09 p.m. UTC | #3
On 12/05/2017 12:14 PM, Anton Nefedov wrote:
> 
> 
> On 5/12/2017 6:21 PM, Alberto Garcia wrote:
>> On Mon 20 Nov 2017 05:50:59 PM CET, Anton Nefedov wrote:
>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   hw/ide/core.c | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>> index 471d0c9..2e4dea7 100644
>>> --- a/hw/ide/core.c
>>> +++ b/hw/ide/core.c
>>> @@ -389,6 +389,7 @@ typedef struct TrimAIOCB {
>>>       QEMUIOVector *qiov;
>>>       BlockAIOCB *aiocb;
>>>       int i, j;
>>> +    BlockAcctCookie acct;
>>>   } TrimAIOCB;
>>>     static void trim_aio_cancel(BlockAIOCB *acb)
>>> @@ -426,6 +427,14 @@ static void ide_trim_bh_cb(void *opaque)
>>>   static void ide_issue_trim_cb(void *opaque, int ret)
>>>   {
>>>       TrimAIOCB *iocb = opaque;
>>> +    if (iocb->i >= 0) {
>>> +        if (ret >= 0) {
>>> +            block_acct_done(blk_get_stats(iocb->blk), &iocb->acct);
>>> +        } else {
>>> +            block_acct_failed(blk_get_stats(iocb->blk), &iocb->acct);
>>> +        }
>>> +    }
>>
>> This part looks fine, but don't you also need to account for invalid
>> requests (in ide_dma_cb() or somewhere else) ?
>>

not ide_dma_cb this time, because the command does not use ATA registers
as input, see below :(

>> Berto
>>
> 
> Good point; in fact, the TRIM sector range is never checked.
> (well it should be, down at the block layer, and then counted as error).
> 
> The motivation was:
> 
>     commit d66168ed687325aa6d338ce3a3cff18ce3098ed6
>     Author: Michael Tokarev <mjt@tls.msk.ru>
>     Date:   Wed Aug 13 11:23:31 2014 +0400
> 
>     ide: only constrain read/write requests to drive size, not other types
> 
>     Commit 58ac321135a introduced a check to ide dma processing which
>     constrains all requests to drive size.  However, apparently, some
>     valid requests (like TRIM) does not fit in this constraint, and
>     fails in 2.1.  So check the range only for reads and writes.
> 

I wound up at the same commit. The problem here is that the TRIM command
does not issue contiguous LBA+count requests in the same way using the
ATA registers like DMA R/W functions do, but instead works a bit more
like DMA commands in that it transmits a list of regions separately.

Kevin pointed this out in 2014:

https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02012.html

"I can't give you a clear answer, but it all depends on the value of
sector_num. This is the contents of the LBA registers and unused for
TRIM (spec says it's reserved, so we shouldn't be looking at it)."

He's right! This means the LBA/Count registers are undefined when we're
using TRIM in ide_dma_cb.

So, you have two things you can do:

(1) Modify ide_sector_start_dma to start accounting for TRIM early, and
then continue to mark it failed/complete where you do, but you'll need
to add in a new error case in ide_issue_trim_cb to check the range and
abort the job. We do not need to preprocess the entire "list" of TRIM
regions upfront as we are within our rights to process some of them
before aborting.

(2) Continue to start accounting where you do, per-region, which keeps
trim requests "per chunk", but add the error range checking almost
immediately after, if this is useful for statistical purposes. It will
look a little funny to start accounting and then immediately and
synchronously mark it invalid, but that's probably the most accurate thing.

I'm basing this off of the ATA8-AC3 spec, which defines the command
"DATA SET MANAGEMENT - 06h, DMA":

> If the Trim bit is set to one and:
> a) the device detects an invalid LBA Range Entry; or
> b) count is greater than IDENTIFY DEVICE data word 105 (see 7.16.7.55),
> then the device shall return command aborted.
> A device may trim one or more LBA Range Entries before it returns
command aborted. See table 209.

ATA8-ACS3 section  seems pretty clear to me that we *may* abort the
command if we detect an "invalid LBA Range Entry" which is defined in
3.1.40 as:

"3.1.40 Invalid LBA Range: A range of LBAs that contains one or more
invalid LBAs."

which references 3.1.39:

3.1.39 Invalid LBA: An LBA that is greater than or equal to the largest
value reported in IDENTIFY
DEVICE data words 60..61 (see 7.16.7.22), IDENTIFY DEVICE data words
100..103
(see 7.16.7.53), or IDENTIFY DEVICE data words 230..233 (see 7.16.7.88).

Of course, we only know if a range could possibly be invalid by the time
we actually process it in ide_issue_trim_cb.

> 
> It seems like the removed check was at the wrong place (trim request has
> to be parsed first to get offset and nbytes).
> Probably it should be put to ide_issue_trim_cb() instead.
> 
> cc John (should have done it earlier)
> 

Hi!

> /Anton
>
diff mbox

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 471d0c9..2e4dea7 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -389,6 +389,7 @@  typedef struct TrimAIOCB {
     QEMUIOVector *qiov;
     BlockAIOCB *aiocb;
     int i, j;
+    BlockAcctCookie acct;
 } TrimAIOCB;
 
 static void trim_aio_cancel(BlockAIOCB *acb)
@@ -426,6 +427,14 @@  static void ide_trim_bh_cb(void *opaque)
 static void ide_issue_trim_cb(void *opaque, int ret)
 {
     TrimAIOCB *iocb = opaque;
+    if (iocb->i >= 0) {
+        if (ret >= 0) {
+            block_acct_done(blk_get_stats(iocb->blk), &iocb->acct);
+        } else {
+            block_acct_failed(blk_get_stats(iocb->blk), &iocb->acct);
+        }
+    }
+
     if (ret >= 0) {
         while (iocb->j < iocb->qiov->niov) {
             int j = iocb->j;
@@ -442,6 +451,9 @@  static void ide_issue_trim_cb(void *opaque, int ret)
                     continue;
                 }
 
+                block_acct_start(blk_get_stats(iocb->blk), &iocb->acct,
+                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
+
                 /* Got an entry! Submit and exit.  */
                 iocb->aiocb = blk_aio_pdiscard(iocb->blk,
                                                sector << BDRV_SECTOR_BITS,