mbox series

[00/16] hw/block/nvme: zoned namespace command set

Message ID 20200924204516.1881843-1-its@irrelevant.dk (mailing list archive)
Headers show
Series hw/block/nvme: zoned namespace command set | expand

Message

Klaus Jensen Sept. 24, 2020, 8:45 p.m. UTC
From: Klaus Jensen <k.jensen@samsung.com>

While going through a few rounds of reviews on Dmitry's series I have
also continued nursing my own implementation since originally posted. I
did not receive any reviews originally, since it depended on a lot of
preceding series, but now, with the staging of multiple namespaces on
nvme-next yesterday, I think it deserves another shot since it now
applies directly. The series consists of a couple of trivial patches
followed by the "hw/block/nvme: add support for dulbe and block
utilization tracking", "hw/block/nvme: support namespace types" and the
set of zoned namespace support patches.

A couple of points on how this defers from Dmitry et. al.'s series and
why I think this implementation deserves a review.

  * Standard blockdev-based approach to persistent state. The
    implementation uses a plain blockdev associated with the nvme-ns
    device for storing state persistently. This same 'pstate' blockdev
    is also used for logical block allocation tracking.

  * Relies on automatic configuration of DLFEAT according to what the
    underlying blockdev provides (i.e. BDRV_O_UNMAP for guaranteeing
    zeroes on discarded blocks) for handling reads in the gaps between
    write pointer, ZCAP and ZSZE. Issues discards for zone resets. This
    removes the zero filling.

Finally, I wrote this. I am *NOT* saying that this somehow makes it
better, but as a maintainer, is a big deal to me since both series are
arguably a lot of code to maintain and support (both series are about
the same size). But - I am not the only maintainer, so if Keith (now
suddenly placed in the grim role as some sort of arbiter) signs off on
Dmitry's series, then so be it, I will rest my case.

I think we all want to see an implementation of zoned namespaces in QEMU
sooner rather than later, but I would lie if I said I wouldn't prefer
that it was this one.

Based-on: <20200922084533.1273962-1-its@irrelevant.dk>

Gollu Appalanaidu (1):
  hw/block/nvme: add commands supported and effects log page

Klaus Jensen (15):
  hw/block/nvme: add nsid to get/setfeat trace events
  hw/block/nvme: add trace event for requests with non-zero status code
  hw/block/nvme: make lba data size configurable
  hw/block/nvme: reject io commands if only admin command set selected
  hw/block/nvme: consolidate read, write and write zeroes
  hw/block/nvme: add support for dulbe and block utilization tracking
  hw/block/nvme: support namespace types
  hw/block/nvme: add basic read/write for zoned namespaces
  hw/block/nvme: add the zone management receive command
  hw/block/nvme: add the zone management send command
  hw/block/nvme: add the zone append command
  hw/block/nvme: track and enforce zone resources
  hw/block/nvme: allow open to close transitions by controller
  hw/block/nvme: support zone active excursions
  hw/block/nvme: support reset/finish recommended limits

 docs/specs/nvme.txt   |   49 +-
 hw/block/nvme-ns.h    |  166 +++-
 hw/block/nvme.h       |   24 +
 include/block/nvme.h  |  262 ++++++-
 block/nvme.c          |    4 +-
 hw/block/nvme-ns.c    |  411 +++++++++-
 hw/block/nvme.c       | 1727 +++++++++++++++++++++++++++++++++++++++--
 hw/block/trace-events |   50 +-
 8 files changed, 2580 insertions(+), 113 deletions(-)

Comments

no-reply@patchew.org Sept. 24, 2020, 10:43 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200924204516.1881843-1-its@irrelevant.dk/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200924204516.1881843-1-its@irrelevant.dk
Subject: [PATCH 00/16] hw/block/nvme: zoned namespace command set

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
a59b4bb hw/block/nvme: support reset/finish recommended limits
83d78bf hw/block/nvme: support zone active excursions
eb9852e hw/block/nvme: allow open to close transitions by controller
4018228 hw/block/nvme: track and enforce zone resources
3f934e8 hw/block/nvme: add the zone append command
6343d89 hw/block/nvme: add the zone management send command
a527eef hw/block/nvme: add the zone management receive command
fd032f9 hw/block/nvme: add basic read/write for zoned namespaces
aef6511 hw/block/nvme: support namespace types
ab4c119 hw/block/nvme: add commands supported and effects log page
3eb56a0 hw/block/nvme: add support for dulbe and block utilization tracking
b532fe0 hw/block/nvme: consolidate read, write and write zeroes
e992082 hw/block/nvme: reject io commands if only admin command set selected
8a19e08 hw/block/nvme: make lba data size configurable
3edfb11 hw/block/nvme: add trace event for requests with non-zero status code
8de0031 hw/block/nvme: add nsid to get/setfeat trace events

=== OUTPUT BEGIN ===
1/16 Checking commit 8de00318317d (hw/block/nvme: add nsid to get/setfeat trace events)
2/16 Checking commit 3edfb1110713 (hw/block/nvme: add trace event for requests with non-zero status code)
3/16 Checking commit 8a19e08afeb7 (hw/block/nvme: make lba data size configurable)
4/16 Checking commit e992082bb9bf (hw/block/nvme: reject io commands if only admin command set selected)
5/16 Checking commit b532fe07a157 (hw/block/nvme: consolidate read, write and write zeroes)
6/16 Checking commit 3eb56a0748fb (hw/block/nvme: add support for dulbe and block utilization tracking)
7/16 Checking commit ab4c119d9d68 (hw/block/nvme: add commands supported and effects log page)
ERROR: Macros with complex values should be enclosed in parenthesis
#46: FILE: hw/block/nvme.c:131:
+#define NVME_EFFECTS_NVM_INITIALIZER                   \
+    [NVME_CMD_FLUSH]            = NVME_EFFECTS_CSUPP | \
+                                  NVME_EFFECTS_LBCC,   \
+    [NVME_CMD_WRITE]            = NVME_EFFECTS_CSUPP | \
+                                  NVME_EFFECTS_LBCC,   \
+    [NVME_CMD_READ]             = NVME_EFFECTS_CSUPP,  \
+    [NVME_CMD_WRITE_ZEROES]     = NVME_EFFECTS_CSUPP | \
+                                  NVME_EFFECTS_LBCC

total: 1 errors, 0 warnings, 149 lines checked

Patch 7/16 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

8/16 Checking commit aef6511be82b (hw/block/nvme: support namespace types)
9/16 Checking commit fd032f918b37 (hw/block/nvme: add basic read/write for zoned namespaces)
10/16 Checking commit a527eef9b9fe (hw/block/nvme: add the zone management receive command)
11/16 Checking commit 6343d89bf734 (hw/block/nvme: add the zone management send command)
WARNING: Block comments use a leading /* on a separate line
#66: FILE: hw/block/nvme.c:1118:
+    return __nvme_allocate(ns, slba, nlb, false /* deallocate */);

WARNING: Block comments use a leading /* on a separate line
#77: FILE: hw/block/nvme.c:1129:
+    return __nvme_allocate(ns, slba, nlb, true /* deallocate */);

total: 0 errors, 2 warnings, 704 lines checked

Patch 11/16 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
12/16 Checking commit 3f934e89564a (hw/block/nvme: add the zone append command)
13/16 Checking commit 40182287d15e (hw/block/nvme: track and enforce zone resources)
14/16 Checking commit eb9852ee9c0f (hw/block/nvme: allow open to close transitions by controller)
15/16 Checking commit 83d78bf53392 (hw/block/nvme: support zone active excursions)
16/16 Checking commit a59b4bb2c855 (hw/block/nvme: support reset/finish recommended limits)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200924204516.1881843-1-its@irrelevant.dk/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Keith Busch Sept. 25, 2020, 1:24 a.m. UTC | #2
On Thu, Sep 24, 2020 at 10:45:00PM +0200, Klaus Jensen wrote:
> Finally, I wrote this. I am *NOT* saying that this somehow makes it
> better, but as a maintainer, is a big deal to me since both series are
> arguably a lot of code to maintain and support (both series are about
> the same size). But - I am not the only maintainer, so if Keith (now
> suddenly placed in the grim role as some sort of arbiter) signs off on
> Dmitry's series, then so be it, I will rest my case.

I think it's neat there's enough interest in ZNS that we have multiple
solutions to consider.

I'm still catching up from virtual conferencing, but I should be able to have a
look over the weekend. I know everyone's put a lot of work into the development
of this capability, so maybe there's something to be taken from both? Not sure
yet if that's feasible, but I'll have a better idea on that later.
Klaus Jensen Sept. 25, 2020, 5:27 a.m. UTC | #3
On Sep 24 19:24, Keith Busch wrote:
> On Thu, Sep 24, 2020 at 10:45:00PM +0200, Klaus Jensen wrote:
> > Finally, I wrote this. I am *NOT* saying that this somehow makes it
> > better, but as a maintainer, is a big deal to me since both series are
> > arguably a lot of code to maintain and support (both series are about
> > the same size). But - I am not the only maintainer, so if Keith (now
> > suddenly placed in the grim role as some sort of arbiter) signs off on
> > Dmitry's series, then so be it, I will rest my case.
> 
> I think it's neat there's enough interest in ZNS that we have multiple
> solutions to consider.
> 

Yes - it is a luxury problem :)

> I'm still catching up from virtual conferencing, but I should be able to have a
> look over the weekend. I know everyone's put a lot of work into the development
> of this capability, so maybe there's something to be taken from both? Not sure
> yet if that's feasible, but I'll have a better idea on that later.

Thanks Keith, sounds good.
Klaus Jensen Sept. 25, 2020, 7:55 a.m. UTC | #4
On Sep 24 15:43, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200924204516.1881843-1-its@irrelevant.dk/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 20200924204516.1881843-1-its@irrelevant.dk
> Subject: [PATCH 00/16] hw/block/nvme: zoned namespace command set
> 
> 7/16 Checking commit ab4c119d9d68 (hw/block/nvme: add commands supported and effects log page)
> ERROR: Macros with complex values should be enclosed in parenthesis
> #46: FILE: hw/block/nvme.c:131:
> +#define NVME_EFFECTS_NVM_INITIALIZER                   \
> +    [NVME_CMD_FLUSH]            = NVME_EFFECTS_CSUPP | \
> +                                  NVME_EFFECTS_LBCC,   \
> +    [NVME_CMD_WRITE]            = NVME_EFFECTS_CSUPP | \
> +                                  NVME_EFFECTS_LBCC,   \
> +    [NVME_CMD_READ]             = NVME_EFFECTS_CSUPP,  \
> +    [NVME_CMD_WRITE_ZEROES]     = NVME_EFFECTS_CSUPP | \
> +                                  NVME_EFFECTS_LBCC
> 

Pffft. If anyone has a better idea how to make this nice and not too
repetitive in the code, please let me know ;)

> 11/16 Checking commit 6343d89bf734 (hw/block/nvme: add the zone management send command)
> WARNING: Block comments use a leading /* on a separate line
> #66: FILE: hw/block/nvme.c:1118:
> +    return __nvme_allocate(ns, slba, nlb, false /* deallocate */);
> 
> WARNING: Block comments use a leading /* on a separate line
> #77: FILE: hw/block/nvme.c:1129:
> +    return __nvme_allocate(ns, slba, nlb, true /* deallocate */);
> 
> total: 0 errors, 2 warnings, 704 lines checked
> 
> Patch 11/16 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.

I seem to remember that this has been up before, but doesnt look like a
fix for that has gone into checkpatch.pl.
Dmitry Fomichev Sept. 25, 2020, 5:06 p.m. UTC | #5
> -----Original Message-----
> From: Qemu-block <qemu-block-
> bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Klaus
> Jensen
> Sent: Thursday, September 24, 2020 4:45 PM
> To: qemu-devel@nongnu.org
> Cc: Fam Zheng <fam@euphon.net>; Kevin Wolf <kwolf@redhat.com>; qemu-
> block@nongnu.org; Klaus Jensen <k.jensen@samsung.com>; Max Reitz
> <mreitz@redhat.com>; Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <its@irrelevant.dk>
> Subject: [PATCH 00/16] hw/block/nvme: zoned namespace command set
> 
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> 
> While going through a few rounds of reviews on Dmitry's series I have
> 
> also continued nursing my own implementation since originally posted. I
> 
> did not receive any reviews originally, since it depended on a lot of
> 
> preceding series, but now, with the staging of multiple namespaces on
> 
> nvme-next yesterday, I think it deserves another shot since it now
> 
> applies directly. The series consists of a couple of trivial patches
> 
> followed by the "hw/block/nvme: add support for dulbe and block
> 
> utilization tracking", "hw/block/nvme: support namespace types" and the
> 
> set of zoned namespace support patches.
> 
> 
> 
> A couple of points on how this defers from Dmitry et. al.'s series and
> 
> why I think this implementation deserves a review.
> 
> 
> 
>   * Standard blockdev-based approach to persistent state. The
> 
>     implementation uses a plain blockdev associated with the nvme-ns
> 
>     device for storing state persistently. This same 'pstate' blockdev
> 
>     is also used for logical block allocation tracking.
> 

Is persistent state mandatory or optional? Sorry for asking, but I am
still catching up with your other patches. I think having it optional is
a big benefit for performance testing.

> 
> 
>   * Relies on automatic configuration of DLFEAT according to what the
> 
>     underlying blockdev provides (i.e. BDRV_O_UNMAP for guaranteeing
> 
>     zeroes on discarded blocks) for handling reads in the gaps between
> 
>     write pointer, ZCAP and ZSZE. Issues discards for zone resets. This
> 
>     removes the zero filling.
> 

Doesn't this make non-zero fill patterns impossible? In many storage
environments, vendors and admins are adamant about having varying
fill patterns to see who caused the data corruption if there is one.
Not sure how important this for the particular application, but WDC
series provides the functionality to specify the fill pattern.

> 
> 
> Finally, I wrote this. I am *NOT* saying that this somehow makes it
> 
> better, but as a maintainer, is a big deal to me since both series are
> 
> arguably a lot of code to maintain and support (both series are about
> 
> the same size). But - I am not the only maintainer, so if Keith (now
> 
> suddenly placed in the grim role as some sort of arbiter) signs off on
> 
> Dmitry's series, then so be it, I will rest my case.
> 
> 
> 
> I think we all want to see an implementation of zoned namespaces in QEMU
> 
> sooner rather than later, but I would lie if I said I wouldn't prefer
> 
> that it was this one.
> 
> 
> 
> Based-on: <20200922084533.1273962-1-its@irrelevant.dk>
> 
> 
> 
> Gollu Appalanaidu (1):
> 
>   hw/block/nvme: add commands supported and effects log page
> 
> 
> 
> Klaus Jensen (15):
> 
>   hw/block/nvme: add nsid to get/setfeat trace events
> 
>   hw/block/nvme: add trace event for requests with non-zero status code
> 
>   hw/block/nvme: make lba data size configurable
> 
>   hw/block/nvme: reject io commands if only admin command set selected
> 
>   hw/block/nvme: consolidate read, write and write zeroes
> 
>   hw/block/nvme: add support for dulbe and block utilization tracking
> 
>   hw/block/nvme: support namespace types
> 
>   hw/block/nvme: add basic read/write for zoned namespaces
> 
>   hw/block/nvme: add the zone management receive command
> 
>   hw/block/nvme: add the zone management send command
> 
>   hw/block/nvme: add the zone append command
> 
>   hw/block/nvme: track and enforce zone resources
> 
>   hw/block/nvme: allow open to close transitions by controller
> 
>   hw/block/nvme: support zone active excursions
> 
>   hw/block/nvme: support reset/finish recommended limits
> 
> 
> 
>  docs/specs/nvme.txt   |   49 +-
> 
>  hw/block/nvme-ns.h    |  166 +++-
> 
>  hw/block/nvme.h       |   24 +
> 
>  include/block/nvme.h  |  262 ++++++-
> 
>  block/nvme.c          |    4 +-
> 
>  hw/block/nvme-ns.c    |  411 +++++++++-
> 
>  hw/block/nvme.c       | 1727 +++++++++++++++++++++++++++++++++++++++-
> -
> 
>  hw/block/trace-events |   50 +-
> 
>  8 files changed, 2580 insertions(+), 113 deletions(-)
> 
> 
> 
> --
> 
> 2.28.0
> 
> 
>
Klaus Jensen Sept. 25, 2020, 5:27 p.m. UTC | #6
On Sep 25 17:06, Dmitry Fomichev wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> >   * Standard blockdev-based approach to persistent state. The
> > 
> >     implementation uses a plain blockdev associated with the nvme-ns
> > 
> >     device for storing state persistently. This same 'pstate' blockdev
> > 
> >     is also used for logical block allocation tracking.
> > 
> 
> Is persistent state mandatory or optional? Sorry for asking, but I am
> still catching up with your other patches. I think having it optional is
> a big benefit for performance testing.
> 

Yes, the 'pstate' blockdev is optional.

> > 
> > 
> >   * Relies on automatic configuration of DLFEAT according to what the
> > 
> >     underlying blockdev provides (i.e. BDRV_O_UNMAP for guaranteeing
> > 
> >     zeroes on discarded blocks) for handling reads in the gaps between
> > 
> >     write pointer, ZCAP and ZSZE. Issues discards for zone resets. This
> > 
> >     removes the zero filling.
> > 
> 
> Doesn't this make non-zero fill patterns impossible? In many storage
> environments, vendors and admins are adamant about having varying
> fill patterns to see who caused the data corruption if there is one.
> Not sure how important this for the particular application, but WDC
> series provides the functionality to specify the fill pattern.
> 

That *is* a good point.

By default I think it should default to either the 0x00 fill pattern (if
supported by the underlying blockdev), or "no fill pattern reported"
when 0x00's cannot be guaranteed. But, an option to enable filling with,
say 0xff, like you do in your series, would be nice.