diff mbox series

[v3] blkif: reconcile protocol specification with in-use implementations

Message ID 20240912095729.25927-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series [v3] blkif: reconcile protocol specification with in-use implementations | expand

Commit Message

Roger Pau Monné Sept. 12, 2024, 9:57 a.m. UTC
Current blkif implementations (both backends and frontends) have all slight
differences about how they handle the 'sector-size' xenstore node, and how
other fields are derived from this value or hardcoded to be expressed in units
of 512 bytes.

To give some context, this is an excerpt of how different implementations use
the value in 'sector-size' as the base unit for to other fields rather than
just to set the logical sector size of the block device:

                        │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
Linux blk{front,back}   │         512         │          512           │           512
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
QEMU blkback            │     sector-size     │      sector-size       │       sector-size
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
Windows blkfront        │     sector-size     │      sector-size       │       sector-size
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
MiniOS                  │     sector-size     │          512           │           512

An attempt was made by 67e1c050e36b in order to change the base units of the
request fields and the xenstore 'sectors' node.  That however only lead to more
confusion, as the specification now clearly diverged from the reference
implementation in Linux.  Such change was only implemented for QEMU Qdisk
and Windows PV blkfront.

Partially revert to the state before 67e1c050e36b while adjusting the
documentation for 'sectors' to match what it used to be previous to
2fa701e5346d:

 * Declare 'feature-large-sector-size' deprecated.  Frontends should not expose
   the node, backends should not make decisions based on its presence.

 * Clarify that 'sectors' xenstore node and the requests fields are always in
   512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.

All base units for the fields used in the protocol are 512-byte based, the
xenbus 'sector-size' field is only used to signal the logic block size.  When
'sector-size' is greater than 512, blkfront implementations must make sure that
the offsets and sizes (despite being expressed in 512-byte units) are aligned
to the logical block size specified in 'sector-size', otherwise the backend
will fail to process the requests.

This will require changes to some of the frontends and backends in order to
properly support 'sector-size' nodes greater than 512.

Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the blkif interface')
Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector based quantities')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
---
Changes since v2:
 - Reword a sentence and fix incorrect use of sector-size.

Changes since v1:
 - Update commit message.
 - Expand comments.
---
 xen/include/public/io/blkif.h | 52 ++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 13 deletions(-)

Comments

Jan Beulich Sept. 12, 2024, 12:06 p.m. UTC | #1
On 12.09.2024 11:57, Roger Pau Monne wrote:
> Current blkif implementations (both backends and frontends) have all slight
> differences about how they handle the 'sector-size' xenstore node, and how
> other fields are derived from this value or hardcoded to be expressed in units
> of 512 bytes.
> 
> To give some context, this is an excerpt of how different implementations use
> the value in 'sector-size' as the base unit for to other fields rather than
> just to set the logical sector size of the block device:
> 
>                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> Linux blk{front,back}   │         512         │          512           │           512
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> QEMU blkback            │     sector-size     │      sector-size       │       sector-size
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> Windows blkfront        │     sector-size     │      sector-size       │       sector-size
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> MiniOS                  │     sector-size     │          512           │           512
> 
> An attempt was made by 67e1c050e36b in order to change the base units of the
> request fields and the xenstore 'sectors' node.  That however only lead to more
> confusion, as the specification now clearly diverged from the reference
> implementation in Linux.  Such change was only implemented for QEMU Qdisk
> and Windows PV blkfront.
> 
> Partially revert to the state before 67e1c050e36b while adjusting the
> documentation for 'sectors' to match what it used to be previous to
> 2fa701e5346d:
> 
>  * Declare 'feature-large-sector-size' deprecated.  Frontends should not expose
>    the node, backends should not make decisions based on its presence.
> 
>  * Clarify that 'sectors' xenstore node and the requests fields are always in
>    512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
> 
> All base units for the fields used in the protocol are 512-byte based, the
> xenbus 'sector-size' field is only used to signal the logic block size.  When
> 'sector-size' is greater than 512, blkfront implementations must make sure that
> the offsets and sizes (despite being expressed in 512-byte units) are aligned
> to the logical block size specified in 'sector-size', otherwise the backend
> will fail to process the requests.
> 
> This will require changes to some of the frontends and backends in order to
> properly support 'sector-size' nodes greater than 512.
> 
> Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the blkif interface')
> Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector based quantities')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

The Fixes: tags generally suggest this wants backporting. I'm a little uncertain
here though, as it won't really affect anything that is built. Opinions?

Jan
Roger Pau Monné Sept. 12, 2024, 12:50 p.m. UTC | #2
On Thu, Sep 12, 2024 at 02:06:23PM +0200, Jan Beulich wrote:
> On 12.09.2024 11:57, Roger Pau Monne wrote:
> > Current blkif implementations (both backends and frontends) have all slight
> > differences about how they handle the 'sector-size' xenstore node, and how
> > other fields are derived from this value or hardcoded to be expressed in units
> > of 512 bytes.
> > 
> > To give some context, this is an excerpt of how different implementations use
> > the value in 'sector-size' as the base unit for to other fields rather than
> > just to set the logical sector size of the block device:
> > 
> >                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > Linux blk{front,back}   │         512         │          512           │           512
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > QEMU blkback            │     sector-size     │      sector-size       │       sector-size
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > Windows blkfront        │     sector-size     │      sector-size       │       sector-size
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > MiniOS                  │     sector-size     │          512           │           512
> > 
> > An attempt was made by 67e1c050e36b in order to change the base units of the
> > request fields and the xenstore 'sectors' node.  That however only lead to more
> > confusion, as the specification now clearly diverged from the reference
> > implementation in Linux.  Such change was only implemented for QEMU Qdisk
> > and Windows PV blkfront.
> > 
> > Partially revert to the state before 67e1c050e36b while adjusting the
> > documentation for 'sectors' to match what it used to be previous to
> > 2fa701e5346d:
> > 
> >  * Declare 'feature-large-sector-size' deprecated.  Frontends should not expose
> >    the node, backends should not make decisions based on its presence.
> > 
> >  * Clarify that 'sectors' xenstore node and the requests fields are always in
> >    512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
> > 
> > All base units for the fields used in the protocol are 512-byte based, the
> > xenbus 'sector-size' field is only used to signal the logic block size.  When
> > 'sector-size' is greater than 512, blkfront implementations must make sure that
> > the offsets and sizes (despite being expressed in 512-byte units) are aligned
> > to the logical block size specified in 'sector-size', otherwise the backend
> > will fail to process the requests.
> > 
> > This will require changes to some of the frontends and backends in order to
> > properly support 'sector-size' nodes greater than 512.
> > 
> > Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the blkif interface')
> > Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector based quantities')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > Reviewed-by: Juergen Gross <jgross@suse.com>
> > Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
> 
> The Fixes: tags generally suggest this wants backporting. I'm a little uncertain
> here though, as it won't really affect anything that is built. Opinions?

I would suggest to backport to open release branches, as people
working on protocol implementations might not use the headers from
unstable, but rather from the last release.

Thanks, Roger.
Andrew Cooper Sept. 12, 2024, 12:55 p.m. UTC | #3
On 12/09/2024 1:50 pm, Roger Pau Monné wrote:
> On Thu, Sep 12, 2024 at 02:06:23PM +0200, Jan Beulich wrote:
>> On 12.09.2024 11:57, Roger Pau Monne wrote:
>>> Current blkif implementations (both backends and frontends) have all slight
>>> differences about how they handle the 'sector-size' xenstore node, and how
>>> other fields are derived from this value or hardcoded to be expressed in units
>>> of 512 bytes.
>>>
>>> To give some context, this is an excerpt of how different implementations use
>>> the value in 'sector-size' as the base unit for to other fields rather than
>>> just to set the logical sector size of the block device:
>>>
>>>                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
>>> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
>>> FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
>>> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
>>> Linux blk{front,back}   │         512         │          512           │           512
>>> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
>>> QEMU blkback            │     sector-size     │      sector-size       │       sector-size
>>> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
>>> Windows blkfront        │     sector-size     │      sector-size       │       sector-size
>>> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
>>> MiniOS                  │     sector-size     │          512           │           512
>>>
>>> An attempt was made by 67e1c050e36b in order to change the base units of the
>>> request fields and the xenstore 'sectors' node.  That however only lead to more
>>> confusion, as the specification now clearly diverged from the reference
>>> implementation in Linux.  Such change was only implemented for QEMU Qdisk
>>> and Windows PV blkfront.
>>>
>>> Partially revert to the state before 67e1c050e36b while adjusting the
>>> documentation for 'sectors' to match what it used to be previous to
>>> 2fa701e5346d:
>>>
>>>  * Declare 'feature-large-sector-size' deprecated.  Frontends should not expose
>>>    the node, backends should not make decisions based on its presence.
>>>
>>>  * Clarify that 'sectors' xenstore node and the requests fields are always in
>>>    512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
>>>
>>> All base units for the fields used in the protocol are 512-byte based, the
>>> xenbus 'sector-size' field is only used to signal the logic block size.  When
>>> 'sector-size' is greater than 512, blkfront implementations must make sure that
>>> the offsets and sizes (despite being expressed in 512-byte units) are aligned
>>> to the logical block size specified in 'sector-size', otherwise the backend
>>> will fail to process the requests.
>>>
>>> This will require changes to some of the frontends and backends in order to
>>> properly support 'sector-size' nodes greater than 512.
>>>
>>> Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the blkif interface')
>>> Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector based quantities')
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
>> The Fixes: tags generally suggest this wants backporting. I'm a little uncertain
>> here though, as it won't really affect anything that is built. Opinions?
> I would suggest to backport to open release branches, as people
> working on protocol implementations might not use the headers from
> unstable, but rather from the last release.

Independently to backport, this definitely warrants a CHANGELOG entry.

We need to take all reasonable steps to make people aware that there are
breakages here.

~Andrew
diff mbox series

Patch

diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
index 22f1eef0c0ca..9b00d633d372 100644
--- a/xen/include/public/io/blkif.h
+++ b/xen/include/public/io/blkif.h
@@ -237,12 +237,16 @@ 
  * sector-size
  *      Values:         <uint32_t>
  *
- *      The logical block size, in bytes, of the underlying storage. This
- *      must be a power of two with a minimum value of 512.
+ *      The logical block size, in bytes, of the underlying storage. This must
+ *      be a power of two with a minimum value of 512.  The sector size should
+ *      only be used for request segment length and alignment.
  *
- *      NOTE: Because of implementation bugs in some frontends this must be
- *            set to 512, unless the frontend advertizes a non-zero value
- *            in its "feature-large-sector-size" xenbus node. (See below).
+ *      When exposing a device that uses a logical sector size of 4096, the
+ *      only difference xenstore wise will be that 'sector-size' (and possibly
+ *      'physical-sector-size' if supported by the backend) will be 4096, but
+ *      the 'sectors' node will still be calculated using 512 byte units.  The
+ *      sector base units in the ring requests fields will all be 512 byte
+ *      based despite the logical sector size exposed in 'sector-size'.
  *
  * physical-sector-size
  *      Values:         <uint32_t>
@@ -254,9 +258,9 @@ 
  * sectors
  *      Values:         <uint64_t>
  *
- *      The size of the backend device, expressed in units of "sector-size".
- *      The product of "sector-size" and "sectors" must also be an integer
- *      multiple of "physical-sector-size", if that node is present.
+ *      The size of the backend device, expressed in units of 512b.  The
+ *      product of "sectors" * 512 must also be an integer multiple of
+ *      "physical-sector-size", if that node is present.
  *
  *****************************************************************************
  *                            Frontend XenBus Nodes
@@ -338,6 +342,7 @@ 
  * feature-large-sector-size
  *      Values:         0/1 (boolean)
  *      Default Value:  0
+ *      Notes:          DEPRECATED, 12
  *
  *      A value of "1" indicates that the frontend will correctly supply and
  *      interpret all sector-based quantities in terms of the "sector-size"
@@ -411,6 +416,11 @@ 
  *(10) The discard-secure property may be present and will be set to 1 if the
  *     backing device supports secure discard.
  *(11) Only used by Linux and NetBSD.
+ *(12) Possibly only ever implemented by the QEMU Qdisk backend and the Windows
+ *     PV block frontend.  Other backends and frontends supported 'sector-size'
+ *     values greater than 512 before such feature was added.  Frontends should
+ *     not expose this node, neither should backends make any decisions based
+ *     on it being exposed by the frontend.
  */
 
 /*
@@ -619,11 +629,14 @@ 
 #define BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST 8
 
 /*
- * NB. 'first_sect' and 'last_sect' in blkif_request_segment, as well as
- * 'sector_number' in blkif_request, blkif_request_discard and
- * blkif_request_indirect are sector-based quantities. See the description
- * of the "feature-large-sector-size" frontend xenbus node above for
- * more information.
+ * NB. 'first_sect' and 'last_sect' in blkif_request_segment are all in units
+ * of 512 bytes, despite the 'sector-size' xenstore node possibly having a
+ * value greater than 512.
+ *
+ * The value in 'first_sect' and 'last_sect' fields must be setup so that the
+ * resulting segment offset and size is aligned to the logical sector size
+ * reported by the 'sector-size' xenstore node, see 'Backend Device Properties'
+ * section.
  */
 struct blkif_request_segment {
     grant_ref_t gref;        /* reference to I/O buffer frame        */
@@ -634,6 +647,10 @@  struct blkif_request_segment {
 
 /*
  * Starting ring element for any I/O request.
+ *
+ * The 'sector_number' field is in units of 512b, despite the value of the
+ * 'sector-size' xenstore node.  Note however that the offset in
+ * 'sector_number' must be aligned to 'sector-size'.
  */
 struct blkif_request {
     uint8_t        operation;    /* BLKIF_OP_???                         */
@@ -648,6 +665,10 @@  typedef struct blkif_request blkif_request_t;
 /*
  * Cast to this structure when blkif_request.operation == BLKIF_OP_DISCARD
  * sizeof(struct blkif_request_discard) <= sizeof(struct blkif_request)
+ *
+ * The 'sector_number' field is in units of 512b, despite the value of the
+ * 'sector-size' xenstore node.  Note however that the offset in
+ * 'sector_number' must be aligned to 'sector-size'.
  */
 struct blkif_request_discard {
     uint8_t        operation;    /* BLKIF_OP_DISCARD                     */
@@ -660,6 +681,11 @@  struct blkif_request_discard {
 };
 typedef struct blkif_request_discard blkif_request_discard_t;
 
+/*
+ * The 'sector_number' field is in units of 512b, despite the value of the
+ * 'sector-size' xenstore node.  Note however that the offset in
+ * 'sector_number' must be aligned to 'sector-size'.
+ */
 struct blkif_request_indirect {
     uint8_t        operation;    /* BLKIF_OP_INDIRECT                    */
     uint8_t        indirect_op;  /* BLKIF_OP_{READ/WRITE}                */