
[RFC,v2] Data integrity extension(DIX) support for xen-block

Message ID 1461137170-24787-1-git-send-email-bob.liu@oracle.com (mailing list archive)
State New, archived

Commit Message

Bob Liu April 20, 2016, 7:26 a.m. UTC
* What is the data integrity extension (DIX), and why is it needed?
Modern filesystems feature checksumming of data and metadata to protect against
data corruption.  However, the detection of the corruption is done at read time
which could potentially be months after the data was written.  At that point the
original data that the application tried to write is most likely lost.

The solution in Linux is the data integrity framework which enables protection
information to be pinned to I/Os and sent to/received from controllers that
support it. struct bio has been extended with a pointer to a
struct bio_integrity_payload (bip), which in turn contains the integrity
metadata. The raw data and the integrity metadata are mapped to two separate
scatterlists.

* Issues when xen-block gets involved.
xen-blkfront only transmits the raw data-segment scatterlist of each bio,
while the integrity-metadata-segment scatterlist is ignored.

* Proposal for transmitting the integrity-metadata-segment scatterlist.
Add an extra request immediately after the normal data request; this extra
request contains integrity-metadata segments only.

xen-blkback will then reconstruct the bio from the received data and integrity
segments.
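
The pairing described above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the blkfront implementation:
struct sketch_req, struct sketch_ring and both helpers are hypothetical, the
ring does not wrap, and the patch does not say which opcode or id the
follow-up integrity request carries, so the sketch assumes the plain opcode
and the same id as the data request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLKIF_OP_WRITE     1     /* existing opcode in blkif.h */
#define BLKIF_OP_DIX_FLAG  0x80  /* proposed by this patch */

#define SKETCH_RING_SIZE   8     /* hypothetical ring depth */

/* Hypothetical, cut-down request; the real blkif_request_t carries more. */
struct sketch_req {
    uint8_t  operation;
    uint8_t  nr_segments;
    uint64_t id;
    uint64_t sector_number;
};

/* Hypothetical non-wrapping ring, just enough to show the ordering. */
struct sketch_ring {
    struct sketch_req slot[SKETCH_RING_SIZE];
    unsigned int prod;           /* next free slot */
};

/*
 * Queue a write as two back-to-back requests: the data request with
 * BLKIF_OP_DIX_FLAG set, then a request holding only integrity segments.
 * Returns 0 on success, -1 if the pair does not fit.
 */
static int sketch_queue_dix_write(struct sketch_ring *ring, uint64_t id,
                                  uint64_t sector, uint8_t data_segs,
                                  uint8_t meta_segs)
{
    struct sketch_req *data, *meta;

    assert(ring != NULL);
    if (ring->prod + 2 > SKETCH_RING_SIZE)
        return -1;               /* never queue half of a pair */

    data = &ring->slot[ring->prod++];
    data->operation = BLKIF_OP_WRITE | BLKIF_OP_DIX_FLAG;
    data->nr_segments = data_segs;
    data->id = id;
    data->sector_number = sector;

    meta = &ring->slot[ring->prod++];
    meta->operation = BLKIF_OP_WRITE;  /* assumption: plain opcode, no flag */
    meta->nr_segments = meta_segs;     /* integrity-metadata segments only */
    meta->id = id;                     /* assumption: same id as the data */
    meta->sector_number = sector;
    return 0;
}

/* Backend side: does this request announce a following integrity request? */
static int sketch_expects_integrity_req(const struct sketch_req *req)
{
    return (req->operation & BLKIF_OP_DIX_FLAG) != 0;
}
```

Refusing to queue half of a pair keeps the backend from ever seeing a flagged
data request without its integrity request.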

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 xen/include/public/io/blkif.h |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

David Vrabel April 20, 2016, 8:59 a.m. UTC | #1
On 20/04/16 08:26, Bob Liu wrote:
> 
>  /*
> + * Recognized only if "feature-data-integrity" is present in backend xenbus info.
> + * A request with BLKIF_OP_DIX_FLAG indicates the following request is a special
> + * request which only contains integrity-metadata segments of current request.
> + *
> + * If a backend does not recognize BLKIF_OP_DIX_FLAG, it should *not* create the
> + * "feature-data-integrity" node!
> + */
> +#define BLKIF_OP_DIX_FLAG (0x80)

This looks fine as a mechanism for actually transferring the data but
you do need to specify:

1. The format of this DIX data.  You may reference external
specifications for this.

2. A mechanism for reporting which DIX formats the backend supports and
a way for the frontend to select one (if multiple are supported).

3. The behaviour the frontend can expect from the backend.  (e.g., if
the frontend writes sector S with DIX data D, a read of sector S will
complete with DIX data D).

David
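
For point 1, one candidate external format is the T10 protection information
tuple: 8 bytes per logical block, made up of a 16-bit guard tag (CRC of the
block data), a 16-bit application tag and a 32-bit reference tag, all
big-endian. The thread does not settle on a format, so the following is only a
sketch under that assumption, with Type 1 style reference tags (low 32 bits of
the LBA) and 512-byte sectors. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512

/* One protection tuple per sector; byte arrays keep it big-endian. */
struct sketch_pi_tuple {
    uint8_t guard_tag[2];   /* CRC-16 (polynomial 0x8BB7) of the sector */
    uint8_t app_tag[2];     /* opaque to the transport */
    uint8_t ref_tag[4];     /* Type 1: low 32 bits of the LBA */
};

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
static uint16_t sketch_crc_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Fill the tuple for one sector of data at reference tag `ref`. */
static void sketch_pi_fill(struct sketch_pi_tuple *t, const uint8_t *sector,
                           uint32_t ref)
{
    uint16_t crc = sketch_crc_t10dif(sector, SKETCH_SECTOR_SIZE);

    t->guard_tag[0] = (uint8_t)(crc >> 8);
    t->guard_tag[1] = (uint8_t)(crc & 0xff);
    t->app_tag[0] = 0;
    t->app_tag[1] = 0;
    t->ref_tag[0] = (uint8_t)(ref >> 24);
    t->ref_tag[1] = (uint8_t)(ref >> 16);
    t->ref_tag[2] = (uint8_t)(ref >> 8);
    t->ref_tag[3] = (uint8_t)ref;
}

/* Return 1 if guard and reference tags match the data, 0 otherwise. */
static int sketch_pi_verify(const struct sketch_pi_tuple *t,
                            const uint8_t *sector, uint32_t ref)
{
    struct sketch_pi_tuple want;

    sketch_pi_fill(&want, sector, ref);
    return memcmp(t->guard_tag, want.guard_tag, 2) == 0 &&
           memcmp(t->ref_tag, want.ref_tag, 4) == 0;
}
```

Whichever format is chosen, blkif.h would need to state the tuple layout and
the sector size it covers, or point at the specification that does.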
Bob Liu April 20, 2016, 12:08 p.m. UTC | #2
On 04/20/2016 04:59 PM, David Vrabel wrote:
> On 20/04/16 08:26, Bob Liu wrote:
>>
>>  /*
>> + * Recognized only if "feature-data-integrity" is present in backend xenbus info.
>> + * A request with BLKIF_OP_DIX_FLAG indicates the following request is a special
>> + * request which only contains integrity-metadata segments of current request.
>> + *
>> + * If a backend does not recognize BLKIF_OP_DIX_FLAG, it should *not* create the
>> + * "feature-data-integrity" node!
>> + */
>> +#define BLKIF_OP_DIX_FLAG (0x80)
> 
> This looks fine as a mechanism for actually transferring the data but
> you do need to specify:
> 
> 1. The format of this DIX data.  You may reference external
> specifications for this.
> 

Sure!

> 2. A mechanism for reporting which DIX formats the backend supports and
> a way for the frontend to select one (if multiple are supported).
> 

The "feature-data-integrity" node could be changed from a "bool" to an
"unsigned int", so that it can report all DIX formats the backend supports.

> 3. The behaviour the frontend can expect from the backend.  (e.g., if
> the frontend writes sector S with DIX data D, a read of sector S will
> complete with DIX data D).
> 

Sorry, I didn't get the point of this example.

Thank you for your review!
David Vrabel April 20, 2016, 1:49 p.m. UTC | #3
On 20/04/16 13:08, Bob Liu wrote:
> 
> 
> The "feature-data-integrity" node could be changed from a "bool" to an
> "unsigned int", so that it can report all DIX formats the backend supports.

I think it would be preferable to have something string based.  I think
Linux reports the formats using a string encoding.  Perhaps the same
could be used.
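
As a rough illustration of string-based negotiation: Linux exposes names such
as "T10-DIF-TYPE1-CRC" and "T10-DIF-TYPE3-IP" in
/sys/block/<dev>/integrity/format. The sketch below assumes the backend
advertises a comma-separated list of such names in a xenstore node and the
frontend picks the first one it prefers. The list separator, the frontend's
preference table and sketch_pick_format() are all assumptions, not part of
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Formats this hypothetical frontend supports, most preferred first. */
static const char *const sketch_frontend_formats[] = {
    "T10-DIF-TYPE1-CRC",
    "T10-DIF-TYPE1-IP",
};

/*
 * Return the most preferred frontend format that appears as a whole
 * comma-separated token in `advertised`, or NULL if there is no match.
 */
static const char *sketch_pick_format(const char *advertised)
{
    size_t n = sizeof(sketch_frontend_formats) /
               sizeof(sketch_frontend_formats[0]);

    assert(advertised != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *want = sketch_frontend_formats[i];
        size_t wlen = strlen(want);
        const char *p = advertised;

        while (*p != '\0') {
            size_t tlen = strcspn(p, ",");  /* length of current token */

            if (tlen == wlen && memcmp(p, want, wlen) == 0)
                return want;
            p += tlen;
            if (*p == ',')
                p++;                        /* skip the separator */
        }
    }
    return NULL;
}
```

Matching whole tokens rather than substrings avoids mistaking one format name
for a prefix of another.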

>> 3. The behaviour the frontend can expect from the backend.  (e.g., if
>> the frontend writes sector S with DIX data D, a read of sector S will
>> complete with DIX data D).
>>
> 
> Sorry, I didn't get the point of this example.

I'm not sure what you're not getting.

This is the behaviour of hardware that has this feature, yes?  The block
device provided by blkback must provide the same behaviour, right?  So
this must be specified.

David
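
The round-trip guarantee in point 3 can be written down as a tiny in-memory
model. This is a sketch of the expected behaviour only, not of blkback: the
disk structure, the 8-byte metadata size per 512-byte sector and both helpers
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512
#define SKETCH_PI_SIZE     8    /* assumption: 8 bytes of DIX data per sector */
#define SKETCH_NR_SECTORS  16

/* Hypothetical backing store that keeps DIX data next to each sector. */
struct sketch_disk {
    uint8_t data[SKETCH_NR_SECTORS][SKETCH_SECTOR_SIZE];
    uint8_t pi[SKETCH_NR_SECTORS][SKETCH_PI_SIZE];
};

/* Write sector S together with its DIX data D. Returns 0, or -1 if out of range. */
static int sketch_write(struct sketch_disk *d, uint64_t sector,
                        const uint8_t *data, const uint8_t *pi)
{
    assert(d != NULL);
    if (sector >= SKETCH_NR_SECTORS)
        return -1;
    memcpy(d->data[sector], data, SKETCH_SECTOR_SIZE);
    memcpy(d->pi[sector], pi, SKETCH_PI_SIZE);
    return 0;
}

/* Read sector S; the DIX data handed back must be what was last written. */
static int sketch_read(const struct sketch_disk *d, uint64_t sector,
                       uint8_t *data, uint8_t *pi)
{
    assert(d != NULL);
    if (sector >= SKETCH_NR_SECTORS)
        return -1;
    memcpy(data, d->data[sector], SKETCH_SECTOR_SIZE);
    memcpy(pi, d->pi[sector], SKETCH_PI_SIZE);
    return 0;
}
```

A specification would also have to say what a read returns for sectors that
were written without DIX data, which this model leaves open.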

Patch

diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
index 99f0326..a0124b2 100644
--- a/xen/include/public/io/blkif.h
+++ b/xen/include/public/io/blkif.h
@@ -182,6 +182,15 @@ 
  *      backend driver paired with a LIFO queue in the frontend will
  *      allow us to have better performance in this scenario.
  *
+ * feature-data-integrity
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_DIX_FLAG request opcode.  Requests
+ *      with this flag means the following request is a special request which
+ *      only contains integrity-metadata segments of current request.
+ *
  *----------------------- Request Transport Parameters ------------------------
  *
  * max-ring-page-order
@@ -635,6 +644,16 @@ 
 #define BLKIF_OP_INDIRECT          6
 
 /*
+ * Recognized only if "feature-data-integrity" is present in backend xenbus info.
+ * A request with BLKIF_OP_DIX_FLAG indicates the following request is a special
+ * request which only contains integrity-metadata segments of current request.
+ *
+ * If a backend does not recognize BLKIF_OP_DIX_FLAG, it should *not* create the
+ * "feature-data-integrity" node!
+ */
+#define BLKIF_OP_DIX_FLAG (0x80)
+
+/*
  * Maximum scatter/gather segments per request.
  * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
  * NB. This could be 12 if the ring indexes weren't stored in the same page.