[v4,00/11] Read/Write with meta/integrity

Message ID 20241016112912.63542-1-anuj20.g@samsung.com (mailing list archive)

Anuj Gupta Oct. 16, 2024, 11:29 a.m. UTC
This adds a new io_uring interface to exchange meta along with read/write.

Interface:
Meta information is represented using a newly introduced 'struct io_uring_meta'.
The application sets up an SQE128 ring and prepares io_uring_meta within the
second SQE. It populates the 'struct io_uring_meta' fields as below:

* meta_type: describes type of meta that is passed. Currently one type
"Integrity" is supported.
* meta_flags: these are meta-type specific flags. Three flags are exposed for
integrity type, namely BLK_INTEGRITY_CHK_GUARD/APPTAG/REFTAG.
* meta_len: length of the meta buffer
* meta_addr: address of the meta buffer
* seed: seed value for ref tag remapping
* app_tag: optional application-specific 16b value; this goes along with
INTEGRITY_CHK_APPTAG flag.
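
As a rough illustration of the fields listed above, the sketch below fills
a stand-in descriptor. Everything prefixed with ex_/EX_ is hypothetical: the
field types, field order and flag values are assumptions made for this
example, and the real definitions are whatever the patched uapi headers
provide.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-in values, not the ones defined by the series. */
#define EX_META_TYPE_INTEGRITY  1u
#define EX_CHK_GUARD            (1u << 0)
#define EX_CHK_APPTAG           (1u << 1)
#define EX_CHK_REFTAG           (1u << 2)

/* ASSUMPTION: stand-in layout; only the field names follow the cover letter. */
struct ex_io_uring_meta {
        uint16_t meta_type;   /* type of meta being passed */
        uint16_t meta_flags;  /* meta-type specific flags */
        uint32_t meta_len;    /* length of the meta buffer */
        uint64_t meta_addr;   /* address of the meta buffer */
        uint32_t seed;        /* seed value for ref tag remapping */
        uint16_t app_tag;     /* optional 16b application tag */
};

/* The descriptor lives in the second half of a 128-byte SQE, so it must fit in 64 bytes. */
_Static_assert(sizeof(struct ex_io_uring_meta) <= 64, "descriptor must fit in second SQE");

/* Fill the descriptor for an integrity transfer; the app tag check is optional. */
static void ex_fill_meta(struct ex_io_uring_meta *md, void *buf, uint32_t len,
                         uint32_t seed, uint16_t app_tag, int check_apptag)
{
        memset(md, 0, sizeof(*md));
        md->meta_type = EX_META_TYPE_INTEGRITY;
        md->meta_flags = EX_CHK_GUARD | EX_CHK_REFTAG;
        md->meta_addr = (uint64_t)(uintptr_t)buf;
        md->meta_len = len;
        md->seed = seed;
        if (check_apptag) {
                /* app_tag is only meaningful together with the apptag check flag */
                md->meta_flags |= EX_CHK_APPTAG;
                md->app_tag = app_tag;
        }
}
```

The app_tag is set only when the apptag check is requested, which mirrors
the "goes along with" wording in the field description.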

The block path (direct IO), NVMe, and SCSI drivers are modified to support
this.

Patch 1 is an enhancement patch.
Patch 2 is required to make the user metadata split work correctly.
Patches 3 to 6 are prep patches.
Patch 7 adds the io_uring support.
Patch 8 gives us a unified interface for user and kernel generated
integrity.
Patch 9 adds support for block direct IO, patch 10 for NVMe, and
patch 11 for SCSI.

In patch 11 for SCSI, we added a check to reject requests where refcheck
is specified without appcheck or vice versa, as SCSI cannot express that
combination. However, block layer generated integrity doesn't specify
appcheck: for drives formatted with Type 1/Type 2 PI, the block layer
specifies refcheck but not appcheck. Hence, these I/Os would fail. Any
suggestions on how this could be handled?
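
To make the conflict concrete, here is a minimal sketch of such a check.
It is not the code from patch 11: the function name and the EX_ flag values
are hypothetical and only model the rule described above (refcheck and
appcheck must be requested together or not at all).

```c
#include <stdbool.h>

/* ASSUMPTION: stand-in flag values for illustration only. */
#define EX_CHECK_GUARD   (1u << 0)
#define EX_CHECK_REFTAG  (1u << 1)
#define EX_CHECK_APPTAG  (1u << 2)

/* Valid only if refcheck and appcheck are both set or both clear. */
static bool ex_scsi_flags_valid(unsigned int flags)
{
        bool ref = (flags & EX_CHECK_REFTAG) != 0;
        bool app = (flags & EX_CHECK_APPTAG) != 0;

        return ref == app;
}
```

Under this model a user request with all three checks passes, while the
combination the block layer generates for Type 1/Type 2 PI
(EX_CHECK_GUARD | EX_CHECK_REFTAG) is rejected, which is the failure
described above.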

Some of the design choices came from this discussion [1].

An example program showing how to use the interface is appended below [2].
(It also tests whether reftag remapping happens correctly.)

Tree:
https://github.com/SamsungDS/linux/tree/feat/pi_us_v4
Testing:
Done by modifying fio to use this interface.
https://github.com/SamsungDS/fio/tree/priv/feat/pi-test-v5

Changes since v3:
https://lore.kernel.org/linux-block/20240823103811.2421-1-anuj20.g@samsung.com/

- add reftag seed support (Martin)
- fix incorrect formatting in uio_meta (hch)
- s/IOCB_HAS_META/IOCB_HAS_METADATA (hch)
- move integrity check flags to block layer header (hch)
- add comments for BIP_CHECK_GUARD/REFTAG/APPTAG flags (hch)
- remove bio_integrity check during completion if IOCB_HAS_METADATA is set (hch)
- use goto label to get rid of duplicate error handling (hch)
- add warn_on if trying to do sync io with iocb_has_metadata flag (hch)
- remove check for disabling reftag remapping (hch)
- remove BIP_INTEGRITY_USER flag (hch)
- add comment for app_tag field introduced in bio_integrity_payload (hch)
- pass request to nvme_set_app_tag function (hch)
- right indentation at a place in scsi patch (hch)
- move IOCB_HAS_METADATA to a separate fs patch (hch)

Changes since v2:
https://lore.kernel.org/linux-block/20240626100700.3629-1-anuj20.g@samsung.com/
- io_uring error handling styling (Gabriel)
- add documented helper to get metadata bytes from data iter (hch)
- during clone specify "what flags to clone" rather than
"what not to clone" (hch)
- Move uio_meta definition to bio-integrity.h (hch)
- Rename apptag field to app_tag (hch)
- Change datatype of flags field in uio_meta to bitwise (hch)
- Don't introduce BIP_USER_CHK_FOO flags (hch, martin)
- Driver should rely on block layer flags instead of seeing if it is
user-passthrough (hch)
- update the scsi code for handling user-meta (hch, martin)

Changes since v1:
https://lore.kernel.org/linux-block/20240425183943.6319-1-joshi.k@samsung.com/
- Do not use new opcode for meta, and also add the provision to introduce new
meta types beyond integrity (Pavel)
- Stuff IOCB_HAS_META check in need_complete_io (Jens)
- Split meta handling in NVMe into a separate handler (Keith)
- Add meta handling for __blkdev_direct_IO too (Keith)
- Don't inherit BIP_COPY_USER flag for cloned bio's (Christoph)
- Better commit descriptions (Christoph)

Changes since RFC:
- modify io_uring plumbing based on recent async handling state changes
- fixes/enhancements to correctly handle the split for meta buffer
- add flags to specify guard/reftag/apptag checks
- add support to send apptag


[1] https://lore.kernel.org/linux-block/20240705083205.2111277-1-hch@lst.de/

[2]

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
#include <linux/blkdev.h>
#include <linux/io_uring.h>
#include <linux/types.h>
#include "liburing.h"

/* write data/meta. read both. compare. send apptag too.
* prerequisite:
* protected xfer: format namespace with 4KB + 8b, pi_type = 1
* For testing reftag remapping on device-mapper, create a
* device-mapper and run this program. Device mapper creation:
* # echo 0 80 linear /dev/nvme0n1 0 > /tmp/table
* # echo 80 160 linear /dev/nvme0n1 200 >> /tmp/table
* # dmsetup create two /tmp/table
* # ./a.out /dev/dm-0
*/

#define DATA_LEN 4096
#define META_LEN 8

struct t10_pi_tuple {
        __be16  guard;
        __be16  apptag;
        __be32  reftag;
};

int main(int argc, char *argv[])
{
         struct io_uring ring;
         struct io_uring_sqe *sqe = NULL;
         struct io_uring_cqe *cqe = NULL;
         void *wdb,*rdb;
         char wmb[META_LEN], rmb[META_LEN];
         char *data_str = "data buffer";
         int fd, ret, blksize;
         struct stat fstat;
         unsigned long long offset = DATA_LEN * 10;
         struct t10_pi_tuple *pi;
         struct io_uring_meta *md;

         if (argc != 2) {
                 fprintf(stderr, "Usage: %s <block-device>\n", argv[0]);
                 return 1;
         }

         if (stat(argv[1], &fstat) == 0) {
                 blksize = (int)fstat.st_blksize;
         } else {
                 perror("stat");
                 return 1;
         }

         if (posix_memalign(&wdb, blksize, DATA_LEN)) {
                 perror("posix_memalign failed");
                 return 1;
         }
         if (posix_memalign(&rdb, blksize, DATA_LEN)) {
                 perror("posix_memalign failed");
                 return 1;
         }

         memset(wdb, 0, DATA_LEN);

         fd = open(argv[1], O_RDWR | O_DIRECT);
         if (fd < 0) {
                 perror("open");
                 return 1;
         }

         ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQE128);
         if (ret) {
                 fprintf(stderr, "ring setup failed: %d\n", ret);
                 return 1;
         }

         /* write data + meta-buffer to device */
         sqe = io_uring_get_sqe(&ring);
         if (!sqe) {
                 fprintf(stderr, "get sqe failed\n");
                 return 1;
         }

         io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset);

         md = (struct io_uring_meta *) sqe->big_sqe_cmd;
         md->meta_type = META_TYPE_INTEGRITY;
         md->meta_addr = (__u64)wmb;
         md->meta_len = META_LEN;
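         /* flags to ask for guard/reftag/apptag checks */
         md->meta_flags = BLK_INTEGRITY_CHK_GUARD | BLK_INTEGRITY_CHK_REFTAG | BLK_INTEGRITY_CHK_APPTAG;
         md->app_tag = 0x1234;
         md->seed = 10;

         pi = (struct t10_pi_tuple *)wmb;
         /* CRC16 guard of the all-zero data block is 0 */
         pi->guard = 0;
         /* big-endian 10 (the seed) and 0x1234, as written on a little-endian host */
         pi->reftag = 0x0A000000;
         pi->apptag = 0x3412;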

         ret = io_uring_submit(&ring);
         if (ret <= 0) {
                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                 return 1;
         }

         ret = io_uring_wait_cqe(&ring, &cqe);
         if (ret < 0) {
                 fprintf(stderr, "wait cqe failed: %d\n", ret);
                 return 1;
         }
         if (cqe->res < 0) {
                 fprintf(stderr, "write cqe failure: %d\n", cqe->res);
                 return 1;
         }

         io_uring_cqe_seen(&ring, cqe);

         /* read data + meta-buffer back from device */
         sqe = io_uring_get_sqe(&ring);
         if (!sqe) {
                 fprintf(stderr, "get sqe failed\n");
                 return 1;
         }

         io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset);

         md = (struct io_uring_meta *) sqe->big_sqe_cmd;
         md->meta_type = META_TYPE_INTEGRITY;
         md->meta_addr = (__u64)rmb;
         md->meta_len = META_LEN;
         md->meta_flags = BLK_INTEGRITY_CHK_GUARD | BLK_INTEGRITY_CHK_REFTAG | BLK_INTEGRITY_CHK_APPTAG;
         md->app_tag = 0x1234;
         md->seed = 10;

         ret = io_uring_submit(&ring);
         if (ret <= 0) {
                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                 return 1;
         }

         ret = io_uring_wait_cqe(&ring, &cqe);
         if (ret < 0) {
                 fprintf(stderr, "wait cqe failed: %d\n", ret);
                 return 1;
         }

         if (cqe->res < 0) {
                 fprintf(stderr, "read cqe failure: %d\n", cqe->res);
                 return 1;
         }

         io_uring_cqe_seen(&ring, cqe);

         /* the reftag read back must be remapped to the seed again */
         pi = (struct t10_pi_tuple *)rmb;
         if (pi->apptag != 0x3412)
                 printf("Failure: apptag mismatch!\n");
         if (pi->reftag != 0x0A000000)
                 printf("Failure: reftag mismatch!\n");

         /* buffers are binary, so compare with memcmp rather than strncmp */
         if (memcmp(wmb, rmb, META_LEN))
                 printf("Failure: meta mismatch!\n");

         if (memcmp(wdb, rdb, DATA_LEN))
                 printf("Failure: data mismatch!\n");

         io_uring_queue_exit(&ring);
         free(rdb);
         free(wdb);
         return 0;
}
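
The arithmetic behind the device-mapper test in the program's header comment
can be sketched as follows. This is an illustration only: the ex_ names are
hypothetical, the table is the two-line linear table from that comment, and
the physical ref tag formula assumes the usual Type 1 convention (lower
32 bits of the LBA counted in protection intervals).

```c
#include <stddef.h>
#include <stdint.h>

/* One dm "linear" table line: <start> <len> linear <dev> <offset>, in 512-byte sectors. */
struct ex_linear_target {
        uint64_t start;
        uint64_t len;
        uint64_t dev_offset;
};

/* The two table lines from the example program's comment. */
static const struct ex_linear_target ex_table[] = {
        { 0,  80,  0   },
        { 80, 160, 200 },
};

/* Map a sector of the dm device to a sector of the underlying device. */
static uint64_t ex_map_sector(uint64_t sector)
{
        for (size_t i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
                const struct ex_linear_target *t = &ex_table[i];

                if (sector >= t->start && sector < t->start + t->len)
                        return t->dev_offset + (sector - t->start);
        }
        return UINT64_MAX; /* beyond the end of the table */
}

/* ASSUMPTION: Type 1 ref tag = low 32 bits of the LBA in protection-interval units. */
static uint32_t ex_phys_reftag(uint64_t phys_sector, uint32_t interval_bytes)
{
        return (uint32_t)(phys_sector * 512 / interval_bytes);
}
```

The program writes at byte offset DATA_LEN * 10 = 40960, i.e. sector 80 of
the dm device, which is the first sector of the second target and maps to
sector 200 of /dev/nvme0n1. With a 4096-byte protection interval that is
interval 25, so under the stated assumption the ref tag on media would be 25
while the application keeps seeing its own seed of 10 in the meta buffer.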

Anuj Gupta (8):
  block: define set of integrity flags to be inherited by cloned bip
  block: copy back bounce buffer to user-space correctly in case of
    split
  block: modify bio_integrity_map_user to accept iov_iter as argument
  fs: introduce IOCB_HAS_METADATA for metadata
  block: add flags for integrity meta
  io_uring/rw: add support to send meta along with read/write
  block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags
  scsi: add support for user-meta interface

Kanchan Joshi (3):
  block: define meta io descriptor
  block: add support to pass user meta buffer
  nvme: add support for passing on the application tag

 block/bio-integrity.c         | 73 +++++++++++++++++++++++++++++-----
 block/blk-integrity.c         | 10 ++++-
 block/fops.c                  | 44 +++++++++++++++-----
 drivers/nvme/host/core.c      | 21 ++++++----
 drivers/scsi/sd.c             | 25 +++++++++++-
 include/linux/bio-integrity.h | 30 ++++++++++++--
 include/linux/fs.h            |  1 +
 include/uapi/linux/blkdev.h   | 11 +++++
 include/uapi/linux/io_uring.h | 26 ++++++++++++
 io_uring/io_uring.c           |  6 +++
 io_uring/rw.c                 | 75 +++++++++++++++++++++++++++++++++--
 io_uring/rw.h                 | 15 ++++++-
 12 files changed, 300 insertions(+), 37 deletions(-)

Comments

Martin K. Petersen Oct. 22, 2024, 2:04 a.m. UTC | #1
Anuj,

> * meta_flags: these are meta-type specific flags. Three flags are
> exposed for integrity type, namely
> BLK_INTEGRITY_CHK_GUARD/APPTAG/REFTAG.

It's a bit weird that these are BLK_INTEGRITY_FLAGS since they are
exposed via io_uring.

I have reads limping along in my test tool on top of this series (with
the sd_prot_flags_valid check back out). Will work on writes tomorrow.