From patchwork Mon Feb 29 03:37:10 2016
X-Patchwork-Submitter: Bob Liu
X-Patchwork-Id: 8448461
From: Bob Liu
To: xen-devel@lists.xen.org
Date: Mon, 29 Feb 2016 11:37:10 +0800
Message-Id: <1456717031-13423-1-git-send-email-bob.liu@oracle.com>
Cc: jgross@suse.com, ian.jackson@eu.citrix.com, Bob Liu, paul.durrant@citrix.com, jbeulich@suse.com, roger.pau@citrix.com
Subject: [Xen-devel] [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands

1) What is this patch about?

This patch introduces a new flag for the request operation field, BLKIF_OP_EXTRA_FLAG. A request with BLKIF_OP_EXTRA_FLAG set signals that the next slot in the shared ring holds an extra request, which is used to pass SCSI commands through to the backend.
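To make the ring layout concrete, here is a minimal C sketch of how a frontend might mark requests and how a backend might walk the ring, treating the slot after any flagged request as its extra request. The toy slot types, build_example_ring() and consume_ring() are hypothetical simplifications, not blkfront/blkback code: real slots live in the shared ring page managed by ring.h. Copying the request id into the extra slot is also an assumption, since the patch does not say how the extra request's id is used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BLKIF_OP_READ/WRITE are the existing blkif.h values; the other two
 * are the values proposed in this patch. */
#define BLKIF_OP_READ              0
#define BLKIF_OP_WRITE             1
#define BLKIF_OP_EXTRA_FLAG        0x80
#define BLKIF_EXTRA_TYPE_SCSI_CMD  1

/* Hypothetical, simplified ring slots (not the real blkif structures). */
struct toy_request { uint8_t operation; uint64_t id; };
struct toy_extra   { uint8_t type; uint64_t id; };
union toy_slot     { struct toy_request req; struct toy_extra extra; };

/* Frontend side: lay out five slots as in the header comment:
 * flagged request, extra, request, flagged request, extra.
 * Returns the producer index (number of slots filled). */
static unsigned int build_example_ring(union toy_slot *ring, unsigned int size)
{
    assert(size >= 5);
    memset(ring, 0, size * sizeof(*ring));
    ring[0].req   = (struct toy_request){ BLKIF_OP_READ | BLKIF_OP_EXTRA_FLAG, 1 };
    ring[1].extra = (struct toy_extra){ BLKIF_EXTRA_TYPE_SCSI_CMD, 1 };
    ring[2].req   = (struct toy_request){ BLKIF_OP_WRITE, 2 };
    ring[3].req   = (struct toy_request){ BLKIF_OP_READ | BLKIF_OP_EXTRA_FLAG, 3 };
    ring[4].extra = (struct toy_extra){ BLKIF_EXTRA_TYPE_SCSI_CMD, 3 };
    return 5;
}

/* Backend side: walk slots [0, prod). A request carrying BLKIF_OP_EXTRA_FLAG
 * also consumes the following slot as its extra request. The base operation
 * of each normal request (flag stripped) is stored in ops_out if non-NULL.
 * Returns the number of normal requests, or -1 if a flagged request has no
 * extra slot after it or the extra request has an unknown type. */
static int consume_ring(const union toy_slot *ring, unsigned int prod,
                        unsigned int *n_extra, uint8_t *ops_out)
{
    unsigned int cons = 0;
    int n_req = 0;

    *n_extra = 0;
    while (cons < prod) {
        const struct toy_request *req = &ring[cons++].req;

        if (ops_out)
            ops_out[n_req] = req->operation & (uint8_t)~BLKIF_OP_EXTRA_FLAG;
        n_req++;

        if (req->operation & BLKIF_OP_EXTRA_FLAG) {
            if (cons == prod)
                return -1;        /* flagged, but the extra slot is missing */
            if (ring[cons].extra.type != BLKIF_EXTRA_TYPE_SCSI_CMD)
                return -1;        /* unknown extra request type */
            cons++;
            (*n_extra)++;
        }
    }
    return n_req;
}
```

With the five-slot layout, consume_ring() reports three normal requests and two extra requests; stopping the producer index right after a flagged request makes it return -1. Because the existing operation codes are small values, masking off the 0x80 bit recovers the original operation unchanged.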
This mechanism is a simplified version of the XEN_NETIF_EXTRA_* scheme in netif.h. It can easily be extended to transmit other per-request/bio data from frontend to backend, e.g. a Data Integrity Field per bio.

2) Why do we need this?

Currently only raw data segments are transmitted from blkfront to blkback, which means some advanced features are lost:

* The guest knows nothing about the features of the real backend storage. For example, in a bare-metal environment the INQUIRY SCSI command can be used to query storage device information; if the device is an SSD or flash device, we have the option of using it as a fast cache. This cannot happen in current domU guests, because blkfront only knows that the disk is a normal virtual disk.
* Failover Clusters in Windows: failover clusters require target disks that support SCSI-3 persistent reservations, which currently cannot work in domU.

3) Known issues:

* Security: how to 'validate' the extra request payload. For example, SCSI commands operate on a per-LUN basis (the whole disk), while we really only want to operate on partitions.
* SCSI commands cannot be passed through if the backend storage driver is bio-based instead of request-based.

4) Alternative approach:

Using PVSCSI instead:
* It is doubtful that PVSCSI can support as many types of backend storage devices as xen-block.
* The path is much longer:
  ioctl() -> SCSI upper layer -> middle layer -> PVSCSI-frontend -> PVSCSI-backend -> target framework (LIO?) ->
  With xen-block we only need:
  ioctl() -> blkfront -> blkback ->
* xen-block has existed for many years, is widely used, and is more stable.

Any input is welcome, thank you!

Signed-off-by: Bob Liu
---
 xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
index 99f0326..7c10bce 100644
--- a/xen/include/public/io/blkif.h
+++ b/xen/include/public/io/blkif.h
@@ -635,6 +635,28 @@
 #define BLKIF_OP_INDIRECT 6
 
 /*
+ * Recognised only if "feature-extra-request" is present in backend xenbus info.
+ * A request with BLKIF_OP_EXTRA_FLAG set indicates that an extra request follows
+ * it in the shared ring buffer.
+ *
+ * In this way, extra data such as SCSI commands, DIF/DIX and other per-request/bio
+ * data can be transmitted from frontend to backend.
+ *
+ * The 'wire' format is:
+ * Request 1: blkif_request
+ * [Request 2: blkif_request_extra] (only if request 1 has BLKIF_OP_EXTRA_FLAG)
+ * Request 3: blkif_request
+ * Request 4: blkif_request
+ * [Request 5: blkif_request_extra] (only if request 4 has BLKIF_OP_EXTRA_FLAG)
+ * ...
+ * Request N: blkif_request
+ *
+ * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
+ * "feature-extra-request" node!
+ */
+#define BLKIF_OP_EXTRA_FLAG (0x80)
+
+/*
  * Maximum scatter/gather segments per request.
  * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
  * NB. This could be 12 if the ring indexes weren't stored in the same page.
@@ -703,10 +725,61 @@ struct blkif_request_indirect {
 };
 typedef struct blkif_request_indirect blkif_request_indirect_t;
 
+enum blkif_extra_request_type {
+    BLKIF_EXTRA_TYPE_SCSI_CMD = 1, /* Transmit SCSI command. */
+};
+
+struct scsi_cmd_req {
+    /*
+     * Grant reference of the page used to transmit the SCSI command to
+     * the backend and to receive sense data back from it.
+     * One 4KB page is enough.
+     */
+    grant_ref_t cmd_gref;
+    /* Length of the SCSI command in the grant-mapped page. */
+    unsigned int cmd_len;
+
+    /*
+     * A SCSI command may need to transfer a data segment shorter than
+     * a sector (512 bytes).
+     * Record num_sg and the last segment length in the extra request so
+     * that the backend knows about them.
+     */
+    unsigned int num_sg;
+    unsigned int last_sg_len;
+};
+
+/*
+ * Extra request: it must follow a normal request, and a normal request
+ * can be followed by at most one extra request.
+ */
+struct blkif_request_extra {
+    uint8_t type; /* BLKIF_EXTRA_TYPE_* */
+    uint16_t _pad1;
+#ifndef CONFIG_X86_32
+    uint32_t _pad2; /* offsetof(blkif_...,u.extra.id) == 8 */
+#endif
+    uint64_t id;
+    struct scsi_cmd_req scsi_cmd;
+} __attribute__((__packed__));
+typedef struct blkif_request_extra blkif_request_extra_t;
+
+struct scsi_cmd_res {
+    unsigned int resid_len;
+    /* Length of sense data returned in grant mapped page. */
+    unsigned int sense_len;
+};
+
+struct blkif_response_extra {
+    uint8_t type; /* BLKIF_EXTRA_TYPE_* */
+    struct scsi_cmd_res scsi_cmd;
+} __attribute__((__packed__));
+
 struct blkif_response {
     uint64_t id; /* copied from request */
     uint8_t operation; /* copied from request */
     int16_t status; /* BLKIF_RSP_??? */
+    struct blkif_response_extra extra;
 };
 typedef struct blkif_response blkif_response_t;
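As an illustration of the INQUIRY use case from section 2, the following C sketch fills the proposed blkif_request_extra for a standard 6-byte INQUIRY CDB. It also checks the comment on _pad2: the 8-byte offset of id only holds when the struct sits behind a one-byte operation field, as Linux places its request variants in a union inside a packed blkif_request (modelled here by the hypothetical toy_blkif_request); on its own, the packed struct puts id at offset 7. The sketch assumes CONFIG_X86_32 is undefined (an x86-64 view), uses a local buffer in place of a granted page, and prepare_inquiry() with its num_sg/last_sg_len choices is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t grant_ref_t;  /* as in Xen's public grant_table.h */

/* Copies of the structures proposed in this patch. */
enum blkif_extra_request_type {
    BLKIF_EXTRA_TYPE_SCSI_CMD = 1,
};

struct scsi_cmd_req {
    grant_ref_t cmd_gref;
    unsigned int cmd_len;
    unsigned int num_sg;
    unsigned int last_sg_len;
};

struct blkif_request_extra {
    uint8_t type;
    uint16_t _pad1;
#ifndef CONFIG_X86_32
    uint32_t _pad2;
#endif
    uint64_t id;
    struct scsi_cmd_req scsi_cmd;
} __attribute__((__packed__));

/* Hypothetical wrapper mirroring a packed request with a one-byte
 * operation field followed by a union of request variants. */
struct toy_blkif_request {
    uint8_t operation;
    union {
        struct blkif_request_extra extra;
    } u;
} __attribute__((__packed__));

#ifndef CONFIG_X86_32
/* Standalone, id is at offset 7; behind the operation byte it lands at 8. */
_Static_assert(offsetof(struct blkif_request_extra, id) == 7, "standalone");
_Static_assert(offsetof(struct toy_blkif_request, u.extra.id) == 8, "embedded");
#endif

/* Hypothetical frontend helper: write a 6-byte INQUIRY CDB into 'page'
 * (standing in for the page shared via 'gref') and describe it in 'ex'.
 * Assumes the response fits in one data segment shorter than a sector. */
static void prepare_inquiry(struct blkif_request_extra *ex, uint8_t *page,
                            grant_ref_t gref, uint64_t id, uint16_t alloc_len)
{
    struct scsi_cmd_req cmd;

    memset(page, 0, 6);
    page[0] = 0x12;                       /* INQUIRY opcode */
    /* page[1] = 0: EVPD clear (standard data); page[2] = 0: page code */
    page[3] = (uint8_t)(alloc_len >> 8);  /* allocation length, big-endian */
    page[4] = (uint8_t)(alloc_len & 0xff);
    /* page[5] = 0: control byte */

    cmd.cmd_gref = gref;
    cmd.cmd_len = 6;
    cmd.num_sg = 1;
    cmd.last_sg_len = alloc_len;

    memset(ex, 0, sizeof(*ex));
    ex->type = BLKIF_EXTRA_TYPE_SCSI_CMD;
    ex->id = id;
    ex->scsi_cmd = cmd;  /* struct copy into the packed member */
}
```

Called with an allocation length of 36 bytes, prepare_inquiry() writes the bytes 12 00 00 00 24 00 into the page and sets cmd_len to 6. Building the descriptor in an aligned local and copying it in avoids taking a pointer to a member of the packed struct.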