Message ID | 1456717031-13423-1-git-send-email-bob.liu@oracle.com (mailing list archive)
---|---
State | New, archived
On 29/02/16 04:37, Bob Liu wrote:
> 1) What is this patch about?
> This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request, which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from
> frontend to backend, e.g. a Data Integrity Field per bio.
>
> 2) Why do we need this?
> Currently only raw data segments are transmitted from blkfront to
> blkback, which means some advanced features are lost.
> * The guest knows nothing about the features of the real backend storage.
>   For example, on bare metal the INQUIRY SCSI command can be used to
>   query storage device information. If it's an SSD or flash device we
>   have the option of using the device as a fast cache.
>   But this can't happen in current domU guests, because blkfront only
>   knows it's a normal virtual disk.
>
> * Failover Clusters in Windows
>   Failover clusters require SCSI-3 persistent reservation target disks,
>   but currently this can't work in a domU.
>
> 3) Known issues:
> * Security: how to 'validate' this extra request payload.
>   E.g. SCSI operates on a LUN basis (the whole disk) while we really
>   just want to operate on partitions.

It's not only validation: some operations just affect the whole LUN
(e.g. Reserve/Release). And what about "multi-LUN" commands like
"report LUNs"?

> * Can't pass SCSI commands through if the backend storage driver is
>   bio-based instead of request-based.
>
> 4) Alternative approach: using PVSCSI instead:
> * Doubt PVSCSI can support as many types of backend storage devices as
>   xen-block.

pvSCSI won't need to support all types of backends. It's enough to
support those where passing through SCSI commands makes sense.

Seems to be a similar issue as the above-mentioned problem with
bio-based backend storage drivers.

> * Much longer path:
>   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
>   PVSCSI-backend -> Target framework (LIO?) ->
>
>   With xen-block we only need:
>   ioctl() -> blkfront -> blkback ->

I'd like to see performance numbers before making a decision.

> * xen-block has existed for many years, is widely used, and is more
>   stable.

Adding another SCSI passthrough capability wasn't accepted for pvSCSI
(that's the reason I used the Target Framework). Why do you think it
will be accepted for pvblk?

This is not my personal opinion, just a heads up from someone who had a
try already. ;-)

Juergen

> Welcome any input, thank you!
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
>  #define BLKIF_OP_INDIRECT          6
>
>  /*
> + * Recognised only if "feature-extra-request" is present in backend
> + * xenbus info.  A request with BLKIF_OP_EXTRA_FLAG indicates that an
> + * extra request follows in the shared ring buffer.
> + *
> + * In this way, extra data like a SCSI command, DIF/DIX and other
> + * per-request/bio data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is:
> + * Request 1:  xen_blkif_request
> + * [Request 2: xen_blkif_extra_request]  (only if request 1 has BLKIF_OP_EXTRA_FLAG)
> + * Request 3:  xen_blkif_request
> + * Request 4:  xen_blkif_request
> + * [Request 5: xen_blkif_extra_request]  (only if request 4 has BLKIF_OP_EXTRA_FLAG)
> + * ...
> + * Request N:  xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not*
> + * create the "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
>   * Maximum scatter/gather segments per request.
>   * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
>   * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
>  };
>  typedef struct blkif_request_indirect blkif_request_indirect_t;
>
> +enum blkif_extra_request_type {
> +    BLKIF_EXTRA_TYPE_SCSI_CMD = 1,      /* Transmit SCSI command.  */
> +};
> +
> +struct scsi_cmd_req {
> +    /*
> +     * Grant mapping for transmitting the SCSI command to the backend,
> +     * and also for receiving sense data from the backend.
> +     * One 4KB page is enough.
> +     */
> +    grant_ref_t cmd_gref;
> +    /* Length of SCSI command in the grant-mapped page. */
> +    unsigned int cmd_len;
> +
> +    /*
> +     * A SCSI command may require transmitting a data segment shorter
> +     * than a sector (512 bytes).
> +     * Record num_sg and the last segment length in the extra request
> +     * so that the backend knows about them.
> +     */
> +    unsigned int num_sg;
> +    unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request: must follow a normal request, and a normal request
> + * can only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> +    uint8_t type;          /* BLKIF_EXTRA_TYPE_* */
> +    uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> +    uint32_t _pad2;        /* offsetof(blkif_...,u.extra.id) == 8 */
> +#endif
> +    uint64_t id;
> +    struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> +    unsigned int resid_len;
> +    /* Length of sense data returned in grant-mapped page. */
> +    unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> +    uint8_t type;          /* BLKIF_EXTRA_TYPE_* */
> +    struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
>  struct blkif_response {
>      uint64_t id;           /* copied from request */
>      uint8_t operation;     /* copied from request */
>      int16_t status;        /* BLKIF_RSP_??? */
> +    struct blkif_response_extra extra;
>  };
>  typedef struct blkif_response blkif_response_t;
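The "wire" format described in the proposed comment (an extra request occupying the ring slot immediately after its flagged parent) can be illustrated with a small self-contained mock. The struct and macro names below are simplified stand-ins invented for illustration, not the real blkif definitions: a frontend that sets the flag must place the extra request in the very next slot, and a backend that sees the flag must consume two slots as one logical request.

```c
#include <assert.h>
#include <stdint.h>

#define OP_READ        0
#define OP_EXTRA_FLAG  0x80   /* stand-in for the proposed BLKIF_OP_EXTRA_FLAG */

struct mock_req {
    uint8_t  operation;   /* OP_*, possibly OR'd with OP_EXTRA_FLAG */
    uint64_t id;
};

#define RING_SIZE 8
static struct mock_req ring[RING_SIZE];
static unsigned prod, cons;

/* Frontend side: push a normal request; if it carries extra data,
 * set the flag and place the extra request in the next slot. */
static void push(uint8_t op, uint64_t id, int has_extra)
{
    ring[prod++ % RING_SIZE] = (struct mock_req){
        .operation = (uint8_t)(op | (has_extra ? OP_EXTRA_FLAG : 0)),
        .id = id,
    };
    if (has_extra)
        ring[prod++ % RING_SIZE] = (struct mock_req){ .operation = 0xff, .id = id };
}

/* Backend side: consume one logical request; returns 1 if it was
 * accompanied by an extra request in the following slot. */
static int pop(uint64_t *id)
{
    struct mock_req r = ring[cons++ % RING_SIZE];
    *id = r.id;
    if (r.operation & OP_EXTRA_FLAG) {
        struct mock_req extra = ring[cons++ % RING_SIZE];
        assert(extra.id == r.id);   /* extra request pairs with its parent */
        return 1;
    }
    return 0;
}
```

Note that this pairing is exactly why an unaware backend must never see the flag: it would misparse the extra slot as an ordinary request, which is what the "feature-extra-request" negotiation rule guards against.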
> -----Original Message-----
> From: Bob Liu [mailto:bob.liu@oracle.com]
> Sent: 29 February 2016 03:37
> To: xen-devel@lists.xen.org
> Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; jgross@suse.com;
> Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> Subject: [RFC PATCH] xen-block: introduces extra request to pass-through
> SCSI commands
>
> 1) What is this patch about?
> This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request, which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from
> frontend to backend, e.g. a Data Integrity Field per bio.
>
> 2) Why do we need this?
> Currently only raw data segments are transmitted from blkfront to
> blkback, which means some advanced features are lost.
> * The guest knows nothing about the features of the real backend storage.
>   For example, on bare metal the INQUIRY SCSI command can be used to
>   query storage device information. If it's an SSD or flash device we
>   have the option of using the device as a fast cache.
>   But this can't happen in current domU guests, because blkfront only
>   knows it's a normal virtual disk.
>

That's the sort of information that should be advertised via xenstore
then. There are already feature flags for specific things, but if some
form of throughput/latency information is meaningful to a frontend
stack then perhaps that can be advertised too.

> * Failover Clusters in Windows
>   Failover clusters require SCSI-3 persistent reservation target disks,
>   but currently this can't work in a domU.
>

That's true, but allowing arbitrary SCSI messages through is not the
way forward IMO. Just because Windows thinks every HBA is SCSI doesn't
mean other OSes do, so I think reservation/release should have
dedicated messages in the blkif protocol if it's desirable to support
clustering in the frontend.

> 3) Known issues:
> * Security: how to 'validate' this extra request payload.
>   E.g. SCSI operates on a LUN basis (the whole disk) while we really
>   just want to operate on partitions.
>
> * Can't pass SCSI commands through if the backend storage driver is
>   bio-based instead of request-based.
>
> 4) Alternative approach: using PVSCSI instead:
> * Doubt PVSCSI can support as many types of backend storage devices as
>   xen-block.
>

LIO can interface to any block device in much the same way blkback
does, IIRC.

> * Much longer path:
>   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
>   PVSCSI-backend -> Target framework (LIO?) ->
>
>   With xen-block we only need:
>   ioctl() -> blkfront -> blkback ->
>

...and what happens if the block device that blkback is talking to is a
SCSI LUN?

That latter path is also not true for Windows. You've got all the SCSI
translation logic in the frontend when using blkif, so that first path
would collapse to:

Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX

> * xen-block has existed for many years, is widely used, and is more
>   stable.
>

It's definitely widely used, but it has had stability issues in recent
times.

  Paul

> Welcome any input, thank you!
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
>  #define BLKIF_OP_INDIRECT          6
>
>  /*
> + * Recognised only if "feature-extra-request" is present in backend
> + * xenbus info.  A request with BLKIF_OP_EXTRA_FLAG indicates that an
> + * extra request follows in the shared ring buffer.
> + *
> + * In this way, extra data like a SCSI command, DIF/DIX and other
> + * per-request/bio data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is:
> + * Request 1:  xen_blkif_request
> + * [Request 2: xen_blkif_extra_request]  (only if request 1 has BLKIF_OP_EXTRA_FLAG)
> + * Request 3:  xen_blkif_request
> + * Request 4:  xen_blkif_request
> + * [Request 5: xen_blkif_extra_request]  (only if request 4 has BLKIF_OP_EXTRA_FLAG)
> + * ...
> + * Request N:  xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not*
> + * create the "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
>   * Maximum scatter/gather segments per request.
>   * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
>   * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
>  };
>  typedef struct blkif_request_indirect blkif_request_indirect_t;
>
> +enum blkif_extra_request_type {
> +    BLKIF_EXTRA_TYPE_SCSI_CMD = 1,      /* Transmit SCSI command.  */
> +};
> +
> +struct scsi_cmd_req {
> +    /*
> +     * Grant mapping for transmitting the SCSI command to the backend,
> +     * and also for receiving sense data from the backend.
> +     * One 4KB page is enough.
> +     */
> +    grant_ref_t cmd_gref;
> +    /* Length of SCSI command in the grant-mapped page. */
> +    unsigned int cmd_len;
> +
> +    /*
> +     * A SCSI command may require transmitting a data segment shorter
> +     * than a sector (512 bytes).
> +     * Record num_sg and the last segment length in the extra request
> +     * so that the backend knows about them.
> +     */
> +    unsigned int num_sg;
> +    unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request: must follow a normal request, and a normal request
> + * can only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> +    uint8_t type;          /* BLKIF_EXTRA_TYPE_* */
> +    uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> +    uint32_t _pad2;        /* offsetof(blkif_...,u.extra.id) == 8 */
> +#endif
> +    uint64_t id;
> +    struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> +    unsigned int resid_len;
> +    /* Length of sense data returned in grant-mapped page. */
> +    unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> +    uint8_t type;          /* BLKIF_EXTRA_TYPE_* */
> +    struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
>  struct blkif_response {
>      uint64_t id;           /* copied from request */
>      uint8_t operation;     /* copied from request */
>      int16_t status;        /* BLKIF_RSP_??? */
> +    struct blkif_response_extra extra;
>  };
>  typedef struct blkif_response blkif_response_t;
> --
> 1.7.10.4
On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Bob Liu [mailto:bob.liu@oracle.com]
> > Sent: 29 February 2016 03:37
> > To: xen-devel@lists.xen.org
> > Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; jgross@suse.com;
> > Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> > Subject: [RFC PATCH] xen-block: introduces extra request to
> > pass-through SCSI commands
> >
> > 1) What is this patch about?
> > This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> > A request with BLKIF_OP_EXTRA_FLAG set means the following request is
> > an extra request, which is used to pass through SCSI commands.
> > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > It can be extended easily to transmit other per-request/bio data from
> > frontend to backend, e.g. a Data Integrity Field per bio.
> >
> > 2) Why do we need this?
> > Currently only raw data segments are transmitted from blkfront to
> > blkback, which means some advanced features are lost.
> > * The guest knows nothing about the features of the real backend storage.
> >   For example, on bare metal the INQUIRY SCSI command can be used to
> >   query storage device information. If it's an SSD or flash device we
> >   have the option of using the device as a fast cache.
> >   But this can't happen in current domU guests, because blkfront only
> >   knows it's a normal virtual disk.
>
> That's the sort of information that should be advertised via xenstore
> then. There are already feature flags for specific things, but if some
> form of throughput/latency information is meaningful to a frontend
> stack then perhaps that can be advertised too.

Certainly could be put on the XenStore. Do you envision this being done
pre guest creation (so the toolstack does it), or the backend finding
this out and populating the XenStore keys?

Or that the frontend writes a XenStore key 'scsi-inq=vpd80' and the
backend responds by populating 'scsi-inq-vpd80=<binary blob>'? If so,
can XenStore accept binary payloads? Can it be more than 4K?

> > * Failover Clusters in Windows
> >   Failover clusters require SCSI-3 persistent reservation target
> >   disks, but currently this can't work in a domU.
>
> That's true, but allowing arbitrary SCSI messages through is not the
> way forward IMO. Just because Windows thinks every HBA is SCSI doesn't
> mean other OSes do, so I think reservation/release should have
> dedicated messages in the blkif protocol if it's desirable to support
> clustering in the frontend.

Could you expand a bit on the 'dedicated message' you have in mind,
please?

> > 3) Known issues:
> > * Security: how to 'validate' this extra request payload.
> >   E.g. SCSI operates on a LUN basis (the whole disk) while we really
> >   just want to operate on partitions.
> >
> > * Can't pass SCSI commands through if the backend storage driver is
> >   bio-based instead of request-based.
> >
> > 4) Alternative approach: using PVSCSI instead:
> > * Doubt PVSCSI can support as many types of backend storage devices
> >   as xen-block.
>
> LIO can interface to any block device in much the same way blkback
> does, IIRC.

But it can't do multipath or LVM - which is an integral component.
Anyhow, that is more of an implementation-specific quirk.

> > * Much longer path:
> >   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
> >   PVSCSI-backend -> Target framework (LIO?) ->
> >
> >   With xen-block we only need:
> >   ioctl() -> blkfront -> blkback ->
>
> ...and what happens if the block device that blkback is talking to is
> a SCSI LUN?
>
> That latter path is also not true for Windows. You've got all the SCSI
> translation logic in the frontend when using blkif, so that first path
> would collapse to:
>
> Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX

I don't know that the length of the path matters for, say, SCSI
INQUIRY. It isn't as if that is performance-sensitive. Neither are the
clustering SCSI commands.

> > * xen-block has existed for many years, is widely used, and is more
> >   stable.
>
> It's definitely widely used, but it has had stability issues in recent
> times.

Oh? Could you send the bug reports to me and Roger, CC xen-devel and
LKML, please?
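The xenstore negotiation being discussed here follows the usual blkif pattern: the backend writes feature nodes alongside the keys it already publishes (sector size, number of sectors), and the frontend treats an absent node as "unsupported". A toy in-memory sketch of that handshake is below; the paths and helper names are invented for illustration, and real code would go through libxenstore rather than this mock store.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A toy in-memory "xenstore": just enough to model the handshake. */
#define MAX_KEYS 16
static struct { char key[64]; char val[16]; } store[MAX_KEYS];
static int nkeys;

static void xs_write_mock(const char *key, const char *val)
{
    snprintf(store[nkeys].key, sizeof(store[nkeys].key), "%s", key);
    snprintf(store[nkeys].val, sizeof(store[nkeys].val), "%s", val);
    nkeys++;
}

static const char *xs_read_mock(const char *key)
{
    for (int i = 0; i < nkeys; i++)
        if (!strcmp(store[i].key, key))
            return store[i].val;
    return NULL;   /* absent node == feature not supported */
}

/* Backend: advertise the feature node only if it actually implements
 * the extra-request protocol (the path here is hypothetical). */
static void backend_init(int supports_extra)
{
    xs_write_mock("backend/vbd/1/51712/sector-size", "512");
    if (supports_extra)
        xs_write_mock("backend/vbd/1/51712/feature-extra-request", "1");
}

/* Frontend: only use BLKIF_OP_EXTRA_FLAG after seeing the node. */
static int frontend_may_use_extra(void)
{
    const char *v = xs_read_mock("backend/vbd/1/51712/feature-extra-request");
    return v && !strcmp(v, "1");
}
```

This also shows why the proposed comment insists an unaware backend must not create the node: the frontend's only signal is the node's presence.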
On Mon, Feb 29, 2016 at 09:12:30AM +0100, Juergen Gross wrote:
> On 29/02/16 04:37, Bob Liu wrote:
> > 1) What is this patch about?
> > This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> > A request with BLKIF_OP_EXTRA_FLAG set means the following request is
> > an extra request, which is used to pass through SCSI commands.
> > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > It can be extended easily to transmit other per-request/bio data from
> > frontend to backend, e.g. a Data Integrity Field per bio.
> >
> > 2) Why do we need this?
> > Currently only raw data segments are transmitted from blkfront to
> > blkback, which means some advanced features are lost.
> > * The guest knows nothing about the features of the real backend storage.
> >   For example, on bare metal the INQUIRY SCSI command can be used to
> >   query storage device information. If it's an SSD or flash device we
> >   have the option of using the device as a fast cache.
> >   But this can't happen in current domU guests, because blkfront only
> >   knows it's a normal virtual disk.
> >
> > * Failover Clusters in Windows
> >   Failover clusters require SCSI-3 persistent reservation target
> >   disks, but currently this can't work in a domU.
> >
> > 3) Known issues:
> > * Security: how to 'validate' this extra request payload.
> >   E.g. SCSI operates on a LUN basis (the whole disk) while we really
> >   just want to operate on partitions.
>
> It's not only validation: some operations just affect the whole LUN
> (e.g. Reserve/Release). And what about "multi-LUN" commands like
> "report LUNs"?

Don't expose them. Bob and I want to get an idea of what would be a
good compromise to allow some SCSI-specific (or perhaps ATA-specific or
DIF/DIX?) types of commands to go through the PV driver.

Would it be better if it went through XenBus? But that may not work for
some that are tied closely to requests, such as DIF/DIX.

However, 'DISCARD' for example worked out: it is an umbrella for both
the SCSI UNMAP and ATA DISCARD operations and hides the complexity of
the low-level protocol. Could there be an 'INQ'? Since SCSI VPD is the
most exhaustive in terms of details, it may make sense to base it on
that?

> > * Can't pass SCSI commands through if the backend storage driver is
> >   bio-based instead of request-based.
> >
> > 4) Alternative approach: using PVSCSI instead:
> > * Doubt PVSCSI can support as many types of backend storage devices
> >   as xen-block.
>
> pvSCSI won't need to support all types of backends. It's enough to
> support those where passing through SCSI commands makes sense.
>
> Seems to be a similar issue as the above-mentioned problem with
> bio-based backend storage drivers.

In particular the ones we care about are:
 - Multipath over FibreChannel devices.
 - Linear mapping (LVM) over the multipath.
 - And then potentially a filesystem on top of that
 - .. and a raw file on the filesystem.

Having the SCSI VPD 0x83 page sent to the frontend for that would be
good.

Not sure about SCSI reservations. Perhaps those are unique in that the
implementation would have to make sure that the guest owns the whole
LUN. But that is an implementation question.

This is about the design: how would you envision cramming SCSI
commands, DIF/DIX commands or ATA commands into the PV block layer?

> > * Much longer path:
> >   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
> >   PVSCSI-backend -> Target framework (LIO?) ->
> >
> >   With xen-block we only need:
> >   ioctl() -> blkfront -> blkback ->
>
> I'd like to see performance numbers before making a decision.

For SCSI INQ? <laughs>

Or are you talking about raw READ/WRITE?

> > * xen-block has existed for many years, is widely used, and is more
> >   stable.
>
> Adding another SCSI passthrough capability wasn't accepted for pvSCSI
> (that's the reason I used the Target Framework). Why do you think it
> will be accepted for pvblk?
>
> This is not my personal opinion, just a heads up from someone who had
> a try already. ;-)

Right. So SCSI passthrough is out. What about mediated access for
specific SCSI, specific ATA, or DIF/DIX commands? And how would you do
it, knowing the SCSI maintainers much better than we do?
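If a mediated 'INQ'-style operation were added, some component would end up producing or consuming VPD page 0x83 (Device Identification) data as discussed above. A hedged sketch of what parsing that page involves, following the SPC layout (the sample data in the test is invented; a real implementation would also iterate over all designation descriptors rather than taking the first):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Extract the first designation descriptor from a SCSI VPD page 0x83
 * (Device Identification) buffer, per SPC: byte 1 is the page code,
 * bytes 2-3 the big-endian page length, then a list of descriptors
 * each headed by code set, association/type, and designator length.
 * Returns the designator length, or -1 on a malformed page.
 */
static int vpd83_first_designator(const uint8_t *page, size_t len,
                                  uint8_t *out, size_t out_max)
{
    if (len < 8 || page[1] != 0x83)
        return -1;
    size_t page_len = ((size_t)page[2] << 8) | page[3];
    if (page_len + 4 > len)
        return -1;                       /* truncated buffer */
    const uint8_t *desc = page + 4;      /* first descriptor header */
    uint8_t desig_len = desc[3];
    if (desig_len > out_max || 4u + desig_len > page_len)
        return -1;
    memcpy(out, desc + 4, desig_len);    /* the identifier itself */
    return desig_len;
}
```

Whether this parsing lives in the frontend or is hidden behind an abstract "get identity" operation in the backend is exactly the design question being debated here.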
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: 29 February 2016 14:56
> To: Paul Durrant
> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeulich@suse.com;
> Roger Pau Monne; jgross@suse.com
> Subject: Re: [RFC PATCH] xen-block: introduces extra request to
> pass-through SCSI commands
>
> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Bob Liu [mailto:bob.liu@oracle.com]
> > > Sent: 29 February 2016 03:37
> > > To: xen-devel@lists.xen.org
> > > Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; jgross@suse.com;
> > > Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> > > Subject: [RFC PATCH] xen-block: introduces extra request to
> > > pass-through SCSI commands
> > >
> > > 1) What is this patch about?
> > > This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> > > A request with BLKIF_OP_EXTRA_FLAG set means the following request
> > > is an extra request, which is used to pass through SCSI commands.
> > > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > > It can be extended easily to transmit other per-request/bio data
> > > from frontend to backend, e.g. a Data Integrity Field per bio.
> > >
> > > 2) Why do we need this?
> > > Currently only raw data segments are transmitted from blkfront to
> > > blkback, which means some advanced features are lost.
> > > * The guest knows nothing about the features of the real backend
> > >   storage.
> > >   For example, on bare metal the INQUIRY SCSI command can be used
> > >   to query storage device information. If it's an SSD or flash
> > >   device we have the option of using the device as a fast cache.
> > >   But this can't happen in current domU guests, because blkfront
> > >   only knows it's a normal virtual disk.
> >
> > That's the sort of information that should be advertised via
> > xenstore then. There are already feature flags for specific things,
> > but if some form of throughput/latency information is meaningful to
> > a frontend stack then perhaps that can be advertised too.
>
> Certainly could be put on the XenStore. Do you envision this being
> done pre guest creation (so the toolstack does it), or the backend
> finding this out and populating the XenStore keys?
>
> Or that the frontend writes a XenStore key 'scsi-inq=vpd80' and the
> backend responds by populating 'scsi-inq-vpd80=<binary blob>'? If so,
> can XenStore accept binary payloads? Can it be more than 4K?
>

I was thinking more along the lines of blkback creating xenstore keys
with any relevant information. We have sector size and number of
sectors already, but I don't see any harm in having values for other
quantities that may be useful to a frontend. Bouncing SCSI inquiries
via xenstore was certainly not what I was thinking.

> > > * Failover Clusters in Windows
> > >   Failover clusters require SCSI-3 persistent reservation target
> > >   disks, but currently this can't work in a domU.
> >
> > That's true, but allowing arbitrary SCSI messages through is not the
> > way forward IMO. Just because Windows thinks every HBA is SCSI
> > doesn't mean other OSes do, so I think reservation/release should
> > have dedicated messages in the blkif protocol if it's desirable to
> > support clustering in the frontend.
>
> Could you expand a bit on the 'dedicated message' you have in mind,
> please?
>

As in: we create message types to reserve/release whatever backend is
being used, and the backend uses whatever mechanism is appropriate to
deal with that. E.g. if it were qdisk talking to a raw file then that
might just be a flock.

> > > 3) Known issues:
> > > * Security: how to 'validate' this extra request payload.
> > >   E.g. SCSI operates on a LUN basis (the whole disk) while we
> > >   really just want to operate on partitions.
> > >
> > > * Can't pass SCSI commands through if the backend storage driver
> > >   is bio-based instead of request-based.
> > >
> > > 4) Alternative approach: using PVSCSI instead:
> > > * Doubt PVSCSI can support as many types of backend storage
> > >   devices as xen-block.
> >
> > LIO can interface to any block device in much the same way blkback
> > does, IIRC.
>
> But it can't do multipath or LVM - which is an integral component.
>

Really? I was not aware of that limitation, and it surprises me, since
AFAIK LIO can also use a raw file as a backstore, which seems like it
would be above either of those.

> Anyhow, that is more of an implementation-specific quirk.
>
> > > * Much longer path:
> > >   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
> > >   PVSCSI-backend -> Target framework (LIO?) ->
> > >
> > >   With xen-block we only need:
> > >   ioctl() -> blkfront -> blkback ->
> >
> > ...and what happens if the block device that blkback is talking to
> > is a SCSI LUN?
> >
> > That latter path is also not true for Windows. You've got all the
> > SCSI translation logic in the frontend when using blkif, so that
> > first path would collapse to:
> >
> > Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX
>
> I don't know that the length of the path matters for, say, SCSI
> INQUIRY. It isn't as if that is performance-sensitive. Neither are the
> clustering SCSI commands.
>
> > > * xen-block has existed for many years, is widely used, and is
> > >   more stable.
> >
> > It's definitely widely used, but it has had stability issues in
> > recent times.
>
> Oh? Could you send the bug reports to me and Roger, CC xen-devel and
> LKML, please?

I was casting my mind back to incompatibilities that crept in with
persistent grants. TBH I haven't used blkback much since then; I tend
to use qemu qdisk as my backend these days.
Paul
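Paul's point that a dedicated reserve/release message pair "might just be a flock" for a raw-file backend can be sketched directly. The handler names below are hypothetical; the key property is that flock(2) locks belong to the open file description, so reservations taken through different opens of the same file conflict, which is the semantics a reservation needs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical backend handlers for dedicated RESERVE/RELEASE blkif
 * messages when the backing store is a raw file: map them onto
 * flock(2) instead of passing SCSI PERSISTENT RESERVE through.
 */
static int backend_reserve(int fd)
{
    /* Non-blocking: a held reservation fails fast with EWOULDBLOCK. */
    return flock(fd, LOCK_EX | LOCK_NB);   /* 0 == reservation granted */
}

static int backend_release(int fd)
{
    return flock(fd, LOCK_UN);
}
```

The appeal of this design is exactly what Paul describes: the frontend sees one abstract message, and each backend (LIO, qdisk, raw file) picks its own enforcement mechanism.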
On 29/02/16 16:05, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 29, 2016 at 09:12:30AM +0100, Juergen Gross wrote:
>> On 29/02/16 04:37, Bob Liu wrote:
>>> 1) What is this patch about?
>>> This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
>>> A request with BLKIF_OP_EXTRA_FLAG set means the following request
>>> is an extra request, which is used to pass through SCSI commands.
>>> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
>>> It can be extended easily to transmit other per-request/bio data
>>> from frontend to backend, e.g. a Data Integrity Field per bio.
>>>
>>> 2) Why do we need this?
>>> Currently only raw data segments are transmitted from blkfront to
>>> blkback, which means some advanced features are lost.
>>> * The guest knows nothing about the features of the real backend
>>>   storage.
>>>   For example, on bare metal the INQUIRY SCSI command can be used to
>>>   query storage device information. If it's an SSD or flash device
>>>   we have the option of using the device as a fast cache.
>>>   But this can't happen in current domU guests, because blkfront
>>>   only knows it's a normal virtual disk.
>>>
>>> * Failover Clusters in Windows
>>>   Failover clusters require SCSI-3 persistent reservation target
>>>   disks, but currently this can't work in a domU.
>>>
>>> 3) Known issues:
>>> * Security: how to 'validate' this extra request payload.
>>>   E.g. SCSI operates on a LUN basis (the whole disk) while we really
>>>   just want to operate on partitions.
>>
>> It's not only validation: some operations just affect the whole LUN
>> (e.g. Reserve/Release). And what about "multi-LUN" commands like
>> "report LUNs"?
>
> Don't expose them. Bob and I want to get an idea of what would be a
> good compromise to allow some SCSI-specific (or perhaps ATA-specific
> or DIF/DIX?) types of commands to go through the PV driver.
>
> Would it be better if it went through XenBus? But that may not work
> for some that are tied closely to requests, such as DIF/DIX.
>
> However, 'DISCARD' for example worked out: it is an umbrella for both
> the SCSI UNMAP and ATA DISCARD operations and hides the complexity of
> the low-level protocol. Could there be an 'INQ'? Since SCSI VPD is the
> most exhaustive in terms of details, it may make sense to base it on
> that?
>
>>> * Can't pass SCSI commands through if the backend storage driver is
>>>   bio-based instead of request-based.
>>>
>>> 4) Alternative approach: using PVSCSI instead:
>>> * Doubt PVSCSI can support as many types of backend storage devices
>>>   as xen-block.
>>
>> pvSCSI won't need to support all types of backends. It's enough to
>> support those where passing through SCSI commands makes sense.
>>
>> Seems to be a similar issue as the above-mentioned problem with
>> bio-based backend storage drivers.
>
> In particular the ones we care about are:
>  - Multipath over FibreChannel devices.
>  - Linear mapping (LVM) over the multipath.
>  - And then potentially a filesystem on top of that
>  - .. and a raw file on the filesystem.
>
> Having the SCSI VPD 0x83 page sent to the frontend for that would be
> good.
>
> Not sure about SCSI reservations. Perhaps those are unique in that the
> implementation would have to make sure that the guest owns the whole
> LUN. But that is an implementation question.
>
> This is about the design: how would you envision cramming SCSI
> commands, DIF/DIX commands or ATA commands into the PV block layer?

Have some kind of abstraction which can be mapped to SCSI commands
easily, but don't stick to the SCSI definitions. Use your own
structures, commands etc. and build SCSI commands from those in the
backend. This way you avoid having to emulate SCSI.

Instead of naming it "SCSI passthrough", call it "special commands". :-)

I would add only those operations you really need. Add an inquiry
operation which returns the supported "special commands". The
availability of the inquiry can be reported via Xenstore.

>
>>> * Much longer path:
>>>   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
>>>   PVSCSI-backend -> Target framework (LIO?) ->
>>>
>>>   With xen-block we only need:
>>>   ioctl() -> blkfront -> blkback ->
>>
>> I'd like to see performance numbers before making a decision.
>
> For SCSI INQ? <laughs>
>
> Or are you talking about raw READ/WRITE?

READ/WRITE. I'm quite sure pvSCSI will be slower than pvblk, but I
don't know how much slower.

>>
>>> * xen-block has existed for many years, is widely used, and is more
>>>   stable.
>>
>> Adding another SCSI passthrough capability wasn't accepted for pvSCSI
>> (that's the reason I used the Target Framework). Why do you think it
>> will be accepted for pvblk?
>>
>> This is not my personal opinion, just a heads up from someone who had
>> a try already. ;-)
>
> Right. So SCSI passthrough is out. What about mediated access for
> specific SCSI, specific ATA, or DIF/DIX commands? And how would you do
> it, knowing the SCSI maintainers much better than we do?

Don't call it SCSI command passthrough, or use the target framework.
As stated above: I think renaming the whole feature is better. You'll
avoid much trouble and weird configuration problems.

Juergen
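Juergen's "special commands" idea (frontend speaks its own small vocabulary; only the backend knows SCSI) might look roughly like the sketch below. All the `BLKIF_SPECIAL_*` names are invented for illustration; the CDB layouts follow the SPC definitions of INQUIRY and PERSISTENT RESERVE OUT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical abstract operations exposed to the frontend. */
enum blkif_special_op {
    BLKIF_SPECIAL_GET_ID = 1,   /* backend maps to INQUIRY, VPD page 0x83 */
    BLKIF_SPECIAL_RESERVE,      /* backend maps to PERSISTENT RESERVE OUT */
};

/*
 * Backend-side translation: build a SCSI CDB from an abstract op.
 * Returns the CDB length, or -1 for an unknown op.  A bio-based or
 * file-based backend would instead map the same ops to its own
 * mechanism (cf. the flock discussion above) - that is the point
 * of keeping SCSI out of the frontend protocol.
 */
static int build_cdb(enum blkif_special_op op, uint8_t cdb[16])
{
    memset(cdb, 0, 16);
    switch (op) {
    case BLKIF_SPECIAL_GET_ID:
        cdb[0] = 0x12;          /* INQUIRY */
        cdb[1] = 0x01;          /* EVPD bit set */
        cdb[2] = 0x83;          /* Device Identification VPD page */
        cdb[4] = 0xff;          /* allocation length: 255 bytes */
        return 6;               /* 6-byte CDB */
    case BLKIF_SPECIAL_RESERVE:
        cdb[0] = 0x5f;          /* PERSISTENT RESERVE OUT */
        cdb[1] = 0x01;          /* service action: RESERVE */
        cdb[2] = 0x05;          /* scope LU_SCOPE, type WrExcl-RegOnly */
        cdb[8] = 0x18;          /* parameter list length: 24 bytes */
        return 10;              /* 10-byte CDB */
    default:
        return -1;
    }
}
```

An inquiry operation listing which special commands a backend supports, as Juergen suggests, would then just be a third abstract op whose availability is advertised via Xenstore.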
El 29/2/16 a les 16:28, Paul Durrant ha escrit: >> -----Original Message----- >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] >> Sent: 29 February 2016 14:56 >> To: Paul Durrant >> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeulich@suse.com; Roger >> Pau Monne; jgross@suse.com >> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass- >> through SCSI commands >> >> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote: >>>> -----Original Message----- >>>> From: Bob Liu [mailto:bob.liu@oracle.com] >>>> Sent: 29 February 2016 03:37 >>>> To: xen-devel@lists.xen.org >>>> Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; >> jgross@suse.com; >>>> Paul Durrant; konrad.wilk@oracle.com; Bob Liu >>>> Subject: [RFC PATCH] xen-block: introduces extra request to pass- >>>> * xen-block has been existed for many years, widely used and more >> stable. >>>> >>> >>> It's definitely widely used, but it has had stability issues in recent times. >> >> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and >> LKML please ? > > I was casting my mind back to incompatibilities that crept in with persistent grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as my backend these days. FWIW, QEMU Qdisk also has persistent grants support... and was implemented more or less at the same time as blkback persistent grants. Roger.
Bob Liu writes ("[RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> 4) Alternative approach: Using PVSCSI instead:
> * Doubt PVSCSI can support as many type of backend storage devices
>   as Xen-block.

I don't understand why this would be the case. Your "extra requests"
are SCSI CDBs, so your proposal only works with SCSI targets.

> * Much longer path:
> ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->

The pvscsi pieces here are precisely the multiplexer that you really
want, for this all to work correctly. If you want your guests to be
able to drive this thing properly, you need to make the thing look
like a SCSI target anyway. So effectively you've reinvented PVSCSI.

When you say "longer path" do you really mean that the performance is
poor? I don't think this is due to the number of layers here. The
layers are quite thin. If there is a performance problem with PVSCSI
then perhaps that should be investigated.

Ian.
Ian Jackson writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> [stuff suggesting use of PVSCSI instead]
For the avoidance of doubt:
1. Thanks very much for bringing this proposal to us at the concept
stage. It is much easier to discuss these matters in a constructive
way before a lot of effort has been put into an implementation.
2. I should explain the downsides which I see in your proposal:
- Your suggestion has bad security properties: previously, the PV
block protocol would present only a very simple and narrow
interface. Your SCSI CDB passthrough proposal means that guests
would be able to activate features in SCSI targets which would be
unexpected and unintended by the host administrator. Such features
would perhaps even be unknown to the host administrator.
This could be mitigated by making this feature configurable, of
course, defaulting to off, along with clear documentation. But it's
not a desirable property.
- For similar reasons it will often be difficult to use such a feature
safely. Guest software in particular might expect that it can
safely use whatever features it can see, and do all sorts of
exciting things.
- It involves duplicating multiplexing logic which already exists in
PVSCSI.
Ian.
On Mon, Feb 29, 2016 at 04:35:36PM +0100, Roger Pau Monné wrote: > El 29/2/16 a les 16:28, Paul Durrant ha escrit: > >> -----Original Message----- > >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] > >> Sent: 29 February 2016 14:56 > >> To: Paul Durrant > >> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeulich@suse.com; Roger > >> Pau Monne; jgross@suse.com > >> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass- > >> through SCSI commands > >> > >> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote: > >>>> -----Original Message----- > >>>> From: Bob Liu [mailto:bob.liu@oracle.com] > >>>> Sent: 29 February 2016 03:37 > >>>> To: xen-devel@lists.xen.org > >>>> Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; > >> jgross@suse.com; > >>>> Paul Durrant; konrad.wilk@oracle.com; Bob Liu > >>>> Subject: [RFC PATCH] xen-block: introduces extra request to pass- > >>>> * xen-block has been existed for many years, widely used and more > >> stable. > >>>> > >>> > >>> It's definitely widely used, but it has had stability issues in recent times. > >> > >> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and > >> LKML please ? > > > > I was casting my mind back to incompatibilities that crept in with persistent grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as my backend these days. > > FWIW, QEMU Qdisk also has persistent grants support... and was > implemented more or less at the same time as blkback persistent grants. <nods> Perhaps Pauls' definition of 'recent times' means == two years or so :-) :-) > > Roger. >
> -----Original Message----- [snip] > > > I was casting my mind back to incompatibilities that crept in with > persistent grants. TBH I haven't used blkback much since then; I tend to use > qemu qdisk as my backend these days. > > > > FWIW, QEMU Qdisk also has persistent grants support... and was > > implemented more or less at the same time as blkback persistent grants. > > <nods> > > Perhaps Pauls' definition of 'recent times' means == two years or so :-) > > :-) Indeed. That's a blink of an eye for someone as ancient as me :-) Paul > > > > Roger. > >
On 03/01/2016 12:29 AM, Ian Jackson wrote: > Ian Jackson writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): >> [stuff suggesting use of PVSCSI instead] > > For the avoidance of doubt: > > 1. Thanks very much for bringing this proposal to us at the concept > stage. It is much easier to discuss these matters in a constructive > way before a lot of effort has been put into an implementation. > > 2. I should explain the downsides which I see in your proposal: > > - Your suggestion has bad security properties: previously, the PV > block protocol would present only a very simple and narrow > interface. Your SCSI CDB passthrough proposal means that guests > would be able to activate features in SCSI targets which would be > unexpected and unintended by the host administrator. Such features > would perhaps even be unknown to the host administrator. > > This could be mitigated by making this feature configurable, of > course, defaulting to off, along with clear documentation. But it's > not a desirable property. > > - For similar reasons it will often be difficult to use such a feature > safely. Guest software in particular might expect that it can > safely use whatever features it can see, and do all sorts of > exciting things. > > - It involves duplicating multiplexing logic which already exists in > PVSCSI. > One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device. E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk). P.S. Thanks to all of you, it helps a lot!
Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): > One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device. > E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk). I don't think you can use pvscsi to passthrough a partition such as /dev/sda1. Such a thing is not a SCSI command target. Ian.
On 01/03/16 19:08, Ian Jackson wrote: > Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): >> One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device. >> E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk). > > I don't think you can use pvscsi to passthrough a partition such as > /dev/sda1. Such a thing is not a SCSI command target. It might be possible via the fileio target backend. In this case LUN based SCSI operations are ignored/refused/emulated by LIO. Juergen
Hi Juergen, On 03/02/2016 03:39 PM, Juergen Gross wrote: > On 01/03/16 19:08, Ian Jackson wrote: >> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): >>> One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device. >>> E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk). >> >> I don't think you can use pvscsi to passthrough a partition such as >> /dev/sda1. Such a thing is not a SCSI command target. > > It might be possible via the fileio target backend. In this case LUN > based SCSI operations are ignored/refused/emulated by LIO. > Do you know whether pvscsi can work on top of multipath(the device-mapper framework) or LVMs? Thank you! Bob
Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> Do you know whether pvscsi can work on top of multipath (the device-mapper framework) or LVMs?
No, it can't. devmapper and LVM work with the block device
abstraction.
Implicitly you seem to be suggesting that you want to use dm-multipath
and LVM, but also send other SCSI CDBs from the upper layers through
to the underlying SCSI storage target.
I can't see how that could cause anything but pain. In many cases
"the underlying SCSI storage target" wouldn't be well defined. Even
if it was, these side channel SCSI commands are likely to Go Wrong in
exciting ways.
What SCSI commands do you want to send ?
Ian.
> -----Original Message----- > From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com] > Sent: 02 March 2016 11:41 > To: Bob Liu > Cc: Juergen Gross; xen-devel@lists.xen.org; jbeulich@suse.com; Roger Pau > Monne; Paul Durrant; konrad.wilk@oracle.com > Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass- > through SCSI commands > > Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to > pass-through SCSI commands"): > > Do you know whether pvscsi can work on top of multipath(the device- > mapper framework) or LVMs? > > No, it can't. devmapper and LVM work with the block device > abstraction. > Google suggests you can use LIO to present an LV as a virtual SCSI LUN, in which case pvscsi should work. Paul > Implicitly you seem to be suggesting that you want to use dm-multipath > and LVM, but also send other SCSI CDBs from the upper layers through > to the underlying SCSI storage target. > > I can't see how that could cause anything but pain. In many cases > "the underlying SCSI storage target" wouldn't be well defined. Even > if it was, these side channel SCSI commands are likely to Go Wrong in > exciting ways. > > What SCSI commands do you want to send ? > > Ian.
On 02/03/16 12:46, Paul Durrant wrote: >> -----Original Message----- >> From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com] >> Sent: 02 March 2016 11:41 >> To: Bob Liu >> Cc: Juergen Gross; xen-devel@lists.xen.org; jbeulich@suse.com; Roger Pau >> Monne; Paul Durrant; konrad.wilk@oracle.com >> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass- >> through SCSI commands >> >> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to >> pass-through SCSI commands"): >>> Do you know whether pvscsi can work on top of multipath(the device- >> mapper framework) or LVMs? >> >> No, it can't. devmapper and LVM work with the block device >> abstraction. >> > > Google suggests you can use LIO to present an LV as a virtual SCSI LUN, in which case pvscsi should work. Indeed: http://linux-iscsi.org/wiki/Targetcli#IBLOCK Juergen
On 03/02/2016 07:40 PM, Ian Jackson wrote:
> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
>> Do you know whether pvscsi can work on top of multipath (the device-mapper framework) or LVMs?
>
> No, it can't. devmapper and LVM work with the block device
> abstraction.
>
> Implicitly you seem to be suggesting that you want to use dm-multipath
> and LVM, but also send other SCSI CDBs from the upper layers through
> to the underlying SCSI storage target.
>

Exactly!

> I can't see how that could cause anything but pain. In many cases
> "the underlying SCSI storage target" wouldn't be well defined. Even
> if it was, these side channel SCSI commands are likely to Go Wrong in
> exciting ways.
>
> What SCSI commands do you want to send ?
>

 * INQUIRY
 * PERSISTENT RESERVE IN
 * PERSISTENT RESERVE OUT

This is for Failover Clusters in Windows; I'm not sure whether more
commands are required, as I couldn't find a list of required SCSI
commands in the failover documentation.
Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): > On 03/02/2016 07:40 PM, Ian Jackson wrote: > > I can't see how that could cause anything but pain. In many cases > > "the underlying SCSI storage target" wouldn't be well defined. Even > > if it was, these side channel SCSI commands are likely to Go Wrong in > > exciting ways. > > > > What SCSI commands do you want to send ? > > * INQUIRY ... but why ? > * PERSISTENT RESERVE IN > * PERSISTENT RESERVE OUT > > This is for Failover Clusters in Windows, not sure whether more > commands are required. I didn't get a required scsi commands list > in the failover document. So you want to be able to reserve the volume against concurrent access ? If you're using LVM, such a reservation should apply to the LVM LV, not to the underlying physical storage device, clearly. So I think LIO [1] + PVSCSI might be what you want. Ian. [1] http://linux-iscsi.org/wiki/LIO
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: 02 March 2016 17:23
> To: Ian Jackson
> Cc: Bob Liu; Juergen Gross; xen-devel@lists.xen.org; jbeulich@suse.com;
> Roger Pau Monne; Paul Durrant
> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-
> through SCSI commands
>
> On Wed, Mar 02, 2016 at 02:44:12PM +0000, Ian Jackson wrote:
> > Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to
> pass-through SCSI commands"):
> > > On 03/02/2016 07:40 PM, Ian Jackson wrote:
> > > > I can't see how that could cause anything but pain. In many cases
> > > > "the underlying SCSI storage target" wouldn't be well defined. Even
> > > > if it was, these side channel SCSI commands are likely to Go Wrong in
> > > > exciting ways.
> > > >
> > > > What SCSI commands do you want to send ?
> > >
> > > * INQUIRY
> >
> > ... but why ?
>
> Need to expose VPD information to the guest for application usage.
>
> >
> > > * PERSISTENT RESERVE IN
> > > * PERSISTENT RESERVE OUT
> > >
> > > This is for Failover Clusters in Windows, not sure whether more
> > > commands are required. I didn't get a required scsi commands list
> > > in the failover document.
> >
> > So you want to be able to reserve the volume against concurrent
> > access ? If you're using LVM, such a reservation should apply to the
> > LVM LV, not to the underlying physical storage device, clearly. So I
> > think LIO [1] + PVSCSI might be what you want.
> >
>
> Except it would mean writing a new Windows PV driver and replacing the
> existing block one.

Actually, that's quite desirable. The existing driver is actually a
SCSI HBA driver (because that's all you can write if you use the
Windows STORPORT wrapper... which you need to if you want to pass
WHQL) so making it use a PV scsi protocol would probably make it
smaller, since all the scsi <-> blkif translation code could be ripped
out and all the stuff to synthesize INQUIRY responses based on vbd
data in xenstore could go away too.

Paul

> >
> > Ian.
> >
> > [1] http://linux-iscsi.org/wiki/LIO
Paul Durrant writes ("RE: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): > Actually, that's quite desirable. The existing driver is actually a > SCSI HBA driver (because that's all you can write is you use thr > Windows STORPORT wrapper... which you need to if you want to pass > WHQL) so making it use a PV scsi protocol would probably make it > smaller since all the scsi <-> blkif translation code could be > ripped out and all the stuff to synthesize INQUIRY responses based > on vbd data in xenstore could go away too. Do non-Linux hosts have something like LIO ? Ian.
On 03/03/16 13:03, Ian Jackson wrote: > Paul Durrant writes ("RE: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"): >> Actually, that's quite desirable. The existing driver is actually a >> SCSI HBA driver (because that's all you can write is you use thr >> Windows STORPORT wrapper... which you need to if you want to pass >> WHQL) so making it use a PV scsi protocol would probably make it >> smaller since all the scsi <-> blkif translation code could be >> ripped out and all the stuff to synthesize INQUIRY responses based >> on vbd data in xenstore could go away too. > > Do non-Linux hosts have something like LIO ? Are there any pvSCSI backends other than the one in Linux? Juergen
On March 3, 2016 7:25:11 AM EST, Juergen Gross <jgross@suse.com> wrote:
>On 03/03/16 13:03, Ian Jackson wrote:
>> Paul Durrant writes ("RE: [RFC PATCH] xen-block: introduces extra
>request to pass-through SCSI commands"):
>>> Actually, that's quite desirable. The existing driver is actually a
>>> SCSI HBA driver (because that's all you can write if you use the
>>> Windows STORPORT wrapper... which you need to if you want to pass
>>> WHQL) so making it use a PV scsi protocol would probably make it
>>> smaller since all the scsi <-> blkif translation code could be
>>> ripped out and all the stuff to synthesize INQUIRY responses based
>>> on vbd data in xenstore could go away too.
>>

Wait a minute? You have that in the driver? Is this documented somewhere?
Did I miss this in blkif.h?

>> Do non-Linux hosts have something like LIO ?
>
>Are there any pvSCSI backends other than the one in Linux?
>

Not that I know of.
> -----Original Message----- > From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of > Konrad Rzeszutek Wilk > Sent: 03 March 2016 14:07 > To: Juergen Gross; Ian Jackson; Paul Durrant > Cc: xen-devel@lists.xen.org; jbeulich@suse.com; Roger Pau Monne > Subject: Re: [Xen-devel] [RFC PATCH] xen-block: introduces extra request to > pass-through SCSI commands > > On March 3, 2016 7:25:11 AM EST, Juergen Gross <jgross@suse.com> wrote: > >On 03/03/16 13:03, Ian Jackson wrote: > >> Paul Durrant writes ("RE: [RFC PATCH] xen-block: introduces extra > >request to pass-through SCSI commands"): > >>> Actually, that's quite desirable. The existing driver is actually a > >>> SCSI HBA driver (because that's all you can write is you use thr > >>> Windows STORPORT wrapper... which you need to if you want to pass > >>> WHQL) so making it use a PV scsi protocol would probably make it > >>> smaller since all the scsi <-> blkif translation code could be > >>> ripped out and all the stuff to synthesize INQUIRY responses based > >>> on vbd data in xenstore could go away too. > >> > > Wait a minute? You have that in the driver? Is this documented somewhere? > Did I miss this in blkif.h? > Yes, the Windows XENVBD driver has a whole load of code for synthesizing INQUIRY responses since that's how Windows probes for disks. There's nothing in blkif.h for any of this though since it’s an implementation detail of the Windows frontend. However, if you want to take a look, go to http://xenbits.xen.org/gitweb/?p=pvdrivers/win/xenvbd.git;a=blob;f=src/xenvbd/pdoinquiry.c. There's even code in there for base64 decoding VPD pages from xenstore! (That's a XenServer implementation artefact... I'm not recommending it as a way forward ;-)) Paul > > >> Do non-Linux hosts have something like LIO ? > > > >Are there any pvSCSI backends other than the one in Linux? > > > > Not that I know off. 
diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
index 99f0326..7c10bce 100644
--- a/xen/include/public/io/blkif.h
+++ b/xen/include/public/io/blkif.h
@@ -635,6 +635,28 @@
 #define BLKIF_OP_INDIRECT          6

 /*
+ * Recognised only if "feature-extra-request" is present in backend xenbus info.
+ * A request with BLKIF_OP_EXTRA_FLAG set indicates that an extra request
+ * follows in the shared ring buffer.
+ *
+ * In this way, extra data such as SCSI commands, DIF/DIX and other
+ * per-request/bio data can be transmitted from frontend to backend.
+ *
+ * The 'wire' format is like:
+ * Request 1: xen_blkif_request
+ * [Request 2: xen_blkif_extra_request] (only if request 1 has BLKIF_OP_EXTRA_FLAG)
+ * Request 3: xen_blkif_request
+ * Request 4: xen_blkif_request
+ * [Request 5: xen_blkif_extra_request] (only if request 4 has BLKIF_OP_EXTRA_FLAG)
+ * ...
+ * Request N: xen_blkif_request
+ *
+ * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create
+ * the "feature-extra-request" node!
+ */
+#define BLKIF_OP_EXTRA_FLAG (0x80)
+
+/*
  * Maximum scatter/gather segments per request.
  * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
  * NB. This could be 12 if the ring indexes weren't stored in the same page.
@@ -703,10 +725,61 @@ struct blkif_request_indirect {
 };
 typedef struct blkif_request_indirect blkif_request_indirect_t;

+enum blkif_extra_request_type {
+    BLKIF_EXTRA_TYPE_SCSI_CMD = 1,      /* Transmit SCSI command. */
+};
+
+struct scsi_cmd_req {
+    /*
+     * Grant mapping for transmitting the SCSI command to the backend,
+     * and also for receiving sense data from the backend.
+     * One 4KB page is enough.
+     */
+    grant_ref_t cmd_gref;
+    /* Length of SCSI command in the grant mapped page. */
+    unsigned int cmd_len;
+
+    /*
+     * A SCSI command may require transmitting a data segment shorter
+     * than a sector (512 bytes).
+     * Record num_sg and the last segment length in the extra request
+     * so that the backend knows about them.
+     */
+    unsigned int num_sg;
+    unsigned int last_sg_len;
+};
+
+/*
+ * Extra request; must follow a normal request, and a normal request
+ * can only be followed by one extra request.
+ */
+struct blkif_request_extra {
+    uint8_t type;          /* BLKIF_EXTRA_TYPE_* */
+    uint16_t _pad1;
+#ifndef CONFIG_X86_32
+    uint32_t _pad2;        /* offsetof(blkif_...,u.extra.id) == 8 */
+#endif
+    uint64_t id;
+    struct scsi_cmd_req scsi_cmd;
+} __attribute__((__packed__));
+typedef struct blkif_request_extra blkif_request_extra_t;
+
+struct scsi_cmd_res {
+    unsigned int resid_len;
+    /* Length of sense data returned in grant mapped page. */
+    unsigned int sense_len;
+};
+
+struct blkif_response_extra {
+    uint8_t type;          /* BLKIF_EXTRA_TYPE_* */
+    struct scsi_cmd_res scsi_cmd;
+} __attribute__((__packed__));
+
 struct blkif_response {
     uint64_t        id;              /* copied from request */
     uint8_t         operation;       /* copied from request */
     int16_t         status;          /* BLKIF_RSP_??? */
+    struct blkif_response_extra extra;
 };
 typedef struct blkif_response blkif_response_t;
1) What is this patch about?
This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
extra request which is used to pass through SCSI commands.
This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
It can be extended easily to transmit other per-request/bio data from frontend
to backend, e.g. a Data Integrity Field per bio.

2) Why do we need this?
Currently only raw data segments are transmitted from blkfront to blkback, which
means some advanced features are lost.
 * The guest knows nothing about the features of the real backend storage.
   For example, on bare metal the INQUIRY SCSI command can be used
   to query storage device information. If it's an SSD or flash device we
   have the option to use the device as a fast cache.
   But this can't happen in current domU guests, because blkfront only
   knows it's a normal virtual disk.

 * Failover Clusters in Windows.
   Failover clusters require SCSI-3 persistent reservation target disks,
   but this can't work in domU today.

3) Known issues:
 * Security: how to 'validate' this extra request payload.
   E.g. SCSI operates on a LUN basis (the whole disk) while we really just
   want to operate on partitions.

 * SCSI commands can't be passed through if the backend storage driver is
   bio-based instead of request-based.

4) Alternative approach: using PVSCSI instead:
 * Doubtful that PVSCSI can support as many types of backend storage devices
   as xen-block.

 * Much longer path:
   ioctl() -> SCSI upper layer -> middle layer -> PVSCSI-frontend -> PVSCSI-backend -> target framework (LIO?) ->
   With xen-block we only need:
   ioctl() -> blkfront -> blkback ->

 * xen-block has existed for many years, and is widely used and more stable.

Welcome any input, thank you!

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)