Message ID | 20230317195938.1745318-1-bvanassche@acm.org (mailing list archive) |
---|---|
Series | Submit split bios in LBA order |
On Fri, Mar 17, 2023 at 12:59:36PM -0700, Bart Van Assche wrote:
> Hi Jens,
>
> For zoned storage it is essential that split bios are submitted in LBA order.
> This patch series realizes this by modifying __bio_split_to_limits() such that
> it submits the first bio fragment and returns the remainder instead of
> submitting the remainder and returning the first bio fragment. Please consider
> this patch series for the next merge window.

Why are you sending large writes using REQ_OP_WRITE and not using
REQ_OP_ZONE_APPEND which side steps all these issues?
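The ordering concern in the quoted cover letter can be illustrated with a toy model. This is a sketch in Python, not kernel code: a bio is reduced to an `(lba, nr_sectors)` tuple, `max_sectors` stands in for the queue limits, and the function names are hypothetical. It assumes that "submit" enqueues a fragment immediately and that the caller enqueues the returned fragment afterwards; the real block layer may defer resubmitted bios, so this shows only the ordering described in the cover letter, not actual kernel behaviour.

```python
def split(bio, max_sectors):
    """Split (lba, nr_sectors) into a first fragment within the limit and a remainder."""
    lba, nr = bio
    if nr <= max_sectors:
        return bio, None
    return (lba, max_sectors), (lba + max_sectors, nr - max_sectors)


def submit_remainder_first(bio, max_sectors, queue):
    """Model of 'submit the remainder and return the first fragment'."""
    first, rest = split(bio, max_sectors)
    if rest is not None:
        # The remainder is submitted (and split further) before the
        # caller gets to queue the first fragment.
        submit_remainder_first(rest, max_sectors, queue)
    queue.append(first)


def submit_first_fragment(bio, max_sectors, queue):
    """Model of 'submit the first fragment and return the remainder'."""
    while bio is not None:
        first, bio = split(bio, max_sectors)
        queue.append(first)  # fragments reach the queue in ascending LBA order


old_order, new_order = [], []
submit_remainder_first((0, 10), 4, old_order)
submit_first_fragment((0, 10), 4, new_order)
print(old_order)
print(new_order)
```

Under these assumptions only the second variant hands the fragments to the queue in ascending LBA order, which is the property the series asks for.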
On 3/17/23 23:29, Christoph Hellwig wrote:
> On Fri, Mar 17, 2023 at 12:59:36PM -0700, Bart Van Assche wrote:
>> For zoned storage it is essential that split bios are submitted in LBA order.
>> This patch series realizes this by modifying __bio_split_to_limits() such that
>> it submits the first bio fragment and returns the remainder instead of
>> submitting the remainder and returning the first bio fragment. Please consider
>> this patch series for the next merge window.
>
> Why are you sending large writes using REQ_OP_WRITE and not
> using REQ_OP_ZONE_APPEND which side steps all these issues?

Hi Christoph,

How to achieve optimal performance with REQ_OP_ZONE_APPEND for SCSI
devices? My understanding of how REQ_OP_ZONE_APPEND works for SCSI
devices is as follows:
* ATA devices cannot support this operation directly because there are
  not enough bits in the ATA sense data to report where appended data
  has been written.
* T10 has not yet started with standardizing a zone append operation.
* The code that emulates REQ_OP_ZONE_APPEND for SCSI devices (in
  sd_zbc.c) serializes REQ_OP_ZONE_APPEND operations (QD=1).
* To achieve optimal performance, QD > 1 is required.

Thanks,

Bart.
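For readers unfamiliar with the two operations being compared, the difference can be sketched with a toy zone model. This is an illustration in Python, not driver code: the `Zone` class and its method names are hypothetical. It relies on the general semantics of sequential-write-required zones, where a regular write must start exactly at the zone's write pointer, while a zone append lets the device pick the location and report it back.

```python
class Zone:
    """Toy sequential-write-required zone (hypothetical model)."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.wp = start  # write pointer: next LBA that may be written

    def _check_room(self, nr):
        if self.wp + nr > self.start + self.size:
            raise ValueError("zone full")

    def write(self, lba, nr):
        """Regular write: the caller chooses the LBA, so order matters."""
        if lba != self.wp:
            raise ValueError(f"unaligned write: lba={lba}, wp={self.wp}")
        self._check_room(nr)
        self.wp += nr

    def append(self, nr):
        """Zone append: the device chooses the LBA and returns it."""
        self._check_room(nr)
        lba = self.wp
        self.wp += nr
        return lba


zone = Zone(start=0, size=16)
zone.write(0, 4)            # succeeds: starts at the write pointer
try:
    zone.write(8, 4)        # out of order: the write at LBA 4 has not arrived
except ValueError as err:
    print("rejected:", err)
print(zone.append(4))       # the device reports where the data landed
```

In this model, reordered regular writes fail, whereas appends succeed in any arrival order because the written location is only known on completion. That returned location is the piece of information the bullets above say ATA sense data cannot carry.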
On Mon, Mar 20, 2023 at 10:28 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 3/17/23 23:29, Christoph Hellwig wrote:
> > On Fri, Mar 17, 2023 at 12:59:36PM -0700, Bart Van Assche wrote:
> >> For zoned storage it is essential that split bios are submitted in LBA order.
> >> This patch series realizes this by modifying __bio_split_to_limits() such that
> >> it submits the first bio fragment and returns the remainder instead of
> >> submitting the remainder and returning the first bio fragment. Please consider
> >> this patch series for the next merge window.
> >
> > Why are you sending large writes using REQ_OP_WRITE and not
> > using REQ_OP_ZONE_APPEND which side steps all these issues?
>
> Hi Christoph,
>
> How to achieve optimal performance with REQ_OP_ZONE_APPEND for SCSI
> devices? My understanding of how REQ_OP_ZONE_APPEND works for SCSI
> devices is as follows:
> * ATA devices cannot support this operation directly because there are
>   not enough bits in the ATA sense data to report where appended data
>   has been written.
> * T10 has not yet started with standardizing a zone append operation.
> * The code that emulates REQ_OP_ZONE_APPEND for SCSI devices (in
>   sd_zbc.c) serializes REQ_OP_ZONE_APPEND operations (QD=1).
> * To achieve optimal performance, QD > 1 is required.

I recall there were dragons lurking particularly with how we handle
requeues wherein just submitting in order was not sufficient to
guarantee IO is actually dispatched in order.

(of note: when requeueing a request, we splice it to the _end_ of the
hctx dispatch list, so if you get a requeue in the middle of a
multi-segment IO, it will get re-ordered. I recall this change went in
specifically to re-order requests in case there was a passthrough
lurking to un-jam a device.)

Have you looked at this? Perhaps requeues are slowpath anyways, so we
could sort there? There may also be other requeue weirdness with
layered devices...

Khazhy
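The requeue reordering described above, and the "sort on requeue" idea, can be sketched with a toy dispatch list. This is a Python illustration, not blk-mq code: the function names are hypothetical, requests are reduced to their starting LBA, all requests are assumed to target one zone, and the pending list is assumed to be in ascending LBA order when the requeue happens.

```python
import bisect


def requeue_at_tail(pending, lba):
    """Model of splicing a requeued request to the end of the dispatch list."""
    pending.append(lba)


def requeue_sorted(pending, lba):
    """Model of the suggested alternative: keep LBA order on requeue."""
    bisect.insort(pending, lba)


def dispatch_all(requests, busy_once, requeue):
    """Dispatch from the head; each LBA in busy_once is rejected one time."""
    pending = list(requests)
    busy = set(busy_once)
    dispatched = []
    while pending:
        lba = pending.pop(0)
        if lba in busy:
            busy.discard(lba)      # the device accepts it on the next attempt
            requeue(pending, lba)
            continue
        dispatched.append(lba)
    return dispatched


print(dispatch_all([0, 4, 8, 12], {4}, requeue_at_tail))
print(dispatch_all([0, 4, 8, 12], {4}, requeue_sorted))
```

With tail requeueing, the request at LBA 4 is dispatched after 8 and 12 even though everything was submitted in order; the sorted variant restores the order at the cost of an insertion on what is presumably a slow path.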
On Mon, Mar 20, 2023 at 10:22:41AM -0700, Bart Van Assche wrote:
> How to achieve optimal performance with REQ_OP_ZONE_APPEND for SCSI
> devices? My understanding of how REQ_OP_ZONE_APPEND works for SCSI devices
> is as follows:
> * ATA devices cannot support this operation directly because there are
>   not enough bits in the ATA sense data to report where appended data
>   has been written.

ATA doesn't really have autosense in the SCSI way. It could be handled
the same way that CDL completions are handled. That is a complete mess,
and between CDL and Zone Append we'll probably eventually need an
extended FIS for SATA if we want to keep ATA alive.

> * T10 has not yet started with standardizing a zone append operation.

Time to get it started then!

> * The code that emulates REQ_OP_ZONE_APPEND for SCSI devices (in
>   sd_zbc.c) serializes REQ_OP_ZONE_APPEND operations (QD=1).

Because that's the only thing that actually works.

> * To achieve optimal performance, QD > 1 is required.

If you have something magic that works, this code is the place to take
advantage of it.
On 3/23/23 01:27, Christoph Hellwig wrote:
> On Mon, Mar 20, 2023 at 10:22:41AM -0700, Bart Van Assche wrote:
>> * T10 has not yet started with standardizing a zone append operation.
>
> Time to get it started then!

Hi Christoph,

If someone else wants to work on this that would be great. I do not plan
to work on this because I do not expect that a SCSI zone append command
would be standardized by the time we need it. Although there are
references to T10 drafts in the UFS standard, for the past few months
JEDEC has strongly preferred to refer to finalized external standards in
its own standards. Hence, standardizing zoned storage for UFS would have
to wait until T10 has published a standard that supports a zone append
command. INCITS published ZBC-1 in 2016, two years after the first ZBC-1
draft was uploaded to the T10 servers. INCITS approved ZBC-2 this month,
six years after the first ZBC-2 draft was uploaded to the T10 servers.
Because of the long time it takes to complete new versions of T10
standards we plan not to wait until T10 has standardized a zone append
operation.

Thanks,

Bart.
On 3/25/23 02:05, Bart Van Assche wrote:
> On 3/23/23 01:27, Christoph Hellwig wrote:
>> On Mon, Mar 20, 2023 at 10:22:41AM -0700, Bart Van Assche wrote:
>>> * T10 has not yet started with standardizing a zone append operation.
>>
>> Time to get it started then!
>
> Hi Christoph,
>
> If someone else wants to work on this that would be great. I do not plan
> to work on this because I do not expect that a SCSI zone append command
> would be standardized by the time we need it. Although there are
> references to T10 drafts in the UFS standard, for the past few months
> JEDEC has strongly preferred to refer to finalized external standards in
> its own standards. Hence, standardizing zoned storage for UFS would have
> to wait until T10 has published a standard that supports a zone append
> command. INCITS published ZBC-1 in 2016, two years after the first ZBC-1
> draft was uploaded to the T10 servers. INCITS approved ZBC-2 this month,
> six years after the first ZBC-2 draft was uploaded to the T10 servers.
> Because of the long time it takes to complete new versions of T10
> standards we plan not to wait until T10 has standardized a zone append
> operation.

Such a standardization effort is likely to face a lot of headwind because
defining a zone append command for ATA (T13 ACS) is not possible with a
single self-contained command (as one cannot return the written sector
using sense data like with SCSI). And when it comes to ZBC, keeping it
in sync with ZAC is desired...
On Sat, Mar 25, 2023 at 11:15:40AM +0900, Damien Le Moal wrote:
> Such a standardization effort is likely to face a lot of headwind because
> defining a zone append command for ATA (T13 ACS) is not possible with a
> single self-contained command (as one cannot return the written sector
> using sense data like with SCSI).

The same was true for CDL and it got in anyway. And yes, CDL on ATA is
a complete f**king mess, and needs to be fixed. So ATA needs to bite the
bullet and extend the FIS anyway, so we might as well get started on it
ASAP. Fortunately the only implementations that really matter now are
AHCI and SAS expanders, so it sounds very doable to get there.

> And when it comes to ZBC, keeping it in sync with ZAC is desired...

There are so many features in SCSI and not ATA, most notably protection
information, that this sounds like a BS argument to me. That being said,
supporting Zone Append and properly doing CDL in ATA would be very
useful.
On Fri, Mar 24, 2023 at 10:05:48AM -0700, Bart Van Assche wrote:
> If someone else wants to work on this that would be great. I do not plan to
> work on this because I do not expect that a SCSI zone append command would
> be standardized by the time we need it. Although there are references to
> T10 drafts in the UFS standard, for the past few months JEDEC has strongly
> preferred to refer to finalized external standards in its own standards.
> Hence, standardizing zoned storage for UFS would have to wait until T10 has
> published a standard that supports a zone append command. INCITS published
> ZBC-1 in 2016, two years after the first ZBC-1 draft was uploaded to the
> T10 servers. INCITS approved ZBC-2 this month, six years after the first
> ZBC-2 draft was uploaded to the T10 servers. Because of the long time it
> takes to complete new versions of T10 standards we plan not to wait until
> T10 has standardized a zone append operation.

Which is why we need to start the work now. Note that I don't think
your time frames matter too much - the first draft of ZBC-2 is where
people opened up the process again. The more relevant time frame is
between getting the main new feature in and publishing, which is way
shorter.
On 3/26/23 16:44, Christoph Hellwig wrote:
> On Fri, Mar 24, 2023 at 10:05:48AM -0700, Bart Van Assche wrote:
>> If someone else wants to work on this that would be great. I do not plan to
>> work on this because I do not expect that a SCSI zone append command would
>> be standardized by the time we need it. Although there are references to
>> T10 drafts in the UFS standard, for the past few months JEDEC has strongly
>> preferred to refer to finalized external standards in its own standards.
>> Hence, standardizing zoned storage for UFS would have to wait until T10 has
>> published a standard that supports a zone append command. INCITS published
>> ZBC-1 in 2016, two years after the first ZBC-1 draft was uploaded to the
>> T10 servers. INCITS approved ZBC-2 this month, six years after the first
>> ZBC-2 draft was uploaded to the T10 servers. Because of the long time it
>> takes to complete new versions of T10 standards we plan not to wait until
>> T10 has standardized a zone append operation.
>
> Which is why we need to start the work now. Note that I don't think
> your time frames matter too much - the first draft of ZBC-2 is where
> people opened up the process again. The more relevant time frame is
> between getting the main new feature in and publishing, which is way
> shorter.

Hi Christoph,

If you help the npo2 zone size patch series make progress towards
being integrated in the upstream kernel, I will help with the
standardization of a write append command in the T10 ZBC standard.

Thanks,

Bart.