Message ID: 20210827075045.642269-1-damien.lemoal@wdc.com (mailing list archive)
Series: Initial support for multi-actuator HDDs
Damien Le Moal <damien.lemoal@wdc.com> writes:

> Single LUN multi-actuator hard disks are capable of seeking and
> executing multiple commands in parallel. This capability is exposed
> to the host using the Concurrent Positioning Ranges VPD page (SCSI)
> and Log (ATA). Each positioning range describes the contiguous set
> of LBAs that an actuator serves.

Are these ranges exclusive to each actuator, or can they overlap?

> This series does not attempt in any way to optimize accesses to
> multi-actuator devices (e.g. block IO schedulers or filesystems).
> This initial support only exposes the independent access ranges
> information to user space through sysfs.

Is the plan to eventually change the IO scheduler to maintain two
different queues, one for each actuator, and send down commands for
two different IO streams that the elevator attempts to keep
sequential?
On Friday, August 27, 2021 at 10:10:15 AM Phillip Susi wrote:
>
> Damien Le Moal <damien.lemoal@wdc.com> writes:
>
>> Single LUN multi-actuator hard disks are capable of seeking and
>> executing multiple commands in parallel. This capability is exposed
>> to the host using the Concurrent Positioning Ranges VPD page (SCSI)
>> and Log (ATA). Each positioning range describes the contiguous set
>> of LBAs that an actuator serves.
>
> Are these ranges exclusive to each actuator, or can they overlap?
>
>> This series does not attempt in any way to optimize accesses to
>> multi-actuator devices (e.g. block IO schedulers or filesystems).
>> This initial support only exposes the independent access ranges
>> information to user space through sysfs.
>
> Is the plan to eventually change the IO scheduler to maintain two
> different queues, one for each actuator, and send down commands for
> two different IO streams that the elevator attempts to keep
> sequential?

There is nothing in the spec that requires the ranges to be contiguous
or non-overlapping. It's easy to imagine an HDD architecture that
allows multiple heads to access the same sectors on the disk. It's
also easy to imagine a workload scenario where parallel access to the
same disk could be useful. (Think of a typical storage design that
sequentially writes new user data, gradually filling the disk, while
simultaneously supporting random user reads over the written data.)

The IO scheduler is a useful place to implement per-actuator load
management, but with the LBA-to-actuator mapping available to user
space (via sysfs) it could also be done at the user level, or pretty
much anywhere else where we have knowledge and control of the various
streams. The system is flexible and adaptable to a really wide range
of HDD designs and usage models.

Best regards,
-Tim

Tim Walker
Seagate Research
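[For reference, consuming that sysfs mapping from user space could look
like the minimal sketch below. The attribute paths follow the layout
proposed in this series (/sys/block/<disk>/queue/independent_access_ranges/<n>/
with "sector" and "nr_sectors" attributes) and should be treated as an
assumption of this sketch rather than a confirmed interface.]

/*
 * Sketch: map an LBA to the access range (actuator) serving it, by
 * reading the sysfs attributes this series exposes. The paths below
 * follow the series' proposed layout and are an assumption here:
 *   /sys/block/<disk>/queue/independent_access_ranges/<n>/sector
 *   /sys/block/<disk>/queue/independent_access_ranges/<n>/nr_sectors
 */
#include <stdio.h>
#include <stdlib.h>

static int read_attr(const char *disk, int range, const char *attr,
		     unsigned long long *val)
{
	char path[256];
	FILE *f;
	int ret = 0;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/independent_access_ranges/%d/%s",
		 disk, range, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%llu", val) != 1)
		ret = -1;
	fclose(f);
	return ret;
}

int main(int argc, char **argv)
{
	unsigned long long lba, start, len;
	int i;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <disk> <lba>\n", argv[0]);
		return 1;
	}
	lba = strtoull(argv[2], NULL, 0);

	/* Walk range directories 0, 1, ... until one does not exist. */
	for (i = 0; !read_attr(argv[1], i, "sector", &start); i++) {
		if (read_attr(argv[1], i, "nr_sectors", &len))
			break;
		if (lba >= start && lba < start + len) {
			printf("LBA %llu -> range %d [%llu, %llu)\n",
			       lba, i, start, start + len);
			return 0;
		}
	}
	fprintf(stderr, "LBA %llu not in any advertised range\n", lba);
	return 1;
}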
On Fri, Aug 27, 2021 at 02:28:58PM +0000, Tim Walker wrote:
> There is nothing in the spec that requires the ranges to be
> contiguous or non-overlapping.

Yikes, that is a pretty stupid standard. Almost as bad as allowing
non-uniform, non-power-of-two sized zones :)

> It's easy to imagine an HDD architecture that allows multiple heads
> to access the same sectors on the disk. It's also easy to imagine a
> workload scenario where parallel access to the same disk could be
> useful. (Think of a typical storage design that sequentially writes
> new user data, gradually filling the disk, while simultaneously
> supporting random user reads over the written data.)

But for those drives you do not actually need this scheme at all.
Storage devices that support higher concurrency are bog standard with
SSDs and, if you want to go further back, storage arrays. The only
interesting case is when these ranges are separate, so that the access
can be carved up based on the boundary. Now I don't want to give
people ideas with overlapping but not identical ranges, which would be
just horrible.
On Friday, August 27, 2021 at 12:42:54 PM Christoph Hellwig wrote:
>
> On Fri, Aug 27, 2021 at 02:28:58PM +0000, Tim Walker wrote:
>> There is nothing in the spec that requires the ranges to be
>> contiguous or non-overlapping.
>
> Yikes, that is a pretty stupid standard. Almost as bad as allowing
> non-uniform, non-power-of-two sized zones :)
>
>> It's easy to imagine an HDD architecture that allows multiple heads
>> to access the same sectors on the disk. It's also easy to imagine a
>> workload scenario where parallel access to the same disk could be
>> useful. (Think of a typical storage design that sequentially writes
>> new user data, gradually filling the disk, while simultaneously
>> supporting random user reads over the written data.)
>
> But for those drives you do not actually need this scheme at all.
> Storage devices that support higher concurrency are bog standard
> with SSDs and, if you want to go further back, storage arrays. The
> only interesting case is when these ranges are separate, so that the
> access can be carved up based on the boundary. Now I don't want to
> give people ideas with overlapping but not identical ranges, which
> would be just horrible.

Christoph - you are right. The main purpose, AFAIC, is to expose the
parallel access capabilities within a LUN/SATA target due to multiple
actuators. I hope the ranges are *always* contiguous and *never*
overlapping. But there's no telling what somebody has up their sleeve.

Best regards,
-Tim
Tim Walker <tim.t.walker@seagate.com> writes:

> The IO scheduler is a useful place to implement per-actuator load
> management, but with the LBA-to-actuator mapping available to user
> space (via sysfs) it could also be done at the user level, or pretty
> much anywhere else where we have knowledge and control of the
> various streams.

I suppose there may be some things user space could do with the
information, but mainly doesn't it have to be done in the IO scheduler?
As it stands now, it is going to try to avoid seeking between the two
regions even though the drive can service a contiguous stream from
both just fine, right?
On 2021/08/28 2:38, Phillip Susi wrote:
>
> Tim Walker <tim.t.walker@seagate.com> writes:
>
>> The IO scheduler is a useful place to implement per-actuator load
>> management, but with the LBA-to-actuator mapping available to user
>> space (via sysfs) it could also be done at the user level, or
>> pretty much anywhere else where we have knowledge and control of
>> the various streams.
>
> I suppose there may be some things user space could do with the
> information, but mainly doesn't it have to be done in the IO
> scheduler?

Correct. If the user does not use a file system, then optimizations
will depend on the user application and the IO scheduler.

> As it stands now, it is going to try to avoid seeking between the
> two regions even though the drive can service a contiguous stream
> from both just fine, right?

Correct. But any IO scheduler optimization will kick in only if the
user is accessing the drive at a queue depth beyond the drive's
maximum QD (32 for SATA). If the drive is exercised at a QD lower than
its maximum, the scheduler does not hold on to requests (at least
mq-deadline does not; I am not sure about bfq). So even with only this
patch set (no optimizations at the kernel level), the user can still
make things work as expected, that is, get multiple streams of IOs to
execute in parallel.
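[As an illustration of that last point, a hedged user-space sketch:
two threads, each reading sequentially within one actuator range. At
QD 2 the scheduler dispatches both requests immediately, so the two
streams can proceed in parallel with no kernel changes. The device
path and the second range's offset are hypothetical placeholders; real
boundaries would come from the sysfs ranges shown earlier. Build with
gcc -O2 -pthread.]

/*
 * Sketch: two parallel sequential read streams, one confined to each
 * actuator's (hypothetical) LBA range, issued from separate threads.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_SIZE (1024 * 1024)
#define NR_IOS   1024

struct stream {
	const char *dev;
	off_t start;	/* byte offset where this range begins */
};

static void *read_stream(void *arg)
{
	struct stream *s = arg;
	void *buf = NULL;
	off_t off;
	int fd;

	fd = open(s->dev, O_RDONLY | O_DIRECT);
	if (fd < 0 || posix_memalign(&buf, 4096, BUF_SIZE))
		goto out;
	/* Sequential reads confined to this stream's range. */
	for (off = 0; off < (off_t)NR_IOS * BUF_SIZE; off += BUF_SIZE)
		if (pread(fd, buf, BUF_SIZE, s->start + off) <= 0)
			break;
out:
	free(buf);
	if (fd >= 0)
		close(fd);
	return NULL;
}

int main(void)
{
	/* Hypothetical 2-actuator split: second range starts at 1 TiB. */
	struct stream a = { "/dev/sdX", 0 };
	struct stream b = { "/dev/sdX", (off_t)1 << 40 };
	pthread_t ta, tb;

	pthread_create(&ta, NULL, read_stream, &a);
	pthread_create(&tb, NULL, read_stream, &b);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	return 0;
}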
On 2021/08/28 1:43, Christoph Hellwig wrote:
> On Fri, Aug 27, 2021 at 02:28:58PM +0000, Tim Walker wrote:
>> There is nothing in the spec that requires the ranges to be
>> contiguous or non-overlapping.
>
> Yikes, that is a pretty stupid standard. Almost as bad as allowing
> non-uniform, non-power-of-two sized zones :)
>
>> It's easy to imagine an HDD architecture that allows multiple heads
>> to access the same sectors on the disk. It's also easy to imagine a
>> workload scenario where parallel access to the same disk could be
>> useful. (Think of a typical storage design that sequentially writes
>> new user data, gradually filling the disk, while simultaneously
>> supporting random user reads over the written data.)
>
> But for those drives you do not actually need this scheme at all.

Agree.

> Storage devices that support higher concurrency are bog standard
> with SSDs and, if you want to go further back, storage arrays. The
> only interesting case is when these ranges are separate, so that the
> access can be carved up based on the boundary. Now I don't want to
> give people ideas with overlapping but not identical ranges, which
> would be just horrible.

Agree too. And looking at my patch again, the function
disk_check_iaranges() in patch 1 only checks that the overall sector
range of all access ranges is from 0 to capacity - 1, but it does not
check for holes or overlaps. I need to change that and ignore any disk
that reports overlapping ranges or ranges with holes in the LBA space.
Holes would be horrible, and if we have overlap, then the drive can
optimize by itself.

Will resend a v7 with corrections for that.
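[For completeness, the intended check could look like the user-space
sketch below. Types and names are illustrative, not the actual kernel
code from the patch: after sorting by start sector, any hole or
overlap shows up as a range not starting exactly where the previous
one ended, and the last range must end at the disk capacity.]

/*
 * Sketch of the validation described above: access ranges must cover
 * sectors 0..capacity-1 exactly, with no holes and no overlap.
 */
#include <stdbool.h>
#include <stdlib.h>

struct ia_range {
	unsigned long long sector;	/* first sector of the range */
	unsigned long long nr_sectors;	/* number of sectors */
};

static int cmp_range(const void *a, const void *b)
{
	const struct ia_range *ra = a, *rb = b;

	if (ra->sector < rb->sector)
		return -1;
	return ra->sector > rb->sector;
}

static bool ia_ranges_valid(struct ia_range *r, int nr,
			    unsigned long long capacity)
{
	unsigned long long next = 0;
	int i;

	if (nr <= 0)
		return false;
	qsort(r, nr, sizeof(*r), cmp_range);
	for (i = 0; i < nr; i++) {
		/* Sorted by start: a gap or an overlap both fail here. */
		if (r[i].sector != next)
			return false;
		next += r[i].nr_sectors;
	}
	/* The last range must end exactly at the disk capacity. */
	return next == capacity;
}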