Message ID | 20200602113311.121513-1-hare@suse.de |
---|---|
Series | scsi: use xarray for devices and targets |
On Tue, Jun 02, 2020 at 01:33:05PM +0200, Hannes Reinecke wrote:
> Hi all,
>
> based on the ideas from Doug Gilbert here's now my take on using
> xarrays for devices and targets.
> It revolves around two ideas:
>
> - The scsi target 'channel' and 'id' numbers are never ever used
>   to the full 32 bit range; channels are well below 10, and no
>   driver is using more than 16 bits for the id. So we can reduce
>   the type of 'channel' and 'id' to 16 bits, and use the 32 bit
>   value 'channel << 16 | id' as the index into the target xarray.
> - Nearly every target only ever uses the first two levels of the
>   4-level SCSI LUN structure, which means that we can use the
>   linearized SCSI LUN id as an index into the xarray.
>   If we ever come across targets utilizing more than 2 levels of
>   the LUN structure we'll allocate the first unused index and have
>   to resort to a less efficient lookup instead of direct indexing.
>
> With these changes we can implement an efficient lookup mechanism,
> devolving into direct lookup for most cases. It also allows us to
> detect duplicate entries or accidental overwrites of existing elements
> by using xa_cmpxchg().
> And iteration over targets and devices should be as efficient as the
> current, list-based, approach.
>
> As usual, comments and reviews are welcome.

I see absolutely no argument for what the point of this series is. It adds
more code, and I don't really see any indications for it fixing bugs,
speeding up workloads, or reducing memory usage.
On 2020-06-03 8:53 a.m., Christoph Hellwig wrote:
> On Tue, Jun 02, 2020 at 01:33:05PM +0200, Hannes Reinecke wrote:
>> [...]
>
> I see absolutely no argument for what the point of this series is. It adds
> more code, and I don't really see any indications for it fixing bugs,
> speeding up workloads, or reducing memory usage.

Let's take memory usage first. The legacy design (part of which may have
been a later add-on) has three collections where only two are needed:

  1) all targets in a host
  2) all sdev_s in a target
  3) all sdev_s in a host

So the third one is redundant and is now removed (together with the
complexity of making sure those three collections are always in sync, as
seen from the users' viewpoint).
Each doubly linked collection uses 16 bytes on 64-bit machines (two
8-byte pointers), so that is a 32-byte reduction in each sdev object. The
proposed solution adds 0 bytes because it uses the LUN, which is already
there, as the index. There is a similar but smaller win in scsi_target
objects.

There are also some locks and mutexes in the three-level object tree
(host-target-sdev[LU]) that can probably be dispensed with, as xarrays
come with their own locks. That has not been done yet, making both my
earlier proposal and this one "overlocked". And locks and mutexes take up
space in objects and slow things down.

IMO, the speedup will come in the startup and shutdown of big machines,
and in their reaction time to disruptions (e.g. a disconnected cable to a
disk array). xarrays and explicit parent pointers give us a faster way to
navigate up and down the object tree. With this patchset we have an
O(log n) lookup in the downward direction where currently we only have
O(n). Very little use is made of the "lookup" functions in the API
because users could see that they were just an iteration (i.e. O(n)).
Hopefully transports will take advantage of faster lookups and perhaps
implement their own xarrays.

Even the upward navigation can be complicated by transports inserting
levels between the host and the target. This is what the SCSI mid-layer
object tree looks like moving upwards from a SAS SSD, connected to a SAS
expander, up to its host (an HBA):

  scsi_device, ptr=ffff99d23f513960
  scsi_target, ptr=ffff99d241595c28
  sas_rphy, ptr=ffff99d242519c00
  sas_port, ptr=ffff99d24251ec00
  sas_expander_device, ptr=ffff99d23f4c6438
  sas_port, ptr=ffff99d23f4c7400
  Scsi_Host, ptr=ffff99d2425261f8

There already is a redundant scsi_device::host pointer to bypass the
oft-called and slow-walking dev_to_shost(). I'm proposing another
redundant scsi_target::parent_shost pointer that will bypass seven
dev_to_shost() invocations.
Currently all iterations are done under the host_lock, as that is
required for doubly linked list safety. xarray uses RCU read locks for all
non-modifying operations, including iterations, and if we can safely rely
on them, that will increase the available parallelism within one host.

Finally, the SCSI fast path will usually require the presence of the
corresponding sdev object, preferably cached. So making it smaller will
help.

Doug Gilbert

P.S. I sidestepped the "bugs" issue. Surely we will add some, but when you
wade into the complexity of the current linked collections and their
myriad locks, it is hard to believe that there aren't subtle bugs in the
existing code. I have been working with xarrays for about a year and,
IMO, finding locking issues is easier with xarrays than with "roll your
own" linked-list locking.
On 6/3/20 2:53 PM, Christoph Hellwig wrote:
> On Tue, Jun 02, 2020 at 01:33:05PM +0200, Hannes Reinecke wrote:
>> [...]
>
> I see absolutely no argument for what the point of this series is. It adds
> more code, and I don't really see any indications for it fixing bugs,
> speeding up workloads, or reducing memory usage.

From my perspective this is a proof of concept; using xarrays to store
targets and LUNs has the benefit that we can directly access the elements,
and the lookup will be more efficient for larger setups.
But it's not a clear-cut solution; it merely replaces one concept that has
some issues with another concept that has its own set of issues.
I guess the real benefit will come only if we manage to move to explicit
SCSI target removal, rather than the implicit model we have now, which
makes the SCSI target dependent on the underlying SCSI devices. I'll be
experimenting with that and will post an update.

I _do_ like the xarray for targets, though; they have a fixed location
where they can go, and as such xarrays are a far more natural choice.
For LUNs it's less compelling, as xarrays can't generically use a 64-bit
value as an index, but still.

Cheers,

Hannes