Message ID | 20220803162857.27770-1-d.bogdanov@yadro.com (mailing list archive) |
---|---|
Series | Target cluster implementation over DLM |
On 8/3/22 11:04 AM, Dmitry Bogdanov wrote:
> Hi linux target community.
>
> Let me present an RFC of an implementation of cluster features for Target
> Core that are needed for backstore devices shared across cluster nodes.
>
> The patchset is big and consists of several subsets, but it contains some
> arguable things and it would take too much time to discuss them separately.
>
> Patches 1-9:
> Make RTPI be part of se_tpg instead of se_lun. That is a must because
> there is no possibility to assign RTPI on a LUN.
> That data model is different from SCST and from the current one in LIO, but
> it still does not contradict SAM and is even more in line with SAM - a whole
> TCM is a SCSI Device, and all its ports are SCSI Ports with unique RTPIs.
> + unique identification of TPG through the cluster.
> + possibility of assignment of RTPI.
> - number of all TPGs will be limited to 65535.
> This patchset was first published 2 years ago [1]. In the previous
> version the peers' RTPIs were put in the <device>/alua/... folder. In this
> version the peers' RTPIs are part of TPGs on the remote fabric (patch 35).
>
> Patches 10-29:
> Fix some bugs and deviations from the standard in PR code.
> Decouple pr_reg from se_nacl and se_tpg so that it is just a registration
> holder.
> Make APTPL registrations (not linked to se_dev_entry) be full-fledged
> registrations.

What are the arguable parts? Do you think it will be the DLM part
and coordinating it with nvmet developers? Or was it patches 1-9
and the multi-node support? Or both :)

Is it possible and would it be valuable to at least kind of break this
up a little?

I would break this up and post the fixes in one set. I'll help you get
them in as soon as possible.

For patches 1-9, I think I remember you posting them before, but I was in
the middle of starting a new job so I didn't review them. I really needed
something like that at my last 2 jobs so I think it's a valuable feature
and I'll review that as well.

If we could at least get those 2 chunks separated then it would make the DLM
parts below easier to get eyeballs on. I'm ok with the idea in general. I
think every nvmet developer will see the massive patchset and not even look at
this first 0/48 email :)

> Patches 30-34:
> DLM_CKV module that uses DLM and provides:
> * Cluster Lock service (pure wrapper over DLM).
> * Cluster Key-Value service with in-memory storage.
> * Cluster Notification service with a blocking acknowledge.
> * Cluster membership callbacks.
> This module is supposed to be used by TCM and nvmet to implement cluster
> operations.
>
> Patch 35:
> New 'remote' (in fact dummy) fabric module. Configuration on this fabric will
> provide TCM with a view of the TPG/LUN/ACL configuration on peer nodes.
>
> Patch 36:
> Introduce cluster ops and functions to register cluster ops
> implementation modules. There could be several different modules.
> The device attrib cluster_impl regulates which implementation to use
> for that device. 'single' is for the default (no cluster) implementation.
>
> Patches 37-48:
> TCM Cluster over DLM module implementation inspired by SCST.
> * Use DLM_CKV Lock service to serialize order of PR OUT commands
> * Use DLM_CKV Key-Value storage service to store PR cluster data.
>   Sync it after successful execution of PR OUT command.
> * Use DLM_CKV Notification service to notify (in blocking manner) other
>   nodes to fetch PR cluster data. The handling of PR OUT command is
>   blocked until other nodes read the cluster PR data.
>
> It provides:
> * Cluster lock per LBA for Compare And Write.
> * Full support of SCSI-3 Persistent Reservations including
>   PREEMPT AND ABORT and REGISTER AND MOVE.
> * Normal PR APTPL implementation (persistence over power loss)
> * Shared LUN RESET
> * Shared SCSI-2 Reservations.
> * Unit Attentions for all TPGs in cluster
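
The cluster ops registration described for patch 36 in the quoted cover letter
can be illustrated with a small user-space model. This is only a sketch under
assumptions: the thread does not show the real interface, so ClusterOps,
register_cluster_ops, Device.set_cluster_impl and the pr_sync callback are
hypothetical names, and Python stands in for kernel C. Only the cluster_impl
attribute and the 'single' default come from the cover letter.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ClusterOps:
    """Hypothetical ops table; the real patch 36 structure is not shown here."""
    name: str
    pr_sync: Callable[[str], str]  # illustrative callback, takes a device name


# Registry of implementations, keyed by the name used in cluster_impl.
_registry: Dict[str, ClusterOps] = {}


def register_cluster_ops(ops: ClusterOps) -> None:
    """Add an implementation; duplicate names are rejected."""
    if ops.name in _registry:
        raise ValueError(f"cluster ops '{ops.name}' already registered")
    _registry[ops.name] = ops


# 'single' is the default (no cluster) implementation named in the cover letter.
register_cluster_ops(ClusterOps("single", lambda dev: f"{dev}: local only"))


class Device:
    """Stand-in for a backstore device with a cluster_impl attribute."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cluster_impl = "single"

    def set_cluster_impl(self, impl: str) -> None:
        # Only a registered implementation may be selected.
        if impl not in _registry:
            raise ValueError(f"unknown cluster implementation '{impl}'")
        self.cluster_impl = impl

    def pr_sync(self) -> str:
        # Dispatch through whichever implementation the attribute selects.
        return _registry[self.cluster_impl].pr_sync(self.name)
```

A device starts on 'single' and switches only when another implementation has
been registered, which mirrors the per-device selection the cover letter
describes.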
On Wed, Aug 03, 2022 at 12:36:56PM -0500, Mike Christie wrote:
>
> On 8/3/22 11:04 AM, Dmitry Bogdanov wrote:
> > Hi linux target community.
> >
> > Let me present an RFC of an implementation of cluster features for Target
> > Core that are needed for backstore devices shared across cluster nodes.
> >
> > The patchset is big and consists of several subsets, but it contains some
> > arguable things and it would take too much time to discuss them separately.
> >
> > Patches 1-9:
> > Make RTPI be part of se_tpg instead of se_lun. That is a must because
> > there is no possibility to assign RTPI on a LUN.
> > That data model is different from SCST and from the current one in LIO, but
> > it still does not contradict SAM and is even more in line with SAM - a whole
> > TCM is a SCSI Device, and all its ports are SCSI Ports with unique RTPIs.
> > + unique identification of TPG through the cluster.
> > + possibility of assignment of RTPI.
> > - number of all TPGs will be limited to 65535.
> > This patchset was first published 2 years ago [1]. In the previous
> > version the peers' RTPIs were put in the <device>/alua/... folder. In this
> > version the peers' RTPIs are part of TPGs on the remote fabric (patch 35).
> >
> > Patches 10-29:
> > Fix some bugs and deviations from the standard in PR code.
> > Decouple pr_reg from se_nacl and se_tpg so that it is just a registration
> > holder.
> > Make APTPL registrations (not linked to se_dev_entry) be full-fledged
> > registrations.
> >
> What are the arguable parts? Do you think it will be the DLM part
> and coordinating it with nvmet developers? Or was it patches 1-9
> and the multi-node support? Or both :)

In fact every subset can be a subject to argue :)
* RTPI patchset - changing the data model from an RTPI set on a backstore
  device to an RTPI set on a whole node.
* PR refactoring - too many changes; the APTPL changes may not be backward
  compatible
* remote/dummy fabric - name
* DLM_CKV - name, place and even the meaning of the module
* tcm_cluster - too many new exported symbols; not resistant to node death
  in the middle of storing PR data in DLM_CKV or to other error cases.

> Is it possible and would it be valuable to at least kind of break this
> up a little?
>
> I would break this up and post the fixes in one set. I'll help you get
> them in as soon as possible.

After the idea is approved I can break the patch set into several ones and
start posting them without the RFC prefix. The only problem is that they all
depend on the previous ones, so I have to post each one after the previous one
gets merged.

> For patches 1-9, I think I remember you posting them before, but I was in
> the middle of starting a new job so I didn't review them. I really needed
> something like that at my last 2 jobs so I think it's a valuable feature
> and I'll review that as well.
>
> If we could at least get those 2 chunks separated then it would make the DLM
> parts below easier to get eyeballs on. I'm ok with the idea in general. I
> think every nvmet developer will see the massive patchset and not even look at
> this first 0/48 email :)

I am not going to share this patchset with the nvmet dev list :)
nvmet does not yet have a local version of the CompareAndWrite and
Reservations features, so it is too early for them.

> > Patches 30-34:
> > DLM_CKV module that uses DLM and provides:
> > * Cluster Lock service (pure wrapper over DLM).
> > * Cluster Key-Value service with in-memory storage.
> > * Cluster Notification service with a blocking acknowledge.
> > * Cluster membership callbacks.
> > This module is supposed to be used by TCM and nvmet to implement cluster
> > operations.
> >
> > Patch 35:
> > New 'remote' (in fact dummy) fabric module. Configuration on this fabric will
> > provide TCM with a view of the TPG/LUN/ACL configuration on peer nodes.
> >
> > Patch 36:
> > Introduce cluster ops and functions to register cluster ops
> > implementation modules. There could be several different modules.
> > The device attrib cluster_impl regulates which implementation to use
> > for that device. 'single' is for the default (no cluster) implementation.
> >
> > Patches 37-48:
> > TCM Cluster over DLM module implementation inspired by SCST.
> > * Use DLM_CKV Lock service to serialize order of PR OUT commands
> > * Use DLM_CKV Key-Value storage service to store PR cluster data.
> >   Sync it after successful execution of PR OUT command.
> > * Use DLM_CKV Notification service to notify (in blocking manner) other
> >   nodes to fetch PR cluster data. The handling of PR OUT command is
> >   blocked until other nodes read the cluster PR data.
> >
> > It provides:
> > * Cluster lock per LBA for Compare And Write.
> > * Full support of SCSI-3 Persistent Reservations including
> >   PREEMPT AND ABORT and REGISTER AND MOVE.
> > * Normal PR APTPL implementation (persistence over power loss)
> > * Shared LUN RESET
> > * Shared SCSI-2 Reservations.
> > * Unit Attentions for all TPGs in cluster
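
The PR OUT handling listed for patches 37-48 (serialize with the lock service,
store the PR data in the key-value service, then notify peers and block until
they have fetched it) can be modelled in a few lines. This is a sketch under
stated assumptions, not the tcm_cluster code: Cluster, Node, notify_all and
pr_out_register are hypothetical names, a threading.Lock stands in for the DLM
lock, a dict stands in for the key-value service, and a synchronous call models
the blocking acknowledge. Whether the real module holds the lock across the
notification step is not stated in this thread; the sketch assumes it does.

```python
import threading
from typing import Dict, List

# PR data for one device: initiator name -> reservation key.
PrData = Dict[str, int]


class Cluster:
    """Stand-in for the lock, key-value and notification services."""

    def __init__(self) -> None:
        self.lock = threading.Lock()       # models the cluster lock
        self.kv: Dict[str, PrData] = {}    # models the in-memory key-value store
        self.nodes: List["Node"] = []      # models cluster membership

    def notify_all(self, sender: "Node", device: str) -> None:
        # Returning from on_notify() models a peer's acknowledge, so this
        # call returns only after every peer has fetched the PR data.
        for node in self.nodes:
            if node is not sender:
                node.on_notify(device)


class Node:
    """One target node with its local copy of the PR data."""

    def __init__(self, name: str, cluster: Cluster) -> None:
        self.name = name
        self.cluster = cluster
        self.pr_state: Dict[str, PrData] = {}
        cluster.nodes.append(self)

    def on_notify(self, device: str) -> None:
        # A peer fetches the cluster PR data for the device.
        self.pr_state[device] = dict(self.cluster.kv[device])

    def pr_out_register(self, device: str, initiator: str, key: int) -> None:
        with self.cluster.lock:                        # 1. serialize PR OUT
            data = dict(self.cluster.kv.get(device, {}))
            data[initiator] = key                      # 2. execute the command
            self.cluster.kv[device] = data             # 3. store the PR data
            self.pr_state[device] = dict(data)
            self.cluster.notify_all(self, device)      # 4. wait for all peers
        # 5. only now would the PR OUT command complete to the initiator
```

Because the command does not complete until step 4 returns, an initiator that
sees the PR OUT status can assume every node in the model already holds the
same registrations.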