Message ID | 20190227171442.11853-14-igor.j.konopko@intel.com (mailing list archive)
---|---
State | New, archived
Series | lightnvm: bugfixes and improvements
> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: > > Current lightnvm and pblk implementation does not care > about NVMe max data transfer size, which can be smaller > than 64*K=256K. This patch fixes issues related to that. > > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > --- > drivers/lightnvm/core.c | 9 +++++++-- > drivers/nvme/host/lightnvm.c | 1 + > include/linux/lightnvm.h | 1 + > 3 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c > index 5f82036fe322..c01f83b8fbaf 100644 > --- a/drivers/lightnvm/core.c > +++ b/drivers/lightnvm/core.c > @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > struct nvm_target *t; > struct nvm_tgt_dev *tgt_dev; > void *targetdata; > + unsigned int mdts; > int ret; > > switch (create->conf.type) { > @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > tdisk->private_data = targetdata; > tqueue->queuedata = targetdata; > > - blk_queue_max_hw_sectors(tqueue, > - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; > + if (dev->geo.mdts) { > + mdts = min_t(u32, dev->geo.mdts, > + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > + } > + blk_queue_max_hw_sectors(tqueue, mdts); > > set_capacity(tdisk, tt->capacity(targetdata)); > add_disk(tdisk); > diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c > index b759c25c89c8..b88a39a3cbd1 100644 > --- a/drivers/nvme/host/lightnvm.c > +++ b/drivers/nvme/host/lightnvm.c > @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) > geo->csecs = 1 << ns->lba_shift; > geo->sos = ns->ms; > geo->ext = ns->ext; > + geo->mdts = ns->ctrl->max_hw_sectors; > > dev->q = q; > memcpy(dev->name, disk_name, DISK_NAME_LEN); > diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h > index 5d865a5d5cdc..d3b02708e5f0 100644 > --- a/include/linux/lightnvm.h > +++ b/include/linux/lightnvm.h > @@ -358,6 +358,7 @@ struct nvm_geo { > u16 csecs; /* sector size */ > u16 sos; /* out-of-band area size */ > bool ext; /* metadata in extended data buffer */ > + u32 mdts; /* Max data transfer size*/ > > /* device write constrains */ > u32 ws_min; /* minimum write size */ > -- > 2.17.1 I see where you are going with this and I partially agree, but none of the OCSSD specs define a way to define this parameter. Thus, adding this behavior taken from NVMe in Linux can break current implementations. Is this a real life problem for you? Or this is just for NVMe “correctness”? Javier
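For readers following the arithmetic, a small self-contained sketch of the clamp added in nvm_create_tgt() above; the example values (4K sectors, a controller MDTS of 128K = 256 x 512-byte sectors) are assumptions, not taken from the thread:

#include <stdio.h>

#define NVM_MAX_VLBA 64	/* spec maximum: 64 LBAs per vector command */

int main(void)
{
	unsigned int csecs = 4096;	/* assumed geo->csecs (4K sectors) */
	unsigned int geo_mdts = 256;	/* assumed geo->mdts: max_hw_sectors in 512B units (128K) */
	unsigned int mdts;

	/* default: 64 LBAs worth of 512B sectors, as before the patch */
	mdts = (csecs >> 9) * NVM_MAX_VLBA;	/* 8 * 64 = 512 sectors = 256K */

	/* clamp to the controller limit when one is reported */
	if (geo_mdts && geo_mdts < mdts)
		mdts = geo_mdts;		/* 256 sectors = 128K */

	printf("max_hw_sectors = %u, i.e. %u LBAs per command\n",
	       mdts, mdts / (csecs >> 9));	/* prints 256 and 32 */
	return 0;
}

With these assumed values the queue limit drops from 64 to 32 LBAs per command, which is what keeps a vector I/O within what such a controller can DMA.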
On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: > > > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: > > > > Current lightnvm and pblk implementation does not care > > about NVMe max data transfer size, which can be smaller > > than 64*K=256K. This patch fixes issues related to that. Could you describe *what* issues you are fixing? > > > > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > > --- > > drivers/lightnvm/core.c | 9 +++++++-- > > drivers/nvme/host/lightnvm.c | 1 + > > include/linux/lightnvm.h | 1 + > > 3 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c > > index 5f82036fe322..c01f83b8fbaf 100644 > > --- a/drivers/lightnvm/core.c > > +++ b/drivers/lightnvm/core.c > > @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > > struct nvm_target *t; > > struct nvm_tgt_dev *tgt_dev; > > void *targetdata; > > + unsigned int mdts; > > int ret; > > > > switch (create->conf.type) { > > @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > > tdisk->private_data = targetdata; > > tqueue->queuedata = targetdata; > > > > - blk_queue_max_hw_sectors(tqueue, > > - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > > + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; > > + if (dev->geo.mdts) { > > + mdts = min_t(u32, dev->geo.mdts, > > + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > > + } > > + blk_queue_max_hw_sectors(tqueue, mdts); > > > > set_capacity(tdisk, tt->capacity(targetdata)); > > add_disk(tdisk); > > diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c > > index b759c25c89c8..b88a39a3cbd1 100644 > > --- a/drivers/nvme/host/lightnvm.c > > +++ b/drivers/nvme/host/lightnvm.c > > @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) > > geo->csecs = 1 << ns->lba_shift; > > geo->sos = ns->ms; > > geo->ext = ns->ext; > > + geo->mdts = ns->ctrl->max_hw_sectors; > > > > dev->q = q; > > memcpy(dev->name, disk_name, DISK_NAME_LEN); > > diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h > > index 5d865a5d5cdc..d3b02708e5f0 100644 > > --- a/include/linux/lightnvm.h > > +++ b/include/linux/lightnvm.h > > @@ -358,6 +358,7 @@ struct nvm_geo { > > u16 csecs; /* sector size */ > > u16 sos; /* out-of-band area size */ > > bool ext; /* metadata in extended data buffer */ > > + u32 mdts; /* Max data transfer size*/ > > > > /* device write constrains */ > > u32 ws_min; /* minimum write size */ > > -- > > 2.17.1 > > I see where you are going with this and I partially agree, but none of > the OCSSD specs define a way to define this parameter. Thus, adding this > behavior taken from NVMe in Linux can break current implementations. Is > this a real life problem for you? Or this is just for NVMe “correctness”? > > Javier Hmm.Looking into the 2.0 spec what it says about vector reads: (figure 28):"The number of Logical Blocks (NLB): This field indicates the number of logical blocks to be read. This is a 0’s based value. Maximum of 64 LBAs is supported." You got the max limit covered, and the spec does not say anything about the minimum number of LBAs to support. Matias: any thoughts on this? Javier: How would this patch break current implementations? Igor: how does this patch fix the mdts restriction? There are no checks on i.e. the gc read path that ensures that a lower limit than NVM_MAX_VLBA is enforced. Thanks, Hans
> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: > > On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>> >>> Current lightnvm and pblk implementation does not care >>> about NVMe max data transfer size, which can be smaller >>> than 64*K=256K. This patch fixes issues related to that. > > Could you describe *what* issues you are fixing? > >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>> --- >>> drivers/lightnvm/core.c | 9 +++++++-- >>> drivers/nvme/host/lightnvm.c | 1 + >>> include/linux/lightnvm.h | 1 + >>> 3 files changed, 9 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c >>> index 5f82036fe322..c01f83b8fbaf 100644 >>> --- a/drivers/lightnvm/core.c >>> +++ b/drivers/lightnvm/core.c >>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>> struct nvm_target *t; >>> struct nvm_tgt_dev *tgt_dev; >>> void *targetdata; >>> + unsigned int mdts; >>> int ret; >>> >>> switch (create->conf.type) { >>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>> tdisk->private_data = targetdata; >>> tqueue->queuedata = targetdata; >>> >>> - blk_queue_max_hw_sectors(tqueue, >>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; >>> + if (dev->geo.mdts) { >>> + mdts = min_t(u32, dev->geo.mdts, >>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>> + } >>> + blk_queue_max_hw_sectors(tqueue, mdts); >>> >>> set_capacity(tdisk, tt->capacity(targetdata)); >>> add_disk(tdisk); >>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c >>> index b759c25c89c8..b88a39a3cbd1 100644 >>> --- a/drivers/nvme/host/lightnvm.c >>> +++ b/drivers/nvme/host/lightnvm.c >>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) >>> geo->csecs = 1 << ns->lba_shift; >>> geo->sos = ns->ms; >>> geo->ext = ns->ext; >>> + geo->mdts = ns->ctrl->max_hw_sectors; >>> >>> dev->q = q; >>> memcpy(dev->name, disk_name, DISK_NAME_LEN); >>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h >>> index 5d865a5d5cdc..d3b02708e5f0 100644 >>> --- a/include/linux/lightnvm.h >>> +++ b/include/linux/lightnvm.h >>> @@ -358,6 +358,7 @@ struct nvm_geo { >>> u16 csecs; /* sector size */ >>> u16 sos; /* out-of-band area size */ >>> bool ext; /* metadata in extended data buffer */ >>> + u32 mdts; /* Max data transfer size*/ >>> >>> /* device write constrains */ >>> u32 ws_min; /* minimum write size */ >>> -- >>> 2.17.1 >> >> I see where you are going with this and I partially agree, but none of >> the OCSSD specs define a way to define this parameter. Thus, adding this >> behavior taken from NVMe in Linux can break current implementations. Is >> this a real life problem for you? Or this is just for NVMe “correctness”? >> >> Javier > > Hmm.Looking into the 2.0 spec what it says about vector reads: > > (figure 28):"The number of Logical Blocks (NLB): This field indicates > the number of logical blocks to be read. This is a 0’s based value. > Maximum of 64 LBAs is supported." > > You got the max limit covered, and the spec does not say anything > about the minimum number of LBAs to support. > > Matias: any thoughts on this? > > Javier: How would this patch break current implementations? 
Say an OCSSD controller that sets mdts to a value under 64 or does not set it at all (maybe garbage). Think you can get to one pretty quickly... > > Igor: how does this patch fix the mdts restriction? There are no > checks on i.e. the gc read path that ensures that a lower limit than > NVM_MAX_VLBA is enforced. This is the other part where the implementation breaks. Javier
On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: > > > > > On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: > > > > On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: > >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: > >>> > >>> Current lightnvm and pblk implementation does not care > >>> about NVMe max data transfer size, which can be smaller > >>> than 64*K=256K. This patch fixes issues related to that. > > > > Could you describe *what* issues you are fixing? > > > >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > >>> --- > >>> drivers/lightnvm/core.c | 9 +++++++-- > >>> drivers/nvme/host/lightnvm.c | 1 + > >>> include/linux/lightnvm.h | 1 + > >>> 3 files changed, 9 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c > >>> index 5f82036fe322..c01f83b8fbaf 100644 > >>> --- a/drivers/lightnvm/core.c > >>> +++ b/drivers/lightnvm/core.c > >>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > >>> struct nvm_target *t; > >>> struct nvm_tgt_dev *tgt_dev; > >>> void *targetdata; > >>> + unsigned int mdts; > >>> int ret; > >>> > >>> switch (create->conf.type) { > >>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > >>> tdisk->private_data = targetdata; > >>> tqueue->queuedata = targetdata; > >>> > >>> - blk_queue_max_hw_sectors(tqueue, > >>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > >>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; > >>> + if (dev->geo.mdts) { > >>> + mdts = min_t(u32, dev->geo.mdts, > >>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > >>> + } > >>> + blk_queue_max_hw_sectors(tqueue, mdts); > >>> > >>> set_capacity(tdisk, tt->capacity(targetdata)); > >>> add_disk(tdisk); > >>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c > >>> index b759c25c89c8..b88a39a3cbd1 100644 > >>> --- a/drivers/nvme/host/lightnvm.c > >>> +++ b/drivers/nvme/host/lightnvm.c > >>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) > >>> geo->csecs = 1 << ns->lba_shift; > >>> geo->sos = ns->ms; > >>> geo->ext = ns->ext; > >>> + geo->mdts = ns->ctrl->max_hw_sectors; > >>> > >>> dev->q = q; > >>> memcpy(dev->name, disk_name, DISK_NAME_LEN); > >>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h > >>> index 5d865a5d5cdc..d3b02708e5f0 100644 > >>> --- a/include/linux/lightnvm.h > >>> +++ b/include/linux/lightnvm.h > >>> @@ -358,6 +358,7 @@ struct nvm_geo { > >>> u16 csecs; /* sector size */ > >>> u16 sos; /* out-of-band area size */ > >>> bool ext; /* metadata in extended data buffer */ > >>> + u32 mdts; /* Max data transfer size*/ > >>> > >>> /* device write constrains */ > >>> u32 ws_min; /* minimum write size */ > >>> -- > >>> 2.17.1 > >> > >> I see where you are going with this and I partially agree, but none of > >> the OCSSD specs define a way to define this parameter. Thus, adding this > >> behavior taken from NVMe in Linux can break current implementations. Is > >> this a real life problem for you? Or this is just for NVMe “correctness”? > >> > >> Javier > > > > Hmm.Looking into the 2.0 spec what it says about vector reads: > > > > (figure 28):"The number of Logical Blocks (NLB): This field indicates > > the number of logical blocks to be read. This is a 0’s based value. > > Maximum of 64 LBAs is supported." 
> > > > You got the max limit covered, and the spec does not say anything > > about the minimum number of LBAs to support. > > > > Matias: any thoughts on this? > > > > Javier: How would this patch break current implementations? > > Say an OCSSD controller that sets mdts to a value under 64 or does not > set it at all (maybe garbage). Think you can get to one pretty quickly... So we cant make use of a perfectly good, standardized, parameter because some hypothetical non-compliant device out there might not provide a sane value? > > > > > Igor: how does this patch fix the mdts restriction? There are no > > checks on i.e. the gc read path that ensures that a lower limit than > > NVM_MAX_VLBA is enforced. > > This is the other part where the implementation breaks. No, it just does not fix it. over-and-out, Hans > > Javier
On 04.03.2019 13:22, Hans Holmberg wrote: > On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: >> >> >> >>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: >>> >>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>>> >>>>> Current lightnvm and pblk implementation does not care >>>>> about NVMe max data transfer size, which can be smaller >>>>> than 64*K=256K. This patch fixes issues related to that. >>> >>> Could you describe *what* issues you are fixing? I'm fixing the issue related to controllers which NVMe max data transfer size is lower that 256K (for example 128K, which happens for existing NVMe controllers and NVMe spec compliant). Such a controllers are not able to handle command which contains 64 PPAs, since the the size of DMAed buffer will be above the capabilities of such a controller. >>> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>>>> --- >>>>> drivers/lightnvm/core.c | 9 +++++++-- >>>>> drivers/nvme/host/lightnvm.c | 1 + >>>>> include/linux/lightnvm.h | 1 + >>>>> 3 files changed, 9 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c >>>>> index 5f82036fe322..c01f83b8fbaf 100644 >>>>> --- a/drivers/lightnvm/core.c >>>>> +++ b/drivers/lightnvm/core.c >>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>> struct nvm_target *t; >>>>> struct nvm_tgt_dev *tgt_dev; >>>>> void *targetdata; >>>>> + unsigned int mdts; >>>>> int ret; >>>>> >>>>> switch (create->conf.type) { >>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>> tdisk->private_data = targetdata; >>>>> tqueue->queuedata = targetdata; >>>>> >>>>> - blk_queue_max_hw_sectors(tqueue, >>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; >>>>> + if (dev->geo.mdts) { >>>>> + mdts = min_t(u32, dev->geo.mdts, >>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>> + } >>>>> + blk_queue_max_hw_sectors(tqueue, mdts); >>>>> >>>>> set_capacity(tdisk, tt->capacity(targetdata)); >>>>> add_disk(tdisk); >>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c >>>>> index b759c25c89c8..b88a39a3cbd1 100644 >>>>> --- a/drivers/nvme/host/lightnvm.c >>>>> +++ b/drivers/nvme/host/lightnvm.c >>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) >>>>> geo->csecs = 1 << ns->lba_shift; >>>>> geo->sos = ns->ms; >>>>> geo->ext = ns->ext; >>>>> + geo->mdts = ns->ctrl->max_hw_sectors; >>>>> >>>>> dev->q = q; >>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); >>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h >>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 >>>>> --- a/include/linux/lightnvm.h >>>>> +++ b/include/linux/lightnvm.h >>>>> @@ -358,6 +358,7 @@ struct nvm_geo { >>>>> u16 csecs; /* sector size */ >>>>> u16 sos; /* out-of-band area size */ >>>>> bool ext; /* metadata in extended data buffer */ >>>>> + u32 mdts; /* Max data transfer size*/ >>>>> >>>>> /* device write constrains */ >>>>> u32 ws_min; /* minimum write size */ >>>>> -- >>>>> 2.17.1 >>>> >>>> I see where you are going with this and I partially agree, but none of >>>> the OCSSD specs define a way to define this parameter. Thus, adding this >>>> behavior taken from NVMe in Linux can break current implementations. 
Is >>>> this a real life problem for you? Or this is just for NVMe “correctness”? >>>> >>>> Javier >>> >>> Hmm.Looking into the 2.0 spec what it says about vector reads: >>> >>> (figure 28):"The number of Logical Blocks (NLB): This field indicates >>> the number of logical blocks to be read. This is a 0’s based value. >>> Maximum of 64 LBAs is supported." >>> >>> You got the max limit covered, and the spec does not say anything >>> about the minimum number of LBAs to support. >>> >>> Matias: any thoughts on this? >>> >>> Javier: How would this patch break current implementations? >> >> Say an OCSSD controller that sets mdts to a value under 64 or does not >> set it at all (maybe garbage). Think you can get to one pretty quickly... > > So we cant make use of a perfectly good, standardized, parameter > because some hypothetical non-compliant device out there might not > provide a sane value? > >> >>> >>> Igor: how does this patch fix the mdts restriction? There are no >>> checks on i.e. the gc read path that ensures that a lower limit than >>> NVM_MAX_VLBA is enforced. >> >> This is the other part where the implementation breaks. > > No, it just does not fix it. > > over-and-out, > Hans IO requests issued from both the garbage collector and the writer thread are upper limited by the pblk->max_write_pgs variable. This variable is already calculated based on queue_max_hw_sectors() in pblk. User reads, on the other hand, are split by the block layer based on queue_max_hw_sectors() too. Is there any other path which I'm missing that still tries to issue IOs with a higher IO size? Or am I missing something else in my logic? >> >> Javier
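To make Igor's point above concrete, here is a hypothetical helper (the name and placement are made up; this is not the actual pblk code) showing how a target can derive its per-command PPA budget from the already-clamped queue limit:

#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/lightnvm.h>

/* Hypothetical sketch: how many PPAs a single vector command may carry once
 * blk_queue_max_hw_sectors() has been clamped by geo->mdts, capped at the
 * spec maximum of NVM_MAX_VLBA (64).
 */
static unsigned int max_ppas_per_cmd(struct request_queue *q, unsigned int csecs)
{
	unsigned int secs_per_lba = csecs >> 9;		/* 512B sectors per device LBA */
	unsigned int ppas = queue_max_hw_sectors(q) / secs_per_lba;

	return min_t(unsigned int, ppas, NVM_MAX_VLBA);
}

Since pblk->max_write_pgs (used by GC and the writer thread) and the block layer's splitting of user reads both derive from queue_max_hw_sectors(), a limit of this form is effectively enforced on every path, which is Igor's argument that no additional per-path checks are needed.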
On Mon, Mar 4, 2019 at 2:04 PM Igor Konopko <igor.j.konopko@intel.com> wrote: > > > > On 04.03.2019 13:22, Hans Holmberg wrote: > > On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: > >> > >> > >> > >>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: > >>> > >>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: > >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: > >>>>> > >>>>> Current lightnvm and pblk implementation does not care > >>>>> about NVMe max data transfer size, which can be smaller > >>>>> than 64*K=256K. This patch fixes issues related to that. > >>> > >>> Could you describe *what* issues you are fixing? > > I'm fixing the issue related to controllers which NVMe max data transfer > size is lower that 256K (for example 128K, which happens for existing > NVMe controllers and NVMe spec compliant). Such a controllers are not > able to handle command which contains 64 PPAs, since the the size of > DMAed buffer will be above the capabilities of such a controller. Thanks for the explanation, that would be great to have in the commit message. > > >>> > >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > >>>>> --- > >>>>> drivers/lightnvm/core.c | 9 +++++++-- > >>>>> drivers/nvme/host/lightnvm.c | 1 + > >>>>> include/linux/lightnvm.h | 1 + > >>>>> 3 files changed, 9 insertions(+), 2 deletions(-) > >>>>> > >>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c > >>>>> index 5f82036fe322..c01f83b8fbaf 100644 > >>>>> --- a/drivers/lightnvm/core.c > >>>>> +++ b/drivers/lightnvm/core.c > >>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > >>>>> struct nvm_target *t; > >>>>> struct nvm_tgt_dev *tgt_dev; > >>>>> void *targetdata; > >>>>> + unsigned int mdts; > >>>>> int ret; > >>>>> > >>>>> switch (create->conf.type) { > >>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > >>>>> tdisk->private_data = targetdata; > >>>>> tqueue->queuedata = targetdata; > >>>>> > >>>>> - blk_queue_max_hw_sectors(tqueue, > >>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > >>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; > >>>>> + if (dev->geo.mdts) { > >>>>> + mdts = min_t(u32, dev->geo.mdts, > >>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > >>>>> + } > >>>>> + blk_queue_max_hw_sectors(tqueue, mdts); > >>>>> > >>>>> set_capacity(tdisk, tt->capacity(targetdata)); > >>>>> add_disk(tdisk); > >>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c > >>>>> index b759c25c89c8..b88a39a3cbd1 100644 > >>>>> --- a/drivers/nvme/host/lightnvm.c > >>>>> +++ b/drivers/nvme/host/lightnvm.c > >>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) > >>>>> geo->csecs = 1 << ns->lba_shift; > >>>>> geo->sos = ns->ms; > >>>>> geo->ext = ns->ext; > >>>>> + geo->mdts = ns->ctrl->max_hw_sectors; > >>>>> > >>>>> dev->q = q; > >>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); > >>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h > >>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 > >>>>> --- a/include/linux/lightnvm.h > >>>>> +++ b/include/linux/lightnvm.h > >>>>> @@ -358,6 +358,7 @@ struct nvm_geo { > >>>>> u16 csecs; /* sector size */ > >>>>> u16 sos; /* out-of-band area size */ > >>>>> bool ext; /* metadata in extended data buffer */ > >>>>> + u32 mdts; /* Max data transfer size*/ > >>>>> > >>>>> /* device write 
constrains */ > >>>>> u32 ws_min; /* minimum write size */ > >>>>> -- > >>>>> 2.17.1 > >>>> > >>>> I see where you are going with this and I partially agree, but none of > >>>> the OCSSD specs define a way to define this parameter. Thus, adding this > >>>> behavior taken from NVMe in Linux can break current implementations. Is > >>>> this a real life problem for you? Or this is just for NVMe “correctness”? > >>>> > >>>> Javier > >>> > >>> Hmm.Looking into the 2.0 spec what it says about vector reads: > >>> > >>> (figure 28):"The number of Logical Blocks (NLB): This field indicates > >>> the number of logical blocks to be read. This is a 0’s based value. > >>> Maximum of 64 LBAs is supported." > >>> > >>> You got the max limit covered, and the spec does not say anything > >>> about the minimum number of LBAs to support. > >>> > >>> Matias: any thoughts on this? > >>> > >>> Javier: How would this patch break current implementations? > >> > >> Say an OCSSD controller that sets mdts to a value under 64 or does not > >> set it at all (maybe garbage). Think you can get to one pretty quickly... > > > > So we cant make use of a perfectly good, standardized, parameter > > because some hypothetical non-compliant device out there might not > > provide a sane value? > > > >> > >>> > >>> Igor: how does this patch fix the mdts restriction? There are no > >>> checks on i.e. the gc read path that ensures that a lower limit than > >>> NVM_MAX_VLBA is enforced. > >> > >> This is the other part where the implementation breaks. > > > > No, it just does not fix it. > > > > over-and-out, > > Hans > > IO requests issued from both garbage collector and writer thread are > upper limiter by pblk->max_write_pgs variable. This variable is > calculated based on queue_max_hw_sectors() already in pblk. > > User reads on the other hand are splitted by block layer based on > queue_max_hw_sectors() too. Mea culpa, right you are! > > Is there any other path which I'm missing, which will still tries to > issue IOs with higher IO size? Or am I missing sth else in my logic? I missed the adjustments of max_write_pgs in pblk_core_init. We should be good. Thanks, Hans
> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote: > > On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: >>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: >>> >>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>>> >>>>> Current lightnvm and pblk implementation does not care >>>>> about NVMe max data transfer size, which can be smaller >>>>> than 64*K=256K. This patch fixes issues related to that. >>> >>> Could you describe *what* issues you are fixing? >>> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>>>> --- >>>>> drivers/lightnvm/core.c | 9 +++++++-- >>>>> drivers/nvme/host/lightnvm.c | 1 + >>>>> include/linux/lightnvm.h | 1 + >>>>> 3 files changed, 9 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c >>>>> index 5f82036fe322..c01f83b8fbaf 100644 >>>>> --- a/drivers/lightnvm/core.c >>>>> +++ b/drivers/lightnvm/core.c >>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>> struct nvm_target *t; >>>>> struct nvm_tgt_dev *tgt_dev; >>>>> void *targetdata; >>>>> + unsigned int mdts; >>>>> int ret; >>>>> >>>>> switch (create->conf.type) { >>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>> tdisk->private_data = targetdata; >>>>> tqueue->queuedata = targetdata; >>>>> >>>>> - blk_queue_max_hw_sectors(tqueue, >>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; >>>>> + if (dev->geo.mdts) { >>>>> + mdts = min_t(u32, dev->geo.mdts, >>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>> + } >>>>> + blk_queue_max_hw_sectors(tqueue, mdts); >>>>> >>>>> set_capacity(tdisk, tt->capacity(targetdata)); >>>>> add_disk(tdisk); >>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c >>>>> index b759c25c89c8..b88a39a3cbd1 100644 >>>>> --- a/drivers/nvme/host/lightnvm.c >>>>> +++ b/drivers/nvme/host/lightnvm.c >>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) >>>>> geo->csecs = 1 << ns->lba_shift; >>>>> geo->sos = ns->ms; >>>>> geo->ext = ns->ext; >>>>> + geo->mdts = ns->ctrl->max_hw_sectors; >>>>> >>>>> dev->q = q; >>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); >>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h >>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 >>>>> --- a/include/linux/lightnvm.h >>>>> +++ b/include/linux/lightnvm.h >>>>> @@ -358,6 +358,7 @@ struct nvm_geo { >>>>> u16 csecs; /* sector size */ >>>>> u16 sos; /* out-of-band area size */ >>>>> bool ext; /* metadata in extended data buffer */ >>>>> + u32 mdts; /* Max data transfer size*/ >>>>> >>>>> /* device write constrains */ >>>>> u32 ws_min; /* minimum write size */ >>>>> -- >>>>> 2.17.1 >>>> >>>> I see where you are going with this and I partially agree, but none of >>>> the OCSSD specs define a way to define this parameter. Thus, adding this >>>> behavior taken from NVMe in Linux can break current implementations. Is >>>> this a real life problem for you? Or this is just for NVMe “correctness”? >>>> >>>> Javier >>> >>> Hmm.Looking into the 2.0 spec what it says about vector reads: >>> >>> (figure 28):"The number of Logical Blocks (NLB): This field indicates >>> the number of logical blocks to be read. This is a 0’s based value. 
>>> Maximum of 64 LBAs is supported." >>> >>> You got the max limit covered, and the spec does not say anything >>> about the minimum number of LBAs to support. >>> >>> Matias: any thoughts on this? >>> >>> Javier: How would this patch break current implementations? >> >> Say an OCSSD controller that sets mdts to a value under 64 or does not >> set it at all (maybe garbage). Think you can get to one pretty quickly... > > So we cant make use of a perfectly good, standardized, parameter > because some hypothetical non-compliant device out there might not > provide a sane value? The OCSSD standard has never used NVMe parameters, so there is no compliant / non-compliant. In fact, until we changed OCSSD 2.0 to get the sector and OOB sizes from the standard identify command, we used to have them in the geometry. If you did not catch it in the first reference, this concern is explicitly related to OCSSD controllers already out there - some of which you should be caring about. If we are to use this information in the future, I would advocate to first make changes in the spec and then in the code, not the other way around. > >>> Igor: how does this patch fix the mdts restriction? There are no >>> checks on i.e. the gc read path that ensures that a lower limit than >>> NVM_MAX_VLBA is enforced. >> >> This is the other part where the implementation breaks. > > No, it just does not fix it. It is broken in _this_ implementation. > > over-and-out, > Hans >> Javier
On 3/4/19 2:19 PM, Javier González wrote: > >> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote: >> >> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: >>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: >>>> >>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>>>> >>>>>> Current lightnvm and pblk implementation does not care >>>>>> about NVMe max data transfer size, which can be smaller >>>>>> than 64*K=256K. This patch fixes issues related to that. >>>> >>>> Could you describe *what* issues you are fixing? >>>> >>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>>>>> --- >>>>>> drivers/lightnvm/core.c | 9 +++++++-- >>>>>> drivers/nvme/host/lightnvm.c | 1 + >>>>>> include/linux/lightnvm.h | 1 + >>>>>> 3 files changed, 9 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c >>>>>> index 5f82036fe322..c01f83b8fbaf 100644 >>>>>> --- a/drivers/lightnvm/core.c >>>>>> +++ b/drivers/lightnvm/core.c >>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>>> struct nvm_target *t; >>>>>> struct nvm_tgt_dev *tgt_dev; >>>>>> void *targetdata; >>>>>> + unsigned int mdts; >>>>>> int ret; >>>>>> >>>>>> switch (create->conf.type) { >>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>>> tdisk->private_data = targetdata; >>>>>> tqueue->queuedata = targetdata; >>>>>> >>>>>> - blk_queue_max_hw_sectors(tqueue, >>>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; >>>>>> + if (dev->geo.mdts) { >>>>>> + mdts = min_t(u32, dev->geo.mdts, >>>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>>> + } >>>>>> + blk_queue_max_hw_sectors(tqueue, mdts); >>>>>> >>>>>> set_capacity(tdisk, tt->capacity(targetdata)); >>>>>> add_disk(tdisk); >>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c >>>>>> index b759c25c89c8..b88a39a3cbd1 100644 >>>>>> --- a/drivers/nvme/host/lightnvm.c >>>>>> +++ b/drivers/nvme/host/lightnvm.c >>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) >>>>>> geo->csecs = 1 << ns->lba_shift; >>>>>> geo->sos = ns->ms; >>>>>> geo->ext = ns->ext; >>>>>> + geo->mdts = ns->ctrl->max_hw_sectors; >>>>>> >>>>>> dev->q = q; >>>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); >>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h >>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 >>>>>> --- a/include/linux/lightnvm.h >>>>>> +++ b/include/linux/lightnvm.h >>>>>> @@ -358,6 +358,7 @@ struct nvm_geo { >>>>>> u16 csecs; /* sector size */ >>>>>> u16 sos; /* out-of-band area size */ >>>>>> bool ext; /* metadata in extended data buffer */ >>>>>> + u32 mdts; /* Max data transfer size*/ >>>>>> >>>>>> /* device write constrains */ >>>>>> u32 ws_min; /* minimum write size */ >>>>>> -- >>>>>> 2.17.1 >>>>> >>>>> I see where you are going with this and I partially agree, but none of >>>>> the OCSSD specs define a way to define this parameter. Thus, adding this >>>>> behavior taken from NVMe in Linux can break current implementations. Is >>>>> this a real life problem for you? Or this is just for NVMe “correctness”? 
>>>>> >>>>> Javier >>>> >>>> Hmm.Looking into the 2.0 spec what it says about vector reads: >>>> >>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates >>>> the number of logical blocks to be read. This is a 0’s based value. >>>> Maximum of 64 LBAs is supported." >>>> >>>> You got the max limit covered, and the spec does not say anything >>>> about the minimum number of LBAs to support. >>>> >>>> Matias: any thoughts on this? >>>> >>>> Javier: How would this patch break current implementations? >>> >>> Say an OCSSD controller that sets mdts to a value under 64 or does not >>> set it at all (maybe garbage). Think you can get to one pretty quickly... >> >> So we cant make use of a perfectly good, standardized, parameter >> because some hypothetical non-compliant device out there might not >> provide a sane value? > > The OCSSD standard has never used NVMe parameters, so there is no > compliant / non-compliant. In fact, until we changed OCSSD 2.0 to > get the sector and OOB sizes from the standard identify > command, we used to have them in the geometry. What the hell? Yes it has. The whole OCSSD spec is dependent on the NVMe spec. It is using many commands from the NVMe specification, which are not defined in the OCSSD specification. The MDTS field should be respected in all cases, similarly to how the block layer respects it. Since the lightnvm subsystem is hooking in on the side, this should also be honoured by pblk (or the lightnvm subsystem should fix it up). > > If you did not catch it in the first reference, this concern is explicitly > related to OCSSD controllers already out there - some of which you > should be caring about. > > If we are to use this information in the future, I would advocate to > first make changes in the spec and then in the code, not the other way > around. > >> >>>> Igor: how does this patch fix the mdts restriction? There are no >>>> checks on i.e. the gc read path that ensures that a lower limit than >>>> NVM_MAX_VLBA is enforced. >>> >>> This is the other part where the implementation breaks. >> >> No, it just does not fix it. > > It is broken in _this_ implementation. > >> >> over-and-out, >> Hans >>> Javier
> On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote: > > On 3/4/19 2:19 PM, Javier González wrote: >>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote: >>> >>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: >>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: >>>>> >>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>>>>> >>>>>>> Current lightnvm and pblk implementation does not care >>>>>>> about NVMe max data transfer size, which can be smaller >>>>>>> than 64*K=256K. This patch fixes issues related to that. >>>>> >>>>> Could you describe *what* issues you are fixing? >>>>> >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>>>>>> --- >>>>>>> drivers/lightnvm/core.c | 9 +++++++-- >>>>>>> drivers/nvme/host/lightnvm.c | 1 + >>>>>>> include/linux/lightnvm.h | 1 + >>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c >>>>>>> index 5f82036fe322..c01f83b8fbaf 100644 >>>>>>> --- a/drivers/lightnvm/core.c >>>>>>> +++ b/drivers/lightnvm/core.c >>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>>>> struct nvm_target *t; >>>>>>> struct nvm_tgt_dev *tgt_dev; >>>>>>> void *targetdata; >>>>>>> + unsigned int mdts; >>>>>>> int ret; >>>>>>> >>>>>>> switch (create->conf.type) { >>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>>>> tdisk->private_data = targetdata; >>>>>>> tqueue->queuedata = targetdata; >>>>>>> >>>>>>> - blk_queue_max_hw_sectors(tqueue, >>>>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; >>>>>>> + if (dev->geo.mdts) { >>>>>>> + mdts = min_t(u32, dev->geo.mdts, >>>>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>>>> + } >>>>>>> + blk_queue_max_hw_sectors(tqueue, mdts); >>>>>>> >>>>>>> set_capacity(tdisk, tt->capacity(targetdata)); >>>>>>> add_disk(tdisk); >>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c >>>>>>> index b759c25c89c8..b88a39a3cbd1 100644 >>>>>>> --- a/drivers/nvme/host/lightnvm.c >>>>>>> +++ b/drivers/nvme/host/lightnvm.c >>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) >>>>>>> geo->csecs = 1 << ns->lba_shift; >>>>>>> geo->sos = ns->ms; >>>>>>> geo->ext = ns->ext; >>>>>>> + geo->mdts = ns->ctrl->max_hw_sectors; >>>>>>> >>>>>>> dev->q = q; >>>>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); >>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h >>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 >>>>>>> --- a/include/linux/lightnvm.h >>>>>>> +++ b/include/linux/lightnvm.h >>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo { >>>>>>> u16 csecs; /* sector size */ >>>>>>> u16 sos; /* out-of-band area size */ >>>>>>> bool ext; /* metadata in extended data buffer */ >>>>>>> + u32 mdts; /* Max data transfer size*/ >>>>>>> >>>>>>> /* device write constrains */ >>>>>>> u32 ws_min; /* minimum write size */ >>>>>>> -- >>>>>>> 2.17.1 >>>>>> >>>>>> I see where you are going with this and I partially agree, but none of >>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this >>>>>> behavior taken from NVMe in Linux can break current implementations. Is >>>>>> this a real life problem for you? 
Or this is just for NVMe “correctness”? >>>>>> >>>>>> Javier >>>>> >>>>> Hmm.Looking into the 2.0 spec what it says about vector reads: >>>>> >>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates >>>>> the number of logical blocks to be read. This is a 0’s based value. >>>>> Maximum of 64 LBAs is supported." >>>>> >>>>> You got the max limit covered, and the spec does not say anything >>>>> about the minimum number of LBAs to support. >>>>> >>>>> Matias: any thoughts on this? >>>>> >>>>> Javier: How would this patch break current implementations? >>>> >>>> Say an OCSSD controller that sets mdts to a value under 64 or does not >>>> set it at all (maybe garbage). Think you can get to one pretty quickly... >>> >>> So we cant make use of a perfectly good, standardized, parameter >>> because some hypothetical non-compliant device out there might not >>> provide a sane value? >> The OCSSD standard has never used NVMe parameters, so there is no >> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to >> get the sector and OOB sizes from the standard identify >> command, we used to have them in the geometry. > > What the hell? Yes it has. The whole OCSSD spec is dependent on the > NVMe spec. It is using many commands from the NVMe specification, > which is not defined in the OCSSD specification. > First, lower the tone. Second, no, it has not and never has, starting with all the write constraints, continuing with the vector commands, etc. You cannot choose what you want to be compliant with and what you do not. OCSSD uses the NVMe protocol but it is self sufficient with its geometry for all the read / write / erase paths - it even depends on different PCIe class codes to be identified… To do this in the way the rest of the spec is defined, we either add a field to the geometry or explicitly mention that MDTS is used, as we do with the sector and metadata sizes. Third, as a maintainer of this subsystem you should care about devices in the field that might break due to such a change (supported by the company you work for or not) - even if you can argue whether the change is compliant or not. And Hans, as a representative of a company that has such devices out there, you should care too. What if we add a quirk in the feature bits for this so that newer devices can implement this and older devices can still function? > The MDTS field should be respected in all case, similarly to how the > block layer respects it. Since the lightnvm subsystem are hooking in > on the side, this also be honoured by pblk (or the lightnvm subsystem > should fix it up) > This said, pblk does not care which value you give, it uses what the subsystem tells it - this is not arguing for this change not to be implemented. The only thing we should care about if implementing this is removing the constant defining 64 ppas and making allocations dynamic in the partial read and GC paths. Javier
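As an illustration of Javier's last point (dropping the hard-coded 64-PPA assumption), per-request PPA buffers could be sized from the runtime limit instead of the constant. Everything below is hypothetical and is not the existing partial-read or GC code:

#include <linux/slab.h>
#include <linux/lightnvm.h>

/* Hypothetical sketch: allocate a PPA list sized by the runtime per-command
 * limit (e.g. derived from queue_max_hw_sectors()) rather than by the
 * NVM_MAX_VLBA constant.
 */
static struct ppa_addr *alloc_ppa_list(unsigned int max_ppas)
{
	return kcalloc(max_ppas, sizeof(struct ppa_addr), GFP_KERNEL);
}

A device reporting a small MDTS would then simply get shorter PPA lists instead of over-sized commands.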
> On 4 Mar 2019, at 14.04, Igor Konopko <igor.j.konopko@intel.com> wrote: > > > >> On 04.03.2019 13:22, Hans Holmberg wrote: >>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: >>> >>> >>> >>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: >>>> >>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>>>> >>>>>> Current lightnvm and pblk implementation does not care >>>>>> about NVMe max data transfer size, which can be smaller >>>>>> than 64*K=256K. This patch fixes issues related to that. >>>> >>>> Could you describe *what* issues you are fixing? > > I'm fixing the issue related to controllers which NVMe max data transfer size is lower that 256K (for example 128K, which happens for existing NVMe controllers and NVMe spec compliant). Such a controllers are not able to handle command which contains 64 PPAs, since the the size of DMAed buffer will be above the capabilities of such a controller. > OK. Then let’s try to get support for this ASAP. If the rest agree on a feature bit then it should be straightforward. If not, let’s at least add a warning if MDTS < 64 so that people are able to identify where the missing performance went when updating the kernel if they are not using MDTS on their OCSSD implementations. Javier
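A minimal sketch of the warning Javier suggests, assuming it would sit next to the clamp in nvm_create_tgt(); this is not part of the posted patch:

/* Hypothetical: flag controllers whose MDTS clamps vector commands below the
 * 64-LBA maximum, so the resulting performance drop is visible in the logs.
 */
if (dev->geo.mdts &&
    dev->geo.mdts < (dev->geo.csecs >> 9) * NVM_MAX_VLBA)
	pr_warn("nvm: MDTS limits vector I/O to %u LBAs (max %u)\n",
		dev->geo.mdts / (dev->geo.csecs >> 9), NVM_MAX_VLBA);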
On Mon, Mar 4, 2019 at 2:44 PM Javier González <javier@javigon.com> wrote: > > > > On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote: > > > > On 3/4/19 2:19 PM, Javier González wrote: > >>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote: > >>> > >>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: > >>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: > >>>>> > >>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: > >>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: > >>>>>>> > >>>>>>> Current lightnvm and pblk implementation does not care > >>>>>>> about NVMe max data transfer size, which can be smaller > >>>>>>> than 64*K=256K. This patch fixes issues related to that. > >>>>> > >>>>> Could you describe *what* issues you are fixing? > >>>>> > >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > >>>>>>> --- > >>>>>>> drivers/lightnvm/core.c | 9 +++++++-- > >>>>>>> drivers/nvme/host/lightnvm.c | 1 + > >>>>>>> include/linux/lightnvm.h | 1 + > >>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c > >>>>>>> index 5f82036fe322..c01f83b8fbaf 100644 > >>>>>>> --- a/drivers/lightnvm/core.c > >>>>>>> +++ b/drivers/lightnvm/core.c > >>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > >>>>>>> struct nvm_target *t; > >>>>>>> struct nvm_tgt_dev *tgt_dev; > >>>>>>> void *targetdata; > >>>>>>> + unsigned int mdts; > >>>>>>> int ret; > >>>>>>> > >>>>>>> switch (create->conf.type) { > >>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) > >>>>>>> tdisk->private_data = targetdata; > >>>>>>> tqueue->queuedata = targetdata; > >>>>>>> > >>>>>>> - blk_queue_max_hw_sectors(tqueue, > >>>>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > >>>>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; > >>>>>>> + if (dev->geo.mdts) { > >>>>>>> + mdts = min_t(u32, dev->geo.mdts, > >>>>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); > >>>>>>> + } > >>>>>>> + blk_queue_max_hw_sectors(tqueue, mdts); > >>>>>>> > >>>>>>> set_capacity(tdisk, tt->capacity(targetdata)); > >>>>>>> add_disk(tdisk); > >>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c > >>>>>>> index b759c25c89c8..b88a39a3cbd1 100644 > >>>>>>> --- a/drivers/nvme/host/lightnvm.c > >>>>>>> +++ b/drivers/nvme/host/lightnvm.c > >>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) > >>>>>>> geo->csecs = 1 << ns->lba_shift; > >>>>>>> geo->sos = ns->ms; > >>>>>>> geo->ext = ns->ext; > >>>>>>> + geo->mdts = ns->ctrl->max_hw_sectors; > >>>>>>> > >>>>>>> dev->q = q; > >>>>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); > >>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h > >>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 > >>>>>>> --- a/include/linux/lightnvm.h > >>>>>>> +++ b/include/linux/lightnvm.h > >>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo { > >>>>>>> u16 csecs; /* sector size */ > >>>>>>> u16 sos; /* out-of-band area size */ > >>>>>>> bool ext; /* metadata in extended data buffer */ > >>>>>>> + u32 mdts; /* Max data transfer size*/ > >>>>>>> > >>>>>>> /* device write constrains */ > >>>>>>> u32 ws_min; /* minimum write size */ > >>>>>>> -- > >>>>>>> 2.17.1 > >>>>>> > >>>>>> I see where you are going with this and I partially agree, but none 
of > >>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this > >>>>>> behavior taken from NVMe in Linux can break current implementations. Is > >>>>>> this a real life problem for you? Or this is just for NVMe “correctness”? > >>>>>> > >>>>>> Javier > >>>>> > >>>>> Hmm.Looking into the 2.0 spec what it says about vector reads: > >>>>> > >>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates > >>>>> the number of logical blocks to be read. This is a 0’s based value. > >>>>> Maximum of 64 LBAs is supported." > >>>>> > >>>>> You got the max limit covered, and the spec does not say anything > >>>>> about the minimum number of LBAs to support. > >>>>> > >>>>> Matias: any thoughts on this? > >>>>> > >>>>> Javier: How would this patch break current implementations? > >>>> > >>>> Say an OCSSD controller that sets mdts to a value under 64 or does not > >>>> set it at all (maybe garbage). Think you can get to one pretty quickly... > >>> > >>> So we cant make use of a perfectly good, standardized, parameter > >>> because some hypothetical non-compliant device out there might not > >>> provide a sane value? > >> The OCSSD standard has never used NVMe parameters, so there is no > >> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to > >> get the sector and OOB sizes from the standard identify > >> command, we used to have them in the geometry. > > > > What the hell? Yes it has. The whole OCSSD spec is dependent on the > > NVMe spec. It is using many commands from the NVMe specification, > > which is not defined in the OCSSD specification. > > > > First, lower the tone. > > Second, no, it has not and never has, starting with all the write > constrains, continuing with the vector commands, etc. You cannot choose > what you want to be compliant with and what you do not. OCSSD uses the > NVMe protocol but it is self sufficient with its geometry for all the > read / write / erase paths - it even depends on different PCIe class > codes to be identified… To do this in the way the rest of the spec is > defined, we either add a filed to the geometry or explicitly mention > that MDTS is used, as we do with the sector and metadata sizes. > > Third, as a maintainer of this subsystem you should care about devices > in the field that might break due to such a change (supported by the > company you work for or not) - even if you can argue whether the change > is compliant or not. > > And Hans, as a representative of a company that has such devices out > there, you should care too. If you worry about me doing my job, you need not to. I test. So far I have not found any regressions in this patchset. Please keep your open source hat on Javier. > > What if we add a quirk in the feature bits for this so that newer > devices can implement this and older devices can still function? > > > The MDTS field should be respected in all case, similarly to how the > > block layer respects it. Since the lightnvm subsystem are hooking in > > on the side, this also be honoured by pblk (or the lightnvm subsystem > > should fix it up) > > > > This said, pblk does not care which value you give, it uses what the > subsystem tells it - this is not arguing for this change not to be > implemented. > > The only thing we should care about if implementing this is removing the > constant defining 64 ppas and making allocations dynamic in the partial > read and GC paths. > > Javier
> On 4 Mar 2019, at 15.24, Hans Holmberg <hans@owltronix.com> wrote: > > On Mon, Mar 4, 2019 at 2:44 PM Javier González <javier@javigon.com> wrote: >>> On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote: >>> >>> On 3/4/19 2:19 PM, Javier González wrote: >>>>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote: >>>>> >>>>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote: >>>>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote: >>>>>>> >>>>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote: >>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>>>>>>> >>>>>>>>> Current lightnvm and pblk implementation does not care >>>>>>>>> about NVMe max data transfer size, which can be smaller >>>>>>>>> than 64*K=256K. This patch fixes issues related to that. >>>>>>> >>>>>>> Could you describe *what* issues you are fixing? >>>>>>> >>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>>>>>>>> --- >>>>>>>>> drivers/lightnvm/core.c | 9 +++++++-- >>>>>>>>> drivers/nvme/host/lightnvm.c | 1 + >>>>>>>>> include/linux/lightnvm.h | 1 + >>>>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c >>>>>>>>> index 5f82036fe322..c01f83b8fbaf 100644 >>>>>>>>> --- a/drivers/lightnvm/core.c >>>>>>>>> +++ b/drivers/lightnvm/core.c >>>>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>>>>>> struct nvm_target *t; >>>>>>>>> struct nvm_tgt_dev *tgt_dev; >>>>>>>>> void *targetdata; >>>>>>>>> + unsigned int mdts; >>>>>>>>> int ret; >>>>>>>>> >>>>>>>>> switch (create->conf.type) { >>>>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) >>>>>>>>> tdisk->private_data = targetdata; >>>>>>>>> tqueue->queuedata = targetdata; >>>>>>>>> >>>>>>>>> - blk_queue_max_hw_sectors(tqueue, >>>>>>>>> - (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>>>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA; >>>>>>>>> + if (dev->geo.mdts) { >>>>>>>>> + mdts = min_t(u32, dev->geo.mdts, >>>>>>>>> + (dev->geo.csecs >> 9) * NVM_MAX_VLBA); >>>>>>>>> + } >>>>>>>>> + blk_queue_max_hw_sectors(tqueue, mdts); >>>>>>>>> >>>>>>>>> set_capacity(tdisk, tt->capacity(targetdata)); >>>>>>>>> add_disk(tdisk); >>>>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c >>>>>>>>> index b759c25c89c8..b88a39a3cbd1 100644 >>>>>>>>> --- a/drivers/nvme/host/lightnvm.c >>>>>>>>> +++ b/drivers/nvme/host/lightnvm.c >>>>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) >>>>>>>>> geo->csecs = 1 << ns->lba_shift; >>>>>>>>> geo->sos = ns->ms; >>>>>>>>> geo->ext = ns->ext; >>>>>>>>> + geo->mdts = ns->ctrl->max_hw_sectors; >>>>>>>>> >>>>>>>>> dev->q = q; >>>>>>>>> memcpy(dev->name, disk_name, DISK_NAME_LEN); >>>>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h >>>>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644 >>>>>>>>> --- a/include/linux/lightnvm.h >>>>>>>>> +++ b/include/linux/lightnvm.h >>>>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo { >>>>>>>>> u16 csecs; /* sector size */ >>>>>>>>> u16 sos; /* out-of-band area size */ >>>>>>>>> bool ext; /* metadata in extended data buffer */ >>>>>>>>> + u32 mdts; /* Max data transfer size*/ >>>>>>>>> >>>>>>>>> /* device write constrains */ >>>>>>>>> u32 ws_min; /* minimum write size */ >>>>>>>>> -- >>>>>>>>> 2.17.1 >>>>>>>> 
>>>>>>>> I see where you are going with this and I partially agree, but none of >>>>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this >>>>>>>> behavior taken from NVMe in Linux can break current implementations. Is >>>>>>>> this a real life problem for you? Or this is just for NVMe “correctness”? >>>>>>>> >>>>>>>> Javier >>>>>>> >>>>>>> Hmm.Looking into the 2.0 spec what it says about vector reads: >>>>>>> >>>>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates >>>>>>> the number of logical blocks to be read. This is a 0’s based value. >>>>>>> Maximum of 64 LBAs is supported." >>>>>>> >>>>>>> You got the max limit covered, and the spec does not say anything >>>>>>> about the minimum number of LBAs to support. >>>>>>> >>>>>>> Matias: any thoughts on this? >>>>>>> >>>>>>> Javier: How would this patch break current implementations? >>>>>> >>>>>> Say an OCSSD controller that sets mdts to a value under 64 or does not >>>>>> set it at all (maybe garbage). Think you can get to one pretty quickly... >>>>> >>>>> So we cant make use of a perfectly good, standardized, parameter >>>>> because some hypothetical non-compliant device out there might not >>>>> provide a sane value? >>>> The OCSSD standard has never used NVMe parameters, so there is no >>>> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to >>>> get the sector and OOB sizes from the standard identify >>>> command, we used to have them in the geometry. >>> >>> What the hell? Yes it has. The whole OCSSD spec is dependent on the >>> NVMe spec. It is using many commands from the NVMe specification, >>> which is not defined in the OCSSD specification. >> >> First, lower the tone. >> >> Second, no, it has not and never has, starting with all the write >> constrains, continuing with the vector commands, etc. You cannot choose >> what you want to be compliant with and what you do not. OCSSD uses the >> NVMe protocol but it is self sufficient with its geometry for all the >> read / write / erase paths - it even depends on different PCIe class >> codes to be identified… To do this in the way the rest of the spec is >> defined, we either add a filed to the geometry or explicitly mention >> that MDTS is used, as we do with the sector and metadata sizes. >> >> Third, as a maintainer of this subsystem you should care about devices >> in the field that might break due to such a change (supported by the >> company you work for or not) - even if you can argue whether the change >> is compliant or not. >> >> And Hans, as a representative of a company that has such devices out >> there, you should care too. > > If you worry about me doing my job, you need not to. > I test. So far I have not found any regressions in this patchset. > > Please keep your open source hat on Javier. Ok. You said it. Then please apply the patch :) Reviewed-by: Javier González <javier@javigon.com>
On 3/4/19 2:44 PM, Javier González wrote:
>
>> On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 3/4/19 2:19 PM, Javier González wrote:
>>>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
>>>>
>>>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>>>>
>>>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>>>
>>>>>>>> Current lightnvm and pblk implementation does not care
>>>>>>>> about NVMe max data transfer size, which can be smaller
>>>>>>>> than 64*K=256K. This patch fixes issues related to that.
>>>>>>
>>>>>> Could you describe *what* issues you are fixing?
>>>>>>
>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>> [...]
>>>>>>>
>>>>>>> I see where you are going with this and I partially agree, but none of
>>>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>>>>> this a real life problem for you? Or is this just for NVMe “correctness”?
>>>>>>>
>>>>>>> Javier
>>>>>>
>>>>>> Hmm. Looking into the 2.0 spec what it says about vector reads:
>>>>>>
>>>>>> (figure 28): "The number of Logical Blocks (NLB): This field indicates
>>>>>> the number of logical blocks to be read. This is a 0’s based value.
>>>>>> Maximum of 64 LBAs is supported."
>>>>>>
>>>>>> You got the max limit covered, and the spec does not say anything
>>>>>> about the minimum number of LBAs to support.
>>>>>>
>>>>>> Matias: any thoughts on this?
>>>>>>
>>>>>> Javier: How would this patch break current implementations?
>>>>>
>>>>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>>>>> set it at all (maybe garbage). Think you can get to one pretty quickly...
>>>>
>>>> So we can't make use of a perfectly good, standardized parameter
>>>> because some hypothetical non-compliant device out there might not
>>>> provide a sane value?
>>>
>>> The OCSSD standard has never used NVMe parameters, so there is no
>>> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
>>> get the sector and OOB sizes from the standard identify
>>> command, we used to have them in the geometry.
>>
>> What the hell? Yes it has. The whole OCSSD spec is dependent on the
>> NVMe spec. It is using many commands from the NVMe specification,
>> which are not defined in the OCSSD specification.
>>
>
> First, lower the tone.
>
> Second, no, it has not and never has, starting with all the write
> constraints, continuing with the vector commands, etc.
> You cannot choose
> what you want to be compliant with and what you do not. OCSSD uses the
> NVMe protocol but it is self-sufficient with its geometry for all the
> read / write / erase paths - it even depends on different PCIe class
> codes to be identified…

No. It does not.

> To do this in the way the rest of the spec is
> defined, we either add a field to the geometry or explicitly mention
> that MDTS is used, as we do with the sector and metadata sizes.
>
> Third, as a maintainer of this subsystem you should care about devices
> in the field that might break due to such a change (supported by the
> company you work for or not) - even if you can argue whether the change
> is compliant or not.

Same as Hans. If you worry about me doing my job, you need not to.

>
> And Hans, as a representative of a company that has such devices out
> there, you should care too.
>
> What if we add a quirk in the feature bits for this so that newer
> devices can implement this and older devices can still function?
>
>> The MDTS field should be respected in all cases, similarly to how the
>> block layer respects it. Since the lightnvm subsystem is hooking in
>> on the side, this should also be honoured by pblk (or the lightnvm
>> subsystem should fix it up).
>>
>
> This said, pblk does not care which value you give, it uses what the
> subsystem tells it - this is not arguing for this change not to be
> implemented.
>
> The only thing we should care about if implementing this is removing the
> constant defining 64 ppas and making allocations dynamic in the partial
> read and GC paths.
>
> Javier
>
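As an aside on the last point above: a minimal, standalone C sketch of the
kind of helper the partial read and GC paths could use to size vector
commands from the queue limit instead of the fixed NVM_MAX_VLBA constant.
The helper name and the user-space form are assumptions for illustration
only; this is not code from the patch or from the kernel tree.

#include <stdint.h>
#include <stdio.h>

#define NVM_MAX_VLBA	64u	/* vector command limit from the OCSSD 2.0 spec */

/*
 * Hypothetical helper: given the device sector size in bytes (csecs) and
 * the request queue's max_hw_sectors limit (in 512-byte units), return how
 * many LBAs a single vector command may carry without exceeding either the
 * spec's 64-LBA cap or the MDTS reflected in the queue limit.
 */
static uint32_t max_vlba_per_cmd(uint32_t csecs, uint32_t max_hw_sectors)
{
	uint32_t sectors_per_lba = csecs >> 9;	/* e.g. 4096 >> 9 = 8 */
	uint32_t mdts_lbas = max_hw_sectors / sectors_per_lba;

	return mdts_lbas < NVM_MAX_VLBA ? mdts_lbas : NVM_MAX_VLBA;
}

int main(void)
{
	/* 4 KiB sectors, queue limited to 256 sectors (128 KiB): 32 LBAs */
	printf("%u LBAs per vector command\n", max_vlba_per_cmd(4096, 256));
	return 0;
}

With a helper along these lines, read rounds in the GC and partial read
paths would be sized per device rather than always assuming that 64 LBAs
fit in one command.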
diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 5f82036fe322..c01f83b8fbaf 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	struct nvm_target *t;
 	struct nvm_tgt_dev *tgt_dev;
 	void *targetdata;
+	unsigned int mdts;
 	int ret;
 
 	switch (create->conf.type) {
@@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	tdisk->private_data = targetdata;
 	tqueue->queuedata = targetdata;
 
-	blk_queue_max_hw_sectors(tqueue,
-			(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+	mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
+	if (dev->geo.mdts) {
+		mdts = min_t(u32, dev->geo.mdts,
+				(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+	}
+	blk_queue_max_hw_sectors(tqueue, mdts);
 
 	set_capacity(tdisk, tt->capacity(targetdata));
 	add_disk(tdisk);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index b759c25c89c8..b88a39a3cbd1 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
 	geo->ext = ns->ext;
+	geo->mdts = ns->ctrl->max_hw_sectors;
 
 	dev->q = q;
 	memcpy(dev->name, disk_name, DISK_NAME_LEN);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 5d865a5d5cdc..d3b02708e5f0 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -358,6 +358,7 @@ struct nvm_geo {
 	u16 csecs;	/* sector size */
 	u16 sos;	/* out-of-band area size */
 	bool ext;	/* metadata in extended data buffer */
+	u32 mdts;	/* Max data transfer size*/
 
 	/* device write constrains */
 	u32 ws_min;	/* minimum write size */
Current lightnvm and pblk implementation does not care
about NVMe max data transfer size, which can be smaller
than 64*K=256K. This patch fixes issues related to that.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c      | 9 +++++++--
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)
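For reference, a standalone sketch of the arithmetic the core.c hunk above
performs when setting the queue limit. The 4 KiB sector size and the
128 KiB MDTS are assumed example values, not numbers from any particular
device.

#include <stdint.h>
#include <stdio.h>

#define NVM_MAX_VLBA	64u

/* Mirrors the clamp in nvm_create_tgt(): honour geo->mdts only when it is set. */
static uint32_t tgt_max_hw_sectors(uint32_t csecs, uint32_t mdts)
{
	uint32_t geo_limit = (csecs >> 9) * NVM_MAX_VLBA;

	return mdts ? (mdts < geo_limit ? mdts : geo_limit) : geo_limit;
}

int main(void)
{
	/* 4 KiB sectors, no MDTS reported: 512 sectors (256 KiB) per request */
	printf("no mdts:  %u sectors\n", tgt_max_hw_sectors(4096, 0));
	/* 4 KiB sectors, MDTS of 128 KiB (256 512-byte units): clamped to 256 */
	printf("mdts=256: %u sectors\n", tgt_max_hw_sectors(4096, 256));
	return 0;
}

When the device reports no MDTS, the limit stays at the geometry-derived
256 KiB, which matches the behaviour before this patch.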