Message ID | 55c82aa5730c05017a58641ad9550b6c0f0e16b2.1527618402.git.swise@opengridcomputing.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Tue, 29 May 2018, Steve Wise wrote: > @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, > c->sge[0].length = sizeof(*c->nvme_cmd); > c->sge[0].lkey = ndev->pd->local_dma_lkey; > > - if (!admin) { > + if (!admin && inline_data_size) { > c->inline_page = alloc_pages(GFP_KERNEL, > - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > + get_order(inline_data_size)); Now we do higher order allocations here. This means that the allocation can fail if system memory is highly fragmented. And the allocations can no longer be satisfied from the per cpu caches. So allocation performance will drop. > if (!c->inline_page) > goto out_unmap_cmd; Maybe think about some sort of fallback to vmalloc or so if this alloc fails? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/30/2018 10:49 AM, Christopher Lameter wrote: > On Tue, 29 May 2018, Steve Wise wrote: > >> @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, >> c->sge[0].length = sizeof(*c->nvme_cmd); >> c->sge[0].lkey = ndev->pd->local_dma_lkey; >> >> - if (!admin) { >> + if (!admin && inline_data_size) { >> c->inline_page = alloc_pages(GFP_KERNEL, >> - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); >> + get_order(inline_data_size)); > Now we do higher order allocations here. This means that the allocation > can fail if system memory is highly fragmented. And the allocations can no > longer be satisfied from the per cpu caches. So allocation performance > will drop. Yes. >> if (!c->inline_page) >> goto out_unmap_cmd; > Maybe think about some sort of fallback to vmalloc or so if this > alloc fails? The memory needs to be physically contiguous and will be mapped for DMA, so vmalloc() won't work. I could complicate the design and allocate a scatter gather table for this memory, and then register it into a single RDMA MR. That would allow allocating non-contiguous pages. But is that complication worth it here? Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 30 May 2018, Steve Wise wrote: > > Now we do higher order allocations here. This means that the allocation > > can fail if system memory is highly fragmented. And the allocations can no > > longer be satisfied from the per cpu caches. So allocation performance > > will drop. > > Yes. Maybe do some test with fragmented memory? > >> if (!c->inline_page) > >> goto out_unmap_cmd; > > Maybe think about some sort of fallback to vmalloc or so if this > > alloc fails? > > The memory needs to be physically contiguous and will be mapped for DMA, > so vmalloc() won't work. > > I could complicate the design and allocate a scatter gather table for > this memory, and then register it into a single RDMA MR. That would > allow allocating non-contiguous pages. But is that complication worth > it here? Not sure..... Just saw the higher alloc and I have seen what other subsystems went through when dealing with larger order allocations
>> @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, >> c->sge[0].length = sizeof(*c->nvme_cmd); >> c->sge[0].lkey = ndev->pd->local_dma_lkey; >> >> - if (!admin) { >> + if (!admin && inline_data_size) { >> c->inline_page = alloc_pages(GFP_KERNEL, >> - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); >> + get_order(inline_data_size)); > > Now we do higher order allocations here. This means that the allocation > can fail if system memory is highly fragmented. And the allocations can no > longer be satisfied from the per cpu caches. So allocation performance > will drop. That was my first thought as well. I'm not too keen on having higher-order allocations on this, not at all. nvmet-rdma will allocate a whole bunch of those. I think we should try to be good citizens. I don't think its too complicated to do is it? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/30/2018 4:45 PM, Sagi Grimberg wrote: > >>> @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct >>> nvmet_rdma_device *ndev, >>> c->sge[0].length = sizeof(*c->nvme_cmd); >>> c->sge[0].lkey = ndev->pd->local_dma_lkey; >>> >>> - if (!admin) { >>> + if (!admin && inline_data_size) { >>> c->inline_page = alloc_pages(GFP_KERNEL, >>> - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); >>> + get_order(inline_data_size)); >> >> Now we do higher order allocations here. This means that the allocation >> can fail if system memory is highly fragmented. And the allocations >> can no >> longer be satisfied from the per cpu caches. So allocation performance >> will drop. > > That was my first thought as well. I'm not too keen on having > higher-order allocations on this, not at all. nvmet-rdma will > allocate a whole bunch of those. I think we should try to be > good citizens. I don't think its too complicated to do is it? Its ugly because registering an MR for the inline pages requires a connected QP to use the REG_MR WR. Maybe I'll just split the needed pages across the remaining recv sge entries available? IE if the device supports 5 recv sges, then 4 can be used for inline, and thus 4 non-contiguous pages could be used. cxgb4, however, only support 4 recv sges, so it would only support 12K of inline with this implementation. And perhaps there are rdma devices with even fewer recv sges? Do you have any other idea on how to avoid higher-order allocations? Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Its ugly because registering an MR for the inline pages requires a > connected QP to use the REG_MR WR. No, why MR? > Maybe I'll just split the needed > pages across the remaining recv sge entries available? IE if the device > supports 5 recv sges, then 4 can be used for inline, and thus 4 > non-contiguous pages could be used. That's what I had in mind... > cxgb4, however, only support 4 recv > sges, so it would only support 12K of inline with this implementation. :( > And perhaps there are rdma devices with even fewer recv sges? > > Do you have any other idea on how to avoid higher-order allocations? Not really. I guess that we should be able to recv the user inline_data_size. So maybe it would be ok if the we'd do that only if we must? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/30/2018 5:13 PM, Sagi Grimberg wrote: > >> Its ugly because registering an MR for the inline pages requires a >> connected QP to use the REG_MR WR. > > No, why MR? > >> Maybe I'll just split the needed >> pages across the remaining recv sge entries available? IE if the device >> supports 5 recv sges, then 4 can be used for inline, and thus 4 >> non-contiguous pages could be used. > > That's what I had in mind... > >> cxgb4, however, only support 4 recv >> sges, so it would only support 12K of inline with this implementation. > > :( > >> And perhaps there are rdma devices with even fewer recv sges? >> >> Do you have any other idea on how to avoid higher-order allocations? > > Not really. I guess that we should be able to recv the user > inline_data_size. So maybe it would be ok if the we'd do that only if > we must? I'll explore this idea. I also thought about just reducing the inline data size based on the number of sges available. But I think the inline data size has already been advertised to the host side at the time the target is allocating these pages. :( Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/29/2018 9:25 PM, Steve Wise wrote: > Add a new configfs port attribute, called inline_data_size, > to allow configuring the size of inline data for a given port. > The maximum size allowed is still enforced by nvmet-rdma with > NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to max(16KB, > PAGE_SIZE). And the default size, if not specified via configfs, > is still PAGE_SIZE. This preserves the existing behavior, but allows > larger inline sizes. > > Also support a configuration where inline_data_size is 0, which disables > using inline data. > > Signed-off-by: Steve Wise <swise@opengridcomputing.com> > --- > drivers/nvme/target/admin-cmd.c | 4 ++-- > drivers/nvme/target/configfs.c | 31 ++++++++++++++++++++++++++++ > drivers/nvme/target/core.c | 4 ++++ > drivers/nvme/target/discovery.c | 2 +- > drivers/nvme/target/nvmet.h | 2 +- > drivers/nvme/target/rdma.c | 45 ++++++++++++++++++++++++++++------------- > 6 files changed, 70 insertions(+), 18 deletions(-) snip.. > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index 52e0c5d..2f0b08e 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -33,9 +33,10 @@ > #include "nvmet.h" > > /* > - * We allow up to a page of inline data to go with the SQE > + * We allow at least 1 page, and up to 16KB of inline data to go with the SQE > */ > -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE > +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE > +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, SZ_16K, PAGE_SIZE) why not use SZ_16K ? why we need to mention the PAGE_SIZE ? > > struct nvmet_rdma_cmd { > struct ib_sge sge[2]; > @@ -116,6 +117,7 @@ struct nvmet_rdma_device { > size_t srq_size; > struct kref ref; > struct list_head entry; > + int inline_data_size; > }; > > static bool nvmet_rdma_use_srq; > @@ -187,6 +189,8 @@ static inline bool nvmet_rdma_need_data_out(struct nvmet_rdma_rsp *rsp) > static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, > struct nvmet_rdma_cmd *c, bool admin) > { > + int inline_data_size = ndev->inline_data_size; > + > /* NVMe command / RDMA RECV */ > c->nvme_cmd = kmalloc(sizeof(*c->nvme_cmd), GFP_KERNEL); > if (!c->nvme_cmd) > @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, > c->sge[0].length = sizeof(*c->nvme_cmd); > c->sge[0].lkey = ndev->pd->local_dma_lkey; > > - if (!admin) { > + if (!admin && inline_data_size) { > c->inline_page = alloc_pages(GFP_KERNEL, > - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > + get_order(inline_data_size)); > if (!c->inline_page) > goto out_unmap_cmd; > c->sge[1].addr = ib_dma_map_page(ndev->device, > - c->inline_page, 0, NVMET_RDMA_INLINE_DATA_SIZE, > + c->inline_page, 0, inline_data_size, > DMA_FROM_DEVICE); > if (ib_dma_mapping_error(ndev->device, c->sge[1].addr)) > goto out_free_inline_page; > - c->sge[1].length = NVMET_RDMA_INLINE_DATA_SIZE; > + c->sge[1].length = inline_data_size; > c->sge[1].lkey = ndev->pd->local_dma_lkey; > } > > @@ -225,7 +229,7 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, > out_free_inline_page: > if (!admin) { > __free_pages(c->inline_page, > - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > + get_order(inline_data_size)); > } > out_unmap_cmd: > ib_dma_unmap_single(ndev->device, c->sge[0].addr, > @@ -240,11 +244,13 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, > static void nvmet_rdma_free_cmd(struct nvmet_rdma_device *ndev, > struct nvmet_rdma_cmd *c, bool admin) > { > - if (!admin) { > + int inline_data_size = ndev->inline_data_size; > + > + if (!admin && inline_data_size) { > ib_dma_unmap_page(ndev->device, c->sge[1].addr, > - NVMET_RDMA_INLINE_DATA_SIZE, DMA_FROM_DEVICE); > + inline_data_size, DMA_FROM_DEVICE); > __free_pages(c->inline_page, > - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > + get_order(inline_data_size)); > } > ib_dma_unmap_single(ndev->device, c->sge[0].addr, > sizeof(*c->nvme_cmd), DMA_FROM_DEVICE); > @@ -544,7 +550,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp) > if (!nvme_is_write(rsp->req.cmd)) > return NVME_SC_INVALID_FIELD | NVME_SC_DNR; > > - if (off + len > NVMET_RDMA_INLINE_DATA_SIZE) { > + if (off + len > rsp->queue->dev->inline_data_size) { > pr_err("invalid inline data offset!\n"); > return NVME_SC_SGL_INVALID_OFFSET | NVME_SC_DNR; > } > @@ -793,6 +799,7 @@ static void nvmet_rdma_free_dev(struct kref *ref) > static struct nvmet_rdma_device * > nvmet_rdma_find_get_device(struct rdma_cm_id *cm_id) > { > + struct nvmet_port *port = cm_id->context; > struct nvmet_rdma_device *ndev; > int ret; > > @@ -807,6 +814,7 @@ static void nvmet_rdma_free_dev(struct kref *ref) > if (!ndev) > goto out_err; > > + ndev->inline_data_size = port->inline_data_size; > ndev->device = cm_id->device; > kref_init(&ndev->ref); > > @@ -1379,6 +1387,15 @@ static int nvmet_rdma_add_port(struct nvmet_port *port) > return -EINVAL; > } > > + if (port->inline_data_size < 0) { > + port->inline_data_size = NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE; > + } else if (port->inline_data_size > NVMET_RDMA_MAX_INLINE_DATA_SIZE) { > + pr_err("invalid inline_data_size %d (max supported is %u)\n", > + port->inline_data_size, > + NVMET_RDMA_MAX_INLINE_DATA_SIZE); > + return -EINVAL; > + } > + > ret = inet_pton_with_scope(&init_net, af, port->disc_addr.traddr, > port->disc_addr.trsvcid, &addr); > if (ret) { > @@ -1418,8 +1435,9 @@ static int nvmet_rdma_add_port(struct nvmet_port *port) > goto out_destroy_id; > } > > - pr_info("enabling port %d (%pISpcs)\n", > - le16_to_cpu(port->disc_addr.portid), (struct sockaddr *)&addr); > + pr_info("enabling port %d (%pISpcs) inline_data_size %d\n", > + le16_to_cpu(port->disc_addr.portid), (struct sockaddr *)&addr, > + port->inline_data_size); > port->priv = cm_id; > return 0; > > @@ -1456,7 +1474,6 @@ static void nvmet_rdma_disc_port_addr(struct nvmet_req *req, > static const struct nvmet_fabrics_ops nvmet_rdma_ops = { > .owner = THIS_MODULE, > .type = NVMF_TRTYPE_RDMA, > - .sqe_inline_size = NVMET_RDMA_INLINE_DATA_SIZE, > .msdbd = 1, > .has_keyed_sgls = 1, > .add_port = nvmet_rdma_add_port, > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Max Gurtovoy <maxg@mellanox.com> > Sent: Sunday, June 3, 2018 3:40 AM > To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; > hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- > nvme@lists.infradead.org > Cc: parav@mellanox.com; linux-rdma@vger.kernel.org > Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data > > > > On 5/29/2018 9:25 PM, Steve Wise wrote: > > Add a new configfs port attribute, called inline_data_size, > > to allow configuring the size of inline data for a given port. > > The maximum size allowed is still enforced by nvmet-rdma with > > NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to max(16KB, > > PAGE_SIZE). And the default size, if not specified via configfs, > > is still PAGE_SIZE. This preserves the existing behavior, but allows > > larger inline sizes. > > > > Also support a configuration where inline_data_size is 0, which disables > > using inline data. > > > > Signed-off-by: Steve Wise <swise@opengridcomputing.com> > > --- > > drivers/nvme/target/admin-cmd.c | 4 ++-- > > drivers/nvme/target/configfs.c | 31 ++++++++++++++++++++++++++++ > > drivers/nvme/target/core.c | 4 ++++ > > drivers/nvme/target/discovery.c | 2 +- > > drivers/nvme/target/nvmet.h | 2 +- > > drivers/nvme/target/rdma.c | 45 ++++++++++++++++++++++++++++----- > -------- > > 6 files changed, 70 insertions(+), 18 deletions(-) > > snip.. > > > > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > > index 52e0c5d..2f0b08e 100644 > > --- a/drivers/nvme/target/rdma.c > > +++ b/drivers/nvme/target/rdma.c > > @@ -33,9 +33,10 @@ > > #include "nvmet.h" > > > > /* > > - * We allow up to a page of inline data to go with the SQE > > + * We allow at least 1 page, and up to 16KB of inline data to go with the > SQE > > */ > > -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE > > +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE > > +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, > SZ_16K, PAGE_SIZE) > > why not use SZ_16K ? why we need to mention the PAGE_SIZE ? > The idea is to allow at least 1 page. So for, say, a 64K page system, we'll allow 64K since we're allocating a page minimum for the buffer. > > > > struct nvmet_rdma_cmd { > > struct ib_sge sge[2]; > > @@ -116,6 +117,7 @@ struct nvmet_rdma_device { > > size_t srq_size; > > struct kref ref; > > struct list_head entry; > > + int inline_data_size; > > }; > > > > static bool nvmet_rdma_use_srq; > > @@ -187,6 +189,8 @@ static inline bool > nvmet_rdma_need_data_out(struct nvmet_rdma_rsp *rsp) > > static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, > > struct nvmet_rdma_cmd *c, bool admin) > > { > > + int inline_data_size = ndev->inline_data_size; > > + > > /* NVMe command / RDMA RECV */ > > c->nvme_cmd = kmalloc(sizeof(*c->nvme_cmd), GFP_KERNEL); > > if (!c->nvme_cmd) > > @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct > nvmet_rdma_device *ndev, > > c->sge[0].length = sizeof(*c->nvme_cmd); > > c->sge[0].lkey = ndev->pd->local_dma_lkey; > > > > - if (!admin) { > > + if (!admin && inline_data_size) { > > c->inline_page = alloc_pages(GFP_KERNEL, > > - > get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > > + get_order(inline_data_size)); > > if (!c->inline_page) > > goto out_unmap_cmd; > > c->sge[1].addr = ib_dma_map_page(ndev->device, > > - c->inline_page, 0, > NVMET_RDMA_INLINE_DATA_SIZE, > > + c->inline_page, 0, inline_data_size, > > DMA_FROM_DEVICE); > > if (ib_dma_mapping_error(ndev->device, c->sge[1].addr)) > > goto out_free_inline_page; > > - c->sge[1].length = NVMET_RDMA_INLINE_DATA_SIZE; > > + c->sge[1].length = inline_data_size; > > c->sge[1].lkey = ndev->pd->local_dma_lkey; > > } > > > > @@ -225,7 +229,7 @@ static int nvmet_rdma_alloc_cmd(struct > nvmet_rdma_device *ndev, > > out_free_inline_page: > > if (!admin) { > > __free_pages(c->inline_page, > > - > get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > > + get_order(inline_data_size)); > > } > > out_unmap_cmd: > > ib_dma_unmap_single(ndev->device, c->sge[0].addr, > > @@ -240,11 +244,13 @@ static int nvmet_rdma_alloc_cmd(struct > nvmet_rdma_device *ndev, > > static void nvmet_rdma_free_cmd(struct nvmet_rdma_device *ndev, > > struct nvmet_rdma_cmd *c, bool admin) > > { > > - if (!admin) { > > + int inline_data_size = ndev->inline_data_size; > > + > > + if (!admin && inline_data_size) { > > ib_dma_unmap_page(ndev->device, c->sge[1].addr, > > - NVMET_RDMA_INLINE_DATA_SIZE, > DMA_FROM_DEVICE); > > + inline_data_size, DMA_FROM_DEVICE); > > __free_pages(c->inline_page, > > - > get_order(NVMET_RDMA_INLINE_DATA_SIZE)); > > + get_order(inline_data_size)); > > } > > ib_dma_unmap_single(ndev->device, c->sge[0].addr, > > sizeof(*c->nvme_cmd), > DMA_FROM_DEVICE); > > @@ -544,7 +550,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct > nvmet_rdma_rsp *rsp) > > if (!nvme_is_write(rsp->req.cmd)) > > return NVME_SC_INVALID_FIELD | NVME_SC_DNR; > > > > - if (off + len > NVMET_RDMA_INLINE_DATA_SIZE) { > > + if (off + len > rsp->queue->dev->inline_data_size) { > > pr_err("invalid inline data offset!\n"); > > return NVME_SC_SGL_INVALID_OFFSET | NVME_SC_DNR; > > } > > @@ -793,6 +799,7 @@ static void nvmet_rdma_free_dev(struct kref *ref) > > static struct nvmet_rdma_device * > > nvmet_rdma_find_get_device(struct rdma_cm_id *cm_id) > > { > > + struct nvmet_port *port = cm_id->context; > > struct nvmet_rdma_device *ndev; > > int ret; > > > > @@ -807,6 +814,7 @@ static void nvmet_rdma_free_dev(struct kref *ref) > > if (!ndev) > > goto out_err; > > > > + ndev->inline_data_size = port->inline_data_size; > > ndev->device = cm_id->device; > > kref_init(&ndev->ref); > > > > @@ -1379,6 +1387,15 @@ static int nvmet_rdma_add_port(struct > nvmet_port *port) > > return -EINVAL; > > } > > > > + if (port->inline_data_size < 0) { > > + port->inline_data_size = > NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE; > > + } else if (port->inline_data_size > > NVMET_RDMA_MAX_INLINE_DATA_SIZE) { > > + pr_err("invalid inline_data_size %d (max supported is > %u)\n", > > + port->inline_data_size, > > + NVMET_RDMA_MAX_INLINE_DATA_SIZE); > > + return -EINVAL; > > + } > > + > > ret = inet_pton_with_scope(&init_net, af, port->disc_addr.traddr, > > port->disc_addr.trsvcid, &addr); > > if (ret) { > > @@ -1418,8 +1435,9 @@ static int nvmet_rdma_add_port(struct > nvmet_port *port) > > goto out_destroy_id; > > } > > > > - pr_info("enabling port %d (%pISpcs)\n", > > - le16_to_cpu(port->disc_addr.portid), (struct sockaddr > *)&addr); > > + pr_info("enabling port %d (%pISpcs) inline_data_size %d\n", > > + le16_to_cpu(port->disc_addr.portid), (struct sockaddr > *)&addr, > > + port->inline_data_size); > > port->priv = cm_id; > > return 0; > > > > @@ -1456,7 +1474,6 @@ static void nvmet_rdma_disc_port_addr(struct > nvmet_req *req, > > static const struct nvmet_fabrics_ops nvmet_rdma_ops = { > > .owner = THIS_MODULE, > > .type = NVMF_TRTYPE_RDMA, > > - .sqe_inline_size = NVMET_RDMA_INLINE_DATA_SIZE, > > .msdbd = 1, > > .has_keyed_sgls = 1, > > .add_port = nvmet_rdma_add_port, > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/3/2018 9:25 PM, Steve Wise wrote: > > >> -----Original Message----- >> From: Max Gurtovoy <maxg@mellanox.com> >> Sent: Sunday, June 3, 2018 3:40 AM >> To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; >> hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- >> nvme@lists.infradead.org >> Cc: parav@mellanox.com; linux-rdma@vger.kernel.org >> Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data >> >> >> >> On 5/29/2018 9:25 PM, Steve Wise wrote: >>> Add a new configfs port attribute, called inline_data_size, >>> to allow configuring the size of inline data for a given port. >>> The maximum size allowed is still enforced by nvmet-rdma with >>> NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to max(16KB, >>> PAGE_SIZE). And the default size, if not specified via configfs, >>> is still PAGE_SIZE. This preserves the existing behavior, but allows >>> larger inline sizes. >>> >>> Also support a configuration where inline_data_size is 0, which disables >>> using inline data. >>> >>> Signed-off-by: Steve Wise <swise@opengridcomputing.com> >>> --- >>> drivers/nvme/target/admin-cmd.c | 4 ++-- >>> drivers/nvme/target/configfs.c | 31 ++++++++++++++++++++++++++++ >>> drivers/nvme/target/core.c | 4 ++++ >>> drivers/nvme/target/discovery.c | 2 +- >>> drivers/nvme/target/nvmet.h | 2 +- >>> drivers/nvme/target/rdma.c | 45 ++++++++++++++++++++++++++++----- >> -------- >>> 6 files changed, 70 insertions(+), 18 deletions(-) >> >> snip.. >> >>> >>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c >>> index 52e0c5d..2f0b08e 100644 >>> --- a/drivers/nvme/target/rdma.c >>> +++ b/drivers/nvme/target/rdma.c >>> @@ -33,9 +33,10 @@ >>> #include "nvmet.h" >>> >>> /* >>> - * We allow up to a page of inline data to go with the SQE >>> + * We allow at least 1 page, and up to 16KB of inline data to go with > the >> SQE >>> */ >>> -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE >>> +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE >>> +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, >> SZ_16K, PAGE_SIZE) >> >> why not use SZ_16K ? why we need to mention the PAGE_SIZE ? >> > > The idea is to allow at least 1 page. So for, say, a 64K page system, we'll > allow 64K since we're allocating a page minimum for the buffer. IMO you want to support upto 16K inline data and not upto 64k (also in PowerPC system)... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Max Gurtovoy <maxg@mellanox.com> > Sent: Monday, June 4, 2018 8:58 AM > To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; > hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- > nvme@lists.infradead.org > Cc: parav@mellanox.com; linux-rdma@vger.kernel.org > Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data > > > > On 6/3/2018 9:25 PM, Steve Wise wrote: > > > > > >> -----Original Message----- > >> From: Max Gurtovoy <maxg@mellanox.com> > >> Sent: Sunday, June 3, 2018 3:40 AM > >> To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; > >> hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- > >> nvme@lists.infradead.org > >> Cc: parav@mellanox.com; linux-rdma@vger.kernel.org > >> Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data > >> > >> > >> > >> On 5/29/2018 9:25 PM, Steve Wise wrote: > >>> Add a new configfs port attribute, called inline_data_size, > >>> to allow configuring the size of inline data for a given port. > >>> The maximum size allowed is still enforced by nvmet-rdma with > >>> NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to > max(16KB, > >>> PAGE_SIZE). And the default size, if not specified via configfs, > >>> is still PAGE_SIZE. This preserves the existing behavior, but allows > >>> larger inline sizes. > >>> > >>> Also support a configuration where inline_data_size is 0, which disables > >>> using inline data. > >>> > >>> Signed-off-by: Steve Wise <swise@opengridcomputing.com> > >>> --- > >>> drivers/nvme/target/admin-cmd.c | 4 ++-- > >>> drivers/nvme/target/configfs.c | 31 ++++++++++++++++++++++++++++ > >>> drivers/nvme/target/core.c | 4 ++++ > >>> drivers/nvme/target/discovery.c | 2 +- > >>> drivers/nvme/target/nvmet.h | 2 +- > >>> drivers/nvme/target/rdma.c | 45 ++++++++++++++++++++++++++++- > ---- > >> -------- > >>> 6 files changed, 70 insertions(+), 18 deletions(-) > >> > >> snip.. > >> > >>> > >>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > >>> index 52e0c5d..2f0b08e 100644 > >>> --- a/drivers/nvme/target/rdma.c > >>> +++ b/drivers/nvme/target/rdma.c > >>> @@ -33,9 +33,10 @@ > >>> #include "nvmet.h" > >>> > >>> /* > >>> - * We allow up to a page of inline data to go with the SQE > >>> + * We allow at least 1 page, and up to 16KB of inline data to go with > > the > >> SQE > >>> */ > >>> -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE > >>> +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE > >>> +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, > >> SZ_16K, PAGE_SIZE) > >> > >> why not use SZ_16K ? why we need to mention the PAGE_SIZE ? > >> > > > > The idea is to allow at least 1 page. So for, say, a 64K page system, we'll > > allow 64K since we're allocating a page minimum for the buffer. > > IMO you want to support upto 16K inline data and not upto 64k (also in > PowerPC system)... Why? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/4/2018 5:18 PM, Steve Wise wrote: > > >> -----Original Message----- >> From: Max Gurtovoy <maxg@mellanox.com> >> Sent: Monday, June 4, 2018 8:58 AM >> To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; >> hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- >> nvme@lists.infradead.org >> Cc: parav@mellanox.com; linux-rdma@vger.kernel.org >> Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data >> >> >> >> On 6/3/2018 9:25 PM, Steve Wise wrote: >>> >>> >>>> -----Original Message----- >>>> From: Max Gurtovoy <maxg@mellanox.com> >>>> Sent: Sunday, June 3, 2018 3:40 AM >>>> To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; >>>> hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- >>>> nvme@lists.infradead.org >>>> Cc: parav@mellanox.com; linux-rdma@vger.kernel.org >>>> Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data >>>> >>>> >>>> >>>> On 5/29/2018 9:25 PM, Steve Wise wrote: >>>>> Add a new configfs port attribute, called inline_data_size, >>>>> to allow configuring the size of inline data for a given port. >>>>> The maximum size allowed is still enforced by nvmet-rdma with >>>>> NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to >> max(16KB, >>>>> PAGE_SIZE). And the default size, if not specified via configfs, >>>>> is still PAGE_SIZE. This preserves the existing behavior, but allows >>>>> larger inline sizes. >>>>> >>>>> Also support a configuration where inline_data_size is 0, which > disables >>>>> using inline data. >>>>> >>>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com> >>>>> --- >>>>> drivers/nvme/target/admin-cmd.c | 4 ++-- >>>>> drivers/nvme/target/configfs.c | 31 ++++++++++++++++++++++++++++ >>>>> drivers/nvme/target/core.c | 4 ++++ >>>>> drivers/nvme/target/discovery.c | 2 +- >>>>> drivers/nvme/target/nvmet.h | 2 +- >>>>> drivers/nvme/target/rdma.c | 45 ++++++++++++++++++++++++++++- >> ---- >>>> -------- >>>>> 6 files changed, 70 insertions(+), 18 deletions(-) >>>> >>>> snip.. >>>> >>>>> >>>>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c >>>>> index 52e0c5d..2f0b08e 100644 >>>>> --- a/drivers/nvme/target/rdma.c >>>>> +++ b/drivers/nvme/target/rdma.c >>>>> @@ -33,9 +33,10 @@ >>>>> #include "nvmet.h" >>>>> >>>>> /* >>>>> - * We allow up to a page of inline data to go with the SQE >>>>> + * We allow at least 1 page, and up to 16KB of inline data to go with >>> the >>>> SQE >>>>> */ >>>>> -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE >>>>> +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE >>>>> +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, >>>> SZ_16K, PAGE_SIZE) >>>> >>>> why not use SZ_16K ? why we need to mention the PAGE_SIZE ? >>>> >>> >>> The idea is to allow at least 1 page. So for, say, a 64K page system, > we'll >>> allow 64K since we're allocating a page minimum for the buffer. >> >> IMO you want to support upto 16K inline data and not upto 64k (also in >> PowerPC system)... > > Why? > Ok we can use PAGE_SIZE from resource and perf perspective, but please change the subject since it is confusing. I guess the subject can be similar to the one you used in the initiator. I think we should consider clamping to NVMET_RDMA_MAX_INLINE_DATA_SIZE in case port->inline_data_size > NVMET_RDMA_MAX_INLINE_DATA_SIZE instead of failing the operation (and add an info print). -Max. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/5/2018 3:52 AM, Max Gurtovoy wrote: > > > On 6/4/2018 5:18 PM, Steve Wise wrote: >> >> >>> -----Original Message----- >>> From: Max Gurtovoy <maxg@mellanox.com> >>> Sent: Monday, June 4, 2018 8:58 AM >>> To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; >>> hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- >>> nvme@lists.infradead.org >>> Cc: parav@mellanox.com; linux-rdma@vger.kernel.org >>> Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data >>> >>> >>> >>> On 6/3/2018 9:25 PM, Steve Wise wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Max Gurtovoy <maxg@mellanox.com> >>>>> Sent: Sunday, June 3, 2018 3:40 AM >>>>> To: Steve Wise <swise@opengridcomputing.com>; axboe@kernel.dk; >>>>> hch@lst.de; keith.busch@intel.com; sagi@grimberg.me; linux- >>>>> nvme@lists.infradead.org >>>>> Cc: parav@mellanox.com; linux-rdma@vger.kernel.org >>>>> Subject: Re: [PATCH v3 3/3] nvmet-rdma: support 16K inline data >>>>> >>>>> >>>>> >>>>> On 5/29/2018 9:25 PM, Steve Wise wrote: >>>>>> Add a new configfs port attribute, called inline_data_size, >>>>>> to allow configuring the size of inline data for a given port. >>>>>> The maximum size allowed is still enforced by nvmet-rdma with >>>>>> NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to >>> max(16KB, >>>>>> PAGE_SIZE). And the default size, if not specified via configfs, >>>>>> is still PAGE_SIZE. This preserves the existing behavior, but >>>>>> allows >>>>>> larger inline sizes. >>>>>> >>>>>> Also support a configuration where inline_data_size is 0, which >> disables >>>>>> using inline data. >>>>>> >>>>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com> >>>>>> --- >>>>>> drivers/nvme/target/admin-cmd.c | 4 ++-- >>>>>> drivers/nvme/target/configfs.c | 31 >>>>>> ++++++++++++++++++++++++++++ >>>>>> drivers/nvme/target/core.c | 4 ++++ >>>>>> drivers/nvme/target/discovery.c | 2 +- >>>>>> drivers/nvme/target/nvmet.h | 2 +- >>>>>> drivers/nvme/target/rdma.c | 45 >>>>>> ++++++++++++++++++++++++++++- >>> ---- >>>>> -------- >>>>>> 6 files changed, 70 insertions(+), 18 deletions(-) >>>>> >>>>> snip.. >>>>> >>>>>> >>>>>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c >>>>>> index 52e0c5d..2f0b08e 100644 >>>>>> --- a/drivers/nvme/target/rdma.c >>>>>> +++ b/drivers/nvme/target/rdma.c >>>>>> @@ -33,9 +33,10 @@ >>>>>> #include "nvmet.h" >>>>>> >>>>>> /* >>>>>> - * We allow up to a page of inline data to go with the SQE >>>>>> + * We allow at least 1 page, and up to 16KB of inline data to go >>>>>> with >>>> the >>>>> SQE >>>>>> */ >>>>>> -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE >>>>>> +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE >>>>>> +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, >>>>> SZ_16K, PAGE_SIZE) >>>>> >>>>> why not use SZ_16K ? why we need to mention the PAGE_SIZE ? >>>>> >>>> >>>> The idea is to allow at least 1 page. So for, say, a 64K page system, >> we'll >>>> allow 64K since we're allocating a page minimum for the buffer. >>> >>> IMO you want to support upto 16K inline data and not upto 64k (also in >>> PowerPC system)... >> >> Why? >> > > Ok we can use PAGE_SIZE from resource and perf perspective, but please > change the subject since it is confusing. I guess the subject can be > similar to the one you used in the initiator. > > I think we should consider clamping to NVMET_RDMA_MAX_INLINE_DATA_SIZE > in case port->inline_data_size > NVMET_RDMA_MAX_INLINE_DATA_SIZE > instead of failing the operation (and add an info print). > This sounds reasonable. In my yet-to-be sent new version of this, I have similar logic if the rdma device doesn't have enough sge entries to support the configured size. In that case, the inline data size is clamped to the max that can be supported by the device. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 5e0e9fc..a9e3223 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -247,14 +247,14 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req) id->sgls = cpu_to_le32(1 << 0); /* we always support SGLs */ if (ctrl->ops->has_keyed_sgls) id->sgls |= cpu_to_le32(1 << 2); - if (ctrl->ops->sqe_inline_size) + if (req->port->inline_data_size) id->sgls |= cpu_to_le32(1 << 20); strcpy(id->subnqn, ctrl->subsys->subsysnqn); /* Max command capsule size is sqe + single page of in-capsule data */ id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) + - ctrl->ops->sqe_inline_size) / 16); + req->port->inline_data_size) / 16); /* Max response capsule size is cqe */ id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16); diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index ad9ff27..9867783 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -214,6 +214,35 @@ static ssize_t nvmet_addr_trsvcid_store(struct config_item *item, CONFIGFS_ATTR(nvmet_, addr_trsvcid); +static ssize_t nvmet_param_inline_data_size_show(struct config_item *item, + char *page) +{ + struct nvmet_port *port = to_nvmet_port(item); + + return snprintf(page, PAGE_SIZE, "%d\n", port->inline_data_size); +} + +static ssize_t nvmet_param_inline_data_size_store(struct config_item *item, + const char *page, size_t count) +{ + struct nvmet_port *port = to_nvmet_port(item); + int ret; + + if (port->enabled) { + pr_err("Cannot modify inline_data_size enabled\n"); + pr_err("Disable the port before modifying\n"); + return -EACCES; + } + ret = kstrtoint(page, 0, &port->inline_data_size); + if (ret) { + pr_err("Invalid value '%s' for inline_data_size\n", page); + return -EINVAL; + } + return count; +} + +CONFIGFS_ATTR(nvmet_, param_inline_data_size); + static ssize_t nvmet_addr_trtype_show(struct config_item *item, char *page) { @@ -870,6 +899,7 @@ static void nvmet_port_release(struct config_item *item) &nvmet_attr_addr_traddr, &nvmet_attr_addr_trsvcid, &nvmet_attr_addr_trtype, + &nvmet_attr_param_inline_data_size, NULL, }; @@ -899,6 +929,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group, INIT_LIST_HEAD(&port->entry); INIT_LIST_HEAD(&port->subsystems); INIT_LIST_HEAD(&port->referrals); + port->inline_data_size = -1; /* < 0 == let the transport choose */ port->disc_addr.portid = cpu_to_le16(portid); config_group_init_type_name(&port->group, name, &nvmet_port_type); diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index e95424f..695ec17 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -189,6 +189,10 @@ int nvmet_enable_port(struct nvmet_port *port) return ret; } + /* If the transport didn't set inline_data_size, then disable it. */ + if (port->inline_data_size < 0) + port->inline_data_size = 0; + port->enabled = true; return 0; } diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c index 231e04e..fc2e675 100644 --- a/drivers/nvme/target/discovery.c +++ b/drivers/nvme/target/discovery.c @@ -171,7 +171,7 @@ static void nvmet_execute_identify_disc_ctrl(struct nvmet_req *req) id->sgls = cpu_to_le32(1 << 0); /* we always support SGLs */ if (ctrl->ops->has_keyed_sgls) id->sgls |= cpu_to_le32(1 << 2); - if (ctrl->ops->sqe_inline_size) + if (req->port->inline_data_size) id->sgls |= cpu_to_le32(1 << 20); strcpy(id->subnqn, ctrl->subsys->subsysnqn); diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 15fd84a..db29e45 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -98,6 +98,7 @@ struct nvmet_port { struct list_head referrals; void *priv; bool enabled; + int inline_data_size; }; static inline struct nvmet_port *to_nvmet_port(struct config_item *item) @@ -202,7 +203,6 @@ struct nvmet_subsys_link { struct nvmet_fabrics_ops { struct module *owner; unsigned int type; - unsigned int sqe_inline_size; unsigned int msdbd; bool has_keyed_sgls : 1; void (*queue_response)(struct nvmet_req *req); diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 52e0c5d..2f0b08e 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -33,9 +33,10 @@ #include "nvmet.h" /* - * We allow up to a page of inline data to go with the SQE + * We allow at least 1 page, and up to 16KB of inline data to go with the SQE */ -#define NVMET_RDMA_INLINE_DATA_SIZE PAGE_SIZE +#define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE PAGE_SIZE +#define NVMET_RDMA_MAX_INLINE_DATA_SIZE max_t(int, SZ_16K, PAGE_SIZE) struct nvmet_rdma_cmd { struct ib_sge sge[2]; @@ -116,6 +117,7 @@ struct nvmet_rdma_device { size_t srq_size; struct kref ref; struct list_head entry; + int inline_data_size; }; static bool nvmet_rdma_use_srq; @@ -187,6 +189,8 @@ static inline bool nvmet_rdma_need_data_out(struct nvmet_rdma_rsp *rsp) static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, struct nvmet_rdma_cmd *c, bool admin) { + int inline_data_size = ndev->inline_data_size; + /* NVMe command / RDMA RECV */ c->nvme_cmd = kmalloc(sizeof(*c->nvme_cmd), GFP_KERNEL); if (!c->nvme_cmd) @@ -200,17 +204,17 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, c->sge[0].length = sizeof(*c->nvme_cmd); c->sge[0].lkey = ndev->pd->local_dma_lkey; - if (!admin) { + if (!admin && inline_data_size) { c->inline_page = alloc_pages(GFP_KERNEL, - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); + get_order(inline_data_size)); if (!c->inline_page) goto out_unmap_cmd; c->sge[1].addr = ib_dma_map_page(ndev->device, - c->inline_page, 0, NVMET_RDMA_INLINE_DATA_SIZE, + c->inline_page, 0, inline_data_size, DMA_FROM_DEVICE); if (ib_dma_mapping_error(ndev->device, c->sge[1].addr)) goto out_free_inline_page; - c->sge[1].length = NVMET_RDMA_INLINE_DATA_SIZE; + c->sge[1].length = inline_data_size; c->sge[1].lkey = ndev->pd->local_dma_lkey; } @@ -225,7 +229,7 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, out_free_inline_page: if (!admin) { __free_pages(c->inline_page, - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); + get_order(inline_data_size)); } out_unmap_cmd: ib_dma_unmap_single(ndev->device, c->sge[0].addr, @@ -240,11 +244,13 @@ static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev, static void nvmet_rdma_free_cmd(struct nvmet_rdma_device *ndev, struct nvmet_rdma_cmd *c, bool admin) { - if (!admin) { + int inline_data_size = ndev->inline_data_size; + + if (!admin && inline_data_size) { ib_dma_unmap_page(ndev->device, c->sge[1].addr, - NVMET_RDMA_INLINE_DATA_SIZE, DMA_FROM_DEVICE); + inline_data_size, DMA_FROM_DEVICE); __free_pages(c->inline_page, - get_order(NVMET_RDMA_INLINE_DATA_SIZE)); + get_order(inline_data_size)); } ib_dma_unmap_single(ndev->device, c->sge[0].addr, sizeof(*c->nvme_cmd), DMA_FROM_DEVICE); @@ -544,7 +550,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp) if (!nvme_is_write(rsp->req.cmd)) return NVME_SC_INVALID_FIELD | NVME_SC_DNR; - if (off + len > NVMET_RDMA_INLINE_DATA_SIZE) { + if (off + len > rsp->queue->dev->inline_data_size) { pr_err("invalid inline data offset!\n"); return NVME_SC_SGL_INVALID_OFFSET | NVME_SC_DNR; } @@ -793,6 +799,7 @@ static void nvmet_rdma_free_dev(struct kref *ref) static struct nvmet_rdma_device * nvmet_rdma_find_get_device(struct rdma_cm_id *cm_id) { + struct nvmet_port *port = cm_id->context; struct nvmet_rdma_device *ndev; int ret; @@ -807,6 +814,7 @@ static void nvmet_rdma_free_dev(struct kref *ref) if (!ndev) goto out_err; + ndev->inline_data_size = port->inline_data_size; ndev->device = cm_id->device; kref_init(&ndev->ref); @@ -1379,6 +1387,15 @@ static int nvmet_rdma_add_port(struct nvmet_port *port) return -EINVAL; } + if (port->inline_data_size < 0) { + port->inline_data_size = NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE; + } else if (port->inline_data_size > NVMET_RDMA_MAX_INLINE_DATA_SIZE) { + pr_err("invalid inline_data_size %d (max supported is %u)\n", + port->inline_data_size, + NVMET_RDMA_MAX_INLINE_DATA_SIZE); + return -EINVAL; + } + ret = inet_pton_with_scope(&init_net, af, port->disc_addr.traddr, port->disc_addr.trsvcid, &addr); if (ret) { @@ -1418,8 +1435,9 @@ static int nvmet_rdma_add_port(struct nvmet_port *port) goto out_destroy_id; } - pr_info("enabling port %d (%pISpcs)\n", - le16_to_cpu(port->disc_addr.portid), (struct sockaddr *)&addr); + pr_info("enabling port %d (%pISpcs) inline_data_size %d\n", + le16_to_cpu(port->disc_addr.portid), (struct sockaddr *)&addr, + port->inline_data_size); port->priv = cm_id; return 0; @@ -1456,7 +1474,6 @@ static void nvmet_rdma_disc_port_addr(struct nvmet_req *req, static const struct nvmet_fabrics_ops nvmet_rdma_ops = { .owner = THIS_MODULE, .type = NVMF_TRTYPE_RDMA, - .sqe_inline_size = NVMET_RDMA_INLINE_DATA_SIZE, .msdbd = 1, .has_keyed_sgls = 1, .add_port = nvmet_rdma_add_port,
Add a new configfs port attribute, called inline_data_size, to allow configuring the size of inline data for a given port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is increased to max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes. Also support a configuration where inline_data_size is 0, which disables using inline data. Signed-off-by: Steve Wise <swise@opengridcomputing.com> --- drivers/nvme/target/admin-cmd.c | 4 ++-- drivers/nvme/target/configfs.c | 31 ++++++++++++++++++++++++++++ drivers/nvme/target/core.c | 4 ++++ drivers/nvme/target/discovery.c | 2 +- drivers/nvme/target/nvmet.h | 2 +- drivers/nvme/target/rdma.c | 45 ++++++++++++++++++++++++++++------------- 6 files changed, 70 insertions(+), 18 deletions(-)