| Message ID | 20190605190836.32354-11-hch@lst.de (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [01/13] nvme-pci: don't limit DMA segment size |
On 6/5/19 9:08 PM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer. Note that the effect is global, as the IOMMU merging
> is based off parameters in struct device. We could still turn it off
> if no PCIe devices are present, but I don't know how to find that out.
>
> Also remove the bogus nomerges flag; merges do take the virt_boundary
> into account.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 46 +++++----------
>  drivers/scsi/megaraid/megaraid_sas_fusion.c |  7 ++++
>  2 files changed, 18 insertions(+), 35 deletions(-)
>
[...]
> -	blk_queue_max_hw_sectors(sdev->request_queue, (max_io_size / 512));
> -
> -	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, sdev->request_queue);
> -	blk_queue_virt_boundary(sdev->request_queue, mr_nvme_pg_size - 1);
> -}
[...]
> -	if (instance->nvme_page_size && max_io_size_kb)
> -		megasas_set_nvme_device_properties(sdev, (max_io_size_kb << 10));
> +	if (instance->nvme_page_size && max_io_size_kb) {
> +		blk_queue_max_hw_sectors(sdev->request_queue,
> +			(max_io_size_kb << 10) / 512);
> +	}
>
>  	scsi_change_queue_depth(sdev, device_qd);

What happened to the NOMERGES queue flag?

Cheers,

Hannes
On Thu, Jun 06, 2019 at 08:02:07AM +0200, Hannes Reinecke wrote:
> >  	scsi_change_queue_depth(sdev, device_qd);
> >
> What happened to the NOMERGES queue flag?

Quote from the patch description:

"Also remove the bogus nomerges flag; merges do take the virt_boundary
into account."
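For context on why the flag is redundant: the block layer's merge path already rejects any merge that would create a gap across the queue's virt boundary. Below is a minimal sketch of that check, loosely modeled on the kernel's bvec gap test; the function and parameter names are illustrative, not the kernel's actual code.

```c
#include <stdbool.h>

/*
 * Illustrative sketch: a merge is refused whenever the previous segment
 * does not end exactly on the boundary or the next segment does not
 * start on it.  With a 4K mask (0xfff) every merged segment therefore
 * stays gapless at NVMe page granularity, which is the property the old
 * QUEUE_FLAG_NOMERGES setting tried to guarantee by disabling merging
 * entirely.
 */
static bool gap_across_virt_boundary(unsigned long prev_end,
				     unsigned long next_start,
				     unsigned long boundary_mask)
{
	return ((prev_end | next_start) & boundary_mask) != 0;
}
```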
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer. Note that the effect is global, as the IOMMU merging
> is based off parameters in struct device. We could still turn it off
> if no PCIe devices are present, but I don't know how to find that out.
>
> Also remove the bogus nomerges flag; merges do take the virt_boundary
> into account.

Hi Christoph,

Changes for <megaraid_sas> and <mpt3sas> look good. We want to confirm a
few sanity checks before ACK. BTW, what benefit will we see from moving
the virt_boundary setting to the SCSI midlayer? Is it just a modular
cleanup, or does it also fix a functional issue?

Kashyap
On Thu, Jun 06, 2019 at 09:07:27PM +0530, Kashyap Desai wrote:
> Hi Christoph,
>
> Changes for <megaraid_sas> and <mpt3sas> look good. We want to confirm a
> few sanity checks before ACK. BTW, what benefit will we see from moving
> the virt_boundary setting to the SCSI midlayer? Is it just a modular
> cleanup, or does it also fix a functional issue?

The big difference is that virt_boundary now also changes the
max_segment_size, and this ensures that this limit is also communicated
to the DMA mapping layer.
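A simplified sketch of the plumbing described here, assuming the block and DMA helpers of that era (blk_queue_virt_boundary(), blk_queue_max_segment_size(), dma_set_max_seg_size()) plus the virt_boundary_mask host field this series introduces; the function itself is an illustration of the flow, not code from the series.

```c
#include <linux/blkdev.h>
#include <linux/dma-mapping.h>
#include <scsi/scsi_host.h>

/*
 * Illustrative only: apply host-wide limits to a queue and forward the
 * effective segment-size limit to the DMA mapping layer, so IOMMU
 * merging can never build a segment the controller cannot handle.
 */
static void sketch_apply_host_limits(struct Scsi_Host *shost,
				     struct request_queue *q,
				     struct device *dma_dev)
{
	if (shost->virt_boundary_mask)
		blk_queue_virt_boundary(q, shost->virt_boundary_mask);

	blk_queue_max_segment_size(q, shost->max_segment_size);

	/* Communicate the same limit to the DMA layer. */
	dma_set_max_seg_size(dma_dev, queue_max_segment_size(q));
}
```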
So before I respin this series, can you help with a way to figure out for mpt3sas and megaraid if a given controller supports NVMe devices at all, so that we don't have to set the virt boundary if not?
> On Thu, Jun 06, 2019 at 09:07:27PM +0530, Kashyap Desai wrote:
> > [...]
>
> The big difference is that virt_boundary now also changes the
> max_segment_size, and this ensures that this limit is also communicated
> to the DMA mapping layer.

Are there any changes to the blk_queue_virt_boundary() API? I could not
find the relevant code that accounts for this; can you help? Which git
repo shall I use for testing? That way I can confirm I didn't miss the
relevant changes.

From your explanation above, it sounds like (after this patch) the max
segment size of the MR controller will be set to 4K. Earlier it was
possible to receive a single SGE with a 64K data length (since the max
segment size was 64K), but now the same buffer will reach the driver as
16 SGEs (each SGE carrying a 4K length). Right?

Kashyap
> So before I respin this series, can you help with a way to figure out
> for mpt3sas and megaraid if a given controller supports NVMe devices at
> all, so that we don't have to set the virt boundary if not?

In MegaRAID we have the enum below; VENTURA_SERIES and AERO_SERIES
support NVMe:

enum MR_ADAPTER_TYPE {
	MFI_SERIES = 1,
	THUNDERBOLT_SERIES = 2,
	INVADER_SERIES = 3,
	VENTURA_SERIES = 4,
	AERO_SERIES = 5,
};

In the mpt3sas driver we have the method below: if IOC FACTS reports NVMe
device support in the protocol flags, we can consider it an HBA with NVMe
drive support:

	ioc->facts.ProtocolFlags & MPI2_IOCFACTS_PROTOCOL_NVME_DEVICES

Kashyap
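A hedged sketch of how those two checks could gate the virt boundary setup; the helper names are invented for illustration, while adapter_type, the enum values, and the ProtocolFlags bit are the ones Kashyap quotes above.

```c
/* megaraid_sas: only the Ventura and Aero generations attach NVMe drives. */
static bool megasas_adapter_supports_nvme(struct megasas_instance *instance)
{
	return instance->adapter_type == VENTURA_SERIES ||
	       instance->adapter_type == AERO_SERIES;
}

/* mpt3sas: IOC FACTS advertises NVMe device support via a protocol flag. */
static bool mpt3sas_ioc_supports_nvme(struct MPT3SAS_ADAPTER *ioc)
{
	return !!(ioc->facts.ProtocolFlags &
		  MPI2_IOCFACTS_PROTOCOL_NVME_DEVICES);
}
```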
On Fri, Jun 14, 2019 at 01:28:47AM +0530, Kashyap Desai wrote:
> Are there any changes to the blk_queue_virt_boundary() API? I could not
> find the relevant code that accounts for this; can you help? Which git
> repo shall I use for testing? That way I can confirm I didn't miss the
> relevant changes.

Latest mainline plus the series (which is about to get resent).
blk_queue_virt_boundary now forces an unlimited max_segment_size, as that
is how PRP-like schemes work, to work around a block driver merging bug.
But we also need to communicate that limit to the DMA layer so that we
don't set a smaller IOMMU segment size limitation.

> From your explanation above, it sounds like (after this patch) the max
> segment size of the MR controller will be set to 4K. Earlier it was
> possible to receive a single SGE with a 64K data length (since the max
> segment size was 64K), but now the same buffer will reach the driver as
> 16 SGEs (each SGE carrying a 4K length). Right?

No, there is no longer any limit on the size of a segment, as with PRPs
each PRP is sort of a segment from the hardware perspective. We just
require the segments to not have gaps, as PRPs don't allow for that.

That being said, I think these patches are wrong for the case of megaraid
or mpt having both NVMe and SAS/ATA devices behind a single controller.
Is that a valid configuration?
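To make the "no segment size limit" point concrete, here is a standalone illustration (all names invented, not driver code) of how a boundary-aligned segment of any length simply decays into one PRP entry per NVMe page; a gap in the middle is the only thing that cannot be expressed, which is exactly what virt_boundary rules out.

```c
#include <stdint.h>

#define EX_NVME_PAGE_SIZE 4096ULL
#define EX_NVME_PAGE_MASK (EX_NVME_PAGE_SIZE - 1)

/*
 * Chop one DMA segment into page-sized PRP entries.  Interior segments
 * must start on a page boundary (the first PRP of a real command may
 * carry an offset; that case is elided here).
 */
static int ex_build_prps(uint64_t dma_addr, uint64_t len,
			 uint64_t *prp_list, int max_entries)
{
	int n = 0;

	if (dma_addr & EX_NVME_PAGE_MASK)
		return -1;	/* a mid-page start would be a "gap" */

	while (len && n < max_entries) {
		prp_list[n++] = dma_addr;
		dma_addr += EX_NVME_PAGE_SIZE;
		len -= (len > EX_NVME_PAGE_SIZE) ? EX_NVME_PAGE_SIZE : len;
	}
	return n;	/* a 64K segment yields 16 entries, 1M yields 256 */
}
```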
> On Fri, Jun 14, 2019 at 01:28:47AM +0530, Kashyap Desai wrote:
> > [...]
>
> Latest mainline plus the series (which is about to get resent).
> blk_queue_virt_boundary now forces an unlimited max_segment_size, as
> that is how PRP-like schemes work, to work around a block driver merging
> bug. But we also need to communicate that limit to the DMA layer so that
> we don't set a smaller IOMMU segment size limitation.
>
> No, there is no longer any limit on the size of a segment, as with PRPs
> each PRP is sort of a segment from the hardware perspective. We just
> require the segments to not have gaps, as PRPs don't allow for that.

Thanks for the clarification. I have also observed that max_segment_size
is unchanged and is still 64K.

> That being said, I think these patches are wrong for the case of
> megaraid or mpt having both NVMe and SAS/ATA devices behind a single
> controller. Is that a valid configuration?

Yes. This is a valid configuration.
```diff
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 3dd1df472dc6..20b3b3f8bc16 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1870,39 +1870,6 @@ void megasas_set_dynamic_target_properties(struct scsi_device *sdev,
 	}
 }
 
-/*
- * megasas_set_nvme_device_properties -
- * set nomerges=2
- * set virtual page boundary = 4K (current mr_nvme_pg_size is 4K).
- * set maximum io transfer = MDTS of NVME device provided by MR firmware.
- *
- * MR firmware provides value in KB. Caller of this function converts
- * kb into bytes.
- *
- * e.a MDTS=5 means 2^5 * nvme page size. (In case of 4K page size,
- * MR firmware provides value 128 as (32 * 4K) = 128K.
- *
- * @sdev: scsi device
- * @max_io_size: maximum io transfer size
- *
- */
-static inline void
-megasas_set_nvme_device_properties(struct scsi_device *sdev, u32 max_io_size)
-{
-	struct megasas_instance *instance;
-	u32 mr_nvme_pg_size;
-
-	instance = (struct megasas_instance *)sdev->host->hostdata;
-	mr_nvme_pg_size = max_t(u32, instance->nvme_page_size,
-				MR_DEFAULT_NVME_PAGE_SIZE);
-
-	blk_queue_max_hw_sectors(sdev->request_queue, (max_io_size / 512));
-
-	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, sdev->request_queue);
-	blk_queue_virt_boundary(sdev->request_queue, mr_nvme_pg_size - 1);
-}
-
-
 /*
  * megasas_set_static_target_properties -
  * Device property set by driver are static and it is not required to be
@@ -1961,8 +1928,10 @@ static void megasas_set_static_target_properties(struct scsi_device *sdev,
 		max_io_size_kb = le32_to_cpu(instance->tgt_prop->max_io_size_kb);
 	}
 
-	if (instance->nvme_page_size && max_io_size_kb)
-		megasas_set_nvme_device_properties(sdev, (max_io_size_kb << 10));
+	if (instance->nvme_page_size && max_io_size_kb) {
+		blk_queue_max_hw_sectors(sdev->request_queue,
+			(max_io_size_kb << 10) / 512);
+	}
 
 	scsi_change_queue_depth(sdev, device_qd);
 
@@ -6258,6 +6227,7 @@ static int megasas_start_aen(struct megasas_instance *instance)
 static int megasas_io_attach(struct megasas_instance *instance)
 {
 	struct Scsi_Host *host = instance->host;
+	u32 nvme_page_size = instance->nvme_page_size;
 
 	/*
 	 * Export parameters required by SCSI mid-layer
@@ -6298,6 +6268,12 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
 
+	if (nvme_page_size) {
+		if (nvme_page_size > MR_DEFAULT_NVME_PAGE_SIZE)
+			nvme_page_size = MR_DEFAULT_NVME_PAGE_SIZE;
+		host->virt_boundary_mask = nvme_page_size - 1;
+	}
+
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 4dfa0685a86c..a9ff3a648e7b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -1935,6 +1935,13 @@ megasas_is_prp_possible(struct megasas_instance *instance,
 		build_prp = true;
 	}
 
+/*
+ * XXX: All the code following should go away.  The block layer guarantees
+ * merging according to the virt boundary.  And while we might have had some
+ * issues with that in the past we fixed them, and any new bug should be
+ * fixed in the core code as well.
+ */
+
 	/*
 	 * Below code detects gaps/holes in IO data buffers.
 	 * What does holes/gaps mean?
```
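The XXX comment in the fusion.c hunk refers to the driver's scatterlist hole scan. A condensed sketch of what such a scan checks (illustrative, not the driver's exact code): once the block layer enforces the virt boundary, the scan can no longer find a hole and becomes dead weight.

```c
#include <linux/scatterlist.h>

#define EX_PAGE_MASK (4096UL - 1)	/* i.e. MR_DEFAULT_NVME_PAGE_SIZE - 1 */

/*
 * A PRP-incompatible hole exists if any segment after the first starts
 * mid-page, or any segment before the last ends mid-page.
 */
static bool ex_sgl_has_holes(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		if (i > 0 && (sg_dma_address(sg) & EX_PAGE_MASK))
			return true;
		if (i < nents - 1 &&
		    ((sg_dma_address(sg) + sg_dma_len(sg)) & EX_PAGE_MASK))
			return true;
	}
	return false;
}
```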
This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer. Note that the effect is global, as the IOMMU merging
is based off parameters in struct device. We could still turn it off
if no PCIe devices are present, but I don't know how to find that out.

Also remove the bogus nomerges flag; merges do take the virt_boundary
into account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c   | 46 +++++----------
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  7 ++++
 2 files changed, 18 insertions(+), 35 deletions(-)