From patchwork Mon May 6 13:53:32 2019
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem, Gal Pressman
Subject: [PATCH v3 rdma-next 1/6] RDMA/umem: Add API to find best driver supported page size in an MR
Date: Mon, 6 May 2019 08:53:32 -0500
Message-Id: <20190506135337.11324-2-shiraz.saleem@intel.com>
In-Reply-To: <20190506135337.11324-1-shiraz.saleem@intel.com>
References: <20190506135337.11324-1-shiraz.saleem@intel.com>

This helper iterates through the SG list to find the best page size to
use from a bitmap of HW supported page sizes. Drivers that support
multiple page sizes, but not mixed page sizes within an MR, can use
this API. A short usage sketch follows.
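To make the intended calling convention concrete, here is a minimal
usage sketch for a hypothetical driver whose hardware supports 4K and
2M page sizes (the same bitmap i40iw passes later in this series).
Everything except ib_umem_find_best_pgsz() and the SZ_* constants is
illustrative, not part of this patch:

/*
 * Illustrative sketch only: select one HW page size for the whole MR.
 * A 0 return means no supported page size can map this umem; drivers
 * whose bitmap includes PAGE_SIZE or smaller never see 0.
 */
static int example_reg_mr(struct ib_umem *umem, u64 virt_addr)
{
	unsigned long pg_sz;

	/* HW supports 4K and 2M pages, but only one size per MR */
	pg_sz = ib_umem_find_best_pgsz(umem, SZ_4K | SZ_2M, virt_addr);
	if (!pg_sz)
		return -EOPNOTSUPP;

	/* ... program the MR using pg_sz-aligned blocks ... */
	return 0;
}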
Suggested-by: Jason Gunthorpe Cc: Gal Pressman Signed-off-by: Shiraz Saleem --- drivers/infiniband/core/umem.c | 51 ++++++++++++++++++++++++++++++++++++++++++ include/rdma/ib_umem.h | 9 ++++++++ include/rdma/ib_verbs.h | 24 ++++++++++++++++++++ 3 files changed, 84 insertions(+) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 7e912a9..2534ddd 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -127,6 +127,57 @@ static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg, } /** + * ib_umem_find_best_pgsz - Find best HW page size to use for this MR + * + * @umem: umem struct + * @pgsz_bitmap: bitmap of HW supported page sizes + * @virt: IOVA + * + * This helper is intended for HW that support multiple page + * sizes but can do only a single page size in an MR. + * + * Returns 0 if the umem requires page sizes not supported by + * the driver to be mapped. Drivers always supporting PAGE_SIZE + * or smaller will never see a 0 result. + */ +unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem, + unsigned long pgsz_bitmap, + unsigned long virt) +{ + struct scatterlist *sg; + unsigned int best_pg_bit; + unsigned long va, pgoff; + dma_addr_t mask; + int i; + + /* At minimum, drivers must support PAGE_SIZE or smaller */ + if (WARN_ON(!(pgsz_bitmap & GENMASK(PAGE_SHIFT, 0)))) + return 0; + + va = virt; + /* max page size not to exceed MR length */ + mask = roundup_pow_of_two(umem->length); + /* offset into first SGL */ + pgoff = umem->address & ~PAGE_MASK; + + for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) { + /* Walk SGL and reduce max page size if VA/PA bits differ + * for any address. + */ + mask |= (sg_dma_address(sg) + pgoff) ^ va; + if (i && i != (umem->nmap - 1)) + /* restrict by length as well for interior SGEs */ + mask |= sg_dma_len(sg); + va += sg_dma_len(sg) - pgoff; + pgoff = 0; + } + best_pg_bit = rdma_find_pg_bit(mask, pgsz_bitmap); + + return BIT_ULL(best_pg_bit); +} +EXPORT_SYMBOL(ib_umem_find_best_pgsz); + +/** * ib_umem_get - Pin and DMA map userspace memory. * * If access flags indicate ODP memory, avoid pinning. 
Instead, stores diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index b13a2e9..917b687 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -87,6 +87,9 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, int ib_umem_page_count(struct ib_umem *umem); int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset, size_t length); +unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem, + unsigned long pgsz_bitmap, + unsigned long virt); #else /* CONFIG_INFINIBAND_USER_MEM */ @@ -104,6 +107,12 @@ static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offs size_t length) { return -EINVAL; } +static inline int ib_umem_find_best_pgsz(struct ib_umem *umem, + unsigned long pgsz_bitmap, + unsigned long virt) { + return -EINVAL; +} + #endif /* CONFIG_INFINIBAND_USER_MEM */ #endif /* IB_UMEM_H */ diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index de8724e..5391c24 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -3235,6 +3235,30 @@ static inline bool rdma_cap_read_inv(struct ib_device *dev, u32 port_num) return rdma_protocol_iwarp(dev, port_num); } +/** + * rdma_find_pg_bit - Find page bit given address and HW supported page sizes + * + * @addr: address + * @pgsz_bitmap: bitmap of HW supported page sizes + */ +static inline unsigned int rdma_find_pg_bit(unsigned long addr, + unsigned long pgsz_bitmap) +{ + unsigned long align; + unsigned long pgsz; + + align = addr & -addr; + + /* Find page bit such that addr is aligned to the highest supported + * HW page size + */ + pgsz = pgsz_bitmap & ~(-align << 1); + if (!pgsz) + return __ffs(pgsz_bitmap); + + return __fls(pgsz); +} + int ib_set_vf_link_state(struct ib_device *device, int vf, u8 port, int state); int ib_get_vf_config(struct ib_device *device, int vf, u8 port, From patchwork Mon May 6 13:53:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 10931093 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A7A515A6 for ; Mon, 6 May 2019 13:55:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8971C2864E for ; Mon, 6 May 2019 13:55:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 87FA3286C0; Mon, 6 May 2019 13:55:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 280C8287AE for ; Mon, 6 May 2019 13:54:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726272AbfEFNyD (ORCPT ); Mon, 6 May 2019 09:54:03 -0400 Received: from mga12.intel.com ([192.55.52.136]:48135 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726229AbfEFNyD (ORCPT ); Mon, 6 May 2019 09:54:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 May 2019 06:54:02 
-0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,438,1549958400"; d="scan'208";a="344291820" Received: from ssaleem-mobl4.amr.corp.intel.com ([10.255.35.243]) by fmsmga006.fm.intel.com with ESMTP; 06 May 2019 06:54:01 -0700 From: Shiraz Saleem To: dledford@redhat.com, jgg@ziepe.ca Cc: linux-rdma@vger.kernel.org, Shiraz Saleem , Gal Pressman Subject: [PATCH v3 rdma-next 2/6] RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks Date: Mon, 6 May 2019 08:53:33 -0500 Message-Id: <20190506135337.11324-3-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.8.3 In-Reply-To: <20190506135337.11324-1-shiraz.saleem@intel.com> References: <20190506135337.11324-1-shiraz.saleem@intel.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This helper iterates over a DMA-mapped SGL and returns contiguous memory blocks aligned to a HW supported page size. Suggested-by: Jason Gunthorpe Cc: Gal Pressman Signed-off-by: Shiraz Saleem --- drivers/infiniband/core/verbs.c | 34 +++++++++++++++++++++++++++++ include/rdma/ib_verbs.h | 47 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 81 insertions(+) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 7313edc..3806038 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -2711,3 +2711,37 @@ int rdma_init_netdev(struct ib_device *device, u8 port_num, netdev, params.param); } EXPORT_SYMBOL(rdma_init_netdev); + +void __rdma_block_iter_start(struct ib_block_iter *biter, + struct scatterlist *sglist, unsigned int nents, + unsigned long pgsz) +{ + memset(biter, 0, sizeof(struct ib_block_iter)); + biter->__sg = sglist; + biter->__sg_nents = nents; + + /* Driver provides best block size to use */ + biter->__pg_bit = __fls(pgsz); +} +EXPORT_SYMBOL(__rdma_block_iter_start); + +bool __rdma_block_iter_next(struct ib_block_iter *biter) +{ + unsigned int block_offset; + + if (!biter->__sg_nents || !biter->__sg) + return false; + + biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance; + block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1); + biter->__sg_advance += BIT_ULL(biter->__pg_bit) - block_offset; + + if (biter->__sg_advance >= sg_dma_len(biter->__sg)) { + biter->__sg_advance = 0; + biter->__sg = sg_next(biter->__sg); + biter->__sg_nents--; + } + + return true; +} +EXPORT_SYMBOL(__rdma_block_iter_next); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 5391c24..8a5ed04 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2711,6 +2711,21 @@ struct ib_client { u8 no_kverbs_req:1; }; +/* + * IB block DMA iterator + * + * Iterates the DMA-mapped SGL in contiguous memory blocks aligned + * to a HW supported page size. 
+ */ +struct ib_block_iter { + /* internal states */ + struct scatterlist *__sg; /* sg holding the current aligned block */ + dma_addr_t __dma_addr; /* unaligned DMA address of this block */ + unsigned int __sg_nents; /* number of SG entries */ + unsigned int __sg_advance; /* number of bytes to advance in sg in next step */ + unsigned int __pg_bit; /* alignment of current block */ +}; + struct ib_device *_ib_alloc_device(size_t size); #define ib_alloc_device(drv_struct, member) \ container_of(_ib_alloc_device(sizeof(struct drv_struct) + \ @@ -2731,6 +2746,38 @@ struct ib_client { int ib_register_client (struct ib_client *client); void ib_unregister_client(struct ib_client *client); +void __rdma_block_iter_start(struct ib_block_iter *biter, + struct scatterlist *sglist, + unsigned int nents, + unsigned long pgsz); +bool __rdma_block_iter_next(struct ib_block_iter *biter); + +/** + * rdma_block_iter_dma_address - get the aligned dma address of the current + * block held by the block iterator. + * @biter: block iterator holding the memory block + */ +static inline dma_addr_t +rdma_block_iter_dma_address(struct ib_block_iter *biter) +{ + return biter->__dma_addr & ~(BIT_ULL(biter->__pg_bit) - 1); +} + +/** + * rdma_for_each_block - iterate over contiguous memory blocks of the sg list + * @sglist: sglist to iterate over + * @biter: block iterator holding the memory block + * @nents: maximum number of sg entries to iterate over + * @pgsz: best HW supported page size to use + * + * Callers may use rdma_block_iter_dma_address() to get each + * blocks aligned DMA address. + */ +#define rdma_for_each_block(sglist, biter, nents, pgsz) \ + for (__rdma_block_iter_start(biter, sglist, nents, \ + pgsz); \ + __rdma_block_iter_next(biter);) + /** * ib_get_client_data - Get IB client context * @device:Device to get context for From patchwork Mon May 6 13:53:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 10931101 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BFD4415A6 for ; Mon, 6 May 2019 13:55:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AFEDD2864E for ; Mon, 6 May 2019 13:55:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A46E428701; Mon, 6 May 2019 13:55:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64A9728833 for ; Mon, 6 May 2019 13:54:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726229AbfEFNyD (ORCPT ); Mon, 6 May 2019 09:54:03 -0400 Received: from mga12.intel.com ([192.55.52.136]:48135 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726265AbfEFNyD (ORCPT ); Mon, 6 May 2019 09:54:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 May 2019 06:54:03 -0700 
X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,438,1549958400"; d="scan'208";a="344291828" Received: from ssaleem-mobl4.amr.corp.intel.com ([10.255.35.243]) by fmsmga006.fm.intel.com with ESMTP; 06 May 2019 06:54:02 -0700 From: Shiraz Saleem To: dledford@redhat.com, jgg@ziepe.ca Cc: linux-rdma@vger.kernel.org, Shiraz Saleem Subject: [PATCH v3 rdma-next 3/6] RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size Date: Mon, 6 May 2019 08:53:34 -0500 Message-Id: <20190506135337.11324-4-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.8.3 In-Reply-To: <20190506135337.11324-1-shiraz.saleem@intel.com> References: <20190506135337.11324-1-shiraz.saleem@intel.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported i40iw page size. Remove code in i40iw to determine when MR is backed by 2M huge pages which involves checking the umem->hugetlb flag and VMA inspection. The new DMA iterator will return the 2M aligned address if the MR is backed by 2M pages. Fixes: f26c7c83395b ("i40iw: Add 2MB page support") Reviewed-by: Michael J. Ruhl Signed-off-by: Shiraz Saleem --- drivers/infiniband/hw/i40iw/i40iw_verbs.c | 46 +++++-------------------------- drivers/infiniband/hw/i40iw/i40iw_verbs.h | 3 +- 2 files changed, 8 insertions(+), 41 deletions(-) diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 7bf7fe8..43579cd 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -1338,53 +1338,22 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr, struct i40iw_pbl *iwpbl = &iwmr->iwpbl; struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc; struct i40iw_pble_info *pinfo; - struct sg_dma_page_iter sg_iter; - u64 pg_addr = 0; + struct ib_block_iter biter; u32 idx = 0; - bool first_pg = true; pinfo = (level == I40IW_LEVEL_1) ? NULL : palloc->level2.leaf; if (iwmr->type == IW_MEMREG_TYPE_QP) iwpbl->qp_mr.sq_page = sg_page(region->sg_head.sgl); - for_each_sg_dma_page (region->sg_head.sgl, &sg_iter, region->nmap, 0) { - pg_addr = sg_page_iter_dma_address(&sg_iter); - if (first_pg) - *pbl = cpu_to_le64(pg_addr & iwmr->page_msk); - else if (!(pg_addr & ~iwmr->page_msk)) - *pbl = cpu_to_le64(pg_addr); - else - continue; - - first_pg = false; + rdma_for_each_block(region->sg_head.sgl, &biter, region->nmap, + iwmr->page_size) { + *pbl = rdma_block_iter_dma_address(&biter); pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx); } } /** - * i40iw_set_hugetlb_params - set MR pg size and mask to huge pg values. 
- * @addr: virtual address - * @iwmr: mr pointer for this memory registration - */ -static void i40iw_set_hugetlb_values(u64 addr, struct i40iw_mr *iwmr) -{ - struct vm_area_struct *vma; - struct hstate *h; - - down_read(¤t->mm->mmap_sem); - vma = find_vma(current->mm, addr); - if (vma && is_vm_hugetlb_page(vma)) { - h = hstate_vma(vma); - if (huge_page_size(h) == 0x200000) { - iwmr->page_size = huge_page_size(h); - iwmr->page_msk = huge_page_mask(h); - } - } - up_read(¤t->mm->mmap_sem); -} - -/** * i40iw_check_mem_contiguous - check if pbls stored in arr are contiguous * @arr: lvl1 pbl array * @npages: page count @@ -1839,10 +1808,9 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, iwmr->ibmr.device = pd->device; iwmr->page_size = PAGE_SIZE; - iwmr->page_msk = PAGE_MASK; - - if (region->hugetlb && (req.reg_type == IW_MEMREG_TYPE_MEM)) - i40iw_set_hugetlb_values(start, iwmr); + if (req.reg_type == IW_MEMREG_TYPE_MEM) + iwmr->page_size = ib_umem_find_best_pgsz(region, SZ_4K | SZ_2M, + virt); region_length = region->length + (start & (iwmr->page_size - 1)); pg_shift = ffs(iwmr->page_size) - 1; diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.h b/drivers/infiniband/hw/i40iw/i40iw_verbs.h index 76cf173..3a41375 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.h +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.h @@ -94,8 +94,7 @@ struct i40iw_mr { struct ib_umem *region; u16 type; u32 page_cnt; - u32 page_size; - u64 page_msk; + u64 page_size; u32 npages; u32 stag; u64 length; From patchwork Mon May 6 13:53:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 10931095 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F14D15A6 for ; Mon, 6 May 2019 13:55:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1AA33286AE for ; Mon, 6 May 2019 13:55:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 192BC28762; Mon, 6 May 2019 13:55:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD2D428746 for ; Mon, 6 May 2019 13:54:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726220AbfEFNyF (ORCPT ); Mon, 6 May 2019 09:54:05 -0400 Received: from mga12.intel.com ([192.55.52.136]:48135 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726046AbfEFNyE (ORCPT ); Mon, 6 May 2019 09:54:04 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 May 2019 06:54:04 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,438,1549958400"; d="scan'208";a="344291839" Received: from ssaleem-mobl4.amr.corp.intel.com ([10.255.35.243]) by fmsmga006.fm.intel.com with ESMTP; 06 May 2019 06:54:03 -0700 From: Shiraz Saleem To: dledford@redhat.com, jgg@ziepe.ca Cc: linux-rdma@vger.kernel.org, Shiraz Saleem , Selvin 
Xavier , Devesh Sharma Subject: [PATCH v3 rdma-next 4/6] RDMA/bnxt_re: Use core helpers to get aligned DMA address Date: Mon, 6 May 2019 08:53:35 -0500 Message-Id: <20190506135337.11324-5-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.8.3 In-Reply-To: <20190506135337.11324-1-shiraz.saleem@intel.com> References: <20190506135337.11324-1-shiraz.saleem@intel.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported bnxt_re page size. Remove checking the umem->hugtetlb flag as it is no longer required. The new DMA block iterator will return the 2M aligned address if the MR is backed by 2M huge pages. Cc: Selvin Xavier Cc: Devesh Sharma Acked-by: Selvin Xavier Signed-off-by: Shiraz Saleem --- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 3fcc77c..311d541 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -3501,17 +3501,12 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig, int page_shift) { u64 *pbl_tbl = pbl_tbl_orig; - u64 paddr; - u64 page_mask = (1ULL << page_shift) - 1; - struct sg_dma_page_iter sg_iter; + u64 page_size = BIT_ULL(page_shift); + struct ib_block_iter biter; + + rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap, page_size) + *pbl_tbl++ = rdma_block_iter_dma_address(&biter); - for_each_sg_dma_page (umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { - paddr = sg_page_iter_dma_address(&sg_iter); - if (pbl_tbl == pbl_tbl_orig) - *pbl_tbl++ = paddr & ~page_mask; - else if ((paddr & page_mask) == 0) - *pbl_tbl++ = paddr; - } return pbl_tbl - pbl_tbl_orig; } @@ -3573,7 +3568,9 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length, goto free_umem; } - page_shift = PAGE_SHIFT; + page_shift = __ffs(ib_umem_find_best_pgsz(umem, + BNXT_RE_PAGE_SIZE_4K | BNXT_RE_PAGE_SIZE_2M, + virt_addr)); if (!bnxt_re_page_size_ok(page_shift)) { dev_err(rdev_to_dev(rdev), "umem page size unsupported!"); @@ -3581,17 +3578,13 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length, goto fail; } - if (!umem->hugetlb && length > BNXT_RE_MAX_MR_SIZE_LOW) { + if (page_shift == BNXT_RE_PAGE_SHIFT_4K && + length > BNXT_RE_MAX_MR_SIZE_LOW) { dev_err(rdev_to_dev(rdev), "Requested MR Sz:%llu Max sup:%llu", length, (u64)BNXT_RE_MAX_MR_SIZE_LOW); rc = -EINVAL; goto fail; } - if (umem->hugetlb && length > BNXT_RE_PAGE_SIZE_2M) { - page_shift = BNXT_RE_PAGE_SHIFT_2M; - dev_warn(rdev_to_dev(rdev), "umem hugetlb set page_size %x", - 1 << page_shift); - } /* Map umem buf ptrs to the PBL */ umem_pgs = fill_umem_pbl_tbl(umem, pbl_tbl, page_shift); From patchwork Mon May 6 13:53:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 10931099 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D5CC92A for ; Mon, 6 May 2019 13:55:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 38DAE28703 for ; 
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH v3 rdma-next 5/6] RDMA/umem: Remove hugetlb flag
Date: Mon, 6 May 2019 08:53:36 -0500
Message-Id: <20190506135337.11324-6-shiraz.saleem@intel.com>
In-Reply-To: <20190506135337.11324-1-shiraz.saleem@intel.com>
References: <20190506135337.11324-1-shiraz.saleem@intel.com>

The i40iw and bnxt_re drivers no longer depend on the hugetlb flag, so
remove it from the ib_umem structure.

Reviewed-by: Michael J.
Ruhl Signed-off-by: Shiraz Saleem --- drivers/infiniband/core/umem.c | 26 +------------------------- drivers/infiniband/core/umem_odp.c | 3 --- include/rdma/ib_umem.h | 1 - 3 files changed, 1 insertion(+), 29 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 2534ddd..13441b2 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include @@ -195,14 +194,12 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, struct ib_ucontext *context; struct ib_umem *umem; struct page **page_list; - struct vm_area_struct **vma_list; unsigned long lock_limit; unsigned long new_pinned; unsigned long cur_base; struct mm_struct *mm; unsigned long npages; int ret; - int i; unsigned long dma_attrs = 0; struct scatterlist *sg; unsigned int gup_flags = FOLL_WRITE; @@ -260,23 +257,12 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, return umem; } - /* We assume the memory is from hugetlb until proved otherwise */ - umem->hugetlb = 1; - page_list = (struct page **) __get_free_page(GFP_KERNEL); if (!page_list) { ret = -ENOMEM; goto umem_kfree; } - /* - * if we can't alloc the vma_list, it's not so bad; - * just assume the memory is not hugetlb memory - */ - vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL); - if (!vma_list) - umem->hugetlb = 0; - npages = ib_umem_num_pages(umem); if (npages == 0 || npages > UINT_MAX) { ret = -EINVAL; @@ -308,7 +294,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, ret = get_user_pages_longterm(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof (struct page *)), - gup_flags, page_list, vma_list); + gup_flags, page_list, NULL); if (ret < 0) { up_read(&mm->mmap_sem); goto umem_release; @@ -321,14 +307,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, dma_get_max_seg_size(context->device->dma_device), &umem->sg_nents); - /* Continue to hold the mmap_sem as vma_list access - * needs to be protected. 
- */ - for (i = 0; i < ret && umem->hugetlb; i++) { - if (vma_list && !is_vm_hugetlb_page(vma_list[i])) - umem->hugetlb = 0; - } - up_read(&mm->mmap_sem); } @@ -353,8 +331,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, vma: atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm); out: - if (vma_list) - free_page((unsigned long) vma_list); free_page((unsigned long) page_list); umem_kfree: if (ret) { diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 9721914..c7226cf 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -417,9 +417,6 @@ int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access) h = hstate_vma(vma); umem->page_shift = huge_page_shift(h); up_read(&mm->mmap_sem); - umem->hugetlb = 1; - } else { - umem->hugetlb = 0; } mutex_init(&umem_odp->umem_mutex); diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 917b687..040d853 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -48,7 +48,6 @@ struct ib_umem { unsigned long address; int page_shift; u32 writable : 1; - u32 hugetlb : 1; u32 is_odp : 1; struct work_struct work; struct sg_table sg_head; From patchwork Mon May 6 13:53:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 10931097 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8917616C1 for ; Mon, 6 May 2019 13:55:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75CEF28770 for ; Mon, 6 May 2019 13:55:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7183228478; Mon, 6 May 2019 13:55:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9F12B288F3 for ; Mon, 6 May 2019 13:54:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726283AbfEFNyF (ORCPT ); Mon, 6 May 2019 09:54:05 -0400 Received: from mga12.intel.com ([192.55.52.136]:48135 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726190AbfEFNyF (ORCPT ); Mon, 6 May 2019 09:54:05 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 May 2019 06:54:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,438,1549958400"; d="scan'208";a="344291848" Received: from ssaleem-mobl4.amr.corp.intel.com ([10.255.35.243]) by fmsmga006.fm.intel.com with ESMTP; 06 May 2019 06:54:04 -0700 From: Shiraz Saleem To: dledford@redhat.com, jgg@ziepe.ca Cc: linux-rdma@vger.kernel.org, Shiraz Saleem Subject: [PATCH v3 rdma-next 6/6] RDMA/verbs: Extend DMA block iterator support for mixed block sizes Date: Mon, 6 May 2019 08:53:37 -0500 Message-Id: <20190506135337.11324-7-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.8.3 In-Reply-To: <20190506135337.11324-1-shiraz.saleem@intel.com> References: 
<20190506135337.11324-1-shiraz.saleem@intel.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Extend the DMA block iterator for HW that can support mixed block sizes. A bitmap of HW supported page sizes are provided to block iterator which returns contiguous aligned memory blocks within a HW supported page size. Signed-off-by: Shiraz Saleem --- drivers/infiniband/core/verbs.c | 38 ++++++++++++++++++++++++++++++++++++-- include/rdma/ib_verbs.h | 18 ++++++++++++++---- 2 files changed, 50 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 3806038..fa9725d 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -2712,16 +2712,47 @@ int rdma_init_netdev(struct ib_device *device, u8 port_num, } EXPORT_SYMBOL(rdma_init_netdev); +static unsigned int rdma_find_mixed_pg_bit(struct ib_block_iter *biter) +{ + if (biter->__sg == biter->__sgl_head) { + return rdma_find_pg_bit(sg_dma_address(biter->__sg) + + sg_dma_len(biter->__sg), + biter->pgsz_bitmap); + } else if (sg_is_last(biter->__sg)) { + return rdma_find_pg_bit(sg_dma_address(biter->__sg), + biter->pgsz_bitmap); + } else { + unsigned int remaining = + sg_dma_address(biter->__sg) + sg_dma_len(biter->__sg) - + biter->__dma_addr; + unsigned int pg_bit = rdma_find_pg_bit(biter->__dma_addr, + biter->pgsz_bitmap); + if (remaining < BIT_ULL(biter->__pg_bit)) + pg_bit = rdma_find_pg_bit(remaining, + biter->pgsz_bitmap); + + return pg_bit; + } +} + void __rdma_block_iter_start(struct ib_block_iter *biter, struct scatterlist *sglist, unsigned int nents, - unsigned long pgsz) + unsigned long pgsz_bitmap) { memset(biter, 0, sizeof(struct ib_block_iter)); biter->__sg = sglist; + biter->pgsz_bitmap = pgsz_bitmap; biter->__sg_nents = nents; /* Driver provides best block size to use */ - biter->__pg_bit = __fls(pgsz); + if (hweight_long(pgsz_bitmap) == 1) { + biter->__pg_bit = __fls(pgsz_bitmap); + } else { + /* mixed block size support. compute best block size to use */ + WARN_ON(!(pgsz_bitmap & GENMASK(PAGE_SHIFT, 0))); + biter->__sgl_head = &sglist[0]; + biter->__mixed = true; + } } EXPORT_SYMBOL(__rdma_block_iter_start); @@ -2733,6 +2764,9 @@ bool __rdma_block_iter_next(struct ib_block_iter *biter) return false; biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance; + if (biter->__mixed) + biter->__pg_bit = rdma_find_mixed_pg_bit(biter); + block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1); biter->__sg_advance += BIT_ULL(biter->__pg_bit) - block_offset; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 8a5ed04..1d8725a 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2718,12 +2718,22 @@ struct ib_client { * to a HW supported page size. */ struct ib_block_iter { + unsigned long pgsz_bitmap; /* bitmap of supported HW page sizes. 
+ * HW that can handle only blocks of a + * single page size must just provide + * the best page size to use in pgsz_bitmap + */ + /* internal states */ struct scatterlist *__sg; /* sg holding the current aligned block */ + struct scatterlist *__sgl_head; /* scatterlist head */ dma_addr_t __dma_addr; /* unaligned DMA address of this block */ unsigned int __sg_nents; /* number of SG entries */ unsigned int __sg_advance; /* number of bytes to advance in sg in next step */ unsigned int __pg_bit; /* alignment of current block */ + u8 __mixed; /* HW supports single block size or mixed + * block sizes + */ }; struct ib_device *_ib_alloc_device(size_t size); @@ -2749,7 +2759,7 @@ struct ib_block_iter { void __rdma_block_iter_start(struct ib_block_iter *biter, struct scatterlist *sglist, unsigned int nents, - unsigned long pgsz); + unsigned long pgsz_bitmap); bool __rdma_block_iter_next(struct ib_block_iter *biter); /** @@ -2768,14 +2778,14 @@ void __rdma_block_iter_start(struct ib_block_iter *biter, * @sglist: sglist to iterate over * @biter: block iterator holding the memory block * @nents: maximum number of sg entries to iterate over - * @pgsz: best HW supported page size to use + * @pgsz_bitmap: bitmap of HW supported page sizes * * Callers may use rdma_block_iter_dma_address() to get each * blocks aligned DMA address. */ -#define rdma_for_each_block(sglist, biter, nents, pgsz) \ +#define rdma_for_each_block(sglist, biter, nents, pgsz_bitmap) \ for (__rdma_block_iter_start(biter, sglist, nents, \ - pgsz); \ + pgsz_bitmap); \ __rdma_block_iter_next(biter);) /**
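To close the series, here is a hedged sketch of how a driver whose
hardware accepts mixed block sizes might consume the extended iterator
when building a physical buffer list. The example_fill_pbl() helper,
the pbl_tbl layout and the SZ_4K | SZ_2M | SZ_1G bitmap are assumptions
for illustration, not part of these patches. Note that the bitmap must
include at least one size no larger than PAGE_SIZE (the WARN_ON in
__rdma_block_iter_start()), which SZ_4K satisfies on 4K-page systems.

/*
 * Illustrative sketch only: walk the MR in HW supported blocks and
 * record each block's aligned DMA address.  With several bits set in
 * the bitmap the iterator may pick a different block size at every
 * step; with a single bit set it behaves as in patch 2.
 */
static unsigned int example_fill_pbl(struct ib_umem *umem, u64 *pbl_tbl)
{
	struct ib_block_iter biter;
	u64 *pbl = pbl_tbl;

	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap,
			    SZ_4K | SZ_2M | SZ_1G)
		*pbl++ = rdma_block_iter_dma_address(&biter);

	return pbl - pbl_tbl;	/* number of blocks written */
}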