[36/41] IB/hfi1: add low level page locking

Message ID 20150611231142.16479.41039.stgit@phlsvslse11.ph.intel.com (mailing list archive)
State Superseded

Commit Message

Marciniszyn, Mike June 11, 2015, 11:11 p.m. UTC
Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Signed-off-by: John Gregor <john.a.gregor@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Kevin Pine <kevin.pine@intel.com>
Signed-off-by: Kyle Liddell <kyle.liddell@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com>
---
 drivers/infiniband/hw/hfi1/user_pages.c |  156 +++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 drivers/infiniband/hw/hfi1/user_pages.c


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Or Gerlitz June 14, 2015, 9:02 p.m. UTC | #1
On Fri, Jun 12, 2015 at 2:11 AM, Mike Marciniszyn
<mike.marciniszyn@intel.com> wrote:

> +++ b/drivers/infiniband/hw/hfi1/user_pages.c

> +/**
> + * hfi1_get_user_pages - lock user pages into memory
> + * @start_page: the start page
> + * @num_pages: the number of pages
> + * @p: the output page structures
> + *
> + * This function takes a given start page (page aligned user virtual
> + * address) and pins it and the following specified number of pages.  For
> + * now, num_pages is always 1, but that will probably change at some point
> + * (because caller is doing expected sends on a single virtually contiguous
> + * buffer, so we can do all pages at once).
> + */
> +int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
> +                       struct page **p)
> +{
> +       int ret;
> +
> +       down_write(&current->mm->mmap_sem);
> +
> +       ret = __hfi1_get_user_pages(start_page, num_pages, p);
> +
> +       up_write(&current->mm->mmap_sem);
> +
> +       return ret;
> +}
> +

anything wrong with the umem services provided by the IB core that
requires this implementation? what?

> +void hfi1_release_user_pages(struct page **p, size_t num_pages)
> +{
> +       if (current->mm) /* during close after signal, mm can be NULL */
> +               down_write(&current->mm->mmap_sem);
> +
> +       __hfi1_release_user_pages(p, num_pages, 1);
> +
> +       if (current->mm) {
> +               current->mm->pinned_vm -= num_pages;
> +               up_write(&current->mm->mmap_sem);
> +       }
> +}
>
Marciniszyn, Mike June 17, 2015, 12:58 p.m. UTC | #2
> > + * hfi1_get_user_pages - lock user pages into memory
> > + * @start_page: the start page
> > + * @num_pages: the number of pages
> > + * @p: the output page structures
> > + *
> > + * This function takes a given start page (page aligned user virtual
> > + * address) and pins it and the following specified number of pages.  For
> > + * now, num_pages is always 1, but that will probably change at some point
> > + * (because caller is doing expected sends on a single virtually contiguous
> > + * buffer, so we can do all pages at once).
> > + */
> > +int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
> > +                       struct page **p)
> > +{
> > +       int ret;
> > +
> > +       down_write(&current->mm->mmap_sem);
> > +
> > +       ret = __hfi1_get_user_pages(start_page, num_pages, p);
> > +
> > +       up_write(&current->mm->mmap_sem);
> > +
> > +       return ret;
> > +}
> > +
>
> anything wrong with the umem services provided by the IB core that
> requires this implementation? what?
>

We are currently investigating this.

Mike
Marciniszyn, Mike July 8, 2015, 10:08 p.m. UTC | #3
> anything wrong with the umem services provided by the IB core that
> requires this implementation? what?

The current level of the API is mismatched with the PSM SDMA.

The ib_umem api:
- maps an SG list which isn't required by PSM since DMA mapping is done by the low level SDMA
- the mapping is bi-directional vs. from device
- The pd's context is not there

Certainly we would consider a core change that provides just the page locking and we could fake a context.

Mike
Haggai Eran July 9, 2015, 7:33 a.m. UTC | #4
On 09/07/2015 01:08, Marciniszyn, Mike wrote:
>> anything wrong with the umem services provided by the IB core that
>> requires this implementation? what?
>>
> 
> The current level of the API is mismatched with the PSM SDMA.
> 
> The ib_umem api:
> - maps an SG list which isn't required by PSM since DMA mapping is done by the low level SDMA
Can I ask why you prefer mapping page by page over using a single SG
list?

> - the mapping is bi-directional vs. from device
> - The pd's context is not there
> 
> Certainly we would consider a core change that provides just the page locking and we could fake a context.
> 
> Mike

Patch

diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
new file mode 100644
index 0000000..9071afb
--- /dev/null
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -0,0 +1,156 @@ 
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/device.h>
+
+#include "hfi.h"
+
+static void __hfi1_release_user_pages(struct page **p, size_t num_pages,
+				      int dirty)
+{
+	size_t i;
+
+	for (i = 0; i < num_pages; i++) {
+		if (dirty)
+			set_page_dirty_lock(p[i]);
+		put_page(p[i]);
+	}
+}
+
+/*
+ * Call with current->mm->mmap_sem held.
+ */
+static int __hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
+				 struct page **p)
+{
+	unsigned long lock_limit;
+	size_t got;
+	int ret;
+
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	for (got = 0; got < num_pages; got += ret) {
+		ret = get_user_pages(current, current->mm,
+				     start_page + got * PAGE_SIZE,
+				     num_pages - got, 1, 1,
+				     p + got, NULL);
+		if (ret < 0)
+			goto bail_release;
+	}
+
+	current->mm->pinned_vm += num_pages;
+
+	ret = 0;
+	goto bail;
+
+bail_release:
+	__hfi1_release_user_pages(p, got, 0);
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_map_page - a safety wrapper around pci_map_page()
+ *
+ */
+dma_addr_t hfi1_map_page(struct pci_dev *hwdev, struct page *page,
+			 unsigned long offset, size_t size, int direction)
+{
+	dma_addr_t phys;
+
+	phys = pci_map_page(hwdev, page, offset, size, direction);
+
+	return phys;
+}
+
+/**
+ * hfi1_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
+ */
+int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
+			struct page **p)
+{
+	int ret;
+
+	down_write(&current->mm->mmap_sem);
+
+	ret = __hfi1_get_user_pages(start_page, num_pages, p);
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
+void hfi1_release_user_pages(struct page **p, size_t num_pages)
+{
+	if (current->mm) /* during close after signal, mm can be NULL */
+		down_write(&current->mm->mmap_sem);
+
+	__hfi1_release_user_pages(p, num_pages, 1);
+
+	if (current->mm) {
+		current->mm->pinned_vm -= num_pages;
+		up_write(&current->mm->mmap_sem);
+	}
+}
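
The control flow of __hfi1_get_user_pages() in the patch above (limit check, a pin loop that accepts partial progress, and rollback of the pages already obtained on failure) can be modelled in plain userspace C. The following is only a sketch under stated assumptions: struct pin_model, pin_some(), unpin(), pin_all() and MODEL_ENOMEM are hypothetical names invented for illustration, and none of it is kernel API. It also differs from the patch in two labelled ways: the limit check counts pages already pinned (the patch compares num_pages alone against the limit), and a zero return from the pinning call is treated as a failure so the loop cannot spin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ENOMEM 12 /* stand-in for the kernel's ENOMEM */

/* Hypothetical model of per-process pinning state; not a kernel structure. */
struct pin_model {
	size_t lock_limit; /* RLIMIT_MEMLOCK expressed in pages */
	size_t pinned_vm;  /* pages accounted as pinned */
	size_t chunk;      /* most pages a single pin_some() call will pin */
	size_t fail_at;    /* pin_some() fails once held would exceed this */
	size_t held;       /* page references currently held */
};

/*
 * Stand-in for get_user_pages(): pins up to min(want, chunk) pages and
 * returns that count, or a negative error code.
 */
static long pin_some(struct pin_model *m, size_t want)
{
	size_t n = want < m->chunk ? want : m->chunk;

	if (n > m->fail_at - m->held)
		return -MODEL_ENOMEM;
	m->held += n;
	return (long)n;
}

/* Stand-in for the put_page() loop: drop n page references. */
static void unpin(struct pin_model *m, size_t n)
{
	m->held -= n;
}

/* Returns 0 on success; on failure nothing new stays pinned or accounted. */
static int pin_all(struct pin_model *m, size_t num_pages)
{
	size_t got = 0;

	/* Assumption of this sketch: check the cumulative total. */
	if (m->pinned_vm > m->lock_limit ||
	    num_pages > m->lock_limit - m->pinned_vm)
		return -MODEL_ENOMEM;

	while (got < num_pages) {
		long ret = pin_some(m, num_pages - got);

		if (ret <= 0) { /* 0 is treated as failure (sketch only) */
			unpin(m, got); /* roll back the partial progress */
			return ret < 0 ? (int)ret : -MODEL_ENOMEM;
		}
		got += (size_t)ret;
	}

	m->pinned_vm += num_pages; /* account only after full success */
	return 0;
}
```

In this model, a request that fails partway leaves held and pinned_vm exactly as they were before the call, which mirrors what the bail_release path in the patch is meant to guarantee.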