Message ID | 20150611231142.16479.41039.stgit@phlsvslse11.ph.intel.com (mailing list archive) |
---|---|
State | Superseded |
On Fri, Jun 12, 2015 at 2:11 AM, Mike Marciniszyn <mike.marciniszyn@intel.com> wrote:
> +++ b/drivers/infiniband/hw/hfi1/user_pages.c
> +/**
> + * hfi1_get_user_pages - lock user pages into memory
> + * @start_page: the start page
> + * @num_pages: the number of pages
> + * @p: the output page structures
> + *
> + * This function takes a given start page (page aligned user virtual
> + * address) and pins it and the following specified number of pages. For
> + * now, num_pages is always 1, but that will probably change at some point
> + * (because caller is doing expected sends on a single virtually contiguous
> + * buffer, so we can do all pages at once).
> + */
> +int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
> +			struct page **p)
> +{
> +	int ret;
> +
> +	down_write(&current->mm->mmap_sem);
> +
> +	ret = __hfi1_get_user_pages(start_page, num_pages, p);
> +
> +	up_write(&current->mm->mmap_sem);
> +
> +	return ret;
> +}
> +

anything wrong with the umem services provided by the IB core that
requires this implementation? what?

> +void hfi1_release_user_pages(struct page **p, size_t num_pages)
> +{
> +	if (current->mm) /* during close after signal, mm can be NULL */
> +		down_write(&current->mm->mmap_sem);
> +
> +	__hfi1_release_user_pages(p, num_pages, 1);
> +
> +	if (current->mm) {
> +		current->mm->pinned_vm -= num_pages;
> +		up_write(&current->mm->mmap_sem);
> +	}
> +}

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> > + * hfi1_get_user_pages - lock user pages into memory
> > + * @start_page: the start page
> > + * @num_pages: the number of pages
> > + * @p: the output page structures
> > + *
> > + * This function takes a given start page (page aligned user virtual
> > + * address) and pins it and the following specified number of pages. For
> > + * now, num_pages is always 1, but that will probably change at some point
> > + * (because caller is doing expected sends on a single virtually contiguous
> > + * buffer, so we can do all pages at once).
> > + */
> > +int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
> > +                       struct page **p) {
> > +       int ret;
> > +
> > +       down_write(&current->mm->mmap_sem);
> > +
> > +       ret = __hfi1_get_user_pages(start_page, num_pages, p);
> > +
> > +       up_write(&current->mm->mmap_sem);
> > +
> > +       return ret;
> > +}
> > +
>
> anything wrong with the umem services provided by the IB core that
> requires this implementation? what?
>

We are currently investigating this.

Mike
> anything wrong with the umem services provided by the IB core that
> requires this implementation? what?
>

The current level of the API is mismatched with the PSM SDMA.

The ib_umem api:
- maps an SG list which isn't required by PSM since DMA mapping is done by the low level SDMA
- the mapping is bi-directional vs. from device
- The pd's context is not there

Certainly we would consider a core change that provides just the page locking and we could fake a context.

Mike
On 09/07/2015 01:08, Marciniszyn, Mike wrote:
>> anything wrong with the umem services provided by the IB core that
>> requires this implementation? what?
>>
>
> The current level of the API is mismatched with the PSM SDMA.
>
> The ib_umem api:
> - maps an SG list which isn't required by PSM since DMA mapping is done by the low level SDMA

Can I ask why you prefer mapping page by page over using a single sg list?

> - the mapping is bi-directional vs. from device
> - The pd's context is not there
>
> Certainly we would consider a core change that provides just the page locking and we could fake a context.
>
> Mike
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
new file mode 100644
index 0000000..9071afb
--- /dev/null
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -0,0 +1,156 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/device.h>
+
+#include "hfi.h"
+
+static void __hfi1_release_user_pages(struct page **p, size_t num_pages,
+				      int dirty)
+{
+	size_t i;
+
+	for (i = 0; i < num_pages; i++) {
+		if (dirty)
+			set_page_dirty_lock(p[i]);
+		put_page(p[i]);
+	}
+}
+
+/*
+ * Call with current->mm->mmap_sem held.
+ */
+static int __hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
+				 struct page **p)
+{
+	unsigned long lock_limit;
+	size_t got;
+	int ret;
+
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	for (got = 0; got < num_pages; got += ret) {
+		ret = get_user_pages(current, current->mm,
+				     start_page + got * PAGE_SIZE,
+				     num_pages - got, 1, 1,
+				     p + got, NULL);
+		if (ret < 0)
+			goto bail_release;
+	}
+
+	current->mm->pinned_vm += num_pages;
+
+	ret = 0;
+	goto bail;
+
+bail_release:
+	__hfi1_release_user_pages(p, got, 0);
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_map_page - a safety wrapper around pci_map_page()
+ *
+ */
+dma_addr_t hfi1_map_page(struct pci_dev *hwdev, struct page *page,
+			 unsigned long offset, size_t size, int direction)
+{
+	dma_addr_t phys;
+
+	phys = pci_map_page(hwdev, page, offset, size, direction);
+
+	return phys;
+}
+
+/**
+ * hfi1_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages. For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
+ */
+int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
+			struct page **p)
+{
+	int ret;
+
+	down_write(&current->mm->mmap_sem);
+
+	ret = __hfi1_get_user_pages(start_page, num_pages, p);
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
+void hfi1_release_user_pages(struct page **p, size_t num_pages)
+{
+	if (current->mm) /* during close after signal, mm can be NULL */
+		down_write(&current->mm->mmap_sem);
+
+	__hfi1_release_user_pages(p, num_pages, 1);
+
+	if (current->mm) {
+		current->mm->pinned_vm -= num_pages;
+		up_write(&current->mm->mmap_sem);
+	}
+}