| Message ID | 20190302202435.31889-1-jhubbard@nvidia.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v2] RDMA/umem: minor bug fix and cleanup in error handling paths |
FWIW I don't have ODP hardware either. So I can't test this either.

On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
>
> 1. Bug fix: the error handling release pages starting
>    at the first page that experienced an error.
>
> 2. Refinement: release_pages() is better than put_page()
>    in a loop.
>
> 3. Dead code removal: the check for (user_virt & ~page_mask)
>    is checking for a condition that can never happen,
>    because earlier:
>
>        user_virt = user_virt & page_mask;
>
>    ...so, remove that entire phrase.
>
> 4. Minor: As long as I'm here, shorten up a couple of long lines
>    in the same function, without harming the ability to
>    grep for the printed error message.
>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>
> v2: Fixes a kbuild test robot reported build failure, by directly
>     including pagemap.h
>
>  drivers/infiniband/core/umem_odp.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index acb882f279cb..83872c1f3f2c 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -40,6 +40,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/hugetlb.h>
>  #include <linux/interval_tree_generic.h>
> +#include <linux/pagemap.h>
>
>  #include <rdma/ib_verbs.h>
>  #include <rdma/ib_umem.h>
> @@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>
>  		if (npages < 0) {
>  			if (npages != -EAGAIN)
> -				pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
> +				pr_warn("fail to get %zu user pages with error %d\n",
> +					gup_num_pages, npages);
>  			else
> -				pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
> +				pr_debug("fail to get %zu user pages with error %d\n",
> +					gup_num_pages, npages);
>  			break;
>  		}
>
>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>  		mutex_lock(&umem_odp->umem_mutex);
>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> -			if (user_virt & ~page_mask) {
> -				p += PAGE_SIZE;
> -				if (page_to_phys(local_page_list[j]) != p) {
> -					ret = -EFAULT;
> -					break;
> -				}
> -				put_page(local_page_list[j]);
> -				continue;
> -			}
> -

I think this is trying to account for compound pages. (ie page_mask could
represent more than PAGE_SIZE, which is what user_virt is being incremented
by.) But putting the page in that case seems to be the wrong thing to do?

Yes this was added by Artemy[1] now cc'ed.

>  			ret = ib_umem_odp_map_dma_single_page(
>  					umem_odp, k, local_page_list[j],
>  					access_mask, current_seq);
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  		mutex_unlock(&umem_odp->umem_mutex);
>
>  		if (ret < 0) {
> -			/* Release left over pages when handling errors. */
> -			for (++j; j < npages; ++j)
> -				put_page(local_page_list[j]);
> +			/*
> +			 * Release pages, starting at the the first page
> +			 * that experienced an error.
> +			 */
> +			release_pages(&local_page_list[j], npages - j);

My concern here is that release_pages() handles compound pages, perhaps
differently from the above code, so calling it may not work? But I've not
really spent a lot of time on it... :-/

Ira

[1] commit 403cd12e2cf759ead5cbdcb62bf9872b9618d400
    Author: Artemy Kovalyov <artemyko@mellanox.com>
    Date:   Wed Apr 5 09:23:55 2017 +0300

        IB/umem: Add contiguous ODP support

        Currenlty ODP supports only regular MMU pages.
        Add ODP support for regions consisting of physically contiguous
        chunks of arbitrary order (huge pages for instance) to improve
        performance.

        Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
        Signed-off-by: Leon Romanovsky <leon@kernel.org>
        Signed-off-by: Doug Ledford <dledford@redhat.com>
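The two error paths being compared above differ in their starting index. A toy userspace model makes the difference visible — here plain ints stand in for page refcounts, and `release_pages()` is modeled as nothing more than a batched `put_page()` loop (the real kernel function also handles compound pages and LRU batching, which this sketch deliberately ignores):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: an int per "page" stands in for page->_refcount. */
static void toy_put_page(int *ref)
{
	(*ref)--;
}

/* Modeled as a batched put loop only; not the real release_pages(). */
static void toy_release_pages(int *refs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		toy_put_page(&refs[i]);
}

/* Pre-patch error path: skips index j, releases j+1 .. npages-1. */
static void old_error_path(int *refs, size_t j, size_t npages)
{
	for (++j; j < npages; ++j)
		toy_put_page(&refs[j]);
}

/* v2 error path: releases starting AT the failing index j. */
static void v2_error_path(int *refs, size_t j, size_t npages)
{
	toy_release_pages(&refs[j], npages - j);
}
```

With 8 pinned pages and a failure at j = 3, the old path leaves `refs[3]` untouched while v2 drops it — which is only safe if the single-page mapping function has not already put that page on its own failure path (the question the rest of the thread settles).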
On 02/03/2019 21:44, Ira Weiny wrote:
>
> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> ...
>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>    is checking for a condition that can never happen,
>>    because earlier:
>>
>>        user_virt = user_virt & page_mask;
>>
>>    ...so, remove that entire phrase.
>>
>>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>  		mutex_lock(&umem_odp->umem_mutex);
>>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>> -			if (user_virt & ~page_mask) {
>> -				p += PAGE_SIZE;
>> -				if (page_to_phys(local_page_list[j]) != p) {
>> -					ret = -EFAULT;
>> -					break;
>> -				}
>> -				put_page(local_page_list[j]);
>> -				continue;
>> -			}
>> -
>
> I think this is trying to account for compound pages. (ie page_mask could
> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> But putting the page in that case seems to be the wrong thing to do?
>
> Yes this was added by Artemy[1] now cc'ed.

Right, this is for huge pages, please keep it.
put_page() needed to decrement refcount of the head page.
On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>
>
> On 02/03/2019 21:44, Ira Weiny wrote:
> >
> > On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> > > From: John Hubbard <jhubbard@nvidia.com>
> > >
> > > ...
> > > 3. Dead code removal: the check for (user_virt & ~page_mask)
> > >    is checking for a condition that can never happen,
> > >    because earlier:
> > >
> > >        user_virt = user_virt & page_mask;
> > >
> > >    ...so, remove that entire phrase.
> > >
> > >  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> > >  		mutex_lock(&umem_odp->umem_mutex);
> > >  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> > > -			if (user_virt & ~page_mask) {
> > > -				p += PAGE_SIZE;
> > > -				if (page_to_phys(local_page_list[j]) != p) {
> > > -					ret = -EFAULT;
> > > -					break;
> > > -				}
> > > -				put_page(local_page_list[j]);
> > > -				continue;
> > > -			}
> > > -
> >
> > I think this is trying to account for compound pages. (ie page_mask could
> > represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> > But putting the page in that case seems to be the wrong thing to do?
> >
> > Yes this was added by Artemy[1] now cc'ed.
>
> Right, this is for huge pages, please keep it.
> put_page() needed to decrement refcount of the head page.

You mean decrement the refcount of the _non_-head pages?

Ira
On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>
>
> On 02/03/2019 21:44, Ira Weiny wrote:
>>
>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>> From: John Hubbard <jhubbard@nvidia.com>
>>>
>>> ...
>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>    is checking for a condition that can never happen,
>>>    because earlier:
>>>
>>>        user_virt = user_virt & page_mask;
>>>
>>>    ...so, remove that entire phrase.
>>>
>>>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>  		mutex_lock(&umem_odp->umem_mutex);
>>>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>> -			if (user_virt & ~page_mask) {
>>> -				p += PAGE_SIZE;
>>> -				if (page_to_phys(local_page_list[j]) != p) {
>>> -					ret = -EFAULT;
>>> -					break;
>>> -				}
>>> -				put_page(local_page_list[j]);
>>> -				continue;
>>> -			}
>>> -
>>
>> I think this is trying to account for compound pages. (ie page_mask could
>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>> But putting the page in that case seems to be the wrong thing to do?
>>
>> Yes this was added by Artemy[1] now cc'ed.
>
> Right, this is for huge pages, please keep it.
> put_page() needed to decrement refcount of the head page.
>

OK, thanks for explaining! Artemy, while you're here, any thoughts about the
release_pages, and the change of the starting point, from the other part
of the patch:

@@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		mutex_unlock(&umem_odp->umem_mutex);

 		if (ret < 0) {
-			/* Release left over pages when handling errors. */
-			for (++j; j < npages; ++j)
-				put_page(local_page_list[j]);
+			/*
+			 * Release pages, starting at the the first page
+			 * that experienced an error.
+			 */
+			release_pages(&local_page_list[j], npages - j);
 			break;
 		}
 	}

?

thanks,
On Sun, Mar 03, 2019 at 02:37:33PM -0800, John Hubbard wrote:
> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> >
> >
> > On 02/03/2019 21:44, Ira Weiny wrote:
> > >
> > > On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> > > > From: John Hubbard <jhubbard@nvidia.com>
> > > >
> > > > ...
> > > > 3. Dead code removal: the check for (user_virt & ~page_mask)
> > > >    is checking for a condition that can never happen,
> > > >    because earlier:
> > > >
> > > >        user_virt = user_virt & page_mask;
> > > >
> > > >    ...so, remove that entire phrase.
> > > >
> > > >  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> > > >  		mutex_lock(&umem_odp->umem_mutex);
> > > >  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> > > > -			if (user_virt & ~page_mask) {
> > > > -				p += PAGE_SIZE;
> > > > -				if (page_to_phys(local_page_list[j]) != p) {
> > > > -					ret = -EFAULT;
> > > > -					break;
> > > > -				}
> > > > -				put_page(local_page_list[j]);
> > > > -				continue;
> > > > -			}
> > > > -
> > >
> > > I think this is trying to account for compound pages. (ie page_mask could
> > > represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> > > But putting the page in that case seems to be the wrong thing to do?
> > >
> > > Yes this was added by Artemy[1] now cc'ed.
> >
> > Right, this is for huge pages, please keep it.
> > put_page() needed to decrement refcount of the head page.
> >
>
> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> release_pages, and the change of the starting point, from the other part
> of the patch:

Your release pages code is right fix, it will be great if you prepare proper
and standalone patch.

Thanks

> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp
> *umem_odp, u64 user_virt,
>  		mutex_unlock(&umem_odp->umem_mutex);
>
>  		if (ret < 0) {
> -			/* Release left over pages when handling errors. */
> -			for (++j; j < npages; ++j)
> -				put_page(local_page_list[j]);
> +			/*
> +			 * Release pages, starting at the the first page
> +			 * that experienced an error.
> +			 */
> +			release_pages(&local_page_list[j], npages - j);
>  			break;
>  		}
>  	}
>
> ?
>
> thanks,
> --
> John Hubbard
> NVIDIA
>
On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
> > On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
> >>
> >>
> >> On 02/03/2019 21:44, Ira Weiny wrote:
> >>>
> >>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> >>>> From: John Hubbard <jhubbard@nvidia.com>
> >>>>
> >>>> ...
> >>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
> >>>>    is checking for a condition that can never happen,
> >>>>    because earlier:
> >>>>
> >>>>        user_virt = user_virt & page_mask;
> >>>>
> >>>>    ...so, remove that entire phrase.
> >>>>
> >>>>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> >>>>  		mutex_lock(&umem_odp->umem_mutex);
> >>>>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> >>>> -			if (user_virt & ~page_mask) {
> >>>> -				p += PAGE_SIZE;
> >>>> -				if (page_to_phys(local_page_list[j]) != p) {
> >>>> -					ret = -EFAULT;
> >>>> -					break;
> >>>> -				}
> >>>> -				put_page(local_page_list[j]);
> >>>> -				continue;
> >>>> -			}
> >>>> -
> >>>
> >>> I think this is trying to account for compound pages. (ie page_mask could
> >>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> >>> But putting the page in that case seems to be the wrong thing to do?
> >>>
> >>> Yes this was added by Artemy[1] now cc'ed.
> >>
> >> Right, this is for huge pages, please keep it.
> >> put_page() needed to decrement refcount of the head page.
> >
> > You mean decrement the refcount of the _non_-head pages?
> >
> > Ira
> >
>
> Actually, I'm sure Artemy means head page, because put_page() always
> operates on the head page.
>
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page()
> patchset, because my approach so far assumed that I could just change us
> over to:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

What about splitting a huge page?

From Documentation/vm/transhuge.rst:

<quote>
split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
structures. It can be done easily for refcounts taken by page table
entries. But we don't have enough information on how to distribute any
additional pins (i.e. from get_user_pages). split_huge_page() fails any
requests to split pinned huge page: it expects page count to be equal to
sum of mapcount of all sub-pages plus one (split_huge_page caller must
have reference for head page).
</quote>

FWIW, I'm not sure why it needs to "store" the reference in the head page for
this. I don't see any check to make sure the ref has been "stored" but I'm not
really familiar with the compound page code yet.

Ira

> However, it may be tricky to do this in one pass. Probably at first, I'll have
> to do this horrible thing approach:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
> by the number of tail pages. argghhh that's ugly.
>
> thanks,
> --
> John Hubbard
> NVIDIA
>
On 3/3/19 8:55 AM, Ira Weiny wrote:
> On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>>
>>
>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>
>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>
>>>> ...
>>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>>    is checking for a condition that can never happen,
>>>>    because earlier:
>>>>
>>>>        user_virt = user_virt & page_mask;
>>>>
>>>>    ...so, remove that entire phrase.
>>>>
>>>>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>>  		mutex_lock(&umem_odp->umem_mutex);
>>>>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>>> -			if (user_virt & ~page_mask) {
>>>> -				p += PAGE_SIZE;
>>>> -				if (page_to_phys(local_page_list[j]) != p) {
>>>> -					ret = -EFAULT;
>>>> -					break;
>>>> -				}
>>>> -				put_page(local_page_list[j]);
>>>> -				continue;
>>>> -			}
>>>> -
>>>
>>> I think this is trying to account for compound pages. (ie page_mask could
>>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>>> But putting the page in that case seems to be the wrong thing to do?
>>>
>>> Yes this was added by Artemy[1] now cc'ed.
>>
>> Right, this is for huge pages, please keep it.
>> put_page() needed to decrement refcount of the head page.
>
> You mean decrement the refcount of the _non_-head pages?
>
> Ira
>

Actually, I'm sure Artemy means head page, because put_page() always
operates on the head page.

And this reminds me that I have a problem to solve nearby: get_user_pages
on huge pages increments the page->_refcount *for each tail page* as well.
That's a minor problem for my put_user_page() patchset, because my approach
so far assumed that I could just change us over to:

get_user_page(): increments page->_refcount by a large amount (1024)

put_user_page(): decrements page->_refcount by a large amount (1024)

...and just stop doing the odd (to me) technique of incrementing once for
each tail page. I cannot see any reason why that's actually required, as
opposed to just "raise the page->_refcount enough to avoid losing the head
page too soon".

However, it may be tricky to do this in one pass. Probably at first, I'll
have to do this horrible thing approach:

get_user_page(): increments page->_refcount by a large amount (1024)

put_user_page(): decrements page->_refcount by a large amount (1024)
MULTIPLIED by the number of tail pages. argghhh that's ugly.

thanks,
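John's proposed accounting can be sketched with plain integers. The bias value and the function names here are illustrative stand-ins taken from his description, not the kernel's actual API (the eventual kernel mechanism may differ):

```c
#include <assert.h>

#define GUP_PIN_BIAS 1024	/* the "large amount" from the proposal */

/* Proposed: one large bias on the head page per GUP pin... */
static int gup_pin_biased(int head_refcount)
{
	return head_refcount + GUP_PIN_BIAS;
}

/* ...and one matching decrement when the pin is dropped. */
static int put_user_page_biased(int head_refcount)
{
	return head_refcount - GUP_PIN_BIAS;
}

/* Current behavior, as described in the thread: gup increments the
 * refcount once per sub-page, so a 2MB huge page (512 sub-pages)
 * costs 512 increments -- the per-tail work Jason objects to below. */
static int gup_pin_per_subpage(int head_refcount, int nr_subpages)
{
	return head_refcount + nr_subpages;
}
```

The biased scheme makes pin/unpin a single pair of operations regardless of page size, at the cost of needing the "MULTIPLIED by the number of tail pages" transition step John describes while call sites still take per-tail references.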
On 3/4/19 3:11 PM, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
>> On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>>>
>>>
>>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>>
>>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>>
>>>>> ...
>>>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>>>    is checking for a condition that can never happen,
>>>>>    because earlier:
>>>>>
>>>>>        user_virt = user_virt & page_mask;
>>>>>
>>>>>    ...so, remove that entire phrase.
>>>>>
>>>>>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>>>  		mutex_lock(&umem_odp->umem_mutex);
>>>>>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>>>> -			if (user_virt & ~page_mask) {
>>>>> -				p += PAGE_SIZE;
>>>>> -				if (page_to_phys(local_page_list[j]) != p) {
>>>>> -					ret = -EFAULT;
>>>>> -					break;
>>>>> -				}
>>>>> -				put_page(local_page_list[j]);
>>>>> -				continue;
>>>>> -			}
>>>>> -
>>>>
>>>> I think this is trying to account for compound pages. (ie page_mask could
>>>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>>>> But putting the page in that case seems to be the wrong thing to do?
>>>>
>>>> Yes this was added by Artemy[1] now cc'ed.
>>>
>>> Right, this is for huge pages, please keep it.
>>> put_page() needed to decrement refcount of the head page.
>>
>> You mean decrement the refcount of the _non_-head pages?
>>
>> Ira
>>
>
> Actually, I'm sure Artemy means head page, because put_page() always
> operates on the head page.
>
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page()
> patchset, because my approach so far assumed that I could just change us
> over to:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".
>
> However, it may be tricky to do this in one pass. Probably at first, I'll have
> to do this horrible thing approach:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
> by the number of tail pages. argghhh that's ugly.
>

I see that this is still not stated quite right. ...to clarify, I mean to
leave the existing behavior alone. So it would be the call sites (not
put_user_page as the above says) that would be doing all that decrementing.
The call sites know how many decrements are appropriate.

Unless someone thinks of a clever way to clean this up in one shot. I'm not
really seeing any way.

thanks,
On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

I'd very much like to see this in the infiniband umem code - the extra work
and cost of touching every page in a huge page is very much undesired.

Jason
On 3/4/19 12:13 PM, Ira Weiny wrote:
[snip]
>> And this reminds me that I have a problem to solve nearby: get_user_pages
>> on huge pages increments the page->_refcount *for each tail page* as well.
>> That's a minor problem for my put_user_page()
>> patchset, because my approach so far assumed that I could just change us
>> over to:
>>
>> get_user_page(): increments page->_refcount by a large amount (1024)
>>
>> put_user_page(): decrements page->_refcount by a large amount (1024)
>>
>> ...and just stop doing the odd (to me) technique of incrementing once for
>> each tail page. I cannot see any reason why that's actually required, as
>> opposed to just "raise the page->_refcount enough to avoid losing the head
>> page too soon".
>
> What about splitting a huge page?
>
> From Documention/vm/transhuge.rst
>
> <quoute>
> split_huge_page internally has to distribute the refcounts in the head
> page to the tail pages before clearing all PG_head/tail bits from the page
> structures. It can be done easily for refcounts taken by page table
> entries. But we don't have enough information on how to distribute any
> additional pins (i.e. from get_user_pages). split_huge_page() fails any
> requests to split pinned huge page: it expects page count to be equal to
> sum of mapcount of all sub-pages plus one (split_huge_page caller must
> have reference for head page).
> </quote>
>

heh, so in the end, split_huge_page just needs enough information to say
"no" for gup pages. So as long as page->_refcount avoids one particular
value, the code keeps working. :)

> FWIW, I'm not sure why it needs to "store" the reference in the head page for
> this. I don't see any check to make sure the ref has been "stored" but I'm not
> really familiar with the compound page code yet.
>
> Ira
>

Thanks for peeking at this, I'll look deeper too.

thanks,
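The transhuge.rst rule quoted above reduces to a single arithmetic check. The predicate below is a deliberate simplification of that rule — the real split_huge_page() sums per-subpage mapcounts under locks rather than taking two ready-made integers:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the documented split_huge_page() refusal rule:
 * splitting is allowed only when the page count equals the sum of the
 * sub-page mapcounts plus one (the +1 being the splitting caller's own
 * reference to the head page). Extra pins, e.g. from get_user_pages,
 * raise page_count without raising mapcount, so the check says "no". */
static bool can_split_huge_page(int page_count, int total_mapcount)
{
	return page_count == total_mapcount + 1;
}
```

This is the property John notes: any GUP accounting scheme just has to keep page_count away from the one value that would make a pinned page look splittable.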
On 04/03/2019 00:37, John Hubbard wrote:
> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>>
>>
>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>
>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>
>>>> ...
>
> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> release_pages, and the change of the starting point, from the other part of the
> patch:
>
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
> u64 user_virt,
>  		mutex_unlock(&umem_odp->umem_mutex);
>
>  		if (ret < 0) {
> -			/* Release left over pages when handling errors. */
> -			for (++j; j < npages; ++j)

release_pages() is an optimized batch put_page(), so it's ok.

But! Releasing starting from the page next to the one that caused the
failure in ib_umem_odp_map_dma_single_page() is correct, because the failure
flow of that function already called put_page().

So release_pages(&local_page_list[j+1], npages - j - 1) would be correct.

> -				put_page(local_page_list[j]);
> +			/*
> +			 * Release pages, starting at the the first page
> +			 * that experienced an error.
> +			 */
> +			release_pages(&local_page_list[j], npages - j);
>  			break;
>  		}
>  	}
>
> ?
>
> thanks,
>
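Artemy's correction, in the same toy terms used for discussion (plain ints standing in for page refcounts, the batch release modeled as a simple put loop rather than the real release_pages()): because the single-page mapping function already dropped its page's reference on failure, the batch release must start one past the failing index, or page j gets put twice.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: ints stand in for page refcounts. */

/* Batch release modeled as a plain put loop over [first, npages). */
static void release_from(int *refs, size_t first, size_t npages)
{
	size_t i;

	for (i = first; i < npages; i++)
		refs[i]--;
}

/* Model a failure at index j: per Artemy, the failure flow of
 * ib_umem_odp_map_dma_single_page() has already put page j. */
static void fail_at(int *refs, size_t j)
{
	refs[j]--;
}
```

With a failure at j = 3 in an 8-page batch, `release_from(refs, j + 1, 8)` leaves the successfully mapped pages 0..2 with their references intact and every page from j onward released exactly once; starting at j instead would drive `refs[3]` negative, i.e. a double put.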
On Wed, Mar 06, 2019 at 03:02:36AM +0200, Artemy Kovalyov wrote:
>
>
> On 04/03/2019 00:37, John Hubbard wrote:
> > On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> > >
> > >
> > > On 02/03/2019 21:44, Ira Weiny wrote:
> > > >
> > > > On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> > > > > From: John Hubbard <jhubbard@nvidia.com>
> > > > >
> > > > > ...
> >
> > OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> > release_pages, and the change of the starting point, from the other part of the
> > patch:
> >
> > @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
> > u64 user_virt,
> >  		mutex_unlock(&umem_odp->umem_mutex);
> >
> >  		if (ret < 0) {
> > -			/* Release left over pages when handling errors. */
> > -			for (++j; j < npages; ++j)
>
> release_pages() is an optimized batch put_page() so it's ok.
> but! release starting from page next to one cause failure in
> ib_umem_odp_map_dma_single_page() is correct because failure flow of this
> functions already called put_page().
> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.

Someone send a fixup patch please...

Jason
On 3/5/19 5:32 PM, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2019 at 03:02:36AM +0200, Artemy Kovalyov wrote:
>>
>>
>> On 04/03/2019 00:37, John Hubbard wrote:
>>> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>>>>
>>>>
>>>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>>>
>>>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>>>
>>>>>> ...
>>>
>>> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
>>> release_pages, and the change of the starting point, from the other part of the
>>> patch:
>>>
>>> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
>>> u64 user_virt,
>>>  		mutex_unlock(&umem_odp->umem_mutex);
>>>
>>>  		if (ret < 0) {
>>> -			/* Release left over pages when handling errors. */
>>> -			for (++j; j < npages; ++j)
>> release_pages() is an optimized batch put_page() so it's ok.
>> but! release starting from page next to one cause failure in
>> ib_umem_odp_map_dma_single_page() is correct because failure flow of this
>> functions already called put_page().
>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>
> Someone send a fixup patch please...
>
> Jason

Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
you're confirming it already, so that helps too.

Patch coming shortly.

thanks,
On 3/5/19 5:34 PM, John Hubbard wrote:
[snip]
>>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>>
>> Someone send a fixup patch please...
>>
>> Jason
>
> Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
> you're confirming it already, so that helps too.
>
> Patch coming shortly.
>

Jason, btw, do you prefer a patch that fixes the previous one, or a new
patch that stands alone? (I'm not sure how this tree is maintained, exactly.)

thanks,
On Tue, Mar 05, 2019 at 05:37:18PM -0800, John Hubbard wrote:
> On 3/5/19 5:34 PM, John Hubbard wrote:
> [snip]
> >>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
> >>
> >> Someone send a fixup patch please...
> >>
> >> Jason
> >
> > Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
> > you're confirming it already, so that helps too.

I didn't look, just assuming Artemy is right since he knows this code..

> > Patch coming shortly.
> >
>
> Jason, btw, do you prefer a patch that fixes the previous one, or a new
> patch that stands alone? (I'm not sure how this tree is maintained, exactly.)

It has passed the point where I should apply a fixup; rdma in general has
approximately a one-day period after the 'thanks applied' message where a
rebase is possible.

Otherwise the tree is strictly no rebase.

Jason
On 3/5/19 5:51 PM, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2019 at 05:37:18PM -0800, John Hubbard wrote:
>> On 3/5/19 5:34 PM, John Hubbard wrote:
>> [snip]
>>>>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>>>>
>>>> Someone send a fixup patch please...
>>>>
>>>> Jason
>>>
>>> Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
>>> you're confirming it already, so that helps too.
>
> I didn't look, just assuming Artemy is right since he knows this
> code..
>

OK. And I've confirmed it, too.

>>> Patch coming shortly.
>>>
>>
>> Jason, btw, do you prefer a patch that fixes the previous one, or a new
>> patch that stands alone? (I'm not sure how this tree is maintained, exactly.)
>
> It has past the point when I should apply a fixup, rdma is general has
> about an approximately 1 day period after the 'thanks applied' message
> where rebase is possible
>
> Otherwise the tree is strictly no rebase
>
> Jason
>

Got it, patch is posted now.

thanks,
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..83872c1f3f2c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -40,6 +40,7 @@
 #include <linux/vmalloc.h>
 #include <linux/hugetlb.h>
 #include <linux/interval_tree_generic.h>
+#include <linux/pagemap.h>

 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
@@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,

 		if (npages < 0) {
 			if (npages != -EAGAIN)
-				pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+				pr_warn("fail to get %zu user pages with error %d\n",
+					gup_num_pages, npages);
 			else
-				pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+				pr_debug("fail to get %zu user pages with error %d\n",
+					gup_num_pages, npages);
 			break;
 		}

 		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
 		mutex_lock(&umem_odp->umem_mutex);
 		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
-			if (user_virt & ~page_mask) {
-				p += PAGE_SIZE;
-				if (page_to_phys(local_page_list[j]) != p) {
-					ret = -EFAULT;
-					break;
-				}
-				put_page(local_page_list[j]);
-				continue;
-			}
-
 			ret = ib_umem_odp_map_dma_single_page(
 					umem_odp, k, local_page_list[j],
 					access_mask, current_seq);
@@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		mutex_unlock(&umem_odp->umem_mutex);

 		if (ret < 0) {
-			/* Release left over pages when handling errors. */
-			for (++j; j < npages; ++j)
-				put_page(local_page_list[j]);
+			/*
+			 * Release pages, starting at the the first page
+			 * that experienced an error.
+			 */
+			release_pages(&local_page_list[j], npages - j);
 			break;
 		}
 	}