Message ID | 20200603225630.dODblpnlR%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [001/131] mm/slub: fix a memory leak in sysfs_slab_add() | expand |
On Wed, Jun 3, 2020 at 3:56 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > From: John Hubbard <jhubbard@nvidia.com> > Subject: mm/gup: refactor and de-duplicate gup_fast() code > > There were two nearly identical sets of code for gup_fast() style of > walking the page tables with interrupts disabled. This has lead to the > usual maintenance problems that arise from having duplicated code. Andrew, this is actually an example of why you absolutely should *not* rebase your series in the middle of the development tree. Now you've rebased it on top of my commit 17839856fd58 ("gup: document and work around "COW can break either way" issue") and in the process you broke the result completely for read-only pages. Now it uses FOLL_WRITE (because that's what internal_get_user_pages_fast() does), which will disallow read-only pages (in order to handle them properly for COW in the slow path), and then the fact that the slow-path is entirely disabled for this case means that it doesn't work at all. This "rebase onto whatever random base Linus has today" absolutely has *got* to stop. It's not ok for git trees, but it's not ok for these patch-queues either. It means that all the testing your patch queue got in linux-next is completely worthless, because what you send me is something very different from what was tested. Exactly as with the git trees, where I tell people constantly not to rebase their patches. Give me a base that it has been tested on, and a series that has actually been tested. Not this "rebased for your convenience" thing. I'd _much_ rather get a merge conflict when your patch series changes something that somebody else also changed. Because then I know something clashed, and if I screw up the merge, I only have myself to blame. If it's a very complex merge, I'll ask for help. That would be much better than getting a patch-bomb with 131 patches that all _look_ sane and build cleanly, but can be randomly broken because they got rebased hours before with no testing. The "let me fix things up onto a daily snapshot" really is a completely broken model. You are making it _harder_ for me, not easier, because now I have to look for subtle issues in every single commit rather than the big honking clue of "oh, I got a merge error, I'll need to really look at it". It so happened that with this one, I was very aware of the rebase, because you rebased on a patch that I wrote so when I looked through the patches I went "Hmm.." What about all the other times when I wouldn't have noticed and been so aware of what changed recently? Again: merge conflicts are *much* better than silently rebasing and hiding problems. Linus
On Wed, Jun 3, 2020 at 7:19 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Now it uses FOLL_WRITE (because that's what > internal_get_user_pages_fast() does), which will disallow read-only > pages (in order to handle them properly for COW in the slow path), and > then the fact that the slow-path is entirely disabled for this case > means that it doesn't work at all. I have tried to fix it up, partly by editing the patches directly, and partly by then trying to fix up comments after-the-fact. The end result looks possibly correct after it all. But it would have been easier had I just had a merge conflict to deal with, rather than trying to fix up patches. Will do more testing etc before really merging and then pushing out. Linus
On Wed, Jun 3, 2020 at 8:19 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I have tried to fix it up, partly by editing the patches directly, and > partly by then trying to fix up comments after-the-fact. The end result passes the smell test, boots for me, and looks like it might work. But I don't have any good real-world test for this, and I hope and assume that John has something GPU-related that actually uses the code and cares. Presumably there was _something_ that triggered those changes to de-duplicate that code? So please give it a look. Because of how I edited the patches (and Andrew edited them before me), what is attributed to John Hubbard isn't really the same as the patch he originally wrote. If I broke something in the process, feel free to let me know in less than polite terms. But it look better than the intermediate situation that definitely looked like it would just fail entirely on any read-only mappings due to not being able to fall back on the slow case. The drm code probably doesn't even care about the possible ambiguity with GUP picking a COW page that might later break the other way. Linus
On 2020-06-03 21:31, Linus Torvalds wrote: > On Wed, Jun 3, 2020 at 8:19 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> I have tried to fix it up, partly by editing the patches directly, and >> partly by then trying to fix up comments after-the-fact. > > The end result passes the smell test, boots for me, and looks like it > might work. > > But I don't have any good real-world test for this, and I hope and > assume that John has something GPU-related that actually uses the code > and cares. Presumably there was _something_ that triggered those > changes to de-duplicate that code? Yes: the Intel i915 driver required a pin_user_pages*() variant of the gup fast-only code. So the next 2 patches put the refactored code into use: 2170ecfa7688 drm/i915: convert get_user_pages() --> pin_user_pages() 104acc327648 mm/gup: introduce pin_user_pages_fast_only() > > So please give it a look. Because of how I edited the patches (and > Andrew edited them before me), what is attributed to John Hubbard > isn't really the same as the patch he originally wrote. > Looking at it now. I'm pleased to see that the fix is basically identical to a local fix that I was testing an hour ago. The only difference is the name and type of the local fast_flags variable. An unsigned long is larger than the API requires, but that is of course fine for now. As for testing, the original version of this the was part of a 4-part series [1] that ended up converting Intel i915 to use pin_user_pages*(). And Chris Wilson (+cc) was kind enough to run some drm/i915 CI tests on that and they passed at the time. Also, I have a set of xfstests and a few other things exercise a fair amount of get_user_pages*() and pin_user_pages*(). Running those now. But my run time testing is not set up for stress testing, and it's a very narrow look at things. But so far it looks promising. [1] https://lore.kernel.org/r/20200522051931.54191-1-jhubbard@nvidia.com thanks,
--- a/include/linux/mm.h~mm-gup-refactor-and-de-duplicate-gup_fast-code +++ a/include/linux/mm.h @@ -2816,6 +2816,7 @@ struct page *follow_page(struct vm_area_ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ +#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each --- a/mm/gup.c~mm-gup-refactor-and-de-duplicate-gup_fast-code +++ a/mm/gup.c @@ -2731,10 +2731,12 @@ static int internal_get_user_pages_fast( struct page **pages) { unsigned long addr, len, end; + unsigned long flags; int nr_pinned = 0, ret = 0; if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | - FOLL_FORCE | FOLL_PIN | FOLL_GET))) + FOLL_FORCE | FOLL_PIN | FOLL_GET | + FOLL_FAST_ONLY))) return -EINVAL; start = untagged_addr(start) & PAGE_MASK; @@ -2753,16 +2755,26 @@ static int internal_get_user_pages_fast( * order to avoid confusing the normal COW routines. So only * targets that are already writable are safe to do by just * looking at the page tables. + * + * Disable interrupts. The nested form is used, in order to allow full, + * general purpose use of this routine. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See struct mmu_table_batch comments in + * include/asm-generic/tlb.h for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. */ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) { - local_irq_disable(); + local_irq_save(flags); gup_pgd_range(addr, end, gup_flags | FOLL_WRITE, pages, &nr_pinned); - local_irq_enable(); + local_irq_restore(flags); ret = nr_pinned; } - if (nr_pinned < nr_pages) { + if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) { /* Try to get the remaining pages with get_user_pages */ start += nr_pinned << PAGE_SHIFT; pages += nr_pinned; @@ -2798,37 +2810,27 @@ static int internal_get_user_pages_fast( int __get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { - unsigned long len, end; - unsigned long flags; - int nr_pinned = 0; + int nr_pinned; /* * Internally (within mm/gup.c), gup fast variants must set FOLL_GET, * because gup fast is always a "pin with a +1 page refcount" request. + * + * FOLL_FAST_ONLY is required in order to match the API description of + * this routine: no fall back to regular ("slow") GUP. */ - unsigned int gup_flags = FOLL_GET; + unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY; if (write) gup_flags |= FOLL_WRITE; - start = untagged_addr(start) & PAGE_MASK; - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - - if (end <= start) - return 0; - if (unlikely(!access_ok((void __user *)start, len))) - return 0; + nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags, + pages); /* - * Disable interrupts. We use the nested form as we can already have - * interrupts disabled by get_futex_key. - * - * With interrupts disabled, we block page table pages from being - * freed from under us. See struct mmu_table_batch comments in - * include/asm-generic/tlb.h for more details. - * - * We do not adopt an rcu_read_lock(.) here as we also want to - * block IPIs that come from THPs splitting. + * As specified in the API description above, this routine is not + * allowed to return negative values. However, the common core + * routine internal_get_user_pages_fast() *can* return -errno. + * Therefore, correct for that here: * * NOTE! We allow read-only gup_fast() here, but you'd better be * careful about possible COW pages. You'll get _a_ COW page, but @@ -2836,13 +2838,8 @@ int __get_user_pages_fast(unsigned long * COW event happens after this. COW may break the page copy in a * random direction. */ - - if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && - gup_fast_permitted(start, end)) { - local_irq_save(flags); - gup_pgd_range(start, end, gup_flags, pages, &nr_pinned); - local_irq_restore(flags); - } + if (nr_pinned < 0) + nr_pinned = 0; return nr_pinned; }