Message ID | 20210926161259.238054-5-namit@vmware.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/madvise: support process_madvise(MADV_DONTNEED) | expand |
On Sun, Sep 26, 2021 at 09:12:55AM -0700, Nadav Amit wrote: > From: Nadav Amit <namit@vmware.com> > > The different behaviors of madvise are different in several ways, which > are distributed across several functions. Use the design pattern from > iouring in order to define the actions that are required for each > behavior. > > The next patches will get rid of old helper functions that are modified > in this patch and the redundant use of array_index_nospec(). The next > patches will add more actions for each leaf into the new struct. > > No functional change is intended. > > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Minchan Kim <minchan@kernel.org> > Cc: Colin Cross <ccross@google.com> > Cc: Suren Baghdasarya <surenb@google.com> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > Signed-off-by: Nadav Amit <namit@vmware.com> > --- > mm/madvise.c | 168 +++++++++++++++++++++++++++++++++------------------ > 1 file changed, 109 insertions(+), 59 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 17e39c70704b..127507c71ba9 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -29,6 +29,7 @@ > #include <linux/swapops.h> > #include <linux/shmem_fs.h> > #include <linux/mmu_notifier.h> > +#include <linux/nospec.h> > > #include <asm/tlb.h> > > @@ -39,6 +40,101 @@ struct madvise_walk_private { > bool pageout; > }; > > +struct madvise_info { > + u8 behavior_valid: 1; > + u8 process_behavior_valid: 1; > + u8 need_mmap_read_only: 1; > +}; > + > +static const struct madvise_info madvise_info[MADV_SOFT_OFFLINE+1] = { MADV_SOFT_OFFLINE+1 smells bad. And I don't like the change in general. Given that MADV_SOFT_OFFLINE is 101, the array will be mostly empty. I donno. I don't see any improvement with the patch. But maybe it's only me.
> On Sep 27, 2021, at 2:31 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote: > > On Sun, Sep 26, 2021 at 09:12:55AM -0700, Nadav Amit wrote: >> From: Nadav Amit <namit@vmware.com> >> >> The different behaviors of madvise are different in several ways, which >> are distributed across several functions. Use the design pattern from >> iouring in order to define the actions that are required for each >> behavior. >> >> The next patches will get rid of old helper functions that are modified >> in this patch and the redundant use of array_index_nospec(). The next >> patches will add more actions for each leaf into the new struct. >> >> No functional change is intended. >> >> Cc: Andrea Arcangeli <aarcange@redhat.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Minchan Kim <minchan@kernel.org> >> Cc: Colin Cross <ccross@google.com> >> Cc: Suren Baghdasarya <surenb@google.com> >> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> >> Signed-off-by: Nadav Amit <namit@vmware.com> >> --- >> mm/madvise.c | 168 +++++++++++++++++++++++++++++++++------------------ >> 1 file changed, 109 insertions(+), 59 deletions(-) >> >> diff --git a/mm/madvise.c b/mm/madvise.c >> index 17e39c70704b..127507c71ba9 100644 >> --- a/mm/madvise.c >> +++ b/mm/madvise.c >> @@ -29,6 +29,7 @@ >> #include <linux/swapops.h> >> #include <linux/shmem_fs.h> >> #include <linux/mmu_notifier.h> >> +#include <linux/nospec.h> >> >> #include <asm/tlb.h> >> >> @@ -39,6 +40,101 @@ struct madvise_walk_private { >> bool pageout; >> }; >> >> +struct madvise_info { >> + u8 behavior_valid: 1; >> + u8 process_behavior_valid: 1; >> + u8 need_mmap_read_only: 1; >> +}; >> + >> +static const struct madvise_info madvise_info[MADV_SOFT_OFFLINE+1] = { > > MADV_SOFT_OFFLINE+1 smells bad. I can set another constant instead and let the compiler shout if anything outside the array is initialized. > > And I don't like the change in general. Given that MADV_SOFT_OFFLINE is > 101, the array will be mostly empty. Seriously, these is less than 128B - two cachelines. Perhaps they should be aligned. But this whole change should have no effect on code/data size. > > I donno. I don't see any improvement with the patch. But maybe it's only me. The following patches make it clearer when TLBs flushes are batched and when mmap_lock is not taken (which is by the way not clear from the code). I could have added two more functions for that and it would have taken me less time. I do not think the end result of having ~5 different functions to figure out the actions needed for each behavior would be as clear.
On Mon, Sep 27, 2021 at 03:31:21AM -0700, Nadav Amit wrote: > > > > On Sep 27, 2021, at 2:31 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote: > > > > On Sun, Sep 26, 2021 at 09:12:55AM -0700, Nadav Amit wrote: > >> From: Nadav Amit <namit@vmware.com> > >> > >> The different behaviors of madvise are different in several ways, which > >> are distributed across several functions. Use the design pattern from > >> iouring in order to define the actions that are required for each > >> behavior. > >> > >> The next patches will get rid of old helper functions that are modified > >> in this patch and the redundant use of array_index_nospec(). The next > >> patches will add more actions for each leaf into the new struct. > >> > >> No functional change is intended. > >> > >> Cc: Andrea Arcangeli <aarcange@redhat.com> > >> Cc: Andrew Morton <akpm@linux-foundation.org> > >> Cc: Minchan Kim <minchan@kernel.org> > >> Cc: Colin Cross <ccross@google.com> > >> Cc: Suren Baghdasarya <surenb@google.com> > >> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > >> Signed-off-by: Nadav Amit <namit@vmware.com> > >> --- > >> mm/madvise.c | 168 +++++++++++++++++++++++++++++++++------------------ > >> 1 file changed, 109 insertions(+), 59 deletions(-) > >> > >> diff --git a/mm/madvise.c b/mm/madvise.c > >> index 17e39c70704b..127507c71ba9 100644 > >> --- a/mm/madvise.c > >> +++ b/mm/madvise.c > >> @@ -29,6 +29,7 @@ > >> #include <linux/swapops.h> > >> #include <linux/shmem_fs.h> > >> #include <linux/mmu_notifier.h> > >> +#include <linux/nospec.h> > >> > >> #include <asm/tlb.h> > >> > >> @@ -39,6 +40,101 @@ struct madvise_walk_private { > >> bool pageout; > >> }; > >> > >> +struct madvise_info { > >> + u8 behavior_valid: 1; > >> + u8 process_behavior_valid: 1; > >> + u8 need_mmap_read_only: 1; > >> +}; > >> + > >> +static const struct madvise_info madvise_info[MADV_SOFT_OFFLINE+1] = { > > > > MADV_SOFT_OFFLINE+1 smells bad. > > I can set another constant instead and let the compiler shout if anything > outside the array is initialized. I would rather introduce a function that would return struct madvise_info for a given behavior. The function would have a switch inside. The default: may have BUILD_BUG() or something.
> On Sep 27, 2021, at 5:14 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote: > > On Mon, Sep 27, 2021 at 03:31:21AM -0700, Nadav Amit wrote: >> >> >>> On Sep 27, 2021, at 2:31 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote: >>> >>> On Sun, Sep 26, 2021 at 09:12:55AM -0700, Nadav Amit wrote: >>>> From: Nadav Amit <namit@vmware.com> >>>> >>>> The different behaviors of madvise are different in several ways, which >>>> are distributed across several functions. Use the design pattern from >>>> iouring in order to define the actions that are required for each >>>> behavior. >>>> >>>> The next patches will get rid of old helper functions that are modified >>>> in this patch and the redundant use of array_index_nospec(). The next >>>> patches will add more actions for each leaf into the new struct. >>> [ snip ] >>> MADV_SOFT_OFFLINE+1 smells bad. >> >> I can set another constant instead and let the compiler shout if anything >> outside the array is initialized. > > I would rather introduce a function that would return struct madvise_info > for a given behavior. The function would have a switch inside. The default: > may have BUILD_BUG() or something. Sounds better than my solution. I will do so.
diff --git a/mm/madvise.c b/mm/madvise.c index 17e39c70704b..127507c71ba9 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -29,6 +29,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/nospec.h> #include <asm/tlb.h> @@ -39,6 +40,101 @@ struct madvise_walk_private { bool pageout; }; +struct madvise_info { + u8 behavior_valid: 1; + u8 process_behavior_valid: 1; + u8 need_mmap_read_only: 1; +}; + +static const struct madvise_info madvise_info[MADV_SOFT_OFFLINE+1] = { + [MADV_DOFORK] = { + .behavior_valid = 1, + }, + [MADV_DONTFORK] = { + .behavior_valid = 1, + }, + [MADV_NORMAL] = { + .behavior_valid = 1, + }, + [MADV_SEQUENTIAL] = { + .behavior_valid = 1, + }, + [MADV_RANDOM] = { + .behavior_valid = 1, + }, + [MADV_REMOVE] = { + .behavior_valid = 1, + .need_mmap_read_only = 1, + }, + [MADV_WILLNEED] = { + .behavior_valid = 1, + .process_behavior_valid = 1, + .need_mmap_read_only = 1, + }, + [MADV_DONTNEED] = { + .behavior_valid = 1, + .need_mmap_read_only = 1, + }, + [MADV_FREE] = { + .behavior_valid = 1, + .need_mmap_read_only = 1, + }, + [MADV_COLD] = { + .behavior_valid = 1, + .process_behavior_valid = 1, + .need_mmap_read_only = 1, + }, + [MADV_PAGEOUT] = { + .behavior_valid = 1, + .process_behavior_valid = 1, + .need_mmap_read_only = 1, + }, +#ifdef CONFIG_KSM + [MADV_MERGEABLE] = { + .behavior_valid = 1, + }, + [MADV_UNMERGEABLE] = { + .behavior_valid = 1, + }, +#endif +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + [MADV_HUGEPAGE] = { + .behavior_valid = 1, + }, + [MADV_NOHUGEPAGE] = { + .behavior_valid = 1, + }, +#endif + [MADV_DONTDUMP] = { + .behavior_valid = 1, + }, + [MADV_DODUMP] = { + .behavior_valid = 1, + }, + [MADV_WIPEONFORK] = { + .behavior_valid = 1, + }, + [MADV_KEEPONFORK] = { + .behavior_valid = 1, + }, +#ifdef CONFIG_MEMORY_FAILURE + [MADV_HWPOISON] = { + .behavior_valid = 1, + }, + [MADV_SOFT_OFFLINE] = { + .behavior_valid = 1, + }, +#endif + [MADV_POPULATE_READ] = { + .behavior_valid = 1, + .need_mmap_read_only = 1, + }, + [MADV_POPULATE_WRITE] = { + .behavior_valid = 1, + .need_mmap_read_only = 1, + }, +}; + /* * Any behaviour which results in changes to the vma->vm_flags needs to * take mmap_lock for writing. Others, which simply traverse vmas, need @@ -46,20 +142,7 @@ struct madvise_walk_private { */ static int madvise_need_mmap_write(int behavior) { - switch (behavior) { - case MADV_REMOVE: - case MADV_WILLNEED: - case MADV_DONTNEED: - case MADV_COLD: - case MADV_PAGEOUT: - case MADV_FREE: - case MADV_POPULATE_READ: - case MADV_POPULATE_WRITE: - return 0; - default: - /* be safe, default to 1. list exceptions explicitly */ - return 1; - } + return !madvise_info[behavior].need_mmap_read_only; } /* @@ -999,56 +1082,23 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, } static bool -madvise_behavior_valid(int behavior) +madvise_behavior_valid(int *behavior) { - switch (behavior) { - case MADV_DOFORK: - case MADV_DONTFORK: - case MADV_NORMAL: - case MADV_SEQUENTIAL: - case MADV_RANDOM: - case MADV_REMOVE: - case MADV_WILLNEED: - case MADV_DONTNEED: - case MADV_FREE: - case MADV_COLD: - case MADV_PAGEOUT: - case MADV_POPULATE_READ: - case MADV_POPULATE_WRITE: -#ifdef CONFIG_KSM - case MADV_MERGEABLE: - case MADV_UNMERGEABLE: -#endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - case MADV_HUGEPAGE: - case MADV_NOHUGEPAGE: -#endif - case MADV_DONTDUMP: - case MADV_DODUMP: - case MADV_WIPEONFORK: - case MADV_KEEPONFORK: -#ifdef CONFIG_MEMORY_FAILURE - case MADV_SOFT_OFFLINE: - case MADV_HWPOISON: -#endif - return true; - - default: + if (*behavior >= ARRAY_SIZE(madvise_info)) return false; - } + + *behavior = array_index_nospec(*behavior, ARRAY_SIZE(madvise_info)); + return madvise_info[*behavior].behavior_valid; } static bool -process_madvise_behavior_valid(int behavior) +process_madvise_behavior_valid(int *behavior) { - switch (behavior) { - case MADV_COLD: - case MADV_PAGEOUT: - case MADV_WILLNEED: - return true; - default: + if (*behavior >= ARRAY_SIZE(madvise_info)) return false; - } + + *behavior = array_index_nospec(*behavior, ARRAY_SIZE(madvise_info)); + return madvise_info[*behavior].process_behavior_valid; } /* @@ -1133,7 +1183,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh start = untagged_addr(start); - if (!madvise_behavior_valid(behavior)) + if (!madvise_behavior_valid(&behavior)) return error; if (!PAGE_ALIGNED(start)) @@ -1258,7 +1308,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto put_pid; } - if (!process_madvise_behavior_valid(behavior)) { + if (!process_madvise_behavior_valid(&behavior)) { ret = -EINVAL; goto release_task; }