Message ID | 20220405014929.15158-1-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | userfaultfd-wp: Support shmem and hugetlbfs | expand |
Hi Peter, On Mon, Apr 04, 2022 at 09:49:29PM -0400, Peter Xu wrote: > Enable PTE markers by default. On x86_64 it means it'll auto-enable > PTE_MARKER_UFFD_WP as well. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > mm/Kconfig | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/Kconfig b/mm/Kconfig > index 6e7c2d59fa96..3eca34c864c5 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -911,12 +911,14 @@ config ANON_VMA_NAME > > config PTE_MARKER > bool "Marker PTEs support" > + default y > > help > Allows to create marker PTEs for file-backed memory. make oldconfig just prompted me on these: --- Marker PTEs support (PTE_MARKER) [Y/n/?] (NEW) ? CONFIG_PTE_MARKER: Allows to create marker PTEs for file-backed memory. Symbol: PTE_MARKER [=y] Type : bool Defined at mm/Kconfig:1046 Prompt: Marker PTEs support Location: Main menu -> Memory Management options --- > config PTE_MARKER_UFFD_WP > bool "Marker PTEs support for userfaultfd write protection" > + default y > depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP It's not possible to answer them without looking at the code. But after looking at the code, I'm still not sure why it asks me. Isn't this infrastructure code? Wouldn't it make more sense to remove the prompt string and have userfaultfd simply select those? If this is too experimental to enable per default, a more reasonable question for the user would be a "userfaultfd file support" option or something, and have *that* select the marker code. Thanks!
On Tue, Apr 19, 2022 at 11:13:48AM -0400, Johannes Weiner wrote: > Hi Peter, Hi, Johannes, > > On Mon, Apr 04, 2022 at 09:49:29PM -0400, Peter Xu wrote: > > Enable PTE markers by default. On x86_64 it means it'll auto-enable > > PTE_MARKER_UFFD_WP as well. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > mm/Kconfig | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/mm/Kconfig b/mm/Kconfig > > index 6e7c2d59fa96..3eca34c864c5 100644 > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -911,12 +911,14 @@ config ANON_VMA_NAME > > > > config PTE_MARKER > > bool "Marker PTEs support" > > + default y > > > > help > > Allows to create marker PTEs for file-backed memory. > > make oldconfig just prompted me on these: > > --- > Marker PTEs support (PTE_MARKER) [Y/n/?] (NEW) ? > > CONFIG_PTE_MARKER: > > Allows to create marker PTEs for file-backed memory. > > Symbol: PTE_MARKER [=y] > Type : bool > Defined at mm/Kconfig:1046 > Prompt: Marker PTEs support > Location: > Main menu > -> Memory Management options > --- > > > config PTE_MARKER_UFFD_WP > > bool "Marker PTEs support for userfaultfd write protection" > > + default y > > depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP > > It's not possible to answer them without looking at the code. > > But after looking at the code, I'm still not sure why it asks > me. Isn't this infrastructure code? > > Wouldn't it make more sense to remove the prompt string and have > userfaultfd simply select those? > > If this is too experimental to enable per default, a more reasonable > question for the user would be a "userfaultfd file support" option or > something, and have *that* select the marker code. Thanks for raising this question. Actually it's right now enabled by default, so I kept the options just to make sure we can always explicitly disable those options when we want. That's majorly why I kept this patch standalone one so if we want we can even drop it. Said that, I fully agree with you that having two options seem to be an overkill, especially the PTE_MARKER option will be too challenging to be correctly understood by anyone not familiar with it. So after a 2nd thought I'm trying to refine what I want to achieve with the kbuild system on this new feature: - On supported systems (x86_64), should be by default y with "make olddefconfig", but able to turn it off using "make oldconfig" etc., so the user will have a choice when they want. - On not-supported systems (non-x86_64), should be always n without any prompt asking, and user won't even see this entry. - PTE_MARKER option should always be hidden for all archs I plan to post a patch that is attached (I also reworded the entry to not mention about pte markers). Does that look acceptable on your side? Thanks,
Hi Peter, On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote: > @@ -910,16 +910,16 @@ config ANON_VMA_NAME > difference in their name. > > config PTE_MARKER > - bool "Marker PTEs support" > - default y > + bool > > help > Allows to create marker PTEs for file-backed memory. > > config PTE_MARKER_UFFD_WP > - bool "Marker PTEs support for userfaultfd write protection" > + bool "Userfaultfd write protection support for shmem/hugetlbfs" > default y > - depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP > + depends on HAVE_ARCH_USERFAULTFD_WP > + select PTE_MARKER This is much easier to understand, thanks! Btw, this doesn't do much without userfaultfd being enabled in general, right? Would it make sense to have it next to 'config USERFAULTFD' as a sub-option?
On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote: > Hi Peter, > > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote: > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME > > difference in their name. > > > > config PTE_MARKER > > - bool "Marker PTEs support" > > - default y > > + bool > > > > help > > Allows to create marker PTEs for file-backed memory. > > > > config PTE_MARKER_UFFD_WP > > - bool "Marker PTEs support for userfaultfd write protection" > > + bool "Userfaultfd write protection support for shmem/hugetlbfs" > > default y > > - depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP > > + depends on HAVE_ARCH_USERFAULTFD_WP > > + select PTE_MARKER > > This is much easier to understand, thanks! Cool! Sent as a formal patch just now. > > Btw, this doesn't do much without userfaultfd being enabled in > general, right? So far yes, but I'm thinking there can be potential other users of PTE_MARKERS from mm world. The most close discussion is on the swap read failures and this patch proposed by Miaohe: https://lore.kernel.org/lkml/20220416030549.60559-1-linmiaohe@huawei.com/ So I hope we can still keep them around here under mm/ if possible, and from the gut feeling it really should.. > Would it make sense to have it next to 'config USERFAULTFD' as a > sub-option? Yes another good question. :) IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new syscall. Same to the rest of the bits for uffd since then, namely: - USERFAULTFD_WP - USERFAULTFD_MINOR What I am thinking now is the other way round of your suggestion: whether we should move most of them out, at least the _WP and _MINOR configs into mm/? Because IMHO they are really pure mm ideas and they're irrelevant to syscalls and init. Any thoughts?
On Tue, Apr 19, 2022 at 04:28:16PM -0400, Peter Xu wrote: > On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote: > > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote: > > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME > > Btw, this doesn't do much without userfaultfd being enabled in > > general, right? > > So far yes, but I'm thinking there can be potential other users of > PTE_MARKERS from mm world. The most close discussion is on the swap read > failures and this patch proposed by Miaohe: > > https://lore.kernel.org/lkml/20220416030549.60559-1-linmiaohe@huawei.com/ > > So I hope we can still keep them around here under mm/ if possible, and > from the gut feeling it really should.. Agreed, mm/ seems a good fit for PTE_MARKER. If it's invisible and gets selected as needed, it's less of a concern, IMO. I'm somewhat worried about when and how the user-visible options show up right now, though... > > Would it make sense to have it next to 'config USERFAULTFD' as a > > sub-option? > > Yes another good question. :) > > IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new > syscall. Same to the rest of the bits for uffd since then, namely: > > - USERFAULTFD_WP > - USERFAULTFD_MINOR > > What I am thinking now is the other way round of your suggestion: whether > we should move most of them out, at least the _WP and _MINOR configs into > mm/? Because IMHO they are really pure mm ideas and they're irrelevant to > syscalls and init. I'm thinking the MM submenu would probably be a better fit for all user-visible userfaultfd options, including the syscall. Like you say, it's an MM concept. But if moving the syscall knob out from init isn't popular, IMO it would be better to add the new WP option to init as well. This ensures that when somebody selects userfaultfd, they also see the relevant suboptions and don't have to chase them down across multiple submenus. Conversely, they should also have the necessary depend clauses so that suboptions aren't visible without the main feature. E.g. it asked me for userfaultd options even though I have CONFIG_USERFAULTFD=n. What do you think?
On Tue, Apr 19, 2022 at 05:24:28PM -0400, Johannes Weiner wrote: > On Tue, Apr 19, 2022 at 04:28:16PM -0400, Peter Xu wrote: > > On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote: > > > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote: > > > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME > > > Btw, this doesn't do much without userfaultfd being enabled in > > > general, right? > > > > So far yes, but I'm thinking there can be potential other users of > > PTE_MARKERS from mm world. The most close discussion is on the swap read > > failures and this patch proposed by Miaohe: > > > > https://lore.kernel.org/lkml/20220416030549.60559-1-linmiaohe@huawei.com/ > > > > So I hope we can still keep them around here under mm/ if possible, and > > from the gut feeling it really should.. > > Agreed, mm/ seems a good fit for PTE_MARKER. > > If it's invisible and gets selected as needed, it's less of a concern, > IMO. I'm somewhat worried about when and how the user-visible options > show up right now, though... > > > > Would it make sense to have it next to 'config USERFAULTFD' as a > > > sub-option? > > > > Yes another good question. :) > > > > IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new > > syscall. Same to the rest of the bits for uffd since then, namely: > > > > - USERFAULTFD_WP > > - USERFAULTFD_MINOR > > > > What I am thinking now is the other way round of your suggestion: whether > > we should move most of them out, at least the _WP and _MINOR configs into > > mm/? Because IMHO they are really pure mm ideas and they're irrelevant to > > syscalls and init. > > I'm thinking the MM submenu would probably be a better fit for all > user-visible userfaultfd options, including the syscall. Like you say, > it's an MM concept. > > But if moving the syscall knob out from init isn't popular, IMO it > would be better to add the new WP option to init as well. This ensures > that when somebody selects userfaultfd, they also see the relevant > suboptions and don't have to chase them down across multiple submenus. > > Conversely, they should also have the necessary depend clauses so that > suboptions aren't visible without the main feature. E.g. it asked me > for userfaultd options even though I have CONFIG_USERFAULTFD=n. Hmm, this is a bit weird... since we do have that dependency chain for PTE_MARKER_UFFD_WP -> HAVE_ARCH_USERFAULTFD_WP -> USERFAULTFD: in arch/x86/Kconfig: config X86 ... select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD in mm/Kconfig (with/without the "mm/uffd: Hide PTE_MARKER" patch applied): config PTE_MARKER_UFFD_WP ... depends on HAVE_ARCH_USERFAULTFD_WP So logically if !USERFAULTFD we shouldn't see PTE_MARKER_UFFD_WP at all? That's also what I got when I tried it out for either !USERFAULTFD on x86, or any non-x86 platforms (because there we have !HAVE_ARCH_USERFAULTFD_WP constantly irrelevant of USERFAULTFD). Though I could have missed something.. > > What do you think? I don't have a strong preference here, I think it's okay if it's preferred that we only put user-visible configs into mm/Kconfig. It's just that I see we have tons of user-invisible configs already in mm/Kconfig, to list some: config ARCH_HAS_HUGEPD config MAPPING_DIRTY_HELPERS config KMAP_LOCAL config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY But I'm not sure whether it's a rule of thumb somewhere else. At the meantime, I also looked at whether syscall configs are always and only be put under init/, and funnily I got: $ find . -name Kconfig | xargs grep --color -E "\".*syscall.*\"" ./init/Kconfig: bool "Enable process_vm_readv/writev syscalls" ./init/Kconfig: bool "uselib syscall" ./init/Kconfig: bool "sgetmask/ssetmask syscalls support" if EXPERT ./init/Kconfig: bool "Sysfs syscall support" if EXPERT ./init/Kconfig: bool "open by fhandle syscalls" if EXPERT ./init/Kconfig: bool "Enable madvise/fadvise syscalls" if EXPERT ./arch/xtensa/Kconfig: bool "Enable fast atomic syscalls" ./arch/xtensa/Kconfig: bool "Enable spill registers syscall" ./arch/powerpc/Kconfig: bool "Support setting protections for 4k subpages (subpage_prot syscall)" ./arch/powerpc/Kconfig: bool "Enable filtering of RTAS syscalls" ./arch/Kconfig: bool "Support for randomizing kernel stack offset on syscall entry" if EXPERT ./arch/s390/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall" ./arch/sh/mm/Kconfig: bool "Support vsyscall page" ./arch/x86/Kconfig: bool "Enable vsyscall emulation" if EXPERT ./arch/x86/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall" ./arch/x86/Kconfig: bool "Require a valid signature in kexec_file_load() syscall" ./arch/x86/Kconfig: prompt "vsyscall table for legacy applications" ./arch/arm64/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall" ./arch/arm64/Kconfig: bool "Enable the tagged user addresses syscall ABI" ./kernel/trace/Kconfig: bool "Trace syscalls" ./kernel/trace/Kconfig: bool "Run selftest on syscall events" So let's put aside arch specific lines, ftrace does have FTRACE_SYSCALLS that lies in the kernel/trace/ dir.. not sure whether we could move USERFAULTFD and all the rest into mm/ as well? Or perhaps that's just a bad example? :) Thanks,
On Tue, Apr 19, 2022 at 06:01:14PM -0400, Peter Xu wrote: > On Tue, Apr 19, 2022 at 05:24:28PM -0400, Johannes Weiner wrote: > > On Tue, Apr 19, 2022 at 04:28:16PM -0400, Peter Xu wrote: > > > On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote: > > > > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote: > > > > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME > > > > Btw, this doesn't do much without userfaultfd being enabled in > > > > general, right? > > > > > > So far yes, but I'm thinking there can be potential other users of > > > PTE_MARKERS from mm world. The most close discussion is on the swap read > > > failures and this patch proposed by Miaohe: > > > > > > https://lore.kernel.org/lkml/20220416030549.60559-1-linmiaohe@huawei.com/ > > > > > > So I hope we can still keep them around here under mm/ if possible, and > > > from the gut feeling it really should.. > > > > Agreed, mm/ seems a good fit for PTE_MARKER. > > > > If it's invisible and gets selected as needed, it's less of a concern, > > IMO. I'm somewhat worried about when and how the user-visible options > > show up right now, though... > > > > > > Would it make sense to have it next to 'config USERFAULTFD' as a > > > > sub-option? > > > > > > Yes another good question. :) > > > > > > IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new > > > syscall. Same to the rest of the bits for uffd since then, namely: > > > > > > - USERFAULTFD_WP > > > - USERFAULTFD_MINOR > > > > > > What I am thinking now is the other way round of your suggestion: whether > > > we should move most of them out, at least the _WP and _MINOR configs into > > > mm/? Because IMHO they are really pure mm ideas and they're irrelevant to > > > syscalls and init. > > > > I'm thinking the MM submenu would probably be a better fit for all > > user-visible userfaultfd options, including the syscall. Like you say, > > it's an MM concept. > > > > But if moving the syscall knob out from init isn't popular, IMO it > > would be better to add the new WP option to init as well. This ensures > > that when somebody selects userfaultfd, they also see the relevant > > suboptions and don't have to chase them down across multiple submenus. > > > > Conversely, they should also have the necessary depend clauses so that > > suboptions aren't visible without the main feature. E.g. it asked me > > for userfaultd options even though I have CONFIG_USERFAULTFD=n. > > Hmm, this is a bit weird... since we do have that dependency chain for > PTE_MARKER_UFFD_WP -> HAVE_ARCH_USERFAULTFD_WP -> USERFAULTFD: > > in arch/x86/Kconfig: > config X86 > ... > select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD > > in mm/Kconfig (with/without the "mm/uffd: Hide PTE_MARKER" patch applied): > config PTE_MARKER_UFFD_WP > ... > depends on HAVE_ARCH_USERFAULTFD_WP > > So logically if !USERFAULTFD we shouldn't see PTE_MARKER_UFFD_WP at all? > > That's also what I got when I tried it out for either !USERFAULTFD on x86, > or any non-x86 platforms (because there we have !HAVE_ARCH_USERFAULTFD_WP > constantly irrelevant of USERFAULTFD). Though I could have missed > something.. Sorry, it asked me about PTE_MARKERS, and that conclusion got stuck in my head. Indeed, once that symbol is invisible we should be good. > > What do you think? > > I don't have a strong preference here, I think it's okay if it's preferred > that we only put user-visible configs into mm/Kconfig. It's just that I > see we have tons of user-invisible configs already in mm/Kconfig, to list > some: > > config ARCH_HAS_HUGEPD > config MAPPING_DIRTY_HELPERS > config KMAP_LOCAL > config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY > > But I'm not sure whether it's a rule of thumb somewhere else. I wasn't objecting to invisible symbols in mm/. My point was simply that for the user it might be easiest and most intuitive if userfaultfd and its related suboptions are 1) grouped together and 2) in the MM submenu. > At the meantime, I also looked at whether syscall configs are always and > only be put under init/, and funnily I got: > > $ find . -name Kconfig | xargs grep --color -E "\".*syscall.*\"" > ./init/Kconfig: bool "Enable process_vm_readv/writev syscalls" > ./init/Kconfig: bool "uselib syscall" > ./init/Kconfig: bool "sgetmask/ssetmask syscalls support" if EXPERT > ./init/Kconfig: bool "Sysfs syscall support" if EXPERT > ./init/Kconfig: bool "open by fhandle syscalls" if EXPERT > ./init/Kconfig: bool "Enable madvise/fadvise syscalls" if EXPERT > ./arch/xtensa/Kconfig: bool "Enable fast atomic syscalls" > ./arch/xtensa/Kconfig: bool "Enable spill registers syscall" > ./arch/powerpc/Kconfig: bool "Support setting protections for 4k subpages (subpage_prot syscall)" > ./arch/powerpc/Kconfig: bool "Enable filtering of RTAS syscalls" > ./arch/Kconfig: bool "Support for randomizing kernel stack offset on syscall entry" if EXPERT > ./arch/s390/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall" > ./arch/sh/mm/Kconfig: bool "Support vsyscall page" > ./arch/x86/Kconfig: bool "Enable vsyscall emulation" if EXPERT > ./arch/x86/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall" > ./arch/x86/Kconfig: bool "Require a valid signature in kexec_file_load() syscall" > ./arch/x86/Kconfig: prompt "vsyscall table for legacy applications" > ./arch/arm64/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall" > ./arch/arm64/Kconfig: bool "Enable the tagged user addresses syscall ABI" > ./kernel/trace/Kconfig: bool "Trace syscalls" > ./kernel/trace/Kconfig: bool "Run selftest on syscall events" > > So let's put aside arch specific lines, ftrace does have FTRACE_SYSCALLS > that lies in the kernel/trace/ dir.. not sure whether we could move > USERFAULTFD and all the rest into mm/ as well? Or perhaps that's just a > bad example? :) Yeah it looks like there is a healthy mix ;) To add to the list: - mm/Kconfig has CONFIG_SWAP for the swapon/swapoff syscalls. - fs/Kconfig has CONFIG_FILE_LOCKING, which adds the flock() syscall. - Interestingly, fs/Kconfig has CONFIG_MEMFD_CREATE for memfd_create() which is implemented in mm/memfd.c. It seems reasonable to me to move the userfaultfd stuff to mm as well, especially when it's becoming more than just a single binary question on whether you want a syscall or not, and has MM-specific suboptions.
On Wed, Apr 20, 2022 at 09:46:07AM -0400, Johannes Weiner wrote: > My point was simply that for the user it might be easiest and most > intuitive if userfaultfd and its related suboptions are 1) grouped > together and 2) in the MM submenu. Very reasonable request. > Yeah it looks like there is a healthy mix ;) To add to the list: > > - mm/Kconfig has CONFIG_SWAP for the swapon/swapoff syscalls. > - fs/Kconfig has CONFIG_FILE_LOCKING, which adds the flock() syscall. > - Interestingly, fs/Kconfig has CONFIG_MEMFD_CREATE for memfd_create() > which is implemented in mm/memfd.c. > > It seems reasonable to me to move the userfaultfd stuff to mm as well, > especially when it's becoming more than just a single binary question > on whether you want a syscall or not, and has MM-specific suboptions. Thanks for the extra information! Obviously as you said it's growing a little bit. I'll give it a shot later today.
diff --git a/mm/Kconfig b/mm/Kconfig index 6e7c2d59fa96..3eca34c864c5 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -911,12 +911,14 @@ config ANON_VMA_NAME config PTE_MARKER bool "Marker PTEs support" + default y help Allows to create marker PTEs for file-backed memory. config PTE_MARKER_UFFD_WP bool "Marker PTEs support for userfaultfd write protection" + default y depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP help
Enable PTE markers by default. On x86_64 it means it'll auto-enable PTE_MARKER_UFFD_WP as well. Signed-off-by: Peter Xu <peterx@redhat.com> --- mm/Kconfig | 2 ++ 1 file changed, 2 insertions(+)