Message ID | 20210527201927.29586-5-peterx@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | userfaultfd-wp: Support shmem and hugetlbfs | expand |
On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: > This patch introduces a very special swap-like pte for file-backed memories. > > Currently it's only defined for x86_64 only, but as long as any arch that > can properly define the UFFD_WP_SWP_PTE_SPECIAL value as requested, it > should conceptually work too. > > We will use this special pte to arm the ptes that got either unmapped or > swapped out for a file-backed region that was previously wr-protected. This > special pte could trigger a page fault just like swap entries, and as long > as the page fault will satisfy pte_none()==false && pte_present()==false. > > Then we can revive the special pte into a normal pte backed by the page > cache. > > This idea is greatly inspired by Hugh and Andrea in the discussion, which is > referenced in the links below. > > The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 as > the special pte. The current solution (as pointed out by Andrea) is > slightly preferred in that we don't even need swp_entry_t knowledge at all > in trapping these accesses. Meanwhile, we also reuse _PAGE_SWP_UFFD_WP > from the anonymous swp entries. So to confirm my understanding the reason you use this special swap pte instead of a new swp_type is that you only need the fault and have no extra information that needs storing in the pte? Personally I think it might be better to define a new swp_type for this rather than introducing a new arch-specific concept. swp_type entries are portable so wouldn't need extra arch-specific bits defined. And as I understand things not all architectures (eg. ARM) have spare bits in their swap entry encoding anyway so would have to reserve a bit specifically for this which would be less efficient than using a swp_type. Anyway it seems I missed the initial discussion so don't have a strong opinion here, mainly just wanted to check my understanding of what's required and how these special entries work. > This patch only introduces the special pte and its operators. It's not yet > applied to have any functional difference. > > Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/ > Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/ > Suggested-by: Andrea Arcangeli <aarcange@redhat.com> > Suggested-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > arch/x86/include/asm/pgtable.h | 28 ++++++++++++++++++++++++++++ > include/asm-generic/pgtable_uffd.h | 3 +++ > include/linux/userfaultfd_k.h | 21 +++++++++++++++++++++ > 3 files changed, 52 insertions(+) > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index b1099f2d9800..9781ba2da049 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -1329,6 +1329,34 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t > pmd) #endif > > #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP > + > +/* > + * This is a very special swap-like pte that marks this pte as > "wr-protected" + * by userfaultfd-wp. It should only exist for file-backed > memory where the + * page (previously got wr-protected) has been unmapped > or swapped out. + * > + * For anonymous memories, the userfaultfd-wp _PAGE_SWP_UFFD_WP bit is kept > + * along with a real swp entry instead. > + * > + * Let's make some rules for this special pte: > + * > + * (1) pte_none()==false, so that it'll not trigger a missing page fault. > + * > + * (2) pte_present()==false, so that it's recognized as swap (is_swap_pte). > + * > + * (3) pte_swp_uffd_wp()==true, so it can be tested just like a swap pte > that + * contains a valid swap entry, so that we can check a swap pte > always + * using "is_swap_pte() && pte_swp_uffd_wp()" without caring > about whether + * there's one swap entry inside of the pte. > + * > + * (4) It should not be a valid swap pte anywhere, so that when we see this > pte + * we know it does not contain a swap entry. > + * > + * For x86, the simplest special pte which satisfies all of above should be > the + * pte with only _PAGE_SWP_UFFD_WP bit set (where > swp_type==swp_offset==0). + */ > +#define UFFD_WP_SWP_PTE_SPECIAL __pte(_PAGE_SWP_UFFD_WP) > + > static inline pte_t pte_swp_mkuffd_wp(pte_t pte) > { > return pte_set_flags(pte, _PAGE_SWP_UFFD_WP); > diff --git a/include/asm-generic/pgtable_uffd.h > b/include/asm-generic/pgtable_uffd.h index 828966d4c281..95e9811ce9d1 > 100644 > --- a/include/asm-generic/pgtable_uffd.h > +++ b/include/asm-generic/pgtable_uffd.h > @@ -2,6 +2,9 @@ > #define _ASM_GENERIC_PGTABLE_UFFD_H > > #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP > + > +#define UFFD_WP_SWP_PTE_SPECIAL __pte(0) > + > static __always_inline int pte_uffd_wp(pte_t pte) > { > return 0; > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index 331d2ccf0bcc..93f932b53a71 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -145,6 +145,17 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct > *vma, extern void userfaultfd_unmap_complete(struct mm_struct *mm, > struct list_head *uf); > > +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) > +{ > + WARN_ON_ONCE(vma_is_anonymous(vma)); > + return UFFD_WP_SWP_PTE_SPECIAL; > +} > + > +static inline bool pte_swp_uffd_wp_special(pte_t pte) > +{ > + return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL); > +} > + > #else /* CONFIG_USERFAULTFD */ > > /* mm helpers */ > @@ -234,6 +245,16 @@ static inline void userfaultfd_unmap_complete(struct > mm_struct *mm, { > } > > +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) > +{ > + return __pte(0); > +} > + > +static inline bool pte_swp_uffd_wp_special(pte_t pte) > +{ > + return false; > +} > + > #endif /* CONFIG_USERFAULTFD */ > > #endif /* _LINUX_USERFAULTFD_K_H */
On Fri, May 28, 2021 at 06:32:52PM +1000, Alistair Popple wrote: > On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: > > This patch introduces a very special swap-like pte for file-backed memories. > > > > Currently it's only defined for x86_64 only, but as long as any arch that > > can properly define the UFFD_WP_SWP_PTE_SPECIAL value as requested, it > > should conceptually work too. > > > > We will use this special pte to arm the ptes that got either unmapped or > > swapped out for a file-backed region that was previously wr-protected. This > > special pte could trigger a page fault just like swap entries, and as long > > as the page fault will satisfy pte_none()==false && pte_present()==false. > > > > Then we can revive the special pte into a normal pte backed by the page > > cache. > > > > This idea is greatly inspired by Hugh and Andrea in the discussion, which is > > referenced in the links below. > > > > The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 as > > the special pte. The current solution (as pointed out by Andrea) is > > slightly preferred in that we don't even need swp_entry_t knowledge at all > > in trapping these accesses. Meanwhile, we also reuse _PAGE_SWP_UFFD_WP > > from the anonymous swp entries. > > So to confirm my understanding the reason you use this special swap pte > instead of a new swp_type is that you only need the fault and have no extra > information that needs storing in the pte? Yes. > > Personally I think it might be better to define a new swp_type for this rather > than introducing a new arch-specific concept. The concept should not be arch-specific, it's the pte that's arch-specific. > swp_type entries are portable so wouldn't need extra arch-specific bits > defined. And as I understand things not all architectures (eg. ARM) have > spare bits in their swap entry encoding anyway so would have to reserve a bit > specifically for this which would be less efficient than using a swp_type. It looks a trade-off to me: I think it's fine to use swap type in my series, as you said it's portable, but it will also waste the swap address space for the arch when the arch enables it. The format of the special pte to trigger the fault in this series should be only a small portion of the code change. The main logic should still be the same - we just replace this pte with that one. IMHO it also means the format can be changed in the future, it's just that I don't know whether it's wise to take over a new swap type from start. > > Anyway it seems I missed the initial discussion so don't have a strong opinion > here, mainly just wanted to check my understanding of what's required and how > these special entries work. Thanks for mentioning this and join the discussion. I don't know ARM enough so good to know we may have issue on finding the bits. Actually before finding this bit for file-backed uffd-wp specifically, we need to firstly find a bit in the normal pte for ARM too anyways (see _PAGE_UFFD_WP). If there's no strong reason to switch to a new swap type, I'd tend to leave all these to the future when we enable them on ARM.
On Friday, 28 May 2021 10:56:02 PM AEST Peter Xu wrote: > On Fri, May 28, 2021 at 06:32:52PM +1000, Alistair Popple wrote: > > On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: > > > This patch introduces a very special swap-like pte for file-backed > > > memories. > > > > > > Currently it's only defined for x86_64 only, but as long as any arch > > > that > > > can properly define the UFFD_WP_SWP_PTE_SPECIAL value as requested, it > > > should conceptually work too. > > > > > > We will use this special pte to arm the ptes that got either unmapped or > > > swapped out for a file-backed region that was previously wr-protected. > > > This special pte could trigger a page fault just like swap entries, and > > > as long as the page fault will satisfy pte_none()==false && > > > pte_present()==false. > > > > > > Then we can revive the special pte into a normal pte backed by the page > > > cache. > > > > > > This idea is greatly inspired by Hugh and Andrea in the discussion, > > > which is referenced in the links below. > > > > > > The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 > > > as > > > the special pte. The current solution (as pointed out by Andrea) is > > > slightly preferred in that we don't even need swp_entry_t knowledge at > > > all > > > in trapping these accesses. Meanwhile, we also reuse _PAGE_SWP_UFFD_WP > > > from the anonymous swp entries. > > > > So to confirm my understanding the reason you use this special swap pte > > instead of a new swp_type is that you only need the fault and have no > > extra > > information that needs storing in the pte? > > Yes. > > > Personally I think it might be better to define a new swp_type for this > > rather than introducing a new arch-specific concept. > > The concept should not be arch-specific, it's the pte that's arch-specific. Right, agree this is a minor detail. > > swp_type entries are portable so wouldn't need extra arch-specific bits > > defined. And as I understand things not all architectures (eg. ARM) have > > spare bits in their swap entry encoding anyway so would have to reserve a > > bit specifically for this which would be less efficient than using a > > swp_type. > It looks a trade-off to me: I think it's fine to use swap type in my series, > as you said it's portable, but it will also waste the swap address space > for the arch when the arch enables it. > > The format of the special pte to trigger the fault in this series should be > only a small portion of the code change. The main logic should still be the > same - we just replace this pte with that one. IMHO it also means the > format can be changed in the future, it's just that I don't know whether > it's wise to take over a new swap type from start. > > > Anyway it seems I missed the initial discussion so don't have a strong > > opinion here, mainly just wanted to check my understanding of what's > > required and how these special entries work. > > Thanks for mentioning this and join the discussion. I don't know ARM enough > so good to know we may have issue on finding the bits. Actually before > finding this bit for file-backed uffd-wp specifically, we need to firstly > find a bit in the normal pte for ARM too anyways (see _PAGE_UFFD_WP). If > there's no strong reason to switch to a new swap type, I'd tend to leave > all these to the future when we enable them on ARM. Yeah, makes sense to me. As you say it should be easy to change and other architectures need to find another bit anyway. Not sure how useful it will be but I'll try and take a look over the rest of the series as well. > -- > Peter Xu
On Thu, Jun 03, 2021 at 09:53:45PM +1000, Alistair Popple wrote: > On Friday, 28 May 2021 10:56:02 PM AEST Peter Xu wrote: > > On Fri, May 28, 2021 at 06:32:52PM +1000, Alistair Popple wrote: > > > On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: > > > > This patch introduces a very special swap-like pte for file-backed > > > > memories. > > > > > > > > Currently it's only defined for x86_64 only, but as long as any arch > > > > that > > > > can properly define the UFFD_WP_SWP_PTE_SPECIAL value as requested, it > > > > should conceptually work too. > > > > > > > > We will use this special pte to arm the ptes that got either unmapped or > > > > swapped out for a file-backed region that was previously wr-protected. > > > > This special pte could trigger a page fault just like swap entries, and > > > > as long as the page fault will satisfy pte_none()==false && > > > > pte_present()==false. > > > > > > > > Then we can revive the special pte into a normal pte backed by the page > > > > cache. > > > > > > > > This idea is greatly inspired by Hugh and Andrea in the discussion, > > > > which is referenced in the links below. > > > > > > > > The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 > > > > as > > > > the special pte. The current solution (as pointed out by Andrea) is > > > > slightly preferred in that we don't even need swp_entry_t knowledge at > > > > all > > > > in trapping these accesses. Meanwhile, we also reuse _PAGE_SWP_UFFD_WP > > > > from the anonymous swp entries. > > > > > > So to confirm my understanding the reason you use this special swap pte > > > instead of a new swp_type is that you only need the fault and have no > > > extra > > > information that needs storing in the pte? > > > > Yes. > > > > > Personally I think it might be better to define a new swp_type for this > > > rather than introducing a new arch-specific concept. > > > > The concept should not be arch-specific, it's the pte that's arch-specific. > > Right, agree this is a minor detail. I can't say it's a minor detail, as that's still indeed one of the major ideas that I'd like to get comment for within the whole series. It's currently an outcome from previous discussion with Andrea and Hugh, but of course if there's better idea with reasoning I can always consider to rework the series. > > > > swp_type entries are portable so wouldn't need extra arch-specific bits > > > defined. And as I understand things not all architectures (eg. ARM) have > > > spare bits in their swap entry encoding anyway so would have to reserve a > > > bit specifically for this which would be less efficient than using a > > > swp_type. > > It looks a trade-off to me: I think it's fine to use swap type in my series, > > as you said it's portable, but it will also waste the swap address space > > for the arch when the arch enables it. > > > > The format of the special pte to trigger the fault in this series should be > > only a small portion of the code change. The main logic should still be the > > same - we just replace this pte with that one. IMHO it also means the > > format can be changed in the future, it's just that I don't know whether > > it's wise to take over a new swap type from start. > > > > > Anyway it seems I missed the initial discussion so don't have a strong > > > opinion here, mainly just wanted to check my understanding of what's > > > required and how these special entries work. > > > > Thanks for mentioning this and join the discussion. I don't know ARM enough > > so good to know we may have issue on finding the bits. Actually before > > finding this bit for file-backed uffd-wp specifically, we need to firstly > > find a bit in the normal pte for ARM too anyways (see _PAGE_UFFD_WP). If > > there's no strong reason to switch to a new swap type, I'd tend to leave > > all these to the future when we enable them on ARM. > > Yeah, makes sense to me. As you say it should be easy to change and other > architectures need to find another bit anyway. Not sure how useful it will be > but I'll try and take a look over the rest of the series as well. I'll highly appreciate that. Thanks Alistair!
On Friday, 4 June 2021 12:51:19 AM AEST Peter Xu wrote: > External email: Use caution opening links or attachments > > On Thu, Jun 03, 2021 at 09:53:45PM +1000, Alistair Popple wrote: > > On Friday, 28 May 2021 10:56:02 PM AEST Peter Xu wrote: > > > On Fri, May 28, 2021 at 06:32:52PM +1000, Alistair Popple wrote: > > > > On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: > > > > > This patch introduces a very special swap-like pte for file-backed > > > > > memories. > > > > > > > > > > Currently it's only defined for x86_64 only, but as long as any arch > > > > > that > > > > > can properly define the UFFD_WP_SWP_PTE_SPECIAL value as requested, > > > > > it > > > > > should conceptually work too. > > > > > > > > > > We will use this special pte to arm the ptes that got either > > > > > unmapped or > > > > > swapped out for a file-backed region that was previously > > > > > wr-protected. > > > > > This special pte could trigger a page fault just like swap entries, > > > > > and > > > > > as long as the page fault will satisfy pte_none()==false && > > > > > pte_present()==false. > > > > > > > > > > Then we can revive the special pte into a normal pte backed by the > > > > > page > > > > > cache. > > > > > > > > > > This idea is greatly inspired by Hugh and Andrea in the discussion, > > > > > which is referenced in the links below. > > > > > > > > > > The other idea (from Hugh) is that we use swp_type==1 and > > > > > swp_offset=0 > > > > > as > > > > > the special pte. The current solution (as pointed out by Andrea) is > > > > > slightly preferred in that we don't even need swp_entry_t knowledge > > > > > at > > > > > all > > > > > in trapping these accesses. Meanwhile, we also reuse > > > > > _PAGE_SWP_UFFD_WP > > > > > from the anonymous swp entries. > > > > > > > > So to confirm my understanding the reason you use this special swap > > > > pte > > > > instead of a new swp_type is that you only need the fault and have no > > > > extra > > > > information that needs storing in the pte? > > > > > > Yes. > > > > > > > Personally I think it might be better to define a new swp_type for > > > > this > > > > rather than introducing a new arch-specific concept. > > > > > > The concept should not be arch-specific, it's the pte that's > > > arch-specific. > > > > Right, agree this is a minor detail. > > I can't say it's a minor detail, as that's still indeed one of the major > ideas that I'd like to get comment for within the whole series. It's > currently an outcome from previous discussion with Andrea and Hugh, but of > course if there's better idea with reasoning I can always consider to > rework the series. Sorry, I wasn't very clear there. What I meant is the high level arch- independent concept of using a special swap pte for this is the most important aspect of the design and looks good to me. The detail which is perhaps less important is whether to implement this using a new swap entry type or arch-specific swap bit. The argument for using a swap type is it will work across architectures due to the use of pte_to_swp_entry() and swp_entry_to_pte() to convert to and from the arch-dependent and independent representations. The argument against seems to have been that it is wasting a swap type. However if I'm understanding correctly that's not true for all architectures, and needing to reserve a bit is more wasteful than using a swap type. For example ARM encodes swap entries like so: * Encode and decode a swap entry. Swap entries are stored in the Linux * page tables as follows: * * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 * <--------------- offset ------------------------> < type -> 0 0 So the only way to get a spare bit is to reduce the width of type (or offset) which would halve the number of swap types. And if I understand correctly the same argument might apply to x86 - the spare bit being used here could instead be used to expand the width of type if a lack of available swap types is a concern. > > > > swp_type entries are portable so wouldn't need extra arch-specific > > > > bits > > > > defined. And as I understand things not all architectures (eg. ARM) > > > > have > > > > spare bits in their swap entry encoding anyway so would have to > > > > reserve a > > > > bit specifically for this which would be less efficient than using a > > > > swp_type. > > > > > > It looks a trade-off to me: I think it's fine to use swap type in my > > > series, as you said it's portable, but it will also waste the swap > > > address space for the arch when the arch enables it. > > > > > > The format of the special pte to trigger the fault in this series should > > > be > > > only a small portion of the code change. The main logic should still be > > > the same - we just replace this pte with that one. IMHO it also means > > > the format can be changed in the future, it's just that I don't know > > > whether it's wise to take over a new swap type from start. > > > > > > > Anyway it seems I missed the initial discussion so don't have a strong > > > > opinion here, mainly just wanted to check my understanding of what's > > > > required and how these special entries work. > > > > > > Thanks for mentioning this and join the discussion. I don't know ARM > > > enough > > > so good to know we may have issue on finding the bits. Actually before > > > finding this bit for file-backed uffd-wp specifically, we need to > > > firstly > > > find a bit in the normal pte for ARM too anyways (see _PAGE_UFFD_WP). > > > If > > > there's no strong reason to switch to a new swap type, I'd tend to leave > > > all these to the future when we enable them on ARM. > > > > Yeah, makes sense to me. As you say it should be easy to change and other > > architectures need to find another bit anyway. Not sure how useful it will > > be but I'll try and take a look over the rest of the series as well. > > I'll highly appreciate that. Thanks Alistair! > > -- > Peter Xu
On Fri, 4 Jun 2021, Alistair Popple wrote: > > The detail which is perhaps less important is whether to implement this using > a new swap entry type or arch-specific swap bit. The argument for using a swap > type is it will work across architectures due to the use of pte_to_swp_entry() > and swp_entry_to_pte() to convert to and from the arch-dependent and > independent representations. > > The argument against seems to have been that it is wasting a swap type. > However if I'm understanding correctly that's not true for all architectures, > and needing to reserve a bit is more wasteful than using a swap type. I'm on the outside, not paying much attention here, but thought Peter would have cleared this up already. My understanding is that it does *not* use an additional arch-dependent bit, but puts the _PAGE_UFFD_WP bit (already set aside by any architecture implementing UFFD WP) to an additional use. That's why I called this design (from Andrea) more elegant than mine (swap type business). If I've got that wrong, and yet another arch-dependent bit is needed, then I very much agree with you: finding arch-dependent pte bits is a much tougher job than another play with swap type. (And "more elegant" might not be "easier to understand": you decide.) Hugh
On Friday, 4 June 2021 1:14:31 PM AEST Hugh Dickins wrote: > On Fri, 4 Jun 2021, Alistair Popple wrote: > > > > The detail which is perhaps less important is whether to implement this using > > a new swap entry type or arch-specific swap bit. The argument for using a swap > > type is it will work across architectures due to the use of pte_to_swp_entry() > > and swp_entry_to_pte() to convert to and from the arch-dependent and > > independent representations. > > > > The argument against seems to have been that it is wasting a swap type. > > However if I'm understanding correctly that's not true for all architectures, > > and needing to reserve a bit is more wasteful than using a swap type. > > I'm on the outside, not paying much attention here, > but thought Peter would have cleared this up already. > > My understanding is that it does *not* use an additional arch-dependent > bit, but puts the _PAGE_UFFD_WP bit (already set aside by any architecture > implementing UFFD WP) to an additional use. That's why I called this > design (from Andrea) more elegant than mine (swap type business). Oh my bad, I had somehow missed this was reusing an *existing* arch-dependent swap bit (_PAGE_SWP_UFFD_WP, although the same argument could apply) even though it's in the commit message. Obviously I should have read that more carefully, apologies for the noise but thanks for the clarification. > If I've got that wrong, and yet another arch-dependent bit is needed, > then I very much agree with you: finding arch-dependent pte bits is a > much tougher job than another play with swap type. > > (And "more elegant" might not be "easier to understand": you decide.) Agree, that's a somewhat subjective debate. Conceptually I don't think this is particularly difficult to understand. It just adds another slightly different class of special swap pte's to know about. - Alistair > Hugh
On Fri, Jun 04, 2021 at 04:16:30PM +1000, Alistair Popple wrote: > > My understanding is that it does *not* use an additional arch-dependent > > bit, but puts the _PAGE_UFFD_WP bit (already set aside by any architecture > > implementing UFFD WP) to an additional use. That's why I called this > > design (from Andrea) more elegant than mine (swap type business). > > Oh my bad, I had somehow missed this was reusing an *existing* arch-dependent > swap bit (_PAGE_SWP_UFFD_WP, although the same argument could apply) even > though it's in the commit message. Obviously I should have read that more > carefully, apologies for the noise but thanks for the clarification. Right, as Hugh mentioned what this series wanted to use is one explicit pte that no one should ever be using, so ideally that should be the most saving way per address-space pov. Meanwhile I think that pte can actually be not related to _PAGE_UFFD_WP at all, as long as it's a specific pte value then it will service the same goal (even if to reuse a new swp type, I'll probably only use one pte for it and leave the rest for other use; but who knows who will start to use the rest!). I kept using it because that's suggested by Andrea (it actually has type==off==0 as Hugh suggested too - so it keeps a suggestion of both!) and it's a good idea to use it since (1) it's never used by anyone before, and (2) it is _somehow_ related to uffd-wp itself already by having that specific bit set in the special pte, while that's also the only bit set for the u64 field. It looks very nice too when debug, because when I dump the ptes it reads 0x4 on x86.. so the pte value is even easy to read as a number. :) However I can see that it is less easy to follow than the swap type solution. In all cases it's still something worth thinking about before using up the swap types - it's not so rich there, and we keep shrinking MAX_SWAPFILES.. so let's see whether uffd-wp could be the 1st one to open a new field for unused "invalid/swap pte" address space. Meanwhile, I did have a look at ARM on supporting uffd-wp in general, starting from anonymous pages. I doubt whether it can be done for old arms (uffd-wp not even supported on 32bit x86 after all), but for ARM64 I see it has: For normal ptes: /* * Level 3 descriptor (PTE). */ #define PTE_VALID (_AT(pteval_t, 1) << 0) #define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0) #define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0) #define PTE_TABLE_BIT (_AT(pteval_t, 1) << 1) #define PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */ #define PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */ #define PTE_SHARED (_AT(pteval_t, 3) << 8) /* SH[1:0], inner shareable */ #define PTE_AF (_AT(pteval_t, 1) << 10) /* Access Flag */ #define PTE_NG (_AT(pteval_t, 1) << 11) /* nG */ #define PTE_GP (_AT(pteval_t, 1) << 50) /* BTI guarded */ #define PTE_DBM (_AT(pteval_t, 1) << 51) /* Dirty Bit Management */ #define PTE_CONT (_AT(pteval_t, 1) << 52) /* Contiguous range */ #define PTE_PXN (_AT(pteval_t, 1) << 53) /* Privileged XN */ #define PTE_UXN (_AT(pteval_t, 1) << 54) /* User XN */ For swap ptes: /* * Encode and decode a swap entry: * bits 0-1: present (must be zero) * bits 2-7: swap type * bits 8-57: swap offset * bit 58: PTE_PROT_NONE (must be zero) */ So I feel like we still have chance there at least for 64bit ARM? As both normal/swap ptes have some bits free (bits 2-5,9 for normal ptes; bits 59-63 for swap ptes). But as I know little on ARM64, I hope I looked at the right things.. Thanks,
On Saturday, 5 June 2021 2:01:59 AM AEST Peter Xu wrote: > On Fri, Jun 04, 2021 at 04:16:30PM +1000, Alistair Popple wrote: > > > My understanding is that it does *not* use an additional arch-dependent > > > bit, but puts the _PAGE_UFFD_WP bit (already set aside by any architecture > > > implementing UFFD WP) to an additional use. That's why I called this > > > design (from Andrea) more elegant than mine (swap type business). > > > > Oh my bad, I had somehow missed this was reusing an *existing* arch-dependent > > swap bit (_PAGE_SWP_UFFD_WP, although the same argument could apply) even > > though it's in the commit message. Obviously I should have read that more > > carefully, apologies for the noise but thanks for the clarification. > > Right, as Hugh mentioned what this series wanted to use is one explicit pte > that no one should ever be using, so ideally that should be the most saving way > per address-space pov. > > Meanwhile I think that pte can actually be not related to _PAGE_UFFD_WP at all, > as long as it's a specific pte value then it will service the same goal (even > if to reuse a new swp type, I'll probably only use one pte for it and leave the > rest for other use; but who knows who will start to use the rest!). > > I kept using it because that's suggested by Andrea (it actually has > type==off==0 as Hugh suggested too - so it keeps a suggestion of both!) and > it's a good idea to use it since (1) it's never used by anyone before, and (2) > it is _somehow_ related to uffd-wp itself already by having that specific bit > set in the special pte, while that's also the only bit set for the u64 field. > > It looks very nice too when debug, because when I dump the ptes it reads 0x4 on > x86.. so the pte value is even easy to read as a number. :) > > However I can see that it is less easy to follow than the swap type solution. > In all cases it's still something worth thinking about before using up the swap > types - it's not so rich there, and we keep shrinking MAX_SWAPFILES.. so let's > see whether uffd-wp could be the 1st one to open a new field for unused > "invalid/swap pte" address space. Agreed, that matches with what I was thinking as well. If we do end up having more swap types such as this which don't need to store much information in the swap pte itself we could define a special swap type (eg. this bit) for that. > Meanwhile, I did have a look at ARM on supporting uffd-wp in general, starting > from anonymous pages. I doubt whether it can be done for old arms (uffd-wp not > even supported on 32bit x86 after all), but for ARM64 I see it has: > > For normal ptes: > > /* > * Level 3 descriptor (PTE). > */ > #define PTE_VALID (_AT(pteval_t, 1) << 0) > #define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0) > #define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0) > #define PTE_TABLE_BIT (_AT(pteval_t, 1) << 1) > #define PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */ > #define PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */ > #define PTE_SHARED (_AT(pteval_t, 3) << 8) /* SH[1:0], inner shareable */ > #define PTE_AF (_AT(pteval_t, 1) << 10) /* Access Flag */ > #define PTE_NG (_AT(pteval_t, 1) << 11) /* nG */ > #define PTE_GP (_AT(pteval_t, 1) << 50) /* BTI guarded */ > #define PTE_DBM (_AT(pteval_t, 1) << 51) /* Dirty Bit Management */ > #define PTE_CONT (_AT(pteval_t, 1) << 52) /* Contiguous range */ > #define PTE_PXN (_AT(pteval_t, 1) << 53) /* Privileged XN */ > #define PTE_UXN (_AT(pteval_t, 1) << 54) /* User XN */ > > For swap ptes: > > /* > * Encode and decode a swap entry: > * bits 0-1: present (must be zero) > * bits 2-7: swap type > * bits 8-57: swap offset > * bit 58: PTE_PROT_NONE (must be zero) > */ > > So I feel like we still have chance there at least for 64bit ARM? As both > normal/swap ptes have some bits free (bits 2-5,9 for normal ptes; bits 59-63 > for swap ptes). But as I know little on ARM64, I hope I looked at the right > things.. I don't claim to be an expert there either. Given there's already a bit defined for x86 anyway (which is what I missed) I now think the special swap idea is ok, although I still need to look at the rest of the series. > Thanks, > > -- > Peter Xu >
On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: [...] > diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h > index 828966d4c281..95e9811ce9d1 100644 > --- a/include/asm-generic/pgtable_uffd.h > +++ b/include/asm-generic/pgtable_uffd.h > @@ -2,6 +2,9 @@ > #define _ASM_GENERIC_PGTABLE_UFFD_H > > #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP > + > +#define UFFD_WP_SWP_PTE_SPECIAL __pte(0) > + > static __always_inline int pte_uffd_wp(pte_t pte) > { > return 0; > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index 331d2ccf0bcc..93f932b53a71 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -145,6 +145,17 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct *vma, > extern void userfaultfd_unmap_complete(struct mm_struct *mm, > struct list_head *uf); > > +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) > +{ > + WARN_ON_ONCE(vma_is_anonymous(vma)); > + return UFFD_WP_SWP_PTE_SPECIAL; > +} > + > +static inline bool pte_swp_uffd_wp_special(pte_t pte) > +{ > + return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL); > +} > + Sorry, only just noticed this but do we need to define a different version of this helper that returns false for CONFIG_HAVE_ARCH_USERFAULTFD_WP=n to avoid spurious matches with __pte(0) on architectures supporting userfaultfd but not userfaultfd-wp? > #else /* CONFIG_USERFAULTFD */ > > /* mm helpers */ > @@ -234,6 +245,16 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm, > { > } > > +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) > +{ > + return __pte(0); > +} > + > +static inline bool pte_swp_uffd_wp_special(pte_t pte) > +{ > + return false; > +} > + > #endif /* CONFIG_USERFAULTFD */ > > #endif /* _LINUX_USERFAULTFD_K_H */ >
On Wed, Jun 09, 2021 at 11:06:32PM +1000, Alistair Popple wrote: > On Friday, 28 May 2021 6:19:04 AM AEST Peter Xu wrote: > > [...] > > > diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h > > index 828966d4c281..95e9811ce9d1 100644 > > --- a/include/asm-generic/pgtable_uffd.h > > +++ b/include/asm-generic/pgtable_uffd.h > > @@ -2,6 +2,9 @@ > > #define _ASM_GENERIC_PGTABLE_UFFD_H > > > > #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP > > + > > +#define UFFD_WP_SWP_PTE_SPECIAL __pte(0) > > + > > static __always_inline int pte_uffd_wp(pte_t pte) > > { > > return 0; > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > index 331d2ccf0bcc..93f932b53a71 100644 > > --- a/include/linux/userfaultfd_k.h > > +++ b/include/linux/userfaultfd_k.h > > @@ -145,6 +145,17 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct *vma, > > extern void userfaultfd_unmap_complete(struct mm_struct *mm, > > struct list_head *uf); > > > > +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) > > +{ > > + WARN_ON_ONCE(vma_is_anonymous(vma)); > > + return UFFD_WP_SWP_PTE_SPECIAL; > > +} > > + > > +static inline bool pte_swp_uffd_wp_special(pte_t pte) > > +{ > > + return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL); > > +} > > + > > Sorry, only just noticed this but do we need to define a different version of > this helper that returns false for CONFIG_HAVE_ARCH_USERFAULTFD_WP=n to avoid > spurious matches with __pte(0) on architectures supporting userfaultfd but not > userfaultfd-wp? Good point.. Yes we definitely don't want the empty pte to be recognized as the special pte.. I'll squash below into the same patch: ----8<---- diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 489fb375e66c..23ca449240d1 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -177,7 +177,11 @@ static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) static inline bool pte_swp_uffd_wp_special(pte_t pte) { +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL); +#else + return false; +#fi } #else /* CONFIG_USERFAULTFD */ ----8<---- I'll see whether I can give some dry run without HAVE_ARCH_USERFAULTFD_WP but with USERFAULTFD. Thanks for spotting that!
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b1099f2d9800..9781ba2da049 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1329,6 +1329,34 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd) #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP + +/* + * This is a very special swap-like pte that marks this pte as "wr-protected" + * by userfaultfd-wp. It should only exist for file-backed memory where the + * page (previously got wr-protected) has been unmapped or swapped out. + * + * For anonymous memories, the userfaultfd-wp _PAGE_SWP_UFFD_WP bit is kept + * along with a real swp entry instead. + * + * Let's make some rules for this special pte: + * + * (1) pte_none()==false, so that it'll not trigger a missing page fault. + * + * (2) pte_present()==false, so that it's recognized as swap (is_swap_pte). + * + * (3) pte_swp_uffd_wp()==true, so it can be tested just like a swap pte that + * contains a valid swap entry, so that we can check a swap pte always + * using "is_swap_pte() && pte_swp_uffd_wp()" without caring about whether + * there's one swap entry inside of the pte. + * + * (4) It should not be a valid swap pte anywhere, so that when we see this pte + * we know it does not contain a swap entry. + * + * For x86, the simplest special pte which satisfies all of above should be the + * pte with only _PAGE_SWP_UFFD_WP bit set (where swp_type==swp_offset==0). + */ +#define UFFD_WP_SWP_PTE_SPECIAL __pte(_PAGE_SWP_UFFD_WP) + static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { return pte_set_flags(pte, _PAGE_SWP_UFFD_WP); diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h index 828966d4c281..95e9811ce9d1 100644 --- a/include/asm-generic/pgtable_uffd.h +++ b/include/asm-generic/pgtable_uffd.h @@ -2,6 +2,9 @@ #define _ASM_GENERIC_PGTABLE_UFFD_H #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP + +#define UFFD_WP_SWP_PTE_SPECIAL __pte(0) + static __always_inline int pte_uffd_wp(pte_t pte) { return 0; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 331d2ccf0bcc..93f932b53a71 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -145,6 +145,17 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct *vma, extern void userfaultfd_unmap_complete(struct mm_struct *mm, struct list_head *uf); +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) +{ + WARN_ON_ONCE(vma_is_anonymous(vma)); + return UFFD_WP_SWP_PTE_SPECIAL; +} + +static inline bool pte_swp_uffd_wp_special(pte_t pte) +{ + return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL); +} + #else /* CONFIG_USERFAULTFD */ /* mm helpers */ @@ -234,6 +245,16 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm, { } +static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma) +{ + return __pte(0); +} + +static inline bool pte_swp_uffd_wp_special(pte_t pte) +{ + return false; +} + #endif /* CONFIG_USERFAULTFD */ #endif /* _LINUX_USERFAULTFD_K_H */
This patch introduces a very special swap-like pte for file-backed memories. Currently it's only defined for x86_64 only, but as long as any arch that can properly define the UFFD_WP_SWP_PTE_SPECIAL value as requested, it should conceptually work too. We will use this special pte to arm the ptes that got either unmapped or swapped out for a file-backed region that was previously wr-protected. This special pte could trigger a page fault just like swap entries, and as long as the page fault will satisfy pte_none()==false && pte_present()==false. Then we can revive the special pte into a normal pte backed by the page cache. This idea is greatly inspired by Hugh and Andrea in the discussion, which is referenced in the links below. The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 as the special pte. The current solution (as pointed out by Andrea) is slightly preferred in that we don't even need swp_entry_t knowledge at all in trapping these accesses. Meanwhile, we also reuse _PAGE_SWP_UFFD_WP from the anonymous swp entries. This patch only introduces the special pte and its operators. It's not yet applied to have any functional difference. Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/ Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/ Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/x86/include/asm/pgtable.h | 28 ++++++++++++++++++++++++++++ include/asm-generic/pgtable_uffd.h | 3 +++ include/linux/userfaultfd_k.h | 21 +++++++++++++++++++++ 3 files changed, 52 insertions(+)