diff mbox series

[v3,26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set

Message ID 20230216051750.3125598-27-surenb@google.com (mailing list archive)
State New
Headers show
Series Per-VMA locks | expand

Commit Message

Suren Baghdasaryan Feb. 16, 2023, 5:17 a.m. UTC
When vma->anon_vma is not set, page fault handler will set it by either
reusing anon_vma of an adjacent VMA if VMAs are compatible or by
allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
a compatible adjacent VMA and that requires not only the faulting VMA
to be stable but also the tree structure and other VMAs inside that tree.
Therefore locking just the faulting VMA is not enough for this search.
Fall back to taking mmap_lock when vma->anon_vma is not set. This
situation happens only on the first page fault and should not affect
overall performance.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/memory.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Matthew Wilcox Feb. 16, 2023, 3:44 p.m. UTC | #1
On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> When vma->anon_vma is not set, page fault handler will set it by either
> reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> a compatible adjacent VMA and that requires not only the faulting VMA
> to be stable but also the tree structure and other VMAs inside that tree.
> Therefore locking just the faulting VMA is not enough for this search.
> Fall back to taking mmap_lock when vma->anon_vma is not set. This
> situation happens only on the first page fault and should not affect
> overall performance.

I think I asked this before, but don't remember getting an aswer.
Why do we defer setting anon_vma to the first fault?  Why don't we
set it up at mmap time?
Suren Baghdasaryan Feb. 16, 2023, 7:43 p.m. UTC | #2
On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > When vma->anon_vma is not set, page fault handler will set it by either
> > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > a compatible adjacent VMA and that requires not only the faulting VMA
> > to be stable but also the tree structure and other VMAs inside that tree.
> > Therefore locking just the faulting VMA is not enough for this search.
> > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > situation happens only on the first page fault and should not affect
> > overall performance.
>
> I think I asked this before, but don't remember getting an aswer.
> Why do we defer setting anon_vma to the first fault?  Why don't we
> set it up at mmap time?

Yeah, I remember that conversation Matthew and I could not find the
definitive answer at the time. I'll look into that again or maybe
someone can answer it here.

In the end rather than changing that logic I decided to skip
vma->anon_vma==NULL cases because I measured them being less than
0.01% of all page faults, so ROI from changing that would be quite
low. But I agree that the logic is weird and maybe we can improve
that. I will have to review that again when I'm working on eliminating
all these special cases we skip, like swap/userfaults/etc.
Suren Baghdasaryan Feb. 17, 2023, 2:14 a.m. UTC | #3
On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > When vma->anon_vma is not set, page fault handler will set it by either
> > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > to be stable but also the tree structure and other VMAs inside that tree.
> > > Therefore locking just the faulting VMA is not enough for this search.
> > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > situation happens only on the first page fault and should not affect
> > > overall performance.
> >
> > I think I asked this before, but don't remember getting an aswer.
> > Why do we defer setting anon_vma to the first fault?  Why don't we
> > set it up at mmap time?
>
> Yeah, I remember that conversation Matthew and I could not find the
> definitive answer at the time. I'll look into that again or maybe
> someone can answer it here.

After looking into it again I'm still under the impression that
vma->anon_vma is populated lazily (during the first page fault rather
than at mmap time) to avoid doing extra work for areas which are never
faulted. Though I might be missing some important detail here.

>
> In the end rather than changing that logic I decided to skip
> vma->anon_vma==NULL cases because I measured them being less than
> 0.01% of all page faults, so ROI from changing that would be quite
> low. But I agree that the logic is weird and maybe we can improve
> that. I will have to review that again when I'm working on eliminating
> all these special cases we skip, like swap/userfaults/etc.
Hyeonggon Yoo Feb. 17, 2023, 10:21 a.m. UTC | #4
On Fri, Feb 17, 2023 at 11:15 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > situation happens only on the first page fault and should not affect
> > > > overall performance.
> > >
> > > I think I asked this before, but don't remember getting an aswer.
> > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > set it up at mmap time?
> >
> > Yeah, I remember that conversation Matthew and I could not find the
> > definitive answer at the time. I'll look into that again or maybe
> > someone can answer it here.
>
> After looking into it again I'm still under the impression that
> vma->anon_vma is populated lazily (during the first page fault rather
> than at mmap time) to avoid doing extra work for areas which are never
> faulted. Though I might be missing some important detail here.

I think this is because the kernel cannot merge VMAs that have
different anon_vmas?

Enabling lazy population of anon_vma could potentially increase the
chances of merging VMAs.

> > In the end rather than changing that logic I decided to skip
> > vma->anon_vma==NULL cases because I measured them being less than
> > 0.01% of all page faults, so ROI from changing that would be quite
> > low. But I agree that the logic is weird and maybe we can improve
> > that. I will have to review that again when I'm working on eliminating
> > all these special cases we skip, like swap/userfaults/etc.
Matthew Wilcox Feb. 17, 2023, 4:05 p.m. UTC | #5
On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > situation happens only on the first page fault and should not affect
> > > > overall performance.
> > >
> > > I think I asked this before, but don't remember getting an aswer.
> > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > set it up at mmap time?
> >
> > Yeah, I remember that conversation Matthew and I could not find the
> > definitive answer at the time. I'll look into that again or maybe
> > someone can answer it here.
> 
> After looking into it again I'm still under the impression that
> vma->anon_vma is populated lazily (during the first page fault rather
> than at mmap time) to avoid doing extra work for areas which are never
> faulted. Though I might be missing some important detail here.

How often does userspace call mmap() and then _never_ fault on it?
I appreciate that userspace might mmap() gigabytes of address space and
then only end up using a small amount of it, so populating it lazily
makes sense.  But creating a region and never faulting on it?  The only
use-case I can think of is loading shared libraries:

openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
(...)
mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000

but that's a file-backed VMA, not an anon VMA.
Suren Baghdasaryan Feb. 17, 2023, 4:10 p.m. UTC | #6
On Fri, Feb 17, 2023 at 8:05 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > situation happens only on the first page fault and should not affect
> > > > > overall performance.
> > > >
> > > > I think I asked this before, but don't remember getting an aswer.
> > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > set it up at mmap time?
> > >
> > > Yeah, I remember that conversation Matthew and I could not find the
> > > definitive answer at the time. I'll look into that again or maybe
> > > someone can answer it here.
> >
> > After looking into it again I'm still under the impression that
> > vma->anon_vma is populated lazily (during the first page fault rather
> > than at mmap time) to avoid doing extra work for areas which are never
> > faulted. Though I might be missing some important detail here.
>
> How often does userspace call mmap() and then _never_ fault on it?
> I appreciate that userspace might mmap() gigabytes of address space and
> then only end up using a small amount of it, so populating it lazily
> makes sense.  But creating a region and never faulting on it?  The only
> use-case I can think of is loading shared libraries:
>
> openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> (...)
> mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
> mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
> mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
> mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
> mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000
>
> but that's a file-backed VMA, not an anon VMA.

Might the case of dup_mmap() while forking be the reason why a VMA in
the child process might be never used while parent uses it (or visa
versa)? Again, I'm not sure this is the reason but I can find no other
good explanation.
Suren Baghdasaryan Feb. 17, 2023, 4:13 p.m. UTC | #7
On Fri, Feb 17, 2023 at 2:21 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Fri, Feb 17, 2023 at 11:15 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > situation happens only on the first page fault and should not affect
> > > > > overall performance.
> > > >
> > > > I think I asked this before, but don't remember getting an aswer.
> > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > set it up at mmap time?
> > >
> > > Yeah, I remember that conversation Matthew and I could not find the
> > > definitive answer at the time. I'll look into that again or maybe
> > > someone can answer it here.
> >
> > After looking into it again I'm still under the impression that
> > vma->anon_vma is populated lazily (during the first page fault rather
> > than at mmap time) to avoid doing extra work for areas which are never
> > faulted. Though I might be missing some important detail here.
>
> I think this is because the kernel cannot merge VMAs that have
> different anon_vmas?
>
> Enabling lazy population of anon_vma could potentially increase the
> chances of merging VMAs.

Hmm. Do you have a clear explanation why merging chances increase this
way? A couple of possibilities I can think of would be:
1. If after mmap'ing a VMA and before faulting the first page into it
we often change something that affects anon_vma_compatible() decision,
like vm_policy;
2. When mmap'ing VMAs we do not map them consecutively but the final
arrangement is actually contiguous.

Don't think either of those cases would be very representative of a
usual case but maybe I'm wrong or there is another reason?

>
> > > In the end rather than changing that logic I decided to skip
> > > vma->anon_vma==NULL cases because I measured them being less than
> > > 0.01% of all page faults, so ROI from changing that would be quite
> > > low. But I agree that the logic is weird and maybe we can improve
> > > that. I will have to review that again when I'm working on eliminating
> > > all these special cases we skip, like swap/userfaults/etc.
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
Hyeonggon Yoo Feb. 17, 2023, 6:49 p.m. UTC | #8
On Fri, Feb 17, 2023 at 08:13:01AM -0800, Suren Baghdasaryan wrote:
> On Fri, Feb 17, 2023 at 2:21 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Fri, Feb 17, 2023 at 11:15 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > > situation happens only on the first page fault and should not affect
> > > > > > overall performance.
> > > > >
> > > > > I think I asked this before, but don't remember getting an aswer.
> > > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > > set it up at mmap time?
> > > >
> > > > Yeah, I remember that conversation Matthew and I could not find the
> > > > definitive answer at the time. I'll look into that again or maybe
> > > > someone can answer it here.
> > >
> > > After looking into it again I'm still under the impression that
> > > vma->anon_vma is populated lazily (during the first page fault rather
> > > than at mmap time) to avoid doing extra work for areas which are never
> > > faulted. Though I might be missing some important detail here.
> >
> > I think this is because the kernel cannot merge VMAs that have
> > different anon_vmas?
> >
> > Enabling lazy population of anon_vma could potentially increase the
> > chances of merging VMAs.
> 
> Hmm. Do you have a clear explanation why merging chances increase this
> way? A couple of possibilities I can think of would be:
> 1. If after mmap'ing a VMA and before faulting the first page into it
> we often change something that affects anon_vma_compatible() decision,
> like vm_policy;
> 2. When mmap'ing VMAs we do not map them consecutively but the final
> arrangement is actually contiguous.
> 
> Don't think either of those cases would be very representative of a
> usual case but maybe I'm wrong or there is another reason?

Ok. I agree it does not represent common cases.

Hmm then I wonder how it went from the initial approach of "allocate anon_vma
objects only via fork()" [1] to "populate anon_vma at page faults". [2] [3]

Maybe Hugh, Andrea or Andrew have opinions?

[1] anon_vma RFC2, lore.kernel.org
https://lore.kernel.org/lkml/20040311065254.GT30940@dualathlon.random

[2] The status of object-based reverse mapping, LWN.net
https://lwn.net/Articles/85908

[3] rmap 39 add anon_vma rmap
https://gitlab.com/hyeyoo/linux-historical/-/commit/8aa3448cabdfca146aa3fd36e852d0209fb2276a

> 
> >
> > > > In the end rather than changing that logic I decided to skip
> > > > vma->anon_vma==NULL cases because I measured them being less than
> > > > 0.01% of all page faults, so ROI from changing that would be quite
> > > > low. But I agree that the logic is weird and maybe we can improve
> > > > that. I will have to review that again when I'm working on eliminating
> > > > all these special cases we skip, like swap/userfaults/etc.
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >
Matthew Wilcox April 3, 2023, 7:49 p.m. UTC | #9
On Fri, Feb 17, 2023 at 08:10:35AM -0800, Suren Baghdasaryan wrote:
> On Fri, Feb 17, 2023 at 8:05 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > > situation happens only on the first page fault and should not affect
> > > > > > overall performance.
> > > > >
> > > > > I think I asked this before, but don't remember getting an aswer.
> > > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > > set it up at mmap time?
> > > >
> > > > Yeah, I remember that conversation Matthew and I could not find the
> > > > definitive answer at the time. I'll look into that again or maybe
> > > > someone can answer it here.
> > >
> > > After looking into it again I'm still under the impression that
> > > vma->anon_vma is populated lazily (during the first page fault rather
> > > than at mmap time) to avoid doing extra work for areas which are never
> > > faulted. Though I might be missing some important detail here.
> >
> > How often does userspace call mmap() and then _never_ fault on it?
> > I appreciate that userspace might mmap() gigabytes of address space and
> > then only end up using a small amount of it, so populating it lazily
> > makes sense.  But creating a region and never faulting on it?  The only
> > use-case I can think of is loading shared libraries:
> >
> > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> > (...)
> > mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
> > mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
> > mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
> > mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
> > mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000
> >
> > but that's a file-backed VMA, not an anon VMA.
> 
> Might the case of dup_mmap() while forking be the reason why a VMA in
> the child process might be never used while parent uses it (or visa
> versa)? Again, I'm not sure this is the reason but I can find no other
> good explanation.

I found an explanation!  Well, a partial one.  If we MAP_PRIVATE a file
mapping (like, er those ones up there) and only take read faults on it,
we can postpone allocation of the anon_vma indefinitely.  But once we
take a write fault in that VMA, we need to allocate an anon_vma for it
so that we can track the anonymous pages that have been allocated to
satisfy the copy-on-write (see do_cow_fault()).

However, I think in that caase, we could probably skip the
find_mergeable_anon_vma() step.  We don't today; we check whether
a->vm_file == b->vm_file in anon_vma_compatible, but I wonder if that
triggers often.
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 5e1c124552a1..13369ff15ec1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5242,6 +5242,10 @@  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_is_anonymous(vma))
 		goto inval;
 
+	/* find_mergeable_anon_vma uses adjacent vmas which are not locked */
+	if (!vma->anon_vma)
+		goto inval;
+
 	if (!vma_start_read(vma))
 		goto inval;