| Message ID | 20241128113714.492474-1-lorenzo.stoakes@oracle.com (mailing list archive) |
|---|---|
| State | New |
| Series | perf: map pages in advance |
On 28.11.24 12:37, Lorenzo Stoakes wrote:
> We are current refactoring struct page to make it smaller, removing unneeded fields that correctly belong to struct folio.
>
> Two of those fields are page->index and page->mapping. Perf is currently making use of both of these, so this patch removes this usage as it turns out it is unnecessary.
>
> Perf establishes its own internally controlled memory-mapped pages using vm_ops hooks. The first page in the mapping is the read/write user control page, and the rest of the mapping consists of read-only pages.
>
> The VMA is backed by kernel memory either from the buddy allocator or vmalloc depending on configuration. It is intended to be mapped read/write, but because it has a page_mkwrite() hook, vma_wants_writenotify() indicaets that it should be mapped read-only.
>
> When a write fault occurs, the provided page_mkwrite() hook, perf_mmap_fault() (doing double duty handing faults as well) uses the vmf->pgoff field to determine if this is the first page, allowing for the desired read/write first page, read-only rest mapping.
>
> For this to work the implementation has to carefully work around faulting logic. When a page is write-faulted, the fault() hook is called first, then its page_mkwrite() hook is called (to allow for dirty tracking in file systems).
>
> On fault we set the folio's mapping in perf_mmap_fault(), this is because when do_page_mkwrite() is subsequently invoked, it treats a missing mapping as an indicator that the fault should be retried.
>
> We also set the folio's index so, given the folio is being treated as faux user memory, it correctly references its offset within the VMA.
>
> This explains why the mapping and index fields are used - but it's not necessary.
>
> We preallocate pages when perf_mmap() is called for the first time via rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as needed if the mapping requires it.
>
> This allocation is done in the f_ops->mmap() hook provided in perf_mmap(), and so we can instead simply map all the memory right away here - there's no point in handling (read) page faults when we don't demand page nor need to be notified about them (perf does not).
>
> This patch therefore changes this logic to map everything when the mmap() hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite() to provide the required read/write vs. read-only behaviour, which does not require the previously implemented workarounds.
>
> It makes sense semantically to establish a PFN map too - we are managing the pages internally and so it is appropriate to mark this as a special mapping.

It's rather sad seeing more PFNMAP users where PFNMAP is not really required (-> this is struct page backed).

Especially having to perform several independent remap_pfn_range() calls rather looks like yet another workaround ...

Would we be able to achieve something comparable with vm_insert_pages(), to just map them in advance?
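[For context, a minimal sketch of the vm_insert_pages() approach being suggested here - illustrative only, not from the thread. It assumes the ring-buffer pages are already allocated by rb_alloc() and resolvable via perf_mmap_to_page(), as in the patch itself:]

	/*
	 * Sketch only, not part of the patch: pre-map every ring-buffer page
	 * up front using vm_insert_pages(). Note this implies VM_MIXEDMAP
	 * semantics and uses vma->vm_page_prot, which is what the follow-up
	 * discussion turns on.
	 */
	static int perf_mmap_insert_pages(struct perf_buffer *rb,
					  struct vm_area_struct *vma)
	{
		unsigned long nr_pages = vma_pages(vma);
		unsigned long pgoff;

		for (pgoff = 0; pgoff < nr_pages; pgoff++) {
			struct page *page = perf_mmap_to_page(rb, pgoff);
			unsigned long num = 1;
			int err;

			if (!page)
				return -EINVAL;

			err = vm_insert_pages(vma,
					      vma->vm_start + pgoff * PAGE_SIZE,
					      &page, &num);
			if (err)
				return err;
		}

		return 0;
	}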
On Thu, Nov 28, 2024 at 02:08:27PM +0100, David Hildenbrand wrote: > On 28.11.24 12:37, Lorenzo Stoakes wrote: > > We are current refactoring struct page to make it smaller, removing > > unneeded fields that correctly belong to struct folio. > > > > Two of those fields are page->index and page->mapping. Perf is currently > > making use of both of these, so this patch removes this usage as it turns > > out it is unnecessary. > > > > Perf establishes its own internally controlled memory-mapped pages using > > vm_ops hooks. The first page in the mapping is the read/write user control > > page, and the rest of the mapping consists of read-only pages. > > > > The VMA is backed by kernel memory either from the buddy allocator or > > vmalloc depending on configuration. It is intended to be mapped read/write, > > but because it has a page_mkwrite() hook, vma_wants_writenotify() indicaets > > that it should be mapped read-only. > > > > When a write fault occurs, the provided page_mkwrite() hook, > > perf_mmap_fault() (doing double duty handing faults as well) uses the > > vmf->pgoff field to determine if this is the first page, allowing for the > > desired read/write first page, read-only rest mapping. > > > > For this to work the implementation has to carefully work around faulting > > logic. When a page is write-faulted, the fault() hook is called first, then > > its page_mkwrite() hook is called (to allow for dirty tracking in file > > systems). > > > > On fault we set the folio's mapping in perf_mmap_fault(), this is because > > when do_page_mkwrite() is subsequently invoked, it treats a missing mapping > > as an indicator that the fault should be retried. > > > > We also set the folio's index so, given the folio is being treated as faux > > user memory, it correctly references its offset within the VMA. > > > > This explains why the mapping and index fields are used - but it's not > > necessary. > > > > We preallocate pages when perf_mmap() is called for the first time via > > rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as > > needed if the mapping requires it. > > > > This allocation is done in the f_ops->mmap() hook provided in perf_mmap(), > > and so we can instead simply map all the memory right away here - there's > > no point in handling (read) page faults when we don't demand page nor need > > to be notified about them (perf does not). > > > > This patch therefore changes this logic to map everything when the mmap() > > hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite() > > to provide the required read/write vs. read-only behaviour, which does not > > require the previously implemented workarounds. > > > > It makes sense semantically to establish a PFN map too - we are managing > > the pages internally and so it is appropriate to mark this as a special > > mapping. > > It's rather sad seeing more PFNMAP users where PFNMAP is not really required > (-> this is struct page backed). > > Especially having to perform several independent remap_pfn_range() calls > rather looks like yet another workaround ... > > Would we be able to achieve something comparable with vm_insert_pages(), to > just map them in advance? 
Well, that's the thing - we can't use VM_MIXEDMAP, as vm_insert_pages() and friends all refer to vma->vm_page_prot, which is not yet _correctly_ established at the point the f_op->mmap() hook is invoked :)

We set the field in __mmap_new_vma(), _but_ importantly, we defer the writenotify check to __mmap_complete() (applied in vma_set_page_prot()) - so if we were to try to map using VM_MIXEDMAP in the f_op->mmap() hook, we'd get read/write mappings, which is emphatically not what we want - we want them read-only mapped, and for vm_ops->pfn_mkwrite() to be called so we can make the first page read/write and the rest read-only.

It's this requirement that means this is really the only way to do this as far as I can tell.

It is appropriate and correct that this is either a VM_PFNMAP or VM_MIXEDMAP mapping, as the pages reference kernel-allocated memory and are managed by perf, not put on any LRU, etc.

It sucks to have to loop like this and it feels like a workaround, which makes me wonder if we need a new interface to better allow this stuff on mmap...

In any case I think this is the most sensible solution currently available that avoids the pre-existing situation of pretending the pages are folios but somewhat abusing the interface to allow page_mkwrite() to work correctly by setting page->index, mapping.

The alternative to this would be to folio-fy, but these are emphatically _not_ folios - a folio here would mean userland memory managed as userland memory, whereas this is a mapping onto kernel memory exposed to userspace.

It feels like probably VM_MIXEDMAP is a better way of doing it, but you'd need to expose an interface that doesn't assume the VMA is already fully set up... but I think that's one for a future series perhaps.

> --
> Cheers,
>
> David / dhildenb
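[To make the ordering described above concrete, an illustrative sketch - not actual kernel code, names as given in the mail, details elided:]

	/*
	 * mmap_region()
	 *   __mmap_new_vma()
	 *     vma->vm_page_prot = vm_get_page_prot(vm_flags);  // still writable
	 *     ...
	 *     f_op->mmap(file, vma)       // perf_mmap() runs here
	 *   __mmap_complete()
	 *     vma_set_page_prot(vma)      // writenotify check happens only now,
	 *                                 // downgrading the shared mapping to
	 *                                 // read-only
	 */

[This is why the patch below passes vm_get_page_prot(vma->vm_flags & ~VM_SHARED) to remap_pfn_range() explicitly rather than relying on vma->vm_page_prot.]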
On 28.11.24 14:20, Lorenzo Stoakes wrote: > On Thu, Nov 28, 2024 at 02:08:27PM +0100, David Hildenbrand wrote: >> On 28.11.24 12:37, Lorenzo Stoakes wrote: >>> We are current refactoring struct page to make it smaller, removing >>> unneeded fields that correctly belong to struct folio. >>> >>> Two of those fields are page->index and page->mapping. Perf is currently >>> making use of both of these, so this patch removes this usage as it turns >>> out it is unnecessary. >>> >>> Perf establishes its own internally controlled memory-mapped pages using >>> vm_ops hooks. The first page in the mapping is the read/write user control >>> page, and the rest of the mapping consists of read-only pages. >>> >>> The VMA is backed by kernel memory either from the buddy allocator or >>> vmalloc depending on configuration. It is intended to be mapped read/write, >>> but because it has a page_mkwrite() hook, vma_wants_writenotify() indicaets >>> that it should be mapped read-only. >>> >>> When a write fault occurs, the provided page_mkwrite() hook, >>> perf_mmap_fault() (doing double duty handing faults as well) uses the >>> vmf->pgoff field to determine if this is the first page, allowing for the >>> desired read/write first page, read-only rest mapping. >>> >>> For this to work the implementation has to carefully work around faulting >>> logic. When a page is write-faulted, the fault() hook is called first, then >>> its page_mkwrite() hook is called (to allow for dirty tracking in file >>> systems). >>> >>> On fault we set the folio's mapping in perf_mmap_fault(), this is because >>> when do_page_mkwrite() is subsequently invoked, it treats a missing mapping >>> as an indicator that the fault should be retried. >>> >>> We also set the folio's index so, given the folio is being treated as faux >>> user memory, it correctly references its offset within the VMA. >>> >>> This explains why the mapping and index fields are used - but it's not >>> necessary. >>> >>> We preallocate pages when perf_mmap() is called for the first time via >>> rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as >>> needed if the mapping requires it. >>> >>> This allocation is done in the f_ops->mmap() hook provided in perf_mmap(), >>> and so we can instead simply map all the memory right away here - there's >>> no point in handling (read) page faults when we don't demand page nor need >>> to be notified about them (perf does not). >>> >>> This patch therefore changes this logic to map everything when the mmap() >>> hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite() >>> to provide the required read/write vs. read-only behaviour, which does not >>> require the previously implemented workarounds. >>> >>> It makes sense semantically to establish a PFN map too - we are managing >>> the pages internally and so it is appropriate to mark this as a special >>> mapping. >> >> It's rather sad seeing more PFNMAP users where PFNMAP is not really required >> (-> this is struct page backed). >> >> Especially having to perform several independent remap_pfn_range() calls >> rather looks like yet another workaround ... >> >> Would we be able to achieve something comparable with vm_insert_pages(), to >> just map them in advance? 
> Well, that's the thing, we can't use VM_MIXEDMAP as vm_insert_pages() and friends all refer vma->vm_page_prot which is not yet _correctly_ established at the point of the f_op->mmap() hook being invoked :)

So all you want is a vm_insert_pages() variant where we can pass in the vm_page_prot?

Or a way to detect internally that it is not set up yet and fall back to vm_get_page_prot(vma->vm_flags & ~VM_SHARED)?

Or a way to just remove write permissions?

> We set the field in __mmap_new_vma(), _but_ importantly, we defer the writenotify check to __mmap_complete() (set in vma_set_page_prot()) - so if we were to try to map using VM_MIXEDMAP in the f_op->mmap() hook, we'd get read/write mappings, which is emphatically not what we want - we want them read-only mapped, and for vm_ops->pfn_mkwrite() to be called so we can make the first page read/write and the rest read-only.
>
> It's this requirement that means this is really the only way to do this as far as I can tell.
>
> It is appropriate and correct that this is either a VM_PFNMAP or VM_MIXEDMAP mapping, as the pages reference kernel-allocated memory and are managed by perf, not put on any LRU, etc.
>
> It sucks to have to loop like this and it feels like a workaround, which makes me wonder if we need a new interface to better allow this stuff on mmap...
>
> In any case I think this is the most sensible solution currently available that avoids the pre-existing situation of pretending the pages are folios but somewhat abusing the interface to allow page_mkwrite() to work correctly by setting page->index, mapping.

Yes, that page->index stuff is nasty.

> The alternative to this would be to folio-fy, but these are emphatically _not_ folios, that is userland memory managed as userland memory, it's a mapping onto kernel memory exposed to userspace.

Yes, we should even move away from folios completely in the future for vm_insert_page().

> It feels like probably VM_MIXEDMAP is a better way of doing it, but you'd need to expose an interface that doesn't assume the VMA is already fully set up... but I think one for a future series perhaps.

If the solution to your problem is as easy as making vm_insert_pages() pass something else than vma->vm_page_prot to insert_pages(), then I think we should go for that. Like ... vm_insert_pages_prot().

Observe how we already have vmf_insert_pfn() vs. vmf_insert_pfn_prot(). But yes, in an ideal world we'd avoid having a temporarily messed up vma->vm_page_prot. So we'd then document clearly how vm_insert_pages_prot() may be used.
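[As an illustrative aside: the helper being floated here does not exist at this point. By analogy with vmf_insert_pfn()/vmf_insert_pfn_prot(), its shape would presumably be something like the following hypothetical prototype:]

	/* Hypothetical prototype only - a sketch of the suggested variant. */
	int vm_insert_pages_prot(struct vm_area_struct *vma, unsigned long addr,
				 struct page **pages, unsigned long *num,
				 pgprot_t prot);

[A caller such as perf could then pass an explicitly read-only protection, e.g. vm_get_page_prot(vma->vm_flags & ~VM_SHARED), instead of relying on the not-yet-final vma->vm_page_prot.]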
On Thu, Nov 28, 2024 at 02:37:17PM +0100, David Hildenbrand wrote:
> On 28.11.24 14:20, Lorenzo Stoakes wrote:
> > On Thu, Nov 28, 2024 at 02:08:27PM +0100, David Hildenbrand wrote:
> > > On 28.11.24 12:37, Lorenzo Stoakes wrote:

[snip]

> > > > It makes sense semantically to establish a PFN map too - we are managing the pages internally and so it is appropriate to mark this as a special mapping.
> > >
> > > It's rather sad seeing more PFNMAP users where PFNMAP is not really required (-> this is struct page backed).
> > >
> > > Especially having to perform several independent remap_pfn_range() calls rather looks like yet another workaround ...
> > >
> > > Would we be able to achieve something comparable with vm_insert_pages(), to just map them in advance?
> >
> > Well, that's the thing, we can't use VM_MIXEDMAP as vm_insert_pages() and friends all refer vma->vm_page_prot which is not yet _correctly_ established at the point of the f_op->mmap() hook being invoked :)
>
> So all you want is a vm_insert_pages() variant where we can pass in the vm_page_prot?

Hmm, looking into the code I don't think VM_MIXEDMAP is correct after all.

We don't want these pages touched at all - we manage them ourselves - and VM_MIXEDMAP, unless it is mapping memory-mapped I/O pages, will treat them as ordinary user pages. For instance, vm_insert_page() -> insert_page() -> insert_page_into_pte_locked() acts as if this is a folio, manipulating the ref count and invoking folio_add_file_rmap_pte() - which we emphatically do not want.

Since this is a non-CoW mapping (according to vma->vm_flags), VM_PFNMAP even with !IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) will not touch vma->vm_pgoff (which we rely on equalling the offset into the range) and all works as it should.

> Or a way detect internally that it is not setup yet and fallback to vm_get_page_prot(vma->vm_flags & ~VM_SHARED)?

No, this would be too complicated I think. And we can't know that we will need to do this necessarily... Better to just expose prot the same way we do for remap_pfn_range().

> Or a way to just remove write permissions?

We can't really do this without bigger changes to the faulting mechanism, because a shared mapping with VM_WRITE that write-faults but doesn't have a page_mkwrite() or pfn_mkwrite() hook will end up invoking wp_page_reuse() and be made r/w again.

> > We set the field in __mmap_new_vma(), _but_ importantly, we defer the writenotify check to __mmap_complete() (set in vma_set_page_prot()) - so if we were to try to map using VM_MIXEDMAP in the f_op->mmap() hook, we'd get read/write mappings, which is emphatically not what we want - we want them read-only mapped, and for vm_ops->pfn_mkwrite() to be called so we can make the first page read/write and the rest read-only.
> >
> > It's this requirement that means this is really the only way to do this as far as I can tell.
> >
> > It is appropriate and correct that this is either a VM_PFNMAP or VM_MIXEDMAP mapping, as the pages reference kernel-allocated memory and are managed by perf, not put on any LRU, etc.
> >
> > It sucks to have to loop like this and it feels like a workaround, which makes me wonder if we need a new interface to better allow this stuff on mmap...
> > In any case I think this is the most sensible solution currently available that avoids the pre-existing situation of pretending the pages are folios but somewhat abusing the interface to allow page_mkwrite() to work correctly by setting page->index, mapping.
>
> Yes, that page->index stuff is nasty.

It's the ->mapping that is more of the issue I think, as that _has_ to be set in the original version. I can't actually see why index _must_ be set - there should be no case in which rmap is used on the page, so possibly it was a mistake - but both fields are going from struct page so both must be eliminated :)

> > The alternative to this would be to folio-fy, but these are emphatically _not_ folios, that is userland memory managed as userland memory, it's a mapping onto kernel memory exposed to userspace.
>
> Yes, we should even move away from folios completely in the future for vm_insert_page().

Well, isn't VM_MIXEDMAP intended specifically so you can mix normal user pages that live in the LRU and have an rmap etc. etc. with PFN mappings to I/O mapped memory? :) So then that's folios + raw PFNs.

> > It feels like probably VM_MIXEDMAP is a better way of doing it, but you'd need to expose an interface that doesn't assume the VMA is already fully set up... but I think one for a future series perhaps.
>
> If the solution to your problem is as easy as making vm_insert_pages() pass something else than vma->vm_page_prot to insert_pages(), then I think we should go for that. Like ... vm_insert_pages_prot().

Sadly no, for the reasons above.

> Observe how we already have vmf_insert_pfn() vs. vmf_insert_pfn_prot(). But yes, in an ideal world we'd avoid having temporarily messed up vma->vm_page_prot. So we'd then document clearly how vm_insert_pages_prot() may be used.

I think the issue with the delay in setting vma->vm_page_prot properly is that we have a chicken-and-egg scenario (oh so often the case in mmap_region() logic...) in that the mmap hook might change some of these flags, which changes what that function will do...

I was discussing with Liam recently how perhaps we should see how feasible it is to do away with this hook and replace it with something where drivers specify which VMA flags they want to set _ahead of time_, since this really is the only thing they should be changing other than vma->vm_private_data.

Then we could possibly have a hook _only_ for assigning vma->vm_private_data, to allow for any driver-specific init logic and doing mappings, and hey presto we have made things vastly saner. Could perhaps pass a const struct vm_area_struct * to make this clear...

But I may be missing some weird corner cases (hey, probably am) or being too optimistic :>)

> --
> Cheers,
>
> David / dhildenb

I wonder if we need a new interface then for 'pages which we don't want touched but do have a struct page' - something the interface expresses more clearly than remap_pfn_range() does.

I mean, from the comment around vm_normal_page():

 * "Special" mappings do not wish to be associated with a "struct page" (either
 * it doesn't exist, or it exists but they don't want to touch it). In this
 * case, NULL is returned here. "Normal" mappings do have a struct page.
...
 * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a
 * special mapping (even if there are underlying and valid "struct pages").
 * COWed pages of a VM_PFNMAP are always normal.

So there's precedent for us just putting pages we allocate/manage ourselves in a VM_PFNMAP.
So I guess this interface would be something like:

	int remap_kernel_pages(struct vm_area_struct *vma, unsigned long addr,
			       struct page **pages, unsigned long size,
			       pgprot_t prot);

Certainly this area of the kernel is a bit confusing at any rate...
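[A rough sketch of how perf's mapping loop might look if such an interface existed - hypothetical only, since remap_kernel_pages() is merely being proposed above; the page-gathering step assumes perf_mmap_to_page() as in the patch:]

	static int map_range(struct perf_buffer *rb, struct vm_area_struct *vma)
	{
		unsigned long nr_pages = vma_pages(vma);
		struct page **pages;
		unsigned long pgoff;
		int err;

		pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
		if (!pages)
			return -ENOMEM;

		for (pgoff = 0; pgoff < nr_pages; pgoff++) {
			pages[pgoff] = perf_mmap_to_page(rb, pgoff);
			if (!pages[pgoff]) {
				kfree(pages);
				return -EINVAL;
			}
		}

		/* Map everything read-only; pfn_mkwrite() still handles page 0. */
		err = remap_kernel_pages(vma, vma->vm_start, pages,
					 nr_pages * PAGE_SIZE,
					 vm_get_page_prot(vma->vm_flags & ~VM_SHARED));

		kfree(pages);
		return err;
	}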
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5d4a54f50826..0754b070497f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6284,41 +6284,6 @@ void perf_event_update_userpage(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_update_userpage);
 
-static vm_fault_t perf_mmap_fault(struct vm_fault *vmf)
-{
-	struct perf_event *event = vmf->vma->vm_file->private_data;
-	struct perf_buffer *rb;
-	vm_fault_t ret = VM_FAULT_SIGBUS;
-
-	if (vmf->flags & FAULT_FLAG_MKWRITE) {
-		if (vmf->pgoff == 0)
-			ret = 0;
-		return ret;
-	}
-
-	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
-	if (!rb)
-		goto unlock;
-
-	if (vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))
-		goto unlock;
-
-	vmf->page = perf_mmap_to_page(rb, vmf->pgoff);
-	if (!vmf->page)
-		goto unlock;
-
-	get_page(vmf->page);
-	vmf->page->mapping = vmf->vma->vm_file->f_mapping;
-	vmf->page->index   = vmf->pgoff;
-
-	ret = 0;
-unlock:
-	rcu_read_unlock();
-
-	return ret;
-}
-
 static void ring_buffer_attach(struct perf_event *event,
 			       struct perf_buffer *rb)
 {
@@ -6558,13 +6523,47 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 		ring_buffer_put(rb); /* could be last */
 }
 
+static vm_fault_t perf_mmap_pfn_mkwrite(struct vm_fault *vmf)
+{
+	/* The first page is the user control page, others are read-only. */
+	return vmf->pgoff == 0 ? 0 : VM_FAULT_SIGBUS;
+}
+
 static const struct vm_operations_struct perf_mmap_vmops = {
 	.open		= perf_mmap_open,
 	.close		= perf_mmap_close, /* non mergeable */
-	.fault		= perf_mmap_fault,
-	.page_mkwrite	= perf_mmap_fault,
+	.pfn_mkwrite	= perf_mmap_pfn_mkwrite,
 };
 
+static int map_range(struct perf_buffer *rb, struct vm_area_struct *vma)
+{
+	unsigned long nr_pages = vma_pages(vma);
+	int err = 0;
+	unsigned long pgoff;
+
+	for (pgoff = 0; pgoff < nr_pages; pgoff++) {
+		unsigned long va = vma->vm_start + PAGE_SIZE * pgoff;
+		struct page *page = perf_mmap_to_page(rb, pgoff);
+
+		if (page == NULL) {
+			err = -EINVAL;
+			break;
+		}
+
+		/* Map readonly, perf_mmap_pfn_mkwrite() called on write fault. */
+		err = remap_pfn_range(vma, va, page_to_pfn(page), PAGE_SIZE,
+				      vm_get_page_prot(vma->vm_flags & ~VM_SHARED));
+		if (err)
+			break;
+	}
+
+	/* Clear any partial mappings on error. */
+	if (err)
+		zap_page_range_single(vma, vma->vm_start, nr_pages * PAGE_SIZE, NULL);
+
+	return err;
+}
+
 static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct perf_event *event = file->private_data;
@@ -6783,6 +6782,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
 	vma->vm_ops = &perf_mmap_vmops;
 
+	if (!ret)
+		ret = map_range(rb, vma);
+
 	if (event->pmu->event_mapped)
 		event->pmu->event_mapped(event, vma->vm_mm);
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4f46f688d0d4..180509132d4b 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -643,7 +643,6 @@ static void rb_free_aux_page(struct perf_buffer *rb, int idx)
 	struct page *page = virt_to_page(rb->aux_pages[idx]);
 
 	ClearPagePrivate(page);
-	page->mapping = NULL;
 	__free_page(page);
 }
 
@@ -819,7 +818,6 @@ static void perf_mmap_free_page(void *addr)
 {
 	struct page *page = virt_to_page(addr);
 
-	page->mapping = NULL;
 	__free_page(page);
 }
 
@@ -890,28 +888,13 @@ __perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff)
 	return vmalloc_to_page((void *)rb->user_page + pgoff * PAGE_SIZE);
 }
 
-static void perf_mmap_unmark_page(void *addr)
-{
-	struct page *page = vmalloc_to_page(addr);
-
-	page->mapping = NULL;
-}
-
 static void rb_free_work(struct work_struct *work)
 {
 	struct perf_buffer *rb;
-	void *base;
-	int i, nr;
 
 	rb = container_of(work, struct perf_buffer, work);
-	nr = data_page_nr(rb);
-
-	base = rb->user_page;
-	/* The '<=' counts in the user page. */
-	for (i = 0; i <= nr; i++)
-		perf_mmap_unmark_page(base + (i * PAGE_SIZE));
-	vfree(base);
+	vfree(rb->user_page);
 
 	kfree(rb);
 }
We are currently refactoring struct page to make it smaller, removing unneeded fields that correctly belong to struct folio.

Two of those fields are page->index and page->mapping. Perf is currently making use of both of these, so this patch removes this usage as it turns out it is unnecessary.

Perf establishes its own internally controlled memory-mapped pages using vm_ops hooks. The first page in the mapping is the read/write user control page, and the rest of the mapping consists of read-only pages.

The VMA is backed by kernel memory either from the buddy allocator or vmalloc depending on configuration. It is intended to be mapped read/write, but because it has a page_mkwrite() hook, vma_wants_writenotify() indicates that it should be mapped read-only.

When a write fault occurs, the provided page_mkwrite() hook, perf_mmap_fault() (doing double duty handling faults as well), uses the vmf->pgoff field to determine if this is the first page, allowing for the desired read/write first page, read-only rest mapping.

For this to work the implementation has to carefully work around faulting logic. When a page is write-faulted, the fault() hook is called first, then its page_mkwrite() hook is called (to allow for dirty tracking in file systems).

On fault we set the folio's mapping in perf_mmap_fault(); this is because when do_page_mkwrite() is subsequently invoked, it treats a missing mapping as an indicator that the fault should be retried.

We also set the folio's index so, given the folio is being treated as faux user memory, it correctly references its offset within the VMA.

This explains why the mapping and index fields are used - but it's not necessary.

We preallocate pages when perf_mmap() is called for the first time via rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as needed if the mapping requires it.

This allocation is done in the f_ops->mmap() hook provided in perf_mmap(), and so we can instead simply map all the memory right away here - there's no point in handling (read) page faults when we neither demand-page nor need to be notified about them (perf does not).

This patch therefore changes this logic to map everything when the mmap() hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite() to provide the required read/write vs. read-only behaviour, which does not require the previously implemented workarounds.

It makes sense semantically to establish a PFN map too - we are managing the pages internally and so it is appropriate to mark this as a special mapping.

There should be no change to actual functionality as a result of this change.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 kernel/events/core.c        | 76 +++++++++++++++++++------------------
 kernel/events/ring_buffer.c | 19 +---------
 2 files changed, 40 insertions(+), 55 deletions(-)
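[As a usage-level illustration, not part of the patch: the userspace-visible behaviour is intended to be unchanged - the ring buffer is mapped read/write, but only the first (control) page can actually be written; writes to the data pages fault, both before and after this change. A minimal sketch:]

	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	int main(void)
	{
		struct perf_event_attr attr = {
			.type = PERF_TYPE_SOFTWARE,
			.size = sizeof(attr),
			.config = PERF_COUNT_SW_CPU_CLOCK,
			.disabled = 1,
		};
		long page_size = sysconf(_SC_PAGESIZE);
		struct perf_event_mmap_page *meta;
		void *ring;
		int fd;

		fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
		if (fd < 0)
			return 1;

		/* 1 user control page + 2^n data pages. */
		ring = mmap(NULL, (1 + 8) * page_size, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0);
		if (ring == MAP_FAILED)
			return 1;

		meta = ring;
		meta->data_tail = meta->data_head;	/* writing page 0 is fine */

		return 0;
	}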