Message ID | d885cb259174736c2830a5dfe07f81b214ef3faa.1739469950.git.lorenzo.stoakes@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: permit guard regions for file-backed/shmem mappings | expand |
On 2/13/25 19:17, Lorenzo Stoakes wrote: > There is no reason to disallow guard regions in file-backed mappings - > readahead and fault-around both function correctly in the presence of PTE > markers, equally other operations relating to memory-mapped files function > correctly. > > Additionally, read-only mappings if introducing guard-regions, only > restrict the mapping further, which means there is no violation of any > access rights by permitting this to be so. > > Removing this restriction allows for read-only mapped files (such as > executable files) correctly which would otherwise not be permitted. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> > --- > mm/madvise.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 6ecead476a80..e01e93e179a8 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > if (!allow_locked) > disallowed |= VM_LOCKED; > > - if (!vma_is_anonymous(vma)) > - return false; > - > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > - return false; > - > - return true; > + return !(vma->vm_flags & disallowed); > } > > static bool is_guard_pte_marker(pte_t ptent)
On 13.02.25 19:17, Lorenzo Stoakes wrote: > There is no reason to disallow guard regions in file-backed mappings - > readahead and fault-around both function correctly in the presence of PTE > markers, equally other operations relating to memory-mapped files function > correctly. > > Additionally, read-only mappings if introducing guard-regions, only > restrict the mapping further, which means there is no violation of any > access rights by permitting this to be so. > > Removing this restriction allows for read-only mapped files (such as > executable files) correctly which would otherwise not be permitted. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > mm/madvise.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 6ecead476a80..e01e93e179a8 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > if (!allow_locked) > disallowed |= VM_LOCKED; > > - if (!vma_is_anonymous(vma)) > - return false; > - > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > - return false; > - > - return true; > + return !(vma->vm_flags & disallowed); > } > > static bool is_guard_pte_marker(pte_t ptent) Acked-by: David Hildenbrand <david@redhat.com> I assume these markers cannot completely prevent us from allocating pages/folios for these underlying file/pageache ranges of these markers in case of shmem during page faults, right?
On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: > On 13.02.25 19:17, Lorenzo Stoakes wrote: > > There is no reason to disallow guard regions in file-backed mappings - > > readahead and fault-around both function correctly in the presence of PTE > > markers, equally other operations relating to memory-mapped files function > > correctly. > > > > Additionally, read-only mappings if introducing guard-regions, only > > restrict the mapping further, which means there is no violation of any > > access rights by permitting this to be so. > > > > Removing this restriction allows for read-only mapped files (such as > > executable files) correctly which would otherwise not be permitted. > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > --- > > mm/madvise.c | 8 +------- > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 6ecead476a80..e01e93e179a8 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > > if (!allow_locked) > > disallowed |= VM_LOCKED; > > - if (!vma_is_anonymous(vma)) > > - return false; > > - > > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > > - return false; > > - > > - return true; > > + return !(vma->vm_flags & disallowed); > > } > > static bool is_guard_pte_marker(pte_t ptent) > > Acked-by: David Hildenbrand <david@redhat.com> Thanks! > > I assume these markers cannot completely prevent us from allocating > pages/folios for these underlying file/pageache ranges of these markers in > case of shmem during page faults, right? If the markers are in place, then page faulting will result in a segfault. If we faulted in a shmem page then installed markers (which would zap the range), then the page cache will be populated, but obviously subject to standard reclaim. If we perform synchronous readahead prior to a guard region that includes (partially or fully) a guard region we might major fault entries into the page cache that are then not accessable _from that mapping_, this is rather unavoidable as this doesn't account for page table mappings and should be largely trivial overhead (also these folios are reclaimable). > > -- > Cheers, > > David / dhildenb >
On 18.02.25 17:12, Lorenzo Stoakes wrote: > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: >> On 13.02.25 19:17, Lorenzo Stoakes wrote: >>> There is no reason to disallow guard regions in file-backed mappings - >>> readahead and fault-around both function correctly in the presence of PTE >>> markers, equally other operations relating to memory-mapped files function >>> correctly. >>> >>> Additionally, read-only mappings if introducing guard-regions, only >>> restrict the mapping further, which means there is no violation of any >>> access rights by permitting this to be so. >>> >>> Removing this restriction allows for read-only mapped files (such as >>> executable files) correctly which would otherwise not be permitted. >>> >>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>> --- >>> mm/madvise.c | 8 +------- >>> 1 file changed, 1 insertion(+), 7 deletions(-) >>> >>> diff --git a/mm/madvise.c b/mm/madvise.c >>> index 6ecead476a80..e01e93e179a8 100644 >>> --- a/mm/madvise.c >>> +++ b/mm/madvise.c >>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) >>> if (!allow_locked) >>> disallowed |= VM_LOCKED; >>> - if (!vma_is_anonymous(vma)) >>> - return false; >>> - >>> - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) >>> - return false; >>> - >>> - return true; >>> + return !(vma->vm_flags & disallowed); >>> } >>> static bool is_guard_pte_marker(pte_t ptent) >> >> Acked-by: David Hildenbrand <david@redhat.com> > > Thanks! > >> >> I assume these markers cannot completely prevent us from allocating >> pages/folios for these underlying file/pageache ranges of these markers in >> case of shmem during page faults, right? > > If the markers are in place, then page faulting will result in a > segfault. If we faulted in a shmem page then installed markers (which would > zap the range), then the page cache will be populated, but obviously > subject to standard reclaim. Well, yes, (a) if there is swap and (b), if the noswap option was not specified for tmpfs. Okay, so installing a guard entry might require punshing a hole to get rid of any already-existing memory. But readahead (below) might mess it up. > > If we perform synchronous readahead prior to a guard region that includes > (partially or fully) a guard region we might major fault entries into the > page cache that are then not accessable _from that mapping_, this is rather > unavoidable as this doesn't account for page table mappings and should be > largely trivial overhead (also these folios are reclaimable). Right, that's what I had in mind: assume I have a single marker in a PMD, shmem might allocate a PMD THP to back that region, ignoring the marker hint. (so I think) Thanks!
On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: > On 18.02.25 17:12, Lorenzo Stoakes wrote: > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: > > > On 13.02.25 19:17, Lorenzo Stoakes wrote: > > > > There is no reason to disallow guard regions in file-backed mappings - > > > > readahead and fault-around both function correctly in the presence of PTE > > > > markers, equally other operations relating to memory-mapped files function > > > > correctly. > > > > > > > > Additionally, read-only mappings if introducing guard-regions, only > > > > restrict the mapping further, which means there is no violation of any > > > > access rights by permitting this to be so. > > > > > > > > Removing this restriction allows for read-only mapped files (such as > > > > executable files) correctly which would otherwise not be permitted. > > > > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > --- > > > > mm/madvise.c | 8 +------- > > > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > > index 6ecead476a80..e01e93e179a8 100644 > > > > --- a/mm/madvise.c > > > > +++ b/mm/madvise.c > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > > > > if (!allow_locked) > > > > disallowed |= VM_LOCKED; > > > > - if (!vma_is_anonymous(vma)) > > > > - return false; > > > > - > > > > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > > > > - return false; > > > > - > > > > - return true; > > > > + return !(vma->vm_flags & disallowed); > > > > } > > > > static bool is_guard_pte_marker(pte_t ptent) > > > > > > Acked-by: David Hildenbrand <david@redhat.com> > > > > Thanks! > > > > > > > > I assume these markers cannot completely prevent us from allocating > > > pages/folios for these underlying file/pageache ranges of these markers in > > > case of shmem during page faults, right? > > > > If the markers are in place, then page faulting will result in a > > segfault. If we faulted in a shmem page then installed markers (which would > > zap the range), then the page cache will be populated, but obviously > > subject to standard reclaim. > > Well, yes, (a) if there is swap and (b), if the noswap option was not > specified for tmpfs. > Right, yeah if you don't have it set up such that dropping a reference to the folio doesn't drop the page altogether. I think this matches expectation though in that you'd get the same results from an MADV_DONTNEED followed by faulting the page again. > Okay, so installing a guard entry might require punshing a hole to get rid > of any already-existing memory. But readahead (below) might mess it up. Only if you are so concerned about avoiding the page cache being populated there that you want to do this :) Readahead I think will not readahead into a holepunched region as the hole punching extends to the fs layer _I believe_ I have not checked the code for this, but I believe it actually changes the underlying file too right to say 'this part of the file is empty'? (I did explicitly test hole punching with guard regions btw, by the by :) > > > > > If we perform synchronous readahead prior to a guard region that includes > > (partially or fully) a guard region we might major fault entries into the > > page cache that are then not accessable _from that mapping_, this is rather > > unavoidable as this doesn't account for page table mappings and should be > > largely trivial overhead (also these folios are reclaimable). > > Right, that's what I had in mind: assume I have a single marker in a PMD, > shmem might allocate a PMD THP to back that region, ignoring the marker > hint. (so I think) > > Thanks! > > -- > Cheers, > > David / dhildenb > >
On 18.02.25 17:21, Lorenzo Stoakes wrote: > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: >> On 18.02.25 17:12, Lorenzo Stoakes wrote: >>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: >>>> On 13.02.25 19:17, Lorenzo Stoakes wrote: >>>>> There is no reason to disallow guard regions in file-backed mappings - >>>>> readahead and fault-around both function correctly in the presence of PTE >>>>> markers, equally other operations relating to memory-mapped files function >>>>> correctly. >>>>> >>>>> Additionally, read-only mappings if introducing guard-regions, only >>>>> restrict the mapping further, which means there is no violation of any >>>>> access rights by permitting this to be so. >>>>> >>>>> Removing this restriction allows for read-only mapped files (such as >>>>> executable files) correctly which would otherwise not be permitted. >>>>> >>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>>>> --- >>>>> mm/madvise.c | 8 +------- >>>>> 1 file changed, 1 insertion(+), 7 deletions(-) >>>>> >>>>> diff --git a/mm/madvise.c b/mm/madvise.c >>>>> index 6ecead476a80..e01e93e179a8 100644 >>>>> --- a/mm/madvise.c >>>>> +++ b/mm/madvise.c >>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) >>>>> if (!allow_locked) >>>>> disallowed |= VM_LOCKED; >>>>> - if (!vma_is_anonymous(vma)) >>>>> - return false; >>>>> - >>>>> - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) >>>>> - return false; >>>>> - >>>>> - return true; >>>>> + return !(vma->vm_flags & disallowed); >>>>> } >>>>> static bool is_guard_pte_marker(pte_t ptent) >>>> >>>> Acked-by: David Hildenbrand <david@redhat.com> >>> >>> Thanks! >>> >>>> >>>> I assume these markers cannot completely prevent us from allocating >>>> pages/folios for these underlying file/pageache ranges of these markers in >>>> case of shmem during page faults, right? >>> >>> If the markers are in place, then page faulting will result in a >>> segfault. If we faulted in a shmem page then installed markers (which would >>> zap the range), then the page cache will be populated, but obviously >>> subject to standard reclaim. >> >> Well, yes, (a) if there is swap and (b), if the noswap option was not >> specified for tmpfs. >> > > Right, yeah if you don't have it set up such that dropping a reference to the > folio doesn't drop the page altogether. > > I think this matches expectation though in that you'd get the same results from > an MADV_DONTNEED followed by faulting the page again. It might make sense to document that: installing a guard behaves just like MADV_DONTNEED; in case of a file, that means that the pagecache is left untouched. > >> Okay, so installing a guard entry might require punshing a hole to get rid >> of any already-existing memory. But readahead (below) might mess it up. > > Only if you are so concerned about avoiding the page cache being populated there > that you want to do this :) > > Readahead I think will not readahead into a holepunched region as the hole > punching extends to the fs layer _I believe_ I have not checked the code for > this, but I believe it actually changes the underlying file too right to say > 'this part of the file is empty'? Well, we are talking about shmem here ... not your ordinary fs backed by an actual file :)
On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote: > On 18.02.25 17:21, Lorenzo Stoakes wrote: > > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: > > > On 18.02.25 17:12, Lorenzo Stoakes wrote: > > > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: > > > > > On 13.02.25 19:17, Lorenzo Stoakes wrote: > > > > > > There is no reason to disallow guard regions in file-backed mappings - > > > > > > readahead and fault-around both function correctly in the presence of PTE > > > > > > markers, equally other operations relating to memory-mapped files function > > > > > > correctly. > > > > > > > > > > > > Additionally, read-only mappings if introducing guard-regions, only > > > > > > restrict the mapping further, which means there is no violation of any > > > > > > access rights by permitting this to be so. > > > > > > > > > > > > Removing this restriction allows for read-only mapped files (such as > > > > > > executable files) correctly which would otherwise not be permitted. > > > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > > --- > > > > > > mm/madvise.c | 8 +------- > > > > > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > > > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > > > > index 6ecead476a80..e01e93e179a8 100644 > > > > > > --- a/mm/madvise.c > > > > > > +++ b/mm/madvise.c > > > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > > > > > > if (!allow_locked) > > > > > > disallowed |= VM_LOCKED; > > > > > > - if (!vma_is_anonymous(vma)) > > > > > > - return false; > > > > > > - > > > > > > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > > > > > > - return false; > > > > > > - > > > > > > - return true; > > > > > > + return !(vma->vm_flags & disallowed); > > > > > > } > > > > > > static bool is_guard_pte_marker(pte_t ptent) > > > > > > > > > > Acked-by: David Hildenbrand <david@redhat.com> > > > > > > > > Thanks! > > > > > > > > > > > > > > I assume these markers cannot completely prevent us from allocating > > > > > pages/folios for these underlying file/pageache ranges of these markers in > > > > > case of shmem during page faults, right? > > > > > > > > If the markers are in place, then page faulting will result in a > > > > segfault. If we faulted in a shmem page then installed markers (which would > > > > zap the range), then the page cache will be populated, but obviously > > > > subject to standard reclaim. > > > > > > Well, yes, (a) if there is swap and (b), if the noswap option was not > > > specified for tmpfs. > > > > > > > Right, yeah if you don't have it set up such that dropping a reference to the > > folio doesn't drop the page altogether. > > > > I think this matches expectation though in that you'd get the same results from > > an MADV_DONTNEED followed by faulting the page again. > > It might make sense to document that: installing a guard behaves just like > MADV_DONTNEED; in case of a file, that means that the pagecache is left > untouched. More docs noooo! :P I will update the man pages when this is more obviously heading for landing in 6.15 accordingly. Current man page documentation on this is: 'If the region maps memory pages those mappings will be replaced as part of the operation' I think something like: 'If the region maps pages those mappings will be replaced as part of the operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting in the page will behave as if that region had MADV_DONTNEED applied to it, that is anonymous ranges will be backed by newly allocated zeroed pages and file-backed ranges will be backed by the underlying file pages.' Probably something less wordy than this... > > > > > > Okay, so installing a guard entry might require punshing a hole to get rid > > > of any already-existing memory. But readahead (below) might mess it up. > > > > Only if you are so concerned about avoiding the page cache being populated there > > that you want to do this :) > > > > Readahead I think will not readahead into a holepunched region as the hole > > punching extends to the fs layer _I believe_ I have not checked the code for > > this, but I believe it actually changes the underlying file too right to say > > 'this part of the file is empty'? > > Well, we are talking about shmem here ... not your ordinary fs backed by an > actual file :) I am talking about both, I multitask ;) > > -- > Cheers, > > David / dhildenb >
On 18.02.25 17:49, Lorenzo Stoakes wrote: > On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote: >> On 18.02.25 17:21, Lorenzo Stoakes wrote: >>> On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: >>>> On 18.02.25 17:12, Lorenzo Stoakes wrote: >>>>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: >>>>>> On 13.02.25 19:17, Lorenzo Stoakes wrote: >>>>>>> There is no reason to disallow guard regions in file-backed mappings - >>>>>>> readahead and fault-around both function correctly in the presence of PTE >>>>>>> markers, equally other operations relating to memory-mapped files function >>>>>>> correctly. >>>>>>> >>>>>>> Additionally, read-only mappings if introducing guard-regions, only >>>>>>> restrict the mapping further, which means there is no violation of any >>>>>>> access rights by permitting this to be so. >>>>>>> >>>>>>> Removing this restriction allows for read-only mapped files (such as >>>>>>> executable files) correctly which would otherwise not be permitted. >>>>>>> >>>>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>>>>>> --- >>>>>>> mm/madvise.c | 8 +------- >>>>>>> 1 file changed, 1 insertion(+), 7 deletions(-) >>>>>>> >>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c >>>>>>> index 6ecead476a80..e01e93e179a8 100644 >>>>>>> --- a/mm/madvise.c >>>>>>> +++ b/mm/madvise.c >>>>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) >>>>>>> if (!allow_locked) >>>>>>> disallowed |= VM_LOCKED; >>>>>>> - if (!vma_is_anonymous(vma)) >>>>>>> - return false; >>>>>>> - >>>>>>> - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) >>>>>>> - return false; >>>>>>> - >>>>>>> - return true; >>>>>>> + return !(vma->vm_flags & disallowed); >>>>>>> } >>>>>>> static bool is_guard_pte_marker(pte_t ptent) >>>>>> >>>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>>> >>>>> Thanks! >>>>> >>>>>> >>>>>> I assume these markers cannot completely prevent us from allocating >>>>>> pages/folios for these underlying file/pageache ranges of these markers in >>>>>> case of shmem during page faults, right? >>>>> >>>>> If the markers are in place, then page faulting will result in a >>>>> segfault. If we faulted in a shmem page then installed markers (which would >>>>> zap the range), then the page cache will be populated, but obviously >>>>> subject to standard reclaim. >>>> >>>> Well, yes, (a) if there is swap and (b), if the noswap option was not >>>> specified for tmpfs. >>>> >>> >>> Right, yeah if you don't have it set up such that dropping a reference to the >>> folio doesn't drop the page altogether. >>> >>> I think this matches expectation though in that you'd get the same results from >>> an MADV_DONTNEED followed by faulting the page again. >> >> It might make sense to document that: installing a guard behaves just like >> MADV_DONTNEED; in case of a file, that means that the pagecache is left >> untouched. > > More docs noooo! :P I will update the man pages when this is more obviously > heading for landing in 6.15 accordingly. > > Current man page documentation on this is: > > 'If the region maps memory pages those mappings will be replaced as part of > the operation' > > I think something like: > > 'If the region maps pages those mappings will be replaced as part of the > operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting > in the page will behave as if that region had MADV_DONTNEED applied to it, > that is anonymous ranges will be backed by newly allocated zeroed pages and > file-backed ranges will be backed by the underlying file pages.' > > Probably something less wordy than this... Yeah, but sounds good to me. > >> >>> >>>> Okay, so installing a guard entry might require punshing a hole to get rid >>>> of any already-existing memory. But readahead (below) might mess it up. >>> >>> Only if you are so concerned about avoiding the page cache being populated there >>> that you want to do this :) >>> >>> Readahead I think will not readahead into a holepunched region as the hole >>> punching extends to the fs layer _I believe_ I have not checked the code for >>> this, but I believe it actually changes the underlying file too right to say >>> 'this part of the file is empty'? >> >> Well, we are talking about shmem here ... not your ordinary fs backed by an >> actual file :) > > I am talking about both, I multitask ;) For !shmem, we should indeed not be messing with a sparse file structure.
On Tue, Feb 18, 2025 at 06:00:36PM +0100, David Hildenbrand wrote: > On 18.02.25 17:49, Lorenzo Stoakes wrote: > > On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote: > > > On 18.02.25 17:21, Lorenzo Stoakes wrote: > > > > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: > > > > > On 18.02.25 17:12, Lorenzo Stoakes wrote: > > > > > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: > > > > > > > On 13.02.25 19:17, Lorenzo Stoakes wrote: > > > > > > > > There is no reason to disallow guard regions in file-backed mappings - > > > > > > > > readahead and fault-around both function correctly in the presence of PTE > > > > > > > > markers, equally other operations relating to memory-mapped files function > > > > > > > > correctly. > > > > > > > > > > > > > > > > Additionally, read-only mappings if introducing guard-regions, only > > > > > > > > restrict the mapping further, which means there is no violation of any > > > > > > > > access rights by permitting this to be so. > > > > > > > > > > > > > > > > Removing this restriction allows for read-only mapped files (such as > > > > > > > > executable files) correctly which would otherwise not be permitted. > > > > > > > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > > > > --- > > > > > > > > mm/madvise.c | 8 +------- > > > > > > > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > > > > > > > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > > > > > > index 6ecead476a80..e01e93e179a8 100644 > > > > > > > > --- a/mm/madvise.c > > > > > > > > +++ b/mm/madvise.c > > > > > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > > > > > > > > if (!allow_locked) > > > > > > > > disallowed |= VM_LOCKED; > > > > > > > > - if (!vma_is_anonymous(vma)) > > > > > > > > - return false; > > > > > > > > - > > > > > > > > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > > > > > > > > - return false; > > > > > > > > - > > > > > > > > - return true; > > > > > > > > + return !(vma->vm_flags & disallowed); > > > > > > > > } > > > > > > > > static bool is_guard_pte_marker(pte_t ptent) > > > > > > > > > > > > > > Acked-by: David Hildenbrand <david@redhat.com> > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > I assume these markers cannot completely prevent us from allocating > > > > > > > pages/folios for these underlying file/pageache ranges of these markers in > > > > > > > case of shmem during page faults, right? > > > > > > > > > > > > If the markers are in place, then page faulting will result in a > > > > > > segfault. If we faulted in a shmem page then installed markers (which would > > > > > > zap the range), then the page cache will be populated, but obviously > > > > > > subject to standard reclaim. > > > > > > > > > > Well, yes, (a) if there is swap and (b), if the noswap option was not > > > > > specified for tmpfs. > > > > > > > > > > > > > Right, yeah if you don't have it set up such that dropping a reference to the > > > > folio doesn't drop the page altogether. > > > > > > > > I think this matches expectation though in that you'd get the same results from > > > > an MADV_DONTNEED followed by faulting the page again. > > > > > > It might make sense to document that: installing a guard behaves just like > > > MADV_DONTNEED; in case of a file, that means that the pagecache is left > > > untouched. > > > > More docs noooo! :P I will update the man pages when this is more obviously > > heading for landing in 6.15 accordingly. > > > > Current man page documentation on this is: > > > > 'If the region maps memory pages those mappings will be replaced as part of > > the operation' > > > > I think something like: > > > > 'If the region maps pages those mappings will be replaced as part of the > > operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting > > in the page will behave as if that region had MADV_DONTNEED applied to it, > > that is anonymous ranges will be backed by newly allocated zeroed pages and > > file-backed ranges will be backed by the underlying file pages.' > > > > Probably something less wordy than this... > > Yeah, but sounds good to me. > > > > > > > > > > > > > > > Okay, so installing a guard entry might require punshing a hole to get rid > > > > > of any already-existing memory. But readahead (below) might mess it up. > > > > > > > > Only if you are so concerned about avoiding the page cache being populated there > > > > that you want to do this :) > > > > > > > > Readahead I think will not readahead into a holepunched region as the hole > > > > punching extends to the fs layer _I believe_ I have not checked the code for > > > > this, but I believe it actually changes the underlying file too right to say > > > > 'this part of the file is empty'? > > > > > > Well, we are talking about shmem here ... not your ordinary fs backed by an > > > actual file :) > > > > I am talking about both, I multitask ;) > > For !shmem, we should indeed not be messing with a sparse file structure. Ah right I get you, sorry missed your point here. Yeah MADV_REMOVE for shmem will just explicitly drop the pages which doesn't have any underlying file to impact as with actually file-backed memory. > > > -- > Cheers, > > David / dhildenb >
diff --git a/mm/madvise.c b/mm/madvise.c index 6ecead476a80..e01e93e179a8 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) if (!allow_locked) disallowed |= VM_LOCKED; - if (!vma_is_anonymous(vma)) - return false; - - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) - return false; - - return true; + return !(vma->vm_flags & disallowed); } static bool is_guard_pte_marker(pte_t ptent)
There is no reason to disallow guard regions in file-backed mappings - readahead and fault-around both function correctly in the presence of PTE markers, equally other operations relating to memory-mapped files function correctly. Additionally, read-only mappings if introducing guard-regions, only restrict the mapping further, which means there is no violation of any access rights by permitting this to be so. Removing this restriction allows for read-only mapped files (such as executable files) correctly which would otherwise not be permitted. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> --- mm/madvise.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)