diff mbox series

[v6,19/26] proc/task_mmu: Mark devdax and fsdax pages as always unpinned

Message ID d7a6c9822ddc945daaf4f9db34d3f2b1c0454447.1736488799.git-series.apopple@nvidia.com
State New
Headers show
Series fs/dax: Fix ZONE_DEVICE page reference counts | expand

Commit Message

Alistair Popple Jan. 10, 2025, 6 a.m. UTC
The procfs mmu files such as smaps and pagemap currently ignore devdax and
fsdax pages because these pages are considered special. A future change
will start treating these as normal pages, meaning they can be exposed via
smaps and pagemap.

The only difference is that devdax and fsdax pages can never be pinned for
DMA via FOLL_LONGTERM, so add an explicit check in pte_is_pinned() to
reflect that.

Signed-off-by: Alistair Popple <apopple@nvidia.com>

---

Changes for v5:

 - After discussion with David remove the checks for DAX pages for
   smaps and pagemap walkers. This means DAX pages will now appear in
   those procfs files.
---
 fs/proc/task_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dan Williams Jan. 14, 2025, 2:28 a.m. UTC | #1
Alistair Popple wrote:
> The procfs mmu files such as smaps and pagemap currently ignore devdax and
> fsdax pages because these pages are considered special. A future change
> will start treating these as normal pages, meaning they can be exposed via
> smaps and pagemap.
> 
> The only difference is that devdax and fsdax pages can never be pinned for
> DMA via FOLL_LONGTERM, so add an explicit check in pte_is_pinned() to
> reflect that.

I don't understand this patch.

pin_user_pages() is also used for Direct-I/O page pinning, so the
comment about FOLL_LONGTERM is wrong, and I otherwise do not understand
what goes wrong if the only pte_is_pinned() user correctly detects the
pin state?
David Hildenbrand Jan. 14, 2025, 4:45 p.m. UTC | #2
On 14.01.25 03:28, Dan Williams wrote:
> Alistair Popple wrote:
>> The procfs mmu files such as smaps and pagemap currently ignore devdax and
>> fsdax pages because these pages are considered special. A future change
>> will start treating these as normal pages, meaning they can be exposed via
>> smaps and pagemap.
>>
>> The only difference is that devdax and fsdax pages can never be pinned for
>> DMA via FOLL_LONGTERM, so add an explicit check in pte_is_pinned() to
>> reflect that.
> 
> I don't understand this patch.


This whole pte_is_pinned() should likely be ripped out (and I have a 
patch here to do that for a long time).

But that's a different discussion.

> 
> pin_user_pages() is also used for Direct-I/O page pinning, so the
> comment about FOLL_LONGTERM is wrong, and I otherwise do not understand
> what goes wrong if the only pte_is_pinned() user correctly detects the
> pin state?

Yes, this patch should likely just be dropped.

Even if folio_maybe_dma_pinned() == true because of "false positives", 
it will behave just like other order-0 pages with false positives, and 
only affect soft-dirty tracking ... which nobody should be caring about 
here at all.

We would always detect the PTE as soft-dirty because we we never
	pte_wrprotect(old_pte)

Yes, nobody should care.
Alistair Popple Jan. 17, 2025, 1:28 a.m. UTC | #3
On Tue, Jan 14, 2025 at 05:45:46PM +0100, David Hildenbrand wrote:
> On 14.01.25 03:28, Dan Williams wrote:
> > Alistair Popple wrote:
> > > The procfs mmu files such as smaps and pagemap currently ignore devdax and
> > > fsdax pages because these pages are considered special. A future change
> > > will start treating these as normal pages, meaning they can be exposed via
> > > smaps and pagemap.
> > > 
> > > The only difference is that devdax and fsdax pages can never be pinned for
> > > DMA via FOLL_LONGTERM, so add an explicit check in pte_is_pinned() to
> > > reflect that.
> > 
> > I don't understand this patch.
> 
> 
> This whole pte_is_pinned() should likely be ripped out (and I have a patch
> here to do that for a long time).

Agreed.
 
> But that's a different discussion.
> 
> > 
> > pin_user_pages() is also used for Direct-I/O page pinning, so the
> > comment about FOLL_LONGTERM is wrong, and I otherwise do not understand
> > what goes wrong if the only pte_is_pinned() user correctly detects the
> > pin state?
> 
> Yes, this patch should likely just be dropped.

Yeah, I think I was just being overly cautious about the change to
vm_normal_page(). Agree this can be dropped. Looking at task_mmu.c there is one
other user of vm_normal_page() - clear_refs_pte_range().

We will start clearing access/referenced bits on DAX PTEs there. But I think
that's actually the right thing to do given we do currently clear them for PMD
mapped DAX pages.

> Even if folio_maybe_dma_pinned() == true because of "false positives", it
> will behave just like other order-0 pages with false positives, and only
> affect soft-dirty tracking ... which nobody should be caring about here at
> all.
> 
> We would always detect the PTE as soft-dirty because we we never
> 	pte_wrprotect(old_pte)
> 
> Yes, nobody should care.
> 
> -- 
> Cheers,
> 
> David / dhildenb
>
diff mbox series

Patch

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 38a5a3e..9a8a7d3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1378,7 +1378,7 @@  static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
 	if (likely(!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)))
 		return false;
 	folio = vm_normal_folio(vma, addr, pte);
-	if (!folio)
+	if (!folio || folio_is_devdax(folio) || folio_is_fsdax(folio))
 		return false;
 	return folio_maybe_dma_pinned(folio);
 }