diff mbox

mm: add ZONE_DEVICE statistics to smaps

Message ID 147881591739.39198.1358237993213024627.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Nov. 10, 2016, 10:11 p.m. UTC
ZONE_DEVICE pages are mapped into a process via the filesystem-dax and
device-dax mechanisms.  There are also proposals to use ZONE_DEVICE
pages for other usages outside of dax.  Add statistics to smaps so
applications can debug that they are obtaining the mappings they expect,
or otherwise accounting them.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/proc/task_mmu.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Anshuman Khandual Nov. 11, 2016, 4:12 a.m. UTC | #1
On 11/11/2016 03:41 AM, Dan Williams wrote:
> ZONE_DEVICE pages are mapped into a process via the filesystem-dax and
> device-dax mechanisms.  There are also proposals to use ZONE_DEVICE
> pages for other usages outside of dax.  Add statistics to smaps so
> applications can debug that they are obtaining the mappings they expect,
> or otherwise accounting them.

This might also help when we will have ZONE_DEVICE based solution for
HMM based device memory.
Dan Williams Nov. 15, 2016, 3:14 a.m. UTC | #2
On Thu, Nov 10, 2016 at 2:11 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> ZONE_DEVICE pages are mapped into a process via the filesystem-dax and
> device-dax mechanisms.  There are also proposals to use ZONE_DEVICE
> pages for other usages outside of dax.  Add statistics to smaps so
> applications can debug that they are obtaining the mappings they expect,
> or otherwise accounting them.
>
> Cc: Christoph Hellwig <hch@lst.de>

Christoph,

Wanted to get your opinion on this given your earlier concerns about
the VM_DAX flag.

This instead lets an application know how much of a vma is backed by
ZONE_DEVICE pages, but does not make any indications about the vma
having DAX semantics or not.  I.e. it is possible that 'device' and
'device_huge' are non-zero *and* vma_is_dax() is false.  So, it is
purely accounting the composition of the present pages in the vma.

Another option is to have something like 'shared_thp' just to account
for file backed huge pages that dax can map.  However if ZONE_DEVICE
is leaking into other use cases I think it makes sense to have it be a
first class-citizen with respect to accounting alongside
'anonymous_thp'.

> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/proc/task_mmu.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 35b92d81692f..6765cafcf057 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -445,6 +445,8 @@ struct mem_size_stats {
>         unsigned long swap;
>         unsigned long shared_hugetlb;
>         unsigned long private_hugetlb;
> +       unsigned long device;
> +       unsigned long device_huge;
>         u64 pss;
>         u64 swap_pss;
>         bool check_shmem_swap;
> @@ -458,6 +460,8 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
>
>         if (PageAnon(page))
>                 mss->anonymous += size;
> +       else if (is_zone_device_page(page))
> +               mss->device += size;
>
>         mss->resident += size;
>         /* Accumulate the size in pages that have been accessed. */
> @@ -575,7 +579,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>         else if (PageSwapBacked(page))
>                 mss->shmem_thp += HPAGE_PMD_SIZE;
>         else if (is_zone_device_page(page))
> -               /* pass */;
> +               mss->device_huge += HPAGE_PMD_SIZE;
>         else
>                 VM_BUG_ON_PAGE(1, page);
>         smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd));
> @@ -774,6 +778,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>                    "ShmemPmdMapped: %8lu kB\n"
>                    "Shared_Hugetlb: %8lu kB\n"
>                    "Private_Hugetlb: %7lu kB\n"
> +                  "Device:         %8lu kB\n"
> +                  "DeviceHugePages: %7lu kB\n"
>                    "Swap:           %8lu kB\n"
>                    "SwapPss:        %8lu kB\n"
>                    "KernelPageSize: %8lu kB\n"
> @@ -792,6 +798,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>                    mss.shmem_thp >> 10,
>                    mss.shared_hugetlb >> 10,
>                    mss.private_hugetlb >> 10,
> +                  mss.device >> 10,
> +                  mss.device_huge >> 10,
>                    mss.swap >> 10,
>                    (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
>                    vma_kernel_pagesize(vma) >> 10,
>
Christoph Hellwig Nov. 15, 2016, 6:50 p.m. UTC | #3
Hi Dan,

On Mon, Nov 14, 2016 at 07:14:22PM -0800, Dan Williams wrote:
> Wanted to get your opinion on this given your earlier concerns about
> the VM_DAX flag.
> 
> This instead lets an application know how much of a vma is backed by
> ZONE_DEVICE pages, but does not make any indications about the vma
> having DAX semantics or not.  I.e. it is possible that 'device' and
> 'device_huge' are non-zero *and* vma_is_dax() is false.  So, it is
> purely accounting the composition of the present pages in the vma.
> 
> Another option is to have something like 'shared_thp' just to account
> for file backed huge pages that dax can map.  However if ZONE_DEVICE
> is leaking into other use cases I think it makes sense to have it be a
> first class-citizen with respect to accounting alongside
> 'anonymous_thp'.

This counter sounds fine to me, it's a debug tool and not an obvious
abuse candidate like VM_DAX.  But I'll defer to the VM folks for a real
review.
Andrew Morton Nov. 16, 2016, 12:04 a.m. UTC | #4
On Thu, 10 Nov 2016 14:11:57 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> ZONE_DEVICE pages are mapped into a process via the filesystem-dax and
> device-dax mechanisms.  There are also proposals to use ZONE_DEVICE
> pages for other usages outside of dax.  Add statistics to smaps so
> applications can debug that they are obtaining the mappings they expect,
> or otherwise accounting them.
> 
> ...
>
>  fs/proc/task_mmu.c |   10 +++++++++-

Please update Documentation/filesystems/proc.txt.

(While there, please check to see if anything else is missed?)
Dave Hansen Nov. 16, 2016, 12:15 a.m. UTC | #5
On 11/10/2016 02:11 PM, Dan Williams wrote:
> @@ -774,6 +778,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   "ShmemPmdMapped: %8lu kB\n"
>  		   "Shared_Hugetlb: %8lu kB\n"
>  		   "Private_Hugetlb: %7lu kB\n"
> +		   "Device:         %8lu kB\n"
> +		   "DeviceHugePages: %7lu kB\n"
>  		   "Swap:           %8lu kB\n"
>  		   "SwapPss:        %8lu kB\n"
>  		   "KernelPageSize: %8lu kB\n"

So, a couple of nits...

smaps is getting a bit big, and the fields that get added in this patch
are going to be pretty infrequently used.  Is it OK if smaps grows
forever, even if most of them items are "0 kB"?

IOW, Could we make it output Device* only for DAX VMAs?  All the parsers
have to handle that field being there or not (for old kernels).

The other thing missing for DAX is the page size.  DAX mappings support
mixed page sizes, so MMUPageSize in this context is pretty worthless.
What will we do in here for 1GB DAX pages?
Dan Williams Nov. 16, 2016, 12:48 a.m. UTC | #6
On Tue, Nov 15, 2016 at 4:15 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 11/10/2016 02:11 PM, Dan Williams wrote:
>> @@ -774,6 +778,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>>                  "ShmemPmdMapped: %8lu kB\n"
>>                  "Shared_Hugetlb: %8lu kB\n"
>>                  "Private_Hugetlb: %7lu kB\n"
>> +                "Device:         %8lu kB\n"
>> +                "DeviceHugePages: %7lu kB\n"
>>                  "Swap:           %8lu kB\n"
>>                  "SwapPss:        %8lu kB\n"
>>                  "KernelPageSize: %8lu kB\n"
>
> So, a couple of nits...
>
> smaps is getting a bit big, and the fields that get added in this patch
> are going to be pretty infrequently used.  Is it OK if smaps grows
> forever, even if most of them items are "0 kB"?
>
> IOW, Could we make it output Device* only for DAX VMAs?  All the parsers
> have to handle that field being there or not (for old kernels).

How about just hiding the field if it is zero?  That way it's not an
backdoor way to leak vma_is_dax() which was Christoph's concern.

> The other thing missing for DAX is the page size.  DAX mappings support
> mixed page sizes, so MMUPageSize in this context is pretty worthless.
> What will we do in here for 1GB DAX pages?

I was thinking that would be yet another field "DeviceGiganticPages?"
when we eventually add 1GB support (not there today).
diff mbox

Patch

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 35b92d81692f..6765cafcf057 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -445,6 +445,8 @@  struct mem_size_stats {
 	unsigned long swap;
 	unsigned long shared_hugetlb;
 	unsigned long private_hugetlb;
+	unsigned long device;
+	unsigned long device_huge;
 	u64 pss;
 	u64 swap_pss;
 	bool check_shmem_swap;
@@ -458,6 +460,8 @@  static void smaps_account(struct mem_size_stats *mss, struct page *page,
 
 	if (PageAnon(page))
 		mss->anonymous += size;
+	else if (is_zone_device_page(page))
+		mss->device += size;
 
 	mss->resident += size;
 	/* Accumulate the size in pages that have been accessed. */
@@ -575,7 +579,7 @@  static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	else if (PageSwapBacked(page))
 		mss->shmem_thp += HPAGE_PMD_SIZE;
 	else if (is_zone_device_page(page))
-		/* pass */;
+		mss->device_huge += HPAGE_PMD_SIZE;
 	else
 		VM_BUG_ON_PAGE(1, page);
 	smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd));
@@ -774,6 +778,8 @@  static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "ShmemPmdMapped: %8lu kB\n"
 		   "Shared_Hugetlb: %8lu kB\n"
 		   "Private_Hugetlb: %7lu kB\n"
+		   "Device:         %8lu kB\n"
+		   "DeviceHugePages: %7lu kB\n"
 		   "Swap:           %8lu kB\n"
 		   "SwapPss:        %8lu kB\n"
 		   "KernelPageSize: %8lu kB\n"
@@ -792,6 +798,8 @@  static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   mss.shmem_thp >> 10,
 		   mss.shared_hugetlb >> 10,
 		   mss.private_hugetlb >> 10,
+		   mss.device >> 10,
+		   mss.device_huge >> 10,
 		   mss.swap >> 10,
 		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
 		   vma_kernel_pagesize(vma) >> 10,