[v2,4/6] fs: report per-mount io stats

Message ID: 20220228113910.1727819-5-amir73il@gmail.com
State: New, archived
Series: Generic per-mount io stats

Commit Message

Amir Goldstein Feb. 28, 2022, 11:39 a.m. UTC
Show optionally collected per-mount io stats in /proc/<pid>/mountstats
for filesystems that do not implement their own show_stats() method
and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/mount.h          |  1 +
 fs/namespace.c      |  2 ++
 fs/proc_namespace.c | 13 +++++++++++++
 3 files changed, 16 insertions(+)
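
For illustration, a filesystem would opt in roughly like this (foofs is a
hypothetical example, and this assumes FS_MOUNT_STATS is an fs_flags bit,
as described above):

	static struct file_system_type foofs_type = {
		.owner		= THIS_MODULE,
		.name		= "foofs",
		.init_fs_context = foofs_init_fs_context,
		.kill_sb	= kill_anon_super,
		/* opt in to generic per-mount io stats */
		.fs_flags	= FS_MOUNT_STATS,
	};

Since foofs has no show_stats() method of its own, the generic stats are
shown for its mounts in /proc/<pid>/mountstats.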

Comments

Miklos Szeredi Feb. 28, 2022, 3:06 p.m. UTC | #1
On Mon, 28 Feb 2022 at 12:39, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> for filesystems that do not implement their own show_stats() method
> and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.

This would allow some filesystems to report per-mount I/O stats, while
leaving CIFS and NFS reporting a different set of per-sb stats.  This
doesn't sound very clean.

There was an effort to create saner and more efficient interfaces for
per-mount info.  IMO this should be part of that effort instead of
overloading the old interface.

Thanks,
Miklos
Amir Goldstein Feb. 28, 2022, 4:18 p.m. UTC | #2
On Mon, Feb 28, 2022 at 5:06 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 28 Feb 2022 at 12:39, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> > for filesystems that do not implement their own show_stats() method
> > and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
>
> This would allow some filesystems to report per-mount I/O stats, while
> leaving CIFS and NFS reporting a different set of per-sb stats.  This
> doesn't sound very clean.
>
> There was an effort to create saner and more efficient interfaces for
> per-mount info.  IMO this should be part of that effort instead of
> overloading the old interface.
>

That's fair, but actually, I don't have much need for per-mount I/O stats
in the overlayfs/fuse use cases, so I could amend the patches to collect and
show per-sb I/O stats.

Then, the generic show_stats() will not be "overloading the old interface".
Instead, it will be creating a common implementation to share among different
filesystems and using an existing vfs interface as it was intended.

Would you be willing to accept adding per-sb I/O stats to overlayfs
and/or fuse via /proc/<pid>/mountstats?

Thanks,
Amir.
Miklos Szeredi Feb. 28, 2022, 4:31 p.m. UTC | #3
On Mon, 28 Feb 2022 at 17:19, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Feb 28, 2022 at 5:06 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Mon, 28 Feb 2022 at 12:39, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> > > for filesystems that do not implement their own show_stats() method
> > > and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
> >
> > This would allow some filesystems to report per-mount I/O stats, while
> > leaving CIFS and NFS reporting a different set of per-sb stats.  This
> > doesn't sound very clean.
> >
> > There was an effort to create saner and more efficient interfaces for
> > per-mount info.  IMO this should be part of that effort instead of
> > overloading the old interface.
> >
>
> That's fair, but actually, I don't have much need for per-mount I/O stats
> in the overlayfs/fuse use cases, so I could amend the patches to collect and
> show per-sb I/O stats.
>
> Then, the generic show_stats() will not be "overloading the old interface".
> Instead, it will be creating a common implementation to share among different
> filesystems and using an existing vfs interface as it was intended.
>
> Would you be willing to accept adding per-sb I/O stats to overlayfs
> and/or fuse via /proc/<pid>/mountstats?

Yes, that would certainly be more sane. But I'm also wondering if
there could be some commonality with the stats provided by NFS...

Thanks,
Miklos
Amir Goldstein Feb. 28, 2022, 5:06 p.m. UTC | #4
On Mon, Feb 28, 2022 at 6:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 28 Feb 2022 at 17:19, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Feb 28, 2022 at 5:06 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Mon, 28 Feb 2022 at 12:39, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> > > > for filesystems that do not implement their own show_stats() method
> > > > and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
> > >
> > > This would allow some filesystems to report per-mount I/O stats, while
> > > leaving CIFS and NFS reporting a different set of per-sb stats.  This
> > > doesn't sound very clean.
> > >
> > > There was an effort to create saner and more efficient interfaces for
> > > per-mount info.  IMO this should be part of that effort instead of
> > > overloading the old interface.
> > >
> >
> > That's fair, but actually, I don't have much need for per-mount I/O stats
> > in the overlayfs/fuse use cases, so I could amend the patches to collect and
> > show per-sb I/O stats.
> >
> > Then, the generic show_stats() will not be "overloading the old interface".
> > Instead, it will be creating a common implementation to share among different
> > filesystems and using an existing vfs interface as it was intended.
> >
> > Would you be willing to accept adding per-sb I/O stats to overlayfs
> > and/or fuse via /proc/<pid>/mountstats?
>
> Yes, that would certainly be more sane. But I'm also wondering if
> there could be some commonality with the stats provided by NFS...
>

Hmm, maybe, but look:

device genesis:/shares/media mounted on /mnt/nfs with fstype nfs4 statvers=1.1
opts: rw,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.1,local_lock=none
age: 38
impl_id: name='',domain='',date='0,0'
caps: caps=0x3ffbffff,wtmult=512,dtsize=32768,bsize=0,namlen=255
nfsv4: bm0=0xfdffbfff,bm1=0xf9be3e,bm2=0x68800,acl=0x3,sessions,pnfs=not configured,lease_time=90,lease_expired=0
sec: flavor=1,pseudoflavor=1
events: 0 0 0 0 0 2 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bytes: 0 0 0 0 0 0 0 0
RPC iostats version: 1.1  p/v: 100003/4 (nfs)
xprt: tcp 0 0 2 0 38 27 27 0 27 0 2 0 0
per-op statistics
       NULL: 1 1 0 44 24 0 0 0 0
       READ: 0 0 0 0 0 0 0 0 0
       WRITE: 0 0 0 0 0 0 0 0 0
       COMMIT: 0 0 0 0 0 0 0 0 0
       OPEN: 0 0 0 0 0 0 0 0 0
       OPEN_CONFIRM: 0 0 0 0 0 0 0 0 0
       OPEN_NOATTR: 0 0 0 0 0 0 0 0 0
       OPEN_DOWNGRADE: 0 0 0 0 0 0 0 0 0
       CLOSE: 0 0 0 0 0 0 0 0 0
       ...

Those stats are pretty specific to NFS.
Most of them are RPC protocol stats, not including cached I/O stats,
so in fact the generic collected io stats could be added to nfs_show_stats()
without breaking the <tag>:<value> format.

I used the same output format as /proc/<pid>/io for a reason.
The iotop utility parses /proc/<pid>/io to display I/O per task and
also displays total I/O stats for block devices.

I am envisioning an extended version of iotop (-m ?) that also
displays total I/O stats per mount/sb, when available.
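
For illustration, a mountstats entry with the generic stats could look
like this (hypothetical values, following the seq_printf() format in
this patch):

device overlay mounted on /mnt/ovl with fstype overlay
	times: 1646038748 1646042348
	rchar: 8192
	wchar: 4096
	syscr: 12
	syscw: 4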

Thanks,
Amir.
Dave Chinner Feb. 28, 2022, 9:11 p.m. UTC | #5
On Mon, Feb 28, 2022 at 01:39:08PM +0200, Amir Goldstein wrote:
> Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> for filesystems that do not implement their own show_stats() method
> and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/mount.h          |  1 +
>  fs/namespace.c      |  2 ++
>  fs/proc_namespace.c | 13 +++++++++++++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index f98bf4cd5b1a..2ab6308af78b 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -91,6 +91,7 @@ struct mount {
>  	int mnt_id;			/* mount identifier */
>  	int mnt_group_id;		/* peer group identifier */
>  	int mnt_expiry_mark;		/* true if marked for expiry */
> +	time64_t mnt_time;		/* time of mount */
>  	struct hlist_head mnt_pins;
>  	struct hlist_head mnt_stuck_children;
>  } __randomize_layout;
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 3fb8f11a42a1..546f07ed44c5 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -220,6 +220,8 @@ static struct mount *alloc_vfsmnt(const char *name)
>  		mnt->mnt_count = 1;
>  		mnt->mnt_writers = 0;
>  #endif
> +		/* For proc/<pid>/mountstats */
> +		mnt->mnt_time = ktime_get_seconds();
>  
>  		INIT_HLIST_NODE(&mnt->mnt_hash);
>  		INIT_LIST_HEAD(&mnt->mnt_child);
> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> index 49650e54d2f8..d744fb8543f5 100644
> --- a/fs/proc_namespace.c
> +++ b/fs/proc_namespace.c
> @@ -232,6 +232,19 @@ static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt)
>  	if (sb->s_op->show_stats) {
>  		seq_putc(m, ' ');
>  		err = sb->s_op->show_stats(m, mnt_path.dentry);
> +	} else if (mnt_has_stats(mnt)) {
> +		/* Similar to /proc/<pid>/io */
> +		seq_printf(m, "\n"
> +			   "\ttimes: %lld %lld\n"
> +			   "\trchar: %lld\n"
> +			   "\twchar: %lld\n"
> +			   "\tsyscr: %lld\n"
> +			   "\tsyscw: %lld\n",
> +			   r->mnt_time, ktime_get_seconds(),
> +			   mnt_iostats_counter_read(r, MNTIOS_CHARS_RD),
> +			   mnt_iostats_counter_read(r, MNTIOS_CHARS_WR),
> +			   mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD),
> +			   mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR));

This doesn't scale as {cpus, mounts, counters, read frequency}
matrix explodes.  Please iterate the per-mount per cpu counters
once, adding up all counters in one pass to an array on stack, then
print them all from the array.
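
For illustration, the one-pass version could look roughly like this (the
per-cpu struct and field names here are assumptions, not taken from this
series):

	s64 sum[MNTIOS_COUNTERS_NUM] = { 0 };
	int cpu, i;

	for_each_possible_cpu(cpu) {
		/* hypothetical per-cpu stats struct for this mount */
		struct mnt_iostats *stats = per_cpu_ptr(r->mnt_iostats, cpu);

		for (i = 0; i < MNTIOS_COUNTERS_NUM; i++)
			sum[i] += stats->counter[i];
	}

	/* then print each summed counter once, e.g.: */
	seq_printf(m, "\trchar: %lld\n", sum[MNTIOS_CHARS_RD]);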

Cheers,

Dave.
Amir Goldstein Feb. 28, 2022, 9:57 p.m. UTC | #6
On Mon, Feb 28, 2022 at 11:11 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Feb 28, 2022 at 01:39:08PM +0200, Amir Goldstein wrote:
> > Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> > for filesystems that do not implement their own show_stats() method
> > and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/mount.h          |  1 +
> >  fs/namespace.c      |  2 ++
> >  fs/proc_namespace.c | 13 +++++++++++++
> >  3 files changed, 16 insertions(+)
> >
> > diff --git a/fs/mount.h b/fs/mount.h
> > index f98bf4cd5b1a..2ab6308af78b 100644
> > --- a/fs/mount.h
> > +++ b/fs/mount.h
> > @@ -91,6 +91,7 @@ struct mount {
> >       int mnt_id;                     /* mount identifier */
> >       int mnt_group_id;               /* peer group identifier */
> >       int mnt_expiry_mark;            /* true if marked for expiry */
> > +     time64_t mnt_time;              /* time of mount */
> >       struct hlist_head mnt_pins;
> >       struct hlist_head mnt_stuck_children;
> >  } __randomize_layout;
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 3fb8f11a42a1..546f07ed44c5 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -220,6 +220,8 @@ static struct mount *alloc_vfsmnt(const char *name)
> >               mnt->mnt_count = 1;
> >               mnt->mnt_writers = 0;
> >  #endif
> > +             /* For proc/<pid>/mountstats */
> > +             mnt->mnt_time = ktime_get_seconds();
> >
> >               INIT_HLIST_NODE(&mnt->mnt_hash);
> >               INIT_LIST_HEAD(&mnt->mnt_child);
> > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> > index 49650e54d2f8..d744fb8543f5 100644
> > --- a/fs/proc_namespace.c
> > +++ b/fs/proc_namespace.c
> > @@ -232,6 +232,19 @@ static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt)
> >       if (sb->s_op->show_stats) {
> >               seq_putc(m, ' ');
> >               err = sb->s_op->show_stats(m, mnt_path.dentry);
> > +     } else if (mnt_has_stats(mnt)) {
> > +             /* Similar to /proc/<pid>/io */
> > +             seq_printf(m, "\n"
> > +                        "\ttimes: %lld %lld\n"
> > +                        "\trchar: %lld\n"
> > +                        "\twchar: %lld\n"
> > +                        "\tsyscr: %lld\n"
> > +                        "\tsyscw: %lld\n",
> > +                        r->mnt_time, ktime_get_seconds(),
> > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_RD),
> > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_WR),
> > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD),
> > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR));
>
> This doesn't scale as {cpus, mounts, counters, read frequency}
> matrix explodes.  Please iterate the per-mount per cpu counters
> once, adding up all counters in one pass to an array on stack, then
> print them all from the array.

I am planning to move to per-sb iostats and was thinking of using
an array of 4 struct percpu_counter. That will make this sort of iteration
more challenging.

Do you really think the read frequency of /proc/self/mountstats
warrants such performance optimization?

It's not like the case of the mighty struct xfsstats.
It is only going to fold 4 per cpu iterations into 1.
This doesn't look like a game changer to me.
Am I missing something?

Thanks,
Amir.
Dave Chinner March 1, 2022, 9:46 a.m. UTC | #7
On Mon, Feb 28, 2022 at 11:57:19PM +0200, Amir Goldstein wrote:
> On Mon, Feb 28, 2022 at 11:11 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Feb 28, 2022 at 01:39:08PM +0200, Amir Goldstein wrote:
> > > Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> > > for filesystems that do not implement their own show_stats() method
> > > and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >  fs/mount.h          |  1 +
> > >  fs/namespace.c      |  2 ++
> > >  fs/proc_namespace.c | 13 +++++++++++++
> > >  3 files changed, 16 insertions(+)
> > >
> > > diff --git a/fs/mount.h b/fs/mount.h
> > > index f98bf4cd5b1a..2ab6308af78b 100644
> > > --- a/fs/mount.h
> > > +++ b/fs/mount.h
> > > @@ -91,6 +91,7 @@ struct mount {
> > >       int mnt_id;                     /* mount identifier */
> > >       int mnt_group_id;               /* peer group identifier */
> > >       int mnt_expiry_mark;            /* true if marked for expiry */
> > > +     time64_t mnt_time;              /* time of mount */
> > >       struct hlist_head mnt_pins;
> > >       struct hlist_head mnt_stuck_children;
> > >  } __randomize_layout;
> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index 3fb8f11a42a1..546f07ed44c5 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -220,6 +220,8 @@ static struct mount *alloc_vfsmnt(const char *name)
> > >               mnt->mnt_count = 1;
> > >               mnt->mnt_writers = 0;
> > >  #endif
> > > +             /* For proc/<pid>/mountstats */
> > > +             mnt->mnt_time = ktime_get_seconds();
> > >
> > >               INIT_HLIST_NODE(&mnt->mnt_hash);
> > >               INIT_LIST_HEAD(&mnt->mnt_child);
> > > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> > > index 49650e54d2f8..d744fb8543f5 100644
> > > --- a/fs/proc_namespace.c
> > > +++ b/fs/proc_namespace.c
> > > @@ -232,6 +232,19 @@ static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt)
> > >       if (sb->s_op->show_stats) {
> > >               seq_putc(m, ' ');
> > >               err = sb->s_op->show_stats(m, mnt_path.dentry);
> > > +     } else if (mnt_has_stats(mnt)) {
> > > +             /* Similar to /proc/<pid>/io */
> > > +             seq_printf(m, "\n"
> > > +                        "\ttimes: %lld %lld\n"
> > > +                        "\trchar: %lld\n"
> > > +                        "\twchar: %lld\n"
> > > +                        "\tsyscr: %lld\n"
> > > +                        "\tsyscw: %lld\n",
> > > +                        r->mnt_time, ktime_get_seconds(),
> > > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_RD),
> > > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_WR),
> > > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD),
> > > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR));
> >
> > This doesn't scale as {cpus, mounts, counters, read frequency}
> > matrix explodes.  Please iterate the per-mount per cpu counters
> > once, adding up all counters in one pass to an array on stack, then
> > print them all from the array.
> 
> I am planning to move to per-sb iostats and was thinking of using
> an array of 4 struct percpu_counter. That will make this sort of iteration
> more challenging.

No, it would get rid of it entirely. percpu_counter_read_positive()
does not require any summing at all - that's a much better solution
than a hand rolled set of percpu counters. Please do this.
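
For example, a per-sb version could look roughly like this (a sketch; the
enum and variable names are illustrative, and each counter would need
percpu_counter_init() at mount time):

	enum { SB_IOS_CHARS_RD, SB_IOS_CHARS_WR,
	       SB_IOS_SYSCALLS_RD, SB_IOS_SYSCALLS_WR, SB_IOS_MAX };

	struct percpu_counter iostats[SB_IOS_MAX];

	/* I/O hot path: cheap per-cpu add */
	percpu_counter_add(&iostats[SB_IOS_CHARS_RD], count);

	/* show_stats(): approximate sum, no per-cpu iteration */
	seq_printf(m, "\trchar: %lld\n",
		   percpu_counter_read_positive(&iostats[SB_IOS_CHARS_RD]));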

> Do you really think the read frequency of /proc/self/mountstats
> warrants such performance optimization?

We get bug reports every so often about the overhead of frequently
summing per-cpu stats on large systems. Nothing ratelimits or
restricts access to /proc/self/mountstats, so when you have a
thousand CPUs and a million monkeys...

Rule of thumb: don't do computationally expensive things to generate
data for globally accessible sysfs files.

> It's not like the case of the mighty struct xfsstats.
> It is only going to fold 4 per cpu iterations into 1.
> This doesn't look like a game changer to me.
> Am I missing something?

I'm just pointing out something we've had problems with in
the past and am trying to help you avoid making the same mistakes.
That's what reviewers are supposed to do, yes?

Cheers,

Dave.
Amir Goldstein March 1, 2022, 10:56 a.m. UTC | #8
On Tue, Mar 1, 2022 at 11:46 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Feb 28, 2022 at 11:57:19PM +0200, Amir Goldstein wrote:
> > On Mon, Feb 28, 2022 at 11:11 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Mon, Feb 28, 2022 at 01:39:08PM +0200, Amir Goldstein wrote:
> > > > Show optionally collected per-mount io stats in /proc/<pid>/mountstats
> > > > for filesystems that do not implement their own show_stats() method
> > > > and have opted in to generic per-mount stats with the FS_MOUNT_STATS flag.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >  fs/mount.h          |  1 +
> > > >  fs/namespace.c      |  2 ++
> > > >  fs/proc_namespace.c | 13 +++++++++++++
> > > >  3 files changed, 16 insertions(+)
> > > >
> > > > diff --git a/fs/mount.h b/fs/mount.h
> > > > index f98bf4cd5b1a..2ab6308af78b 100644
> > > > --- a/fs/mount.h
> > > > +++ b/fs/mount.h
> > > > @@ -91,6 +91,7 @@ struct mount {
> > > >       int mnt_id;                     /* mount identifier */
> > > >       int mnt_group_id;               /* peer group identifier */
> > > >       int mnt_expiry_mark;            /* true if marked for expiry */
> > > > +     time64_t mnt_time;              /* time of mount */
> > > >       struct hlist_head mnt_pins;
> > > >       struct hlist_head mnt_stuck_children;
> > > >  } __randomize_layout;
> > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > index 3fb8f11a42a1..546f07ed44c5 100644
> > > > --- a/fs/namespace.c
> > > > +++ b/fs/namespace.c
> > > > @@ -220,6 +220,8 @@ static struct mount *alloc_vfsmnt(const char *name)
> > > >               mnt->mnt_count = 1;
> > > >               mnt->mnt_writers = 0;
> > > >  #endif
> > > > +             /* For proc/<pid>/mountstats */
> > > > +             mnt->mnt_time = ktime_get_seconds();
> > > >
> > > >               INIT_HLIST_NODE(&mnt->mnt_hash);
> > > >               INIT_LIST_HEAD(&mnt->mnt_child);
> > > > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> > > > index 49650e54d2f8..d744fb8543f5 100644
> > > > --- a/fs/proc_namespace.c
> > > > +++ b/fs/proc_namespace.c
> > > > @@ -232,6 +232,19 @@ static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt)
> > > >       if (sb->s_op->show_stats) {
> > > >               seq_putc(m, ' ');
> > > >               err = sb->s_op->show_stats(m, mnt_path.dentry);
> > > > +     } else if (mnt_has_stats(mnt)) {
> > > > +             /* Similar to /proc/<pid>/io */
> > > > +             seq_printf(m, "\n"
> > > > +                        "\ttimes: %lld %lld\n"
> > > > +                        "\trchar: %lld\n"
> > > > +                        "\twchar: %lld\n"
> > > > +                        "\tsyscr: %lld\n"
> > > > +                        "\tsyscw: %lld\n",
> > > > +                        r->mnt_time, ktime_get_seconds(),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_RD),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_WR),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR));
> > >
> > > This doesn't scale as {cpus, mounts, counters, read frequency}
> > > matrix explodes.  Please iterate the per-mount per cpu counters
> > > once, adding up all counters in one pass to an array on stack, then
> > > print them all from the array.
> >
> > I am planning to move to per-sb iostats and was thinking of using
> > an array of 4 struct percpu_counter. That will make this sort of iteration
> > more challenging.
>
> No, it would get rid of it entirely. percpu_counter_read_positive()
> does not require any summing at all - that's a much better solution
> than a hand rolled set of percpu counters. Please do this.
>
> > Do you really think the read frequency of /proc/self/mountstats
> > warrants such performance optimization?
>
> We get bug reports every so often about the overhead of frequently
> summing per-cpu stats on large systems. Nothing ratelimits or
> restricts access to /proc/self/mountstats, so when you have a
> thousand CPUs and a million monkeys...
>
> Rule of thumb: don't do computationally expensive things to generate
> data for globally accessible sysfs files.
>
> > It's not like the case of the mighty struct xfsstats.
> > It is only going to fold 4 per cpu iterations into 1.
> > This doesn't look like a game changer to me.
> > Am I missing something?
>
> I'm just pointing out something we've had problems with in
> the past and am trying to help you avoid making the same mistakes.
> That's what reviewers are supposed to do, yes?

Yes, thank you :)

Amir.

Patch

diff --git a/fs/mount.h b/fs/mount.h
index f98bf4cd5b1a..2ab6308af78b 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -91,6 +91,7 @@  struct mount {
 	int mnt_id;			/* mount identifier */
 	int mnt_group_id;		/* peer group identifier */
 	int mnt_expiry_mark;		/* true if marked for expiry */
+	time64_t mnt_time;		/* time of mount */
 	struct hlist_head mnt_pins;
 	struct hlist_head mnt_stuck_children;
 } __randomize_layout;
diff --git a/fs/namespace.c b/fs/namespace.c
index 3fb8f11a42a1..546f07ed44c5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -220,6 +220,8 @@  static struct mount *alloc_vfsmnt(const char *name)
 		mnt->mnt_count = 1;
 		mnt->mnt_writers = 0;
 #endif
+		/* For proc/<pid>/mountstats */
+		mnt->mnt_time = ktime_get_seconds();
 
 		INIT_HLIST_NODE(&mnt->mnt_hash);
 		INIT_LIST_HEAD(&mnt->mnt_child);
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 49650e54d2f8..d744fb8543f5 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -232,6 +232,19 @@  static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt)
 	if (sb->s_op->show_stats) {
 		seq_putc(m, ' ');
 		err = sb->s_op->show_stats(m, mnt_path.dentry);
+	} else if (mnt_has_stats(mnt)) {
+		/* Similar to /proc/<pid>/io */
+		seq_printf(m, "\n"
+			   "\ttimes: %lld %lld\n"
+			   "\trchar: %lld\n"
+			   "\twchar: %lld\n"
+			   "\tsyscr: %lld\n"
+			   "\tsyscw: %lld\n",
+			   r->mnt_time, ktime_get_seconds(),
+			   mnt_iostats_counter_read(r, MNTIOS_CHARS_RD),
+			   mnt_iostats_counter_read(r, MNTIOS_CHARS_WR),
+			   mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD),
+			   mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR));
 	}
 
 	seq_putc(m, '\n');