mbox series

[v1,0/4] ceph: forward average read/write/metadata latency

Message ID 20210913131311.1347903-1-vshankar@redhat.com (mailing list archive)
Headers show
Series ceph: forward average read/write/metadata latency | expand

Message

Venky Shankar Sept. 13, 2021, 1:13 p.m. UTC
Right now, cumulative read/write/metadata latencies are tracked
and are periodically forwarded to the MDS. These meterics are not
particularly useful. A much more useful metric is the average latency
and standard deviation (stdev) which is what this series of patches
aims to do.

The userspace (libcephfs+tool) changes are here::

          https://github.com/ceph/ceph/pull/41397

The math involved in keeping track of the average latency and stdev
seems incorrect, so, this series fixes that up too (closely mimics
how its done in userspace with some restrictions obviously) as per::

          NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
          NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1)))

Note that the cumulative latencies are still forwarded to the MDS but
the tool (cephfs-top) ignores it altogether.

Venky Shankar (4):
  ceph: use "struct ceph_timespec" for r/w/m latencies
  ceph: track average/stdev r/w/m latency
  ceph: include average/stddev r/w/m latency in mds metrics
  ceph: use tracked average r/w/m latencies to display metrics in
    debugfs

 fs/ceph/debugfs.c | 12 +++----
 fs/ceph/metric.c  | 81 +++++++++++++++++++++++++----------------------
 fs/ceph/metric.h  | 64 +++++++++++++++++++++++--------------
 3 files changed, 90 insertions(+), 67 deletions(-)

Comments

Jeff Layton Sept. 13, 2021, 3:13 p.m. UTC | #1
On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote:
> Right now, cumulative read/write/metadata latencies are tracked
> and are periodically forwarded to the MDS. These meterics are not
> particularly useful. A much more useful metric is the average latency
> and standard deviation (stdev) which is what this series of patches
> aims to do.
> 
> The userspace (libcephfs+tool) changes are here::
> 
>           https://github.com/ceph/ceph/pull/41397
> 
> The math involved in keeping track of the average latency and stdev
> seems incorrect, so, this series fixes that up too (closely mimics
> how its done in userspace with some restrictions obviously) as per::
> 
>           NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
>           NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1)))
> 
> Note that the cumulative latencies are still forwarded to the MDS but
> the tool (cephfs-top) ignores it altogether.
> 
> Venky Shankar (4):
>   ceph: use "struct ceph_timespec" for r/w/m latencies
>   ceph: track average/stdev r/w/m latency
>   ceph: include average/stddev r/w/m latency in mds metrics
>   ceph: use tracked average r/w/m latencies to display metrics in
>     debugfs
> 
>  fs/ceph/debugfs.c | 12 +++----
>  fs/ceph/metric.c  | 81 +++++++++++++++++++++++++----------------------
>  fs/ceph/metric.h  | 64 +++++++++++++++++++++++--------------
>  3 files changed, 90 insertions(+), 67 deletions(-)
> 

This looks reasonably sane. I'll plan to go ahead and pull this into the
testing kernels and do some testing with them. If anyone has objections
(Xiubo?) let me know and I can take them out.

Thanks,
Jeff Layton Sept. 13, 2021, 3:21 p.m. UTC | #2
On Mon, 2021-09-13 at 11:13 -0400, Jeff Layton wrote:
> On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote:
> > Right now, cumulative read/write/metadata latencies are tracked
> > and are periodically forwarded to the MDS. These meterics are not
> > particularly useful. A much more useful metric is the average latency
> > and standard deviation (stdev) which is what this series of patches
> > aims to do.
> > 
> > The userspace (libcephfs+tool) changes are here::
> > 
> >           https://github.com/ceph/ceph/pull/41397
> > 
> > The math involved in keeping track of the average latency and stdev
> > seems incorrect, so, this series fixes that up too (closely mimics
> > how its done in userspace with some restrictions obviously) as per::
> > 
> >           NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
> >           NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1)))
> > 
> > Note that the cumulative latencies are still forwarded to the MDS but
> > the tool (cephfs-top) ignores it altogether.
> > 
> > Venky Shankar (4):
> >   ceph: use "struct ceph_timespec" for r/w/m latencies
> >   ceph: track average/stdev r/w/m latency
> >   ceph: include average/stddev r/w/m latency in mds metrics
> >   ceph: use tracked average r/w/m latencies to display metrics in
> >     debugfs
> > 
> >  fs/ceph/debugfs.c | 12 +++----
> >  fs/ceph/metric.c  | 81 +++++++++++++++++++++++++----------------------
> >  fs/ceph/metric.h  | 64 +++++++++++++++++++++++--------------
> >  3 files changed, 90 insertions(+), 67 deletions(-)
> > 
> 
> This looks reasonably sane. I'll plan to go ahead and pull this into the
> testing kernels and do some testing with them. If anyone has objections
> (Xiubo?) let me know and I can take them out.
> 
> Thanks,

Hmm...I take it back. There are some non-trivial merge conflicts in this
series vs. the current testing branch. Venky can you rebase this onto
the ceph-client/testing branch and resubmit?

Thanks,
Venky Shankar Sept. 13, 2021, 3:41 p.m. UTC | #3
On Mon, Sep 13, 2021 at 8:51 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2021-09-13 at 11:13 -0400, Jeff Layton wrote:
> > On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote:
> > > Right now, cumulative read/write/metadata latencies are tracked
> > > and are periodically forwarded to the MDS. These meterics are not
> > > particularly useful. A much more useful metric is the average latency
> > > and standard deviation (stdev) which is what this series of patches
> > > aims to do.
> > >
> > > The userspace (libcephfs+tool) changes are here::
> > >
> > >           https://github.com/ceph/ceph/pull/41397
> > >
> > > The math involved in keeping track of the average latency and stdev
> > > seems incorrect, so, this series fixes that up too (closely mimics
> > > how its done in userspace with some restrictions obviously) as per::
> > >
> > >           NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
> > >           NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1)))
> > >
> > > Note that the cumulative latencies are still forwarded to the MDS but
> > > the tool (cephfs-top) ignores it altogether.
> > >
> > > Venky Shankar (4):
> > >   ceph: use "struct ceph_timespec" for r/w/m latencies
> > >   ceph: track average/stdev r/w/m latency
> > >   ceph: include average/stddev r/w/m latency in mds metrics
> > >   ceph: use tracked average r/w/m latencies to display metrics in
> > >     debugfs
> > >
> > >  fs/ceph/debugfs.c | 12 +++----
> > >  fs/ceph/metric.c  | 81 +++++++++++++++++++++++++----------------------
> > >  fs/ceph/metric.h  | 64 +++++++++++++++++++++++--------------
> > >  3 files changed, 90 insertions(+), 67 deletions(-)
> > >
> >
> > This looks reasonably sane. I'll plan to go ahead and pull this into the
> > testing kernels and do some testing with them. If anyone has objections
> > (Xiubo?) let me know and I can take them out.
> >
> > Thanks,
>
> Hmm...I take it back. There are some non-trivial merge conflicts in this
> series vs. the current testing branch. Venky can you rebase this onto
> the ceph-client/testing branch and resubmit?

Sure, will do.

>
> Thanks,
> --
> Jeff Layton <jlayton@redhat.com>
>
Xiubo Li Sept. 14, 2021, 12:53 p.m. UTC | #4
On 9/13/21 11:13 PM, Jeff Layton wrote:
> On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote:
>> Right now, cumulative read/write/metadata latencies are tracked
>> and are periodically forwarded to the MDS. These meterics are not
>> particularly useful. A much more useful metric is the average latency
>> and standard deviation (stdev) which is what this series of patches
>> aims to do.
>>
>> The userspace (libcephfs+tool) changes are here::
>>
>>            https://github.com/ceph/ceph/pull/41397
>>
>> The math involved in keeping track of the average latency and stdev
>> seems incorrect, so, this series fixes that up too (closely mimics
>> how its done in userspace with some restrictions obviously) as per::
>>
>>            NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
>>            NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1)))
>>
>> Note that the cumulative latencies are still forwarded to the MDS but
>> the tool (cephfs-top) ignores it altogether.
>>
>> Venky Shankar (4):
>>    ceph: use "struct ceph_timespec" for r/w/m latencies
>>    ceph: track average/stdev r/w/m latency
>>    ceph: include average/stddev r/w/m latency in mds metrics
>>    ceph: use tracked average r/w/m latencies to display metrics in
>>      debugfs
>>
>>   fs/ceph/debugfs.c | 12 +++----
>>   fs/ceph/metric.c  | 81 +++++++++++++++++++++++++----------------------
>>   fs/ceph/metric.h  | 64 +++++++++++++++++++++++--------------
>>   3 files changed, 90 insertions(+), 67 deletions(-)
>>
> This looks reasonably sane. I'll plan to go ahead and pull this into the
> testing kernels and do some testing with them. If anyone has objections
> (Xiubo?) let me know and I can take them out.

Sorry I think I missed this patch set.

Comment it in the V2.

Thanks.


>
> Thanks,