Message ID | 20210913131311.1347903-1-vshankar@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | ceph: forward average read/write/metadata latency | expand |
On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote: > Right now, cumulative read/write/metadata latencies are tracked > and are periodically forwarded to the MDS. These meterics are not > particularly useful. A much more useful metric is the average latency > and standard deviation (stdev) which is what this series of patches > aims to do. > > The userspace (libcephfs+tool) changes are here:: > > https://github.com/ceph/ceph/pull/41397 > > The math involved in keeping track of the average latency and stdev > seems incorrect, so, this series fixes that up too (closely mimics > how its done in userspace with some restrictions obviously) as per:: > > NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops) > NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1))) > > Note that the cumulative latencies are still forwarded to the MDS but > the tool (cephfs-top) ignores it altogether. > > Venky Shankar (4): > ceph: use "struct ceph_timespec" for r/w/m latencies > ceph: track average/stdev r/w/m latency > ceph: include average/stddev r/w/m latency in mds metrics > ceph: use tracked average r/w/m latencies to display metrics in > debugfs > > fs/ceph/debugfs.c | 12 +++---- > fs/ceph/metric.c | 81 +++++++++++++++++++++++++---------------------- > fs/ceph/metric.h | 64 +++++++++++++++++++++++-------------- > 3 files changed, 90 insertions(+), 67 deletions(-) > This looks reasonably sane. I'll plan to go ahead and pull this into the testing kernels and do some testing with them. If anyone has objections (Xiubo?) let me know and I can take them out. Thanks,
On Mon, 2021-09-13 at 11:13 -0400, Jeff Layton wrote: > On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote: > > Right now, cumulative read/write/metadata latencies are tracked > > and are periodically forwarded to the MDS. These meterics are not > > particularly useful. A much more useful metric is the average latency > > and standard deviation (stdev) which is what this series of patches > > aims to do. > > > > The userspace (libcephfs+tool) changes are here:: > > > > https://github.com/ceph/ceph/pull/41397 > > > > The math involved in keeping track of the average latency and stdev > > seems incorrect, so, this series fixes that up too (closely mimics > > how its done in userspace with some restrictions obviously) as per:: > > > > NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops) > > NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1))) > > > > Note that the cumulative latencies are still forwarded to the MDS but > > the tool (cephfs-top) ignores it altogether. > > > > Venky Shankar (4): > > ceph: use "struct ceph_timespec" for r/w/m latencies > > ceph: track average/stdev r/w/m latency > > ceph: include average/stddev r/w/m latency in mds metrics > > ceph: use tracked average r/w/m latencies to display metrics in > > debugfs > > > > fs/ceph/debugfs.c | 12 +++---- > > fs/ceph/metric.c | 81 +++++++++++++++++++++++++---------------------- > > fs/ceph/metric.h | 64 +++++++++++++++++++++++-------------- > > 3 files changed, 90 insertions(+), 67 deletions(-) > > > > This looks reasonably sane. I'll plan to go ahead and pull this into the > testing kernels and do some testing with them. If anyone has objections > (Xiubo?) let me know and I can take them out. > > Thanks, Hmm...I take it back. There are some non-trivial merge conflicts in this series vs. the current testing branch. Venky can you rebase this onto the ceph-client/testing branch and resubmit? Thanks,
On Mon, Sep 13, 2021 at 8:51 PM Jeff Layton <jlayton@redhat.com> wrote: > > On Mon, 2021-09-13 at 11:13 -0400, Jeff Layton wrote: > > On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote: > > > Right now, cumulative read/write/metadata latencies are tracked > > > and are periodically forwarded to the MDS. These meterics are not > > > particularly useful. A much more useful metric is the average latency > > > and standard deviation (stdev) which is what this series of patches > > > aims to do. > > > > > > The userspace (libcephfs+tool) changes are here:: > > > > > > https://github.com/ceph/ceph/pull/41397 > > > > > > The math involved in keeping track of the average latency and stdev > > > seems incorrect, so, this series fixes that up too (closely mimics > > > how its done in userspace with some restrictions obviously) as per:: > > > > > > NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops) > > > NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1))) > > > > > > Note that the cumulative latencies are still forwarded to the MDS but > > > the tool (cephfs-top) ignores it altogether. > > > > > > Venky Shankar (4): > > > ceph: use "struct ceph_timespec" for r/w/m latencies > > > ceph: track average/stdev r/w/m latency > > > ceph: include average/stddev r/w/m latency in mds metrics > > > ceph: use tracked average r/w/m latencies to display metrics in > > > debugfs > > > > > > fs/ceph/debugfs.c | 12 +++---- > > > fs/ceph/metric.c | 81 +++++++++++++++++++++++++---------------------- > > > fs/ceph/metric.h | 64 +++++++++++++++++++++++-------------- > > > 3 files changed, 90 insertions(+), 67 deletions(-) > > > > > > > This looks reasonably sane. I'll plan to go ahead and pull this into the > > testing kernels and do some testing with them. If anyone has objections > > (Xiubo?) let me know and I can take them out. > > > > Thanks, > > Hmm...I take it back. There are some non-trivial merge conflicts in this > series vs. the current testing branch. Venky can you rebase this onto > the ceph-client/testing branch and resubmit? Sure, will do. > > Thanks, > -- > Jeff Layton <jlayton@redhat.com> >
On 9/13/21 11:13 PM, Jeff Layton wrote: > On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote: >> Right now, cumulative read/write/metadata latencies are tracked >> and are periodically forwarded to the MDS. These meterics are not >> particularly useful. A much more useful metric is the average latency >> and standard deviation (stdev) which is what this series of patches >> aims to do. >> >> The userspace (libcephfs+tool) changes are here:: >> >> https://github.com/ceph/ceph/pull/41397 >> >> The math involved in keeping track of the average latency and stdev >> seems incorrect, so, this series fixes that up too (closely mimics >> how its done in userspace with some restrictions obviously) as per:: >> >> NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops) >> NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1))) >> >> Note that the cumulative latencies are still forwarded to the MDS but >> the tool (cephfs-top) ignores it altogether. >> >> Venky Shankar (4): >> ceph: use "struct ceph_timespec" for r/w/m latencies >> ceph: track average/stdev r/w/m latency >> ceph: include average/stddev r/w/m latency in mds metrics >> ceph: use tracked average r/w/m latencies to display metrics in >> debugfs >> >> fs/ceph/debugfs.c | 12 +++---- >> fs/ceph/metric.c | 81 +++++++++++++++++++++++++---------------------- >> fs/ceph/metric.h | 64 +++++++++++++++++++++++-------------- >> 3 files changed, 90 insertions(+), 67 deletions(-) >> > This looks reasonably sane. I'll plan to go ahead and pull this into the > testing kernels and do some testing with them. If anyone has objections > (Xiubo?) let me know and I can take them out. Sorry I think I missed this patch set. Comment it in the V2. Thanks. > > Thanks,