diff mbox series

[v2] ceph: only send metrics when the MDS rank is ready

Message ID 20230606005732.1056361-1-xiubli@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2] ceph: only send metrics when the MDS rank is ready | expand

Commit Message

Xiubo Li June 6, 2023, 12:57 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

When the MDS rank is in clientreplay state, the metrics requests
will be discarded directly. Also, when there are a lot of known
client requests to recover from, the metrics requests will slow
down the MDS rank from getting to the active state sooner.

With this patch, we will send the metrics requests only when the
MDS rank is in active state.

URL: https://tracker.ceph.com/issues/61524
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---

V2:
- rephrase the commit comment from Milind's comments.


 fs/ceph/metric.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Venky Shankar June 16, 2023, 4:40 a.m. UTC | #1
Hi Xiubo,

On Tue, Jun 6, 2023 at 6:30 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> When the MDS rank is in clientreplay state, the metrics requests
> will be discarded directly. Also, when there are a lot of known
> client requests to recover from, the metrics requests will slow
> down the MDS rank from getting to the active state sooner.
>
> With this patch, we will send the metrics requests only when the
> MDS rank is in active state.

Although the changes look fine. I have a question though - the metrics
are sent by the client each second (on tick()) - how many clients were
connected for the MDS to experience further slowness due to metric
update messages?

>
> URL: https://tracker.ceph.com/issues/61524
> Reviewed-by: Milind Changire <mchangir@redhat.com>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>
> V2:
> - rephrase the commit comment from Milind's comments.
>
>
>  fs/ceph/metric.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
> index c47347d2e84e..cce78d769f55 100644
> --- a/fs/ceph/metric.c
> +++ b/fs/ceph/metric.c
> @@ -36,6 +36,14 @@ static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
>         s32 items = 0;
>         s32 len;
>
> +       /* Do not send the metrics until the MDS rank is ready */
> +       mutex_lock(&mdsc->mutex);
> +       if (ceph_mdsmap_get_state(mdsc->mdsmap, s->s_mds) != CEPH_MDS_STATE_ACTIVE) {
> +               mutex_unlock(&mdsc->mutex);
> +               return false;
> +       }
> +       mutex_unlock(&mdsc->mutex);
> +
>         len = sizeof(*head) + sizeof(*cap) + sizeof(*read) + sizeof(*write)
>               + sizeof(*meta) + sizeof(*dlease) + sizeof(*files)
>               + sizeof(*icaps) + sizeof(*inodes) + sizeof(*rsize)
> --
> 2.40.1
>
Xiubo Li June 18, 2023, 11:07 p.m. UTC | #2
On 6/16/23 12:40, Venky Shankar wrote:
> Hi Xiubo,
>
> On Tue, Jun 6, 2023 at 6:30 AM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> When the MDS rank is in clientreplay state, the metrics requests
>> will be discarded directly. Also, when there are a lot of known
>> client requests to recover from, the metrics requests will slow
>> down the MDS rank from getting to the active state sooner.
>>
>> With this patch, we will send the metrics requests only when the
>> MDS rank is in active state.
> Although the changes look fine. I have a question though - the metrics
> are sent by the client each second (on tick()) - how many clients were
> connected for the MDS to experience further slowness due to metric
> update messages?

As I remembered there were 1761 clients. The MDS will just queue the 
metric messages and then trigger to print one debug log message , which 
could slow down the MDS, to buffer and file(?) for each when handling them.

Thanks

- Xiubo

>
>> URL: https://tracker.ceph.com/issues/61524
>> Reviewed-by: Milind Changire <mchangir@redhat.com>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>
>> V2:
>> - rephrase the commit comment from Milind's comments.
>>
>>
>>   fs/ceph/metric.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
>> index c47347d2e84e..cce78d769f55 100644
>> --- a/fs/ceph/metric.c
>> +++ b/fs/ceph/metric.c
>> @@ -36,6 +36,14 @@ static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
>>          s32 items = 0;
>>          s32 len;
>>
>> +       /* Do not send the metrics until the MDS rank is ready */
>> +       mutex_lock(&mdsc->mutex);
>> +       if (ceph_mdsmap_get_state(mdsc->mdsmap, s->s_mds) != CEPH_MDS_STATE_ACTIVE) {
>> +               mutex_unlock(&mdsc->mutex);
>> +               return false;
>> +       }
>> +       mutex_unlock(&mdsc->mutex);
>> +
>>          len = sizeof(*head) + sizeof(*cap) + sizeof(*read) + sizeof(*write)
>>                + sizeof(*meta) + sizeof(*dlease) + sizeof(*files)
>>                + sizeof(*icaps) + sizeof(*inodes) + sizeof(*rsize)
>> --
>> 2.40.1
>>
>
diff mbox series

Patch

diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index c47347d2e84e..cce78d769f55 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -36,6 +36,14 @@  static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
 	s32 items = 0;
 	s32 len;
 
+	/* Do not send the metrics until the MDS rank is ready */
+	mutex_lock(&mdsc->mutex);
+	if (ceph_mdsmap_get_state(mdsc->mdsmap, s->s_mds) != CEPH_MDS_STATE_ACTIVE) {
+		mutex_unlock(&mdsc->mutex);
+		return false;
+	}
+	mutex_unlock(&mdsc->mutex);
+
 	len = sizeof(*head) + sizeof(*cap) + sizeof(*read) + sizeof(*write)
 	      + sizeof(*meta) + sizeof(*dlease) + sizeof(*files)
 	      + sizeof(*icaps) + sizeof(*inodes) + sizeof(*rsize)