Message ID | 20240819214259.38259-1-mkhalfella@purestorage.com (mailing list archive)
---|---
State | Superseded
Delegated to | Netdev Maintainers
Series | net/mlx5: Added cond_resched() to crdump collection
On 8/19/24 23:42, Mohamed Khalfella wrote:
> Collecting crdump involves dumping vsc registers from pci config space
> of mlx device. The code can run for long time starving other threads
> want to run on the cpu. Added conditional reschedule between register
> reads and while waiting for register value to release the cpu more
> often.
>
> Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> index d0b595ba6110..377cc39643b4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> @@ -191,6 +191,7 @@ static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
>  		if ((retries & 0xf) == 0)
>  			usleep_range(1000, 2000);
>
> +		cond_resched();

The sleeping logic above (including what is out of the git diff context)
is a bit weird: a tight loop with a sleep after every 16 attempts, with
an upper bound of 2k attempts!

My understanding of usleep_range() is that it puts the process to sleep
(and even leads to a schedule() call), so cond_resched() looks redundant
here.

>  	} while (flag != expected_val);
>
>  	return 0;
> @@ -280,6 +281,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
>  			return read_addr;
>
>  		read_addr = next_read_addr;
> +		cond_resched();

It would be great to see in the commit message how many registers there
are / how long it takes to dump them. My guess is that a single
mlx5_vsc_gw_read_fast() call is very short and that there are many. With
that, cond_resched() should rather be under some
if (iterator % XXX == 0) condition.

>  	}
>  	return length;
>  }
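For readers without the driver source at hand, the polling loop being
discussed has roughly the following shape. This is a paraphrased sketch,
not the exact driver code: VSC_MAX_RETRIES and VSC_FLAG_OFFSET stand in
for the driver's real constants, and the error handling and flag
extraction are abbreviated.

	static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev,
					 u8 expected_val)
	{
		int retries = 0;
		u32 flag;

		do {
			if (retries > VSC_MAX_RETRIES)	/* bounded polling */
				return -EBUSY;

			/* vsc_read() boils down to pci_read_config_dword()
			 * on the vendor-specific capability gateway.
			 */
			vsc_read(dev, VSC_FLAG_OFFSET, &flag);
			retries++;

			if ((retries & 0xf) == 0)	  /* every 16th attempt */
				usleep_range(1000, 2000); /* sleep 1-2 ms */
		} while (flag != expected_val);

		return 0;
	}

Because usleep_range() already sleeps (and therefore schedules), every
16th iteration is a natural scheduling point even without cond_resched().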
On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
> On 8/19/24 23:42, Mohamed Khalfella wrote:
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> > index d0b595ba6110..377cc39643b4 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> > @@ -191,6 +191,7 @@ static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
> >  		if ((retries & 0xf) == 0)
> >  			usleep_range(1000, 2000);
> >
> > +		cond_resched();
>
> The sleeping logic above (including what is out of the git diff context)
> is a bit weird: a tight loop with a sleep after every 16 attempts, with
> an upper bound of 2k attempts!
>
> My understanding of usleep_range() is that it puts the process to sleep
> (and even leads to a schedule() call), so cond_resched() looks redundant
> here.

This matches my understanding too. usleep_range() should put the thread
to sleep, effectively releasing the cpu to do other work. The reason I
put cond_resched() here is that pci_read_config_dword() might take a
long time when the card sees fatal errors. I was not able to reproduce
this, so I am okay with removing this cond_resched().

>
> >  	} while (flag != expected_val);
> >
> >  	return 0;
> > @@ -280,6 +281,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
> >  			return read_addr;
> >
> >  		read_addr = next_read_addr;
> > +		cond_resched();
>
> It would be great to see in the commit message how many registers there
> are / how long it takes to dump them. My guess is that a single
> mlx5_vsc_gw_read_fast() call is very short and that there are many. With
> that, cond_resched() should rather be under some

I did some testing on a ConnectX-5 Ex MCX516A-CDAT and here is what I saw:

- mlx5_vsc_gw_read_block_fast() was called with length = 1310716
- mlx5_vsc_gw_read_fast() reads 4 bytes at a time, but it did not read
  the full 1310716 bytes. Instead it was called only 53813 times. There
  are jumps in read_addr.
- On average mlx5_vsc_gw_read_fast() took 35284.4ns
- In total mlx5_vsc_wait_on_flag() called vsc_read() 54707 times, with
  an average runtime of 17548.3ns per call. In some instances vsc_read()
  was called more than once before mlx5_vsc_wait_on_flag() returned.
  Mostly once, but I saw 5, 8, and in one instance 16 times. As
  expected, the thread released the cpu after 16 iterations.
- Total time to read the dump was 35284.4ns * 53813 ~= 1.898s

> if (iterator % XXX == 0) condition.

Putting a cond_resched() every 16 register reads, similar to
mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
will result in a cond_resched() every ~0.56ms, which is okay IMO.
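To make the proposed throttling concrete, the pattern under discussion
looks roughly like the following. This is a hedged sketch only: the
`reads` counter is hypothetical, the data indexing is illustrative, and
the real mlx5_vsc_gw_read_block_fast() loop carries more state and error
handling than shown here.

	unsigned int reads = 0;

	while (read_addr < length) {
		ret = mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr,
					    &data[read_addr >> 2]);
		if (ret < 0)
			return read_addr;

		read_addr = next_read_addr;

		/* Yield every 16 reads: with ~35.3us per read that is a
		 * cond_resched() roughly every 0.56ms, using a power-of-two
		 * mask like the one in mlx5_vsc_wait_on_flag().
		 */
		if ((++reads & 0xf) == 0)
			cond_resched();
	}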
On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
>> On 8/19/24 23:42, Mohamed Khalfella wrote:
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>> index d0b595ba6110..377cc39643b4 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>> @@ -191,6 +191,7 @@ static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
>>>  		if ((retries & 0xf) == 0)
>>>  			usleep_range(1000, 2000);
>>>
>>> +		cond_resched();
>>
>> The sleeping logic above (including what is out of the git diff context)
>> is a bit weird: a tight loop with a sleep after every 16 attempts, with
>> an upper bound of 2k attempts!
>>
>> My understanding of usleep_range() is that it puts the process to sleep
>> (and even leads to a schedule() call), so cond_resched() looks redundant
>> here.
>
> This matches my understanding too. usleep_range() should put the thread
> to sleep, effectively releasing the cpu to do other work. The reason I
> put cond_resched() here is that pci_read_config_dword() might take a
> long time when the card sees fatal errors. I was not able to reproduce
> this, so I am okay with removing this cond_resched().
>
>>
>>>  	} while (flag != expected_val);
>>>
>>>  	return 0;
>>> @@ -280,6 +281,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
>>>  			return read_addr;
>>>
>>>  		read_addr = next_read_addr;
>>> +		cond_resched();
>>
>> It would be great to see in the commit message how many registers there
>> are / how long it takes to dump them. My guess is that a single
>> mlx5_vsc_gw_read_fast() call is very short and that there are many. With
>> that, cond_resched() should rather be under some
>
> I did some testing on a ConnectX-5 Ex MCX516A-CDAT and here is what I saw:
>
> - mlx5_vsc_gw_read_block_fast() was called with length = 1310716
> - mlx5_vsc_gw_read_fast() reads 4 bytes at a time, but it did not read
>   the full 1310716 bytes. Instead it was called only 53813 times. There
>   are jumps in read_addr.
> - On average mlx5_vsc_gw_read_fast() took 35284.4ns
> - In total mlx5_vsc_wait_on_flag() called vsc_read() 54707 times, with
>   an average runtime of 17548.3ns per call. In some instances vsc_read()
>   was called more than once before mlx5_vsc_wait_on_flag() returned.
>   Mostly once, but I saw 5, 8, and in one instance 16 times. As
>   expected, the thread released the cpu after 16 iterations.
> - Total time to read the dump was 35284.4ns * 53813 ~= 1.898s
>
>> if (iterator % XXX == 0) condition.
>
> Putting a cond_resched() every 16 register reads, similar to
> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
> will result in a cond_resched() every ~0.56ms, which is okay IMO.

Sorry for the late response, I just got back from vacation.
All your measurements look right.

crdump is the devlink health dump of the mlx5 FW fatal health reporter.
In the common case, since auto-dump and auto-recover are the defaults
for this health reporter, the crdump will be collected on a fatal error
of the mlx5 device, and the recovery flow waits for it and runs right
after the crdump finishes.

I agree with adding cond_resched(), but I would reduce the frequency,
e.g. once in 1024 iterations of register read.
mlx5_vsc_wait_on_flag() is a bit of a different case, as the usleep
there happens after 16 retries waiting for the value to change.

Thanks.
On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:
> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
> > On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
> >> On 8/19/24 23:42, Mohamed Khalfella wrote:
> >>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>> index d0b595ba6110..377cc39643b4 100644
> >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>> @@ -191,6 +191,7 @@ static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
> >>>  		if ((retries & 0xf) == 0)
> >>>  			usleep_range(1000, 2000);
> >>>
> >>> +		cond_resched();
> >>
> >> The sleeping logic above (including what is out of the git diff context)
> >> is a bit weird: a tight loop with a sleep after every 16 attempts, with
> >> an upper bound of 2k attempts!
> >>
> >> My understanding of usleep_range() is that it puts the process to sleep
> >> (and even leads to a schedule() call), so cond_resched() looks redundant
> >> here.
> >
> > This matches my understanding too. usleep_range() should put the thread
> > to sleep, effectively releasing the cpu to do other work. The reason I
> > put cond_resched() here is that pci_read_config_dword() might take a
> > long time when the card sees fatal errors. I was not able to reproduce
> > this, so I am okay with removing this cond_resched().
> >
> >>
> >>>  	} while (flag != expected_val);
> >>>
> >>>  	return 0;
> >>> @@ -280,6 +281,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
> >>>  			return read_addr;
> >>>
> >>>  		read_addr = next_read_addr;
> >>> +		cond_resched();
> >>
> >> It would be great to see in the commit message how many registers there
> >> are / how long it takes to dump them. My guess is that a single
> >> mlx5_vsc_gw_read_fast() call is very short and that there are many. With
> >> that, cond_resched() should rather be under some
> >
> > I did some testing on a ConnectX-5 Ex MCX516A-CDAT and here is what I saw:
> >
> > - mlx5_vsc_gw_read_block_fast() was called with length = 1310716
> > - mlx5_vsc_gw_read_fast() reads 4 bytes at a time, but it did not read
> >   the full 1310716 bytes. Instead it was called only 53813 times. There
> >   are jumps in read_addr.
> > - On average mlx5_vsc_gw_read_fast() took 35284.4ns
> > - In total mlx5_vsc_wait_on_flag() called vsc_read() 54707 times, with
> >   an average runtime of 17548.3ns per call. In some instances vsc_read()
> >   was called more than once before mlx5_vsc_wait_on_flag() returned.
> >   Mostly once, but I saw 5, 8, and in one instance 16 times. As
> >   expected, the thread released the cpu after 16 iterations.
> > - Total time to read the dump was 35284.4ns * 53813 ~= 1.898s
> >
> >> if (iterator % XXX == 0) condition.
> >
> > Putting a cond_resched() every 16 register reads, similar to
> > mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
> > will result in a cond_resched() every ~0.56ms, which is okay IMO.
>
> Sorry for the late response, I just got back from vacation.
> All your measurements look right.
> crdump is the devlink health dump of the mlx5 FW fatal health reporter.
> In the common case, since auto-dump and auto-recover are the defaults
> for this health reporter, the crdump will be collected on a fatal error
> of the mlx5 device, and the recovery flow waits for it and runs right
> after the crdump finishes.
> I agree with adding cond_resched(), but I would reduce the frequency,
> e.g. once in 1024 iterations of register read.
> mlx5_vsc_wait_on_flag() is a bit of a different case, as the usleep
> there happens after 16 retries waiting for the value to change.
> Thanks.

Thanks for taking a look. Once in every 1024 iterations approximately
translates to 35284.4ns * 1024 ~= 36.1ms, which is a relatively long
time IMO. How about any power-of-two <= 128 (~4.51ms)?
On 8/22/24 19:08, Mohamed Khalfella wrote:
> On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:
>> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
>>> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
>>>> On 8/19/24 23:42, Mohamed Khalfella wrote:
>>>
>>> Putting a cond_resched() every 16 register reads, similar to
>>> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
>>> will result in a cond_resched() every ~0.56ms, which is okay IMO.
>>
>> Sorry for the late response, I just got back from vacation.
>> All your measurements look right.
>> crdump is the devlink health dump of the mlx5 FW fatal health reporter.
>> In the common case, since auto-dump and auto-recover are the defaults
>> for this health reporter, the crdump will be collected on a fatal error
>> of the mlx5 device, and the recovery flow waits for it and runs right
>> after the crdump finishes.
>> I agree with adding cond_resched(), but I would reduce the frequency,
>> e.g. once in 1024 iterations of register read.
>> mlx5_vsc_wait_on_flag() is a bit of a different case, as the usleep
>> there happens after 16 retries waiting for the value to change.
>> Thanks.
>
> Thanks for taking a look. Once in every 1024 iterations approximately
> translates to 35284.4ns * 1024 ~= 36.1ms, which is a relatively long
> time IMO. How about any power-of-two <= 128 (~4.51ms)?

Such a tune-up would matter for interactive use of a machine with very
few cores; is that the case? Otherwise I see no point [in making it
overall a little slower, as that is the tradeoff].
On 8/23/2024 7:08 AM, Przemek Kitszel wrote:
> On 8/22/24 19:08, Mohamed Khalfella wrote:
>> On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:
>>> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
>>>> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
>>>>> On 8/19/24 23:42, Mohamed Khalfella wrote:
>>>>
>>>> Putting a cond_resched() every 16 register reads, similar to
>>>> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
>>>> will result in a cond_resched() every ~0.56ms, which is okay IMO.
>>>
>>> Sorry for the late response, I just got back from vacation.
>>> All your measurements look right.
>>> crdump is the devlink health dump of the mlx5 FW fatal health reporter.
>>> In the common case, since auto-dump and auto-recover are the defaults
>>> for this health reporter, the crdump will be collected on a fatal error
>>> of the mlx5 device, and the recovery flow waits for it and runs right
>>> after the crdump finishes.
>>> I agree with adding cond_resched(), but I would reduce the frequency,
>>> e.g. once in 1024 iterations of register read.
>>> mlx5_vsc_wait_on_flag() is a bit of a different case, as the usleep
>>> there happens after 16 retries waiting for the value to change.
>>> Thanks.
>>
>> Thanks for taking a look. Once in every 1024 iterations approximately
>> translates to 35284.4ns * 1024 ~= 36.1ms, which is a relatively long
>> time IMO. How about any power-of-two <= 128 (~4.51ms)?

OK

> Such a tune-up would matter for interactive use of a machine with very
> few cores; is that the case? Otherwise I see no point [in making it
> overall a little slower, as that is the tradeoff].

Yes, as I see it, the point here is a host with very few cores.
On 2024-08-23 08:16:32 +0300, Moshe Shemesh wrote:
> On 8/23/2024 7:08 AM, Przemek Kitszel wrote:
> > On 8/22/24 19:08, Mohamed Khalfella wrote:
> >> On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:
> >>> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
> >>>> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
> >>>>> On 8/19/24 23:42, Mohamed Khalfella wrote:
> >>>>
> >>>> Putting a cond_resched() every 16 register reads, similar to
> >>>> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
> >>>> will result in a cond_resched() every ~0.56ms, which is okay IMO.
> >>>
> >>> Sorry for the late response, I just got back from vacation.
> >>> All your measurements look right.
> >>> crdump is the devlink health dump of the mlx5 FW fatal health reporter.
> >>> In the common case, since auto-dump and auto-recover are the defaults
> >>> for this health reporter, the crdump will be collected on a fatal error
> >>> of the mlx5 device, and the recovery flow waits for it and runs right
> >>> after the crdump finishes.
> >>> I agree with adding cond_resched(), but I would reduce the frequency,
> >>> e.g. once in 1024 iterations of register read.
> >>> mlx5_vsc_wait_on_flag() is a bit of a different case, as the usleep
> >>> there happens after 16 retries waiting for the value to change.
> >>> Thanks.
> >>
> >> Thanks for taking a look. Once in every 1024 iterations approximately
> >> translates to 35284.4ns * 1024 ~= 36.1ms, which is a relatively long
> >> time IMO. How about any power-of-two <= 128 (~4.51ms)?
>
> OK
>
> > Such a tune-up would matter for interactive use of a machine with very
> > few cores; is that the case? Otherwise I see no point [in making it
> > overall a little slower, as that is the tradeoff].
>
> Yes, as I see it, the point here is a host with very few cores.

It should make a difference for systems with few cores. Add to that,
the numbers above are what I was able to get in the lab. It has been
seen in the field that collecting crdump takes more than 5 seconds,
causing issues. If this makes sense I will submit v2 with the updated
commit message and cond_resched() every 128 iterations of register
read.
On 8/23/2024 8:41 PM, Mohamed Khalfella wrote:
> On 2024-08-23 08:16:32 +0300, Moshe Shemesh wrote:
>> On 8/23/2024 7:08 AM, Przemek Kitszel wrote:
>>> On 8/22/24 19:08, Mohamed Khalfella wrote:
>>>> On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:
>>>>> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
>>>>>> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
>>>>>>> On 8/19/24 23:42, Mohamed Khalfella wrote:
>>>>>>
>>>>>> Putting a cond_resched() every 16 register reads, similar to
>>>>>> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
>>>>>> will result in a cond_resched() every ~0.56ms, which is okay IMO.
>>>>>
>>>>> Sorry for the late response, I just got back from vacation.
>>>>> All your measurements look right.
>>>>> crdump is the devlink health dump of the mlx5 FW fatal health reporter.
>>>>> In the common case, since auto-dump and auto-recover are the defaults
>>>>> for this health reporter, the crdump will be collected on a fatal error
>>>>> of the mlx5 device, and the recovery flow waits for it and runs right
>>>>> after the crdump finishes.
>>>>> I agree with adding cond_resched(), but I would reduce the frequency,
>>>>> e.g. once in 1024 iterations of register read.
>>>>> mlx5_vsc_wait_on_flag() is a bit of a different case, as the usleep
>>>>> there happens after 16 retries waiting for the value to change.
>>>>> Thanks.
>>>>
>>>> Thanks for taking a look. Once in every 1024 iterations approximately
>>>> translates to 35284.4ns * 1024 ~= 36.1ms, which is a relatively long
>>>> time IMO. How about any power-of-two <= 128 (~4.51ms)?
>>
>> OK
>>
>>> Such a tune-up would matter for interactive use of a machine with very
>>> few cores; is that the case? Otherwise I see no point [in making it
>>> overall a little slower, as that is the tradeoff].
>>
>> Yes, as I see it, the point here is a host with very few cores.
>
> It should make a difference for systems with few cores. Add to that,
> the numbers above are what I was able to get in the lab. It has been
> seen in the field that collecting crdump takes more than 5 seconds,
> causing issues. If this makes sense I will submit v2 with the updated
> commit message and cond_resched() every 128 iterations of register
> read.

Fine with me. Thanks.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
index d0b595ba6110..377cc39643b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -191,6 +191,7 @@ static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
 		if ((retries & 0xf) == 0)
 			usleep_range(1000, 2000);
 
+		cond_resched();
 	} while (flag != expected_val);
 
 	return 0;
@@ -280,6 +281,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
 			return read_addr;
 
 		read_addr = next_read_addr;
+		cond_resched();
 	}
 	return length;
 }
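Based on the discussion above, a v2 would likely drop the hunk in
mlx5_vsc_wait_on_flag() (usleep_range() already yields the cpu there)
and throttle the reschedule in the read loop instead. The following is a
hypothetical sketch of that direction, not the actual v2 patch: the
`reads` counter and its declaration (omitted here) are illustrative.

	@@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
	 			return read_addr;
	 
	 		read_addr = next_read_addr;
	+		/* ~128 * 35.3us ~= 4.5ms of config reads between yields */
	+		if ((++reads & 0x7f) == 0)
	+			cond_resched();
	 	}
	 	return length;
	 }

The power-of-two mask keeps the check cheap, matching the style of the
existing (retries & 0xf) throttle in mlx5_vsc_wait_on_flag().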