Message ID | 20240829213856.77619-2-mkhalfella@purestorage.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | net/mlx5: Added cond_resched() to crdump collection | expand |
From: Mohamed Khalfella <mkhalfella@purestorage.com> Date: Thu, 29 Aug 2024 15:38:56 -0600 > Collecting crdump involves reading vsc registers from pci config space > of mlx device, which can take long time to complete. This might result > in starving other threads waiting to run on the cpu. > > Numbers I got from testing ConnectX-5 Ex MCX516A-CDAT in the lab: > > - mlx5_vsc_gw_read_block_fast() was called with length = 1310716. > - mlx5_vsc_gw_read_fast() reads 4 bytes at a time. It was not used to > read the entire 1310716 bytes. It was called 53813 times because > there are jumps in read_addr. > - On average mlx5_vsc_gw_read_fast() took 35284.4ns. > - In total mlx5_vsc_wait_on_flag() called vsc_read() 54707 times. > The average time for each call was 17548.3ns. In some instances > vsc_read() was called more than one time when the flag was not set. > As expected the thread released the cpu after 16 iterations in > mlx5_vsc_wait_on_flag(). > - Total time to read crdump was 35284.4ns * 53813 ~= 1.898s. > > It was seen in the field that crdump can take more than 5 seconds to > complete. During that time mlx5_vsc_wait_on_flag() did not release the > cpu because it did not complete 16 iterations. It is believed that pci > config reads were slow. This change adds conditional reschedule call > every 128 register read to release the cpu if needed. > > Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com> > Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > index 6b774e0c2766..bc6c38a68702 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > { > unsigned int next_read_addr = 0; > unsigned int read_addr = 0; > + unsigned int count = 0; > > while (read_addr < length) { > if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr, > @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > return read_addr; > > read_addr = next_read_addr; > + /* Yield the cpu every 128 register read */ > + if ((++count & 0x7f) == 0) > + cond_resched(); Why & 0x7f, could it be written more clearly? if (++count == 128) { cond_resched(); count = 0; } Also, I'd make this open-coded value a #define somewhere at the beginning of the file with a comment with a short explanation. BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any explanation in the commitmsg or here in the code why exactly 128. Have you tried different values? > } > return length; > } Thanks, Olek
On 2024-08-30 15:07:45 +0200, Alexander Lobakin wrote: > From: Mohamed Khalfella <mkhalfella@purestorage.com> > Date: Thu, 29 Aug 2024 15:38:56 -0600 > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > > index 6b774e0c2766..bc6c38a68702 100644 > > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > > @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > > { > > unsigned int next_read_addr = 0; > > unsigned int read_addr = 0; > > + unsigned int count = 0; > > > > while (read_addr < length) { > > if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr, > > @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > > return read_addr; > > > > read_addr = next_read_addr; > > + /* Yield the cpu every 128 register read */ > > + if ((++count & 0x7f) == 0) > > + cond_resched(); > > Why & 0x7f, could it be written more clearly? > > if (++count == 128) { > cond_resched(); > count = 0; > } > > Also, I'd make this open-coded value a #define somewhere at the > beginning of the file with a comment with a short explanation. What you are suggesting should work also. I copied the style from mlx5_vsc_wait_on_flag() to keep the code consistent. The comment above the line should make it clear. > > BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any > explanation in the commitmsg or here in the code why exactly 128. Have > you tried different values? This mostly subjective. For the numbers I saw in the lab, this will release the cpu after ~4.51ms. If crdump takes ~5s, the code should release the cpu after ~18.0ms. These numbers look reasonable to me.
From: Mohamed Khalfella <mkhalfella@purestorage.com> Date: Fri, 30 Aug 2024 11:01:19 -0700 > On 2024-08-30 15:07:45 +0200, Alexander Lobakin wrote: >> From: Mohamed Khalfella <mkhalfella@purestorage.com> >> Date: Thu, 29 Aug 2024 15:38:56 -0600 >>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c >>> index 6b774e0c2766..bc6c38a68702 100644 >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c >>> @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, >>> { >>> unsigned int next_read_addr = 0; >>> unsigned int read_addr = 0; >>> + unsigned int count = 0; >>> >>> while (read_addr < length) { >>> if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr, >>> @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, >>> return read_addr; >>> >>> read_addr = next_read_addr; >>> + /* Yield the cpu every 128 register read */ >>> + if ((++count & 0x7f) == 0) >>> + cond_resched(); >> >> Why & 0x7f, could it be written more clearly? >> >> if (++count == 128) { >> cond_resched(); >> count = 0; >> } >> >> Also, I'd make this open-coded value a #define somewhere at the >> beginning of the file with a comment with a short explanation. This is still valid. > > What you are suggesting should work also. I copied the style from > mlx5_vsc_wait_on_flag() to keep the code consistent. The comment above > the line should make it clear. I just don't see a reason to make the code less readable. > >> >> BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any >> explanation in the commitmsg or here in the code why exactly 128. Have >> you tried different values? > > This mostly subjective. For the numbers I saw in the lab, this will > release the cpu after ~4.51ms. If crdump takes ~5s, the code should > release the cpu after ~18.0ms. These numbers look reasonable to me. So just mention in the commit message that you tried different values and 128 gave you the best results. Thanks, Olek
On 2024-09-03 14:14:58 +0200, Alexander Lobakin wrote: > From: Mohamed Khalfella <mkhalfella@purestorage.com> > Date: Fri, 30 Aug 2024 11:01:19 -0700 > > > On 2024-08-30 15:07:45 +0200, Alexander Lobakin wrote: > >> From: Mohamed Khalfella <mkhalfella@purestorage.com> > >> Date: Thu, 29 Aug 2024 15:38:56 -0600 > >>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > >>> index 6b774e0c2766..bc6c38a68702 100644 > >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > >>> @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > >>> { > >>> unsigned int next_read_addr = 0; > >>> unsigned int read_addr = 0; > >>> + unsigned int count = 0; > >>> > >>> while (read_addr < length) { > >>> if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr, > >>> @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > >>> return read_addr; > >>> > >>> read_addr = next_read_addr; > >>> + /* Yield the cpu every 128 register read */ > >>> + if ((++count & 0x7f) == 0) > >>> + cond_resched(); > >> > >> Why & 0x7f, could it be written more clearly? > >> > >> if (++count == 128) { > >> cond_resched(); > >> count = 0; > >> } > >> > >> Also, I'd make this open-coded value a #define somewhere at the > >> beginning of the file with a comment with a short explanation. > > This is still valid. Done. See <1>. > > > > > What you are suggesting should work also. I copied the style from > > mlx5_vsc_wait_on_flag() to keep the code consistent. The comment above > > the line should make it clear. > > I just don't see a reason to make the code less readable. <1> Now I am looking at mlx5_vsc_wait_on_flag() again, I realized the code does not want to reset retries to 0 because it needs to check when it reaches VSC_MAX_RETRIES. This is not the case here. I will update the code as suggested. > > > > >> > >> BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any > >> explanation in the commitmsg or here in the code why exactly 128. Have > >> you tried different values? > > > > This mostly subjective. For the numbers I saw in the lab, this will > > release the cpu after ~4.51ms. If crdump takes ~5s, the code should > > release the cpu after ~18.0ms. These numbers look reasonable to me. > > So just mention in the commit message that you tried different values > and 128 gave you the best results. I will update the commit message in v3.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c index 6b774e0c2766..bc6c38a68702 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, { unsigned int next_read_addr = 0; unsigned int read_addr = 0; + unsigned int count = 0; while (read_addr < length) { if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr, @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, return read_addr; read_addr = next_read_addr; + /* Yield the cpu every 128 register read */ + if ((++count & 0x7f) == 0) + cond_resched(); } return length; }