
[RFC,v1,2/3] i40e: limit msix vectors based on housekeeping CPUs

Message ID 20200909150818.313699-3-nitesh@redhat.com
State Superseded
Series isolation: limit msix vectors based on housekeeping CPUs

Commit Message

Nitesh Narayan Lal Sept. 9, 2020, 3:08 p.m. UTC
In a realtime environment, it is essential to isolate unwanted IRQs from
isolated CPUs to prevent latency overheads. Creating MSIX vectors based
only on the number of online CPUs can cause problems on an RT setup that
has several isolated CPUs but very few housekeeping CPUs: on such a setup,
an attempt to move the IRQs from the isolated CPUs to the limited
housekeeping CPUs may fail due to the per-CPU vector limit. This can
eventually result in latency spikes caused by the IRQ threads that we
fail to move off the isolated CPUs.

This patch makes i40e create vectors based on the number of available
housekeeping CPUs instead, by using num_housekeeping_cpus().

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Marcelo Tosatti Sept. 11, 2020, 3:23 p.m. UTC | #1
On Wed, Sep 09, 2020 at 11:08:17AM -0400, Nitesh Narayan Lal wrote:
> In a realtime environment, it is essential to isolate unwanted IRQs from
> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> based on the online CPUs could lead to a potential issue on an RT setup
> that has several isolated CPUs but a very few housekeeping CPUs. This is
> because in these kinds of setups an attempt to move the IRQs to the
> limited housekeeping CPUs from isolated CPUs might fail due to the per
> CPU vector limit. This could eventually result in latency spikes because
> of the IRQ threads that we fail to move from isolated CPUs.
> 
> This patch prevents i40e to add vectors only based on available
> housekeeping CPUs by using num_housekeeping_cpus().
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 2e433fdbf2c3..3b4cd4b3de85 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -5,6 +5,7 @@
>  #include <linux/of_net.h>
>  #include <linux/pci.h>
>  #include <linux/bpf.h>
> +#include <linux/sched/isolation.h>
>  #include <generated/utsrelease.h>
>  
>  /* Local includes */
> @@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
>  	 * will use any remaining vectors to reach as close as we can to the
>  	 * number of online CPUs.
>  	 */
> -	cpus = num_online_cpus();
> +	cpus = num_housekeeping_cpus();
>  	pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
>  	vectors_left -= pf->num_lan_msix;
>  
> -- 
> 2.27.0

For patches 1 and 2:

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Jesse Brandeburg Sept. 17, 2020, 6:23 p.m. UTC | #2
Nitesh Narayan Lal wrote:

> In a realtime environment, it is essential to isolate unwanted IRQs from
> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> based on the online CPUs could lead to a potential issue on an RT setup
> that has several isolated CPUs but a very few housekeeping CPUs. This is
> because in these kinds of setups an attempt to move the IRQs to the
> limited housekeeping CPUs from isolated CPUs might fail due to the per
> CPU vector limit. This could eventually result in latency spikes because
> of the IRQ threads that we fail to move from isolated CPUs.
> 
> This patch prevents i40e to add vectors only based on available
> housekeeping CPUs by using num_housekeeping_cpus().
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>

The driver changes are straightforward, but this isn't the only driver
with this issue, right? I'm sure ixgbe and ice both have this problem too;
you should fix them as well, at a minimum, and probably other vendors'
drivers:

$ rg -c --stats num_online_cpus drivers/net/ethernet
...
50 files contained matches

For this i40e patch:
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Nitesh Narayan Lal Sept. 17, 2020, 6:31 p.m. UTC | #3
On 9/17/20 2:23 PM, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
>
>> In a realtime environment, it is essential to isolate unwanted IRQs from
>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>> based on the online CPUs could lead to a potential issue on an RT setup
>> that has several isolated CPUs but a very few housekeeping CPUs. This is
>> because in these kinds of setups an attempt to move the IRQs to the
>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>> CPU vector limit. This could eventually result in latency spikes because
>> of the IRQ threads that we fail to move from isolated CPUs.
>>
>> This patch prevents i40e to add vectors only based on available
>> housekeeping CPUs by using num_housekeeping_cpus().
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> The driver changes are straightforward, but this isn't the only driver
> with this issue, right?

Indeed, I was hoping to modify them over time, with some testing.

>   I'm sure ixgbe and ice both have this problem
> too, you should fix them as well, at a minimum, and probably other
> vendors drivers:

Sure, I can at least include ixgbe and ice in the next posting if that makes
sense. The reason I skipped them is that I was not very sure about the right
way to test these changes.

>
> $ rg -c --stats num_online_cpus drivers/net/ethernet
> ...
> 50 files contained matches
>
> for this patch i40e
> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
Frederic Weisbecker Sept. 21, 2020, 10:58 p.m. UTC | #4
On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
> 
> > In a realtime environment, it is essential to isolate unwanted IRQs from
> > isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> > based on the online CPUs could lead to a potential issue on an RT setup
> > that has several isolated CPUs but a very few housekeeping CPUs. This is
> > because in these kinds of setups an attempt to move the IRQs to the
> > limited housekeeping CPUs from isolated CPUs might fail due to the per
> > CPU vector limit. This could eventually result in latency spikes because
> > of the IRQ threads that we fail to move from isolated CPUs.
> > 
> > This patch prevents i40e to add vectors only based on available
> > housekeeping CPUs by using num_housekeeping_cpus().
> > 
> > Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> 
> The driver changes are straightforward, but this isn't the only driver
> with this issue, right?  I'm sure ixgbe and ice both have this problem
> too, you should fix them as well, at a minimum, and probably other
> vendors drivers:
> 
> $ rg -c --stats num_online_cpus drivers/net/ethernet
> ...
> 50 files contained matches

Ouch, I was indeed surprised that these MSI vector allocations were done
at the driver level and not at some $SUBSYSTEM level.

The logic is already there in the driver, so I wouldn't oppose this
particular patch, but would a shared infrastructure make sense for this?
Something that would also handle hotplug operations?

Does it possibly go even beyond networking drivers?

Thanks.

> 
> for this patch i40e
> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Nitesh Narayan Lal Sept. 22, 2020, 3:08 a.m. UTC | #5
On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>> Nitesh Narayan Lal wrote:
>>
>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>> based on the online CPUs could lead to a potential issue on an RT setup
>>> that has several isolated CPUs but a very few housekeeping CPUs. This is
>>> because in these kinds of setups an attempt to move the IRQs to the
>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>> CPU vector limit. This could eventually result in latency spikes because
>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>
>>> This patch prevents i40e to add vectors only based on available
>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> The driver changes are straightforward, but this isn't the only driver
>> with this issue, right?  I'm sure ixgbe and ice both have this problem
>> too, you should fix them as well, at a minimum, and probably other
>> vendors drivers:
>>
>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>> ...
>> 50 files contained matches
> Ouch, I was indeed surprised that these MSI vector allocations were done
> at the driver level and not at some $SUBSYSTEM level.
>
> The logic is already there in the driver so I wouldn't oppose to this very patch
> but would a shared infrastructure make sense for this? Something that would
> also handle hotplug operations?
>
> Does it possibly go even beyond networking drivers?

From a generic-solution perspective, I think it makes sense to come up with
a shared infrastructure: something that can be consumed by all the drivers
and maybe by hotplug operations as well (I will have to further explore the
hotplug part).

However, there are RT workloads that are getting affected because of this
issue, so does it make sense to go ahead with this per-driver approach for
now? A generic solution will require a fair amount of testing and an
understanding of different drivers. Having said that, I can definitely start
looking in that direction.

Thanks for reviewing.

> Thanks.
>
>> for this patch i40e
>> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Frederic Weisbecker Sept. 22, 2020, 9:54 a.m. UTC | #6
On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
> 
> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
> > On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
> >> Nitesh Narayan Lal wrote:
> >>
> >>> In a realtime environment, it is essential to isolate unwanted IRQs from
> >>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> >>> based on the online CPUs could lead to a potential issue on an RT setup
> >>> that has several isolated CPUs but a very few housekeeping CPUs. This is
> >>> because in these kinds of setups an attempt to move the IRQs to the
> >>> limited housekeeping CPUs from isolated CPUs might fail due to the per
> >>> CPU vector limit. This could eventually result in latency spikes because
> >>> of the IRQ threads that we fail to move from isolated CPUs.
> >>>
> >>> This patch prevents i40e to add vectors only based on available
> >>> housekeeping CPUs by using num_housekeeping_cpus().
> >>>
> >>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> >> The driver changes are straightforward, but this isn't the only driver
> >> with this issue, right?  I'm sure ixgbe and ice both have this problem
> >> too, you should fix them as well, at a minimum, and probably other
> >> vendors drivers:
> >>
> >> $ rg -c --stats num_online_cpus drivers/net/ethernet
> >> ...
> >> 50 files contained matches
> > Ouch, I was indeed surprised that these MSI vector allocations were done
> > at the driver level and not at some $SUBSYSTEM level.
> >
> > The logic is already there in the driver so I wouldn't oppose to this very patch
> > but would a shared infrastructure make sense for this? Something that would
> > also handle hotplug operations?
> >
> > Does it possibly go even beyond networking drivers?
> 
> From a generic solution perspective, I think it makes sense to come up with a
> shared infrastructure.
> Something that can be consumed by all the drivers and maybe hotplug operations
> as well (I will have to further explore the hotplug part).

That would be great. I'm completely clueless about those MSI things and the
actual needs of those drivers. Now it seems to me that if several CPUs go
offline, or, as is planned for the future, CPU isolation gets enabled/disabled
through cpuset, then the vectors may need some reorganization.

But I also don't want to push toward a complicated solution for handling CPU
hotplug if there is no actual problem to solve there. So I'll let you guys judge.

> However, there are RT workloads that are getting affected because of this
> issue, so does it make sense to go ahead with this per-driver basis approach
> for now?

Yep that sounds good.

> 
> Since a generic solution will require a fair amount of testing and
> understanding of different drivers. Having said that, I can definitely start
> looking in that direction.

Thanks a lot!
Nitesh Narayan Lal Sept. 22, 2020, 1:34 p.m. UTC | #7
On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
>> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
>>> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>>>> Nitesh Narayan Lal wrote:
>>>>
>>>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>>>> based on the online CPUs could lead to a potential issue on an RT setup
>>>>> that has several isolated CPUs but a very few housekeeping CPUs. This is
>>>>> because in these kinds of setups an attempt to move the IRQs to the
>>>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>>>> CPU vector limit. This could eventually result in latency spikes because
>>>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>>>
>>>>> This patch prevents i40e to add vectors only based on available
>>>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>>>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>> The driver changes are straightforward, but this isn't the only driver
>>>> with this issue, right?  I'm sure ixgbe and ice both have this problem
>>>> too, you should fix them as well, at a minimum, and probably other
>>>> vendors drivers:
>>>>
>>>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>>>> ...
>>>> 50 files contained matches
>>> Ouch, I was indeed surprised that these MSI vector allocations were done
>>> at the driver level and not at some $SUBSYSTEM level.
>>>
>>> The logic is already there in the driver so I wouldn't oppose to this very patch
>>> but would a shared infrastructure make sense for this? Something that would
>>> also handle hotplug operations?
>>>
>>> Does it possibly go even beyond networking drivers?
>> From a generic solution perspective, I think it makes sense to come up with a
>> shared infrastructure.
>> Something that can be consumed by all the drivers and maybe hotplug operations
>> as well (I will have to further explore the hotplug part).
> That would be great. I'm completely clueless about those MSI things and the
> actual needs of those drivers. Now it seems to me that if several CPUs become
> offline, or as is planned in the future, CPU isolation gets enabled/disabled
> through cpuset, then the vectors may need some reorganization.

+1

>
> But I don't also want to push toward a complicated solution to handle CPU hotplug
> if there is no actual problem to solve there.

Sure, I am also not particularly sure about the hotplug scenarios.

>  So I let you guys judge.
>
>> However, there are RT workloads that are getting affected because of this
>> issue, so does it make sense to go ahead with this per-driver basis approach
>> for now?
> Yep that sounds good.

Thank you for confirming.

>
>> Since a generic solution will require a fair amount of testing and
>> understanding of different drivers. Having said that, I can definitely start
>> looking in that direction.
> Thanks a lot!
>
Frederic Weisbecker Sept. 22, 2020, 8:44 p.m. UTC | #8
On Tue, Sep 22, 2020 at 09:34:02AM -0400, Nitesh Narayan Lal wrote:
> On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> > But I don't also want to push toward a complicated solution to handle CPU hotplug
> > if there is no actual problem to solve there.
> 
> Sure, even I am not particularly sure about the hotplug scenarios.

Surely once isolation becomes configurable through cpuset, it will get
interesting. For now it's probably not worth adding hundreds of lines of
code if nobody has stepped into such an issue yet.

What is probably more worthwhile for now is some piece of code to consolidate
the allocation of those MSI vectors on top of the number of housekeeping
CPUs.

Thanks.
Nitesh Narayan Lal Sept. 22, 2020, 9:05 p.m. UTC | #9
On 9/22/20 4:44 PM, Frederic Weisbecker wrote:
> On Tue, Sep 22, 2020 at 09:34:02AM -0400, Nitesh Narayan Lal wrote:
>> On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
>>> But I don't also want to push toward a complicated solution to handle CPU hotplug
>>> if there is no actual problem to solve there.
>> Sure, even I am not particularly sure about the hotplug scenarios.
> Surely when isolation is something that will be configurable through
> cpuset, it will become interesting. For now it's probably not worth
> adding hundreds lines of code if nobody steps into such issue yet.
>
> What is probably more worthy for now is some piece of code to consolidate
> the allocation of those MSI vectors on top of the number of housekeeping
> CPUs.

+1

>
> Thanks.
>

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e433fdbf2c3..3b4cd4b3de85 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5,6 +5,7 @@ 
 #include <linux/of_net.h>
 #include <linux/pci.h>
 #include <linux/bpf.h>
+#include <linux/sched/isolation.h>
 #include <generated/utsrelease.h>
 
 /* Local includes */
@@ -11002,7 +11003,7 @@  static int i40e_init_msix(struct i40e_pf *pf)
 	 * will use any remaining vectors to reach as close as we can to the
 	 * number of online CPUs.
 	 */
-	cpus = num_online_cpus();
+	cpus = num_housekeeping_cpus();
 	pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
 	vectors_left -= pf->num_lan_msix;