diff mbox series

[v2,1/4] sched/isolation: API to get housekeeping online CPUs

Message ID 20200923181126.223766-2-nitesh@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show
Series isolation: limit msix vectors based on housekeeping CPUs | expand

Commit Message

Nitesh Narayan Lal Sept. 23, 2020, 6:11 p.m. UTC
Introduce a new API hk_num_online_cpus(), that can be used to
retrieve the number of online housekeeping CPUs that are meant to handle
managed IRQ jobs.

This API is introduced for the drivers that were previously relying only
on num_online_cpus() to determine the number of MSIX vectors to create.
In an RT environment with large isolated but fewer housekeeping CPUs this
was leading to a situation where an attempt to move all of the vectors
corresponding to isolated CPUs to housekeeping CPUs were failing due to
per CPU vector limit.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 include/linux/sched/isolation.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Peter Zijlstra Sept. 24, 2020, 8:40 a.m. UTC | #1
On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
> 
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>  	return true;
>  }
>  
> +static inline unsigned int hk_num_online_cpus(void)

This breaks with the established naming of that header.
Frederic Weisbecker Sept. 24, 2020, 12:09 p.m. UTC | #2
On Thu, Sep 24, 2020 at 10:40:29AM +0200, peterz@infradead.org wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> > Introduce a new API hk_num_online_cpus(), that can be used to
> > retrieve the number of online housekeeping CPUs that are meant to handle
> > managed IRQ jobs.
> > 
> > This API is introduced for the drivers that were previously relying only
> > on num_online_cpus() to determine the number of MSIX vectors to create.
> > In an RT environment with large isolated but fewer housekeeping CPUs this
> > was leading to a situation where an attempt to move all of the vectors
> > corresponding to isolated CPUs to housekeeping CPUs were failing due to
> > per CPU vector limit.
> > 
> > Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> > ---
> >  include/linux/sched/isolation.h | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index cc9f393e2a70..2e96b626e02e 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> >  	return true;
> >  }
> >  
> > +static inline unsigned int hk_num_online_cpus(void)
> 
> This breaks with the established naming of that header.

I guess we can make it housekeeping_num_online_cpus() ?
Frederic Weisbecker Sept. 24, 2020, 12:11 p.m. UTC | #3
On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
> 
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>  	return true;
>  }
>  
> +static inline unsigned int hk_num_online_cpus(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> +	const struct cpumask *hk_mask;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

HK_FLAG_MANAGED_IRQ should be pass as an argument to the function:

housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ) because it's
completely arbitrary otherwise.

> +		return cpumask_weight(hk_mask);
> +	}
> +#endif
> +	return cpumask_weight(cpu_online_mask);
> +}
> +
>  #endif /* _LINUX_SCHED_ISOLATION_H */
> -- 
> 2.18.2
>
Nitesh Narayan Lal Sept. 24, 2020, 12:23 p.m. UTC | #4
On 9/24/20 8:09 AM, Frederic Weisbecker wrote:
> On Thu, Sep 24, 2020 at 10:40:29AM +0200, peterz@infradead.org wrote:
>> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>>> Introduce a new API hk_num_online_cpus(), that can be used to
>>> retrieve the number of online housekeeping CPUs that are meant to handle
>>> managed IRQ jobs.
>>>
>>> This API is introduced for the drivers that were previously relying only
>>> on num_online_cpus() to determine the number of MSIX vectors to create.
>>> In an RT environment with large isolated but fewer housekeeping CPUs this
>>> was leading to a situation where an attempt to move all of the vectors
>>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>>> per CPU vector limit.
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>> ---
>>>  include/linux/sched/isolation.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>>> index cc9f393e2a70..2e96b626e02e 100644
>>> --- a/include/linux/sched/isolation.h
>>> +++ b/include/linux/sched/isolation.h
>>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>>  	return true;
>>>  }
>>>  
>>> +static inline unsigned int hk_num_online_cpus(void)
>> This breaks with the established naming of that header.
> I guess we can make it housekeeping_num_online_cpus() ?

Right, I can do that.

Thanks

>
Peter Zijlstra Sept. 24, 2020, 12:24 p.m. UTC | #5
On Thu, Sep 24, 2020 at 02:09:57PM +0200, Frederic Weisbecker wrote:

> > > +static inline unsigned int hk_num_online_cpus(void)
> > 
> > This breaks with the established naming of that header.
> 
> I guess we can make it housekeeping_num_online_cpus() ?

That would be consistent :-)
Nitesh Narayan Lal Sept. 24, 2020, 12:26 p.m. UTC | #6
On 9/24/20 8:11 AM, Frederic Weisbecker wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API hk_num_online_cpus(), that can be used to
>> retrieve the number of online housekeeping CPUs that are meant to handle
>> managed IRQ jobs.
>>
>> This API is introduced for the drivers that were previously relying only
>> on num_online_cpus() to determine the number of MSIX vectors to create.
>> In an RT environment with large isolated but fewer housekeeping CPUs this
>> was leading to a situation where an attempt to move all of the vectors
>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>> per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/sched/isolation.h | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..2e96b626e02e 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>  	return true;
>>  }
>>  
>> +static inline unsigned int hk_num_online_cpus(void)
>> +{
>> +#ifdef CONFIG_CPU_ISOLATION
>> +	const struct cpumask *hk_mask;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> HK_FLAG_MANAGED_IRQ should be pass as an argument to the function:
>
> housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ) because it's
> completely arbitrary otherwise.


Yeap that is more sensible, I will do that.
Do you have any other concerns/suggestions on any other patch?

>
>> +		return cpumask_weight(hk_mask);
>> +	}
>> +#endif
>> +	return cpumask_weight(cpu_online_mask);
>> +}
>> +
>>  #endif /* _LINUX_SCHED_ISOLATION_H */
>> -- 
>> 2.18.2
>>
Peter Zijlstra Sept. 24, 2020, 12:46 p.m. UTC | #7
FWIW, cross-posting to moderated lists is annoying. I don't know why we
allow them in MAINTAINERS :-(
Nitesh Narayan Lal Sept. 24, 2020, 1:45 p.m. UTC | #8
On 9/24/20 8:46 AM, Peter Zijlstra wrote:
>
> FWIW, cross-posting to moderated lists is annoying. I don't know why we
> allow them in MAINTAINERS :-(

Yeah, it sends out an acknowledgment for every email.
I had to include it because sending the patches to it apparently allows them
to get tested by Intel folks.

>
Bjorn Helgaas Sept. 24, 2020, 8:47 p.m. UTC | #9
On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
> 
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>  	return true;
>  }
>  
> +static inline unsigned int hk_num_online_cpus(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> +	const struct cpumask *hk_mask;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> +		return cpumask_weight(hk_mask);
> +	}
> +#endif
> +	return cpumask_weight(cpu_online_mask);

Just curious: why is this not

  #ifdef CONFIG_CPU_ISOLATION
  ...
  #endif
    return num_online_cpus();

> +}
> +
>  #endif /* _LINUX_SCHED_ISOLATION_H */
> -- 
> 2.18.2
>
Nitesh Narayan Lal Sept. 24, 2020, 9:52 p.m. UTC | #10
On 9/24/20 4:47 PM, Bjorn Helgaas wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API hk_num_online_cpus(), that can be used to
>> retrieve the number of online housekeeping CPUs that are meant to handle
>> managed IRQ jobs.
>>
>> This API is introduced for the drivers that were previously relying only
>> on num_online_cpus() to determine the number of MSIX vectors to create.
>> In an RT environment with large isolated but fewer housekeeping CPUs this
>> was leading to a situation where an attempt to move all of the vectors
>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>> per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/sched/isolation.h | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..2e96b626e02e 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>  	return true;
>>  }
>>  
>> +static inline unsigned int hk_num_online_cpus(void)
>> +{
>> +#ifdef CONFIG_CPU_ISOLATION
>> +	const struct cpumask *hk_mask;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
>> +		return cpumask_weight(hk_mask);
>> +	}
>> +#endif
>> +	return cpumask_weight(cpu_online_mask);
> Just curious: why is this not
>
>   #ifdef CONFIG_CPU_ISOLATION
>   ...
>   #endif
>     return num_online_cpus();

I think doing an atomic read is better than a bitmap operation.
Thanks for pointing this out.

>
>> +}
>> +
>>  #endif /* _LINUX_SCHED_ISOLATION_H */
>> -- 
>> 2.18.2
>>
diff mbox series

Patch

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index cc9f393e2a70..2e96b626e02e 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -57,4 +57,17 @@  static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
 	return true;
 }
 
+static inline unsigned int hk_num_online_cpus(void)
+{
+#ifdef CONFIG_CPU_ISOLATION
+	const struct cpumask *hk_mask;
+
+	if (static_branch_unlikely(&housekeeping_overridden)) {
+		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
+		return cpumask_weight(hk_mask);
+	}
+#endif
+	return cpumask_weight(cpu_online_mask);
+}
+
 #endif /* _LINUX_SCHED_ISOLATION_H */