diff mbox series

[2/2] drm/i915/pmu: Fix CPU hotplug with multiple GPUs

Message ID 20201020100822.543332-2-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915/pmu: Handle PCI unbind | expand

Commit Message

Tvrtko Ursulin Oct. 20, 2020, 10:08 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Since we keep a driver global mask of online CPUs and base the decision
whether PMU needs to be migrated upon it, we need to make sure the
migration is done for all registered PMUs (so GPUs).

To do this we need to track the current CPU for each PMU and base the
decision on whether to migrate on a comparison between global and local
state.

At the same time, since dynamic CPU hotplug notification slots are a
scarce resource and given how we already register the multi-instance type
state, we can and should add multiple instances of the i915 PMU to this
same state and not allocate a new one for every GPU.

v2:
 * Use pr_notice. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pci.c |  7 ++++-
 drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_pmu.h |  6 +++-
 3 files changed, 41 insertions(+), 22 deletions(-)

Comments

Chris Wilson Oct. 20, 2020, 11:59 a.m. UTC | #1
Quoting Tvrtko Ursulin (2020-10-20 11:08:22)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Since we keep a driver global mask of online CPUs and base the decision
> whether PMU needs to be migrated upon it, we need to make sure the
> migration is done for all registered PMUs (so GPUs).
> 
> To do this we need to track the current CPU for each PMU and base the
> decision on whether to migrate on a comparison between global and local
> state.
> 
> At the same time, since dynamic CPU hotplug notification slots are a
> scarce resource and given how we already register the multi instance type
> state, we can and should add multiple instance of the i915 PMU to this
> same state and not allocate a new one for every GPU.
> 
> v2:
>  * Use pr_notice. (Chris)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_pci.c |  7 ++++-
>  drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
>  drivers/gpu/drm/i915/i915_pmu.h |  6 +++-
>  3 files changed, 41 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 27964ac0638a..a384f51c91c1 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1150,9 +1150,13 @@ static int __init i915_init(void)
>                 return 0;
>         }
>  
> +       i915_pmu_init();
> +
>         err = pci_register_driver(&i915_pci_driver);
> -       if (err)
> +       if (err) {
> +               i915_pmu_exit();
>                 return err;
> +       }
>  
>         i915_perf_sysctl_register();
>         return 0;
> @@ -1166,6 +1170,7 @@ static void __exit i915_exit(void)
>         i915_perf_sysctl_unregister();
>         pci_unregister_driver(&i915_pci_driver);
>         i915_globals_exit();
> +       i915_pmu_exit();
>  }
>  
>  module_init(i915_init);
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 51ed7d0efcdc..0d6c0945621e 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -30,6 +30,7 @@
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>  
>  static cpumask_t i915_pmu_cpumask;
> +static unsigned int i915_pmu_target_cpu = -1;
>  
>  static u8 engine_config_sample(u64 config)
>  {
> @@ -1049,25 +1050,32 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>  static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>  {
>         struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> -       unsigned int target;
> +       unsigned int target = i915_pmu_target_cpu;

So we still have multiple callbacks, one per pmu. But each callback is
now stored in a list from the cpuhp_slot instead of each callback having
its own slot.

>  
>         GEM_BUG_ON(!pmu->base.event_init);
>  
>         if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {

On first callback...

>                 target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);

Pick any other cpu.

> +
>                 /* Migrate events if there is a valid target */
>                 if (target < nr_cpu_ids) {
>                         cpumask_set_cpu(target, &i915_pmu_cpumask);
> -                       perf_pmu_migrate_context(&pmu->base, cpu, target);
> +                       i915_pmu_target_cpu = target;

Store target for all callbacks.

>                 }
>         }
>  
> +       if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {

If global [i915_pmu_target_cpu] target has changed, update perf.

> +               perf_pmu_migrate_context(&pmu->base, cpu, target);
> +               pmu->cpuhp.cpu = target;

It is claimed that cpuhp_state_remove_instance() will call the offline
callback for all online cpus... Do we need a pmu->base.state != STOPPED
guard?
-Chris
Chris Wilson Oct. 20, 2020, 12:10 p.m. UTC | #2
Quoting Chris Wilson (2020-10-20 12:59:57)
> Quoting Tvrtko Ursulin (2020-10-20 11:08:22)
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Since we keep a driver global mask of online CPUs and base the decision
> > whether PMU needs to be migrated upon it, we need to make sure the
> > migration is done for all registered PMUs (so GPUs).
> > 
> > To do this we need to track the current CPU for each PMU and base the
> > decision on whether to migrate on a comparison between global and local
> > state.
> > 
> > At the same time, since dynamic CPU hotplug notification slots are a
> > scarce resource and given how we already register the multi instance type
> > state, we can and should add multiple instance of the i915 PMU to this
> > same state and not allocate a new one for every GPU.
> > 
> > v2:
> >  * Use pr_notice. (Chris)
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_pci.c |  7 ++++-
> >  drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
> >  drivers/gpu/drm/i915/i915_pmu.h |  6 +++-
> >  3 files changed, 41 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index 27964ac0638a..a384f51c91c1 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1150,9 +1150,13 @@ static int __init i915_init(void)
> >                 return 0;
> >         }
> >  
> > +       i915_pmu_init();
> > +
> >         err = pci_register_driver(&i915_pci_driver);
> > -       if (err)
> > +       if (err) {
> > +               i915_pmu_exit();
> >                 return err;
> > +       }
> >  
> >         i915_perf_sysctl_register();
> >         return 0;
> > @@ -1166,6 +1170,7 @@ static void __exit i915_exit(void)
> >         i915_perf_sysctl_unregister();
> >         pci_unregister_driver(&i915_pci_driver);
> >         i915_globals_exit();
> > +       i915_pmu_exit();
> >  }
> >  
> >  module_init(i915_init);
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index 51ed7d0efcdc..0d6c0945621e 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -30,6 +30,7 @@
> >  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
> >  
> >  static cpumask_t i915_pmu_cpumask;
> > +static unsigned int i915_pmu_target_cpu = -1;
> >  
> >  static u8 engine_config_sample(u64 config)
> >  {
> > @@ -1049,25 +1050,32 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> >  static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
> >  {
> >         struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> > -       unsigned int target;
> > +       unsigned int target = i915_pmu_target_cpu;
> 
> So we still have multiple callbacks, one per pmu. But each callback is
> now stored in a list from the cpuhp_slot instead of each callback having
> its own slot.
> 
> >  
> >         GEM_BUG_ON(!pmu->base.event_init);
> >  
> >         if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
> 
> On first callback...
> 
> >                 target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
> 
> Pick any other cpu.
> 
> > +
> >                 /* Migrate events if there is a valid target */
> >                 if (target < nr_cpu_ids) {
> >                         cpumask_set_cpu(target, &i915_pmu_cpumask);
> > -                       perf_pmu_migrate_context(&pmu->base, cpu, target);
> > +                       i915_pmu_target_cpu = target;
> 
> Store target for all callbacks.
> 
> >                 }
> >         }
> >  
> > +       if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
> 
> If global [i915_pmu_target_cpu] target has changed, update perf.
> 
> > +               perf_pmu_migrate_context(&pmu->base, cpu, target);
> > +               pmu->cpuhp.cpu = target;
> 
> It is claimed that cpuhp_state_remove_instance() will call the offline
> callback for all online cpus... Do we need a pmu->base.state != STOPPED
> guard?

s/claimed/it definitely does :)/

Or rather pmu->closed.
-Chris
Tvrtko Ursulin Oct. 20, 2020, 12:33 p.m. UTC | #3
On 20/10/2020 13:10, Chris Wilson wrote:
> Quoting Chris Wilson (2020-10-20 12:59:57)
>> Quoting Tvrtko Ursulin (2020-10-20 11:08:22)
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Since we keep a driver global mask of online CPUs and base the decision
>>> whether PMU needs to be migrated upon it, we need to make sure the
>>> migration is done for all registered PMUs (so GPUs).
>>>
>>> To do this we need to track the current CPU for each PMU and base the
>>> decision on whether to migrate on a comparison between global and local
>>> state.
>>>
>>> At the same time, since dynamic CPU hotplug notification slots are a
>>> scarce resource and given how we already register the multi instance type
>>> state, we can and should add multiple instance of the i915 PMU to this
>>> same state and not allocate a new one for every GPU.
>>>
>>> v2:
>>>   * Use pr_notice. (Chris)
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_pci.c |  7 ++++-
>>>   drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
>>>   drivers/gpu/drm/i915/i915_pmu.h |  6 +++-
>>>   3 files changed, 41 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>>> index 27964ac0638a..a384f51c91c1 100644
>>> --- a/drivers/gpu/drm/i915/i915_pci.c
>>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>>> @@ -1150,9 +1150,13 @@ static int __init i915_init(void)
>>>                  return 0;
>>>          }
>>>   
>>> +       i915_pmu_init();
>>> +
>>>          err = pci_register_driver(&i915_pci_driver);
>>> -       if (err)
>>> +       if (err) {
>>> +               i915_pmu_exit();
>>>                  return err;
>>> +       }
>>>   
>>>          i915_perf_sysctl_register();
>>>          return 0;
>>> @@ -1166,6 +1170,7 @@ static void __exit i915_exit(void)
>>>          i915_perf_sysctl_unregister();
>>>          pci_unregister_driver(&i915_pci_driver);
>>>          i915_globals_exit();
>>> +       i915_pmu_exit();
>>>   }
>>>   
>>>   module_init(i915_init);
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 51ed7d0efcdc..0d6c0945621e 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -30,6 +30,7 @@
>>>   #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>>   
>>>   static cpumask_t i915_pmu_cpumask;
>>> +static unsigned int i915_pmu_target_cpu = -1;
>>>   
>>>   static u8 engine_config_sample(u64 config)
>>>   {
>>> @@ -1049,25 +1050,32 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>>>   static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>>>   {
>>>          struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>>> -       unsigned int target;
>>> +       unsigned int target = i915_pmu_target_cpu;
>>
>> So we still have multiple callbacks, one per pmu. But each callback is
>> now stored in a list from the cpuhp_slot instead of each callback having
>> its own slot.
>>
>>>   
>>>          GEM_BUG_ON(!pmu->base.event_init);
>>>   
>>>          if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
>>
>> On first callback...
>>
>>>                  target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
>>
>> Pick any other cpu.
>>
>>> +
>>>                  /* Migrate events if there is a valid target */
>>>                  if (target < nr_cpu_ids) {
>>>                          cpumask_set_cpu(target, &i915_pmu_cpumask);
>>> -                       perf_pmu_migrate_context(&pmu->base, cpu, target);
>>> +                       i915_pmu_target_cpu = target;
>>
>> Store target for all callbacks.
>>
>>>                  }
>>>          }
>>>   
>>> +       if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
>>
>> If global [i915_pmu_target_cpu] target has changed, update perf.
>>
>>> +               perf_pmu_migrate_context(&pmu->base, cpu, target);
>>> +               pmu->cpuhp.cpu = target;
>>
>> It is claimed that cpuhp_state_remove_instance() will call the offline
>> callback for all online cpus... Do we need a pmu->base.state != STOPPED
>> guard?
> 
> s/claimed/it definitely does :)/
> 
> Or rather pmu->closed.

Hm why? You think perf_pmu_migrate_context accesses something in the PMU 
outside of the already protected entry points?

Regards,

Tvrtko
Chris Wilson Oct. 20, 2020, 12:40 p.m. UTC | #4
Quoting Tvrtko Ursulin (2020-10-20 13:33:12)
> 
> On 20/10/2020 13:10, Chris Wilson wrote:
> > Quoting Chris Wilson (2020-10-20 12:59:57)
> >> Quoting Tvrtko Ursulin (2020-10-20 11:08:22)
> >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>
> >>> Since we keep a driver global mask of online CPUs and base the decision
> >>> whether PMU needs to be migrated upon it, we need to make sure the
> >>> migration is done for all registered PMUs (so GPUs).
> >>>
> >>> To do this we need to track the current CPU for each PMU and base the
> >>> decision on whether to migrate on a comparison between global and local
> >>> state.
> >>>
> >>> At the same time, since dynamic CPU hotplug notification slots are a
> >>> scarce resource and given how we already register the multi instance type
> >>> state, we can and should add multiple instance of the i915 PMU to this
> >>> same state and not allocate a new one for every GPU.
> >>>
> >>> v2:
> >>>   * Use pr_notice. (Chris)
> >>>
> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
> >>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>   drivers/gpu/drm/i915/i915_pci.c |  7 ++++-
> >>>   drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
> >>>   drivers/gpu/drm/i915/i915_pmu.h |  6 +++-
> >>>   3 files changed, 41 insertions(+), 22 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> >>> index 27964ac0638a..a384f51c91c1 100644
> >>> --- a/drivers/gpu/drm/i915/i915_pci.c
> >>> +++ b/drivers/gpu/drm/i915/i915_pci.c
> >>> @@ -1150,9 +1150,13 @@ static int __init i915_init(void)
> >>>                  return 0;
> >>>          }
> >>>   
> >>> +       i915_pmu_init();
> >>> +
> >>>          err = pci_register_driver(&i915_pci_driver);
> >>> -       if (err)
> >>> +       if (err) {
> >>> +               i915_pmu_exit();
> >>>                  return err;
> >>> +       }
> >>>   
> >>>          i915_perf_sysctl_register();
> >>>          return 0;
> >>> @@ -1166,6 +1170,7 @@ static void __exit i915_exit(void)
> >>>          i915_perf_sysctl_unregister();
> >>>          pci_unregister_driver(&i915_pci_driver);
> >>>          i915_globals_exit();
> >>> +       i915_pmu_exit();
> >>>   }
> >>>   
> >>>   module_init(i915_init);
> >>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >>> index 51ed7d0efcdc..0d6c0945621e 100644
> >>> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >>> @@ -30,6 +30,7 @@
> >>>   #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
> >>>   
> >>>   static cpumask_t i915_pmu_cpumask;
> >>> +static unsigned int i915_pmu_target_cpu = -1;
> >>>   
> >>>   static u8 engine_config_sample(u64 config)
> >>>   {
> >>> @@ -1049,25 +1050,32 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> >>>   static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
> >>>   {
> >>>          struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> >>> -       unsigned int target;
> >>> +       unsigned int target = i915_pmu_target_cpu;
> >>
> >> So we still have multiple callbacks, one per pmu. But each callback is
> >> now stored in a list from the cpuhp_slot instead of each callback having
> >> its own slot.
> >>
> >>>   
> >>>          GEM_BUG_ON(!pmu->base.event_init);
> >>>   
> >>>          if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
> >>
> >> On first callback...
> >>
> >>>                  target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
> >>
> >> Pick any other cpu.
> >>
> >>> +
> >>>                  /* Migrate events if there is a valid target */
> >>>                  if (target < nr_cpu_ids) {
> >>>                          cpumask_set_cpu(target, &i915_pmu_cpumask);
> >>> -                       perf_pmu_migrate_context(&pmu->base, cpu, target);
> >>> +                       i915_pmu_target_cpu = target;
> >>
> >> Store target for all callbacks.
> >>
> >>>                  }
> >>>          }
> >>>   
> >>> +       if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
> >>
> >> If global [i915_pmu_target_cpu] target has changed, update perf.
> >>
> >>> +               perf_pmu_migrate_context(&pmu->base, cpu, target);
> >>> +               pmu->cpuhp.cpu = target;
> >>
> >> It is claimed that cpuhp_state_remove_instance() will call the offline
> >> callback for all online cpus... Do we need a pmu->base.state != STOPPED
> >> guard?
> > 
> > s/claimed/it definitely does :)/
> > 
> > Or rather pmu->closed.
> 
> Hm why? You think perf_pmu_migrate_context accesses something in the PMU 
> outside of the already protected entry points?

If this callback is being called for every online when we unplug one
device, we then believe that no cpus remain online for all other devices.
Should a cpu then be offlined, target is -1u so greater than
nr_cpu_online and we move the perf context to the void, worst case, in
the best case we fail to migrate the perf context off the dying cpu.
-Chris
Tvrtko Ursulin Oct. 20, 2020, 1:05 p.m. UTC | #5
On 20/10/2020 13:40, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-10-20 13:33:12)
>>
>> On 20/10/2020 13:10, Chris Wilson wrote:
>>> Quoting Chris Wilson (2020-10-20 12:59:57)
>>>> Quoting Tvrtko Ursulin (2020-10-20 11:08:22)
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> Since we keep a driver global mask of online CPUs and base the decision
>>>>> whether PMU needs to be migrated upon it, we need to make sure the
>>>>> migration is done for all registered PMUs (so GPUs).
>>>>>
>>>>> To do this we need to track the current CPU for each PMU and base the
>>>>> decision on whether to migrate on a comparison between global and local
>>>>> state.
>>>>>
>>>>> At the same time, since dynamic CPU hotplug notification slots are a
>>>>> scarce resource and given how we already register the multi instance type
>>>>> state, we can and should add multiple instance of the i915 PMU to this
>>>>> same state and not allocate a new one for every GPU.
>>>>>
>>>>> v2:
>>>>>    * Use pr_notice. (Chris)
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_pci.c |  7 ++++-
>>>>>    drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
>>>>>    drivers/gpu/drm/i915/i915_pmu.h |  6 +++-
>>>>>    3 files changed, 41 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>>>>> index 27964ac0638a..a384f51c91c1 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_pci.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>>>>> @@ -1150,9 +1150,13 @@ static int __init i915_init(void)
>>>>>                   return 0;
>>>>>           }
>>>>>    
>>>>> +       i915_pmu_init();
>>>>> +
>>>>>           err = pci_register_driver(&i915_pci_driver);
>>>>> -       if (err)
>>>>> +       if (err) {
>>>>> +               i915_pmu_exit();
>>>>>                   return err;
>>>>> +       }
>>>>>    
>>>>>           i915_perf_sysctl_register();
>>>>>           return 0;
>>>>> @@ -1166,6 +1170,7 @@ static void __exit i915_exit(void)
>>>>>           i915_perf_sysctl_unregister();
>>>>>           pci_unregister_driver(&i915_pci_driver);
>>>>>           i915_globals_exit();
>>>>> +       i915_pmu_exit();
>>>>>    }
>>>>>    
>>>>>    module_init(i915_init);
>>>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>>>> index 51ed7d0efcdc..0d6c0945621e 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>>>> @@ -30,6 +30,7 @@
>>>>>    #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>>>>    
>>>>>    static cpumask_t i915_pmu_cpumask;
>>>>> +static unsigned int i915_pmu_target_cpu = -1;
>>>>>    
>>>>>    static u8 engine_config_sample(u64 config)
>>>>>    {
>>>>> @@ -1049,25 +1050,32 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>>>>>    static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>>>>>    {
>>>>>           struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>>>>> -       unsigned int target;
>>>>> +       unsigned int target = i915_pmu_target_cpu;
>>>>
>>>> So we still have multiple callbacks, one per pmu. But each callback is
>>>> now stored in a list from the cpuhp_slot instead of each callback having
>>>> its own slot.
>>>>
>>>>>    
>>>>>           GEM_BUG_ON(!pmu->base.event_init);
>>>>>    
>>>>>           if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
>>>>
>>>> On first callback...
>>>>
>>>>>                   target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
>>>>
>>>> Pick any other cpu.
>>>>
>>>>> +
>>>>>                   /* Migrate events if there is a valid target */
>>>>>                   if (target < nr_cpu_ids) {
>>>>>                           cpumask_set_cpu(target, &i915_pmu_cpumask);
>>>>> -                       perf_pmu_migrate_context(&pmu->base, cpu, target);
>>>>> +                       i915_pmu_target_cpu = target;
>>>>
>>>> Store target for all callbacks.
>>>>
>>>>>                   }
>>>>>           }
>>>>>    
>>>>> +       if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
>>>>
>>>> If global [i915_pmu_target_cpu] target has changed, update perf.
>>>>
>>>>> +               perf_pmu_migrate_context(&pmu->base, cpu, target);
>>>>> +               pmu->cpuhp.cpu = target;
>>>>
>>>> It is claimed that cpuhp_state_remove_instance() will call the offline
>>>> callback for all online cpus... Do we need a pmu->base.state != STOPPED
>>>> guard?
>>>
>>> s/claimed/it definitely does :)/
>>>
>>> Or rather pmu->closed.
>>
>> Hm why? You think perf_pmu_migrate_context accesses something in the PMU
>> outside of the already protected entry points?
> 
> If this callback is being called for every online when we unplug one
> device, we then believe that no cpus remain online for all other devices.
> Should a cpu then be offlined, target is -1u so greater than
> nr_cpu_online and we move the perf context to the void, worst case, in
> the best case we fail to migrate the perf context off the dying cpu.

Well spotted nasty interaction, thanks.

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 27964ac0638a..a384f51c91c1 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1150,9 +1150,13 @@  static int __init i915_init(void)
 		return 0;
 	}
 
+	i915_pmu_init();
+
 	err = pci_register_driver(&i915_pci_driver);
-	if (err)
+	if (err) {
+		i915_pmu_exit();
 		return err;
+	}
 
 	i915_perf_sysctl_register();
 	return 0;
@@ -1166,6 +1170,7 @@  static void __exit i915_exit(void)
 	i915_perf_sysctl_unregister();
 	pci_unregister_driver(&i915_pci_driver);
 	i915_globals_exit();
+	i915_pmu_exit();
 }
 
 module_init(i915_init);
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 51ed7d0efcdc..0d6c0945621e 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -30,6 +30,7 @@ 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
 static cpumask_t i915_pmu_cpumask;
+static unsigned int i915_pmu_target_cpu = -1;
 
 static u8 engine_config_sample(u64 config)
 {
@@ -1049,25 +1050,32 @@  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 {
 	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-	unsigned int target;
+	unsigned int target = i915_pmu_target_cpu;
 
 	GEM_BUG_ON(!pmu->base.event_init);
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
 		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
 			cpumask_set_cpu(target, &i915_pmu_cpumask);
-			perf_pmu_migrate_context(&pmu->base, cpu, target);
+			i915_pmu_target_cpu = target;
 		}
 	}
 
+	if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
+		perf_pmu_migrate_context(&pmu->base, cpu, target);
+		pmu->cpuhp.cpu = target;
+	}
+
 	return 0;
 }
 
-static int i915_pmu_register_cpuhp_state(struct i915_pmu *pmu)
+static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;
+
+void i915_pmu_init(void)
 {
-	enum cpuhp_state slot;
 	int ret;
 
 	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
@@ -1075,27 +1083,29 @@  static int i915_pmu_register_cpuhp_state(struct i915_pmu *pmu)
 				      i915_pmu_cpu_online,
 				      i915_pmu_cpu_offline);
 	if (ret < 0)
-		return ret;
+		pr_notice("Failed to setup cpuhp state for i915 PMU! (%d)\n",
+			  ret);
+	else
+		cpuhp_slot = ret;
+}
 
-	slot = ret;
-	ret = cpuhp_state_add_instance(slot, &pmu->cpuhp.node);
-	if (ret) {
-		cpuhp_remove_multi_state(slot);
-		return ret;
-	}
+void i915_pmu_exit(void)
+{
+	if (cpuhp_slot != CPUHP_INVALID)
+		cpuhp_remove_multi_state(cpuhp_slot);
+}
 
-	pmu->cpuhp.slot = slot;
-	return 0;
+static int i915_pmu_register_cpuhp_state(struct i915_pmu *pmu)
+{
+	if (cpuhp_slot == CPUHP_INVALID)
+		return -EINVAL;
+
+	return cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node);
 }
 
 static void i915_pmu_unregister_cpuhp_state(struct i915_pmu *pmu)
 {
-	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
-
-	drm_WARN_ON(&i915->drm, pmu->cpuhp.slot == CPUHP_INVALID);
-	drm_WARN_ON(&i915->drm, cpuhp_state_remove_instance(pmu->cpuhp.slot, &pmu->cpuhp.node));
-	cpuhp_remove_multi_state(pmu->cpuhp.slot);
-	pmu->cpuhp.slot = CPUHP_INVALID;
+	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
 }
 
 static bool is_igp(struct drm_i915_private *i915)
@@ -1129,7 +1139,7 @@  void i915_pmu_register(struct drm_i915_private *i915)
 	spin_lock_init(&pmu->lock);
 	hrtimer_init(&pmu->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	pmu->timer.function = i915_sample;
-	pmu->cpuhp.slot = CPUHP_INVALID;
+	pmu->cpuhp.cpu = -1;
 
 	if (!is_igp(i915)) {
 		pmu->name = kasprintf(GFP_KERNEL,
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index 59a0d19afb67..a24885ab415c 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -43,7 +43,7 @@  struct i915_pmu {
 	 */
 	struct {
 		struct hlist_node node;
-		enum cpuhp_state slot;
+		unsigned int cpu;
 	} cpuhp;
 	/**
 	 * @base: PMU base.
@@ -126,11 +126,15 @@  struct i915_pmu {
 };
 
 #ifdef CONFIG_PERF_EVENTS
+void i915_pmu_init(void);
+void i915_pmu_exit(void);
 void i915_pmu_register(struct drm_i915_private *i915);
 void i915_pmu_unregister(struct drm_i915_private *i915);
 void i915_pmu_gt_parked(struct drm_i915_private *i915);
 void i915_pmu_gt_unparked(struct drm_i915_private *i915);
 #else
+static inline void i915_pmu_init(void) {}
+static inline void i915_pmu_exit(void) {}
 static inline void i915_pmu_register(struct drm_i915_private *i915) {}
 static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
 static inline void i915_pmu_gt_parked(struct drm_i915_private *i915) {}