
soc: qcom: rpmh-rsc: Remove the pm_lock

Message ID 20200414102312.1.I295cb72bc5334a2af80313cbe97cb5c9dcb1442c@changeid (mailing list archive)
State New, archived
Series soc: qcom: rpmh-rsc: Remove the pm_lock

Commit Message

Doug Anderson April 14, 2020, 5:23 p.m. UTC
It has been postulated that the pm_lock is bad for performance because
a CPU currently running rpmh_flush() could block other CPUs from
coming out of idle.  Similarly CPUs coming out of / going into idle
all need to contend with each other for the spinlock just to update
the variable tracking who's in PM.

Let's optimize this a bit.  Specifically:

- Use a count rather than a bitmask.  This is faster to access and
  also means we can use the atomic_inc_return() function to really
  detect who the last one to enter PM was.
- Accept that it's OK if we race and are doing the flush (because we
  think we're last) while another CPU is coming out of idle.  As long
  as we block that CPU if/when it tries to do an active-only transfer
  we're OK.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/soc/qcom/rpmh-internal.h |  7 ++---
 drivers/soc/qcom/rpmh-rsc.c      | 46 +++++++++++++++-----------------
 2 files changed, 24 insertions(+), 29 deletions(-)

Comments

Stephen Boyd April 15, 2020, 7:06 a.m. UTC | #1
Quoting Douglas Anderson (2020-04-14 10:23:26)
> diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
> index 732316bb67dc..4e45a8ac6cde 100644
> --- a/drivers/soc/qcom/rpmh-rsc.c
> +++ b/drivers/soc/qcom/rpmh-rsc.c
> @@ -791,36 +790,36 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb,
>  {
>         struct rsc_drv *drv = container_of(nfb, struct rsc_drv, rsc_pm);
>         int ret = NOTIFY_OK;
> -
> -       spin_lock(&drv->pm_lock);
> +       int cpus_in_pm;
>  
>         switch (action) {
>         case CPU_PM_ENTER:
> -               cpumask_set_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> -
> -               if (!cpumask_equal(&drv->cpus_entered_pm, cpu_online_mask))
> -                       goto exit;
> +               cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
> +               if (cpus_in_pm < num_online_cpus())

Might be worth adding a comment here explaining that num_online_cpus()
is stable because this is called from the cpu PM notifier path and a CPU
can't go offline or come online without stopping the world.
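
For illustration, the suggested comment might read something like this (wording is just a sketch):

	case CPU_PM_ENTER:
		/*
		 * Comparing against num_online_cpus() is safe here: this is
		 * the CPU PM notifier path and CPUs can't go offline or come
		 * online without stopping the world, so the online count
		 * can't change underneath us.
		 */
		cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
		if (cpus_in_pm < num_online_cpus())
			return NOTIFY_OK;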

> +                       return NOTIFY_OK;
>                 break;
>         case CPU_PM_ENTER_FAILED:
>         case CPU_PM_EXIT:
> -               cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> -               goto exit;
> -       }
> -
> -       ret = rpmh_rsc_ctrlr_is_busy(drv);
> -       if (ret) {
> -               ret = NOTIFY_BAD;
> -               goto exit;
> +               atomic_dec(&drv->cpus_in_pm);

We should also handle the cluster PM enums. I'm actually confused the
compiler didn't complain about that already. Presumably we want to just
ignore the cluster PM notifications because the counter handles it
already. Looks like other code uses NOTIFY_DONE for the default case.
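
A minimal sketch of that, following the NOTIFY_DONE convention mentioned above (the exact shape is an assumption):

	switch (action) {
	case CPU_PM_ENTER:
		cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
		if (cpus_in_pm < num_online_cpus())
			return NOTIFY_OK;
		break;
	case CPU_PM_ENTER_FAILED:
	case CPU_PM_EXIT:
		atomic_dec(&drv->cpus_in_pm);
		return NOTIFY_OK;
	default:
		/* Ignore CLUSTER_PM_* and any notifications added later. */
		return NOTIFY_DONE;
	}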

> +               return NOTIFY_OK;
>         }
>  
> -       ret = rpmh_flush(&drv->client);
> -       if (ret)
> +       /*
> +        * It's likely we're on the last CPU. Grab the drv->lock and write
> +        * out the sleep/wake commands to RPMH hardware. Grabbing the lock
> +        * means that if we race with another CPU coming up we are still
> +        * guaranteed to be safe. If another CPU came up just after we checked
> +        * and has already started an active transfer then we'll notice we're
> +        * busy and abort. If another CPU comes up after we start flushing it
> +        * will be blocked from starting an active transfer until we're done
> +        * flushing. If another CPU starts an active transfer after we release
> +        * the lock we're still OK because we're no longer the last CPU.
> +        */
> +       spin_lock(&drv->lock);

This should probably be a raw spinlock given that this is called from
the idle path and sleeping there is not very nice for RT. That can come
later of course.
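
If that change were made it would presumably just be the raw variants, e.g. (sketch only, no behaviour change intended):

	/* rpmh-internal.h, struct rsc_drv: */
	raw_spinlock_t lock;

	/* and at the call sites: */
	raw_spin_lock(&drv->lock);
	/* ... */
	raw_spin_unlock(&drv->lock);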

> +       if (rpmh_rsc_ctrlr_is_busy(drv) || !rpmh_flush(&drv->client))

It almost feels like rpmh_rsc_ctrlr_is_busy() should be rolled straight
into rpmh_flush() so that rpmh_flush() fails if there are active
requests in flight.
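
That might look roughly like the following (hypothetical sketch; rpmh_rsc_ctrlr_is_busy() is currently static to rpmh-rsc.c and would need exposing via rpmh-internal.h, and ctrlr_to_drv() is assumed to be the container_of() helper in rpmh.c):

	/* rpmh.c, sketch: */
	int rpmh_flush(struct rpmh_ctrlr *ctrlr)
	{
		if (rpmh_rsc_ctrlr_is_busy(ctrlr_to_drv(ctrlr)))
			return -EBUSY;

		/* ... existing flush of the sleep/wake sets ... */
	}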

>                 ret = NOTIFY_BAD;
> -       else
> -               ret = NOTIFY_OK;
> +       spin_unlock(&drv->lock);

I'm looking at the latest linux-next code that I think has all the
patches on the list for rpmh (latest commit is 1d3c6f86fd3f ("soc: qcom:
rpmh: Allow RPMH driver to be loaded as a module")). I see that
tcs->lock is taken first, and then drv->lock is taken next in
tcs_write(). But then this takes drv->lock first and then calls
rpmh_flush() (which goes to a different file.. yay!) and that calls
flush_batch() which calls rpmh_rsc_write_ctrl_data() (back to this
file... yay again!) which then locks tcs->lock. Isn't that an ABBA
deadlock?
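
Spelled out, the two lock orders being compared here are roughly (simplified sketch):

	/* tcs_write() (active-only request): */
	spin_lock_irqsave(&tcs->lock, flags);		/* lock A */
	spin_lock(&drv->lock);				/* lock B */

	/* PM notifier -> rpmh_flush() -> rpmh_rsc_write_ctrl_data(): */
	spin_lock(&drv->lock);				/* lock B */
	spin_lock(&tcs->lock);				/* lock A -> ABBA */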

>  
> -exit:
> -       spin_unlock(&drv->pm_lock);
>         return ret;
Doug Anderson April 16, 2020, 12:20 a.m. UTC | #2
Hi,

On Wed, Apr 15, 2020 at 12:06 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Douglas Anderson (2020-04-14 10:23:26)
> > diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
> > index 732316bb67dc..4e45a8ac6cde 100644
> > --- a/drivers/soc/qcom/rpmh-rsc.c
> > +++ b/drivers/soc/qcom/rpmh-rsc.c
> > @@ -791,36 +790,36 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb,
> >  {
> >         struct rsc_drv *drv = container_of(nfb, struct rsc_drv, rsc_pm);
> >         int ret = NOTIFY_OK;
> > -
> > -       spin_lock(&drv->pm_lock);
> > +       int cpus_in_pm;
> >
> >         switch (action) {
> >         case CPU_PM_ENTER:
> > -               cpumask_set_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> > -
> > -               if (!cpumask_equal(&drv->cpus_entered_pm, cpu_online_mask))
> > -                       goto exit;
> > +               cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
> > +               if (cpus_in_pm < num_online_cpus())
>
> Might be worth adding a comment here explaining that num_online_cpus()
> is stable because this is called from the cpu PM notifier path and a CPU
> can't go offline or come online without stopping the world.

Good idea.


> > +                       return NOTIFY_OK;
> >                 break;
> >         case CPU_PM_ENTER_FAILED:
> >         case CPU_PM_EXIT:
> > -               cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> > -               goto exit;
> > -       }
> > -
> > -       ret = rpmh_rsc_ctrlr_is_busy(drv);
> > -       if (ret) {
> > -               ret = NOTIFY_BAD;
> > -               goto exit;
> > +               atomic_dec(&drv->cpus_in_pm);
>
> We should also handle the cluster PM enums. I'm actually confused the
> compiler didn't complain about that already. Presumably we want to just
> ignore the cluster PM notifications because the counter handles it
> already. Looks like other code uses NOTIFY_DONE for the default case.

Hrm, I guess my compiler isn't set to warn for that?  :-/

...in any case I think the right thing to do here is to add
"default:".  Really we _only_ care about the ones we already have
cases for and if anyone adds any other notifications we really don't
care about them.


> > +               return NOTIFY_OK;
> >         }
> >
> > -       ret = rpmh_flush(&drv->client);
> > -       if (ret)
> > +       /*
> > +        * It's likely we're on the last CPU. Grab the drv->lock and write
> > +        * out the sleep/wake commands to RPMH hardware. Grabbing the lock
> > +        * means that if we race with another CPU coming up we are still
> > +        * guaranteed to be safe. If another CPU came up just after we checked
> > +        * and has already started an active transfer then we'll notice we're
> > +        * busy and abort. If another CPU comes up after we start flushing it
> > +        * will be blocked from starting an active transfer until we're done
> > +        * flushing. If another CPU starts an active transfer after we release
> > +        * the lock we're still OK because we're no longer the last CPU.
> > +        */
> > +       spin_lock(&drv->lock);
>
> This should probably be a raw spinlock given that this is called from
> the idle path and sleeping there is not very nice for RT. That can come
> later of course.

Actually, maybe I should just do a spin_trylock().  If I fail to get
the lock I can just return NOTIFY_BAD, right?
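
i.e. roughly (sketch):

	if (!spin_trylock(&drv->lock))
		return NOTIFY_BAD;
	/* ... rpmh_rsc_ctrlr_is_busy() / rpmh_flush() as in the patch ... */
	spin_unlock(&drv->lock);

	return ret;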


> > +       if (rpmh_rsc_ctrlr_is_busy(drv) || !rpmh_flush(&drv->client))
>
> It almost feels like rpmh_rsc_ctrlr_is_busy() should be rolled straight
> into rpmh_flush() so that rpmh_flush() fails if there are active
> requests in flight.

I'm going to leave that change out for now.  Maulik says there are
other code paths in future patches that will call rpmh_flush().  If we
see every call to rpmh_flush() follow the same pattern then we can
roll it in then?


> >                 ret = NOTIFY_BAD;

Oh, I think we have a bug here.  If we return NOTIFY_BAD we probably
need to decrement our count.  From reading the code I think
CPU_PM_ENTER_FAILED doesn't get called for the person that returned
NOTIFY_BAD.  I'll try to confirm, then fix.
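
Something along these lines, presumably (sketch):

	/*
	 * CPU_PM_ENTER_FAILED is not delivered to the CPU whose notifier
	 * returned NOTIFY_BAD, so undo our own increment before bailing out.
	 */
	if (ret == NOTIFY_BAD)
		atomic_dec(&drv->cpus_in_pm);

	return ret;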


> > -       else
> > -               ret = NOTIFY_OK;
> > +       spin_unlock(&drv->lock);
>
> I'm looking at the latest linux-next code that I think has all the
> patches on the list for rpmh (latest commit is 1d3c6f86fd3f ("soc: qcom:
> rpmh: Allow RPMH driver to be loaded as a module")). I see that
> tcs->lock is taken first, and then drv->lock is taken next in
> tcs_write(). But then this takes drv->lock first and then calls
> rpmh_flush() (which goes to a different file.. yay!) and that calls
> flush_batch() which calls rpmh_rsc_write_ctrl_data() (back to this
> file... yay again!) which then locks tcs->lock. Isn't that an ABBA
> deadlock?

Oops.  Somehow I thought I had checked that and the ABBA was only
there before all the cleanup patches, but I think you're right.  I
think I can fix this by just changing the order we grab locks in
tcs_write().  At first I was bummed because I thought that would mean
I'd have to hold both locks while calling:

  __tcs_buffer_write(drv, tcs_id, 0, msg);
  __tcs_set_trigger(drv, tcs_id, true);

...but I just realized that even if I change the order I grab the
locks it doesn't mean I have to change the order I release the locks!


I'll plan to send another spin tomorrow since my day is about over now.

-Doug
Maulik Shah April 16, 2020, 10:51 a.m. UTC | #3
Hi,

On 4/14/2020 10:53 PM, Douglas Anderson wrote:
> It has been postulated that the pm_lock is bad for performance because
> a CPU currently running rpmh_flush() could block other CPUs from
> coming out of idle.  Similarly CPUs coming out of / going into idle
> all need to contend with each other for the spinlock just to update
> the variable tracking who's in PM.
>
> Let's optimize this a bit.  Specifically:
>
> - Use a count rather than a bitmask.  This is faster to access and
>    also means we can use the atomic_inc_return() function to really
>    detect who the last one to enter PM was.
> - Accept that it's OK if we race and are doing the flush (because we
>    think we're last) while another CPU is coming out of idle.  As long
>    as we block that CPU if/when it tries to do an active-only transfer
>    we're OK.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>   drivers/soc/qcom/rpmh-internal.h |  7 ++---
>   drivers/soc/qcom/rpmh-rsc.c      | 46 +++++++++++++++-----------------
>   2 files changed, 24 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
> index dba8510c0669..449cd511702b 100644
> --- a/drivers/soc/qcom/rpmh-internal.h
> +++ b/drivers/soc/qcom/rpmh-internal.h
> @@ -97,7 +97,7 @@ struct rpmh_ctrlr {
>    * @num_tcs:            Number of TCSes in this DRV.
>    * @rsc_pm:             CPU PM notifier for controller.
>    *                      Used when solver mode is not present.
> - * @cpus_entered_pm:    CPU mask for cpus in idle power collapse.
> + * @cpus_in_pm:         Number of CPUs not in idle power collapse.
>    *                      Used when solver mode is not present.
>    * @tcs:                TCS groups.
>    * @tcs_in_use:         S/W state of the TCS; only set for ACTIVE_ONLY
> @@ -111,8 +111,6 @@ struct rpmh_ctrlr {
>    *                      grabbing this lock and a tcs_lock at the same time,
>    *                      grab the tcs_lock first so we always have a
>    *                      consistent lock ordering.
> - * @pm_lock:            Synchronize during PM notifications.
> - *                      Used when solver mode is not present.
>    * @client:             Handle to the DRV's client.
>    */
>   struct rsc_drv {
> @@ -121,11 +119,10 @@ struct rsc_drv {
>   	int id;
>   	int num_tcs;
>   	struct notifier_block rsc_pm;
> -	struct cpumask cpus_entered_pm;
> +	atomic_t cpus_in_pm;
>   	struct tcs_group tcs[TCS_TYPE_NR];
>   	DECLARE_BITMAP(tcs_in_use, MAX_TCS_NR);
>   	spinlock_t lock;
> -	spinlock_t pm_lock;
>   	struct rpmh_ctrlr client;
>   };
>   
> diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
> index 732316bb67dc..4e45a8ac6cde 100644
> --- a/drivers/soc/qcom/rpmh-rsc.c
> +++ b/drivers/soc/qcom/rpmh-rsc.c
> @@ -740,6 +740,8 @@ int rpmh_rsc_write_ctrl_data(struct rsc_drv *drv, const struct tcs_request *msg)
>    * SLEEP and WAKE sets. If AMCs are busy, controller can not enter
>    * power collapse, so deny from the last cpu's pm notification.
>    *
> + * Context: Must be called with the drv->lock held.
> + *
>    * Return:
>    * * False		- AMCs are idle
>    * * True		- AMCs are busy
> @@ -754,9 +756,6 @@ static bool rpmh_rsc_ctrlr_is_busy(struct rsc_drv *drv)
>   	 * dedicated TCS for active state use, then re-purposed wake TCSes
>   	 * should be checked for not busy, because we used wake TCSes for
>   	 * active requests in this case.
> -	 *
> -	 * Since this is called from the last cpu, need not take drv or tcs
> -	 * lock before checking tcs_is_free().
>   	 */
>   	if (!tcs->num_tcs)
>   		tcs = &drv->tcs[WAKE_TCS];
> @@ -791,36 +790,36 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb,
>   {
>   	struct rsc_drv *drv = container_of(nfb, struct rsc_drv, rsc_pm);
>   	int ret = NOTIFY_OK;
> -
> -	spin_lock(&drv->pm_lock);
> +	int cpus_in_pm;
>   
>   	switch (action) {
>   	case CPU_PM_ENTER:
> -		cpumask_set_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> -
> -		if (!cpumask_equal(&drv->cpus_entered_pm, cpu_online_mask))
> -			goto exit;
> +		cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
> +		if (cpus_in_pm < num_online_cpus())
> +			return NOTIFY_OK;
>   		break;
>   	case CPU_PM_ENTER_FAILED:
>   	case CPU_PM_EXIT:
> -		cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> -		goto exit;
> -	}
> -
> -	ret = rpmh_rsc_ctrlr_is_busy(drv);
> -	if (ret) {
> -		ret = NOTIFY_BAD;
> -		goto exit;
> +		atomic_dec(&drv->cpus_in_pm);
> +		return NOTIFY_OK;
>   	}
>   
> -	ret = rpmh_flush(&drv->client);
> -	if (ret)
> +	/*
> +	 * It's likely we're on the last CPU. Grab the drv->lock and write
> +	 * out the sleep/wake commands to RPMH hardware. Grabbing the lock
> +	 * means that if we race with another CPU coming up we are still
> +	 * guaranteed to be safe. If another CPU came up just after we checked
> +	 * and has already started an active transfer then we'll notice we're
> +	 * busy and abort. If another CPU comes up after we start flushing it
> +	 * will be blocked from starting an active transfer until we're done
> +	 * flushing. If another CPU starts an active transfer after we release
> +	 * the lock we're still OK because we're no longer the last CPU.
> +	 */
> +	spin_lock(&drv->lock);
> +	if (rpmh_rsc_ctrlr_is_busy(drv) || !rpmh_flush(&drv->client))
>   		ret = NOTIFY_BAD;
> -	else
> -		ret = NOTIFY_OK;
> +	spin_unlock(&drv->lock);
>   
> -exit:
> -	spin_unlock(&drv->pm_lock);
>   	return ret;
>   }
There is a race if we allow other CPUs to exit PM without the pm_lock;
the scenario below can happen.

1. The CPU_PM_ENTER notification comes for the last CPU (CPU-x).
2. On CPU-x, rpmh_flush() is in progress with drv->lock held, and is
at the flush_batch() stage.
3. Meanwhile another CPU (CPU-y) wakes up and does CPU_PM_EXIT.
4. From the woken-up CPU, rpmh_invalidate() is invoked.
5. It takes ctrlr->cache_lock and empties the cached batch requests
(emptying the list and kfree()ing the cached entries).
6. The last CPU (CPU-x), which was doing rpmh_flush(), suddenly finds
the list it was traversing in flush_batch() empty, leading to a NULL
pointer dereference.

To fix the above race condition, ctrlr->cache_lock should be taken at
the start of rpmh_flush(). We also need to remove the
lockdep_assert_irqs_disabled() that was inserted, since IRQs might be
enabled by this time if some CPU has exited PM.
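
For reference, a minimal sketch of the cache_lock change described above (the exact scope of the critical section is an assumption; flush_batch() and ctrlr->cache_lock are the existing names in rpmh.c):

	int rpmh_flush(struct rpmh_ctrlr *ctrlr)
	{
		int ret;

		spin_lock(&ctrlr->cache_lock);

		/*
		 * Walk the batch cache and the cached sleep/wake requests
		 * with cache_lock held, so rpmh_invalidate() running on a
		 * freshly woken CPU cannot empty/free the lists underneath
		 * us.
		 */
		ret = flush_batch(ctrlr);

		/* ... write the cached sleep/wake sets as before ... */

		spin_unlock(&ctrlr->cache_lock);

		return ret;
	}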

Thanks,
Maulik

>   
> @@ -964,7 +963,6 @@ static int rpmh_rsc_probe(struct platform_device *pdev)
>   	solver_config = solver_config >> DRV_HW_SOLVER_SHIFT;
>   	if (!solver_config) {
>   		drv->rsc_pm.notifier_call = rpmh_rsc_cpu_pm_callback;
> -		spin_lock_init(&drv->pm_lock);
>   		cpu_pm_register_notifier(&drv->rsc_pm);
>   	}
>
Doug Anderson April 17, 2020, 12:33 a.m. UTC | #4
Hi,

On Thu, Apr 16, 2020 at 3:51 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> Hi,
>
> On 4/14/2020 10:53 PM, Douglas Anderson wrote:
> > It has been postulated that the pm_lock is bad for performance because
> > a CPU currently running rpmh_flush() could block other CPUs from
> > coming out of idle.  Similarly CPUs coming out of / going into idle
> > all need to contend with each other for the spinlock just to update
> > the variable tracking who's in PM.
> >
> > Let's optimize this a bit.  Specifically:
> >
> > - Use a count rather than a bitmask.  This is faster to access and
> >    also means we can use the atomic_inc_return() function to really
> >    detect who the last one to enter PM was.
> > - Accept that it's OK if we race and are doing the flush (because we
> >    think we're last) while another CPU is coming out of idle.  As long
> >    as we block that CPU if/when it tries to do an active-only transfer
> >    we're OK.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >   drivers/soc/qcom/rpmh-internal.h |  7 ++---
> >   drivers/soc/qcom/rpmh-rsc.c      | 46 +++++++++++++++-----------------
> >   2 files changed, 24 insertions(+), 29 deletions(-)
> >
> > diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
> > index dba8510c0669..449cd511702b 100644
> > --- a/drivers/soc/qcom/rpmh-internal.h
> > +++ b/drivers/soc/qcom/rpmh-internal.h
> > @@ -97,7 +97,7 @@ struct rpmh_ctrlr {
> >    * @num_tcs:            Number of TCSes in this DRV.
> >    * @rsc_pm:             CPU PM notifier for controller.
> >    *                      Used when solver mode is not present.
> > - * @cpus_entered_pm:    CPU mask for cpus in idle power collapse.
> > + * @cpus_in_pm:         Number of CPUs not in idle power collapse.
> >    *                      Used when solver mode is not present.
> >    * @tcs:                TCS groups.
> >    * @tcs_in_use:         S/W state of the TCS; only set for ACTIVE_ONLY
> > @@ -111,8 +111,6 @@ struct rpmh_ctrlr {
> >    *                      grabbing this lock and a tcs_lock at the same time,
> >    *                      grab the tcs_lock first so we always have a
> >    *                      consistent lock ordering.
> > - * @pm_lock:            Synchronize during PM notifications.
> > - *                      Used when solver mode is not present.
> >    * @client:             Handle to the DRV's client.
> >    */
> >   struct rsc_drv {
> > @@ -121,11 +119,10 @@ struct rsc_drv {
> >       int id;
> >       int num_tcs;
> >       struct notifier_block rsc_pm;
> > -     struct cpumask cpus_entered_pm;
> > +     atomic_t cpus_in_pm;
> >       struct tcs_group tcs[TCS_TYPE_NR];
> >       DECLARE_BITMAP(tcs_in_use, MAX_TCS_NR);
> >       spinlock_t lock;
> > -     spinlock_t pm_lock;
> >       struct rpmh_ctrlr client;
> >   };
> >
> > diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
> > index 732316bb67dc..4e45a8ac6cde 100644
> > --- a/drivers/soc/qcom/rpmh-rsc.c
> > +++ b/drivers/soc/qcom/rpmh-rsc.c
> > @@ -740,6 +740,8 @@ int rpmh_rsc_write_ctrl_data(struct rsc_drv *drv, const struct tcs_request *msg)
> >    * SLEEP and WAKE sets. If AMCs are busy, controller can not enter
> >    * power collapse, so deny from the last cpu's pm notification.
> >    *
> > + * Context: Must be called with the drv->lock held.
> > + *
> >    * Return:
> >    * * False          - AMCs are idle
> >    * * True           - AMCs are busy
> > @@ -754,9 +756,6 @@ static bool rpmh_rsc_ctrlr_is_busy(struct rsc_drv *drv)
> >        * dedicated TCS for active state use, then re-purposed wake TCSes
> >        * should be checked for not busy, because we used wake TCSes for
> >        * active requests in this case.
> > -      *
> > -      * Since this is called from the last cpu, need not take drv or tcs
> > -      * lock before checking tcs_is_free().
> >        */
> >       if (!tcs->num_tcs)
> >               tcs = &drv->tcs[WAKE_TCS];
> > @@ -791,36 +790,36 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb,
> >   {
> >       struct rsc_drv *drv = container_of(nfb, struct rsc_drv, rsc_pm);
> >       int ret = NOTIFY_OK;
> > -
> > -     spin_lock(&drv->pm_lock);
> > +     int cpus_in_pm;
> >
> >       switch (action) {
> >       case CPU_PM_ENTER:
> > -             cpumask_set_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> > -
> > -             if (!cpumask_equal(&drv->cpus_entered_pm, cpu_online_mask))
> > -                     goto exit;
> > +             cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
> > +             if (cpus_in_pm < num_online_cpus())
> > +                     return NOTIFY_OK;
> >               break;
> >       case CPU_PM_ENTER_FAILED:
> >       case CPU_PM_EXIT:
> > -             cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm);
> > -             goto exit;
> > -     }
> > -
> > -     ret = rpmh_rsc_ctrlr_is_busy(drv);
> > -     if (ret) {
> > -             ret = NOTIFY_BAD;
> > -             goto exit;
> > +             atomic_dec(&drv->cpus_in_pm);
> > +             return NOTIFY_OK;
> >       }
> >
> > -     ret = rpmh_flush(&drv->client);
> > -     if (ret)
> > +     /*
> > +      * It's likely we're on the last CPU. Grab the drv->lock and write
> > +      * out the sleep/wake commands to RPMH hardware. Grabbing the lock
> > +      * means that if we race with another CPU coming up we are still
> > +      * guaranteed to be safe. If another CPU came up just after we checked
> > +      * and has already started an active transfer then we'll notice we're
> > +      * busy and abort. If another CPU comes up after we start flushing it
> > +      * will be blocked from starting an active transfer until we're done
> > +      * flushing. If another CPU starts an active transfer after we release
> > +      * the lock we're still OK because we're no longer the last CPU.
> > +      */
> > +     spin_lock(&drv->lock);
> > +     if (rpmh_rsc_ctrlr_is_busy(drv) || !rpmh_flush(&drv->client))
> >               ret = NOTIFY_BAD;
> > -     else
> > -             ret = NOTIFY_OK;
> > +     spin_unlock(&drv->lock);
> >
> > -exit:
> > -     spin_unlock(&drv->pm_lock);
> >       return ret;
> >   }
> There is a race if we allow other CPUs to exit PM without the pm_lock;
> the scenario below can happen.
>
> 1. The CPU_PM_ENTER notification comes for the last CPU (CPU-x).
> 2. On CPU-x, rpmh_flush() is in progress with drv->lock held, and is
> at the flush_batch() stage.
> 3. Meanwhile another CPU (CPU-y) wakes up and does CPU_PM_EXIT.
> 4. From the woken-up CPU, rpmh_invalidate() is invoked.
> 5. It takes ctrlr->cache_lock and empties the cached batch requests
> (emptying the list and kfree()ing the cached entries).
> 6. The last CPU (CPU-x), which was doing rpmh_flush(), suddenly finds
> the list it was traversing in flush_batch() empty, leading to a NULL
> pointer dereference.
>
> To fix the above race condition, ctrlr->cache_lock should be taken at
> the start of rpmh_flush().

Good catch!


> We also need to remove the lockdep_assert_irqs_disabled() that was
> inserted, since IRQs might be enabled by this time if some CPU has exited PM.

I'm pretty sure lockdep_assert_irqs_disabled() should stay.  That's
talking about whether the local CPU's IRQs are disabled.  Whether
another processor is running or not will not affect the local CPU.


-Doug

Patch

diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
index dba8510c0669..449cd511702b 100644
--- a/drivers/soc/qcom/rpmh-internal.h
+++ b/drivers/soc/qcom/rpmh-internal.h
@@ -97,7 +97,7 @@  struct rpmh_ctrlr {
  * @num_tcs:            Number of TCSes in this DRV.
  * @rsc_pm:             CPU PM notifier for controller.
  *                      Used when solver mode is not present.
- * @cpus_entered_pm:    CPU mask for cpus in idle power collapse.
+ * @cpus_in_pm:         Number of CPUs not in idle power collapse.
  *                      Used when solver mode is not present.
  * @tcs:                TCS groups.
  * @tcs_in_use:         S/W state of the TCS; only set for ACTIVE_ONLY
@@ -111,8 +111,6 @@  struct rpmh_ctrlr {
  *                      grabbing this lock and a tcs_lock at the same time,
  *                      grab the tcs_lock first so we always have a
  *                      consistent lock ordering.
- * @pm_lock:            Synchronize during PM notifications.
- *                      Used when solver mode is not present.
  * @client:             Handle to the DRV's client.
  */
 struct rsc_drv {
@@ -121,11 +119,10 @@  struct rsc_drv {
 	int id;
 	int num_tcs;
 	struct notifier_block rsc_pm;
-	struct cpumask cpus_entered_pm;
+	atomic_t cpus_in_pm;
 	struct tcs_group tcs[TCS_TYPE_NR];
 	DECLARE_BITMAP(tcs_in_use, MAX_TCS_NR);
 	spinlock_t lock;
-	spinlock_t pm_lock;
 	struct rpmh_ctrlr client;
 };
 
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index 732316bb67dc..4e45a8ac6cde 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -740,6 +740,8 @@  int rpmh_rsc_write_ctrl_data(struct rsc_drv *drv, const struct tcs_request *msg)
  * SLEEP and WAKE sets. If AMCs are busy, controller can not enter
  * power collapse, so deny from the last cpu's pm notification.
  *
+ * Context: Must be called with the drv->lock held.
+ *
  * Return:
  * * False		- AMCs are idle
  * * True		- AMCs are busy
@@ -754,9 +756,6 @@  static bool rpmh_rsc_ctrlr_is_busy(struct rsc_drv *drv)
 	 * dedicated TCS for active state use, then re-purposed wake TCSes
 	 * should be checked for not busy, because we used wake TCSes for
 	 * active requests in this case.
-	 *
-	 * Since this is called from the last cpu, need not take drv or tcs
-	 * lock before checking tcs_is_free().
 	 */
 	if (!tcs->num_tcs)
 		tcs = &drv->tcs[WAKE_TCS];
@@ -791,36 +790,36 @@  static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb,
 {
 	struct rsc_drv *drv = container_of(nfb, struct rsc_drv, rsc_pm);
 	int ret = NOTIFY_OK;
-
-	spin_lock(&drv->pm_lock);
+	int cpus_in_pm;
 
 	switch (action) {
 	case CPU_PM_ENTER:
-		cpumask_set_cpu(smp_processor_id(), &drv->cpus_entered_pm);
-
-		if (!cpumask_equal(&drv->cpus_entered_pm, cpu_online_mask))
-			goto exit;
+		cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm);
+		if (cpus_in_pm < num_online_cpus())
+			return NOTIFY_OK;
 		break;
 	case CPU_PM_ENTER_FAILED:
 	case CPU_PM_EXIT:
-		cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm);
-		goto exit;
-	}
-
-	ret = rpmh_rsc_ctrlr_is_busy(drv);
-	if (ret) {
-		ret = NOTIFY_BAD;
-		goto exit;
+		atomic_dec(&drv->cpus_in_pm);
+		return NOTIFY_OK;
 	}
 
-	ret = rpmh_flush(&drv->client);
-	if (ret)
+	/*
+	 * It's likely we're on the last CPU. Grab the drv->lock and write
+	 * out the sleep/wake commands to RPMH hardware. Grabbing the lock
+	 * means that if we race with another CPU coming up we are still
+	 * guaranteed to be safe. If another CPU came up just after we checked
+	 * and has already started an active transfer then we'll notice we're
+	 * busy and abort. If another CPU comes up after we start flushing it
+	 * will be blocked from starting an active transfer until we're done
+	 * flushing. If another CPU starts an active transfer after we release
+	 * the lock we're still OK because we're no longer the last CPU.
+	 */
+	spin_lock(&drv->lock);
+	if (rpmh_rsc_ctrlr_is_busy(drv) || !rpmh_flush(&drv->client))
 		ret = NOTIFY_BAD;
-	else
-		ret = NOTIFY_OK;
+	spin_unlock(&drv->lock);
 
-exit:
-	spin_unlock(&drv->pm_lock);
 	return ret;
 }
 
@@ -964,7 +963,6 @@  static int rpmh_rsc_probe(struct platform_device *pdev)
 	solver_config = solver_config >> DRV_HW_SOLVER_SHIFT;
 	if (!solver_config) {
 		drv->rsc_pm.notifier_call = rpmh_rsc_cpu_pm_callback;
-		spin_lock_init(&drv->pm_lock);
 		cpu_pm_register_notifier(&drv->rsc_pm);
 	}