
[RFC,v3,05/13] ASoC: soc-pcm: align BE 'atomicity' with that of the FE

Message ID 20211013143050.244444-6-pierre-louis.bossart@linux.intel.com (mailing list archive)
State New, archived
Series ASoC: soc-pcm: fix trigger race conditions with shared BE

Commit Message

Pierre-Louis Bossart Oct. 13, 2021, 2:30 p.m. UTC
Since the flow for DPCM is based on taking a lock for the FE first, we
need to make sure during the connection between a BE and an FE that
they both use the same 'atomicity', otherwise we may sleep in atomic
context.

If the FE is nonatomic, this patch forces the BE to be nonatomic as
well. That should have no negative impact since the BE 'inherits' the
FE properties.

However, if the FE is atomic and the BE is not, then the configuration
is flagged as invalid.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 sound/soc/soc-pcm.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Sameer Pujar Oct. 15, 2021, 6:24 a.m. UTC | #1
On 10/13/2021 8:00 PM, Pierre-Louis Bossart wrote:
> Since the flow for DPCM is based on taking a lock for the FE first, we
> need to make sure during the connection between a BE and an FE that
> they both use the same 'atomicity', otherwise we may sleep in atomic
> context.
>
> If the FE is nonatomic, this patch forces the BE to be nonatomic as
> well. That should have no negative impact since the BE 'inherits' the
> FE properties.
>
> However, if the FE is atomic and the BE is not, then the configuration
> is flagged as invalid.

In normal PCM, atomicity seems to apply only to trigger(). Other
callbacks like prepare and hw_params are executed in non-atomic context.
So even when the 'nonatomic' flag is false, it is still possible to sleep
in a prepare or hw_params callback, and this is true for the FE as well.
So I am not sure atomicity applies to the FE as a whole.
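
(For readers following the atomicity argument: in the ALSA core the
'nonatomic' flag only selects the type of stream lock, and trigger() runs
under that lock, while hw_params()/prepare() are called from ioctl context
without it.  A minimal sketch, simplified from sound/core/pcm_native.c and
not the verbatim upstream code:)

#include <sound/pcm.h>

static void pcm_stream_lock_sketch(struct snd_pcm_substream *substream)
{
        if (substream->pcm->nonatomic)
                mutex_lock(&substream->self_group.mutex);  /* may sleep */
        else
                spin_lock(&substream->self_group.lock);    /* atomic section */
}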

At this point it does not cause serious problems, but with subsequent 
patches (especially when patch 7/13 is picked) I see failures. Please 
refer to patch 7/13 thread for more details.


I am wondering if it is possible to only use locks internally for DPCM 
state management and decouple BE callbacks from this, like normal PCMs do?
Takashi Iwai Oct. 15, 2021, 7:39 a.m. UTC | #2
On Fri, 15 Oct 2021 08:24:41 +0200,
Sameer Pujar wrote:
> 
> 
> 
> On 10/13/2021 8:00 PM, Pierre-Louis Bossart wrote:
> > Since the flow for DPCM is based on taking a lock for the FE first, we
> > need to make sure during the connection between a BE and an FE that
> > they both use the same 'atomicity', otherwise we may sleep in atomic
> > context.
> >
> > If the FE is nonatomic, this patch forces the BE to be nonatomic as
> > well. That should have no negative impact since the BE 'inherits' the
> > FE properties.
> >
> > However, if the FE is atomic and the BE is not, then the configuration
> > is flagged as invalid.
> 
> In normal PCM, atomicity seems to apply only to trigger(). Other
> callbacks like prepare and hw_params are executed in non-atomic
> context. So even when the 'nonatomic' flag is false, it is still
> possible to sleep in a prepare or hw_params callback, and this is
> true for the FE as well. So I am not sure atomicity applies to the
> FE as a whole.
> 
> At this point it does not cause serious problems, but with subsequent
> patches (especially when patch 7/13 is picked) I see failures. Please
> refer to patch 7/13 thread for more details.
> 
> 
> I am wondering if it is possible to only use locks internally for DPCM
> state management and decouple BE callbacks from this, like normal PCMs
> do?

Actually the patch looks like overkill by adding the FE stream lock
at every loop, and this caused the problem, AFAIU.

Basically we need to protect the link addition / deletion during the
list traversal (there is a need for protection against BE-vs-BE access
races, but that's a different code path).  For the normal cases, it
seems already protected by card->pcm_mutex, but the problem is the FE
trigger case.  It was attempted by dpcm_lock, but that failed because
it couldn't be applied properly there.

That said, what we'd need is only:
- Drop dpcm_lock codes once
- Put FE stream lock around dpcm_be_connect() and dpcm_be_disconnect()

That should suffice for the race at trigger.  The FE stream lock is
already taken at trigger callback, and the rest list addition /
deletion are called from different code paths without stream locks, so
the explicit FE stream lock is needed there.

In addition, a lock around dpcm_show_state() might be needed to be
replaced with card->pcm_mutex, and we may need to revisit whether all
other paths take card->pcm_mutex.

Also, BE-vs-BE race can be protected by taking a BE lock inside
dpcm_be_dai_trigger().


Takashi
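
(For illustration, the scheme described above might look roughly like the
sketch below, written as if it lived in sound/soc/soc-pcm.c.  This is not
the PoC referred to later in the thread; the dpcm_be_connect_sketch() name
is made up, and the error handling, debugfs and dev_dbg bookkeeping of the
real dpcm_be_connect() are omitted.)

#include <linux/slab.h>
#include <sound/pcm.h>
#include <sound/soc.h>
#include <sound/soc-dpcm.h>

static int dpcm_be_connect_sketch(struct snd_soc_pcm_runtime *fe,
                                  struct snd_soc_pcm_runtime *be, int stream)
{
        struct snd_pcm_substream *fe_substream =
                snd_soc_dpcm_get_substream(fe, stream);
        struct snd_soc_dpcm *dpcm;

        /* sleepable path: the caller is assumed to hold card->pcm_mutex */
        dpcm = kzalloc(sizeof(*dpcm), GFP_KERNEL);
        if (!dpcm)
                return -ENOMEM;

        dpcm->be = be;
        dpcm->fe = fe;
        dpcm->state = SND_SOC_DPCM_LINK_STATE_NEW;

        /* FE stream lock: serialize the list update against the FE trigger */
        snd_pcm_stream_lock_irq(fe_substream);
        list_add(&dpcm->list_be, &fe->dpcm[stream].be_clients);
        list_add(&dpcm->list_fe, &be->dpcm[stream].fe_clients);
        snd_pcm_stream_unlock_irq(fe_substream);

        return 0;
}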
Pierre-Louis Bossart Oct. 15, 2021, 11:22 a.m. UTC | #3
>> In normal PCM, atomicity seems to apply only to trigger(). Other
>> callbacks like prepare and hw_params are executed in non-atomic
>> context. So even when the 'nonatomic' flag is false, it is still
>> possible to sleep in a prepare or hw_params callback, and this is
>> true for the FE as well. So I am not sure atomicity applies to the
>> FE as a whole.

The typical path is snd_pcm_period_elapsed() on the FE, which will
trigger the BE. When we add the BE lock in patch 7, things break: what
matters is the FE context, so the locks used for the BE have to be
aligned with the FE's atomicity.
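
(Rough call chain for an atomic FE, for illustration only; the
intermediate calls are simplified and the IRQ handler is hypothetical:

        driver IRQ handler
          snd_pcm_period_elapsed(fe_substream)
            snd_pcm_stream_lock_irqsave()    <- spinlock, FE is atomic
              FE trigger op (e.g. stop on XRUN or when draining completes)
                dpcm_be_dai_trigger()
                  BE stream lock             <- must also be a spinlock,
                                                i.e. the BE cannot stay
                                                nonatomic here

Anything that sleeps below this point would be a sleep-in-atomic bug,
which is why the BE's atomicity has to follow the FE's.)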

My test results show the problem:
https://github.com/thesofproject/linux/pull/3209#issuecomment-941229502
and this patch fixes the issue.

I am all ears if someone has a better solution, but the problem is real.

I chose to add this patch first, before the BE lock is added in
dpcm_be_dai_trigger(), and if this already causes problems, then there
are even more issues in DPCM :-(

If this patch causes issues outside of the trigger phase, then maybe we
could just change the BE nonatomic flag temporarily and restore it after
taking the lock, but IMHO something else is broken in other drivers.

>> At this point it does not cause serious problems, but with subsequent
>> patches (especially when patch 7/13 is picked) I see failures. Please
>> refer to patch 7/13 thread for more details.
>>
>>
>> I am wondering if it is possible to only use locks internally for DPCM
>> state management and decouple BE callbacks from this, like normal PCMs
>> do?
> 
> Actually the patch looks like overkill by adding the FE stream lock
> at every loop, and this caused the problem, AFAIU.
> 
> Basically we need to protect the link addition / deletion during the
> list traversal (there is a need for protection against BE-vs-BE access
> races, but that's a different code path).  For the normal cases, it
> seems already protected by card->pcm_mutex, but the problem is the FE
> trigger case.  It was attempted by dpcm_lock, but that failed because
> it couldn't be applied properly there.
> 
> That said, what we'd need is only:
> - Drop dpcm_lock codes once

I am not able to follow this sentence, what did you mean here?

> - Put FE stream lock around dpcm_be_connect() and dpcm_be_disconnect()
> 
> That should suffice for the race at trigger.  The FE stream lock is
> already taken at trigger callback, and the rest list addition /
> deletion are called from different code paths without stream locks, so
> the explicit FE stream lock is needed there.

I am not able to follow what you meant after "the rest". This sentence
mentions the FE stream lock in two places, but the second is not clear
to me.

> In addition, a lock around dpcm_show_state() might be needed to be
> replaced with card->pcm_mutex, and we may need to revisit whether all
> other paths take card->pcm_mutex.

What happens if we show the state while a trigger happens? That's my
main concern with using two separate locks (pcm_mutex and FE stream
lock) to protect the same list, there are still windows of time where
the list is not protected.

same with the use of for_each_dpcm_be() in soc-compress, SH and FSL
drivers, there's no other mutex there.

My approach might have been overkill in some places, but it's comprehensive.


> Also, BE-vs-BE race can be protected by taking a BE lock inside
> dpcm_be_dai_trigger().

that's what I did, no?
Pierre-Louis Bossart Oct. 15, 2021, 12:04 p.m. UTC | #4
On 10/15/21 6:22 AM, Pierre-Louis Bossart wrote:
> If this patch causes issues outside of the trigger phase, then maybe we
> could just change the BE nonatomic flag temporarily and restore it after
> taking the lock, but IMHO something else is broken in other drivers.

Now that I think of it, this wouldn't work: the type of lock for the
BEs has to be set before the trigger, since the DPCM flow is that we
start from the FEs and find which BEs need to be triggered. That would
prevent us from modifying a global BE property - each FE is independent.
Takashi Iwai Oct. 15, 2021, 3:38 p.m. UTC | #5
On Fri, 15 Oct 2021 13:22:50 +0200,
Pierre-Louis Bossart wrote:
> 
> >> At this point it does not cause serious problems, but with subsequent
> >> patches (especially when patch 7/13 is picked) I see failures. Please
> >> refer to patch 7/13 thread for more details.
> >>
> >>
> >> I am wondering if it is possible to only use locks internally for DPCM
> >> state management and decouple BE callbacks from this, like normal PCMs
> >> do?
> > 
> > Actually the patch looks like overkill by adding the FE stream lock
> > at every loop, and this caused the problem, AFAIU.
> > 
> > Basically we need to protect the link addition / deletion during the
> > list traversal (there is a need for protection against BE-vs-BE access
> > races, but that's a different code path).  For the normal cases, it
> > seems already protected by card->pcm_mutex, but the problem is the FE
> > trigger case.  It was attempted by dpcm_lock, but that failed because
> > it couldn't be applied properly there.
> > 
> > That said, what we'd need is only:
> > - Drop dpcm_lock codes once
> 
> I am not able to follow this sentence, what did you mean here?

Just remove all dpcm_lock usages at once, to replace them with a new one.

> > - Put FE stream lock around dpcm_be_connect() and dpcm_be_disconnect()
> > 
> > That should suffice for the race at trigger.  The FE stream lock is
> > already taken at trigger callback, and the rest list addition /
> > deletion are called from different code paths without stream locks, so
> > the explicit FE stream lock is needed there.
> 
> I am not able to follow what you meant after "the rest". This sentence
> mentions the FE stream lock in two places, but the second is not clear
> to me.

The FE stream locks are necessary at only two points: adding and
deleting a BE in the link.  We used dpcm_lock in other places, but
those are superfluous or would cause problems if converted to an FE
stream lock.

> > In addition, a lock around dpcm_show_state() might be needed to be
> > replaced with card->pcm_mutex, and we may need to revisit whether all
> > other paths take card->pcm_mutex.
> 
> What happens if we show the state while a trigger happens? That's my
> main concern with using two separate locks (pcm_mutex and FE stream
> lock) to protect the same list, there are still windows of time where
> the list is not protected.

With the proper use of mutex, the list itself is protected.
If we need to protect the concurrent access to each BE in the show
method, an additional BE lock is needed in that part.  But that's a
subtle issue, as the link traversal itself is protected by the mutex.

> same with the use of for_each_dpcm_be() in soc-compress, SH and FSL
> drivers, there's no other mutex there.
> 
> My approach might have been overkill in some places, but it's comprehensive.
> 
> 
> > Also, BE-vs-BE race can be protected by taking a BE lock inside
> > dpcm_be_dai_trigger().
> 
> that's what I did, no?

Right.

So what I wrote is like the patches below.  Those three should be
applicable on top of the latest Linus tree.  It's merely a PoC, and it
doesn't take the compress-offload usage into account at all, but this
should illustrate my idea.


Takashi
Pierre-Louis Bossart Oct. 15, 2021, 4:22 p.m. UTC | #6
> The FE stream locks are necessary at only two points: adding and
> deleting a BE in the link.  We used dpcm_lock in other places, but
> those are superfluous or would cause problems if converted to an FE
> stream lock.

I must be missing a fundamental concept here - possibly a set of concepts...

It is my understanding that the FE-BE connection can be updated
dynamically without any relationship to the usual ALSA steps, e.g. as a
result of a control being changed by a user.

So if you only protect the addition/removal, isn't there a case where
the for_each_dpcm_be() loop would either miss a BE or point to an
invalid one?

In other words, don't we need the *same* lock to be used
a) before changing and
b) walking through the list?

I also don't get what would happen if the dpcm_lock was converted to an
FE stream lock. It works fine in my tests, so if there's limitation I
didn't see it.

>>> In addition, a lock around dpcm_show_state() might be needed to be
>>> replaced with card->pcm_mutex, and we may need to revisit whether all
>>> other paths take card->pcm_mutex.
>>
>> What happens if we show the state while a trigger happens? That's my
>> main concern with using two separate locks (pcm_mutex and FE stream
>> lock) to protect the same list, there are still windows of time where
>> the list is not protected.
> 
> With the proper use of mutex, the list itself is protected.
> If we need to protect the concurrent access to each BE in the show
> method, an additional BE lock is needed in that part.  But that's a
> subtle issue, as the link traversal itself is protected by the mutex.

If I look at your patch2, dpcm_be_disconnect() protects the list removal
with the fe stream lock, but the show state is protected by both the
pcm_mutex and the fe stream lock.

I have not been able to figure out when you need
a) the pcm_mutex only
b) the fe stream lock only
c) both pcm_mutex and fe stream lock
Takashi Iwai Oct. 15, 2021, 4:56 p.m. UTC | #7
On Fri, 15 Oct 2021 18:22:58 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> > The FE stream locks are necessary at only two points: adding and
> > deleting a BE in the link.  We used dpcm_lock in other places, but
> > those are superfluous or would cause problems if converted to an FE
> > stream lock.
> 
> I must be missing a fundamental concept here - possibly a set of concepts...
> 
> It is my understanding that the FE-BE connection can be updated
> dynamically without any relationship to the usual ALSA steps, e.g. as a
> result of a control being changed by a user.
> 
> So if you only protect the addition/removal, isn't there a case where
> the for_each_dpcm_be() loop would either miss a BE or point to an
> invalid one?

No, for sleepable context, pcm_mutex is *always* taken when
adding/deleting a BE, and that's the main protection for the link.
The FE stream lock is taken additionally over it at adding/deleting a
BE, just for the code path via the FE and BE trigger.

> In other words, don't we need the *same* lock to be used
> a) before changing and
> b) walking through the list?

> I also don't get what would happen if the dpcm_lock was converted to an
> FE stream lock. It works fine in my tests, so if there's limitation I
> didn't see it.

dpcm_lock was put in places where it could end up being taken
recursively.  So this caused some deadlock, I suppose.

> >>> In addition, a lock around dpcm_show_state() might be needed to be
> >>> replaced with card->pcm_mutex, and we may need to revisit whether all
> >>> other paths take card->pcm_mutex.
> >>
> >> What happens if we show the state while a trigger happens? That's my
> >> main concern with using two separate locks (pcm_mutex and FE stream
> >> lock) to protect the same list, there are still windows of time where
> >> the list is not protected.
> > 
> > With the proper use of mutex, the list itself is protected.
> > If we need to protect the concurrent access to each BE in the show
> > method, an additional BE lock is needed in that part.  But that's a
> > subtle issue, as the link traversal itself is protected by the mutex.
> 
> If I look at your patch2, dpcm_be_disconnect() protects the list removal
> with the fe stream lock, but the show state is protected by both the
> pcm_mutex and the fe stream lock.

No, show_state() itself doesn't take any lock, but its caller
dpcm_state_read_file() takes the pcm_mutex.  That protects the list
addition / deletion.

> I have not been able to figure out when you need
> a) the pcm_mutex only
> b) the fe stream lock only
> c) both pcm_mutex and fe stream lock

The pcm_mutex is needed for every sleepable function that deals with
the DPCM FE links, but the mutex is taken only at the upper level,
i.e. the top-most caller such as the FE PCM ops themselves or the
DAPM calls.

That said, pcm_mutex is the top-most protection of the BE links in an
FE.  But there is a code path where a mutex can't be used, and that's
the FE and BE trigger.  To protect against this, the FE stream lock
is *additionally* taken at the places where a BE is added or deleted.
At those places, both the pcm_mutex and the FE stream lock are held.

The BE stream lock is taken in addition, nested below the above mutex
and FE locks.


Takashi
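
(For illustration, the innermost level of that hierarchy - the BE lock
inside dpcm_be_dai_trigger() - might look roughly like the sketch below,
written as if it lived in sound/soc/soc-pcm.c.  The _sketch name is made
up and the state checks of the real dpcm_be_dai_trigger() are omitted.)

#include <sound/pcm.h>
#include <sound/soc.h>
#include <sound/soc-dpcm.h>

static int dpcm_be_dai_trigger_sketch(struct snd_soc_pcm_runtime *fe,
                                      int stream, int cmd)
{
        struct snd_soc_dpcm *dpcm;
        int ret = 0;

        /* the ALSA core already holds the FE stream lock around the trigger */
        for_each_dpcm_be(fe, stream, dpcm) {
                struct snd_pcm_substream *be_substream =
                        snd_soc_dpcm_get_substream(dpcm->be, stream);
                unsigned long flags;

                /* BE lock: BE-vs-BE protection, nested below the FE lock */
                snd_pcm_stream_lock_irqsave(be_substream, flags);
                ret = soc_pcm_trigger(be_substream, cmd);
                snd_pcm_stream_unlock_irqrestore(be_substream, flags);
                if (ret < 0)
                        break;
        }
        return ret;
}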
Pierre-Louis Bossart Oct. 15, 2021, 5:08 p.m. UTC | #8
>> I have not been able to figure out when you need
>> a) the pcm_mutex only
>> b) the fe stream lock only
>> c) both pcm_mutex and fe stream lock
> 
> The pcm_mutex is needed for every sleepable function that deals with
> the DPCM FE links, but the mutex is taken only at the upper level,
> i.e. the top-most caller such as the FE PCM ops themselves or the
> DAPM calls.
> 
> That said, pcm_mutex is the top-most protection of the BE links in an
> FE.  But there is a code path where a mutex can't be used, and that's
> the FE and BE trigger.  To protect against this, the FE stream lock
> is *additionally* taken at the places where a BE is added or deleted.
> At those places, both the pcm_mutex and the FE stream lock are held.
> 
> The BE stream lock is taken in addition, nested below the above mutex
> and FE locks.

Thanks Takashi, now I get the idea. Makes sense indeed. I'll make sure
to add this explanation to the commit message/cover so that it's not lost.

I added your three patches to my tests, so far so good; the code is at
https://github.com/thesofproject/linux/pull/3215

Thanks and have a nice week-end.
-Pierre

Patch

diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 52851827d53f..f22bbf95319d 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -1151,13 +1151,33 @@  static snd_pcm_uframes_t soc_pcm_pointer(struct snd_pcm_substream *substream)
 static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
+	struct snd_pcm_substream *fe_substream;
+	struct snd_pcm_substream *be_substream;
 	struct snd_soc_dpcm *dpcm;
 
 	/* only add new dpcms */
+	snd_soc_dpcm_fe_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm) {
-		if (dpcm->be == be && dpcm->fe == fe)
+		if (dpcm->be == be && dpcm->fe == fe) {
+			snd_soc_dpcm_fe_unlock_irq(fe, stream);
 			return 0;
+		}
+	}
+	fe_substream = snd_soc_dpcm_get_substream(fe, stream);
+	be_substream = snd_soc_dpcm_get_substream(be, stream);
+
+	if (!fe_substream->pcm->nonatomic && be_substream->pcm->nonatomic) {
+		dev_err(be->dev, "%s: FE is atomic but BE is nonatomic, invalid configuration\n",
+			__func__);
+		snd_soc_dpcm_fe_unlock_irq(fe, stream);
+		return -EINVAL;
 	}
+	if (fe_substream->pcm->nonatomic && !be_substream->pcm->nonatomic) {
+		dev_warn(be->dev, "%s: FE is nonatomic but BE is not, forcing BE as nonatomic\n",
+			 __func__);
+		be_substream->pcm->nonatomic = 1;
+	}
+	snd_soc_dpcm_fe_unlock_irq(fe, stream);
 
 	dpcm = kzalloc(sizeof(struct snd_soc_dpcm), GFP_KERNEL);
 	if (!dpcm)