Message ID | 20191120060256.212818-4-tzungbi@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ASoC: max98090: fix PLL unlocked workaround-related | expand |
On 11/20/19 12:02 AM, Tzung-Bi Shih wrote: > max98090_interrupt() and max98090_pll_work() run in 2 different threads. > There are 2 possible races: > > Note: M98090_REG_DEVICE_STATUS = 0x01. > Note: ULK == 0, PLL is locked; ULK == 1, PLL is unlocked. > > max98090_interrupt max98090_pll_work > ---------------------------------------------- > schedule max98090_pll_work > restart max98090 codec > receive ULK INT > assert ULK == 0 > schedule max98090_pll_work (1). > > In the case (1), the PLL is locked but max98090_interrupt unnecessarily > schedules another max98090_pll_work. if you re-test that the PLL is already running, then you can exit the work function immediately without redoing the sequence? maybe also play with the masks so that the PLL unlock is masked in the interrupt and unmasked after the PLL locks? > > max98090_interrupt max98090_pll_work max98090 codec > ---------------------------------------------------------------------- > ULK = 1 > receive ULK INT > read 0x01 > ULK = 0 (clear on read) > schedule max98090_pll_work > restart max98090 codec > ULK = 1 > receive ULK INT > read 0x01 > ULK = 0 (clear on read) > read 0x1 > assert ULK == 0 (2). what are those 0x01 and 0x1? is the second a typo possibly? > > In the case (2), both max98090_interrupt and max98090_pll_work read > the same clear-on-read register. max98090_pll_work would falsely > thought PLL is locked. > > There are 2 possible options: > A. turn off ULK interrupt before scheduling max98090_pll_work; and turn > on again before exiting max98090_pll_work. > B. remove the second thread of execution. > > Adopts option B which is more straightforward. but has the side effect of possibly adding a 10ms delay in the interrupt thread?
On Thu, Nov 21, 2019 at 12:10 AM Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> wrote: > On 11/20/19 12:02 AM, Tzung-Bi Shih wrote: > > max98090_interrupt() and max98090_pll_work() run in 2 different threads. > > There are 2 possible races: > > > > Note: M98090_REG_DEVICE_STATUS = 0x01. > > Note: ULK == 0, PLL is locked; ULK == 1, PLL is unlocked. > > > > max98090_interrupt max98090_pll_work > > ---------------------------------------------- > > schedule max98090_pll_work > > restart max98090 codec > > receive ULK INT > > assert ULK == 0 > > schedule max98090_pll_work (1). > > > > In the case (1), the PLL is locked but max98090_interrupt unnecessarily > > schedules another max98090_pll_work. > > if you re-test that the PLL is already running, then you can exit the > work function immediately without redoing the sequence? ACK. > maybe also play with the masks so that the PLL unlock is masked in the > interrupt and unmasked after the PLL locks? ACK, this is our option A mentioned below. > > max98090_interrupt max98090_pll_work max98090 codec > > ---------------------------------------------------------------------- > > ULK = 1 > > receive ULK INT > > read 0x01 > > ULK = 0 (clear on read) > > schedule max98090_pll_work > > restart max98090 codec > > ULK = 1 > > receive ULK INT > > read 0x01 > > ULK = 0 (clear on read) > > read 0x1 > > assert ULK == 0 (2). > > what are those 0x01 and 0x1? is the second a typo possibly? ACK, a typo. > > In the case (2), both max98090_interrupt and max98090_pll_work read > > the same clear-on-read register. max98090_pll_work would falsely > > thought PLL is locked. > > > > There are 2 possible options: > > A. turn off ULK interrupt before scheduling max98090_pll_work; and turn > > on again before exiting max98090_pll_work. > > B. remove the second thread of execution. > > > > Adopts option B which is more straightforward. > > but has the side effect of possibly adding a 10ms delay in the interrupt > thread? (forgot to mention) Option A cannot fix the case (2) race condition: there would be 2 threads read the same clear-on-read register. In theory, the hardware should faster than CPUs' accesses via I2C. max98090 should returns ULK=1 any time if PLL is unlocked. Shall we ignore the case (2) and adopt option A?
> >>> max98090_interrupt max98090_pll_work max98090 codec >>> ---------------------------------------------------------------------- >>> ULK = 1 >>> receive ULK INT >>> read 0x01 >>> ULK = 0 (clear on read) >>> schedule max98090_pll_work >>> restart max98090 codec >>> ULK = 1 >>> receive ULK INT >>> read 0x01 >>> ULK = 0 (clear on read) >>> read 0x1 >>> assert ULK == 0 (2). >> >> what are those 0x01 and 0x1? is the second a typo possibly? > > ACK, a typo. > >>> In the case (2), both max98090_interrupt and max98090_pll_work read >>> the same clear-on-read register. max98090_pll_work would falsely >>> thought PLL is locked. >>> >>> There are 2 possible options: >>> A. turn off ULK interrupt before scheduling max98090_pll_work; and turn >>> on again before exiting max98090_pll_work. >>> B. remove the second thread of execution. >>> >>> Adopts option B which is more straightforward. >> >> but has the side effect of possibly adding a 10ms delay in the interrupt >> thread? > > (forgot to mention) Option A cannot fix the case (2) race condition: > there would be 2 threads read the same clear-on-read register. In > theory, the hardware should faster than CPUs' accesses via I2C. > max98090 should returns ULK=1 any time if PLL is unlocked. Shall we > ignore the case (2) and adopt option A? I don't have any specific recommendation, just trying to follow your line of thought on a problem that bugged be in the past. If your tests shows that the extra delay is fine, then it's good progress Still you may want to clarify your second point and the commit message. The first race condition was clear, the second one not so much. what did you mean with 'assert ULK == 0' and what does this map to in pll_work the code looks as this: static void max98090_pll_work(struct work_struct *work) { if (!snd_soc_component_is_active(component)) return; /* Toggle shutdown OFF then ON */ snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0); snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK); /* Give PLL time to lock */ msleep(10); } so in what does the race occur? or was it race with the other work queue, which starts like this: static void max98090_pll_det_enable_work(struct work_struct *work) { struct max98090_priv *max98090 = container_of(work, struct max98090_priv, pll_det_enable_work.work); struct snd_soc_component *component = max98090->component; unsigned int status, mask; /* * Clear status register in order to clear possibly already occurred * PLL unlock. If PLL hasn't still locked, the status will be set * again and PLL unlock interrupt will occur. * Note this will clear all status bits */ regmap_read(max98090->regmap, M98090_REG_DEVICE_STATUS, &status); So here indeed there is the same read, but what is the consequence of the read? did you mean that in the pll_det_enable_work the value of 'status' may be incorrect as it was cleared already? But the interrupt does not schedule the pll_det_enable_work(), it happens on a trigger, so how would the race happen? it'd be good if you provide more details on the flow so that all reviewers get the ideas, it's a good opportunity to fix this for good and move on.
On Thu, Nov 21, 2019 at 11:20 AM Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> wrote: > Still you may want to clarify your second point and the commit message. > The first race condition was clear, the second one not so much. > > what did you mean with 'assert ULK == 0' and what does this map to in > pll_work The case (2) race condition is based on the previous patch: https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158852.html After applying the patch, the code would be something like: (in the case, two threads read 0x01.) static void max98090_pll_work(struct work_struct *work) { [snip] snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0); snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK); for (i = 0; i < 10; ++i) { /* Check lock status */ pll = snd_soc_component_read32( component, M98090_REG_DEVICE_STATUS); if (!(pll & M98090_ULK_MASK)) break; /* Give PLL time to lock */ usleep_range(1000, 1200); } } I guess we have 2 choices: i. stick to the original plan: option B, remove the second thread of execution. In the case, we would be punished by at most 10~12ms delay in the interrupt handler thread. I browsed the max98090_interrupt() and tend to think it should be fine if we delay jack detection for extra 10ms. ii. adopt option A, play with the masks; and drop the previous patch to avoid multiple threads access 0x01 We would be forced to wait >10ms before re-enabling ULK interrupt. Notably, Documentation/timers/timers-howto.txt states "msleep(1~20) may not do what the caller intends, and will often sleep longer".Documentation/timers/timers-howto.txt". Either way should be fine. Which one would you suggest?
On 11/21/19 12:17 AM, Tzung-Bi Shih wrote: > On Thu, Nov 21, 2019 at 11:20 AM Pierre-Louis Bossart > <pierre-louis.bossart@linux.intel.com> wrote: >> Still you may want to clarify your second point and the commit message. >> The first race condition was clear, the second one not so much. >> >> what did you mean with 'assert ULK == 0' and what does this map to in >> pll_work > > The case (2) race condition is based on the previous patch: > https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158852.html ah, ok, so that's a race you introduced :-) > > After applying the patch, the code would be something like: (in the > case, two threads read 0x01.) > static void max98090_pll_work(struct work_struct *work) > { > [snip] > snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, > M98090_SHDNN_MASK, 0); > snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, > M98090_SHDNN_MASK, M98090_SHDNN_MASK); > > for (i = 0; i < 10; ++i) { > /* Check lock status */ > pll = snd_soc_component_read32( > component, M98090_REG_DEVICE_STATUS); > if (!(pll & M98090_ULK_MASK)) > break; > > /* Give PLL time to lock */ > usleep_range(1000, 1200); > } > } Sorry, I don't see at what point the race happens. if your interrupt sees an ULK status, it will schedule the pll_work. You only test the status again *after* switching the device on/off, so what is the side effect? I am also wondering if you shouldn't sleep first in the loop, then check the status? And in case you do have a side effect due to the clear on read, maybe you could save the status in a context and retest in the workqueue, that way you'd have a trace of the ULK event even if the register was cleared.
On Thu, Nov 21, 2019 at 11:41 PM Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> wrote: > On 11/21/19 12:17 AM, Tzung-Bi Shih wrote: > > On Thu, Nov 21, 2019 at 11:20 AM Pierre-Louis Bossart > > <pierre-louis.bossart@linux.intel.com> wrote: > >> Still you may want to clarify your second point and the commit message. > >> The first race condition was clear, the second one not so much. > >> > >> what did you mean with 'assert ULK == 0' and what does this map to in > >> pll_work > > > > The case (2) race condition is based on the previous patch: > > https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158852.html > > ah, ok, so that's a race you introduced :-) > > > > > After applying the patch, the code would be something like: (in the > > case, two threads read 0x01.) > > static void max98090_pll_work(struct work_struct *work) > > { > > [snip] > > snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, > > M98090_SHDNN_MASK, 0); > > snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, > > M98090_SHDNN_MASK, M98090_SHDNN_MASK); > > > > for (i = 0; i < 10; ++i) { > > /* Check lock status */ > > pll = snd_soc_component_read32( > > component, M98090_REG_DEVICE_STATUS); > > if (!(pll & M98090_ULK_MASK)) > > break; > > > > /* Give PLL time to lock */ > > usleep_range(1000, 1200); > > } > > } > > Sorry, I don't see at what point the race happens. > > if your interrupt sees an ULK status, it will schedule the pll_work. > You only test the status again *after* switching the device on/off, so > what is the side effect? Both interrupt thread and pll_work thread access the status register. The side effect is: interrupt thread could read the status register right before pll_work thread's read. > I am also wondering if you shouldn't sleep first in the loop, then check > the status? ACK, "sleep first and check the status" makes more sense. > And in case you do have a side effect due to the clear on read, maybe > you could save the status in a context and retest in the workqueue, that > way you'd have a trace of the ULK event even if the register was cleared. I think cache the status couldn't be a good idea. We couldn't make sure the freshness of the cache. Can we suppose CPUs always get ULK=1 if PLL is unlocked so that we can simply ignore the case (2)? Since we know the register is volatile and read the register via I2C should be much slower than hardware raise the bit.
>> if your interrupt sees an ULK status, it will schedule the pll_work. >> You only test the status again *after* switching the device on/off, so >> what is the side effect? > > Both interrupt thread and pll_work thread access the status register. > The side effect is: interrupt thread could read the status register > right before pll_work thread's read. > >> I am also wondering if you shouldn't sleep first in the loop, then check >> the status? > > ACK, "sleep first and check the status" makes more sense. > >> And in case you do have a side effect due to the clear on read, maybe >> you could save the status in a context and retest in the workqueue, that >> way you'd have a trace of the ULK event even if the register was cleared. > > I think cache the status couldn't be a good idea. We couldn't make > sure the freshness of the cache. I was thinking more of treating it as a volatile, but it's ugly indeed. > Can we suppose CPUs always get ULK=1 if PLL is unlocked so that we can > simply ignore the case (2)? Since we know the register is volatile > and read the register via I2C should be much slower than hardware > raise the bit. Your option B with the small change to sleep then retest is probably the better solution overall. BTW did you test this on both Baytrail and BSW chromebooks? In the past I saw baytrail devices exposed this error a lot more than BSW.
On Fri, Nov 22, 2019 at 3:20 AM Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> wrote: > > Can we suppose CPUs always get ULK=1 if PLL is unlocked so that we can > > simply ignore the case (2)? Since we know the register is volatile > > and read the register via I2C should be much slower than hardware > > raise the bit. > > Your option B with the small change to sleep then retest is probably the > better solution overall. Have sent v2, https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/159050.html > BTW did you test this on both Baytrail and BSW chromebooks? In the past > I saw baytrail devices exposed this error a lot more than BSW. I tested on a Baytrail-based chromebook which is easy to generate lots of "PLL unlocked" on the console. After applying this series and the next series (https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158853.html), I seldom see "PLL unlocked" on the console. It still happens a few times, but with very very low probability.
diff --git a/sound/soc/codecs/max98090.c b/sound/soc/codecs/max98090.c index 2ccdfb2383b7..75abe98dfc44 100644 --- a/sound/soc/codecs/max98090.c +++ b/sound/soc/codecs/max98090.c @@ -2103,10 +2103,8 @@ static void max98090_pll_det_disable_work(struct work_struct *work) M98090_IULK_MASK, 0); } -static void max98090_pll_work(struct work_struct *work) +static void max98090_pll_work(struct max98090_priv *max98090) { - struct max98090_priv *max98090 = - container_of(work, struct max98090_priv, pll_work); struct snd_soc_component *component = max98090->component; unsigned int pll; int i; @@ -2268,7 +2266,7 @@ static irqreturn_t max98090_interrupt(int irq, void *data) if (active & M98090_ULK_MASK) { dev_dbg(component->dev, "M98090_ULK_MASK\n"); - schedule_work(&max98090->pll_work); + max98090_pll_work(max98090); } if (active & M98090_JDET_MASK) { @@ -2431,7 +2429,6 @@ static int max98090_probe(struct snd_soc_component *component) max98090_pll_det_enable_work); INIT_WORK(&max98090->pll_det_disable_work, max98090_pll_det_disable_work); - INIT_WORK(&max98090->pll_work, max98090_pll_work); /* Enable jack detection */ snd_soc_component_write(component, M98090_REG_JACK_DETECT, @@ -2484,7 +2481,6 @@ static void max98090_remove(struct snd_soc_component *component) cancel_delayed_work_sync(&max98090->jack_work); cancel_delayed_work_sync(&max98090->pll_det_enable_work); cancel_work_sync(&max98090->pll_det_disable_work); - cancel_work_sync(&max98090->pll_work); max98090->component = NULL; } diff --git a/sound/soc/codecs/max98090.h b/sound/soc/codecs/max98090.h index 57965cd678b4..a197114b0dad 100644 --- a/sound/soc/codecs/max98090.h +++ b/sound/soc/codecs/max98090.h @@ -1530,7 +1530,6 @@ struct max98090_priv { struct delayed_work jack_work; struct delayed_work pll_det_enable_work; struct work_struct pll_det_disable_work; - struct work_struct pll_work; struct snd_soc_jack *jack; unsigned int dai_fmt; int tdm_slots;
max98090_interrupt() and max98090_pll_work() run in 2 different threads. There are 2 possible races: Note: M98090_REG_DEVICE_STATUS = 0x01. Note: ULK == 0, PLL is locked; ULK == 1, PLL is unlocked. max98090_interrupt max98090_pll_work ---------------------------------------------- schedule max98090_pll_work restart max98090 codec receive ULK INT assert ULK == 0 schedule max98090_pll_work (1). In the case (1), the PLL is locked but max98090_interrupt unnecessarily schedules another max98090_pll_work. max98090_interrupt max98090_pll_work max98090 codec ---------------------------------------------------------------------- ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) schedule max98090_pll_work restart max98090 codec ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) read 0x1 assert ULK == 0 (2). In the case (2), both max98090_interrupt and max98090_pll_work read the same clear-on-read register. max98090_pll_work would falsely thought PLL is locked. There are 2 possible options: A. turn off ULK interrupt before scheduling max98090_pll_work; and turn on again before exiting max98090_pll_work. B. remove the second thread of execution. Adopts option B which is more straightforward. Signed-off-by: Tzung-Bi Shih <tzungbi@google.com> --- sound/soc/codecs/max98090.c | 8 ++------ sound/soc/codecs/max98090.h | 1 - 2 files changed, 2 insertions(+), 7 deletions(-)