diff mbox series

[V2] pwm: imx27: workaround of the pwm output bug when decrease the duty cycle

Message ID 20211220073130.1429723-1-xiaoning.wang@nxp.com (mailing list archive)
State New, archived
Headers show
Series [V2] pwm: imx27: workaround of the pwm output bug when decrease the duty cycle | expand

Commit Message

Clark Wang Dec. 20, 2021, 7:31 a.m. UTC
This is a limited workaround for the PWM IP issue.

Root cause:
When the SAR FIFO is empty, the new write value will be directly applied
to SAR even the current period is not over.
If the new SAR value is less than the old one, and the counter is
greater than the new SAR value, the current period will not filp the
level. This will result in a pulse with a duty cycle of 100%.

Workaround:
Add an old value SAR write before updating the new duty cycle to SAR.
This will keep the new value is always in a not empty fifo, and can be wait
to update after a period finished.

Limitation:
This workaround can only solve this issue when the PWM period is longer than
2us(or <500KHz).

Reviewed-by: Jun Li <jun.li@nxp.com>
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
---
V2:
 Fix the warnings when built in riscv-gcc, which is reported by kernel test robot <lkp@intel.com>
---
 drivers/pwm/pwm-imx27.c | 67 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 5 deletions(-)

Comments

Uwe Kleine-König Dec. 20, 2021, 10:55 a.m. UTC | #1
Hello,

[added Jun Li who reviewed the patch to Cc:]

On Mon, Dec 20, 2021 at 03:31:30PM +0800, Clark Wang wrote:
> This is a limited workaround for the PWM IP issue.
> 
> Root cause:
> When the SAR FIFO is empty, the new write value will be directly applied
> to SAR even the current period is not over.


> If the new SAR value is less than the old one, and the counter is
> greater than the new SAR value, the current period will not filp the
> level. This will result in a pulse with a duty cycle of 100%.

I read the i.MX25 PWM chapter a few times now, and I played around a
bit. I could confirm the issue.

> Workaround:
> Add an old value SAR write before updating the new duty cycle to SAR.
> This will keep the new value is always in a not empty fifo, and can be wait
> to update after a period finished.
> 
> Limitation:
> This workaround can only solve this issue when the PWM period is longer than
> 2us(or <500KHz).
> 
> Reviewed-by: Jun Li <jun.li@nxp.com>
> Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> ---
> V2:
>  Fix the warnings when built in riscv-gcc, which is reported by kernel test robot <lkp@intel.com>
> ---
>  drivers/pwm/pwm-imx27.c | 67 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 62 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pwm/pwm-imx27.c b/drivers/pwm/pwm-imx27.c
> index ea91a2f81a9f..3d9ca60e2baa 100644
> --- a/drivers/pwm/pwm-imx27.c
> +++ b/drivers/pwm/pwm-imx27.c
> @@ -21,11 +21,13 @@
>  #include <linux/platform_device.h>
>  #include <linux/pwm.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
>  
>  #define MX3_PWMCR			0x00    /* PWM Control Register */
>  #define MX3_PWMSR			0x04    /* PWM Status Register */
>  #define MX3_PWMSAR			0x0C    /* PWM Sample Register */
>  #define MX3_PWMPR			0x10    /* PWM Period Register */
> +#define MX3_PWMCNR			0x14    /* PWM Counter Register */
>  
>  #define MX3_PWMCR_FWM			GENMASK(27, 26)
>  #define MX3_PWMCR_STOPEN		BIT(25)
> @@ -91,6 +93,7 @@ struct pwm_imx27_chip {
>  	 * value to return in that case.
>  	 */
>  	unsigned int duty_cycle;
> +	spinlock_t lock;
>  };
>  
>  #define to_pwm_imx27_chip(chip)	container_of(chip, struct pwm_imx27_chip, chip)
> @@ -201,10 +204,10 @@ static void pwm_imx27_wait_fifo_slot(struct pwm_chip *chip,
>  
>  	sr = readl(imx->mmio_base + MX3_PWMSR);
>  	fifoav = FIELD_GET(MX3_PWMSR_FIFOAV, sr);
> -	if (fifoav == MX3_PWMSR_FIFOAV_4WORDS) {
> +	if (fifoav >= MX3_PWMSR_FIFOAV_3WORDS) {
>  		period_ms = DIV_ROUND_UP_ULL(pwm_get_period(pwm),
>  					 NSEC_PER_MSEC);
> -		msleep(period_ms);
> +		msleep(period_ms * (fifoav - 2));

This changes semantic, from "wait for at least one free FIFO slot" to
"wait for at least two FIFO slots". Maybe a comment would be good? At
least the comment above the caller of pwm_imx27_wait_fifo_slot() needs
adaption.

Also I wonder: If there is only a single free slot, there is no problem,
is there?

>  		sr = readl(imx->mmio_base + MX3_PWMSR);
>  		if (fifoav == FIELD_GET(MX3_PWMSR_FIFOAV, sr))

The error test is wrong then. If fifoav was initially 4 and waiting only
reduced it to 3, you don't trigger a warning.

On a side note: pwm_get_period() might return a value that is too big.
This could be improved using readl_poll_timeout.

> @@ -215,13 +218,15 @@ static void pwm_imx27_wait_fifo_slot(struct pwm_chip *chip,
>  static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
>  			   const struct pwm_state *state)
>  {
> -	unsigned long period_cycles, duty_cycles, prescale;
> +	unsigned long period_cycles, duty_cycles, prescale, counter_check, flags;
>  	struct pwm_imx27_chip *imx = to_pwm_imx27_chip(chip);
> +	void __iomem *reg_sar = imx->mmio_base + MX3_PWMSAR;
> +	__force u32 sar_last, sar_current;
>  	struct pwm_state cstate;
>  	unsigned long long c;
>  	unsigned long long clkrate;
>  	int ret;
> -	u32 cr;
> +	u32 cr, timeout = 1000;
>  
>  	pwm_get_state(pwm, &cstate);
>  
> @@ -262,7 +267,57 @@ static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
>  		pwm_imx27_sw_reset(chip);
>  	}
>  
> -	writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);
> +	/*
> +	 * This is a limited workaround. When the SAR FIFO is empty, the new
> +	 * write value will be directly applied to SAR even the current period
> +	 * is not over.

	When the SAR FIFO is empty, a new SAR value will be applied directly
	instead of waiting for the current period to complete.

is grammatically better (at least with my German sense for English
grammar) and might be better to understand.

> +	 * If the new SAR value is less than the old one, and the counter is
> +	 * greater than the new SAR value, the current period will not filp

s/filp/flip/

> +	 * the level. This will result in a pulse with a duty cycle of 100%.
> +	 * So, writing the current value of the SAR to SAR here before updating
> +	 * the new SAR value can avoid this issue.

This can be expressed a bit easier (and more correct) I think: 

	If the SAR value is decreased over the counter value, there is
	no compare event in the current period resulting in a 100%
	period.

(This is more correct because decreasing the SAR from 200 to 100 if the
counter is already at 300 doesn't result in the problem.)

> +	 * Add a spin lock and turn off the interrupt to ensure that the

The usual term is "turn off interrupts" because not only the PWM irq is
disabled, but all are. I wonder if just disabling irqs serves the same
purpose.

> +	 * real-time performance can be guaranteed as much as possible when

Disabling interrupts is never good for real-time performance.

Having said that I think I'd go for a solution without irq disabling.

> +	 * operating the following operations.
> +	 *
> +	 * 1. Add a threshold of 1.5us. If the time T between the read current
> +	 * count value CNR and the end of the cycle is less than 1.5us, wait
> +	 * for T to be longer than 1.5us before updating the SAR register.
> +	 * This is to avoid the situation that when the first SAR is written,
> +	 * the current cycle just ends and the SAR FIFO that just be written
> +	 * is emptied again.
> +	 *
> +	 * 2. Use __raw_writel() to minimize the interval between two writes to
> +	 * the SAR register to increase the fastest pwm frequency supported.
> +	 *
> +	 * When the PWM period is longer than 2us(or <500KHz), this workaround
> +	 * can solve this problem.
> +	 */
> +	if (duty_cycles < imx->duty_cycle) {
> +		c = clkrate * 1500;
> +		do_div(c, NSEC_PER_SEC);
> +		counter_check = c;

This needs to account for PWMCR.PRESCALER.

> +		sar_last = (__force u32) cpu_to_le32(imx->duty_cycle);
> +		sar_current = (__force u32) cpu_to_le32(duty_cycles);

If endianess conversion is necessary, please apply it to the complete
driver in a separate patch. Doing this only in one place and skipping
several others is just irritating.

> +		spin_lock_irqsave(&imx->lock, flags);
> +		if (state->period >= 2000) {

This checks the new period value that doesn't have to do anything with
the hardware state yet. So this check isn't sensible, is it?

> +			while ((period_cycles -
> +				readl_relaxed(imx->mmio_base + MX3_PWMCNR))
> +				< counter_check) {

period_cycles is the new overflow value.  If the current overflow value
is lower, the loop might terminate immediately. If however the current
overflow value is higher than period_cycles, the subtraction might
overflow resulting in the check to be true. In both cases there isn't
any relation to the next overflow event.

> +				if (!--timeout)
> +					break;

On a fast machine with a long period this might end in a timeout before
the rollover.

> +			};
> +		}
> +		if (!(MX3_PWMSR_FIFOAV &
> +		      readl_relaxed(imx->mmio_base + MX3_PWMSR)))
> +			__raw_writel(sar_last, reg_sar);

Why not use FIELD_GET for consistency here?

> +		__raw_writel(sar_current, reg_sar);
> +		spin_unlock_irqrestore(&imx->lock, flags);
> +	} else
> +		writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);

This violates the coding style. If one branch of an if needs braces, all
branches should use them.

> +
>  	writel(period_cycles, imx->mmio_base + MX3_PWMPR);

Changing the period value resets the counter. So to minimize the effect
of the mitigation (i.e. a busy loop for up to 1.5µs with irqs off) I
suggest to do the wait and write SAR twice only if period doesn't change
and SAR is actually decreased over the current counter value.
 
>  	/*
> @@ -323,6 +378,8 @@ static int pwm_imx27_probe(struct platform_device *pdev)
>  		return dev_err_probe(&pdev->dev, PTR_ERR(imx->clk_per),
>  				     "failed to get peripheral clock\n");
>  
> +	spin_lock_init(&imx->lock);
> +	imx->duty_cycle = 0;

duty_cycle is already 0 here, so there is no need to initialize it.

>  	imx->chip.ops = &pwm_imx27_ops;
>  	imx->chip.dev = &pdev->dev;
>  	imx->chip.npwm = 1;

Best regards
Uwe
Clark Wang Dec. 21, 2021, 7:11 a.m. UTC | #2
Hi Uwe,

Thank you very much for your detailed review!
Please find my comment below.

Best Regards,
Clark Wang

> -----Original Message-----
> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Sent: Monday, December 20, 2021 18:56
> To: Clark Wang <xiaoning.wang@nxp.com>
> Cc: thierry.reding@gmail.com; lee.jones@linaro.org; linux-
> pwm@vger.kernel.org; shawnguo@kernel.org; s.hauer@pengutronix.de; linux-
> kernel@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>;
> kernel@pengutronix.de; festevam@gmail.com; linux-arm-
> kernel@lists.infradead.org; Jun Li <jun.li@nxp.com>
> Subject: Re: [PATCH V2] pwm: imx27: workaround of the pwm output bug when
> decrease the duty cycle
> 
> Hello,
> 
> [added Jun Li who reviewed the patch to Cc:]
> 
> On Mon, Dec 20, 2021 at 03:31:30PM +0800, Clark Wang wrote:
> > This is a limited workaround for the PWM IP issue.
> >
> > Root cause:
> > When the SAR FIFO is empty, the new write value will be directly applied
> > to SAR even the current period is not over.
> 
> 
> > If the new SAR value is less than the old one, and the counter is
> > greater than the new SAR value, the current period will not filp the
> > level. This will result in a pulse with a duty cycle of 100%.
> 
> I read the i.MX25 PWM chapter a few times now, and I played around a
> bit. I could confirm the issue.

[Clark] Yes, this issue is on all platforms which uses this PWM IP.

> 
> > Workaround:
> > Add an old value SAR write before updating the new duty cycle to SAR.
> > This will keep the new value is always in a not empty fifo, and can be wait
> > to update after a period finished.
> >
> > Limitation:
> > This workaround can only solve this issue when the PWM period is longer than
> > 2us(or <500KHz).
> >
> > Reviewed-by: Jun Li <jun.li@nxp.com>
> > Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> > ---
> > V2:
> >  Fix the warnings when built in riscv-gcc, which is reported by kernel test robot
> <lkp@intel.com>
> > ---
> >  drivers/pwm/pwm-imx27.c | 67
> ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 62 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pwm/pwm-imx27.c b/drivers/pwm/pwm-imx27.c
> > index ea91a2f81a9f..3d9ca60e2baa 100644
> > --- a/drivers/pwm/pwm-imx27.c
> > +++ b/drivers/pwm/pwm-imx27.c
> > @@ -21,11 +21,13 @@
> >  #include <linux/platform_device.h>
> >  #include <linux/pwm.h>
> >  #include <linux/slab.h>
> > +#include <linux/spinlock.h>
> >
> >  #define MX3_PWMCR			0x00    /* PWM Control Register */
> >  #define MX3_PWMSR			0x04    /* PWM Status Register */
> >  #define MX3_PWMSAR			0x0C    /* PWM Sample
> Register */
> >  #define MX3_PWMPR			0x10    /* PWM Period Register */
> > +#define MX3_PWMCNR			0x14    /* PWM Counter
> Register */
> >
> >  #define MX3_PWMCR_FWM			GENMASK(27, 26)
> >  #define MX3_PWMCR_STOPEN		BIT(25)
> > @@ -91,6 +93,7 @@ struct pwm_imx27_chip {
> >  	 * value to return in that case.
> >  	 */
> >  	unsigned int duty_cycle;
> > +	spinlock_t lock;
> >  };
> >
> >  #define to_pwm_imx27_chip(chip)	container_of(chip, struct
> pwm_imx27_chip, chip)
> > @@ -201,10 +204,10 @@ static void pwm_imx27_wait_fifo_slot(struct
> pwm_chip *chip,
> >
> >  	sr = readl(imx->mmio_base + MX3_PWMSR);
> >  	fifoav = FIELD_GET(MX3_PWMSR_FIFOAV, sr);
> > -	if (fifoav == MX3_PWMSR_FIFOAV_4WORDS) {
> > +	if (fifoav >= MX3_PWMSR_FIFOAV_3WORDS) {
> >  		period_ms = DIV_ROUND_UP_ULL(pwm_get_period(pwm),
> >  					 NSEC_PER_MSEC);
> > -		msleep(period_ms);
> > +		msleep(period_ms * (fifoav - 2));
> 
> This changes semantic, from "wait for at least one free FIFO slot" to
> "wait for at least two FIFO slots". Maybe a comment would be good? At
> least the comment above the caller of pwm_imx27_wait_fifo_slot() needs
> adaption.

[Clark] OK, I will add a comment here.
The reason for waiting for at least two empty FIFO slots here is to ensure as much as possible that the SAR will not fail because the FIFO is full when writing to the SAR twice.

> 
> Also I wonder: If there is only a single free slot, there is no problem,
> is there?

[Clark] In most cases, this situation is fine. As long as the FIFO is not empty when the SAR is updated, this problem will not be triggered.

> 
> >  		sr = readl(imx->mmio_base + MX3_PWMSR);
> >  		if (fifoav == FIELD_GET(MX3_PWMSR_FIFOAV, sr))
> 
> The error test is wrong then. If fifoav was initially 4 and waiting only
> reduced it to 3, you don't trigger a warning.

[Clark] Yes, Thanks. I will change it to " if (FIELD_GET(MX3_PWMSR_FIFOAV, sr) < MX3_PWMSR_FIFOAV_3WORDS)"

>
> On a side note: pwm_get_period() might return a value that is too big.
> This could be improved using readl_poll_timeout.

[Clark] Yes, this is a good idea! I can change to use readl_poll_timeout to wait fifoav < 3.

> 
> > @@ -215,13 +218,15 @@ static void pwm_imx27_wait_fifo_slot(struct
> pwm_chip *chip,
> >  static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
> >  			   const struct pwm_state *state)
> >  {
> > -	unsigned long period_cycles, duty_cycles, prescale;
> > +	unsigned long period_cycles, duty_cycles, prescale, counter_check,
> flags;
> >  	struct pwm_imx27_chip *imx = to_pwm_imx27_chip(chip);
> > +	void __iomem *reg_sar = imx->mmio_base + MX3_PWMSAR;
> > +	__force u32 sar_last, sar_current;
> >  	struct pwm_state cstate;
> >  	unsigned long long c;
> >  	unsigned long long clkrate;
> >  	int ret;
> > -	u32 cr;
> > +	u32 cr, timeout = 1000;
> >
> >  	pwm_get_state(pwm, &cstate);
> >
> > @@ -262,7 +267,57 @@ static int pwm_imx27_apply(struct pwm_chip *chip,
> struct pwm_device *pwm,
> >  		pwm_imx27_sw_reset(chip);
> >  	}
> >
> > -	writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);
> > +	/*
> > +	 * This is a limited workaround. When the SAR FIFO is empty, the new
> > +	 * write value will be directly applied to SAR even the current period
> > +	 * is not over.
> 
> 	When the SAR FIFO is empty, a new SAR value will be applied directly
> 	instead of waiting for the current period to complete.
> 
> is grammatically better (at least with my German sense for English
> grammar) and might be better to understand.

[Clark] OK. My description may be a bit verbose.

> 
> > +	 * If the new SAR value is less than the old one, and the counter is
> > +	 * greater than the new SAR value, the current period will not filp
> 
> s/filp/flip/
> 
> > +	 * the level. This will result in a pulse with a duty cycle of 100%.
> > +	 * So, writing the current value of the SAR to SAR here before updating
> > +	 * the new SAR value can avoid this issue.
> 
> This can be expressed a bit easier (and more correct) I think:
> 
> 	If the SAR value is decreased over the counter value, there is
> 	no compare event in the current period resulting in a 100%
> 	period.
> 
> (This is more correct because decreasing the SAR from 200 to 100 if the
> counter is already at 300 doesn't result in the problem.)

[Clark] Thank you for your detailed comments, I will revise it in the next edition.

> 
> > +	 * Add a spin lock and turn off the interrupt to ensure that the
> 
> The usual term is "turn off interrupts" because not only the PWM irq is
> disabled, but all are. I wonder if just disabling irqs serves the same
> purpose.

[Clark] Our purpose here is to turn off all interrupts. I will modify the description here.

> 
> > +	 * real-time performance can be guaranteed as much as possible when
> 
> Disabling interrupts is never good for real-time performance.
> 
> Having said that I think I'd go for a solution without irq disabling.


[Clark]
The two necessary conditions that cause this problem are:
a) SAR FIFO is empty
b) New SAR<CNR<Old SAR

Code in the following "if (duty_cycles <imx->duty_cycle) {}" is to prevent the problem from recurring even if SAR is written twice in the following situations:
1. The pwm period is set at will. Our logic is to directly update the new SAR value when the FIFO is not empty to avoid this issue. But if after we read FIFO == 1, the time to the end of the current cycle is less than the time used by writing to the SAR twice, the second updated SAR value is still equivalent to being written in the empty FIFO. This may cause this issue again.
2. The PWM period is extremely small (such as 2us). When the FIFO itself is empty, if the total time of two SAR writes cannot be guaranteed to be within 2us, then the second updated SAR value is still equivalent to being written in the empty FIFO. Because the first updated SAR value is updated to the on-going SAR register, the SAR FIFO is empty again when write the second SAR value.
So in order to avoid the above two situations, we must ensure that:
a) The total time for writing two SARs is as small as possible.
b) The two SAR writes need to be completed within one PWM cycle.

I did an experiment:
On imx8mp, set the PWM period to 50us.
When spin_lock_irqsave() is not added, a 100% high level period can still be observed with a high probability even write SAR twice.
After spin_lock_irqsave() is added, there is a very small probability that a 100% high level period will appear. (This avoids the second situation described above.)
At the same time, add counter_check to ensure that the end of the current period is greater than 1.5us. Then quickly write the SAR twice. Period greater than 2us can no longer reproduce this problem. (This avoids the first situation described above.)

From the experimental results, adding spin_lock_irqsave() can effectively guarantee the code running time during the period.
If you have a better way, we can discuss. :)
[Clark]


> 
> > +	 * operating the following operations.
> > +	 *
> > +	 * 1. Add a threshold of 1.5us. If the time T between the read current
> > +	 * count value CNR and the end of the cycle is less than 1.5us, wait
> > +	 * for T to be longer than 1.5us before updating the SAR register.
> > +	 * This is to avoid the situation that when the first SAR is written,
> > +	 * the current cycle just ends and the SAR FIFO that just be written
> > +	 * is emptied again.
> > +	 *
> > +	 * 2. Use __raw_writel() to minimize the interval between two writes to
> > +	 * the SAR register to increase the fastest pwm frequency supported.
> > +	 *
> > +	 * When the PWM period is longer than 2us(or <500KHz), this
> workaround
> > +	 * can solve this problem.
> > +	 */
> > +	if (duty_cycles < imx->duty_cycle) {
> > +		c = clkrate * 1500;
> > +		do_div(c, NSEC_PER_SEC);
> > +		counter_check = c;
> 
> This needs to account for PWMCR.PRESCALER.

[Clark] Yes, will fix it.

> 
> > +		sar_last = (__force u32) cpu_to_le32(imx->duty_cycle);
> > +		sar_current = (__force u32) cpu_to_le32(duty_cycles);
> 
> If endianess conversion is necessary, please apply it to the complete
> driver in a separate patch. Doing this only in one place and skipping
> several others is just irritating.

[Clark] Sorry. But the endianess conversion here is the step before the value is actually written into the register in marco writel_relaxed().
#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)

The purpose of doing the endianess conversion here in advance is to ensure that the writing INTERVAL of the next two SAR updates is as short as possible.
	__raw_writel(sar_last, reg_sar);
	↑ The INTERVAL between these two lines. ↓
	__raw_writel(sar_current, reg_sar);
[Clark]

> 
> > +		spin_lock_irqsave(&imx->lock, flags);
> > +		if (state->period >= 2000) {
> 
> This checks the new period value that doesn't have to do anything with
> the hardware state yet. So this check isn't sensible, is it?

[Clark] No, this check is useful.
Because we add counter_check to ensure that the end of the current period is greater than 1.5us. This workaround can only fix situations when the period is set greater than 1.5us.
If the PWM period set by user is shorter than 1.5us, then the 1.5us counter_check will never be true. This is why this check is added.
A 1.5us period cannot satisfy that the counter_check is greater than 1.5us, so the minimum cycle for solving(to workaround) this problem is extended to 2us.

Because this problem does not happened 100%, and a full-height period in such a short cycle will not cause much impact, in this case, the time to the end of the current period(counter_check) will not be checked. The SAR will be written twice directly.
[Clark]

> 
> > +			while ((period_cycles -
> > +				readl_relaxed(imx->mmio_base +
> MX3_PWMCNR))
> > +				< counter_check) {
> 
> period_cycles is the new overflow value.  If the current overflow value
> is lower, the loop might terminate immediately. If however the current
> overflow value is higher than period_cycles, the subtraction might
> overflow resulting in the check to be true. In both cases there isn't
> any relation to the next overflow event.

[Clark] Using signed numbers here may solve the problem you raised. Thanks.

> 
> > +				if (!--timeout)
> > +					break;
> 
> On a fast machine with a long period this might end in a timeout before
> the rollover.

[Clark]
No. 
If it is a long period, as long as the absolute time from the end of the current period is greater than 1.5us. The while loop above will finish. Will not cause a timeout. 
If the time from the end of the current period is less than 1.5us, after waiting for 1.5us, it will enter the next new long period, which must meet counter_check > 1.5us, and will also finish the while loop.
[Clark]

> 
> > +			};
> > +		}
> > +		if (!(MX3_PWMSR_FIFOAV &
> > +		      readl_relaxed(imx->mmio_base + MX3_PWMSR)))
> > +			__raw_writel(sar_last, reg_sar);
> 
> Why not use FIELD_GET for consistency here?

[Clark] In order to ensure that the interval between two SAR writes is as short as possible. This allows this workaround to support shorter PWM period.

> 
> > +		__raw_writel(sar_current, reg_sar);
> > +		spin_unlock_irqrestore(&imx->lock, flags);
> > +	} else
> > +		writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);
> 
> This violates the coding style. If one branch of an if needs braces, all
> branches should use them.

[Clark] Thanks, will fix it.

> 
> > +
> >  	writel(period_cycles, imx->mmio_base + MX3_PWMPR);
> 
> Changing the period value resets the counter. So to minimize the effect
> of the mitigation (i.e. a busy loop for up to 1.5µs with irqs off) I
> suggest to do the wait and write SAR twice only if period doesn't change
> and SAR is actually decreased over the current counter value.

[Clark] Okay, will add this check.

> 
> >  	/*
> > @@ -323,6 +378,8 @@ static int pwm_imx27_probe(struct platform_device
> *pdev)
> >  		return dev_err_probe(&pdev->dev, PTR_ERR(imx->clk_per),
> >  				     "failed to get peripheral clock\n");
> >
> > +	spin_lock_init(&imx->lock);
> > +	imx->duty_cycle = 0;
> 
> duty_cycle is already 0 here, so there is no need to initialize it.

[Clark] Oh, yes. It is a kzalloc above. Will remove it.

Best Regards,
Clark Wang

> 
> >  	imx->chip.ops = &pwm_imx27_ops;
> >  	imx->chip.dev = &pdev->dev;
> >  	imx->chip.npwm = 1;
> 
> Best regards
> Uwe
> 
> --
> Pengutronix e.K.                           | Uwe Kleine-König            |
> Industrial Linux Solutions                 | https://www.pengutronix.de/ |
Uwe Kleine-König Dec. 21, 2021, 9:50 a.m. UTC | #3
Hello Clark,

On Tue, Dec 21, 2021 at 07:11:15AM +0000, Clark Wang wrote:
> > On Mon, Dec 20, 2021 at 03:31:30PM +0800, Clark Wang wrote:
> > > Workaround:
> > > Add an old value SAR write before updating the new duty cycle to SAR.
> > > This will keep the new value is always in a not empty fifo, and can be wait
> > > to update after a period finished.
> > >
> > > Limitation:
> > > This workaround can only solve this issue when the PWM period is longer than
> > > 2us(or <500KHz).
> > >
> > > Reviewed-by: Jun Li <jun.li@nxp.com>
> > > Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> > > ---
> > > V2:
> > >  Fix the warnings when built in riscv-gcc, which is reported by kernel test robot
> > <lkp@intel.com>
> > > ---
> > >  drivers/pwm/pwm-imx27.c | 67
> > ++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 62 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/pwm/pwm-imx27.c b/drivers/pwm/pwm-imx27.c
> > > index ea91a2f81a9f..3d9ca60e2baa 100644
> > > --- a/drivers/pwm/pwm-imx27.c
> > > +++ b/drivers/pwm/pwm-imx27.c
> > > @@ -21,11 +21,13 @@
> > >  #include <linux/platform_device.h>
> > >  #include <linux/pwm.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/spinlock.h>
> > >
> > >  #define MX3_PWMCR			0x00    /* PWM Control Register */
> > >  #define MX3_PWMSR			0x04    /* PWM Status Register */
> > >  #define MX3_PWMSAR			0x0C    /* PWM Sample
> > Register */
> > >  #define MX3_PWMPR			0x10    /* PWM Period Register */
> > > +#define MX3_PWMCNR			0x14    /* PWM Counter
> > Register */
> > >
> > >  #define MX3_PWMCR_FWM			GENMASK(27, 26)
> > >  #define MX3_PWMCR_STOPEN		BIT(25)
> > > @@ -91,6 +93,7 @@ struct pwm_imx27_chip {
> > >  	 * value to return in that case.
> > >  	 */
> > >  	unsigned int duty_cycle;
> > > +	spinlock_t lock;
> > >  };
> > >
> > >  #define to_pwm_imx27_chip(chip)	container_of(chip, struct
> > pwm_imx27_chip, chip)
> > > @@ -201,10 +204,10 @@ static void pwm_imx27_wait_fifo_slot(struct
> > pwm_chip *chip,
> > >
> > >  	sr = readl(imx->mmio_base + MX3_PWMSR);
> > >  	fifoav = FIELD_GET(MX3_PWMSR_FIFOAV, sr);
> > > -	if (fifoav == MX3_PWMSR_FIFOAV_4WORDS) {
> > > +	if (fifoav >= MX3_PWMSR_FIFOAV_3WORDS) {
> > >  		period_ms = DIV_ROUND_UP_ULL(pwm_get_period(pwm),
> > >  					 NSEC_PER_MSEC);
> > > -		msleep(period_ms);
> > > +		msleep(period_ms * (fifoav - 2));
> > 
> > This changes semantic, from "wait for at least one free FIFO slot" to
> > "wait for at least two FIFO slots". Maybe a comment would be good? At
> > least the comment above the caller of pwm_imx27_wait_fifo_slot() needs
> > adaption.
> 
> [Clark] OK, I will add a comment here.
> The reason for waiting for at least two empty FIFO slots here is to ensure as much as possible that the SAR will not fail because the FIFO is full when writing to the SAR twice.

However if there are still 3 pending SAR values you don't need to write
twice. So waiting for a single free slot should be good enough.
 
> > Also I wonder: If there is only a single free slot, there is no problem,
> > is there?
> 
> [Clark] In most cases, this situation is fine. As long as the FIFO is not empty when the SAR is updated, this problem will not be triggered.
> 
> > 
> > >  		sr = readl(imx->mmio_base + MX3_PWMSR);
> > >  		if (fifoav == FIELD_GET(MX3_PWMSR_FIFOAV, sr))
> > 
> > The error test is wrong then. If fifoav was initially 4 and waiting only
> > reduced it to 3, you don't trigger a warning.
> 
> [Clark] Yes, Thanks. I will change it to " if (FIELD_GET(MX3_PWMSR_FIFOAV, sr) < MX3_PWMSR_FIFOAV_3WORDS)"

(However this request is void if you stick to waiting for a single free
slot.)

> > On a side note: pwm_get_period() might return a value that is too big.
> > This could be improved using readl_poll_timeout.
> 
> [Clark] Yes, this is a good idea! I can change to use readl_poll_timeout to wait fifoav < 3.

In a separate patch please (and as above, wait for fifoav < 4 only). 

> > > @@ -215,13 +218,15 @@ static void pwm_imx27_wait_fifo_slot(struct
> > pwm_chip *chip,
> > >  static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
> > >  			   const struct pwm_state *state)
> > >  {
> > > -	unsigned long period_cycles, duty_cycles, prescale;
> > > +	unsigned long period_cycles, duty_cycles, prescale, counter_check,
> > flags;
> > >  	struct pwm_imx27_chip *imx = to_pwm_imx27_chip(chip);
> > > +	void __iomem *reg_sar = imx->mmio_base + MX3_PWMSAR;
> > > +	__force u32 sar_last, sar_current;
> > >  	struct pwm_state cstate;
> > >  	unsigned long long c;
> > >  	unsigned long long clkrate;
> > >  	int ret;
> > > -	u32 cr;
> > > +	u32 cr, timeout = 1000;
> > >
> > >  	pwm_get_state(pwm, &cstate);
> > >
> > > @@ -262,7 +267,57 @@ static int pwm_imx27_apply(struct pwm_chip *chip,
> > struct pwm_device *pwm,
> > >  		pwm_imx27_sw_reset(chip);
> > >  	}
> > >
> > > -	writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);
> > > +	/*
> > > +	 * This is a limited workaround. When the SAR FIFO is empty, the new
> > > +	 * write value will be directly applied to SAR even the current period
> > > +	 * is not over.
> > 
> > 	When the SAR FIFO is empty, a new SAR value will be applied directly
> > 	instead of waiting for the current period to complete.
> > 
> > is grammatically better (at least with my German sense for English
> > grammar) and might be better to understand.
> 
> [Clark] OK. My description may be a bit verbose.
> 
> > 
> > > +	 * If the new SAR value is less than the old one, and the counter is
> > > +	 * greater than the new SAR value, the current period will not filp
> > 
> > s/filp/flip/
> > 
> > > +	 * the level. This will result in a pulse with a duty cycle of 100%.
> > > +	 * So, writing the current value of the SAR to SAR here before updating
> > > +	 * the new SAR value can avoid this issue.
> > 
> > This can be expressed a bit easier (and more correct) I think:
> > 
> > 	If the SAR value is decreased over the counter value, there is
> > 	no compare event in the current period resulting in a 100%
> > 	period.
> > 
> > (This is more correct because decreasing the SAR from 200 to 100 if the
> > counter is already at 300 doesn't result in the problem.)
> 
> [Clark] Thank you for your detailed comments, I will revise it in the next edition.
> 
> > 
> > > +	 * Add a spin lock and turn off the interrupt to ensure that the
> > 
> > The usual term is "turn off interrupts" because not only the PWM irq is
> > disabled, but all are. I wonder if just disabling irqs serves the same
> > purpose.
> 
> [Clark] Our purpose here is to turn off all interrupts. I will modify the description here.
> 
> > 
> > > +	 * real-time performance can be guaranteed as much as possible when
> > 
> > Disabling interrupts is never good for real-time performance.
> > 
> > Having said that I think I'd go for a solution without irq disabling.
> 
> 
> [Clark]
> The two necessary conditions that cause this problem are:
> a) SAR FIFO is empty
> b) New SAR<CNR<Old SAR
> 
> Code in the following "if (duty_cycles <imx->duty_cycle) {}" is to prevent the problem from recurring even if SAR is written twice in the following situations:
> 1. The pwm period is set at will. Our logic is to directly update the new SAR value when the FIFO is not empty to avoid this issue. But if after we read FIFO == 1, the time to the end of the current cycle is less than the time used by writing to the SAR twice, the second updated SAR value is still equivalent to being written in the empty FIFO. This may cause this issue again.
> 2. The PWM period is extremely small (such as 2us). When the FIFO itself is empty, if the total time of two SAR writes cannot be guaranteed to be within 2us, then the second updated SAR value is still equivalent to being written in the empty FIFO. Because the first updated SAR value is updated to the on-going SAR register, the SAR FIFO is empty again when write the second SAR value.
> So in order to avoid the above two situations, we must ensure that:
> a) The total time for writing two SARs is as small as possible.
> b) The two SAR writes need to be completed within one PWM cycle.
> 
> I did an experiment:
> On imx8mp, set the PWM period to 50us.
> When spin_lock_irqsave() is not added, a 100% high level period can still be observed with a high probability even write SAR twice.
> After spin_lock_irqsave() is added, there is a very small probability that a 100% high level period will appear. (This avoids the second situation described above.)
> At the same time, add counter_check to ensure that the end of the current period is greater than 1.5us. Then quickly write the SAR twice. Period greater than 2us can no longer reproduce this problem. (This avoids the first situation described above.)
> 
> From the experimental results, adding spin_lock_irqsave() can effectively guarantee the code running time during the period.

Did you test with local_irq_save() instead of the spin_lock? 

> If you have a better way, we can discuss. :)

Maybe disable the PWM on reconfiguration? That is, clear the enable bit,
then the counter stops (and output freezes). Then you can check for a
word in the fifo, add one or two and reenable the PWM.

Then you don't have to disable irqs at all and only increase the current
period by by the time you need to setup the new settings.

> > > +		sar_last = (__force u32) cpu_to_le32(imx->duty_cycle);
> > > +		sar_current = (__force u32) cpu_to_le32(duty_cycles);
> > 
> > If endianess conversion is necessary, please apply it to the complete
> > driver in a separate patch. Doing this only in one place and skipping
> > several others is just irritating.
> 
> [Clark] Sorry. But the endianess conversion here is the step before the value is actually written into the register in marco writel_relaxed().
> #define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)

Ah I see. I suggest to mention that in a comment.
 
> The purpose of doing the endianess conversion here in advance is to ensure that the writing INTERVAL of the next two SAR updates is as short as possible.
> 	__raw_writel(sar_last, reg_sar);
> 	↑ The INTERVAL between these two lines. ↓
> 	__raw_writel(sar_current, reg_sar);

Did you verify that the compiler understood your wish and that splitting
the call to writel_relaxed() indeed improves the situation?

> [Clark]
> 
> > 
> > > +		spin_lock_irqsave(&imx->lock, flags);
> > > +		if (state->period >= 2000) {
> > 
> > This checks the new period value that doesn't have to do anything with
> > the hardware state yet. So this check isn't sensible, is it?
> 
> [Clark] No, this check is useful.
> Because we add counter_check to ensure that the end of the current period is greater than 1.5us. This workaround can only fix situations when the period is set greater than 1.5us.

You didn't get what I wanted to point out here. Consider the PWM is
currently running with period = 100 ns. Then .apply is called,
requesting period = 3000 ns. Your code then does the wait dance because
the new value is big enough. However it's the currently configured value
that should be considered to judge if the dance is required.

> If the PWM period set by user is shorter than 1.5us, then the 1.5us
> counter_check will never be true. This is why this check is added.

The check is sound, however your suggested implementation is wrong.

> > > +			while ((period_cycles -
> > > +				readl_relaxed(imx->mmio_base +
> > MX3_PWMCNR))
> > > +				< counter_check) {
> > 
> > period_cycles is the new overflow value.  If the current overflow value
> > is lower, the loop might terminate immediately. If however the current
> > overflow value is higher than period_cycles, the subtraction might
> > overflow resulting in the check to be true. In both cases there isn't
> > any relation to the next overflow event.
> 
> [Clark] Using signed numbers here may solve the problem you raised. Thanks.

uah, no. Please stick to unsigned and just fix the test.

> > > +				if (!--timeout)
> > > +					break;
> > 
> > On a fast machine with a long period this might end in a timeout before
> > the rollover.
> 
> [Clark]
> No. 
> If it is a long period, as long as the absolute time from the end of the current period is greater than 1.5us. The while loop above will finish. Will not cause a timeout. 

If the currently running period lasts 1.4 µs at function entry you
intend to wait for 1.4 µs, right? You check up to 1000 times if the
1.4µs are already over. If each check takes 1 ns, you only wait 1µs.
While this probably doesn't happen because in the future a check will
take longer than 1.5ns for quite some time, it's bad style and a bad
template for someone implementing something similar on a different
machine where the timings are different.

> If the time from the end of the current period is less than 1.5us, after waiting for 1.5us, it will enter the next new long period, which must meet counter_check > 1.5us, and will also finish the while loop.
> [Clark]
> 
> > 
> > > +			};
> > > +		}
> > > +		if (!(MX3_PWMSR_FIFOAV &
> > > +		      readl_relaxed(imx->mmio_base + MX3_PWMSR)))
> > > +			__raw_writel(sar_last, reg_sar);
> > 
> > Why not use FIELD_GET for consistency here?
> 
> [Clark] In order to ensure that the interval between two SAR writes is as short as possible. This allows this workaround to support shorter PWM period.

I would expect that using FIELD_GET does result in the same code being
generated. So the only effect is to make this more readable to a human
reader.

Best regards
Uwe
diff mbox series

Patch

diff --git a/drivers/pwm/pwm-imx27.c b/drivers/pwm/pwm-imx27.c
index ea91a2f81a9f..3d9ca60e2baa 100644
--- a/drivers/pwm/pwm-imx27.c
+++ b/drivers/pwm/pwm-imx27.c
@@ -21,11 +21,13 @@ 
 #include <linux/platform_device.h>
 #include <linux/pwm.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
 
 #define MX3_PWMCR			0x00    /* PWM Control Register */
 #define MX3_PWMSR			0x04    /* PWM Status Register */
 #define MX3_PWMSAR			0x0C    /* PWM Sample Register */
 #define MX3_PWMPR			0x10    /* PWM Period Register */
+#define MX3_PWMCNR			0x14    /* PWM Counter Register */
 
 #define MX3_PWMCR_FWM			GENMASK(27, 26)
 #define MX3_PWMCR_STOPEN		BIT(25)
@@ -91,6 +93,7 @@  struct pwm_imx27_chip {
 	 * value to return in that case.
 	 */
 	unsigned int duty_cycle;
+	spinlock_t lock;
 };
 
 #define to_pwm_imx27_chip(chip)	container_of(chip, struct pwm_imx27_chip, chip)
@@ -201,10 +204,10 @@  static void pwm_imx27_wait_fifo_slot(struct pwm_chip *chip,
 
 	sr = readl(imx->mmio_base + MX3_PWMSR);
 	fifoav = FIELD_GET(MX3_PWMSR_FIFOAV, sr);
-	if (fifoav == MX3_PWMSR_FIFOAV_4WORDS) {
+	if (fifoav >= MX3_PWMSR_FIFOAV_3WORDS) {
 		period_ms = DIV_ROUND_UP_ULL(pwm_get_period(pwm),
 					 NSEC_PER_MSEC);
-		msleep(period_ms);
+		msleep(period_ms * (fifoav - 2));
 
 		sr = readl(imx->mmio_base + MX3_PWMSR);
 		if (fifoav == FIELD_GET(MX3_PWMSR_FIFOAV, sr))
@@ -215,13 +218,15 @@  static void pwm_imx27_wait_fifo_slot(struct pwm_chip *chip,
 static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
 			   const struct pwm_state *state)
 {
-	unsigned long period_cycles, duty_cycles, prescale;
+	unsigned long period_cycles, duty_cycles, prescale, counter_check, flags;
 	struct pwm_imx27_chip *imx = to_pwm_imx27_chip(chip);
+	void __iomem *reg_sar = imx->mmio_base + MX3_PWMSAR;
+	__force u32 sar_last, sar_current;
 	struct pwm_state cstate;
 	unsigned long long c;
 	unsigned long long clkrate;
 	int ret;
-	u32 cr;
+	u32 cr, timeout = 1000;
 
 	pwm_get_state(pwm, &cstate);
 
@@ -262,7 +267,57 @@  static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
 		pwm_imx27_sw_reset(chip);
 	}
 
-	writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);
+	/*
+	 * This is a limited workaround. When the SAR FIFO is empty, the new
+	 * write value will be directly applied to SAR even the current period
+	 * is not over.
+	 * If the new SAR value is less than the old one, and the counter is
+	 * greater than the new SAR value, the current period will not filp
+	 * the level. This will result in a pulse with a duty cycle of 100%.
+	 * So, writing the current value of the SAR to SAR here before updating
+	 * the new SAR value can avoid this issue.
+	 *
+	 * Add a spin lock and turn off the interrupt to ensure that the
+	 * real-time performance can be guaranteed as much as possible when
+	 * operating the following operations.
+	 *
+	 * 1. Add a threshold of 1.5us. If the time T between the read current
+	 * count value CNR and the end of the cycle is less than 1.5us, wait
+	 * for T to be longer than 1.5us before updating the SAR register.
+	 * This is to avoid the situation that when the first SAR is written,
+	 * the current cycle just ends and the SAR FIFO that just be written
+	 * is emptied again.
+	 *
+	 * 2. Use __raw_writel() to minimize the interval between two writes to
+	 * the SAR register to increase the fastest pwm frequency supported.
+	 *
+	 * When the PWM period is longer than 2us(or <500KHz), this workaround
+	 * can solve this problem.
+	 */
+	if (duty_cycles < imx->duty_cycle) {
+		c = clkrate * 1500;
+		do_div(c, NSEC_PER_SEC);
+		counter_check = c;
+		sar_last = (__force u32) cpu_to_le32(imx->duty_cycle);
+		sar_current = (__force u32) cpu_to_le32(duty_cycles);
+
+		spin_lock_irqsave(&imx->lock, flags);
+		if (state->period >= 2000) {
+			while ((period_cycles -
+				readl_relaxed(imx->mmio_base + MX3_PWMCNR))
+				< counter_check) {
+				if (!--timeout)
+					break;
+			};
+		}
+		if (!(MX3_PWMSR_FIFOAV &
+		      readl_relaxed(imx->mmio_base + MX3_PWMSR)))
+			__raw_writel(sar_last, reg_sar);
+		__raw_writel(sar_current, reg_sar);
+		spin_unlock_irqrestore(&imx->lock, flags);
+	} else
+		writel(duty_cycles, imx->mmio_base + MX3_PWMSAR);
+
 	writel(period_cycles, imx->mmio_base + MX3_PWMPR);
 
 	/*
@@ -323,6 +378,8 @@  static int pwm_imx27_probe(struct platform_device *pdev)
 		return dev_err_probe(&pdev->dev, PTR_ERR(imx->clk_per),
 				     "failed to get peripheral clock\n");
 
+	spin_lock_init(&imx->lock);
+	imx->duty_cycle = 0;
 	imx->chip.ops = &pwm_imx27_ops;
 	imx->chip.dev = &pdev->dev;
 	imx->chip.npwm = 1;