diff mbox

[1/3] tty: serial: 8250_omap: do not use RX DMA if pause is not supported

Message ID 1438977619-15488-2-git-send-email-bigeasy@linutronix.de (mailing list archive)
State New, archived
Headers show

Commit Message

Sebastian Andrzej Siewior Aug. 7, 2015, 8 p.m. UTC
The 8250-omap driver requires the DMA-engine driver to support the pause
command in order to properly turn off programmed RX transfer before the
driver stars manually reading from the FIFO.
The lacking support of the requirement has been discovered recently. In
order to stay safe here we disable support for RX-DMA as soon as we
notice that it does not work. This should happen very early.
If the user does not want to see this backtrace he can either disable
DMA support (completely or RX-only) or backport the required patches for
edma / omap-dma once they hit mainline.

Cc: <stable@vger.kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/tty/serial/8250/8250_omap.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Peter Hurley Aug. 8, 2015, 12:28 a.m. UTC | #1
On 08/07/2015 04:00 PM, Sebastian Andrzej Siewior wrote:
> The 8250-omap driver requires the DMA-engine driver to support the pause
> command in order to properly turn off programmed RX transfer before the
> driver stars manually reading from the FIFO.
> The lacking support of the requirement has been discovered recently. In
> order to stay safe here we disable support for RX-DMA as soon as we
> notice that it does not work. This should happen very early.
> If the user does not want to see this backtrace he can either disable
> DMA support (completely or RX-only) or backport the required patches for
> edma / omap-dma once they hit mainline.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/tty/serial/8250/8250_omap.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> index 0340ee6ba970..07a11e0935e4 100644
> --- a/drivers/tty/serial/8250/8250_omap.c
> +++ b/drivers/tty/serial/8250/8250_omap.c
> @@ -112,6 +112,7 @@ struct omap8250_priv {
>  	struct work_struct qos_work;
>  	struct uart_8250_dma omap8250_dma;
>  	spinlock_t rx_dma_lock;
> +	bool rx_dma_broken;
>  };
>  
>  static u32 uart_read(struct uart_8250_port *up, u32 reg)
> @@ -761,6 +762,7 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>  	struct omap8250_priv	*priv = p->port.private_data;
>  	struct uart_8250_dma	*dma = p->dma;
>  	unsigned long		flags;
> +	int ret;
>  
>  	spin_lock_irqsave(&priv->rx_dma_lock, flags);
>  
> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>  		return;
>  	}
>  
> -	dmaengine_pause(dma->rxchan);
> +	ret = dmaengine_pause(dma->rxchan);
> +	if (WARN_ON_ONCE(ret))
> +		priv->rx_dma_broken = true;

No offense, Sebastian, but it boggles my mind that anyone could defend this
as solid api design. We're in the middle of an interrupt handler and the
slave dma driver is /just/ telling us now that it doesn't implement this
functionality?!!?

The dmaengine api has _so much_ setup and none of it contemplates telling the
consumer that critical functionality is missing?

Even dma_get_slave_caps() returns _true_ for cmd_pause support; ok, that
interface is pointless.

Rather than losing /critical data/ here, the interrupt handler should just
busy-wait until dmaengine_tx_status() returns DMA_COMPLETE for the rx_cookie.

Regards,
Peter Hurley

>  	spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
>  
> @@ -813,6 +817,9 @@ static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
>  		break;
>  	}
>  
> +	if (priv->rx_dma_broken)
> +		return -EINVAL;
> +
>  	spin_lock_irqsave(&priv->rx_dma_lock, flags);
>  
>  	if (dma->rx_running)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Aug. 8, 2015, 9:03 a.m. UTC | #2
On Fri, Aug 07, 2015 at 08:28:57PM -0400, Peter Hurley wrote:
> Even dma_get_slave_caps() returns _true_ for cmd_pause support; ok, that
> interface is pointless.

How about reporting that as a bug then, because if you look back in the
git history, as you are fully capable of, you will find that the slave
capability stuff went in _after_ omap-dma, and *many* DMA engine drivers
have not been updated.  Here, let me do _your_ work for you:

commit cb8cea513c80db1dfe2dce468d2d0772005bb9a1
Author: Maxime Ripard <maxime.ripard@free-electrons.com>
Date:   Mon Nov 17 14:42:04 2014 +0100

    dmaengine: Create a generic dma_slave_caps callback

commit 2dcdf570936168d488acf90be9b04a3d32dafce7
Author: Peter Ujfalusi <peter.ujfalusi@ti.com>
Date:   Fri Sep 14 15:05:45 2012 +0300

    dmaengine: omap: Add support for pause/resume in cyclic dma mode

Oh look, omap-dma pre-dates the creation of dma_slave_caps by over two
years!

However, it's *not* as trivial as you think: the dma_slave_caps() call
has no knowledge whether a channel will be used in cyclic mode or not,
or which direction it will be used.  So, it really _can't_ report
whether pause mode is supported or not.  So actually, this is something
that can't be fixed trivially, and was a detail missed when the slave
caps was reviewed (I probably missed the review or missed the point.)

So, how about you stop playing the blame game and trying to shovel your
shit onto DMA engine.
Sebastian Andrzej Siewior Aug. 8, 2015, 9:32 a.m. UTC | #3
On 08/08/2015 02:28 AM, Peter Hurley wrote:
>> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
>> index 0340ee6ba970..07a11e0935e4 100644
>> --- a/drivers/tty/serial/8250/8250_omap.c
>> +++ b/drivers/tty/serial/8250/8250_omap.c
>> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>>  		return;
>>  	}
>>  
>> -	dmaengine_pause(dma->rxchan);
>> +	ret = dmaengine_pause(dma->rxchan);
>> +	if (WARN_ON_ONCE(ret))
>> +		priv->rx_dma_broken = true;
> 
> No offense, Sebastian, but it boggles my mind that anyone could defend this
> as solid api design. We're in the middle of an interrupt handler and the
> slave dma driver is /just/ telling us now that it doesn't implement this
> functionality?!!?

This is the best thing I could come up with. But to be fair: This
happens once _very_ early in the process and is 100% reproducible. The
WARN_ON should trigger user attention.

> The dmaengine api has _so much_ setup and none of it contemplates telling the
> consumer that critical functionality is missing?

Lets say it is a missing feature which was not noticed / required until
now. I have the missing functionality in patch #3. I don't see the
point in adding a lookup API for this feature which has to be
backported stable just to figure out that RX-DMA might not work.
Again: the WARN_ON triggers on the first short RX read _or_ on shutdown
of the port which is not after hours using the port.

> Even dma_get_slave_caps() returns _true_ for cmd_pause support; ok, that
> interface is pointless.
> 
> Rather than losing /critical data/ here, the interrupt handler should just
> busy-wait until dmaengine_tx_status() returns DMA_COMPLETE for the rx_cookie.

This might not happen at all. At 115200 I *never* encouraged this. Once
the FIFO is filled with less than RX-trigger size than the UART sends
the time-out interrupt and the DMA *never* completes. Even if the FIFO
reaches trigger size later. So you have to abort the DMA transfer and
read data manually from the FIFO. That is why the omap enqueues the
RX-transfer upfront (while the FIFO is still empty).

It happens *sometimes* that the UART sends a timeout interrupt and the
DMA transfer just is started and it has been only observed at 3mbaud
(but there were no tests 115200 > x > 3mbaud that I know of).

Therefor I think this is a fair fix without hiding anything.

> Regards,
> Peter Hurley

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Aug. 8, 2015, 9:57 a.m. UTC | #4
On Sat, Aug 08, 2015 at 11:32:05AM +0200, Sebastian Andrzej Siewior wrote:
> This might not happen at all. At 115200 I *never* encouraged this. Once
> the FIFO is filled with less than RX-trigger size than the UART sends
> the time-out interrupt and the DMA *never* completes.

Careful with statements like that, they have a habbit of backfiring :)

It just needs the right timing - you need a character sequence which
has a pause long enough to trigger the timeout, but short enough to
get characters in between the UART deciding to report a timeout, and
the DMA being terminated.

You'll then be in the situation where the UART is receiving characters
and using the DMA, but you've been interrupted to handle the RX timeout
event.  Whether the UART resets the IIR in that situation to indicate
something other than a RX timeout, I'm not sure - if it doesn't, then
there's a race there (which from your behaviour at faster rates seems
likely.)

The more loaded the system, the longer an IRQ may take to be serviced
(especially if there are interrupt handlers which aren't fast) and the
bigger the window is for that to happen.  I've seen some long service
times with USB with HID in older kernels (milliseconds), but that seems
to have been fixed now - longest IRQ handling I'm now seeing is around
500us.  Given 115200 baud, the character time is 87us, that's probably
more than enough if you get the timing just right.
Greg KH Aug. 8, 2015, 3:40 p.m. UTC | #5
On Sat, Aug 08, 2015 at 11:32:05AM +0200, Sebastian Andrzej Siewior wrote:
> On 08/08/2015 02:28 AM, Peter Hurley wrote:
> >> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> >> index 0340ee6ba970..07a11e0935e4 100644
> >> --- a/drivers/tty/serial/8250/8250_omap.c
> >> +++ b/drivers/tty/serial/8250/8250_omap.c
> >> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
> >>  		return;
> >>  	}
> >>  
> >> -	dmaengine_pause(dma->rxchan);
> >> +	ret = dmaengine_pause(dma->rxchan);
> >> +	if (WARN_ON_ONCE(ret))
> >> +		priv->rx_dma_broken = true;
> > 
> > No offense, Sebastian, but it boggles my mind that anyone could defend this
> > as solid api design. We're in the middle of an interrupt handler and the
> > slave dma driver is /just/ telling us now that it doesn't implement this
> > functionality?!!?
> 
> This is the best thing I could come up with. But to be fair: This
> happens once _very_ early in the process and is 100% reproducible. The
> WARN_ON should trigger user attention.

Never expect a _user_ to do anything with a WARN_ON except ignore it.

WARN_ON is for developers when something really bad went wrong that you
want to be notified so that you can fix the code to prevent that from
ever happening again.

So this isn't ok, sorry, a user would not know what to do with this at
all except email me asking why the driver is broken because a kernel
"oops" just happened, when in reality, their hardware is broken in a way
that you already know about when you wrote the patch.

You know better than this.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Ujfalusi Aug. 10, 2015, 11:54 a.m. UTC | #6
On 08/07/2015 11:00 PM, Sebastian Andrzej Siewior wrote:
> The 8250-omap driver requires the DMA-engine driver to support the pause
> command in order to properly turn off programmed RX transfer before the
> driver stars manually reading from the FIFO.
> The lacking support of the requirement has been discovered recently. In
> order to stay safe here we disable support for RX-DMA as soon as we
> notice that it does not work. This should happen very early.
> If the user does not want to see this backtrace he can either disable
> DMA support (completely or RX-only) or backport the required patches for
> edma / omap-dma once they hit mainline.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/tty/serial/8250/8250_omap.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> index 0340ee6ba970..07a11e0935e4 100644
> --- a/drivers/tty/serial/8250/8250_omap.c
> +++ b/drivers/tty/serial/8250/8250_omap.c
> @@ -112,6 +112,7 @@ struct omap8250_priv {
>  	struct work_struct qos_work;
>  	struct uart_8250_dma omap8250_dma;
>  	spinlock_t rx_dma_lock;
> +	bool rx_dma_broken;
>  };
>  
>  static u32 uart_read(struct uart_8250_port *up, u32 reg)
> @@ -761,6 +762,7 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>  	struct omap8250_priv	*priv = p->port.private_data;
>  	struct uart_8250_dma	*dma = p->dma;
>  	unsigned long		flags;
> +	int ret;
>  
>  	spin_lock_irqsave(&priv->rx_dma_lock, flags);
>  
> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>  		return;
>  	}
>  
> -	dmaengine_pause(dma->rxchan);
> +	ret = dmaengine_pause(dma->rxchan);
> +	if (WARN_ON_ONCE(ret))
> +		priv->rx_dma_broken = true;

I don't think this is good thing for the stable _and_ for the mainline at the
same time:
in stable the rx DMA should not be allowed since the stable kernels does not
allow pause/resume with omap-dma, so there the rx DMA should be just disabled
for UART. This change will cause regression since it introduce a WARN_ON_ONCE,
which will be printed if the user tries to use non working feature.

In mainline you will eventually going to have pause/resume support so this
patch will make no sense there.

>  
>  	spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
>  
> @@ -813,6 +817,9 @@ static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
>  		break;
>  	}
>  
> +	if (priv->rx_dma_broken)
> +		return -EINVAL;
> +
>  	spin_lock_irqsave(&priv->rx_dma_lock, flags);
>  
>  	if (dma->rx_running)
>
Sebastian Andrzej Siewior Aug. 10, 2015, 12:19 p.m. UTC | #7
On 08/10/2015 01:54 PM, Peter Ujfalusi wrote:
>> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
>> index 0340ee6ba970..07a11e0935e4 100644
>> --- a/drivers/tty/serial/8250/8250_omap.c
>> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>>  		return;
>>  	}
>>  
>> -	dmaengine_pause(dma->rxchan);
>> +	ret = dmaengine_pause(dma->rxchan);
>> +	if (WARN_ON_ONCE(ret))
>> +		priv->rx_dma_broken = true;
> 
> I don't think this is good thing for the stable _and_ for the mainline at the
> same time:
> in stable the rx DMA should not be allowed since the stable kernels does not
> allow pause/resume with omap-dma, so there the rx DMA should be just disabled
> for UART. This change will cause regression since it introduce a WARN_ON_ONCE,
> which will be printed if the user tries to use non working feature.

Okay. We do have pause support in mainline for edma since v4.2-rc1.
This driver can use edma or sdma depending on the configuration. But it
is not yet released. So you suggest remove RX-DMA support completely
from the 8250-omap, mark it stable, and revert that patch once we have
it fixed in sdma?

> In mainline you will eventually going to have pause/resume support so this
> patch will make no sense there.

The way this works is that it has to be fixed upstream before it can be
backported stable. Also Russell made clear (for a good reason) that the
RX problem has to be fixed upstream before he thinks about acking the
SDMA patch.
So if you prefer to instead remove RX-DMA until it is fixed, I am all
yours.


Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Hurley Aug. 10, 2015, 1 p.m. UTC | #8
On 08/10/2015 07:54 AM, Peter Ujfalusi wrote:
> On 08/07/2015 11:00 PM, Sebastian Andrzej Siewior wrote:

> I don't think this is good thing for the stable _and_ for the mainline at the
> same time:
> in stable the rx DMA should not be allowed since the stable kernels does not
> allow pause/resume with omap-dma, so there the rx DMA should be just disabled
> for UART. This change will cause regression since it introduce a WARN_ON_ONCE,
> which will be printed if the user tries to use non working feature.
> 
> In mainline you will eventually going to have pause/resume support so this
> patch will make no sense there.

No, the whole point of this mess is that omap-dma does not provide pause/resume
support (without data loss). omap-dma will only be suitable for pause/terminate dma.

And adding pause support doesn't address the underlying problem that dmaengine
is not providing a means of determining if suitable support is available for
use by serial drivers, so this same problem is just waiting to happen again.

I think the way forward is,

For -stable, disable dma in the 8250_omap driver.
Then for mainline,
* extend the dma_get_slave_caps() api to differentiate the types of pause support
* query dma_get_slave_caps() when setting up the slave channel in 8250_dma & 8250_omap
  and only enable dma if suitable pause support is available
* add required dmaengine error checking in 8250_dma & 8250_omap _for unexpected errors_
  (so _not_ WARNs)
* do whatever with omap-dma. Not even sure it's worth trying to support dma with that;
  it still won't fully support tx dma which is forcing all kinds of goofy workarounds


Russell seemed to think that the current dma operation was necessary information to
differentiate types of pause support, but I don't think that's required.
As Sebastian's omap-dma driver patch shows, partial pause support has more
to do with how it's being used.

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Aug. 10, 2015, 5:15 p.m. UTC | #9
On Mon, Aug 10, 2015 at 09:00:29AM -0400, Peter Hurley wrote:
> Russell seemed to think that the current dma operation was necessary information to
> differentiate types of pause support, but I don't think that's required.
> As Sebastian's omap-dma driver patch shows, partial pause support has more
> to do with how it's being used.

Do you think you can rewrite the first sentence above in gramatically
correct English please, I'm failing to understand what you're saying.
Vinod Koul Aug. 11, 2015, 9:57 a.m. UTC | #10
On Sat, Aug 08, 2015 at 10:03:43AM +0100, Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2015 at 08:28:57PM -0400, Peter Hurley wrote:
> > Even dma_get_slave_caps() returns _true_ for cmd_pause support; ok, that
> > interface is pointless.
> 
> How about reporting that as a bug then, because if you look back in the
> git history, as you are fully capable of, you will find that the slave
> capability stuff went in _after_ omap-dma, and *many* DMA engine drivers
> have not been updated.  Here, let me do _your_ work for you:
> 
> commit cb8cea513c80db1dfe2dce468d2d0772005bb9a1
> Author: Maxime Ripard <maxime.ripard@free-electrons.com>
> Date:   Mon Nov 17 14:42:04 2014 +0100
> 
>     dmaengine: Create a generic dma_slave_caps callback
> 
> commit 2dcdf570936168d488acf90be9b04a3d32dafce7
> Author: Peter Ujfalusi <peter.ujfalusi@ti.com>
> Date:   Fri Sep 14 15:05:45 2012 +0300
> 
>     dmaengine: omap: Add support for pause/resume in cyclic dma mode
> 
> Oh look, omap-dma pre-dates the creation of dma_slave_caps by over two
> years!
> 
> However, it's *not* as trivial as you think: the dma_slave_caps() call
> has no knowledge whether a channel will be used in cyclic mode or not,
> or which direction it will be used.  So, it really _can't_ report
> whether pause mode is supported or not.  So actually, this is something
> that can't be fixed trivially, and was a detail missed when the slave
> caps was reviewed (I probably missed the review or missed the point.)

The API queries the capabilities for a channel. So a channel has been
allocated. BUT it would not imply the direction as that is descriptor based,
so should we query the capabilities for a descriptor and use those in
clients ?

Also the current dma_slave_caps() has been moved to framework and reports
based on driver callbacks.
Now we have a hardware which supports pause for one direction only so we
should model it bit differently

Thoughts... ??
diff mbox

Patch

diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 0340ee6ba970..07a11e0935e4 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -112,6 +112,7 @@  struct omap8250_priv {
 	struct work_struct qos_work;
 	struct uart_8250_dma omap8250_dma;
 	spinlock_t rx_dma_lock;
+	bool rx_dma_broken;
 };
 
 static u32 uart_read(struct uart_8250_port *up, u32 reg)
@@ -761,6 +762,7 @@  static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
 	struct omap8250_priv	*priv = p->port.private_data;
 	struct uart_8250_dma	*dma = p->dma;
 	unsigned long		flags;
+	int ret;
 
 	spin_lock_irqsave(&priv->rx_dma_lock, flags);
 
@@ -769,7 +771,9 @@  static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
 		return;
 	}
 
-	dmaengine_pause(dma->rxchan);
+	ret = dmaengine_pause(dma->rxchan);
+	if (WARN_ON_ONCE(ret))
+		priv->rx_dma_broken = true;
 
 	spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
 
@@ -813,6 +817,9 @@  static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
 		break;
 	}
 
+	if (priv->rx_dma_broken)
+		return -EINVAL;
+
 	spin_lock_irqsave(&priv->rx_dma_lock, flags);
 
 	if (dma->rx_running)