Message ID | 1425472054-7543-2-git-send-email-Dave.Martin@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Mar 04, 2015 at 12:27:33PM +0000, Dave Martin wrote: > The current PL011 driver transmits a dummy character when the UART > is opened, to assert the TX IRQ for the first time > (see pl011_startup()). The UART is put in loopback mode temporarily, > so the receiver presumably shouldn't see anything. We do this so that we know for certain that the PL011 has room to accept at least one character. > However... > > At least some platforms containing a PL011 send characters down the > wire even when loopback mode is enabled. This means that a > spurious NUL character may be seen at the receiver when the PL011 is > opened through the TTY layer. ... which is an annoyance. > The current code also temporarily sets the baud rate to maximum and > the character width to the minimum, to that the dummy TX completes > as quickly as possible. If this is seen by the receiver it will > result in a framing error and can knock the receiver out of sync -- > turning subsequent output into garbage until synchronisation > is reestablished. (Particularly problematic during boot with systemd.) Yea, I have my own issues with systemd, but let's stay away from that potential argument. :) > To avoid spurious transmissions, this patch removes assumptions about > whether the TX IRQ will fire until at least one TX IRQ has been seen. > > Instead, the UART will unmask the TX IRQ and then slow-start via > polling and timer-based soft IRQs initially. If the TTY layer writes > enough data to fill the FIFO to the interrupt threshold in one go, > the TX IRQ should assert, at which point the driver changes to > fully interrupt-driven TX. I'm concerned about this. What happens if the PL011 is also being used as a console, and the UART TX FIFO is fully stuffed? Reading the updated code, it seems that we can call pl011_tx_chars() irrespective of whether the TX FIFO is full or not. pl011_tx_chars() makes the assumption that the TX FIFO can always accept the next character, and it results in (eg, in the x_char handling) the next character being written, even if the FIFO is full. If hardware CTS flow control is enabled, the problem gets worse - in that mode, the TX FIFO can remain fully occupied for an indeterminant amount of time. This introduces a whole new unreliability to the driver which wasn't there to start with. So, I'm afraid this gets a NAK. You need to check the TX FIFO status before trying to stuff it with characters.
On Thu, Mar 12, 2015 at 11:03:45AM +0000, Russell King - ARM Linux wrote: > On Wed, Mar 04, 2015 at 12:27:33PM +0000, Dave Martin wrote: > > The current PL011 driver transmits a dummy character when the UART > > is opened, to assert the TX IRQ for the first time > > (see pl011_startup()). The UART is put in loopback mode temporarily, > > so the receiver presumably shouldn't see anything. > > We do this so that we know for certain that the PL011 has room to accept > at least one character. Thanks for taking a look at this. [Greg, please hold these patches until I've reworked them] > > > However... > > > > At least some platforms containing a PL011 send characters down the > > wire even when loopback mode is enabled. This means that a > > spurious NUL character may be seen at the receiver when the PL011 is > > opened through the TTY layer. > > ... which is an annoyance. > > > The current code also temporarily sets the baud rate to maximum and > > the character width to the minimum, to that the dummy TX completes > > as quickly as possible. If this is seen by the receiver it will > > result in a framing error and can knock the receiver out of sync -- > > turning subsequent output into garbage until synchronisation > > is reestablished. (Particularly problematic during boot with systemd.) > > Yea, I have my own issues with systemd, but let's stay away from that > potential argument. :) (dunno what you could be suggesting...) > > > To avoid spurious transmissions, this patch removes assumptions about > > whether the TX IRQ will fire until at least one TX IRQ has been seen. > > > > Instead, the UART will unmask the TX IRQ and then slow-start via > > polling and timer-based soft IRQs initially. If the TTY layer writes > > enough data to fill the FIFO to the interrupt threshold in one go, > > the TX IRQ should assert, at which point the driver changes to > > fully interrupt-driven TX. > > I'm concerned about this. What happens if the PL011 is also being used > as a console, and the UART TX FIFO is fully stuffed? I don't think my patches change this situation. For normal printk, pl011_console_write() will disable IRQs at the CPU and take port->lock: so either pl011_int() finishes all its work safely before the console output is done, or the console output is done first. In the latter case, the condition that triggered the IRQ (i.e., TX FIFO has space) may no longer be true: the lock protects pl011_int() from the race that would otherwise exist between the check on TXIS and the actual writes to the FIFO. Only in special cases (sysrq and oopses) might the lock not be held. But this problem exists independently of my changes AFAICT. Correct output seems to be best-effort in these cases, which is probably the right thing to do. > > Reading the updated code, it seems that we can call pl011_tx_chars() > irrespective of whether the TX FIFO is full or not. pl011_tx_chars() > makes the assumption that the TX FIFO can always accept the next > character, and it results in (eg, in the x_char handling) the next > character being written, even if the FIFO is full. There are multiple paths into pl011_tx_chars() now, and the original invariant (TXIS is asserted and nothing else is filling the FIFO, therefore the FIFO is at least half-empty) is replaced with a more complex one. Now, the FIFO is definitely half-empty if tx_irq_seen >= 2. Otherwise, the FIFO fill level is unknown, and TXFF is polled for each write. I've been thinking about this, and my code is really more complex than it needs to be here. All that's really needed is an explicit argument to say whether pl011_tx_chars() was called due to a FIFO-definitely-has-space condition (IRQ or DMA underflow), or simply because there was some data to write (the pl011_start_tx() and softirq cases) -- this just boils down to identifying the call site. > If hardware CTS flow control is enabled, the problem gets worse - in > that mode, the TX FIFO can remain fully occupied for an indeterminant > amount of time. Not really? Regardless of why the FIFO is full, pl011_int() won't call pl011_tx_chars() because a full FIFO means TXIS is deasserted. In the start_tx/softirq case TXFF will be checked, as I try to explain above. > This introduces a whole new unreliability to the driver which wasn't > there to start with. I don't _think_ this adds unreliability (except for a dumbass bug I noticed where I call spin_lock() instead of spin_lock_irq() in pl011_tx_softirq()). But it does add complexity and will likely make the code harder to maintiain. Unless you have any other ideas I will try to simplify the code along the above lines. > You need to check the TX FIFO status before trying to stuff it with > characters. Indeed. Cheers ---Dave
On Thu, Mar 12, 2015 at 12:55:19PM +0000, Dave Martin wrote: > I don't think my patches change this situation. For normal printk, > pl011_console_write() will disable IRQs at the CPU and take port->lock: > so either pl011_int() finishes all its work safely before the console > output is done, or the console output is done first. > > In the latter case, the condition that triggered the IRQ (i.e., TX > FIFO has space) may no longer be true: the lock protects pl011_int() > from the race that would otherwise exist between the check on TXIS > and the actual writes to the FIFO. > > Only in special cases (sysrq and oopses) might the lock not be > held. But this problem exists independently of my changes AFAICT. > Correct output seems to be best-effort in these cases, which is > probably the right thing to do. Actually, yes, you're right, I'm wrong. The console won't drop the spinlock until the last character has left the TX FIFO and has been sent on the wire, so at the point I was originally thinking of, we guarantee that the FIFO is empty (if it weren't like that, it'd almost certainly be full.) > > If hardware CTS flow control is enabled, the problem gets worse - in > > that mode, the TX FIFO can remain fully occupied for an indeterminant > > amount of time. > > Not really? Regardless of why the FIFO is full, pl011_int() won't > call pl011_tx_chars() because a full FIFO means TXIS is deasserted. > > In the start_tx/softirq case TXFF will be checked, as I try to > explain above. Yes, in that case, pl011_int() won't be called for transmit, but what about the softirq case. Looking at the patch again, I don't see pl011_tx_chars() ever returning true - so the while (pl011_tx_chars()); in the softirq looks strange (if it always returns false, we won't ever iterate.) > Unless you have any other ideas I will try to simplify the code > along the above lines. I forget whether we have problems with UARTs which don't generate any TX irqs - I hope we don't. What about doing this instead, which should result in fewer changes to the driver (especially to the fast paths): - On startup, set a flag to indicate that the TX state is unknown (so we can detect the first start_tx().) - When we get our first start_tx(), check the flag. - read the FIFO flags and interrupt status. If the FIFO flags indicate that the TX FIFO is empty, but there is no transmit interrupt, read and load the first character into the FIFO. Instead of just loading the first character, as we've checked the FIFO status, we could alternatively load the first half of the FIFO via a call to pl011_tx_chars(). - clear the flag. This should mean that we keep the driver structure we have today. The only concern I have about that is (from what I remember) we don't expect a call to ->start_tx to cause a tx stop event, though I don't see anything in serial_core which would get upset if we did.
On Thu, Mar 12, 2015 at 02:39:30PM +0000, Russell King - ARM Linux wrote: > On Thu, Mar 12, 2015 at 12:55:19PM +0000, Dave Martin wrote: > > I don't think my patches change this situation. For normal printk, > > pl011_console_write() will disable IRQs at the CPU and take port->lock: > > so either pl011_int() finishes all its work safely before the console > > output is done, or the console output is done first. > > > > In the latter case, the condition that triggered the IRQ (i.e., TX > > FIFO has space) may no longer be true: the lock protects pl011_int() > > from the race that would otherwise exist between the check on TXIS > > and the actual writes to the FIFO. > > > > Only in special cases (sysrq and oopses) might the lock not be > > held. But this problem exists independently of my changes AFAICT. > > Correct output seems to be best-effort in these cases, which is > > probably the right thing to do. > > Actually, yes, you're right, I'm wrong. The console won't drop the > spinlock until the last character has left the TX FIFO and has been > sent on the wire, so at the point I was originally thinking of, we > guarantee that the FIFO is empty (if it weren't like that, it'd > almost certainly be full.) Yes, that's also true. > > > If hardware CTS flow control is enabled, the problem gets worse - in > > > that mode, the TX FIFO can remain fully occupied for an indeterminant > > > amount of time. > > > > Not really? Regardless of why the FIFO is full, pl011_int() won't > > call pl011_tx_chars() because a full FIFO means TXIS is deasserted. > > > > In the start_tx/softirq case TXFF will be checked, as I try to > > explain above. > > Yes, in that case, pl011_int() won't be called for transmit, but what > about the softirq case. > > Looking at the patch again, I don't see pl011_tx_chars() ever returning > true - so the while (pl011_tx_chars()); in the softirq looks strange > (if it always returns false, we won't ever iterate.) Oops, you're right. This is a legacy from an older spin of the patches, where tx_softirq had to spin around tx_chars in the case of an xchar, just like pl011_int must at present. That seemed contrived, so I got rid of this requirement, but that makes the return value and the loop redundant. > > > Unless you have any other ideas I will try to simplify the code > > along the above lines. > > I forget whether we have problems with UARTs which don't generate any TX > irqs - I hope we don't. Currently the code relies on a working TX IRQ anyway, so I'm hoping it's safe to assume we don't have to cope with that. > What about doing this instead, which should result in fewer changes to > the driver (especially to the fast paths): > > - On startup, set a flag to indicate that the TX state is unknown (so we > can detect the first start_tx().) > - When we get our first start_tx(), check the flag. > - read the FIFO flags and interrupt status. If the FIFO flags indicate > that the TX FIFO is empty, but there is no transmit interrupt, read > and load the first character into the FIFO. Instead of just loading > the first character, as we've checked the FIFO status, we could > alternatively load the first half of the FIFO via a call to > pl011_tx_chars(). > - clear the flag. > > This should mean that we keep the driver structure we have today. I had originally hoped to do things that way, but there is a problem: unless pl011_tx_chars() sees TXFF at some point, there is no way to know how full the FIFO got. For example, if there are fifosize/2 chars queued from serial_core initially, it's likely that the FIFO won't get half-full. This permits a situation where we have to leave the flag set (to be on the safe side), but a TX IRQ does actually happen. So, the flag can't be cleared unconditionally -- I took an "I'll believe it when I see it" approach to the TX IRQ. tx_irq_seen should really be a bool though -- the counter-style thing works around a race which could only happen because I mastakenly wasn't disabling interrupts in pl011_tx_softirq(). There could still be merit in transmitting the initial chars directly instead of waiting for a TX IRQ, even if we already know the TX IRQ works, especially if start_tx() is called because of a newly pending x_char. But this is really an optimisation, so I may present it as a separate patch. > The only concern I have about that is (from what I remember) we don't > expect a call to ->start_tx to cause a tx stop event, though I don't > see anything in serial_core which would get upset if we did. That's a fair point. stop_tx will normally happen asynchronously, but the port lock has to be released first. 8250 does stop_tx from its softirq in the analogous situation, so we could do the same here. I'll rework the series around my current ideas first, and then we can see if it needs further restructuring... Cheers ---Dave
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c index 8d94c19..69910f7 100644 --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -58,6 +58,7 @@ #include <linux/pinctrl/consumer.h> #include <linux/sizes.h> #include <linux/io.h> +#include <linux/workqueue.h> #define UART_NR 14 @@ -156,7 +157,9 @@ struct uart_amba_port { unsigned int lcrh_tx; /* vendor-specific */ unsigned int lcrh_rx; /* vendor-specific */ unsigned int old_cr; /* state during shutdown */ + struct delayed_work tx_softirq_work; bool autorts; + unsigned int tx_irq_seen; /* 0=none, 1=1, 2=2 or more */ char type[12]; #ifdef CONFIG_DMA_ENGINE /* DMA stuff */ @@ -440,8 +443,9 @@ static void pl011_dma_remove(struct uart_amba_port *uap) dma_release_channel(uap->dmarx.chan); } -/* Forward declare this for the refill routine */ +/* Forward declare these for the refill routine */ static int pl011_dma_tx_refill(struct uart_amba_port *uap); +static void pl011_start_tx_pio(struct uart_amba_port *uap); /* * The current DMA TX buffer has been sent. @@ -479,14 +483,13 @@ static void pl011_dma_tx_callback(void *data) return; } - if (pl011_dma_tx_refill(uap) <= 0) { + if (pl011_dma_tx_refill(uap) <= 0) /* * We didn't queue a DMA buffer for some reason, but we * have data pending to be sent. Re-enable the TX IRQ. */ - uap->im |= UART011_TXIM; - writew(uap->im, uap->port.membase + UART011_IMSC); - } + pl011_start_tx_pio(uap); + spin_unlock_irqrestore(&uap->port.lock, flags); } @@ -664,12 +667,10 @@ static inline bool pl011_dma_tx_start(struct uart_amba_port *uap) if (!uap->dmatx.queued) { if (pl011_dma_tx_refill(uap) > 0) { uap->im &= ~UART011_TXIM; - ret = true; - } else { - uap->im |= UART011_TXIM; + writew(uap->im, uap->port.membase + + UART011_IMSC); + } else ret = false; - } - writew(uap->im, uap->port.membase + UART011_IMSC); } else if (!(uap->dmacr & UART011_TXDMAE)) { uap->dmacr |= UART011_TXDMAE; writew(uap->dmacr, @@ -1208,15 +1209,24 @@ static void pl011_stop_tx(struct uart_port *port) pl011_dma_tx_stop(uap); } +static bool pl011_tx_chars(struct uart_amba_port *uap); + +/* Start TX with programmed I/O only (no DMA) */ +static void pl011_start_tx_pio(struct uart_amba_port *uap) +{ + uap->im |= UART011_TXIM; + writew(uap->im, uap->port.membase + UART011_IMSC); + if (!uap->tx_irq_seen) + pl011_tx_chars(uap); +} + static void pl011_start_tx(struct uart_port *port) { struct uart_amba_port *uap = container_of(port, struct uart_amba_port, port); - if (!pl011_dma_tx_start(uap)) { - uap->im |= UART011_TXIM; - writew(uap->im, uap->port.membase + UART011_IMSC); - } + if (!pl011_dma_tx_start(uap)) + pl011_start_tx_pio(uap); } static void pl011_stop_rx(struct uart_port *port) @@ -1274,40 +1284,87 @@ __acquires(&uap->port.lock) spin_lock(&uap->port.lock); } -static void pl011_tx_chars(struct uart_amba_port *uap) +/* + * Transmit a character + * There must be at least one free entry in the TX FIFO to accept the char. + * + * Returns true if the FIFO might have space in it afterwards; + * returns false if the FIFO definitely became full. + */ +static bool pl011_tx_char(struct uart_amba_port *uap, unsigned char c) +{ + writew(c, uap->port.membase + UART01x_DR); + uap->port.icount.tx++; + + if (likely(uap->tx_irq_seen > 1)) + return true; + + return !(readw(uap->port.membase + UART01x_FR) & UART01x_FR_TXFF); +} + +static bool pl011_tx_chars(struct uart_amba_port *uap) { struct circ_buf *xmit = &uap->port.state->xmit; int count; + if (unlikely(uap->tx_irq_seen < 2)) + /* + * Initial FIFO fill level unknown: we must check TXFF + * after each write, so just try to fill up the FIFO. + */ + count = uap->fifosize; + else /* tx_irq_seen >= 2 */ + /* + * FIFO initially at least half-empty, so we can simply + * write half the FIFO without polling TXFF. + + * Note: the *first* TX IRQ can still race with + * pl011_start_tx_pio(), which can result in the FIFO + * being fuller than expected in that case. + */ + count = uap->fifosize >> 1; + + /* + * If the FIFO is full we're guaranteed a TX IRQ at some later point, + * and can't transmit immediately in any case: + */ + if (unlikely(uap->tx_irq_seen < 2 && + readw(uap->port.membase + UART01x_FR) & UART01x_FR_TXFF)) + return false; + if (uap->port.x_char) { - writew(uap->port.x_char, uap->port.membase + UART01x_DR); - uap->port.icount.tx++; + pl011_tx_char(uap, uap->port.x_char); uap->port.x_char = 0; - return; + --count; } if (uart_circ_empty(xmit) || uart_tx_stopped(&uap->port)) { pl011_stop_tx(&uap->port); - return; + goto done; } /* If we are using DMA mode, try to send some characters. */ if (pl011_dma_tx_irq(uap)) - return; + goto done; - count = uap->fifosize >> 1; - do { - writew(xmit->buf[xmit->tail], uap->port.membase + UART01x_DR); + while (count-- > 0 && pl011_tx_char(uap, xmit->buf[xmit->tail])) { xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1); - uap->port.icount.tx++; if (uart_circ_empty(xmit)) break; - } while (--count > 0); + } if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) uart_write_wakeup(&uap->port); - if (uart_circ_empty(xmit)) + if (uart_circ_empty(xmit)) { pl011_stop_tx(&uap->port); + goto done; + } + + if (unlikely(!uap->tx_irq_seen)) + schedule_delayed_work(&uap->tx_softirq_work, uap->port.timeout); + +done: + return false; } static void pl011_modem_status(struct uart_amba_port *uap) @@ -1334,6 +1391,28 @@ static void pl011_modem_status(struct uart_amba_port *uap) wake_up_interruptible(&uap->port.state->port.delta_msr_wait); } +static void pl011_tx_softirq(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct uart_amba_port *uap = + container_of(dwork, struct uart_amba_port, tx_softirq_work); + + spin_lock(&uap->port.lock); + while (pl011_tx_chars(uap)) ; + spin_unlock(&uap->port.lock); +} + +static void pl011_tx_irq_seen(struct uart_amba_port *uap) +{ + if (likely(uap->tx_irq_seen > 1)) + return; + + uap->tx_irq_seen++; + if (uap->tx_irq_seen < 2) + /* first TX IRQ */ + cancel_delayed_work(&uap->tx_softirq_work); +} + static irqreturn_t pl011_int(int irq, void *dev_id) { struct uart_amba_port *uap = dev_id; @@ -1372,8 +1451,10 @@ static irqreturn_t pl011_int(int irq, void *dev_id) if (status & (UART011_DSRMIS|UART011_DCDMIS| UART011_CTSMIS|UART011_RIMIS)) pl011_modem_status(uap); - if (status & UART011_TXIS) + if (status & UART011_TXIS) { + pl011_tx_irq_seen(uap); pl011_tx_chars(uap); + } if (pass_counter-- == 0) break; @@ -1577,7 +1658,7 @@ static int pl011_startup(struct uart_port *port) { struct uart_amba_port *uap = container_of(port, struct uart_amba_port, port); - unsigned int cr, lcr_h, fbrd, ibrd; + unsigned int cr; int retval; retval = pl011_hwinit(port); @@ -1595,29 +1676,10 @@ static int pl011_startup(struct uart_port *port) writew(uap->vendor->ifls, uap->port.membase + UART011_IFLS); - /* - * Provoke TX FIFO interrupt into asserting. Taking care to preserve - * baud rate and data format specified by FBRD, IBRD and LCRH as the - * UART may already be in use as a console. - */ - spin_lock_irq(&uap->port.lock); - - fbrd = readw(uap->port.membase + UART011_FBRD); - ibrd = readw(uap->port.membase + UART011_IBRD); - lcr_h = readw(uap->port.membase + uap->lcrh_rx); - - cr = UART01x_CR_UARTEN | UART011_CR_TXE | UART011_CR_LBE; - writew(cr, uap->port.membase + UART011_CR); - writew(0, uap->port.membase + UART011_FBRD); - writew(1, uap->port.membase + UART011_IBRD); - pl011_write_lcr_h(uap, 0); - writew(0, uap->port.membase + UART01x_DR); - while (readw(uap->port.membase + UART01x_FR) & UART01x_FR_BUSY) - barrier(); + /* Assume that TX IRQ doesn't work until we see one: */ + uap->tx_irq_seen = 0; - writew(fbrd, uap->port.membase + UART011_FBRD); - writew(ibrd, uap->port.membase + UART011_IBRD); - pl011_write_lcr_h(uap, lcr_h); + spin_lock_irq(&uap->port.lock); /* restore RTS and DTR */ cr = uap->old_cr & (UART011_CR_RTS | UART011_CR_DTR); @@ -1672,6 +1734,8 @@ static void pl011_shutdown(struct uart_port *port) container_of(port, struct uart_amba_port, port); unsigned int cr; + cancel_delayed_work_sync(&uap->tx_softirq_work); + /* * disable all interrupts */ @@ -2218,6 +2282,7 @@ static int pl011_probe(struct amba_device *dev, const struct amba_id *id) uap->port.ops = &amba_pl011_pops; uap->port.flags = UPF_BOOT_AUTOCONF; uap->port.line = i; + INIT_DELAYED_WORK(&uap->tx_softirq_work, pl011_tx_softirq); pl011_dma_probe(&dev->dev, uap); /* Ensure interrupts from this UART are masked and cleared */
The current PL011 driver transmits a dummy character when the UART is opened, to assert the TX IRQ for the first time (see pl011_startup()). The UART is put in loopback mode temporarily, so the receiver presumably shouldn't see anything. However... At least some platforms containing a PL011 send characters down the wire even when loopback mode is enabled. This means that a spurious NUL character may be seen at the receiver when the PL011 is opened through the TTY layer. The current code also temporarily sets the baud rate to maximum and the character width to the minimum, to that the dummy TX completes as quickly as possible. If this is seen by the receiver it will result in a framing error and can knock the receiver out of sync -- turning subsequent output into garbage until synchronisation is reestablished. (Particularly problematic during boot with systemd.) To avoid spurious transmissions, this patch removes assumptions about whether the TX IRQ will fire until at least one TX IRQ has been seen. Instead, the UART will unmask the TX IRQ and then slow-start via polling and timer-based soft IRQs initially. If the TTY layer writes enough data to fill the FIFO to the interrupt threshold in one go, the TX IRQ should assert, at which point the driver changes to fully interrupt-driven TX. In this way, the TX IRQ is activated as a side-effect instead of being done deliberately. This should also mean that the driver works on the SBSA Generic UART[1] (a cut-down PL011) without invasive changes. The Generic UART lacks some features needed for the dummy TX approach to work (FIFO disabling and loopback). [1] Server Base System Architecture (ARM-DEN-0029-v2.3) http://infocenter.arm.com/ (click-thru required :/) Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- drivers/tty/serial/amba-pl011.c | 165 +++++++++++++++++++++++++++------------ 1 file changed, 115 insertions(+), 50 deletions(-)