diff mbox series

serial: qcom-geni: Avoid hard lockup if bytes are dropped

Message ID 20240625092440.1.Icf914852be911b95aefa9d798b6f1cd1a180f985@changeid (mailing list archive)
State Not Applicable
Headers show
Series serial: qcom-geni: Avoid hard lockup if bytes are dropped | expand

Commit Message

Doug Anderson June 25, 2024, 4:24 p.m. UTC
If you start sending a large chunk of text over the UART (like `cat
/var/log/messages`) and then hit Ctrl-C to stop it then, as of commit
1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo"), you'll
get a hard lockup. Specifically, the driver ends up looping in
qcom_geni_serial_send_chunk_fifo(). uart_fifo_out() will return 0
bytes were transferred and the loop will never make progress.

Avoid the hard lockup by making sure we never kick off a transfer that
is larger than we can queue up immediately.

The issue stems from the fact that the geni serial driver tried to be
more efficient by kicking off large transfers. It tried to do this
because the design of geni means that whenever we get to the end of a
transfer there is a period of time where the line goes idle while we
wait for an interrupt to start a new transfer.

The geni serial driver kicked off large transfers by peeking into the
Linux software FIFO and kicking off a transfer based on the number of
bytes there. While that worked (mostly), there was an unhandled corner
case when the Linux software FIFO shrank, as happens when you kill a
process that had queued up lots of data to send.

Prior to the recent kfifo change, the geni driver would keep sending
data that had been "removed" from the Linux software FIFO. While
definitely wrong, this didn't feel too terrible. In the above instance
of hitting Ctrl-C while catting a large file you'd see the file keep
spewing out to the console for a few hundred milliseconds after the
process died. As mentioned above, after the kfifo change we get a hard
lockup.

Digging into the geni serial driver shows a whole pile of issues and
those should be fixed. One patch series attempting to fix the issue
has had positive testing/reviews [1] but it's a fairly large change.
While debating / researching the right long term solution, this small
patch at least prevents the hard lockup.

NOTE: this change does have performance impacts. On my sc7180-trogdor
device I measured something like a 2% slowdown. Others has seen
something more like a 20-25% slowdown [2]. However, correctness trumps
performance so landing this makes sense while better solutions are
devised.

[1] https://lore.kernel.org/r/20240610222515.3023730-1-dianders@chromium.org
[2] https://lore.kernel.org/r/ZnraAlR9QeYhd628@hovoldconsulting.com

Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
As discussed with Johan [3], this probably makes sense to land as a
stopgap while we come to agreement on how to solve the larger issues.

NOTE: I'll be away from my work computer for the next 1.5 weeks.
Hopefully this can land in the meantime. If it needs spinning /
reworking I wouldn't object to someone else taking it on.

I've removed all "Tested-by" tags here since the code is pretty
different and it only solves a subset of the issues of the larger
series.

[3] https://lore.kernel.org/r/ZnraAlR9QeYhd628@hovoldconsulting.com

 drivers/tty/serial/qcom_geni_serial.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..fc202233a3ee 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -904,8 +904,8 @@  static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
 		goto out_write_wakeup;
 
 	if (!port->tx_remaining) {
-		qcom_geni_serial_setup_tx(uport, pending);
-		port->tx_remaining = pending;
+		qcom_geni_serial_setup_tx(uport, chunk);
+		port->tx_remaining = chunk;
 
 		irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
 		if (!(irq_en & M_TX_FIFO_WATERMARK_EN))