Message ID | 20240625092440.1.Icf914852be911b95aefa9d798b6f1cd1a180f985@changeid (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | serial: qcom-geni: Avoid hard lockup if bytes are dropped | expand |
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c index 2bd25afe0d92..fc202233a3ee 100644 --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -904,8 +904,8 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport, goto out_write_wakeup; if (!port->tx_remaining) { - qcom_geni_serial_setup_tx(uport, pending); - port->tx_remaining = pending; + qcom_geni_serial_setup_tx(uport, chunk); + port->tx_remaining = chunk; irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN); if (!(irq_en & M_TX_FIFO_WATERMARK_EN))
If you start sending a large chunk of text over the UART (like `cat /var/log/messages`) and then hit Ctrl-C to stop it then, as of commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo"), you'll get a hard lockup. Specifically, the driver ends up looping in qcom_geni_serial_send_chunk_fifo(). uart_fifo_out() will return 0 bytes were transferred and the loop will never make progress. Avoid the hard lockup by making sure we never kick off a transfer that is larger than we can queue up immediately. The issue stems from the fact that the geni serial driver tried to be more efficient by kicking off large transfers. It tried to do this because the design of geni means that whenever we get to the end of a transfer there is a period of time where the line goes idle while we wait for an interrupt to start a new transfer. The geni serial driver kicked off large transfers by peeking into the Linux software FIFO and kicking off a transfer based on the number of bytes there. While that worked (mostly), there was an unhandled corner case when the Linux software FIFO shrank, as happens when you kill a process that had queued up lots of data to send. Prior to the recent kfifo change, the geni driver would keep sending data that had been "removed" from the Linux software FIFO. While definitely wrong, this didn't feel too terrible. In the above instance of hitting Ctrl-C while catting a large file you'd see the file keep spewing out to the console for a few hundred milliseconds after the process died. As mentioned above, after the kfifo change we get a hard lockup. Digging into the geni serial driver shows a whole pile of issues and those should be fixed. One patch series attempting to fix the issue has had positive testing/reviews [1] but it's a fairly large change. While debating / researching the right long term solution, this small patch at least prevents the hard lockup. NOTE: this change does have performance impacts. On my sc7180-trogdor device I measured something like a 2% slowdown. Others has seen something more like a 20-25% slowdown [2]. However, correctness trumps performance so landing this makes sense while better solutions are devised. [1] https://lore.kernel.org/r/20240610222515.3023730-1-dianders@chromium.org [2] https://lore.kernel.org/r/ZnraAlR9QeYhd628@hovoldconsulting.com Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- As discussed with Johan [3], this probably makes sense to land as a stopgap while we come to agreement on how to solve the larger issues. NOTE: I'll be away from my work computer for the next 1.5 weeks. Hopefully this can land in the meantime. If it needs spinning / reworking I wouldn't object to someone else taking it on. I've removed all "Tested-by" tags here since the code is pretty different and it only solves a subset of the issues of the larger series. [3] https://lore.kernel.org/r/ZnraAlR9QeYhd628@hovoldconsulting.com drivers/tty/serial/qcom_geni_serial.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)