diff mbox series

[v2] i2c: i2c-qcom-geni: Fix DMA transfer race

Message ID 20200722145948.v2.1.I7efdf6efaa6edadbb690196cd4fbe3392a582c89@changeid (mailing list archive)
State Accepted
Commit 02b9aec59243c6240fc42884acc958602146ddf6
Headers show
Series [v2] i2c: i2c-qcom-geni: Fix DMA transfer race | expand

Commit Message

Doug Anderson July 22, 2020, 10 p.m. UTC
When I have KASAN enabled on my kernel and I start stressing the
touchscreen my system tends to hang.  The touchscreen is one of the
only things that does a lot of big i2c transfers and ends up hitting
the DMA paths in the geni i2c driver.  It appears that KASAN adds
enough delay in my system to tickle a race condition in the DMA setup
code.

When the system hangs, I found that it was running the geni_i2c_irq()
over and over again.  It had these:

m_stat   = 0x04000080
rx_st    = 0x30000011
dm_tx_st = 0x00000000
dm_rx_st = 0x00000000
dma      = 0x00000001

Notably we're in DMA mode but are getting M_RX_IRQ_EN and
M_RX_FIFO_WATERMARK_EN over and over again.

Putting some traces in geni_i2c_rx_one_msg() showed that when we
failed we were getting to the start of geni_i2c_rx_one_msg() but were
never executing geni_se_rx_dma_prep().

I believe that the problem here is that we are starting the geni
command before we run geni_se_rx_dma_prep().  If a transfer makes it
far enough before we do that then we get into the state I have
observed.  Let's change the order, which seems to work fine.

Although problems were seen on the RX path, code inspection suggests
that the TX should be changed too.  Change it as well.

Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
---
Even though this patch is slightly different than v1 I have kept tags.
Hopefully this is OK.

Changes in v2:
- Fix both TX and RX.
- Only move the setting up of the command, not the set of the length.

 drivers/i2c/busses/i2c-qcom-geni.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Stephen Boyd July 23, 2020, 12:50 a.m. UTC | #1
Quoting Douglas Anderson (2020-07-22 15:00:21)
> When I have KASAN enabled on my kernel and I start stressing the
> touchscreen my system tends to hang.  The touchscreen is one of the
> only things that does a lot of big i2c transfers and ends up hitting
> the DMA paths in the geni i2c driver.  It appears that KASAN adds
> enough delay in my system to tickle a race condition in the DMA setup
> code.
> 
> When the system hangs, I found that it was running the geni_i2c_irq()
> over and over again.  It had these:
> 
> m_stat   = 0x04000080
> rx_st    = 0x30000011
> dm_tx_st = 0x00000000
> dm_rx_st = 0x00000000
> dma      = 0x00000001
> 
> Notably we're in DMA mode but are getting M_RX_IRQ_EN and
> M_RX_FIFO_WATERMARK_EN over and over again.
> 
> Putting some traces in geni_i2c_rx_one_msg() showed that when we
> failed we were getting to the start of geni_i2c_rx_one_msg() but were
> never executing geni_se_rx_dma_prep().
> 
> I believe that the problem here is that we are starting the geni
> command before we run geni_se_rx_dma_prep().  If a transfer makes it
> far enough before we do that then we get into the state I have
> observed.  Let's change the order, which seems to work fine.
> 
> Although problems were seen on the RX path, code inspection suggests
> that the TX should be changed too.  Change it as well.
> 
> Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> Reviewed-by: Akash Asthana <akashast@codeaurora.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Mukesh, Savaliya July 23, 2020, 6:16 a.m. UTC | #2
On 7/23/2020 6:20 AM, Stephen Boyd wrote:
> Quoting Douglas Anderson (2020-07-22 15:00:21)
>> When I have KASAN enabled on my kernel and I start stressing the
>> touchscreen my system tends to hang.  The touchscreen is one of the
>> only things that does a lot of big i2c transfers and ends up hitting
>> the DMA paths in the geni i2c driver.  It appears that KASAN adds
>> enough delay in my system to tickle a race condition in the DMA setup
>> code.
>>
>> When the system hangs, I found that it was running the geni_i2c_irq()
>> over and over again.  It had these:
>>
>> m_stat   = 0x04000080
>> rx_st    = 0x30000011
>> dm_tx_st = 0x00000000
>> dm_rx_st = 0x00000000
>> dma      = 0x00000001
>>
>> Notably we're in DMA mode but are getting M_RX_IRQ_EN and
>> M_RX_FIFO_WATERMARK_EN over and over again.
>>
>> Putting some traces in geni_i2c_rx_one_msg() showed that when we
>> failed we were getting to the start of geni_i2c_rx_one_msg() but were
>> never executing geni_se_rx_dma_prep().
>>
>> I believe that the problem here is that we are starting the geni
>> command before we run geni_se_rx_dma_prep().  If a transfer makes it
>> far enough before we do that then we get into the state I have
>> observed.  Let's change the order, which seems to work fine.
>>
>> Although problems were seen on the RX path, code inspection suggests
>> that the TX should be changed too.  Change it as well.
>>
>> Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Reviewed-by: Mukesh Kumar Savaliya <msavaliy@codeaurora.org>
>> ---
> Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Wolfram Sang July 23, 2020, 7:56 p.m. UTC | #3
On Wed, Jul 22, 2020 at 03:00:21PM -0700, Douglas Anderson wrote:
> When I have KASAN enabled on my kernel and I start stressing the
> touchscreen my system tends to hang.  The touchscreen is one of the
> only things that does a lot of big i2c transfers and ends up hitting
> the DMA paths in the geni i2c driver.  It appears that KASAN adds
> enough delay in my system to tickle a race condition in the DMA setup
> code.
> 
> When the system hangs, I found that it was running the geni_i2c_irq()
> over and over again.  It had these:
> 
> m_stat   = 0x04000080
> rx_st    = 0x30000011
> dm_tx_st = 0x00000000
> dm_rx_st = 0x00000000
> dma      = 0x00000001
> 
> Notably we're in DMA mode but are getting M_RX_IRQ_EN and
> M_RX_FIFO_WATERMARK_EN over and over again.
> 
> Putting some traces in geni_i2c_rx_one_msg() showed that when we
> failed we were getting to the start of geni_i2c_rx_one_msg() but were
> never executing geni_se_rx_dma_prep().
> 
> I believe that the problem here is that we are starting the geni
> command before we run geni_se_rx_dma_prep().  If a transfer makes it
> far enough before we do that then we get into the state I have
> observed.  Let's change the order, which seems to work fine.
> 
> Although problems were seen on the RX path, code inspection suggests
> that the TX should be changed too.  Change it as well.
> 
> Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> Reviewed-by: Akash Asthana <akashast@codeaurora.org>

Applied to for-current, thanks!
Wolfram Sang July 26, 2020, 12:47 p.m. UTC | #4
On Thu, Jul 23, 2020 at 09:56:34PM +0200, Wolfram Sang wrote:
> On Wed, Jul 22, 2020 at 03:00:21PM -0700, Douglas Anderson wrote:
> > When I have KASAN enabled on my kernel and I start stressing the
> > touchscreen my system tends to hang.  The touchscreen is one of the
> > only things that does a lot of big i2c transfers and ends up hitting
> > the DMA paths in the geni i2c driver.  It appears that KASAN adds
> > enough delay in my system to tickle a race condition in the DMA setup
> > code.
> > 
> > When the system hangs, I found that it was running the geni_i2c_irq()
> > over and over again.  It had these:
> > 
> > m_stat   = 0x04000080
> > rx_st    = 0x30000011
> > dm_tx_st = 0x00000000
> > dm_rx_st = 0x00000000
> > dma      = 0x00000001
> > 
> > Notably we're in DMA mode but are getting M_RX_IRQ_EN and
> > M_RX_FIFO_WATERMARK_EN over and over again.
> > 
> > Putting some traces in geni_i2c_rx_one_msg() showed that when we
> > failed we were getting to the start of geni_i2c_rx_one_msg() but were
> > never executing geni_se_rx_dma_prep().
> > 
> > I believe that the problem here is that we are starting the geni
> > command before we run geni_se_rx_dma_prep().  If a transfer makes it
> > far enough before we do that then we get into the state I have
> > observed.  Let's change the order, which seems to work fine.
> > 
> > Although problems were seen on the RX path, code inspection suggests
> > that the TX should be changed too.  Change it as well.
> > 
> > Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> > Reviewed-by: Akash Asthana <akashast@codeaurora.org>
> 
> Applied to for-current, thanks!

Glad we got this sorted. I just wondered that Alok wasn't part of the
discussion. Is he still interested in maintaining the driver? Also
because there is an unprocessed patch left for this driver:

http://patchwork.ozlabs.org/project/linux-i2c/patch/20191103212204.13606-1-colin.king@canonical.com/
Akash Asthana July 27, 2020, 8:36 a.m. UTC | #5
On 7/26/2020 6:17 PM, Wolfram Sang wrote:
> On Thu, Jul 23, 2020 at 09:56:34PM +0200, Wolfram Sang wrote:
>> On Wed, Jul 22, 2020 at 03:00:21PM -0700, Douglas Anderson wrote:
>>> When I have KASAN enabled on my kernel and I start stressing the
>>> touchscreen my system tends to hang.  The touchscreen is one of the
>>> only things that does a lot of big i2c transfers and ends up hitting
>>> the DMA paths in the geni i2c driver.  It appears that KASAN adds
>>> enough delay in my system to tickle a race condition in the DMA setup
>>> code.
>>>
>>> When the system hangs, I found that it was running the geni_i2c_irq()
>>> over and over again.  It had these:
>>>
>>> m_stat   = 0x04000080
>>> rx_st    = 0x30000011
>>> dm_tx_st = 0x00000000
>>> dm_rx_st = 0x00000000
>>> dma      = 0x00000001
>>>
>>> Notably we're in DMA mode but are getting M_RX_IRQ_EN and
>>> M_RX_FIFO_WATERMARK_EN over and over again.
>>>
>>> Putting some traces in geni_i2c_rx_one_msg() showed that when we
>>> failed we were getting to the start of geni_i2c_rx_one_msg() but were
>>> never executing geni_se_rx_dma_prep().
>>>
>>> I believe that the problem here is that we are starting the geni
>>> command before we run geni_se_rx_dma_prep().  If a transfer makes it
>>> far enough before we do that then we get into the state I have
>>> observed.  Let's change the order, which seems to work fine.
>>>
>>> Although problems were seen on the RX path, code inspection suggests
>>> that the TX should be changed too.  Change it as well.
>>>
>>> Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>>> Reviewed-by: Akash Asthana <akashast@codeaurora.org>
>> Applied to for-current, thanks!
> Glad we got this sorted. I just wondered that Alok wasn't part of the
> discussion. Is he still interested in maintaining the driver? Also
> because there is an unprocessed patch left for this driver:
>
> http://patchwork.ozlabs.org/project/linux-i2c/patch/20191103212204.13606-1-colin.king@canonical.com/

Alok has moved out of GENI team, he no longer supports GENI I2C drivers.

I have posted a patch to update maintainers list. Patch: 
https://patchwork.kernel.org/patch/11686541/ [MAINTAINERS: Update Geni 
I2C maintainers list]

Also, Girish Mahadevan, Sagar Dharia and Karthikeyan Ramasubramanian  no 
longer supports GENI drivers.

Regards,

Akash

>
diff mbox series

Patch

diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 18d1e4fd4cf3..7f130829bf01 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -367,7 +367,6 @@  static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 		geni_se_select_mode(se, GENI_SE_FIFO);
 
 	writel_relaxed(len, se->base + SE_I2C_RX_TRANS_LEN);
-	geni_se_setup_m_cmd(se, I2C_READ, m_param);
 
 	if (dma_buf && geni_se_rx_dma_prep(se, dma_buf, len, &rx_dma)) {
 		geni_se_select_mode(se, GENI_SE_FIFO);
@@ -375,6 +374,8 @@  static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 		dma_buf = NULL;
 	}
 
+	geni_se_setup_m_cmd(se, I2C_READ, m_param);
+
 	time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
 	if (!time_left)
 		geni_i2c_abort_xfer(gi2c);
@@ -408,7 +409,6 @@  static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 		geni_se_select_mode(se, GENI_SE_FIFO);
 
 	writel_relaxed(len, se->base + SE_I2C_TX_TRANS_LEN);
-	geni_se_setup_m_cmd(se, I2C_WRITE, m_param);
 
 	if (dma_buf && geni_se_tx_dma_prep(se, dma_buf, len, &tx_dma)) {
 		geni_se_select_mode(se, GENI_SE_FIFO);
@@ -416,6 +416,8 @@  static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 		dma_buf = NULL;
 	}
 
+	geni_se_setup_m_cmd(se, I2C_WRITE, m_param);
+
 	if (!dma_buf) /* Get FIFO IRQ */
 		writel_relaxed(1, se->base + SE_GENI_TX_WATERMARK_REG);