From patchwork Tue Mar 17 20:37:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Doug Anderson X-Patchwork-Id: 11443937 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 759CB1392 for ; Tue, 17 Mar 2020 20:38:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4BC4620714 for ; Tue, 17 Mar 2020 20:38:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="E7kG0Bh4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726549AbgCQUiC (ORCPT ); Tue, 17 Mar 2020 16:38:02 -0400 Received: from mail-pl1-f193.google.com ([209.85.214.193]:40827 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726730AbgCQUiC (ORCPT ); Tue, 17 Mar 2020 16:38:02 -0400 Received: by mail-pl1-f193.google.com with SMTP id h11so10126298plk.7 for ; Tue, 17 Mar 2020 13:38:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=jSVWcinVBkF0TMKRp+eiaEghUtTt1NtWoljB+scbYVU=; b=E7kG0Bh4ffCX+4pdDmDPEmUhRmqjjHdpF00agUUQidB2g5ozNWGsEsH0SIKM17G1o0 kKvHkzZFHoDdyK2uhSwOtfxD85hYXYbqjBRhD2Z8MyBL1yLf91SoC8M8RanLJT9OhKxG UJbMDTT3i+mBSqkInAH4BzNZDwEB/dKYUUW8k= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=jSVWcinVBkF0TMKRp+eiaEghUtTt1NtWoljB+scbYVU=; b=ArYdVRN0Hf/DpsJo9tH7AtjUCiVy4RCrC4iWCALQ52sBwp+QhBEviLiaEduuGLxKB/ /aiD/AryGMtUp/8l0IwEzyZn7oY6eO5ZPU81tr/ulxLCuzthA6dV3XA5pXEJLa9e/o/X SFQ7PQAbkKwnaxdqMAvDuNCRLEDNSBZgaRRRb7Kh5yfi73nFzRQyH4Pjw0Y9l8LdPK2F TrRm67D//KKLweNZSfyR8nahSNKBq999Oi3swDtSrkSIPykOGYQw9LaFvC7SxJLUtVWG J0U4o+fjvFl/Kvvny2zgGXGzmqKfbs9tUOPDAJt+kyNg++WORFt34CLBHg9E95pezaaa Xl7A== X-Gm-Message-State: ANhLgQ1QFADCfTDC4qvRR7+51kJkeKGM0kGR6a0teMMqk+xzcm+Sfpdw wCxq4oujuTZVz1YrKhzCQMwVVg== X-Google-Smtp-Source: ADFU+vuIDjiQMApXAYFSk7YrsY5SDvRSiRmwdvphWyIPFu20X5lDnDUyBU77VGieDUZBYjnqzJ4NKQ== X-Received: by 2002:a17:902:7d97:: with SMTP id a23mr436529plm.31.1584477479756; Tue, 17 Mar 2020 13:37:59 -0700 (PDT) Received: from tictac2.mtv.corp.google.com ([2620:15c:202:1:24fa:e766:52c9:e3b2]) by smtp.gmail.com with ESMTPSA id q123sm1039286pfb.54.2020.03.17.13.37.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Mar 2020 13:37:59 -0700 (PDT) From: Douglas Anderson To: Mark Brown Cc: Alok Chauhan , skakit@codeaurora.org, Douglas Anderson , Andy Gross , Bjorn Andersson , Girish Mahadevan , Stephen Boyd , linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-spi@vger.kernel.org Subject: [PATCH v2] spi: spi-geni-qcom: Speculative fix of "nobody cared" about interrupt Date: Tue, 17 Mar 2020 13:37:06 -0700 Message-Id: <20200317133653.v2.1.I752ebdcfd5e8bf0de06d66e767b8974932b3620e@changeid> X-Mailer: git-send-email 2.25.1.481.gfbce0eb801-goog MIME-Version: 1.0 Sender: linux-arm-msm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-arm-msm@vger.kernel.org Every once in a while (like once in a few months on a device) people have seen warnings on devices using spi-geni-qcom like: irq ...: nobody cared (try booting with the "irqpoll" option) ...where the interrupt number listed matches up with "spi_geni" in /proc/interrupts. You can get "nobody cared" if the interrupt handler returns IRQ_NONE. Once you get into this state the driver will just stop working. Auditing the code makes me believe that it's probably not so good checking "cur_mcmd" in the interrupt handler not under spinlock. Let's move it to be under spinlock. Looking more closely at the code, it looks as if there are some other places that could be under spinlock, so let's add. It looks as if the original code was assuming that by the time that the interrupt handler got called that the write to "cur_mcmd" (to make it not CMD_NONE anymore) would make it to the processor handling the interrupt. Perhaps with weakly ordered memory this sometimes (though very rarely) happened. Let's also add a warning (with the IRQ status) in the case that we ever end up getting an interrupt when we think we shouldn't. This would give us a better chance to debug if this patch doesn't help the issue. We'll also try our best to clear the interrupt to hopefully get us out of this state. Patch is marked "speculative" because I have no way to reproduce this problem, so I'm just hoping this fixes it. Weakly ordered memory makes my brain hurt. Fixes: 561de45f72bd ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP") Signed-off-by: Douglas Anderson Reported-by: kbuild test robot Reported-by: Dan Carpenter --- Changes in v2: - Detect true spurious interrupt. - Still return IRQ_NONE for state machine mismatch, but print warn. drivers/spi/spi-geni-qcom.c | 35 ++++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c index c3972424af71..92e51ccb427f 100644 --- a/drivers/spi/spi-geni-qcom.c +++ b/drivers/spi/spi-geni-qcom.c @@ -151,16 +151,19 @@ static void spi_geni_set_cs(struct spi_device *slv, bool set_flag) struct geni_se *se = &mas->se; unsigned long time_left; - reinit_completion(&mas->xfer_done); pm_runtime_get_sync(mas->dev); if (!(slv->mode & SPI_CS_HIGH)) set_flag = !set_flag; + spin_lock_irq(&mas->lock); + reinit_completion(&mas->xfer_done); + mas->cur_mcmd = CMD_CS; if (set_flag) geni_se_setup_m_cmd(se, SPI_CS_ASSERT, 0); else geni_se_setup_m_cmd(se, SPI_CS_DEASSERT, 0); + spin_unlock_irq(&mas->lock); time_left = wait_for_completion_timeout(&mas->xfer_done, HZ); if (!time_left) @@ -307,6 +310,8 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, u32 spi_tx_cfg, len; struct geni_se *se = &mas->se; + spin_lock_irq(&mas->lock); + spi_tx_cfg = readl(se->base + SE_SPI_TRANS_CFG); if (xfer->bits_per_word != mas->cur_bits_per_word) { spi_setup_word_len(mas, mode, xfer->bits_per_word); @@ -376,6 +381,8 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, */ if (m_cmd & SPI_TX_ONLY) writel(mas->tx_wm, se->base + SE_GENI_TX_WATERMARK_REG); + + spin_unlock_irq(&mas->lock); } static int spi_geni_transfer_one(struct spi_master *spi, @@ -478,13 +485,29 @@ static irqreturn_t geni_spi_isr(int irq, void *data) struct geni_se *se = &mas->se; u32 m_irq; unsigned long flags; - - if (mas->cur_mcmd == CMD_NONE) - return IRQ_NONE; + irqreturn_t ret = IRQ_HANDLED; spin_lock_irqsave(&mas->lock, flags); m_irq = readl(se->base + SE_GENI_M_IRQ_STATUS); + /* Check for spurious interrupt */ + if (!m_irq) { + ret = IRQ_NONE; + goto exit; + } + + /* + * If we got a real interrupt but software state machine thinks + * we were idle then give a warning. We'll do our best to clear + * the interrupt but we'll still return IRQ_NONE. If this keeps + * happening the system will eventually disable us. + */ + if (mas->cur_mcmd == CMD_NONE) { + pr_warn("Unexpected SPI interrupt: %#010x\n", m_irq); + ret = IRQ_NONE; + goto exit; + } + if ((m_irq & M_RX_FIFO_WATERMARK_EN) || (m_irq & M_RX_FIFO_LAST_EN)) geni_spi_handle_rx(mas); @@ -523,9 +546,11 @@ static irqreturn_t geni_spi_isr(int irq, void *data) complete(&mas->xfer_done); } +exit: writel(m_irq, se->base + SE_GENI_M_IRQ_CLEAR); spin_unlock_irqrestore(&mas->lock, flags); - return IRQ_HANDLED; + + return ret; } static int spi_geni_probe(struct platform_device *pdev)