From patchwork Wed Jan 22 20:00:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Connor Abbott X-Patchwork-Id: 13947654 Received: from mail-qk1-f176.google.com (mail-qk1-f176.google.com [209.85.222.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B78A918E764 for ; Wed, 22 Jan 2025 20:01:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737576072; cv=none; b=uKMCfGB/N72L0SPdLI/wDhEpGO48pCynDJ1ghHR294faXIjm5VgOSqbwPK0rjYzO6wIV6oOi/BeuyN2XPiJgkEDEPRWjS0zX88IkAJpGmf7D+x17NUBxzqn9MAPAaQY9ZYEmKUUuQ+YzmDkya4yrpjZpvJvZ59GWBd5drMFbesI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737576072; c=relaxed/simple; bh=LZOQxUUGdQqLBW3ZuPFFeTY2r3+nASK7HMPqbv/N5EE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=H8CXKNtF65yoKmfJbaLTUTnh3Hszh5UkKlaagK/EptBPsi4u/n+d2Cx2bAoWIqbbwu4AR2961sadKHEt6iwH0C7Yq/6vxtzY8q45naFc+2X4wEWsUq8D7cX4eMirOA2K87PMNPRnm+/+FJDzURIDSOmu47izzjOI0HLYtejsQNU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=O7hHyodG; arc=none smtp.client-ip=209.85.222.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="O7hHyodG" Received: by mail-qk1-f176.google.com with SMTP id af79cd13be357-7b6ffda45f3so1395785a.2 for ; Wed, 22 Jan 2025 12:01:10 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737576069; x=1738180869; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=C0W+ZEfLiMkXEH7rt2/GiovuzxyzO0s3Xlcxt3/DbVg=; b=O7hHyodGh5jg7iBjvA464nROIWjayN0IQifsyVX9bi1Ro6DEGU8szQXsNpJYs81Pj5 mTZnl46KIrz6I9mrbBtBiyeiRUGSvT0BgR3NV+YWKy++wbKy6fK+TtiRHi1+NNaUfT/Q a1i8usBn144GBRFs4Mqz9G1Zuzglf9Wg0/bzm4miDq+HxzLaz2+q7vBkNb9xHavP/aRv ig+vdgv7hvwshxcEIvAJbelzqUsUmdOOD6RLpfjDSTo/krfhxvnsWvg6cy2Pci9vxxpq bGYAeUrQxmnIbRf5U/Y8KkB3x9L/HUc5dUg/VqGdYkrqjJK0wPgZr8c+0siH6aCbYHVu ujMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737576069; x=1738180869; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=C0W+ZEfLiMkXEH7rt2/GiovuzxyzO0s3Xlcxt3/DbVg=; b=a+OzVFA6czHB10YQ9QeTClAB76nFmc+g47wPkKjJhzWGH/WFlSiGjc4ehJX9+qiLos bJniUPOvVVqWKyxSLWFXsd7OXeSfquOGgh8kCH97cY/S274VEVQnZBnSqCssxfFdIzPB i+TwXdBOmAsEgFIjMXJlqHfptsXjiYgKJ4NgqkH/F5G2lB1Ikkw2Dj+epjxtVOmiOdGl Xfib2PgRSNWd4llULPLaWJz2JZJdTrsuRFTx18v5prksmLeU6tCzY42d+eVLMCO7wDGK Z0HVzxVc87Bv0dBqMfMGFiX0hLRkVfNS8HOgc7qiYWmxG7uV6AbkNEu5bwFV7TBBLm9i vNMQ== X-Forwarded-Encrypted: i=1; AJvYcCWTX4MQv+IST7nsJX3dyZ45wIZPPW4zmR2W2/nMGe7giviHF0wGcJlYnR8LJhrXNb6J0dEt8mgvrzvBt9Aj@vger.kernel.org X-Gm-Message-State: AOJu0Yz74vm4eZ33nDfltLMRbI7Ct7eKumTeo9ZZ2RbntZZ3jmfY5tvF YLWXKE1V2oXFN1swt4ITW+cjRbytx3ulPV8SOcvkyaiH/UBV+5iKBK7Vhw== X-Gm-Gg: ASbGncusDmBq6JprCgUjsa1PO/ScWcSL6NaHsfHJMIZUN4jG09giPM8O+BHsWjv6CIK FNCZ8KR45n3GKsysSaQmJmlFwUZqDGi3gDoFeqgO+ick3Act1DjCBOCKCTy1mUoh8oyVw585lG6 F7KN0vw6tiT2EHeQvq5SJskKDPZMec80bXQ6fcOV6CzqU4NxmFbQqBMeRzoT8yWK4xAr15e/R1P 7SY8JUwWNzRAAgIFTQQYXTRrLx2RBbZ5UnGmUOVXOTwG2vTJ78PfG0qG1xB5vdPyyl32ayOIxgr YgviiB210tA/aAzTxQ7BjSyEw6f8 X-Google-Smtp-Source: 
AGHT+IH7Bb216J6U5VF2T1X3drcXM9V1TK2n9v5TtKhVnuUToiXKlxArzXqCFTvaC3c4LRNMangaew== X-Received: by 2002:a05:620a:2893:b0:7b1:3bf8:b3c4 with SMTP id af79cd13be357-7be63158aa9mr1312874885a.0.1737576069547; Wed, 22 Jan 2025 12:01:09 -0800 (PST) Received: from [192.168.1.99] (ool-4355b0da.dyn.optonline.net. [67.85.176.218]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7be6147e30asm694606385a.31.2025.01.22.12.01.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Jan 2025 12:01:09 -0800 (PST) From: Connor Abbott Date: Wed, 22 Jan 2025 15:00:58 -0500 Subject: [PATCH v3 1/3] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Precedence: bulk X-Mailing-List: linux-arm-msm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250122-msm-gpu-fault-fixes-next-v3-1-0afa00158521@gmail.com> References: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com> In-Reply-To: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com> To: Rob Clark , Will Deacon , Robin Murphy , Joerg Roedel , Sean Paul , Konrad Dybcio , Abhinav Kumar , Dmitry Baryshkov , Marijn Suijten Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1737576067; l=5809; i=cwabbott0@gmail.com; s=20240426; h=from:subject:message-id; bh=LZOQxUUGdQqLBW3ZuPFFeTY2r3+nASK7HMPqbv/N5EE=; b=+NXIUeaospHwBoU1GxlEI04bEiPY7s3tZ+XXIo6zrznDjds++HImSxpVTUy3AAj3qlRlGR+0D 8GDZdqn+QNSDR9v9/8qj5NGy4z/lrwbX5MhDaoec++7JU7xDAILX6pc X-Developer-Key: i=cwabbott0@gmail.com; a=ed25519; pk=dkpOeRSXLzVgqhy0Idr3nsBr4ranyERLMnoAgR4cHmY= On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS asserts an interrupt. The only way to clear that bit is to resume the transaction by writing SMMU_CBn_RESUME, but typically resuming the transaction requires complex operations (copying in pages, etc.) 
that can't be done in IRQ context. drm/msm already runs into this,
because its fault handler sometimes schedules a job to dump the GPU
state and doesn't resume translation until that job is complete.

Work around this by disabling context fault interrupts until after the
transaction is resumed. Because other context banks can share an IRQ
line, we may still get an interrupt intended for another context bank,
but in this case only SMMU_CBn_FSR.SS will be asserted and we can skip
it, assuming that interrupts are disabled. The skip is done by removing
the bit from ARM_SMMU_CB_FSR_FAULT. SMMU_CBn_FSR.SS won't be asserted
unless an external user enabled stall-on-fault, and they are expected to
resume the translation and re-enable interrupts.

Signed-off-by: Connor Abbott
Reviewed-by: Robin Murphy
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 ++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 41 +++++++++++++++++++++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 -
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 59d02687280e8d37b5e944619fcfe4ebd1bd6926..7d86e9972094eb4d304b24259f4ed9a4820cabc7 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -125,12 +125,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	u32 reg = 0;
+	u32 reg = 0, sctlr;
+	unsigned long flags;
 
 	if (terminate)
 		reg |= ARM_SMMU_RESUME_TERMINATE;
 
+	spin_lock_irqsave(&smmu_domain->cb_lock, flags);
+
 	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
+
+	/*
+	 * Re-enable interrupts after they were disabled by
+	 * arm_smmu_context_fault().
+	 */
+	sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
+	sctlr |= ARM_SMMU_SCTLR_CFIE;
+	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr);
+
+	spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
 }
 
 static void qcom_adreno_smmu_set_prr_bit(const void *cookie, bool set)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 79afc92e1d8b984dd35c469a3f283ad0c78f3d26..ca1ff59015a63912f0f9c5256452b2b2efa928f1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -463,13 +463,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
 		return IRQ_NONE;
 
+	/*
+	 * On some implementations FSR.SS asserts a context fault
+	 * interrupt. We do not want this behavior, because resolving the
+	 * original context fault typically requires operations that cannot be
+	 * performed in IRQ context but leaving the stall unacknowledged will
+	 * immediately lead to another spurious interrupt as FSR.SS is still
+	 * set. Work around this by disabling interrupts for this context bank.
+	 * It's expected that interrupts are re-enabled after resuming the
+	 * translation.
+	 *
+	 * We have to do this before report_iommu_fault() so that we don't
+	 * leave interrupts disabled in case the downstream user decides the
+	 * fault can be resolved inside its fault handler.
+	 *
+	 * There is a possible race if there are multiple context banks sharing
+	 * the same interrupt and both signal an interrupt in between writing
+	 * RESUME and SCTLR. We could disable interrupts here before we
+	 * re-enable them in the resume handler, leaving interrupts enabled.
+	 * Lock the write to serialize it with the resume handler.
+	 */
+	if (cfi.fsr & ARM_SMMU_CB_FSR_SS) {
+		u32 val;
+
+		spin_lock(&smmu_domain->cb_lock);
+		val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
+		val &= ~ARM_SMMU_SCTLR_CFIE;
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val);
+		spin_unlock(&smmu_domain->cb_lock);
+	}
+
+	/*
+	 * The SMMUv2 architecture specification says that if stall-on-fault is
+	 * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
+	 * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
+	 * first before running the user's fault handler to make sure we follow
+	 * this sequence. It should be ok if there is another fault in the
+	 * meantime because we have already read the fault info.
+	 */
+	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
+
 	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
 		cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 
 	if (ret == -ENOSYS && __ratelimit(&rs))
 		arm_smmu_print_context_fault_info(smmu, idx, &cfi);
 
-	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
 	return IRQ_HANDLED;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 2dbf3243b5ad2db01e17fb26c26c838942a491be..789c64ff3eb9944c8af37426e005241a8288da20 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -216,7 +216,6 @@ enum arm_smmu_cbar_type {
 					 ARM_SMMU_CB_FSR_TLBLKF)
 
 #define ARM_SMMU_CB_FSR_FAULT		(ARM_SMMU_CB_FSR_MULTI |	\
-					 ARM_SMMU_CB_FSR_SS |		\
 					 ARM_SMMU_CB_FSR_UUT |		\
 					 ARM_SMMU_CB_FSR_EF |		\
 					 ARM_SMMU_CB_FSR_PF |		\

From patchwork Wed Jan 22 20:00:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Connor Abbott X-Patchwork-Id: 13947655 Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with
ESMTPS id CA41D1BFE03 for ; Wed, 22 Jan 2025 20:01:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737576073; cv=none; b=S9QhuM+z5GK8foFApROoxzrpN40lq/RgHJfv0aeMcOqy2yg86RioH1iGLGFShPB5Zx8pbezinObcHXMHhO1sgaCeDhDjuEOibqRZ+UomcfMwBdhqvpEOTQ/TsnS0os1Z9WN2jGpjWmC/27HDtU/ST/uzK2PVrdfhXNcuV9vBcDs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737576073; c=relaxed/simple; bh=4obbXrvjN1YYsEVfq8apmOVUt5dXPe6TwBqzW1FSgs0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=sseTBk2cF8cuSZC/NEVxzMlaSooO2juvf784pUqL0XyHVsbu9SiWrvXd6/2EjKQ+HTr8BGcwad0bXXBA+vBWWfcQy7b8Iq71cNiAdHtpxZn8dnNchjLQi/OcTzJN8D3V8qIFnz+8g7nSA75nWx1GQfBdeF3FUrgwn3UeQ4T1WPE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=kMycX969; arc=none smtp.client-ip=209.85.222.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="kMycX969" Received: by mail-qk1-f170.google.com with SMTP id af79cd13be357-7b6fd251d7cso1542385a.0 for ; Wed, 22 Jan 2025 12:01:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737576070; x=1738180870; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=clItPhIQuRHdt/X/5Yn3hkPoRrPZ3x4kk0mvqKl7Y8o=; b=kMycX9699ceXNN85uq9FwmW9jyEbqvrWiqO6PRaaN+53dIRbkNHgK2lfJPyk21yxb+ 
KWkJ1QVmpbU86EGNO4FlGoW+fV7VnFOJIkcWUKxLL8qyU9goS+rg2jlXKchTp2ML6jCg Ht0+KKN4ZO8L3BEaTAcc8o6eg3Mu+rru1Bzt8Edk7QvPrBInXH6RWdGGtF4wJ3fOa1fp Pw43hG8Ug/IJD81J6agjbZ9z8cfPtlKRXSRI9Dqc38aLkTYoUY6PGVI1Aa/B1xvXm/KN vBIYwWMiFt6RpoF+iBh2F3FVcS6AiLnpOLbbnz9YR96JfvBMXiAF8SyC86iwUBOEnHGr t+WA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737576070; x=1738180870; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=clItPhIQuRHdt/X/5Yn3hkPoRrPZ3x4kk0mvqKl7Y8o=; b=JkzU4seF4L9zsdTb4B2uXKZYlRzBxALkD21Zf0Xw/i1QZyt4a2FJuWOWXnPfzsfpAl plYW3K+fsEH1S8psrFW7U8FsguXD5NvD50s/kuyHpJ+d/kP/3SfJfHo8wv4j/gY49TTH qVUa00izi5k+uPcZvp4uHEUY2bnBLKjVrQetU018xl/1YXvHah3o2NtZ2C0/NHWOxHm5 2OAjDWeApYFDp41MzCNhK79Gqg1BfY0YRl0ZvaKB4OZAi60t5p2aN3amnvD93XuNzQ3Q 3UxZJ+D8c4dt4m5jUTv4aDS/ISpngzRrfVsX0KUdv74fSg1TS30mWbvvqJOg34hfhE85 gl4g== X-Forwarded-Encrypted: i=1; AJvYcCXhtLhnAuN+iMyUVmUoR+t5Xd8sPopwB4zMPa8jZbGnz6dZnuoNMv7xD+xEqX1iLaBQTs39Y0KiZcEghl1b@vger.kernel.org X-Gm-Message-State: AOJu0Yy0frufbc/IcctQ6XVmmtr0pmTKu38rKIvjMmjutA8YyMfJF6Oc FEQ7lsFeDHZHGkzYpp9fOszpADtfiR2bUERxLJ35aHwY49T0tcgl X-Gm-Gg: ASbGncs9xmjtNOr5N7gFLpNzc5yl4lHe8wX8ApXhNeOTcH7re1p0jknAb6oIECEVWr+ KIPgONuZmx1I+bTZoyyG+VoRShBSo/Eo8dhRd1/zczbm+QUV3vFCpBnvA4rEBduZw9EaVUUSmky p6bsziijwQjE3U0s6kWkeNN5pmB/ehyithL6Bsm1G+Oc9Icv0124g7TN0HDcQHG/MrzEpbS/YIh 9LN73RFcLLwuuDPjKMKYZCfgcSbjuw2N96dUb0e2LRc66G/BtqgYWHdXWiESNxvlwaw1T0bR9sJ JU78N+KDDUpzJ/MhVp09qja7ooTs X-Google-Smtp-Source: AGHT+IFbsQgN+OYWoJ8tGUrhqTk3b6gEBJi7yaE4oLpHOAjzYS5P1remXjUXmLoR8liW70biIpVbJQ== X-Received: by 2002:a05:620a:1911:b0:7be:3cf2:5b46 with SMTP id af79cd13be357-7be6320bc70mr1391147185a.8.1737576070653; Wed, 22 Jan 2025 12:01:10 -0800 (PST) Received: from [192.168.1.99] (ool-4355b0da.dyn.optonline.net. 
[67.85.176.218]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7be6147e30asm694606385a.31.2025.01.22.12.01.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Jan 2025 12:01:10 -0800 (PST) From: Connor Abbott Date: Wed, 22 Jan 2025 15:00:59 -0500 Subject: [PATCH v3 2/3] iommu/arm-smmu-qcom: Make set_stall work when the device is on Precedence: bulk X-Mailing-List: linux-arm-msm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250122-msm-gpu-fault-fixes-next-v3-2-0afa00158521@gmail.com> References: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com> In-Reply-To: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com> To: Rob Clark , Will Deacon , Robin Murphy , Joerg Roedel , Sean Paul , Konrad Dybcio , Abhinav Kumar , Dmitry Baryshkov , Marijn Suijten Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1737576067; l=2108; i=cwabbott0@gmail.com; s=20240426; h=from:subject:message-id; bh=4obbXrvjN1YYsEVfq8apmOVUt5dXPe6TwBqzW1FSgs0=; b=UtLt9Gxw8iQpbRsxeRyHyMb7Pm7ArBDOREehdF+WX9yPMugQJ3Icy/92gYrtsug8VrkLc1is8 CiXL67DhVUCAGFOPZCD8bglUB3zCsRgXLnoLC4DgfCG1QTviaTVY5MT X-Developer-Key: i=cwabbott0@gmail.com; a=ed25519; pk=dkpOeRSXLzVgqhy0Idr3nsBr4ranyERLMnoAgR4cHmY= Up until now we have only called the set_stall callback during initialization when the device is off. But we will soon start calling it to temporarily disable stall-on-fault when the device is on, so handle that by checking if the device is on and writing SCTLR. 
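
For illustration only, the intended behaviour can be modelled in plain
userspace C. This is a sketch under stated assumptions, not the driver
code: the sketch_* names, the toy register array, the "powered" flag
standing in for the runtime PM check, and the bit position used for the
stall bit are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the stall bit, for illustration only. */
#define SKETCH_SCTLR_CFCFG	(1u << 7)
#define SKETCH_NUM_CB		8

/* Toy stand-in for the SMMU: one SCTLR word per context bank. */
struct sketch_smmu {
	uint32_t stall_enabled;		/* cached per-bank setting (bitmask) */
	bool powered;			/* models "device is runtime-active" */
	uint32_t sctlr[SKETCH_NUM_CB];	/* models the SCTLR registers */
	unsigned int reg_writes;	/* counts register writes, for testing */
};

/*
 * Always update the cached setting, but only touch the register when
 * the setting actually changed and the device is powered.
 */
static void sketch_set_stall(struct sketch_smmu *smmu, unsigned int cbndx,
			     bool enabled)
{
	uint32_t mask;
	bool changed;

	assert(cbndx < SKETCH_NUM_CB);

	mask = 1u << cbndx;
	changed = !!(smmu->stall_enabled & mask) != enabled;

	if (enabled)
		smmu->stall_enabled |= mask;
	else
		smmu->stall_enabled &= ~mask;

	if (changed && smmu->powered) {
		uint32_t reg = smmu->sctlr[cbndx];

		if (enabled)
			reg |= SKETCH_SCTLR_CFCFG;
		else
			reg &= ~SKETCH_SCTLR_CFCFG;

		smmu->sctlr[cbndx] = reg;
		smmu->reg_writes++;
	}
}
```

In this model a call made while the device is off only updates the
cached mask, which is the case the callback already handled; a repeated
call with the same value is a no-op even when the device is on.
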
Signed-off-by: Connor Abbott
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7d86e9972094eb4d304b24259f4ed9a4820cabc7..6693d8f8e3ae4e970ca9d7f549321ab4f59e8b32 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -112,12 +112,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+	u32 mask = BIT(cfg->cbndx);
+	bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
+	unsigned long flags;
 
 	if (enabled)
-		qsmmu->stall_enabled |= BIT(cfg->cbndx);
+		qsmmu->stall_enabled |= mask;
 	else
-		qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
+		qsmmu->stall_enabled &= ~mask;
+
+	/*
+	 * If the device is on and we changed the setting, update the register.
+	 */
+	if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
+		spin_lock_irqsave(&smmu_domain->cb_lock, flags);
+
+		u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
+
+		if (enabled)
+			reg |= ARM_SMMU_SCTLR_CFCFG;
+		else
+			reg &= ~ARM_SMMU_SCTLR_CFCFG;
+
+		arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
+
+		spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
+
+		pm_runtime_put_autosuspend(smmu->dev);
+	}
 }
 
 static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)

From patchwork Wed Jan 22 20:01:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Connor Abbott X-Patchwork-Id: 13947656 Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F2D6521576D for ; Wed, 22 Jan 2025 20:01:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737576076; cv=none; b=JxvAl0yk9n2EfcqzUl3kE59DnZ1mkJE83blSHr/+bqGgBcAByajU74IkyFW6Z0ZNG8mK5fss8ylCeFT1QRv0oS9gmQYxPK5A4E13ms8HA/MdVUSjouBVsNfJadPHCPRuVCQojS8me1Cdk70xAzfRHpxiD1faU54l4TLuwfRaIgE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737576076; c=relaxed/simple; bh=1LEQ/SMNbwf2yI/kiLpEh729+NspG9MjxFycNdw8Mco=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=QEf4pUCDC3JOl+fc7zsKCPNWPUozgzxXY25YrK7BhIL4gYcJw0XIqa2h2HEkpDIP45JMj+4/Mn64KyRXv5J2JYo6oiCgeErFMA6ZiUgUq/3gA75r5TMfVuuu9ILyIhhVVAkQI+tER9yaF3tkq/TqF4aLkj/y10qoKjhog0pQa00= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com
header.i=@gmail.com header.b=ZhnGLrJj; arc=none smtp.client-ip=209.85.222.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ZhnGLrJj" Received: by mail-qk1-f170.google.com with SMTP id af79cd13be357-7b807057cc9so612085a.1 for ; Wed, 22 Jan 2025 12:01:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737576072; x=1738180872; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=uvSBZTv+Bn83Qgi3n8/RjQE01uKIcw895iDtbXzMRaM=; b=ZhnGLrJjb2cIDdA2KcNFrZzVYtp2JXLevKPwrxdlzjNCAlsWDBL7QH5Wf4K4YPHkYH GJBfrtauREczlGRZ7RZY4imsDuBz71FcIbWbvEvNxv6IZtxARxYCgxKlgaOdvoe6gFcc egf/C8ddLM094S6mqAaRPQmiIylN1Wu+jTI+s/7nb511CWNThit9xJZNr0MPZ8VP2HC+ KViucR7qOFqnf11Gb2RH+uLfINFKgLit7GPoZXRjOflpEvssBr9min+eveSeFIsMa7x+ GClKTkMR1vyjQYQZ2S5X960nrtk2A/Xial1zgZhhkiYi4yFIuSaYZtdIQ4aUJLCmxaYT lDOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737576072; x=1738180872; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uvSBZTv+Bn83Qgi3n8/RjQE01uKIcw895iDtbXzMRaM=; b=XUJ2Og9g6VFy2yk3gZ4DPkCzAhaT/T8VZfyUPiHJr+J+3dtEPGvzBWlHsXb/9ggRAv KYg3Zv/4lz2Fxk1+PnqSf2IbI0b570JwggrviY+Wmap7LXM8+z80MP/3J87bNoSsgp4F qJVNsEe/OOYfVtKDb6WAnfEYjDgkqsTtjTF33hN9FUIzAAsjkWlfVx1eUwRDp3dboGe6 EZOpiS9SOCUQRnkPp0dDGRQBthFMpjzh/X0zAhFN7QhxbLDOCEs2pBUlJimppLSe4uOR KUTGE+jafkAAy/gp+YGxC/SxXbXjY2Et3QUgyT+W2iiifhMPGrc1sNGmH24r35XFBljN U0FQ== X-Forwarded-Encrypted: i=1; 
AJvYcCUl0zo/Leil87Ta5r588wsaEqaVCn6B0jIB7b6SxlaxYVB4H1ngppTod1uJB7ABlwH2n4siilVJdlf6sa8S@vger.kernel.org X-Gm-Message-State: AOJu0YzCze5u78XmqVunEdQ/NUbLQZMWhZoVoJREMW0ku4oxPthP4W8q KL3Vwd3khjwAlwCPfsOoQBscQBXPGnPUKanzxiXa09ckJFNu94CH X-Gm-Gg: ASbGncv8cCUGlkyMPFiB2c2MQ141I+6JGZSL12cIAhu+/GKkLxE5SQ/f4N0l2vYNAjo vUmpNoX37ic/W0E1VwRd4BkUzGDAwB7VsOVF2IzoJ9AisE8ITFebnuha44o3QvITmifoGNDL+TW LIVSZJMF/8UpWY0HhFLWuD84w0JDIFTHcKhx7q5C6zryLcUWnctNKt8cPFFsP46BxupX9krDqXq 9H36ixIPO8y+B+dP9sqc7e7SIZhGEAFenYqvE/E+lQTqj8H4IotWk0+wwl1k7T64cpSaiNES91f MASxbaB+VUPTN3bXQSA58K5scNW1 X-Google-Smtp-Source: AGHT+IG7pcOO4j8rcAXTAXAOFHFzbrJ+ShZfoiNI9k3XAbErfF6sb3mFVDmAynz5yJb92coD3wnqQQ== X-Received: by 2002:a05:620a:394b:b0:7af:cac7:5017 with SMTP id af79cd13be357-7be8b2eaceamr237140485a.4.1737576071617; Wed, 22 Jan 2025 12:01:11 -0800 (PST) Received: from [192.168.1.99] (ool-4355b0da.dyn.optonline.net. [67.85.176.218]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7be6147e30asm694606385a.31.2025.01.22.12.01.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Jan 2025 12:01:11 -0800 (PST) From: Connor Abbott Date: Wed, 22 Jan 2025 15:01:00 -0500 Subject: [PATCH v3 3/3] drm/msm: Temporarily disable stall-on-fault after a page fault Precedence: bulk X-Mailing-List: linux-arm-msm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250122-msm-gpu-fault-fixes-next-v3-3-0afa00158521@gmail.com> References: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com> In-Reply-To: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com> To: Rob Clark , Will Deacon , Robin Murphy , Joerg Roedel , Sean Paul , Konrad Dybcio , Abhinav Kumar , Dmitry Baryshkov , Marijn Suijten Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1737576067; l=9275; i=cwabbott0@gmail.com; 
s=20240426; h=from:subject:message-id; bh=1LEQ/SMNbwf2yI/kiLpEh729+NspG9MjxFycNdw8Mco=; b=A9/IQz3QphuH91aEBD88IGONhoh/I0uf9QAf2bM/Kp176OTv6KhhkKafFHl0Sv7zzoxyaJ6hY MPmB4Vx6LgoASfwb9x1YGfU/BJSXsIzB1PEk2zUafGAkpNmG1blCRpo
X-Developer-Key: i=cwabbott0@gmail.com; a=ed25519; pk=dkpOeRSXLzVgqhy0Idr3nsBr4ranyERLMnoAgR4cHmY=

When things go wrong, the GPU is capable of quickly generating millions
of faulting translation requests per second. When that happens, in the
stall-on-fault model each access will stall until it wins the race to
signal the fault and then the RESUME register is written. This slows
processing page faults to a crawl as the GPU can generate faults much
faster than the CPU can acknowledge them. It also means that all
available resources in the SMMU are saturated waiting for the stalled
transactions, so that other transactions, such as those generated by
the GMU, which shares a context bank with the GPU, cannot proceed. This
causes a GMU watchdog timeout, which leads to a failed reset, because
GX cannot collapse when there is a transaction pending, and then to a
permanently hung GPU.

On older platforms with qcom,smmu-v2, it seems that when one
transaction is stalled subsequent faulting transactions are terminated,
which avoids this problem, but the MMU-500 follows the spec here.

To work around these problems, disable stall-on-fault as soon as we get
a page fault until a cooldown period after pagefaults stop. This allows
the GMU some guaranteed time to continue working. We only use
stall-on-fault to halt the GPU while we collect a devcoredump and we
always terminate the transaction afterward, so it's fine to miss some
subsequent page faults. We also keep it disabled so long as the current
devcoredump hasn't been deleted, because in that case we likely won't
capture another one if there's a fault.

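As a rough userspace model of the cooldown described above (a sketch
under stated assumptions, not the driver code): the sketch_* names, the
plain millisecond timestamps passed in by the caller, and the
crashstate_pending flag standing in for "the current devcoredump hasn't
been deleted" are all hypothetical. Only the 500 ms value is taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-second cooldown, the same value the patch uses. */
#define SKETCH_COOLDOWN_MS	500

struct sketch_gpu {
	bool stall_enabled;		/* is stall-on-fault currently on? */
	uint64_t reenable_ms;		/* stall may come back after this time */
	bool crashstate_pending;	/* a devcoredump is still being held */
};

static void sketch_gpu_init(struct sketch_gpu *gpu)
{
	gpu->stall_enabled = true;
	gpu->reenable_ms = 0;
	gpu->crashstate_pending = false;
}

/* On every page fault: turn stalling off and push the deadline out. */
static void sketch_on_fault(struct sketch_gpu *gpu, uint64_t now_ms)
{
	/* Guard against wraparound of the toy clock. */
	assert(now_ms <= UINT64_MAX - SKETCH_COOLDOWN_MS);

	gpu->stall_enabled = false;
	gpu->reenable_ms = now_ms + SKETCH_COOLDOWN_MS;
}

/*
 * On every submit: re-enable stalling only once the cooldown has passed
 * and no earlier devcoredump is still pending. Returns true if this
 * call re-enabled stalling.
 */
static bool sketch_on_submit(struct sketch_gpu *gpu, uint64_t now_ms)
{
	if (!gpu->stall_enabled && now_ms > gpu->reenable_ms &&
	    !gpu->crashstate_pending) {
		gpu->stall_enabled = true;
		return true;
	}
	return false;
}
```

Each new fault restarts the cooldown, so a sustained fault storm keeps
stalling off for as long as it lasts plus half a second.
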
After this commit HFI messages still occasionally time out, because the
crashdump handler doesn't run fast enough to let the GMU resume, but the
driver seems to recover from it. This will probably go away after the
HFI timeout is increased.

Signed-off-by: Connor Abbott
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  4 ++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 42 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++
 drivers/gpu/drm/msm/msm_iommu.c         |  9 +++++++
 drivers/gpu/drm/msm/msm_mmu.h           |  1 +
 6 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 71dca78cd7a5324e9ff5b14f173e2209fa42e196..670141531112c9d29cef8ef1fd51b74759fdd6d2 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -131,6 +131,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	adreno_check_and_reenable_stall(adreno_gpu);
+
 	if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
 		ring->cur_ctx_seqno = 0;
 		a5xx_submit_in_rb(gpu, submit);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 0ae29a7c8a4d3f74236a35cc919f69d5c0a384a0..5a34cd2109a2d74c92841448a61ccb0d4f34e264 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -212,6 +212,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	adreno_check_and_reenable_stall(adreno_gpu);
+
 	a6xx_set_pagetable(a6xx_gpu, ring, submit);
 
 	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
@@ -335,6 +337,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	adreno_check_and_reenable_stall(adreno_gpu);
+
 	/*
 	 * Toggle concurrent binning for pagetable switch and set the thread to
 	 * BR since only it can execute the pagetable switch packets.
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 1238f326597808eb28b4c6822cbd41a26e555eb9..bac586101dc0494f46b069a8440a45825dfe9b5e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -246,16 +246,53 @@ u64 adreno_private_address_space_size(struct msm_gpu *gpu)
 	return SZ_4G;
 }
 
+void adreno_check_and_reenable_stall(struct adreno_gpu *adreno_gpu)
+{
+	struct msm_gpu *gpu = &adreno_gpu->base;
+	unsigned long flags;
+
+	/*
+	 * Wait until the cooldown period has passed and we would actually
+	 * collect a crashdump to re-enable stall-on-fault.
+	 */
+	spin_lock_irqsave(&adreno_gpu->fault_stall_lock, flags);
+	if (!adreno_gpu->stall_enabled &&
+			ktime_after(ktime_get(), adreno_gpu->stall_reenable_time) &&
+			!READ_ONCE(gpu->crashstate)) {
+		adreno_gpu->stall_enabled = true;
+
+		gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, true);
+	}
+	spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, flags);
+}
+
 #define ARM_SMMU_FSR_TF                 BIT(1)
 #define ARM_SMMU_FSR_PF                 BIT(3)
 #define ARM_SMMU_FSR_EF                 BIT(4)
+#define ARM_SMMU_FSR_SS                 BIT(30)
 
 int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
 			 struct adreno_smmu_fault_info *info, const char *block,
 			 u32 scratch[4])
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	const char *type = "UNKNOWN";
-	bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
+	bool do_devcoredump = info && (info->fsr & ARM_SMMU_FSR_SS) &&
+		!READ_ONCE(gpu->crashstate);
+	unsigned long irq_flags;
+
+	/*
+	 * In case there is a subsequent storm of pagefaults, disable
+	 * stall-on-fault for at least half a second.
+	 */
+	spin_lock_irqsave(&adreno_gpu->fault_stall_lock, irq_flags);
+	if (adreno_gpu->stall_enabled) {
+		adreno_gpu->stall_enabled = false;
+
+		gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, false);
+	}
+	adreno_gpu->stall_reenable_time = ktime_add_ms(ktime_get(), 500);
+	spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, irq_flags);
 
 	/*
 	 * If we aren't going to be resuming later from fault_worker, then do
@@ -1143,6 +1180,9 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		adreno_gpu->info->inactive_period);
 	pm_runtime_use_autosuspend(dev);
 
+	spin_lock_init(&adreno_gpu->fault_stall_lock);
+	adreno_gpu->stall_enabled = true;
+
 	return msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
 			gpu_name, &adreno_gpu_config);
 }
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index dcf454629ce037b2a8274a6699674ad754ce1f07..a528036b46216bd898f6d48c5fb0555c4c4b053b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -205,6 +205,28 @@ struct adreno_gpu {
 	/* firmware: */
 	const struct firmware *fw[ADRENO_FW_MAX];
 
+	/**
+	 * fault_stall_lock:
+	 *
+	 * Serialize changes to stall-on-fault state.
+	 */
+	spinlock_t fault_stall_lock;
+
+	/**
+	 * stall_reenable_time:
+	 *
+	 * If stall_enabled is false, when to re-enable stall-on-fault.
+	 */
+	ktime_t stall_reenable_time;
+
+	/**
+	 * stall_enabled:
+	 *
+	 * Whether stall-on-fault is currently enabled.
+	 */
+	bool stall_enabled;
+
+
 	struct {
 		/**
 		 * @rgb565_predicator: Unknown, introduced with A650 family,
@@ -629,6 +651,8 @@ int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
 			 struct adreno_smmu_fault_info *info, const char *block,
 			 u32 scratch[4]);
 
+void adreno_check_and_reenable_stall(struct adreno_gpu *gpu);
+
 int adreno_read_speedbin(struct device *dev, u32 *speedbin);
 
 /*
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 2a94e82316f95c5f9dcc37ef0a4664a29e3492b2..8d5380e6dcc217c7c209b51527bf15748b3ada71 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -351,6 +351,14 @@ static void msm_iommu_resume_translation(struct msm_mmu *mmu)
 		adreno_smmu->resume_translation(adreno_smmu->cookie, true);
 }
 
+static void msm_iommu_set_stall(struct msm_mmu *mmu, bool enable)
+{
+	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev);
+
+	if (adreno_smmu->set_stall)
+		adreno_smmu->set_stall(adreno_smmu->cookie, enable);
+}
+
 static void msm_iommu_detach(struct msm_mmu *mmu)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
@@ -399,6 +407,7 @@ static const struct msm_mmu_funcs funcs = {
 		.unmap = msm_iommu_unmap,
 		.destroy = msm_iommu_destroy,
 		.resume_translation = msm_iommu_resume_translation,
+		.set_stall = msm_iommu_set_stall,
 };
 
 struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks)
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881f2a6789ae2d03e1c02d10046331a..2694a356a17904e7572b767b16ed0cee806406cf 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -16,6 +16,7 @@ struct msm_mmu_funcs {
 	int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
 	void (*destroy)(struct msm_mmu *mmu);
 	void (*resume_translation)(struct msm_mmu *mmu);
+	void (*set_stall)(struct msm_mmu *mmu, bool enable);
 };
 
 enum msm_mmu_type {