From patchwork Wed Jan 22 20:00:58 2025
X-Patchwork-Submitter: Connor Abbott
X-Patchwork-Id: 13947660
From: Connor Abbott
Date: Wed, 22 Jan 2025 15:00:58 -0500
Subject: [PATCH v3 1/3] iommu/arm-smmu: Fix spurious interrupts with
 stall-on-fault
Message-Id: <20250122-msm-gpu-fault-fixes-next-v3-1-0afa00158521@gmail.com>
References: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com>
In-Reply-To: <20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com>
To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
 Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org,
 Connor Abbott
On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS asserts
an interrupt. The only way to clear that bit is to resume the transaction by
writing SMMU_CBn_RESUME, but typically resuming the transaction requires
complex operations (copying in pages, etc.) that can't be done in IRQ
context. drm/msm already has this problem: its fault handler sometimes
schedules a job to dump the GPU state and doesn't resume translation until
that job completes.

Work around this by disabling context fault interrupts until after the
transaction is resumed. Because other context banks can share an IRQ line,
we may still get an interrupt intended for another context bank, but in that
case only SMMU_CBn_FSR.SS will be asserted and we can skip it, on the
assumption that interrupts for the stalled bank are disabled. This is
accomplished by removing the SS bit from ARM_SMMU_CB_FSR_FAULT.
SMMU_CBn_FSR.SS won't be asserted unless an external user enabled
stall-on-fault, and such users are expected to resume the translation and
re-enable interrupts.
Signed-off-by: Connor Abbott
Reviewed-by: Robin Murphy
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 ++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 41 +++++++++++++++++++++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 -
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 59d02687280e8d37b5e944619fcfe4ebd1bd6926..7d86e9972094eb4d304b24259f4ed9a4820cabc7 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -125,12 +125,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	u32 reg = 0;
+	u32 reg = 0, sctlr;
+	unsigned long flags;
 
 	if (terminate)
 		reg |= ARM_SMMU_RESUME_TERMINATE;
 
+	spin_lock_irqsave(&smmu_domain->cb_lock, flags);
+
 	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
+
+	/*
+	 * Re-enable interrupts after they were disabled by
+	 * arm_smmu_context_fault().
+	 */
+	sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
+	sctlr |= ARM_SMMU_SCTLR_CFIE;
+	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr);
+
+	spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
 }
 
 static void qcom_adreno_smmu_set_prr_bit(const void *cookie, bool set)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 79afc92e1d8b984dd35c469a3f283ad0c78f3d26..ca1ff59015a63912f0f9c5256452b2b2efa928f1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -463,13 +463,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
 		return IRQ_NONE;
 
+	/*
+	 * On some implementations FSR.SS asserts a context fault
+	 * interrupt. We do not want this behavior, because resolving the
+	 * original context fault typically requires operations that cannot be
+	 * performed in IRQ context but leaving the stall unacknowledged will
+	 * immediately lead to another spurious interrupt as FSR.SS is still
+	 * set. Work around this by disabling interrupts for this context bank.
+	 * It's expected that interrupts are re-enabled after resuming the
+	 * translation.
+	 *
+	 * We have to do this before report_iommu_fault() so that we don't
+	 * leave interrupts disabled in case the downstream user decides the
+	 * fault can be resolved inside its fault handler.
+	 *
+	 * There is a possible race if there are multiple context banks sharing
+	 * the same interrupt and both signal an interrupt in between writing
+	 * RESUME and SCTLR. We could disable interrupts here before we
+	 * re-enable them in the resume handler, leaving interrupts enabled.
+	 * Lock the write to serialize it with the resume handler.
+	 */
+	if (cfi.fsr & ARM_SMMU_CB_FSR_SS) {
+		u32 val;
+
+		spin_lock(&smmu_domain->cb_lock);
+		val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
+		val &= ~ARM_SMMU_SCTLR_CFIE;
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val);
+		spin_unlock(&smmu_domain->cb_lock);
+	}
+
+	/*
+	 * The SMMUv2 architecture specification says that if stall-on-fault is
+	 * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
+	 * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
+	 * first before running the user's fault handler to make sure we follow
+	 * this sequence. It should be ok if there is another fault in the
+	 * meantime because we have already read the fault info.
+	 */
+	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
+
 	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
 		cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ?
 		IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 
 	if (ret == -ENOSYS && __ratelimit(&rs))
 		arm_smmu_print_context_fault_info(smmu, idx, &cfi);
 
-	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
-
 	return IRQ_HANDLED;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 2dbf3243b5ad2db01e17fb26c26c838942a491be..789c64ff3eb9944c8af37426e005241a8288da20 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -216,7 +216,6 @@ enum arm_smmu_cbar_type {
 					 ARM_SMMU_CB_FSR_TLBLKF)
 
 #define ARM_SMMU_CB_FSR_FAULT		(ARM_SMMU_CB_FSR_MULTI |	\
-					 ARM_SMMU_CB_FSR_SS |		\
 					 ARM_SMMU_CB_FSR_UUT |		\
 					 ARM_SMMU_CB_FSR_EF |		\
 					 ARM_SMMU_CB_FSR_PF |		\