Message ID | 20250319-msm-gpu-fault-fixes-next-v5-0-97561209dd8c@gmail.com (mailing list archive) |
---|---|
Headers | From: Connor Abbott <cwabbott0@gmail.com>; Subject: [PATCH v5 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault; Date: Wed, 19 Mar 2025 10:43:59 -0400; To: Rob Clark <robdclark@gmail.com>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Joerg Roedel <joro@8bytes.org>, Sean Paul <sean@poorly.run>, Konrad Dybcio <konradybcio@kernel.org>, Abhinav Kumar <quic_abhinavk@quicinc.com>, Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>, Marijn Suijten <marijn.suijten@somainline.org>; Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott <cwabbott0@gmail.com> |
Series | iommu/arm-smmu, drm/msm: Fixes for stall-on-fault |
drm/msm uses the stall-on-fault model to record the GPU state on the first GPU page fault to help debugging. On systems where the GPU is paired with an MMU-500, there were two problems:

1. The MMU-500 doesn't de-assert its interrupt line until the fault is resumed, which led to a storm of interrupts until the fault handler was called. If we got unlucky and the fault handler was on the same CPU as the interrupt, there was a deadlock.
2. The GPU is capable of generating page faults much faster than we can resume them. The GMU (GPU Management Unit) shares the same context bank as the GPU, so if there was a sudden spurt of page faults it would be effectively starved and would trigger a watchdog reset. This was made even worse because the GPU cannot be reset while there's a pending transaction, leaving the GPU permanently wedged.

Patches 1-3 fix the first problem and are independent of the rest of the series. Patch 5 fixes the second problem and is dependent on patch 4, so there will have to be some cross-tree coordination.

I've rebased this series on the latest linux-next to avoid rebase troubles.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
Changes in v5:
- Don't read CONTEXTIDR for stage 2 domains.
- Clarify that we don't need TLB invalidation when changing SMMU_CBn_SCTLR.CFCFG.
- Link to v4: https://lore.kernel.org/r/20250304-msm-gpu-fault-fixes-next-v4-0-be14be37f4c3@gmail.com

Changes in v4:
- Add patches 1-2, which fix reading registers in drm/msm when acknowledging the fault early. This was Robin's preferred solution compared to making drm/msm's fault handler tell arm-smmu to resume the fault.
- Link to v3: https://lore.kernel.org/r/20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com

Changes in v3:
- Acknowledge the fault before resuming the transaction in patch 1.
- Add suggested extra context to commit messages.
- Link to v2: https://lore.kernel.org/r/20250120-msm-gpu-fault-fixes-next-v2-0-d636c4027042@gmail.com

Changes in v2:
- Remove unnecessary _irqsave when locking in IRQ handler (Robin)
- Reuse existing spinlock for CFIE manipulation (Robin)
- Lock CFCFG manipulation against concurrent CFIE manipulation
- Don't use timer to re-enable stall-on-fault. (Rob)
- Use more descriptive name for the function that re-enables stall-on-fault if the cooldown period has ended. (Rob)
- Link to v1: https://lore.kernel.org/r/20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com

---
Connor Abbott (5):
      iommu/arm-smmu: Save additional information on context fault
      iommu/arm-smmu-qcom: Don't read fault registers directly
      iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
      iommu/arm-smmu-qcom: Make set_stall work when the device is on
      drm/msm: Temporarily disable stall-on-fault after a page fault

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c            |  2 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c            |  4 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c          | 42 +++++++++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h          | 26 ++++++++
 drivers/gpu/drm/msm/msm_iommu.c                  |  9 +++
 drivers/gpu/drm/msm/msm_mmu.h                    |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  6 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c       | 67 +++++++++++++++----
 drivers/iommu/arm/arm-smmu/arm-smmu.c            | 84 ++++++++++++++++++------
 drivers/iommu/arm/arm-smmu/arm-smmu.h            | 21 +++---
 10 files changed, 216 insertions(+), 46 deletions(-)
---
base-commit: 866e43b945bf98f8e807dfa45eca92f931f3a032
change-id: 20250117-msm-gpu-fault-fixes-next-96e3098023e1

Best regards,
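
For readers unfamiliar with the approach behind patch 5 and the v2 changelog (disable stall-on-fault after a page fault, then re-enable it without a timer once a cooldown period has ended), the idea can be sketched roughly as below. This is a userspace illustration only, not code from the series: the struct, the function names, and the 500 ms cooldown value are all hypothetical, and the real driver would also need locking and the actual SMMU register update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and the cooldown value are hypothetical, for illustration. */
#define STALL_COOLDOWN_MS 500u

struct stall_state {
	bool stall_enabled;      /* is stall-on-fault currently on? */
	uint64_t reenable_at_ms; /* earliest time it may be turned back on */
};

static void stall_state_init(struct stall_state *s)
{
	assert(s != NULL);
	s->stall_enabled = true;
	s->reenable_at_ms = 0;
}

/*
 * Fault path: turn stalling off and push the re-enable deadline forward.
 * Returns true if this fault arrived while stalling was still on, i.e.
 * it is the fault for which GPU state would be captured.
 */
static bool stall_on_fault(struct stall_state *s, uint64_t now_ms)
{
	bool was_enabled;

	assert(s != NULL);
	was_enabled = s->stall_enabled;
	s->stall_enabled = false;
	s->reenable_at_ms = now_ms + STALL_COOLDOWN_MS;
	return was_enabled;
}

/*
 * Called opportunistically (for example when new work is queued) instead
 * of from a timer: re-enable stalling only once the cooldown has ended.
 */
static void stall_reenable_if_cooled_down(struct stall_state *s,
					  uint64_t now_ms)
{
	assert(s != NULL);
	if (!s->stall_enabled && now_ms >= s->reenable_at_ms)
		s->stall_enabled = true;
}
```

In this sketch every fault extends the deadline, so a sustained burst of faults keeps stalling off until the burst has been quiet for the whole cooldown. Whether the series behaves exactly this way is an assumption here.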