From patchwork Thu Mar 7 13:39:14 2024
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13585599
Date: Thu, 7 Mar 2024 13:39:14 +0000
In-Reply-To: <20240307133916.3782068-1-yosryahmed@google.com>
Mime-Version: 1.0
References: <20240307133916.3782068-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.44.0.278.ge034bb2e1d-goog
Message-ID: <20240307133916.3782068-2-yosryahmed@google.com>
Subject: [RFC PATCH 1/3] x86/mm: fix LAM cr3 mask inconsistency during context switch
From: Yosry Ahmed
To: Andrew Morton
Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , Peter Zijlstra , Andy Lutomirski , "Kirill A.
Shutemov" , x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yosry Ahmed X-Rspamd-Queue-Id: 7C44B14000E X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: zxb14tt8ndphirekx78dnqogd3dmbsmg X-HE-Tag: 1709818761-542933 X-HE-Meta: U2FsdGVkX18luMrbFXVSe8kRUFoNqdZqBOLVBJZTI5ldDqVK124Fr2FLSkTvxbvYdVPiEQ7wa7QT61e24j44Do8gWwugrvTYyJ1tzIByS1tQ7uUMhKURNlqmOxSJH4JMhYN7BvAM3f0OCUSpR+0HHxQknKBnl4jLFfBgbN2xYpvIM9Lu19NMjOl7F4MLu2VhZJPZHsK4KP91TdvueNP3Y6br39cWECXdhLRT8+ZYiOGdvFfWjnbJZnVmqIGMWbYC7f25YCDMmfhAD9fARvZXFJWTP3vGS+B+Yx9Fw5oXSIIh34snOxmswFckqbE/R5e96BP+R84XWFiEcGPsXxnbMPUkbowXffbaT9DFxxEk5nQgstJAvYxPd9Gm0ZSZiLJfnP3tl4oI1ndF7Z299ZmeWrV+I/D3tQL8LsE03jAEwBjShTLRp4XrL+4L7HYTSZzlnTKZA2yg3knzi7g/2IxIaOcBf/Sgc/LWxw/k9B9pil1kuoURYSSJY5MoTl2DkOy2/HMe2NNvTFhBAprYXjiZ3Uyj41dG4rN4Us0+ZyP8aID5SJfun6arauDJNej59oXCmS/CBah5/GNAhpfaZaar6vroYme71rkg9y66ajmYJp+0FKeqzjlLrs9jo1UlFqEDEXAIrLw9gzCbGBeLyioEje1h4C3TKM/nAYpKNSSuc9iuGeAMqgg5EbSS85Gge8Z9RPe9MvFArzeHwj2x7DAO+6XAGx3bPojYRJxxdvHIrfNATxPFa4Gry0rsKPQOwhUFrM9tajOSBKWJJ1yUyHQo3aWmkq6/ZqKyWIRtIwMGrFVvLm2DwHYyL/Xw54gnfiHsXlWNp+K39Nc4UPlj1eIo1dtBhkV07nRGINlS5gfIL3tGt+ZHwYL89rY6xFdzgiSDNP3BbKK+wzeuAFp3rh6zzxCA4Bn8H1Be46BpavduUwjh+m2zC+3YD6bvj5T9uetdryDDMg3T+Q2N+BC08sQ F856tTOu 67jLnaHM608syDXfsxa7TsfoPovgI2EoF0PwpEYGpTO8MsahnpftnpB+aVGClZL+Dqb35CtNNVnaddZTXMzKpGM1QgSQgWq0lLg3DeBDWzIZ0svVsGt9z9LLc20zAODl2lPp9lQwDhNv2SChCHNJWMXETiPTCM14nPYPUZPUbsLjR00/gIarH8Osfe4iZnEMGHVJBvUEmrcMZCYO+e49XMqhHNgDcLXSEMSXi2vU/nW2t5fUEPzIGLeWa+TurL900LgpLVO7UfwxbCAvMijBdBr6CRgYTtpTUufZB2YYhMLcXWMuYANBitVYE05slVTSoTK3v5EorImwlRqAC7p7lwS/6j8GxTWyzAoQQKn1sMm6rMbmh2B1oG+PO/E/Y4vjhG5yt X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: In switch_mm_irqs_off(), we read the 'mm->context.lam_cr3_mask' into 'new_lam', which is later passed to load_new_mm_cr3(). 
However, there is a call to set_tlbstate_lam_mode() in between, which reads 'mm->context.lam_cr3_mask' again and sets 'cpu_tlbstate.lam' accordingly. If we race with another thread updating 'mm->context.lam_cr3_mask', the value in 'cpu_tlbstate.lam' could end up different from the one in CR3.

Fix the problem by updating set_tlbstate_lam_mode() to return the LAM mask that was written to 'cpu_tlbstate.lam', and use that mask in switch_mm_irqs_off() when writing CR3. Use READ_ONCE to make sure we read the mask once and use it consistently.

No practical problems have been observed from this, but it is a recipe for future problems (e.g. debug warnings in switch_mm_irqs_off() or __get_current_cr3_fast() could fire). It is unlikely that this can cause any real issues, since only a single-threaded process can update its own LAM mask, so the race here can only happen when context switching between kthreads using a borrowed mm. In that case, it is unlikely that LAM is important. If it is, then we would need to ensure that all CPUs using the mm are updated before returning to userspace when LAM is enabled -- but we don't do that.

While we are at it, remove the misleading comment stating that 'new_lam' may not match tlbstate_lam_cr3_mask() if a race occurs. This can happen without a race: a different thread could have just enabled LAM since the last context switch on the current CPU. Replace it with a hopefully clearer comment above set_tlbstate_lam_mode().
Signed-off-by: Yosry Ahmed --- arch/x86/include/asm/tlbflush.h | 11 +++++++---- arch/x86/mm/tlb.c | 17 ++++++++--------- 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 25726893c6f4d..a4ddb20f84fe7 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -399,11 +399,13 @@ static inline u64 tlbstate_lam_cr3_mask(void) return lam << X86_CR3_LAM_U57_BIT; } -static inline void set_tlbstate_lam_mode(struct mm_struct *mm) +static inline unsigned long set_tlbstate_lam_mode(struct mm_struct *mm) { - this_cpu_write(cpu_tlbstate.lam, - mm->context.lam_cr3_mask >> X86_CR3_LAM_U57_BIT); + unsigned long lam = READ_ONCE(mm->context.lam_cr3_mask); + + this_cpu_write(cpu_tlbstate.lam, lam >> X86_CR3_LAM_U57_BIT); this_cpu_write(tlbstate_untag_mask, mm->context.untag_mask); + return lam; } #else @@ -413,8 +415,9 @@ static inline u64 tlbstate_lam_cr3_mask(void) return 0; } -static inline void set_tlbstate_lam_mode(struct mm_struct *mm) +static inline unsigned long set_tlbstate_lam_mode(struct mm_struct *mm) { + return 0; } #endif #endif /* !MODULE */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 51f9f56941058..2975d3f89a5de 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -503,9 +503,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, { struct mm_struct *prev = this_cpu_read(cpu_tlbstate.loaded_mm); u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); - unsigned long new_lam = mm_lam_cr3_mask(next); bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy); unsigned cpu = smp_processor_id(); + unsigned long new_lam; u64 next_tlb_gen; bool need_flush; u16 new_asid; @@ -561,11 +561,6 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) != next->context.ctx_id); - /* - * If this races with another thread that enables lam, 'new_lam' 
- * might not match tlbstate_lam_cr3_mask(). - */ - /* * Even in lazy TLB mode, the CPU should stay set in the * mm_cpumask. The TLB shootdown code can figure out from @@ -633,7 +628,12 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, barrier(); } - set_tlbstate_lam_mode(next); + /* + * Even if we are not actually switching mm's, another thread could have + * updated mm->context.lam_cr3_mask. Make sure tlbstate_lam_cr3_mask() + * and the loaded CR3 use the up-to-date mask. + */ + new_lam = set_tlbstate_lam_mode(next); if (need_flush) { this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); @@ -705,7 +705,6 @@ void initialize_tlbstate_and_flush(void) /* LAM expected to be disabled */ WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)); - WARN_ON(mm_lam_cr3_mask(mm)); /* * Assert that CR4.PCIDE is set if needed. (CR4.PCIDE initialization @@ -724,7 +723,7 @@ void initialize_tlbstate_and_flush(void) this_cpu_write(cpu_tlbstate.next_asid, 1); this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen); - set_tlbstate_lam_mode(mm); + WARN_ON(set_tlbstate_lam_mode(mm)); for (i = 1; i < TLB_NR_DYN_ASIDS; i++) this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);