
arch/cacheflush: Introduce flush_all_caches()

Message ID 20220815160706.tqd42dv24tgb7x7y@offworld (mailing list archive)
State Superseded
Series arch/cacheflush: Introduce flush_all_caches()

Commit Message

Davidlohr Bueso Aug. 15, 2022, 4:07 p.m. UTC
With the CXL security features, the nvdimm requirement to flush
all CPU caches globally is no longer specific to that subsystem,
and extends even beyond the scope of security_ops. CXL will need
such semantics for features not necessarily limited to persistent
memory.

While the scope of this is the physical address space, add a new
flush_all_caches() to the cacheflush headers such that each
architecture can define it, when capable. For x86 just use the
wbinvd hammer and leave every other architecture incapable. While
there can be performance penalties or delayed response times,
these calls are both rare and explicitly security related, so such
costs matter less.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---

After a few iterations I circled back to an interface without granularity.
It just doesn't make sense right now to define a range if arm64 will not
support this (it won't do VA-based flushes of the physical address space)
and, until it gains consistent caches, security operations there will
simply be unsupported.
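
For reference, a minimal sketch of the caller-side pattern this interface
assumes (the function and surrounding driver context below are made up for
illustration; only flush_all_caches() and flush_all_caches_capable() come
from this patch):

  #include <linux/cacheflush.h>
  #include <linux/errno.h>

  static int example_secure_op(void)
  {
          /*
           * Only architectures that define flush_all_caches() (x86 with
           * this patch) report themselves capable; everyone else refuses
           * the security operation up front.
           */
          if (!flush_all_caches_capable())
                  return -EINVAL;

          /* ... issue the secure erase / unlock command here ... */

          /* Invalidate all CPU caches before the memory is read again. */
          flush_all_caches();
          return 0;
  }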

  arch/x86/include/asm/cacheflush.h |  3 +++
  drivers/acpi/nfit/intel.c         | 41 ++++++++++++++-----------------
  include/asm-generic/cacheflush.h  | 22 +++++++++++++++++
  3 files changed, 43 insertions(+), 23 deletions(-)

--
2.37.2

Comments

Peter Zijlstra Aug. 16, 2022, 9:01 a.m. UTC | #1
On Mon, Aug 15, 2022 at 09:07:06AM -0700, Davidlohr Bueso wrote:
> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> index b192d917a6d0..ce2ec9556093 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -10,4 +10,7 @@
> 
>  void clflush_cache_range(void *addr, unsigned int size);
> 
> +#define flush_all_caches() \
> +	do { wbinvd_on_all_cpus(); } while(0)
> +

This is horrific... we've done our utmost best to remove all WBINVD
usage and here you're adding it back in the most horrible form possible
?!?

Please don't do this, do *NOT* use WBINVD.
Dan Williams Aug. 16, 2022, 4:50 p.m. UTC | #2
Peter Zijlstra wrote:
> On Mon, Aug 15, 2022 at 09:07:06AM -0700, Davidlohr Bueso wrote:
> > diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> > index b192d917a6d0..ce2ec9556093 100644
> > --- a/arch/x86/include/asm/cacheflush.h
> > +++ b/arch/x86/include/asm/cacheflush.h
> > @@ -10,4 +10,7 @@
> > 
> >  void clflush_cache_range(void *addr, unsigned int size);
> > 
> > +#define flush_all_caches() \
> > +	do { wbinvd_on_all_cpus(); } while(0)
> > +
> 
> This is horrific... we've done our utmost best to remove all WBINVD
> usage and here you're adding it back in the most horrible form possible
> ?!?
> 
> Please don't do this, do *NOT* use WBINVD.

Unfortunately there are few good options here, and the changelog did
not make clear that this is continuing legacy [1], not adding new wbinvd
usage.

The functionality this is enabling is to be able to instantaneously
secure erase potentially terabytes of memory at once, and the kernel
needs to be sure that none of the data from before the secure erase is
still present in the cache. It is also used when unlocking a memory
device, where speculative reads and firmware accesses could have cached
poison from before the device was unlocked.

This capability is typically only used once per-boot (for unlock), or
once per bare metal provisioning event (secure erase), like when handing
off the system to another tenant. That small scope plus the fact that
none of this is available to a VM limits the potential damage. So,
similar to the mitigation we did in [2] that did not kill off wbinvd
completely, this is limited to specific scenarios and should be disabled
in any scenario where wbinvd is painful / forbidden.

[1]: 4c6926a23b76 ("acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs")
[2]: e2efb6359e62 ("ACPICA: Avoid cache flush inside virtual machines")
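
A rough sketch of how the x86 definition could mirror the guard from [2]
and refuse the hammer inside a guest (illustrative only, not what the
posted patch does; it assumes <asm/cpufeature.h> and <asm/smp.h>):

  /* Hypothetical x86 variant, not the definition in this patch. */
  #include <asm/cpufeature.h>
  #include <asm/smp.h>

  static __always_inline bool flush_all_caches_capable(void)
  {
          /*
           * As in commit e2efb6359e62 ([2]), never issue WBINVD from
           * inside a virtual machine, where it is either painfully slow
           * or a guest-visible no-op.
           */
          return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
  }

  static __always_inline void flush_all_caches(void)
  {
          if (flush_all_caches_capable())
                  wbinvd_on_all_cpus();
  }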
Davidlohr Bueso Aug. 16, 2022, 4:53 p.m. UTC | #3
On Tue, 16 Aug 2022, Dan Williams wrote:

>Peter Zijlstra wrote:
>> On Mon, Aug 15, 2022 at 09:07:06AM -0700, Davidlohr Bueso wrote:
>> > diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
>> > index b192d917a6d0..ce2ec9556093 100644
>> > --- a/arch/x86/include/asm/cacheflush.h
>> > +++ b/arch/x86/include/asm/cacheflush.h
>> > @@ -10,4 +10,7 @@
>> >
>> >  void clflush_cache_range(void *addr, unsigned int size);
>> >
>> > +#define flush_all_caches() \
>> > +	do { wbinvd_on_all_cpus(); } while(0)
>> > +
>>
>> This is horrific... we've done our utmost best to remove all WBINVD
>> usage and here you're adding it back in the most horrible form possible
>> ?!?
>>
>> Please don't do this, do *NOT* use WBINVD.
>
>Unfortunately there are few good options here, and the changelog did
>not make clear that this is continuing legacy [1], not adding new wbinvd
>usage.

While I was hoping that it was obvious from the intel.c changes that this
was not a new wbinvd, I can certainly improve the changelog with the below.

Thanks,
Davidlohr

>
>The functionality this is enabling is to be able to instantaneously
>secure erase potentially terabytes of memory at once and the kernel
>needs to be sure that none of the data from before the secure erase is still
>present in the cache. It is also used when unlocking a memory device
>where speculative reads and firmware accesses could have cached poison
>from before the device was unlocked.
>
>This capability is typically only used once per-boot (for unlock), or
>once per bare metal provisioning event (secure erase), like when handing
>off the system to another tenant. That small scope plus the fact that
>none of this is available to a VM limits the potential damage. So,
>similar to the mitigation we did in [2] that did not kill off wbinvd
>completely, this is limited to specific scenarios and should be disabled
>in any scenario where wbinvd is painful / forbidden.
>
>[1]: 4c6926a23b76 ("acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs")
>[2]: e2efb6359e62 ("ACPICA: Avoid cache flush inside virtual machines")
Dan Williams Aug. 16, 2022, 5:42 p.m. UTC | #4
On Tue, Aug 16, 2022 at 10:30 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> On Tue, 16 Aug 2022, Dan Williams wrote:
>
> >Peter Zijlstra wrote:
> >> On Mon, Aug 15, 2022 at 09:07:06AM -0700, Davidlohr Bueso wrote:
> >> > diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> >> > index b192d917a6d0..ce2ec9556093 100644
> >> > --- a/arch/x86/include/asm/cacheflush.h
> >> > +++ b/arch/x86/include/asm/cacheflush.h
> >> > @@ -10,4 +10,7 @@
> >> >
> >> >  void clflush_cache_range(void *addr, unsigned int size);
> >> >
> >> > +#define flush_all_caches() \
> >> > +  do { wbinvd_on_all_cpus(); } while(0)
> >> > +
> >>
> >> This is horrific... we've done our utmost best to remove all WBINVD
> >> usage and here you're adding it back in the most horrible form possible
> >> ?!?
> >>
> >> Please don't do this, do *NOT* use WBINVD.
> >
> >Unfortunately there are few good options here, and the changelog did
> >not make clear that this is continuing legacy [1], not adding new wbinvd
> >usage.
>
> While I was hoping that it was obvious from the intel.c changes that this
> was not a new wbinvd, I can certainly improve the changelog with the below.

I also think this cache_flush_region() API wants a prominent comment
clarifying the limited applicability of this API. I.e. that it is not
for general purpose usage, not for VMs, and only for select bare metal
scenarios that instantaneously invalidate wide swaths of memory.
Otherwise, I can now see how this looks like a potentially scary
expansion of the usage of wbinvd.
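
A sketch of the kind of warning such a comment could carry, placed above
the x86 definition (the wording is illustrative, not from the patch):

  /*
   * Not for general purpose use.  This is a last-resort, full write-back
   * and invalidate of every CPU cache, intended only for a handful of
   * bare metal scenarios (e.g. nvdimm/CXL secure erase and unlock) that
   * instantaneously change the contents of wide swaths of physical
   * memory.  Never use it from a VM, from hot paths, or as a substitute
   * for the ranged cache maintenance APIs above.
   */
  #define flush_all_caches() \
          do { wbinvd_on_all_cpus(); } while (0)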
Davidlohr Bueso Aug. 16, 2022, 5:52 p.m. UTC | #5
On Tue, 16 Aug 2022, Dan Williams wrote:

>On Tue, Aug 16, 2022 at 10:30 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
>>
>> On Tue, 16 Aug 2022, Dan Williams wrote:
>>
>> >Peter Zijlstra wrote:
>> >> On Mon, Aug 15, 2022 at 09:07:06AM -0700, Davidlohr Bueso wrote:
>> >> > diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
>> >> > index b192d917a6d0..ce2ec9556093 100644
>> >> > --- a/arch/x86/include/asm/cacheflush.h
>> >> > +++ b/arch/x86/include/asm/cacheflush.h
>> >> > @@ -10,4 +10,7 @@
>> >> >
>> >> >  void clflush_cache_range(void *addr, unsigned int size);
>> >> >
>> >> > +#define flush_all_caches() \
>> >> > +  do { wbinvd_on_all_cpus(); } while(0)
>> >> > +
>> >>
>> >> This is horrific... we've done our utmost best to remove all WBINVD
>> >> usage and here you're adding it back in the most horrible form possible
>> >> ?!?
>> >>
>> >> Please don't do this, do *NOT* use WBINVD.
>> >
>> >Unfortunately there are few good options here, and the changelog did
>> >not make clear that this is continuing legacy [1], not adding new wbinvd
>> >usage.
>>
>> While I was hoping that it was obvious from the intel.c changes that this
>> was not a new wbinvd, I can certainly improve the changelog with the below.
>
>I also think this cache_flush_region() API wants a prominent comment
>clarifying the limited applicability of this API. I.e. that it is not
>for general purpose usage, not for VMs, and only for select bare metal
>scenarios that instantaneously invalidate wide swaths of memory.
>Otherwise, I can now see how this looks like a potentially scary
>expansion of the usage of wbinvd.

Sure.

Also, in the future we might be able to bypass this hammer in the presence
of persistent cpu caches.

Thanks,
Davidlohr
Dan Williams Aug. 16, 2022, 6:49 p.m. UTC | #6
Davidlohr Bueso wrote:
> On Tue, 16 Aug 2022, Dan Williams wrote:
> 
> >On Tue, Aug 16, 2022 at 10:30 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
> >>
> >> On Tue, 16 Aug 2022, Dan Williams wrote:
> >>
> >> >Peter Zijlstra wrote:
> >> >> On Mon, Aug 15, 2022 at 09:07:06AM -0700, Davidlohr Bueso wrote:
> >> >> > diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> >> >> > index b192d917a6d0..ce2ec9556093 100644
> >> >> > --- a/arch/x86/include/asm/cacheflush.h
> >> >> > +++ b/arch/x86/include/asm/cacheflush.h
> >> >> > @@ -10,4 +10,7 @@
> >> >> >
> >> >> >  void clflush_cache_range(void *addr, unsigned int size);
> >> >> >
> >> >> > +#define flush_all_caches() \
> >> >> > +  do { wbinvd_on_all_cpus(); } while(0)
> >> >> > +
> >> >>
> >> >> This is horrific... we've done our utmost best to remove all WBINVD
> >> >> usage and here you're adding it back in the most horrible form possible
> >> >> ?!?
> >> >>
> >> >> Please don't do this, do *NOT* use WBINVD.
> >> >
> >> >Unfortunately there are few good options here, and the changelog did
> >> >not make clear that this is continuing legacy [1], not adding new wbinvd
> >> >usage.
> >>
> >> While I was hoping that it was obvious from the intel.c changes that this
> >> was not a new wbinvd, I can certainly improve the changelog with the below.
> >
> >I also think this cache_flush_region() API wants a prominent comment
> >clarifying the limited applicability of this API. I.e. that it is not
> >for general purpose usage, not for VMs, and only for select bare metal
> >scenarios that instantaneously invalidate wide swaths of memory.
> >Otherwise, I can now see how this looks like a potentially scary
> >expansion of the usage of wbinvd.
> 
> Sure.
> 
> Also, in the future we might be able to bypass this hammer in the presence
> of persistent cpu caches.

What would have helped is if the secure-erase and unlock definition in
the specification mandated that the device emit cache invalidations for
everything it has mapped when it is erased. However, that has some
holes, and it also makes me think there is a gap in the current region
provisioning code. If I have device-A mapped at physical-address-X, then
tear that down and instantiate device-B at that same physical address,
there needs to be a CPU cache invalidation between those two events.
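
Something along these lines would be needed to close that gap; every
function and type name below is hypothetical and only shows where the
invalidation has to sit between the two events:

  /* Hypothetical ordering sketch; none of these helpers exist today. */
  static int reassign_hpa_window(struct hpa_window *win,
                                 struct device *old_dev,
                                 struct device *new_dev)
  {
          int rc;

          /* device-A unmapped from physical-address-X */
          rc = teardown_region_mapping(win, old_dev);
          if (rc)
                  return rc;

          /*
           * Between unmapping device-A and mapping device-B at the same
           * host physical address, stale cachelines for that range must
           * be invalidated, otherwise reads of device-B can return
           * device-A data still sitting in the CPU caches.
           */
          flush_all_caches();

          /* device-B instantiated at the same physical address */
          return setup_region_mapping(win, new_dev);
  }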
Peter Zijlstra Aug. 17, 2022, 7:49 a.m. UTC | #7
On Tue, Aug 16, 2022 at 10:42:03AM -0700, Dan Williams wrote:

> I also think this cache_flush_region() API wants a prominent comment
> clarifying the limited applicability of this API. I.e. that it is not
> for general purpose usage, not for VMs, and only for select bare metal
> scenarios that instantaneously invalidate wide swaths of memory.
> Otherwise, I can now see how this looks like a potentially scary
> expansion of the usage of wbinvd.

This; because adding a generic API like this makes it ripe for usage.
And this is absolutely the very last thing we want used.
Peter Zijlstra Aug. 17, 2022, 7:53 a.m. UTC | #8
On Tue, Aug 16, 2022 at 11:49:59AM -0700, Dan Williams wrote:

> What would have helped is if the secure-erase and unlock definition in
> the specification mandated that the device emit cache invalidations for
> everything it has mapped when it is erased. However, that has some
> holes, and it also makes me think there is a gap in the current region
> provisioning code. If I have device-A mapped at physical-address-X and then
> tear that down and instantiate device-B at that same physical address
> there needs to be CPU cache invalidation between those 2 events.

Can we pretty please get those holes fixed ASAP such that future
generations can avoid the WBINVD nonsense?

Patch

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index b192d917a6d0..ce2ec9556093 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -10,4 +10,7 @@ 

  void clflush_cache_range(void *addr, unsigned int size);

+#define flush_all_caches() \
+	do { wbinvd_on_all_cpus(); } while(0)
+
  #endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c
index 8dd792a55730..f2f6c31e6ab7 100644
--- a/drivers/acpi/nfit/intel.c
+++ b/drivers/acpi/nfit/intel.c
@@ -4,6 +4,7 @@ 
  #include <linux/ndctl.h>
  #include <linux/acpi.h>
  #include <asm/smp.h>
+#include <linux/cacheflush.h>
  #include "intel.h"
  #include "nfit.h"

@@ -190,8 +191,6 @@  static int intel_security_change_key(struct nvdimm *nvdimm,
	}
  }

-static void nvdimm_invalidate_cache(void);
-
  static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
		const struct nvdimm_key_data *key_data)
  {
@@ -210,6 +209,9 @@  static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
	};
	int rc;

+	if (!flush_all_caches_capable())
+		return -EINVAL;
+
	if (!test_bit(NVDIMM_INTEL_UNLOCK_UNIT, &nfit_mem->dsm_mask))
		return -ENOTTY;

@@ -228,7 +230,7 @@  static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
	}

	/* DIMM unlocked, invalidate all CPU caches before we read it */
-	nvdimm_invalidate_cache();
+	flush_all_caches();

	return 0;
  }
@@ -294,11 +296,14 @@  static int __maybe_unused intel_security_erase(struct nvdimm *nvdimm,
		},
	};

+	if (!flush_all_caches_capable())
+		return -EINVAL;
+
	if (!test_bit(cmd, &nfit_mem->dsm_mask))
		return -ENOTTY;

	/* flush all cache before we erase DIMM */
-	nvdimm_invalidate_cache();
+	flush_all_caches();
	memcpy(nd_cmd.cmd.passphrase, key->data,
			sizeof(nd_cmd.cmd.passphrase));
	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
@@ -318,7 +323,7 @@  static int __maybe_unused intel_security_erase(struct nvdimm *nvdimm,
	}

	/* DIMM erased, invalidate all CPU caches before we read it */
-	nvdimm_invalidate_cache();
+	flush_all_caches();
	return 0;
  }

@@ -338,6 +343,9 @@  static int __maybe_unused intel_security_query_overwrite(struct nvdimm *nvdimm)
		},
	};

+	if (!flush_all_caches_capable())
+		return -EINVAL;
+
	if (!test_bit(NVDIMM_INTEL_QUERY_OVERWRITE, &nfit_mem->dsm_mask))
		return -ENOTTY;

@@ -355,7 +363,7 @@  static int __maybe_unused intel_security_query_overwrite(struct nvdimm *nvdimm)
	}

	/* flush all cache before we make the nvdimms available */
-	nvdimm_invalidate_cache();
+	flush_all_caches();
	return 0;
  }

@@ -377,11 +385,14 @@  static int __maybe_unused intel_security_overwrite(struct nvdimm *nvdimm,
		},
	};

+	if (!flush_all_caches_capable())
+		return -EINVAL;
+
	if (!test_bit(NVDIMM_INTEL_OVERWRITE, &nfit_mem->dsm_mask))
		return -ENOTTY;

	/* flush all cache before we erase DIMM */
-	nvdimm_invalidate_cache();
+	flush_all_caches();
	memcpy(nd_cmd.cmd.passphrase, nkey->data,
			sizeof(nd_cmd.cmd.passphrase));
	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
@@ -401,22 +412,6 @@  static int __maybe_unused intel_security_overwrite(struct nvdimm *nvdimm,
	}
  }

-/*
- * TODO: define a cross arch wbinvd equivalent when/if
- * NVDIMM_FAMILY_INTEL command support arrives on another arch.
- */
-#ifdef CONFIG_X86
-static void nvdimm_invalidate_cache(void)
-{
-	wbinvd_on_all_cpus();
-}
-#else
-static void nvdimm_invalidate_cache(void)
-{
-	WARN_ON_ONCE("cache invalidation required after unlock\n");
-}
-#endif
-
  static const struct nvdimm_security_ops __intel_security_ops = {
	.get_flags = intel_security_flags,
	.freeze = intel_security_freeze,
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4f07afacbc23..f249142b4908 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -115,4 +115,26 @@  static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
	memcpy(dst, src, len)
  #endif

+/*
+ * Flush the entire cache hierarchy across all CPUs. It is
+ * considered a big hammer (latency and performance). Unlike
+ * the APIs above, this function can be defined on architectures
+ * which have VIPT or PIPT caches, and thus is beyond the scope
+ * of virtual to physical mappings/page tables changing.
+ *
+ * The limitation here is that architectures that make use of
+ * it must actually be able to comply with the semantics, i.e.
+ * those whose caches are in a consistent state. The caller can
+ * verify the situation early on.
+ */
+#ifndef flush_all_caches
+# define flush_all_caches_capable() false
+static inline void flush_all_caches(void)
+{
+	WARN_ON_ONCE("cache invalidation required\n");
+}
+#else
+# define flush_all_caches_capable() true
+#endif
+
  #endif /* _ASM_GENERIC_CACHEFLUSH_H */