Message ID | 20200610062343.492293-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
Series | Support new pmem flush and sync instructions for POWER |
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > This patch series enables the usage os new pmem flush and sync instructions on POWER > architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps) > that can be used to write modified locations back to persistent storage. Additionally, > POWER10 also introduce phwsync and plwsync which can be used to establish order of these > writes to persistent storage. > > This series exposes these instructions to the rest of the kernel. The existing > dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate > synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions > are added as a variant of the old ones that old hardware won't differentiate. > > On POWER10, pmem devices will be represented by a different device tree compat > strings. This ensures that older kernels won't initialize pmem devices on POWER10. > > W.r.t userspace we want to make sure applications are enabled to use MAP_SYNC only > if they are using the new instructions. To avoid the wrong usage of MAP_SYNC on > newer hardware, we disable MAP_SYNC by default on newer hardware. The namespace specific > attribute /sys/block/pmem0/dax/sync_fault can be used to enable MAP_SYNC later. > > With this: > 1) vPMEM continues to work since it is a volatile region. That > doesn't need any flush instructions. > > 2) pmdk and other user applications get updated to use new instructions > and updated packages are made available to all distributions > > 3) On newer hardware, the device will appear with a new compat string. > Hence older distributions won't initialize pmem on newer hardware. > > 4) If we have a newer kernel with an older distro, we use the per > namespace sysfs knob that prevents the usage of MAP_SYNC. > > 5) Sometime in the future, we mark the CONFIG_ARCH_MAP_SYNC_DISABLE=n > on ppc64 when we are confident that everybody is using the new flush > instruction. 
>
> Changes from V4:
> * Add namespace specific synchronous fault control.
>
> Changes from V3:
> * Add new compat string to be used for the device.
> * Use arch_pmem_flush_barrier() in dm-writecache.
>
> Aneesh Kumar K.V (10):
>   powerpc/pmem: Restrict papr_scm to P8 and above.
>   powerpc/pmem: Add new instructions for persistent storage and sync
>   powerpc/pmem: Add flush routines using new pmem store and sync
>     instruction
>   libnvdimm/nvdimm/flush: Allow architecture to override the flush
>     barrier
>   powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
>     instruction.
>   powerpc/pmem: Avoid the barrier in flush routines
>   powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
>     flush functions.
>   libnvdimm/dax: Add a dax flag to control synchronous fault support
>   powerpc/pmem: Disable synchronous fault by default
>   powerpc/pmem: Initialize pmem device on newer hardware
>
>  arch/powerpc/include/asm/cacheflush.h     | 10 ++++
>  arch/powerpc/include/asm/ppc-opcode.h     | 12 ++++
>  arch/powerpc/lib/pmem.c                   | 46 ++++++++++++--
>  arch/powerpc/platforms/Kconfig.cputype    |  9 +++
>  arch/powerpc/platforms/pseries/papr_scm.c | 31 +++++++++-
>  arch/powerpc/platforms/pseries/pmem.c     |  6 ++
>  drivers/dax/bus.c                         |  2 +-
>  drivers/dax/super.c                       | 73 +++++++++++++++++++++++
>  drivers/md/dm-writecache.c                |  2 +-
>  drivers/nvdimm/of_pmem.c                  |  8 +++
>  drivers/nvdimm/pmem.c                     |  4 ++
>  drivers/nvdimm/region_devs.c              | 24 ++++++--
>  include/linux/dax.h                       | 16 +++++
>  include/linux/libnvdimm.h                 |  8 +++
>  mm/Kconfig                                |  3 +
>  15 files changed, 243 insertions(+), 11 deletions(-)

Ping. Are we good with the approach here?

-aneesh
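
For readers following the MAP_SYNC discussion in the cover letter, here is a minimal userspace sketch of what "MAP_SYNC may be disabled" means for an application: request a synchronous mapping and fall back to a plain shared mapping when the kernel refuses it. This is not part of the series. The names struct pmem_map, pmem_map_file() and pmem_persist() are hypothetical, and the fallback flag values are an assumption taken from the generic Linux UAPI header (asm-generic/mman-common.h) for libc headers that lack them. The sketch always persists through msync(), which is valid on either kind of mapping; flushing from userspace with the new instructions is only safe when is_sync is set and is not shown here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback values for older libc headers (assumption: generic Linux
 * UAPI values from asm-generic/mman-common.h). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/* Hypothetical bookkeeping for one mapping. */
struct pmem_map {
	void *addr;
	size_t len;
	int is_sync;	/* 1 if the kernel granted MAP_SYNC */
};

/* Map len bytes of fd, preferring a synchronous (MAP_SYNC) mapping. */
static int pmem_map_file(int fd, size_t len, struct pmem_map *m)
{
	void *p;

	assert(m != NULL && len > 0);

	/* MAP_SHARED_VALIDATE makes the kernel reject flags it cannot
	 * honour instead of silently ignoring them. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	m->is_sync = (p != MAP_FAILED);

	if (p == MAP_FAILED) {
		/* MAP_SYNC refused: not a DAX file, an older kernel, or
		 * synchronous faults disabled for this device. */
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return -1;
	}

	m->addr = p;
	m->len = len;
	return 0;
}

/* Make stores to the mapping durable. msync() works for both mapping
 * kinds, so this sketch uses it unconditionally. */
static int pmem_persist(const struct pmem_map *m)
{
	return msync(m->addr, m->len, MS_SYNC);
}
```

An application would call pmem_map_file() once per file and check is_sync before choosing a userspace flush path; when is_sync is 0 it has to keep using msync() or fsync() for durability.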