
[updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

Message ID 20200629202901.83516-1-aneesh.kumar@linux.ibm.com (mailing list archive)
State New, archived
Series [updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

Commit Message

Aneesh Kumar K.V June 29, 2020, 8:29 p.m. UTC
Architectures like ppc64 provide persistent-memory-specific barriers
that ensure that all stores for which the modifications are written
to persistent storage by preceding dcbfps and dcbstps instructions
have updated persistent storage before any data access or data
transfer caused by subsequent instructions is initiated. This is in
addition to the ordering done by wmb().

Update the nvdimm core so that architectures can use barriers other
than wmb() to ensure all previous writes are architecturally visible
for the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/md/dm-writecache.c   | 2 +-
 drivers/nvdimm/region_devs.c | 8 ++++----
 include/linux/libnvdimm.h    | 4 ++++
 3 files changed, 9 insertions(+), 5 deletions(-)
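
For context, the write path this barrier sits in looks roughly like the
sketch below. The wrapper function and its parameter names are
illustrative only (they are not part of this patch); the sequence of
memcpy_flushcache() followed by the overridable barrier mirrors the
comment in the generic_nvdimm_flush() hunk.

#include <linux/string.h>	/* memcpy_flushcache() */
#include <linux/libnvdimm.h>	/* arch_pmem_flush_barrier(), per this patch */

/*
 * Illustrative sketch only: pmem_copy_and_fence() is a hypothetical
 * helper, not code from this series.
 */
static void pmem_copy_and_fence(void *pmem_dst, const void *src, size_t len)
{
	/* Non-temporal copy that bypasses the CPU cache. */
	memcpy_flushcache(pmem_dst, src, len);

	/*
	 * Fence the preceding stores so they are architecturally visible
	 * for the platform buffer flush; this falls back to wmb() on
	 * architectures without a pmem-specific barrier.
	 */
	arch_pmem_flush_barrier();
}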

Comments

Dan Williams June 30, 2020, 1:32 a.m. UTC | #1
On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Architectures like ppc64 provide persistent memory specific barriers
> that will ensure that all stores for which the modifications are
> written to persistent storage by preceding dcbfps and dcbstps
> instructions have updated persistent storage before any data
> access or data transfer caused by subsequent instructions is initiated.
> This is in addition to the ordering done by wmb()
>
> Update nvdimm core such that architecture can use barriers other than
> wmb to ensure all previous writes are architecturally visible for
> the platform buffer flush.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/md/dm-writecache.c   | 2 +-
>  drivers/nvdimm/region_devs.c | 8 ++++----
>  include/linux/libnvdimm.h    | 4 ++++
>  3 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 74f3c506f084..8c6b6dce64e2 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
>  static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
>  {
>         if (WC_MODE_PMEM(wc))
> -               wmb();
> +               arch_pmem_flush_barrier();
>         else
>                 ssd_commit_flushed(wc, wait_for_ios);
>  }
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 4502f9c4708d..b308ad09b63d 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
>         idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>
>         /*
> -        * The first wmb() is needed to 'sfence' all previous writes
> -        * such that they are architecturally visible for the platform
> -        * buffer flush.  Note that we've already arranged for pmem
> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all
> +        * previous writes such that they are architecturally visible for
> +        * the platform buffer flush. Note that we've already arranged for pmem
>          * writes to avoid the cache via memcpy_flushcache().  The final
>          * wmb() ensures ordering for the NVDIMM flush write.
>          */
> -       wmb();
> +       arch_pmem_flush_barrier();
>         for (i = 0; i < nd_region->ndr_mappings; i++)
>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 18da4059be09..66f6c65bd789 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>  }
>  #endif
>
> +#ifndef arch_pmem_flush_barrier
> +#define arch_pmem_flush_barrier() wmb()
> +#endif

I think it is out of place to define this in libnvdimm.h, and it is odd
to give it such a long name. The other pmem api helpers like
arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
libnvdimm driver operations; this barrier is just an instruction and
is closer to wmb() than to the pmem api routines.

Since it is a store fence for pmem, let's just call it pmem_wmb()
and define the generic version in include/linux/compiler.h. It should
probably also be documented alongside dma_wmb() in
Documentation/memory-barriers.txt, explaining why code would use it
over wmb() and why a symmetric pmem_rmb() is not needed.
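
A minimal sketch of the shape being suggested here, assuming the generic
fallback sits next to the other barrier definitions (the revised patch
later in this thread ends up placing it in include/asm-generic/barrier.h
rather than include/linux/compiler.h):

/*
 * Generic fallback: a plain write barrier unless the architecture
 * provides a stronger pmem-specific store fence.
 */
#ifndef pmem_wmb
#define pmem_wmb()	wmb()
#endif
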
Aneesh Kumar K.V June 30, 2020, 5:01 a.m. UTC | #2
Dan Williams <dan.j.williams@intel.com> writes:

> On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Architectures like ppc64 provide persistent memory specific barriers
>> that will ensure that all stores for which the modifications are
>> written to persistent storage by preceding dcbfps and dcbstps
>> instructions have updated persistent storage before any data
>> access or data transfer caused by subsequent instructions is initiated.
>> This is in addition to the ordering done by wmb()
>>
>> Update nvdimm core such that architecture can use barriers other than
>> wmb to ensure all previous writes are architecturally visible for
>> the platform buffer flush.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  drivers/md/dm-writecache.c   | 2 +-
>>  drivers/nvdimm/region_devs.c | 8 ++++----
>>  include/linux/libnvdimm.h    | 4 ++++
>>  3 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
>> index 74f3c506f084..8c6b6dce64e2 100644
>> --- a/drivers/md/dm-writecache.c
>> +++ b/drivers/md/dm-writecache.c
>> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
>>  static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
>>  {
>>         if (WC_MODE_PMEM(wc))
>> -               wmb();
>> +               arch_pmem_flush_barrier();
>>         else
>>                 ssd_commit_flushed(wc, wait_for_ios);
>>  }
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index 4502f9c4708d..b308ad09b63d 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
>>         idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>>
>>         /*
>> -        * The first wmb() is needed to 'sfence' all previous writes
>> -        * such that they are architecturally visible for the platform
>> -        * buffer flush.  Note that we've already arranged for pmem
>> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all
>> +        * previous writes such that they are architecturally visible for
>> +        * the platform buffer flush. Note that we've already arranged for pmem
>>          * writes to avoid the cache via memcpy_flushcache().  The final
>>          * wmb() ensures ordering for the NVDIMM flush write.
>>          */
>> -       wmb();
>> +       arch_pmem_flush_barrier();
>>         for (i = 0; i < nd_region->ndr_mappings; i++)
>>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
>>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> index 18da4059be09..66f6c65bd789 100644
>> --- a/include/linux/libnvdimm.h
>> +++ b/include/linux/libnvdimm.h
>> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>>  }
>>  #endif
>>
>> +#ifndef arch_pmem_flush_barrier
>> +#define arch_pmem_flush_barrier() wmb()
>> +#endif
>
> I think it is out of place to define this in libnvdimm.h and it is odd
> to give it such a long name. The other pmem api helpers like
> arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
> libnvdimm driver operations, this barrier is just an instruction and
> is closer to wmb() than the pmem api routine.
>
> Since it is a store fence for pmem, so let's just call it pmem_wmb()
> and define the generic version in include/linux/compiler.h. It should
> probably also be documented alongside dma_wmb() in
> Documentation/memory-barriers.txt about why code would use it over
> wmb(), and why a symmetric pmem_rmb() is not needed.

How about the below? I used pmem_barrier() instead of pmem_wmb(). I
guess we wanted this to order any data access, not just the following
stores to persistent storage? W.r.t. why a symmetric pmem_rmb() is not
needed, I was not sure how to explain that. Are you suggesting to explain
why a read/load from persistent storage doesn't need to wait for
pmem_barrier()?

modified   Documentation/memory-barriers.txt
@@ -1935,6 +1935,16 @@ There are some more advanced barrier functions:
      relaxed I/O accessors and the Documentation/DMA-API.txt file for more
      information on consistent memory.
 
+ (*) pmem_barrier();
+
+     These are for use with persistent memory to esure the ordering of stores
+     to persistent memory region.
+
+     For example, after a non temporal write to persistent storage we use pmem_barrier()
+     to ensures that stores have updated the persistent storage before
+     any data access or data transfer caused by subsequent instructions is initiated.
+
 
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
modified   arch/powerpc/include/asm/barrier.h
@@ -97,6 +97,19 @@ do {									\
 #define barrier_nospec()
 #endif /* CONFIG_PPC_BARRIER_NOSPEC */
 
+/*
+ * pmem_barrier() ensures that all stores for which the modification
+ * are written to persistent storage by preceding dcbfps/dcbstps
+ * instructions have updated persistent storage before any data
+ * access or data transfer caused by subsequent instructions is
+ * initiated.
+ */
+#define pmem_barrier pmem_barrier
+static inline void pmem_barrier(void)
+{
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
 #include <asm-generic/barrier.h>
 
 #endif /* _ASM_POWERPC_BARRIER_H */
modified   include/asm-generic/barrier.h
@@ -257,5 +257,16 @@ do {									\
 })
 #endif
 
+/*
+ * pmem_barrier() ensures that all stores for which the modification
+ * are written to persistent storage by preceding instructions have
+ * updated persistent storage before any data  access or data transfer
+ * caused by subsequent instructions is
+ * initiated.
+ */
+#ifndef pmem_barrier
+#define pmem_barrier  wmb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
Dan Williams June 30, 2020, 7:06 a.m. UTC | #3
On Mon, Jun 29, 2020 at 10:02 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> Architectures like ppc64 provide persistent memory specific barriers
> >> that will ensure that all stores for which the modifications are
> >> written to persistent storage by preceding dcbfps and dcbstps
> >> instructions have updated persistent storage before any data
> >> access or data transfer caused by subsequent instructions is initiated.
> >> This is in addition to the ordering done by wmb()
> >>
> >> Update nvdimm core such that architecture can use barriers other than
> >> wmb to ensure all previous writes are architecturally visible for
> >> the platform buffer flush.
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >> ---
> >>  drivers/md/dm-writecache.c   | 2 +-
> >>  drivers/nvdimm/region_devs.c | 8 ++++----
> >>  include/linux/libnvdimm.h    | 4 ++++
> >>  3 files changed, 9 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> >> index 74f3c506f084..8c6b6dce64e2 100644
> >> --- a/drivers/md/dm-writecache.c
> >> +++ b/drivers/md/dm-writecache.c
> >> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
> >>  static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> >>  {
> >>         if (WC_MODE_PMEM(wc))
> >> -               wmb();
> >> +               arch_pmem_flush_barrier();
> >>         else
> >>                 ssd_commit_flushed(wc, wait_for_ios);
> >>  }
> >> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> >> index 4502f9c4708d..b308ad09b63d 100644
> >> --- a/drivers/nvdimm/region_devs.c
> >> +++ b/drivers/nvdimm/region_devs.c
> >> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
> >>         idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
> >>
> >>         /*
> >> -        * The first wmb() is needed to 'sfence' all previous writes
> >> -        * such that they are architecturally visible for the platform
> >> -        * buffer flush.  Note that we've already arranged for pmem
> >> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all
> >> +        * previous writes such that they are architecturally visible for
> >> +        * the platform buffer flush. Note that we've already arranged for pmem
> >>          * writes to avoid the cache via memcpy_flushcache().  The final
> >>          * wmb() ensures ordering for the NVDIMM flush write.
> >>          */
> >> -       wmb();
> >> +       arch_pmem_flush_barrier();
> >>         for (i = 0; i < nd_region->ndr_mappings; i++)
> >>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
> >>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >> index 18da4059be09..66f6c65bd789 100644
> >> --- a/include/linux/libnvdimm.h
> >> +++ b/include/linux/libnvdimm.h
> >> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
> >>  }
> >>  #endif
> >>
> >> +#ifndef arch_pmem_flush_barrier
> >> +#define arch_pmem_flush_barrier() wmb()
> >> +#endif
> >
> > I think it is out of place to define this in libnvdimm.h and it is odd
> > to give it such a long name. The other pmem api helpers like
> > arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
> > libnvdimm driver operations, this barrier is just an instruction and
> > is closer to wmb() than the pmem api routine.
> >
> > Since it is a store fence for pmem, so let's just call it pmem_wmb()
> > and define the generic version in include/linux/compiler.h. It should
> > probably also be documented alongside dma_wmb() in
> > Documentation/memory-barriers.txt about why code would use it over
> > wmb(), and why a symmetric pmem_rmb() is not needed.
>
> How about the below? I used pmem_barrier() instead of pmem_wmb().

Why? A barrier() is a bi-directional ordering mechanic for reads and
writes, and the proposed mechanism only orders writes + persistence.
Otherwise the default fallback to wmb() on archs that don't override it
does not make sense.

> I
> guess we wanted this to order() any data access not jus the following
> stores to persistent storage?

Why?

> W.r.t why a symmetric pmem_rmb() is not
> needed I was not sure how to explain that. Are you suggesting to explain
> why a read/load from persistent storage don't want to wait for
> pmem_barrier() ?

I would expect the explanation to be that a typical rmb() is
sufficient, and that there is no pmem-specific semantic for read
ordering beyond the normal read-barrier semantics.

>
> modified   Documentation/memory-barriers.txt
> @@ -1935,6 +1935,16 @@ There are some more advanced barrier functions:
>       relaxed I/O accessors and the Documentation/DMA-API.txt file for more
>       information on consistent memory.
>
> + (*) pmem_barrier();
> +
> +     These are for use with persistent memory to esure the ordering of stores
> +     to persistent memory region.

If it were just ordering, I would expect a typical wmb() to be
sufficient; why is the pmem-specific instruction needed? I thought it
was handshaking with hardware to ensure acceptance into a persistence
domain *in addition* to ordering the stores.

> +     For example, after a non temporal write to persistent storage we use pmem_barrier()
> +     to ensures that stores have updated the persistent storage before
> +     any data access or data transfer caused by subsequent instructions is initiated.

Isn't the ordering aspect irrelevant relative to traditional wmb()?
For example, if you used the wrong sync instruction, the store ordering
would still be correct; it would just not persist at the same time as
the barrier completes. Or am I misunderstanding how these new
instructions are distinct?

> +
>
>  ===============================
>  IMPLICIT KERNEL MEMORY BARRIERS
> modified   arch/powerpc/include/asm/barrier.h
> @@ -97,6 +97,19 @@ do {                                                                 \
>  #define barrier_nospec()
>  #endif /* CONFIG_PPC_BARRIER_NOSPEC */
>
> +/*
> + * pmem_barrier() ensures that all stores for which the modification
> + * are written to persistent storage by preceding dcbfps/dcbstps
> + * instructions have updated persistent storage before any data
> + * access or data transfer caused by subsequent instructions is
> + * initiated.
> + */
> +#define pmem_barrier pmem_barrier
> +static inline void pmem_barrier(void)
> +{
> +       asm volatile(PPC_PHWSYNC ::: "memory");
> +}
> +
>  #include <asm-generic/barrier.h>
>
>  #endif /* _ASM_POWERPC_BARRIER_H */
> modified   include/asm-generic/barrier.h
> @@ -257,5 +257,16 @@ do {                                                                       \
>  })
>  #endif
>
> +/*
> + * pmem_barrier() ensures that all stores for which the modification
> + * are written to persistent storage by preceding instructions have
> + * updated persistent storage before any data  access or data transfer
> + * caused by subsequent instructions is
> + * initiated.
> + */
> +#ifndef pmem_barrier
> +#define pmem_barrier  wmb()
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */
>
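
To ground the distinction raised above between ordering and acceptance
into a persistence domain, a hedged sketch of what the two architectures
end up providing, using the pmem_wmb() name the thread eventually
settles on (the ppc64 body follows the barrier.h hunk proposed earlier;
the x86 note reflects the ADR-domain behaviour described later in the
thread):

/* ppc64: a persistent-memory heavyweight sync (PPC_PHWSYNC, per the
 * barrier.h hunk proposed earlier in this thread) that also waits for
 * prior dcbfps/dcbstps flushes to reach persistent storage.
 */
#define pmem_wmb pmem_wmb
static inline void pmem_wmb(void)
{
	asm volatile(PPC_PHWSYNC ::: "memory");
}

/* x86: no override needed -- the generic wmb() fallback (an sfence)
 * already orders stores into buffers/queues inside the platform's
 * ADR / durability domain.
 */
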
Aneesh Kumar K.V June 30, 2020, 7:22 a.m. UTC | #4
On 6/30/20 12:36 PM, Dan Williams wrote:
> On Mon, Jun 29, 2020 at 10:02 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>
>>>> Architectures like ppc64 provide persistent memory specific barriers
>>>> that will ensure that all stores for which the modifications are
>>>> written to persistent storage by preceding dcbfps and dcbstps
>>>> instructions have updated persistent storage before any data
>>>> access or data transfer caused by subsequent instructions is initiated.
>>>> This is in addition to the ordering done by wmb()
>>>>
>>>> Update nvdimm core such that architecture can use barriers other than
>>>> wmb to ensure all previous writes are architecturally visible for
>>>> the platform buffer flush.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>   drivers/md/dm-writecache.c   | 2 +-
>>>>   drivers/nvdimm/region_devs.c | 8 ++++----
>>>>   include/linux/libnvdimm.h    | 4 ++++
>>>>   3 files changed, 9 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
>>>> index 74f3c506f084..8c6b6dce64e2 100644
>>>> --- a/drivers/md/dm-writecache.c
>>>> +++ b/drivers/md/dm-writecache.c
>>>> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
>>>>   static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
>>>>   {
>>>>          if (WC_MODE_PMEM(wc))
>>>> -               wmb();
>>>> +               arch_pmem_flush_barrier();
>>>>          else
>>>>                  ssd_commit_flushed(wc, wait_for_ios);
>>>>   }
>>>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>>>> index 4502f9c4708d..b308ad09b63d 100644
>>>> --- a/drivers/nvdimm/region_devs.c
>>>> +++ b/drivers/nvdimm/region_devs.c
>>>> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
>>>>          idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>>>>
>>>>          /*
>>>> -        * The first wmb() is needed to 'sfence' all previous writes
>>>> -        * such that they are architecturally visible for the platform
>>>> -        * buffer flush.  Note that we've already arranged for pmem
>>>> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all
>>>> +        * previous writes such that they are architecturally visible for
>>>> +        * the platform buffer flush. Note that we've already arranged for pmem
>>>>           * writes to avoid the cache via memcpy_flushcache().  The final
>>>>           * wmb() ensures ordering for the NVDIMM flush write.
>>>>           */
>>>> -       wmb();
>>>> +       arch_pmem_flush_barrier();
>>>>          for (i = 0; i < nd_region->ndr_mappings; i++)
>>>>                  if (ndrd_get_flush_wpq(ndrd, i, 0))
>>>>                          writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
>>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>>>> index 18da4059be09..66f6c65bd789 100644
>>>> --- a/include/linux/libnvdimm.h
>>>> +++ b/include/linux/libnvdimm.h
>>>> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>>>>   }
>>>>   #endif
>>>>
>>>> +#ifndef arch_pmem_flush_barrier
>>>> +#define arch_pmem_flush_barrier() wmb()
>>>> +#endif
>>>
>>> I think it is out of place to define this in libnvdimm.h and it is odd
>>> to give it such a long name. The other pmem api helpers like
>>> arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
>>> libnvdimm driver operations, this barrier is just an instruction and
>>> is closer to wmb() than the pmem api routine.
>>>
>>> Since it is a store fence for pmem, so let's just call it pmem_wmb()
>>> and define the generic version in include/linux/compiler.h. It should
>>> probably also be documented alongside dma_wmb() in
>>> Documentation/memory-barriers.txt about why code would use it over
>>> wmb(), and why a symmetric pmem_rmb() is not needed.
>>
>> How about the below? I used pmem_barrier() instead of pmem_wmb().
> 
> Why? A barrier() is a bi-directional ordering mechanic for reads and
> writes, and the proposed semantics mechanism only orders writes +
> persistence. Otherwise the default fallback to wmb() on archs that
> don't override it does not make sense.
> 
>> I
>> guess we wanted this to order() any data access not jus the following
>> stores to persistent storage?
> 
> Why?
> 
>> W.r.t why a symmetric pmem_rmb() is not
>> needed I was not sure how to explain that. Are you suggesting to explain
>> why a read/load from persistent storage don't want to wait for
>> pmem_barrier() ?
> 
> I would expect that the explanation is that a typical rmb() is
> sufficient and that there is nothing pmem specific semantic for read
> ordering for pmem vs normal read-barrier semantics.
> 
>>
>> modified   Documentation/memory-barriers.txt
>> @@ -1935,6 +1935,16 @@ There are some more advanced barrier functions:
>>        relaxed I/O accessors and the Documentation/DMA-API.txt file for more
>>        information on consistent memory.
>>
>> + (*) pmem_barrier();
>> +
>> +     These are for use with persistent memory to esure the ordering of stores
>> +     to persistent memory region.
> 
> If it was just ordering I would expect a typical wmb() to be
> sufficient, why is the pmem-specific instruction needed? I thought it
> was handshaking with hardware to ensure acceptance into a persistence
> domain *in addition* to ordering the stores.
> 
>> +     For example, after a non temporal write to persistent storage we use pmem_barrier()
>> +     to ensures that stores have updated the persistent storage before
>> +     any data access or data transfer caused by subsequent instructions is initiated.
> 
> Isn't the ordering aspect is irrelevant relative to traditional wmb()?
> For example if you used the wrong sync instruction the store ordering
> will still be correct it would just not persist at the same time as
> barrier completes. Or am I misunderstanding how these new instructions
> are distinct?
> 

That is correct.

-aneesh
Aneesh Kumar K.V June 30, 2020, 7:53 a.m. UTC | #5
On 6/30/20 12:52 PM, Aneesh Kumar K.V wrote:
> On 6/30/20 12:36 PM, Dan Williams wrote:
>> On Mon, Jun 29, 2020 at 10:02 PM Aneesh Kumar K.V
>> <aneesh.kumar@linux.ibm.com> wrote:
>>>
>>> Dan Williams <dan.j.williams@intel.com> writes:
>>>
>>>> On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>
>>>>> Architectures like ppc64 provide persistent memory specific barriers
>>>>> that will ensure that all stores for which the modifications are
>>>>> written to persistent storage by preceding dcbfps and dcbstps
>>>>> instructions have updated persistent storage before any data
>>>>> access or data transfer caused by subsequent instructions is 
>>>>> initiated.
>>>>> This is in addition to the ordering done by wmb()
>>>>>
>>>>> Update nvdimm core such that architecture can use barriers other than
>>>>> wmb to ensure all previous writes are architecturally visible for
>>>>> the platform buffer flush.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>> ---
>>>>>   drivers/md/dm-writecache.c   | 2 +-
>>>>>   drivers/nvdimm/region_devs.c | 8 ++++----
>>>>>   include/linux/libnvdimm.h    | 4 ++++
>>>>>   3 files changed, 9 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
>>>>> index 74f3c506f084..8c6b6dce64e2 100644
>>>>> --- a/drivers/md/dm-writecache.c
>>>>> +++ b/drivers/md/dm-writecache.c
>>>>> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct 
>>>>> dm_writecache *wc)
>>>>>   static void writecache_commit_flushed(struct dm_writecache *wc, 
>>>>> bool wait_for_ios)
>>>>>   {
>>>>>          if (WC_MODE_PMEM(wc))
>>>>> -               wmb();
>>>>> +               arch_pmem_flush_barrier();
>>>>>          else
>>>>>                  ssd_commit_flushed(wc, wait_for_ios);
>>>>>   }
>>>>> diff --git a/drivers/nvdimm/region_devs.c 
>>>>> b/drivers/nvdimm/region_devs.c
>>>>> index 4502f9c4708d..b308ad09b63d 100644
>>>>> --- a/drivers/nvdimm/region_devs.c
>>>>> +++ b/drivers/nvdimm/region_devs.c
>>>>> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region 
>>>>> *nd_region)
>>>>>          idx = this_cpu_add_return(flush_idx, hash_32(current->pid 
>>>>> + idx, 8));
>>>>>
>>>>>          /*
>>>>> -        * The first wmb() is needed to 'sfence' all previous writes
>>>>> -        * such that they are architecturally visible for the platform
>>>>> -        * buffer flush.  Note that we've already arranged for pmem
>>>>> +        * The first arch_pmem_flush_barrier() is needed to 
>>>>> 'sfence' all
>>>>> +        * previous writes such that they are architecturally 
>>>>> visible for
>>>>> +        * the platform buffer flush. Note that we've already 
>>>>> arranged for pmem
>>>>>           * writes to avoid the cache via memcpy_flushcache().  The 
>>>>> final
>>>>>           * wmb() ensures ordering for the NVDIMM flush write.
>>>>>           */
>>>>> -       wmb();
>>>>> +       arch_pmem_flush_barrier();
>>>>>          for (i = 0; i < nd_region->ndr_mappings; i++)
>>>>>                  if (ndrd_get_flush_wpq(ndrd, i, 0))
>>>>>                          writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
>>>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>>>>> index 18da4059be09..66f6c65bd789 100644
>>>>> --- a/include/linux/libnvdimm.h
>>>>> +++ b/include/linux/libnvdimm.h
>>>>> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void 
>>>>> *addr, size_t size)
>>>>>   }
>>>>>   #endif
>>>>>
>>>>> +#ifndef arch_pmem_flush_barrier
>>>>> +#define arch_pmem_flush_barrier() wmb()
>>>>> +#endif
>>>>
>>>> I think it is out of place to define this in libnvdimm.h and it is odd
>>>> to give it such a long name. The other pmem api helpers like
>>>> arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
>>>> libnvdimm driver operations, this barrier is just an instruction and
>>>> is closer to wmb() than the pmem api routine.
>>>>
>>>> Since it is a store fence for pmem, so let's just call it pmem_wmb()
>>>> and define the generic version in include/linux/compiler.h. It should
>>>> probably also be documented alongside dma_wmb() in
>>>> Documentation/memory-barriers.txt about why code would use it over
>>>> wmb(), and why a symmetric pmem_rmb() is not needed.
>>>
>>> How about the below? I used pmem_barrier() instead of pmem_wmb().
>>
>> Why? A barrier() is a bi-directional ordering mechanic for reads and
>> writes, and the proposed semantics mechanism only orders writes +
>> persistence. Otherwise the default fallback to wmb() on archs that
>> don't override it does not make sense.
>>
>>> I
>>> guess we wanted this to order() any data access not jus the following
>>> stores to persistent storage?
>>
>> Why?
>>
>>> W.r.t why a symmetric pmem_rmb() is not
>>> needed I was not sure how to explain that. Are you suggesting to explain
>>> why a read/load from persistent storage don't want to wait for
>>> pmem_barrier() ?
>>
>> I would expect that the explanation is that a typical rmb() is
>> sufficient and that there is nothing pmem specific semantic for read
>> ordering for pmem vs normal read-barrier semantics.
>>

Should that be rmb()? An smp_rmb() would suffice, right?


-aneesh
Aneesh Kumar K.V June 30, 2020, 12:48 p.m. UTC | #6
Updated patch.

From 1e6aa6c4182e14ec5d6bf878ae44c3f69ebff745 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Date: Tue, 12 May 2020 20:58:33 +0530
Subject: [PATCH] libnvdimm/nvdimm/flush: Allow architecture to override the
 flush barrier

Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 Documentation/memory-barriers.txt | 14 ++++++++++++++
 drivers/md/dm-writecache.c        |  2 +-
 drivers/nvdimm/region_devs.c      |  8 ++++----
 include/asm-generic/barrier.h     | 10 ++++++++++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index eaabc3134294..340273a6b18e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1935,6 +1935,20 @@ There are some more advanced barrier functions:
      relaxed I/O accessors and the Documentation/DMA-API.txt file for more
      information on consistent memory.
 
+ (*) pmem_wmb();
+
+     This is for use with persistent memory to ensure that stores for which
+     modifications are written to persistent storage have updated the persistent
+     storage.
+
+     For example, after a non-temporal write to pmem region, we use pmem_wmb()
+     to ensures that stores have updated the persistent storage. This ensures
+     that stores have updated persistent storage before any data access or
+     data transfer caused by subsequent instructions is initiated. This is
+     in addition to the ordering done by wmb().
+
+     For load from persistent memory, existing read memory barriers are sufficient
+     to ensure read ordering.
 
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..00534fa4a384 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
 	if (WC_MODE_PMEM(wc))
-		wmb();
+		pmem_wmb();
 	else
 		ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..2333b290bdcf 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
-	 * The first wmb() is needed to 'sfence' all previous writes
-	 * such that they are architecturally visible for the platform
-	 * buffer flush.  Note that we've already arranged for pmem
+	 * The first arch_pmem_flush_barrier() is needed to 'sfence' all
+	 * previous writes such that they are architecturally visible for
+	 * the platform buffer flush. Note that we've already arranged for pmem
 	 * writes to avoid the cache via memcpy_flushcache().  The final
 	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb();
+	pmem_wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 2eacaf7d62f6..879d68faec1d 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -257,5 +257,15 @@ do {									\
 })
 #endif
 
+/*
+ * pmem_barrier() ensures that all stores for which the modification
+ * are written to persistent storage by preceding instructions have
+ * updated persistent storage before any data  access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+#ifndef pmem_wmb
+#define pmem_wmb()	wmb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
Dan Williams June 30, 2020, 7:21 p.m. UTC | #7
On Tue, Jun 30, 2020 at 5:48 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
>
> Update patch.
>
> From 1e6aa6c4182e14ec5d6bf878ae44c3f69ebff745 Mon Sep 17 00:00:00 2001
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Date: Tue, 12 May 2020 20:58:33 +0530
> Subject: [PATCH] libnvdimm/nvdimm/flush: Allow architecture to override the
>  flush barrier
>
> Architectures like ppc64 provide persistent memory specific barriers
> that will ensure that all stores for which the modifications are
> written to persistent storage by preceding dcbfps and dcbstps
> instructions have updated persistent storage before any data
> access or data transfer caused by subsequent instructions is initiated.
> This is in addition to the ordering done by wmb()
>
> Update nvdimm core such that architecture can use barriers other than
> wmb to ensure all previous writes are architecturally visible for
> the platform buffer flush.

Looks good, after a few minor fixups below you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

I'm expecting that these will be merged through the powerpc tree since
they mostly impact powerpc with only minor touches to libnvdimm.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  Documentation/memory-barriers.txt | 14 ++++++++++++++
>  drivers/md/dm-writecache.c        |  2 +-
>  drivers/nvdimm/region_devs.c      |  8 ++++----
>  include/asm-generic/barrier.h     | 10 ++++++++++
>  4 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index eaabc3134294..340273a6b18e 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1935,6 +1935,20 @@ There are some more advanced barrier functions:
>       relaxed I/O accessors and the Documentation/DMA-API.txt file for more
>       information on consistent memory.
>
> + (*) pmem_wmb();
> +
> +     This is for use with persistent memory to ensure that stores for which
> +     modifications are written to persistent storage have updated the persistent
> +     storage.

I think this should be:

s/updated the persistent storage/reached a platform durability domain/

> +
> +     For example, after a non-temporal write to pmem region, we use pmem_wmb()
> +     to ensures that stores have updated the persistent storage. This ensures

s/ensures/ensure/

...and the same comment about "persistent storage", because pmem_wmb()
as implemented on x86 does not guarantee that the writes have reached
storage; it ensures that writes have reached buffers / queues that are
within the ADR (platform persistence / durability) domain.

> +     that stores have updated persistent storage before any data access or
> +     data transfer caused by subsequent instructions is initiated. This is
> +     in addition to the ordering done by wmb().
> +
> +     For load from persistent memory, existing read memory barriers are sufficient
> +     to ensure read ordering.
>
>  ===============================
>  IMPLICIT KERNEL MEMORY BARRIERS
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 74f3c506f084..00534fa4a384 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
>  static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
>  {
>         if (WC_MODE_PMEM(wc))
> -               wmb();
> +               pmem_wmb();
>         else
>                 ssd_commit_flushed(wc, wait_for_ios);
>  }
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 4502f9c4708d..2333b290bdcf 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
>         idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>
>         /*
> -        * The first wmb() is needed to 'sfence' all previous writes
> -        * such that they are architecturally visible for the platform
> -        * buffer flush.  Note that we've already arranged for pmem
> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all

One missed arch_pmem_flush_barrier() rename.

> +        * previous writes such that they are architecturally visible for
> +        * the platform buffer flush. Note that we've already arranged for pmem
>          * writes to avoid the cache via memcpy_flushcache().  The final
>          * wmb() ensures ordering for the NVDIMM flush write.
>          */
> -       wmb();
> +       pmem_wmb();
>         for (i = 0; i < nd_region->ndr_mappings; i++)
>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 2eacaf7d62f6..879d68faec1d 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -257,5 +257,15 @@ do {                                                                       \
>  })
>  #endif
>
> +/*
> + * pmem_barrier() ensures that all stores for which the modification

One missed pmem_barrier() conversion.

> + * are written to persistent storage by preceding instructions have
> + * updated persistent storage before any data  access or data transfer
> + * caused by subsequent instructions is initiated.
> + */
> +#ifndef pmem_wmb
> +#define pmem_wmb()     wmb()
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */
> --
> 2.26.2
>

Patch

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..8c6b6dce64e2 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@  static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
 	if (WC_MODE_PMEM(wc))
-		wmb();
+		arch_pmem_flush_barrier();
 	else
 		ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..b308ad09b63d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@  int generic_nvdimm_flush(struct nd_region *nd_region)
 	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
-	 * The first wmb() is needed to 'sfence' all previous writes
-	 * such that they are architecturally visible for the platform
-	 * buffer flush.  Note that we've already arranged for pmem
+	 * The first arch_pmem_flush_barrier() is needed to 'sfence' all
+	 * previous writes such that they are architecturally visible for
+	 * the platform buffer flush. Note that we've already arranged for pmem
 	 * writes to avoid the cache via memcpy_flushcache().  The final
 	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb();
+	arch_pmem_flush_barrier();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@  static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif
+
 #endif /* __LIBNVDIMM_H__ */