diff mbox

[v2,2/2] parisc: define stronger ordering for the default readX()

Message ID 1523938133-3224-2-git-send-email-okaya@codeaurora.org (mailing list archive)
State Not Applicable, archived
Delegated to: Andy Gross
Headers show

Commit Message

Sinan Kaya April 17, 2018, 4:08 a.m. UTC
parisc architecture seems to be mapping readX() and readX_relaxed() APIs
to __raw_readX() API.

__raw_readX() API doesn't provide any kind of ordering guarantees.
commit 032d59e1cde9 ("io: define stronger ordering for the default readX()
implementation") changed asm-generic implementation to use a more
conservative approach towards the readX() API.

Place a barrier() after the register read so that compiler doesn't
optimize across the regiter operation.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 arch/parisc/include/asm/io.h | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

Comments

James Bottomley April 17, 2018, 9:37 a.m. UTC | #1
On Tue, 2018-04-17 at 00:08 -0400, Sinan Kaya wrote:
> parisc architecture seems to be mapping readX() and readX_relaxed()
> APIs
> to __raw_readX() API.
> 
> __raw_readX() API doesn't provide any kind of ordering guarantees.
> commit 032d59e1cde9 ("io: define stronger ordering for the default
> readX()
> implementation") changed asm-generic implementation to use a more
> conservative approach towards the readX() API.

I don't follow your logic here.  function calls (even inline ones) are
sequence points and the compiler guarantees volatile variables are
stable before sequencing, so these two rules strictly compile order the
raw_read/write because the address is volatile.

> Place a barrier() after the register read so that compiler doesn't
> optimize across the regiter operation.

barrier() provides exactly the same guarantees as the sequence
point/volatile already above, so it seems to be completely unnecessary.

Perhaps if you gave an example of the actual problem you're trying to
fix we could assess if it affects parisc.

James


> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  arch/parisc/include/asm/io.h | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/parisc/include/asm/io.h
> b/arch/parisc/include/asm/io.h
> index 2ec6405..e04c4ef 100644
> --- a/arch/parisc/include/asm/io.h
> +++ b/arch/parisc/include/asm/io.h
> @@ -179,19 +179,34 @@ static inline void __raw_writeq(unsigned long
> long b, volatile void __iomem *add
>  
>  static inline unsigned char readb(const volatile void __iomem *addr)
>  {
> -	return __raw_readb(addr);
> +	unsigned char ret;
> +
> +	ret = __raw_readb(addr);
> +	barrier();
> +	return ret;
>  }
>  static inline unsigned short readw(const volatile void __iomem
> *addr)
>  {
> -	return le16_to_cpu((__le16 __force) __raw_readw(addr));
> +	unsigned short ret;
> +
> +	ret = le16_to_cpu((__le16 __force) __raw_readw(addr));
> +	barrier();
> +	return ret;
>  }
>  static inline unsigned int readl(const volatile void __iomem *addr)
>  {
> -	return le32_to_cpu((__le32 __force) __raw_readl(addr));
> +	unsigned int ret;
> +	ret = le32_to_cpu((__le32 __force) __raw_readl(addr));
> +	barrier();
> +	return ret;
>  }
>  static inline unsigned long long readq(const volatile void __iomem
> *addr)
>  {
> -	return le64_to_cpu((__le64 __force) __raw_readq(addr));
> +	unsigned long long ret;
> +
> +	ret = le64_to_cpu((__le64 __force) __raw_readq(addr));
> +	barrier();
> +	return ret;
>  }
>  
>  static inline void writeb(unsigned char b, volatile void __iomem
> *addr)

--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sinan Kaya April 17, 2018, 2:13 p.m. UTC | #2
Hi James,

> 
> Perhaps if you gave an example of the actual problem you're trying to
> fix we could assess if it affects parisc.

Let me clarify myself here. Maybe, there is a better solution.

		/* assign ownership */
		desc->status = DEVICE_OWN;

		/* notify device of new descriptors */
		writel(DESC_NOTIFY, doorbell);

The difference between writel() and writel_relax() is writel() guarantees
memory transactions to be flushed to the device before the register write.
writel_relaxed() does not provide any guarantees about the memory and IO
operations.

Similarly, readl() provides following code to observe the DMA result while
readl_relaxed() does not provide this guarantee.

Ideally, you want to embed rmb() and wmb() into the writel() and readl()
to provide this guarantee.

PA-RISC doesn't seem to support neither one of the barrier types. If you are
familiar with the architecture, maybe you could guide us here.

Is __raw_writeX() enough to provide this guarantee for this architecture?

Sinan
James Bottomley April 17, 2018, 3:55 p.m. UTC | #3
On Tue, 2018-04-17 at 10:13 -0400, Sinan Kaya wrote:
> Hi James,
> 
> > 
> > Perhaps if you gave an example of the actual problem you're trying
> > to fix we could assess if it affects parisc.
> 
> Let me clarify myself here. Maybe, there is a better solution.
> 
> 		/* assign ownership */
> 		desc->status = DEVICE_OWN;
> 
> 		/* notify device of new descriptors */
> 		writel(DESC_NOTIFY, doorbell);
> 
> The difference between writel() and writel_relax() is writel()
> guarantees memory transactions to be flushed to the device before the
> register write.

Um, no it doesn't, at least not in PCI.  It guarantees the write will
be issued by the memory system, but it may still be cached (called
posting) in the PCI bridge.  So it doesn't guarantee the write reaches
the device by the time it returns.

>  writel_relaxed() does not provide any guarantees about the memory
> and IO operations.
> 
> Similarly, readl() provides following code to observe the DMA result
> while readl_relaxed() does not provide this guarantee.

Right, the relaxed operator provides no guarantee of ordering between
the memory and IO domains.  However, it's only really a problem on
multiple memory controller systems (i.e. NUMA).  Parisc (except
superdome, which we don't support) doesn't have this problem.  We also
turn of CPU stream reordering, so compile order is retire order on our
CPUs (which makes life a lot simpler).

> Ideally, you want to embed rmb() and wmb() into the writel() and
> readl() to provide this guarantee.
>  
> PA-RISC doesn't seem to support neither one of the barrier types. If
> you are familiar with the architecture, maybe you could guide us
> here.
> 
> Is __raw_writeX() enough to provide this guarantee for this
> architecture?

Well, with the volatile address it is.

The current implementations provide the expected semantics: namely the
position in the instruction stream is compile (retire) ordered and
issued from memory once retired.  We still do have the write posting
problem, but you'll find additional reads in the drivers to flush the
posted writes, so I don't actually believe we need anything changing.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sinan Kaya April 17, 2018, 6:28 p.m. UTC | #4
On 4/17/2018 11:55 AM, James Bottomley wrote:
> On Tue, 2018-04-17 at 10:13 -0400, Sinan Kaya wrote:
>> Hi James,
>>
>>>
>>> Perhaps if you gave an example of the actual problem you're trying
>>> to fix we could assess if it affects parisc.
>>
>> Let me clarify myself here. Maybe, there is a better solution.
>>
>> 		/* assign ownership */
>> 		desc->status = DEVICE_OWN;
>>
>> 		/* notify device of new descriptors */
>> 		writel(DESC_NOTIFY, doorbell);
>>
>> The difference between writel() and writel_relax() is writel()
>> guarantees memory transactions to be flushed to the device before the
>> register write.
> 
> Um, no it doesn't, at least not in PCI.  It guarantees the write will
> be issued by the memory system, but it may still be cached (called
> posting) in the PCI bridge.  So it doesn't guarantee the write reaches
> the device by the time it returns.


The correct terminology here would be to use observability. Yes, it can be
cached in whatever part of the system for some amount of time as long as
PCI device sees it in the correct order.

Let's do this exercise. 
1. OS writes to memory for some descriptor update
2. OS writes to the device via writel to hit a doorbell
3. Device comes and fetches the memory contents for the descriptor

writel() of PA-RISC needs to ensure that 3. cannot bypass 1. This is typically
done by a write barrier embedded into the writel() on relaxed architectures.

> 
>>  writel_relaxed() does not provide any guarantees about the memory
>> and IO operations.
>>
>> Similarly, readl() provides following code to observe the DMA result
>> while readl_relaxed() does not provide this guarantee.
> 
> Right, the relaxed operator provides no guarantee of ordering between
> the memory and IO domains.  However, it's only really a problem on
> multiple memory controller systems (i.e. NUMA).  Parisc (except
> superdome, which we don't support) doesn't have this problem.  We also
> turn of CPU stream reordering, so compile order is retire order on our
> CPUs (which makes life a lot simpler).

Good to know.

> 
>> Ideally, you want to embed rmb() and wmb() into the writel() and
>> readl() to provide this guarantee.
>>  
>> PA-RISC doesn't seem to support neither one of the barrier types. If
>> you are familiar with the architecture, maybe you could guide us
>> here.
>>
>> Is __raw_writeX() enough to provide this guarantee for this
>> architecture?
> 
> Well, with the volatile address it is.
> 
> The current implementations provide the expected semantics: namely the
> position in the instruction stream is compile (retire) ordered and
> issued from memory once retired.  We still do have the write posting
> problem, but you'll find additional reads in the drivers to flush the
> posted writes, so I don't actually believe we need anything changing.
> 

OK. I'll withdraw my patch. I'm just trying to ensure that all architectures
support writel() semantics. There is an attempt to remove unnecessary
write barriers from the drivers directory between the descriptor update and
writel. 

Just checking that PA-RISC won't break after this.

> James
> 
>
John David Anglin April 17, 2018, 10:53 p.m. UTC | #5
On 2018-04-17 2:28 PM, Sinan Kaya wrote:
> The correct terminology here would be to use observability. Yes, it can be
> cached in whatever part of the system for some amount of time as long as
> PCI device sees it in the correct order.
>
> Let's do this exercise.
> 1. OS writes to memory for some descriptor update
> 2. OS writes to the device via writel to hit a doorbell
> 3. Device comes and fetches the memory contents for the descriptor
>
> writel() of PA-RISC needs to ensure that 3. cannot bypass 1. This is typically
> done by a write barrier embedded into the writel() on relaxed architectures.
The sequence point after the argument evaluation for writel prevents the 
compiler from reordering
1 and 2.  Accesses to I/O space are strongly ordered on PA-RISC, so 1 
must occur before 2 (Page G-1
of the PA-RISC 2.0 Architecture).  Thus, the current code is okay.

Dave
Sinan Kaya April 18, 2018, 1:39 p.m. UTC | #6
On 4/17/2018 6:53 PM, John David Anglin wrote:
> On 2018-04-17 2:28 PM, Sinan Kaya wrote:
>> The correct terminology here would be to use observability. Yes, it can be
>> cached in whatever part of the system for some amount of time as long as
>> PCI device sees it in the correct order.
>>
>> Let's do this exercise.
>> 1. OS writes to memory for some descriptor update
>> 2. OS writes to the device via writel to hit a doorbell
>> 3. Device comes and fetches the memory contents for the descriptor
>>
>> writel() of PA-RISC needs to ensure that 3. cannot bypass 1. This is typically
>> done by a write barrier embedded into the writel() on relaxed architectures.
> The sequence point after the argument evaluation for writel prevents the compiler from reordering
> 1 and 2.  Accesses to I/O space are strongly ordered on PA-RISC, so 1 must occur before 2 (Page G-1
> of the PA-RISC 2.0 Architecture).  Thus, the current code is okay.
> 

Many thanks for the clarification.

> Dave
>
diff mbox

Patch

diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
index 2ec6405..e04c4ef 100644
--- a/arch/parisc/include/asm/io.h
+++ b/arch/parisc/include/asm/io.h
@@ -179,19 +179,34 @@  static inline void __raw_writeq(unsigned long long b, volatile void __iomem *add
 
 static inline unsigned char readb(const volatile void __iomem *addr)
 {
-	return __raw_readb(addr);
+	unsigned char ret;
+
+	ret = __raw_readb(addr);
+	barrier();
+	return ret;
 }
 static inline unsigned short readw(const volatile void __iomem *addr)
 {
-	return le16_to_cpu((__le16 __force) __raw_readw(addr));
+	unsigned short ret;
+
+	ret = le16_to_cpu((__le16 __force) __raw_readw(addr));
+	barrier();
+	return ret;
 }
 static inline unsigned int readl(const volatile void __iomem *addr)
 {
-	return le32_to_cpu((__le32 __force) __raw_readl(addr));
+	unsigned int ret;
+	ret = le32_to_cpu((__le32 __force) __raw_readl(addr));
+	barrier();
+	return ret;
 }
 static inline unsigned long long readq(const volatile void __iomem *addr)
 {
-	return le64_to_cpu((__le64 __force) __raw_readq(addr));
+	unsigned long long ret;
+
+	ret = le64_to_cpu((__le64 __force) __raw_readq(addr));
+	barrier();
+	return ret;
 }
 
 static inline void writeb(unsigned char b, volatile void __iomem *addr)