diff mbox

[06/16] ntb: add check and comment for link up to mw_count and mw_get_align

Message ID 20170629032648.3073-7-logang@deltatee.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Logan Gunthorpe June 29, 2017, 3:26 a.m. UTC
This patch adds a comment and a check to the ntb_mw_get_align and
ntb_mw_count functions so that they always fail if the function is
called before the link is up.

This is to prevent accidental mis-use in clients that are testing
on hardware that this doesn't matter for.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 include/linux/ntb.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Allen Hubbe June 29, 2017, 6:11 p.m. UTC | #1
From: Logan Gunthorpe
> This patch adds a comment and a check to the ntb_mw_get_align and
> ntb_mw_count functions so that they always fail if the function is
> called before the link is up.
> 
> This is to prevent accidental mis-use in clients that are testing
> on hardware that this doesn't matter for.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

Nak.  This breaks a work around for stability issues on some hardware.  I am ok with changing the comment to say, this is only supported when called after link up.  I would still like to allow these to be called at any time.  Specific hardware drivers like Switchtec may return an error.  Upstream drivers, of course, should call these after link up: patch 5/16 part of this set looks good.

> ---
>  include/linux/ntb.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ntb.h b/include/linux/ntb.h
> index 609e232c00da..de4f800c82b6 100644
> --- a/include/linux/ntb.h
> +++ b/include/linux/ntb.h
> @@ -730,12 +730,16 @@ static inline int ntb_link_disable(struct ntb_dev *ntb)
>   * Hardware and topology may support a different number of memory windows.
>   * Moreover different peer devices can support different number of memory
>   * windows. Simply speaking this method returns the number of possible inbound
> - * memory windows to share with specified peer device.
> + * memory windows to share with specified peer device. Note: this must only be
> + * called when the link is up.
>   *
>   * Return: the number of memory windows.
>   */
>  static inline int ntb_mw_count(struct ntb_dev *ntb, int pidx)
>  {
> +	if (!(ntb_link_is_up(ntb, NULL, NULL) & (1 << pidx)))
> +		return 0;
> +
>  	return ntb->ops->mw_count(ntb, pidx);
>  }
> 
> @@ -751,7 +755,7 @@ static inline int ntb_mw_count(struct ntb_dev *ntb, int pidx)
>   * Get the alignments of an inbound memory window with specified index.
>   * NULL may be given for any output parameter if the value is not needed.
>   * The alignment and size parameters may be used for allocation of proper
> - * shared memory.
> + * shared memory. Note: this must only be called when the link is up.
>   *
>   * Return: Zero on success, otherwise a negative error number.
>   */
> @@ -760,6 +764,9 @@ static inline int ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int widx,
>  				   resource_size_t *size_align,
>  				   resource_size_t *size_max)
>  {
> +	if (!(ntb_link_is_up(ntb, NULL, NULL) & (1 << pidx)))
> +		return -ENOTCONN;
> +
>  	return ntb->ops->mw_get_align(ntb, pidx, widx, addr_align, size_align,
>  				      size_max);
>  }
> --
> 2.11.0
Logan Gunthorpe June 29, 2017, 7 p.m. UTC | #2
On 6/29/2017 12:11 PM, Allen Hubbe wrote:
> Nak.  This breaks a work around for stability issues on some hardware.  I am ok with changing the comment to say, this is only supported when called after link up.  I would still like to allow these to be called at any time.  Specific hardware drivers like Switchtec may return an error.  Upstream drivers, of course, should call these after link up: patch 5/16 part of this set looks good.

If absolutely necessary I can leave this out. But in terms of interface 
design it's _so_ much better to have it in. This change would bring the 
score from a 3 to a 5 on Rusty Russel's Hard to Misuse ranking[1]. To 
quote Rusty:

"3. Read the documentation and you'll get it right.
People only read instructions after they've already tied themselves into 
a knot. Then they skim them for keywords and don't read your warnings. I 
don't give an example of this; if this is the best an interface can get 
do, it's in trouble."

Can someone not just fix the out-of-tree code? And since when is 
out-of-tree code reasonable justification for what's done in upstream?

Logan

[1]http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
Allen Hubbe June 29, 2017, 8:13 p.m. UTC | #3
From: Logan Gunthorpe
> On 6/29/2017 12:11 PM, Allen Hubbe wrote:
> > Nak.  This breaks a work around for stability issues on some hardware.  I am ok with changing the
> comment to say, this is only supported when called after link up.  I would still like to allow these
> to be called at any time.  Specific hardware drivers like Switchtec may return an error.  Upstream
> drivers, of course, should call these after link up: patch 5/16 part of this set looks good.
> 
> If absolutely necessary I can leave this out. But in terms of interface
> design it's _so_ much better to have it in. This change would bring the
> score from a 3 to a 5 on Rusty Russel's Hard to Misuse ranking[1]. To
> quote Rusty:
> 
> "3. Read the documentation and you'll get it right.
> People only read instructions after they've already tied themselves into
> a knot. Then they skim them for keywords and don't read your warnings. I
> don't give an example of this; if this is the best an interface can get
> do, it's in trouble."
> 
> Can someone not just fix the out-of-tree code? And since when is
> out-of-tree code reasonable justification for what's done in upstream?

Unfortunately, it is to work around hardware errata.  That is not so trivial to fix.

The workaround in software is also not acceptable upstream, for doing things like writing directly to the peer's interrupt controller registers, bypassing the ntb doorbells.

> 
> Logan
> 
> [1]http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
Logan Gunthorpe June 29, 2017, 8:17 p.m. UTC | #4
On 6/29/2017 2:13 PM, Allen Hubbe wrote:
> Unfortunately, it is to work around hardware errata.  That is not so trivial to fix.

Can you describe more what the work around is doing? Can you share the 
code? It seems odd that a workaround is based on the alignment 
restrictions of the mws.

Logan
Allen Hubbe June 29, 2017, 9:27 p.m. UTC | #5
From: Logan Gunthorpe

> On 6/29/2017 2:13 PM, Allen Hubbe wrote:

> > Unfortunately, it is to work around hardware errata.  That is not so trivial to fix.

> 

> Can you describe more what the work around is doing? Can you share the

> code? It seems odd that a workaround is based on the alignment

> restrictions of the mws.


Sure, while not making any claim that this is ready for upstream.

It is not a workaround for alignment restrictions of the mws.  It is a restriction to avoid the use of doorbells and scratchpads.  Memory windows are used exclusively.

Read msi-x local <addr,data> and send that to the peer:
https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L603

Transform peer's addr to the memory window region:
https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L622

Append a dma immediate value operation after other operations, to write the data at addr:
https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L1208

Above describes the workaround.

Used in the context of a rdma-over-ntb driver here:
https://github.com/ntrdma/ntrdma/blob/master/drivers/infiniband/hw/ntrdma/ntrdma_qp.c#L1585

> 

> Logan
Allen Hubbe June 29, 2017, 9:35 p.m. UTC | #6
Let me try that again...

From: Hubbe, Allen
> From: Logan Gunthorpe
> > On 6/29/2017 2:13 PM, Allen Hubbe wrote:
> > > Unfortunately, it is to work around hardware errata.  That is not so trivial to fix.
> >
> > Can you describe more what the work around is doing? Can you share the
> > code? It seems odd that a workaround is based on the alignment
> > restrictions of the mws.
> 
> Sure, while not making any claim that this is ready for upstream.
> 
> It is not a workaround for alignment restrictions of the mws.  It is a restriction to avoid the use of
> doorbells and scratchpads.  Memory windows are used exclusively.
> 
> Read msi-x local <addr,data> and send that to the peer:

https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L583

> Transform peer's addr to the memory window region:

https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L603

> Append a dma immediate value operation after other operations, to write the data at addr:

https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L1195

> 
> Above describes the workaround.
> 
> Used in the context of a rdma-over-ntb driver here:
> https://github.com/ntrdma/ntrdma/blob/master/drivers/infiniband/hw/ntrdma/ntrdma_qp.c#L1585
> 
> >
> > Logan
Logan Gunthorpe June 29, 2017, 10:14 p.m. UTC | #7
On 29/06/17 03:35 PM, Allen Hubbe wrote:
>> It is not a workaround for alignment restrictions of the mws.  It is a restriction to avoid the use of
>> doorbells and scratchpads.  Memory windows are used exclusively.
>>
>> Read msi-x local <addr,data> and send that to the peer:
> 
> https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L583
> 
>> Transform peer's addr to the memory window region:
> 
> https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L603
> 
>> Append a dma immediate value operation after other operations, to write the data at addr:
> 
> https://github.com/ntrdma/ntrdma/blob/master/drivers/ntc/ntc_ntb_msi.c#L1195


Ok, I'm a little bit confused at the links you are pointing me to. The
patch you nak'd only makes changes to ntb_mw_get_align and ntb_mw_count:

1) ntb_mw_get_align gets the last two parameters of what was formerly
ntb_mw_get_range. However, the only two uses of get_range I can find in
the ntrdma code set those two parameters to NULL and thus would not use
ntb_mw_get_align.

2) ntb_mw_count is technically a new function (ntb_peer_mw_count is
equivalent to the old function of that name). However, the one place in
ntrdma where it would be called is during de-initialization. So if the
link goes down before deinit, the client will not correctly clean up all
the memory windows.

So, it seems to me that ntb_mw_get_align is fine the way I have it but
we should probably change ntb_mw_count: I'll get rid of the check for
link in that function and then have the switchtec driver locally cache
the peer's memory windows whenever the link goes up. So if the link is
down it will return the number of mws the last time the link was seen.
(And if the link was never seen it will return 0).

Thoughts?

Logan
diff mbox

Patch

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 609e232c00da..de4f800c82b6 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -730,12 +730,16 @@  static inline int ntb_link_disable(struct ntb_dev *ntb)
  * Hardware and topology may support a different number of memory windows.
  * Moreover different peer devices can support different number of memory
  * windows. Simply speaking this method returns the number of possible inbound
- * memory windows to share with specified peer device.
+ * memory windows to share with specified peer device. Note: this must only be
+ * called when the link is up.
  *
  * Return: the number of memory windows.
  */
 static inline int ntb_mw_count(struct ntb_dev *ntb, int pidx)
 {
+	if (!(ntb_link_is_up(ntb, NULL, NULL) & (1 << pidx)))
+		return 0;
+
 	return ntb->ops->mw_count(ntb, pidx);
 }
 
@@ -751,7 +755,7 @@  static inline int ntb_mw_count(struct ntb_dev *ntb, int pidx)
  * Get the alignments of an inbound memory window with specified index.
  * NULL may be given for any output parameter if the value is not needed.
  * The alignment and size parameters may be used for allocation of proper
- * shared memory.
+ * shared memory. Note: this must only be called when the link is up.
  *
  * Return: Zero on success, otherwise a negative error number.
  */
@@ -760,6 +764,9 @@  static inline int ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int widx,
 				   resource_size_t *size_align,
 				   resource_size_t *size_max)
 {
+	if (!(ntb_link_is_up(ntb, NULL, NULL) & (1 << pidx)))
+		return -ENOTCONN;
+
 	return ntb->ops->mw_get_align(ntb, pidx, widx, addr_align, size_align,
 				      size_max);
 }