diff mbox series

[for-rc] IB/cma: Fix false P_Key mismatch messages

Message ID 1620219241-24979-1-git-send-email-haakon.bugge@oracle.com (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Headers show
Series [for-rc] IB/cma: Fix false P_Key mismatch messages | expand

Commit Message

Haakon Bugge May 5, 2021, 12:54 p.m. UTC
There are three conditions that must be fulfilled in order to consider
a partition match. Those are:

      1. Both P_Keys must valid
      2. At least one must be a full member
      3. The partitions (lower 15 bits) must match

In system employing both limited and full membership ports, we see
these false warning messages:

RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
RDMA CMA: in the future this may cause the request to be dropped

even though the partition is the same.

See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for
a reference.

Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe May 10, 2021, 5:04 p.m. UTC | #1
On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote:
> There are three conditions that must be fulfilled in order to consider
> a partition match. Those are:
> 
>       1. Both P_Keys must valid
>       2. At least one must be a full member
>       3. The partitions (lower 15 bits) must match
> 
> In system employing both limited and full membership ports, we see
> these false warning messages:
> 
> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
> RDMA CMA: in the future this may cause the request to be dropped
> 
> even though the partition is the same.
> 
> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for
> a reference.
> 
> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> ---
>  drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)

What is this trying to fix?

IMHO it is a bug on the sender side to send GMPs to use a pkey that
doesn't exactly match the data path pkey.

Jason
Haakon Bugge May 10, 2021, 6:52 p.m. UTC | #2
> On 10 May 2021, at 19:04, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote:
>> There are three conditions that must be fulfilled in order to consider
>> a partition match. Those are:
>> 
>>      1. Both P_Keys must valid
>>      2. At least one must be a full member
>>      3. The partitions (lower 15 bits) must match
>> 
>> In system employing both limited and full membership ports, we see
>> these false warning messages:
>> 
>> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
>> RDMA CMA: in the future this may cause the request to be dropped
>> 
>> even though the partition is the same.
>> 
>> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for
>> a reference.
>> 
>> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys")
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> ---
>> drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++--
>> 1 file changed, 20 insertions(+), 2 deletions(-)
> 
> What is this trying to fix?

The false warning messages. The wrong way though:-)

> IMHO it is a bug on the sender side to send GMPs to use a pkey that
> doesn't exactly match the data path pkey.

The active connector calls ib_addr_get_pkey(). This function extracts the pkey from byte 8/9 in the device's bcast address. However, RFC 4391 explicitly states:

4.1.  Broadcast-GID Parameters

   The broadcast-GID is set up with the following attributes:

       1. P_Key

          A "Full Membership" P_Key (high-order bit is set to 1) MUST be
          used so that all members may communicate with one another.


In other words, ib_addr_get_pkey() will sometimes wrongly return the full-member version of the partition, when the port is given only the limited member.

Let me do some post-coffee, pre-lunch work tomorrow to come up with a solution, aka ib_find_cached_pkey() followed by an ib_get_cached_pkey()?


Thxs, Håkon
Jason Gunthorpe May 10, 2021, 7:12 p.m. UTC | #3
On Mon, May 10, 2021 at 06:52:54PM +0000, Haakon Bugge wrote:
> 
> 
> > On 10 May 2021, at 19:04, Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote:
> >> There are three conditions that must be fulfilled in order to consider
> >> a partition match. Those are:
> >> 
> >>      1. Both P_Keys must valid
> >>      2. At least one must be a full member
> >>      3. The partitions (lower 15 bits) must match
> >> 
> >> In system employing both limited and full membership ports, we see
> >> these false warning messages:
> >> 
> >> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
> >> RDMA CMA: in the future this may cause the request to be dropped
> >> 
> >> even though the partition is the same.
> >> 
> >> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for
> >> a reference.
> >> 
> >> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys")
> >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >> drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++--
> >> 1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > What is this trying to fix?
> 
> The false warning messages. The wrong way though:-)
> 
> > IMHO it is a bug on the sender side to send GMPs to use a pkey that
> > doesn't exactly match the data path pkey.
> 
> The active connector calls ib_addr_get_pkey(). This function
> extracts the pkey from byte 8/9 in the device's bcast
> address. However, RFC 4391 explicitly states:

pkeys in CM come only from path records that the SM returns, the above
should only be used to feed into a path record query which could then
return back a limited pkey.

Everything thereafter should use the SM's version of the pkey.

Jason
Haakon Bugge June 29, 2021, 1:45 p.m. UTC | #4
> On 10 May 2021, at 21:12, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Mon, May 10, 2021 at 06:52:54PM +0000, Haakon Bugge wrote:
>> 
>> 
>>> On 10 May 2021, at 19:04, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>> 
>>> On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote:
>>>> There are three conditions that must be fulfilled in order to consider
>>>> a partition match. Those are:
>>>> 
>>>>     1. Both P_Keys must valid
>>>>     2. At least one must be a full member
>>>>     3. The partitions (lower 15 bits) must match
>>>> 
>>>> In system employing both limited and full membership ports, we see
>>>> these false warning messages:
>>>> 
>>>> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
>>>> RDMA CMA: in the future this may cause the request to be dropped
>>>> 
>>>> even though the partition is the same.
>>>> 
>>>> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for
>>>> a reference.
>>>> 
>>>> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys")
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++--
>>>> 1 file changed, 20 insertions(+), 2 deletions(-)
>>> 
>>> What is this trying to fix?
>> 
>> The false warning messages. The wrong way though:-)
>> 
>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
>>> doesn't exactly match the data path pkey.
>> 
>> The active connector calls ib_addr_get_pkey(). This function
>> extracts the pkey from byte 8/9 in the device's bcast
>> address. However, RFC 4391 explicitly states:
> 
> pkeys in CM come only from path records that the SM returns, the above
> should only be used to feed into a path record query which could then
> return back a limited pkey.
> 
> Everything thereafter should use the SM's version of the pkey.

Revisiting this. I think I mis-interpreted the scenario that led to the P_Key mismatch messages.

The CM retrieves the pkey_index that matched the P_Key in the BTH (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get the P_Key value of the particular pkey_index.

Assume a full-member sends a REQ. In that case, both P_Keys (BTH and primary path_rec) are full. Further, assume the recipient is only a limited member. Since full and limited members of the same partition are eligible to communicate, the P_Key retrieved by cm_get_bth_pkey() will be the limited one.

The CMA will then give a warning message, because the P_Key in the primary path and the one incorrectly assumed to be in the BTH, doesn't match.

The point is that cm_get_bth_pkey() may return the limited member, even though the packet on the wire had the full-member partition in it's BTH P_Key.

So, I think my first commit here isn't that bad after all :-)


Thxs, Håkon
Jason Gunthorpe July 5, 2021, 4:26 p.m. UTC | #5
On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:

> >>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
> >>> doesn't exactly match the data path pkey.
> >> 
> >> The active connector calls ib_addr_get_pkey(). This function
> >> extracts the pkey from byte 8/9 in the device's bcast
> >> address. However, RFC 4391 explicitly states:
> > 
> > pkeys in CM come only from path records that the SM returns, the above
> > should only be used to feed into a path record query which could then
> > return back a limited pkey.
> > 
> > Everything thereafter should use the SM's version of the pkey.
>
> Revisiting this. I think I mis-interpreted the scenario that led to
> the P_Key mismatch messages.
> 
> The CM retrieves the pkey_index that matched the P_Key in the BTH
> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
> the P_Key value of the particular pkey_index.
> 
> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
> primary path_rec) are full. Further, assume the recipient is only a
> limited member. Since full and limited members of the same partition
> are eligible to communicate, the P_Key retrieved by
> cm_get_bth_pkey() will be the limited one.

It is incorrect for the issuer of the REQ to put a full pkey in the
REQ message when the target is a limited member.

The CM model in IB has the target fully under the control of the
initiator, and it is up to the initiator to ask the SM how the target
should generate its return traffic. The SM is reponsible to say that
limited->full communication is done using the limited pkey.

The initiator is reponsible to place that limited pkey in the REQ
message.

Somewhere in your system this isn't happening properly, and it is a
bug that the CM is correctly identifying.

Jason
Haakon Bugge July 5, 2021, 4:59 p.m. UTC | #6
> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
> 
>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
>>>>> doesn't exactly match the data path pkey.
>>>> 
>>>> The active connector calls ib_addr_get_pkey(). This function
>>>> extracts the pkey from byte 8/9 in the device's bcast
>>>> address. However, RFC 4391 explicitly states:
>>> 
>>> pkeys in CM come only from path records that the SM returns, the above
>>> should only be used to feed into a path record query which could then
>>> return back a limited pkey.
>>> 
>>> Everything thereafter should use the SM's version of the pkey.
>> 
>> Revisiting this. I think I mis-interpreted the scenario that led to
>> the P_Key mismatch messages.
>> 
>> The CM retrieves the pkey_index that matched the P_Key in the BTH
>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
>> the P_Key value of the particular pkey_index.
>> 
>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
>> primary path_rec) are full. Further, assume the recipient is only a
>> limited member. Since full and limited members of the same partition
>> are eligible to communicate, the P_Key retrieved by
>> cm_get_bth_pkey() will be the limited one.
> 
> It is incorrect for the issuer of the REQ to put a full pkey in the
> REQ message when the target is a limited member.

Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix.


Thxs, Håkon

> 
> The CM model in IB has the target fully under the control of the
> initiator, and it is up to the initiator to ask the SM how the target
> should generate its return traffic. The SM is reponsible to say that
> limited->full communication is done using the limited pkey.
> 
> The initiator is reponsible to place that limited pkey in the REQ
> message.
> 
> Somewhere in your system this isn't happening properly, and it is a
> bug that the CM is correctly identifying.
> 
> Jason
Haakon Bugge July 8, 2021, 3:59 p.m. UTC | #7
> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote:
> 
> 
> 
>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote:
>> 
>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
>> 
>>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
>>>>>> doesn't exactly match the data path pkey.
>>>>> 
>>>>> The active connector calls ib_addr_get_pkey(). This function
>>>>> extracts the pkey from byte 8/9 in the device's bcast
>>>>> address. However, RFC 4391 explicitly states:
>>>> 
>>>> pkeys in CM come only from path records that the SM returns, the above
>>>> should only be used to feed into a path record query which could then
>>>> return back a limited pkey.
>>>> 
>>>> Everything thereafter should use the SM's version of the pkey.
>>> 
>>> Revisiting this. I think I mis-interpreted the scenario that led to
>>> the P_Key mismatch messages.
>>> 
>>> The CM retrieves the pkey_index that matched the P_Key in the BTH
>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
>>> the P_Key value of the particular pkey_index.
>>> 
>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
>>> primary path_rec) are full. Further, assume the recipient is only a
>>> limited member. Since full and limited members of the same partition
>>> are eligible to communicate, the P_Key retrieved by
>>> cm_get_bth_pkey() will be the limited one.
>> 
>> It is incorrect for the issuer of the REQ to put a full pkey in the
>> REQ message when the target is a limited member.
> 
> Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix.

On the systems I have access to (running Oracle flavour OpenSM in our NM2 switches), the behaviour is exactly the opposite of what you say. 

So, if we (Oracle) are the only ones seeing this warning (I repeat it here to catch some interest):

RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
RDMA CMA: in the future this may cause the request to be dropped

then there is no fix in the RDMA stack. It must be fixed in Oracle's OpenSM.

The only thing I can do here is to straighten up the warning message, which is imprecise. What about:

"the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)"

My literal interpretation of the old warning message confused me!


Thxs, Håkon


> 
> 
> Thxs, Håkon
> 
>> 
>> The CM model in IB has the target fully under the control of the
>> initiator, and it is up to the initiator to ask the SM how the target
>> should generate its return traffic. The SM is reponsible to say that
>> limited->full communication is done using the limited pkey.
>> 
>> The initiator is reponsible to place that limited pkey in the REQ
>> message.
>> 
>> Somewhere in your system this isn't happening properly, and it is a
>> bug that the CM is correctly identifying.
>> 
>> Jason
Jason Gunthorpe July 8, 2021, 6:52 p.m. UTC | #8
On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote:
> 
> 
> > On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote:
> > 
> > 
> > 
> >> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >> 
> >> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
> >> 
> >>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
> >>>>>> doesn't exactly match the data path pkey.
> >>>>> 
> >>>>> The active connector calls ib_addr_get_pkey(). This function
> >>>>> extracts the pkey from byte 8/9 in the device's bcast
> >>>>> address. However, RFC 4391 explicitly states:
> >>>> 
> >>>> pkeys in CM come only from path records that the SM returns, the above
> >>>> should only be used to feed into a path record query which could then
> >>>> return back a limited pkey.
> >>>> 
> >>>> Everything thereafter should use the SM's version of the pkey.
> >>> 
> >>> Revisiting this. I think I mis-interpreted the scenario that led to
> >>> the P_Key mismatch messages.
> >>> 
> >>> The CM retrieves the pkey_index that matched the P_Key in the BTH
> >>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
> >>> the P_Key value of the particular pkey_index.
> >>> 
> >>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
> >>> primary path_rec) are full. Further, assume the recipient is only a
> >>> limited member. Since full and limited members of the same partition
> >>> are eligible to communicate, the P_Key retrieved by
> >>> cm_get_bth_pkey() will be the limited one.
> >> 
> >> It is incorrect for the issuer of the REQ to put a full pkey in the
> >> REQ message when the target is a limited member.
> > 
> > Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix.
> 
> On the systems I have access to (running Oracle flavour OpenSM in
> our NM2 switches), the behaviour is exactly the opposite of what you
> say.

Check with saquery what is happening, if you request a reversible path
from the CM target (limited pkey) to the CM client (full) you should
get the limited pkey or the SM is broken.

If the SM is working then probably something in the stack is using a
reversed src/dest when doing the PR query.

It is not intuitive but the PR query should have SGID as the CM Target
even though it is running on the CM Client.

This is because the REQ is supposed to contain a path that is relative
to the target.

Everything will be the same except for this small detail about
full/limited pkeys.

The client can figure out what to do with its own pkey table locally.

> "the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)"

"The REQ contains a PKey (0x1234) that is not found in this device's
PKey table. Using alternative limited Pkey (0x9234) instead. This is a
client bug"
 
Jason
Haakon Bugge July 9, 2021, 4:45 p.m. UTC | #9
> On 8 Jul 2021, at 20:52, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote:
>> 
>> 
>>> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>> 
>>>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
>>>> 
>>>>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
>>>>>>>> doesn't exactly match the data path pkey.
>>>>>>> 
>>>>>>> The active connector calls ib_addr_get_pkey(). This function
>>>>>>> extracts the pkey from byte 8/9 in the device's bcast
>>>>>>> address. However, RFC 4391 explicitly states:
>>>>>> 
>>>>>> pkeys in CM come only from path records that the SM returns, the above
>>>>>> should only be used to feed into a path record query which could then
>>>>>> return back a limited pkey.
>>>>>> 
>>>>>> Everything thereafter should use the SM's version of the pkey.
>>>>> 
>>>>> Revisiting this. I think I mis-interpreted the scenario that led to
>>>>> the P_Key mismatch messages.
>>>>> 
>>>>> The CM retrieves the pkey_index that matched the P_Key in the BTH
>>>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
>>>>> the P_Key value of the particular pkey_index.
>>>>> 
>>>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
>>>>> primary path_rec) are full. Further, assume the recipient is only a
>>>>> limited member. Since full and limited members of the same partition
>>>>> are eligible to communicate, the P_Key retrieved by
>>>>> cm_get_bth_pkey() will be the limited one.
>>>> 
>>>> It is incorrect for the issuer of the REQ to put a full pkey in the
>>>> REQ message when the target is a limited member.
>>> 
>>> Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix.
>> 
>> On the systems I have access to (running Oracle flavour OpenSM in
>> our NM2 switches), the behaviour is exactly the opposite of what you
>> say.
> 
> Check with saquery what is happening, if you request a reversible path
> from the CM target (limited pkey) to the CM client (full) you should
> get the limited pkey or the SM is broken.
> 
> If the SM is working then probably something in the stack is using a
> reversed src/dest when doing the PR query.
> 
> It is not intuitive but the PR query should have SGID as the CM Target
> even though it is running on the CM Client.

That is not how it is today. And because of that, all accesses to the PR assume the d{gid,lid} is the remote peer. To fix this, I have to swap dgid/sgid and ib.dlid/ib.slid all over to get this working. That is pervasive. E.g., even includes ipoib. Let me know if that is what you want.


Thxs, Håkon

> 
> This is because the REQ is supposed to contain a path that is relative
> to the target.
> 
> Everything will be the same except for this small detail about
> full/limited pkeys.
> 
> The client can figure out what to do with its own pkey table locally.
> 
>> "the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)"
> 
> "The REQ contains a PKey (0x1234) that is not found in this device's
> PKey table. Using alternative limited Pkey (0x9234) instead. This is a
> client bug"
> 
> Jason
Jason Gunthorpe July 9, 2021, 4:56 p.m. UTC | #10
On Fri, Jul 09, 2021 at 04:45:21PM +0000, Haakon Bugge wrote:
> 
> 
> > On 8 Jul 2021, at 20:52, Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote:
> >> 
> >> 
> >>> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote:
> >>> 
> >>> 
> >>> 
> >>>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >>>> 
> >>>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
> >>>> 
> >>>>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
> >>>>>>>> doesn't exactly match the data path pkey.
> >>>>>>> 
> >>>>>>> The active connector calls ib_addr_get_pkey(). This function
> >>>>>>> extracts the pkey from byte 8/9 in the device's bcast
> >>>>>>> address. However, RFC 4391 explicitly states:
> >>>>>> 
> >>>>>> pkeys in CM come only from path records that the SM returns, the above
> >>>>>> should only be used to feed into a path record query which could then
> >>>>>> return back a limited pkey.
> >>>>>> 
> >>>>>> Everything thereafter should use the SM's version of the pkey.
> >>>>> 
> >>>>> Revisiting this. I think I mis-interpreted the scenario that led to
> >>>>> the P_Key mismatch messages.
> >>>>> 
> >>>>> The CM retrieves the pkey_index that matched the P_Key in the BTH
> >>>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
> >>>>> the P_Key value of the particular pkey_index.
> >>>>> 
> >>>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
> >>>>> primary path_rec) are full. Further, assume the recipient is only a
> >>>>> limited member. Since full and limited members of the same partition
> >>>>> are eligible to communicate, the P_Key retrieved by
> >>>>> cm_get_bth_pkey() will be the limited one.
> >>>> 
> >>>> It is incorrect for the issuer of the REQ to put a full pkey in the
> >>>> REQ message when the target is a limited member.
> >>> 
> >>> Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix.
> >> 
> >> On the systems I have access to (running Oracle flavour OpenSM in
> >> our NM2 switches), the behaviour is exactly the opposite of what you
> >> say.
> > 
> > Check with saquery what is happening, if you request a reversible path
> > from the CM target (limited pkey) to the CM client (full) you should
> > get the limited pkey or the SM is broken.
> > 
> > If the SM is working then probably something in the stack is using a
> > reversed src/dest when doing the PR query.
> > 
> > It is not intuitive but the PR query should have SGID as the CM Target
> > even though it is running on the CM Client.
> 
> That is not how it is today. And because of that, all accesses to
> the PR assume the d{gid,lid} is the remote peer. To fix this, I have
> to swap dgid/sgid and ib.dlid/ib.slid all over to get this
> working. That is pervasive. E.g., even includes ipoib. Let me know
> if that is what you want.

It is only things that use the paths to generate CM REQ messages, and
yes it is the right thing to do.

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc2..f5bcf7d 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1368,6 +1368,24 @@  static int cma_save_net_info(struct sockaddr *src_addr,
 	return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
 }
 
+/*
+ * If at least one of the pkeys is a full member, none of them are
+ * invalid, and the partitions (lower 15 bits) are equal, we have a
+ * match.
+ *
+ * See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching
+ */
+
+static bool partition_match(u16 pkey_a, u16 pkey_b)
+{
+	const u16 fmb = 0x8000; /* Full Member Bit */
+	const bool valid_pkeys = (pkey_a & ~fmb) && (pkey_b & ~fmb);
+	const bool one_full = (pkey_a | pkey_b) & fmb;
+	const bool same_partition = (pkey_a | fmb) == (pkey_b | fmb);
+
+	return valid_pkeys && one_full && same_partition;
+}
+
 static int cma_save_req_info(const struct ib_cm_event *ib_event,
 			     struct cma_req_info *req)
 {
@@ -1385,7 +1403,7 @@  static int cma_save_req_info(const struct ib_cm_event *ib_event,
 		req->has_gid	= true;
 		req->service_id = req_param->primary_path->service_id;
 		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
-		if (req->pkey != req_param->bth_pkey)
+		if (!partition_match(req->pkey, req_param->bth_pkey))
 			pr_warn_ratelimited("RDMA CMA: got different BTH P_Key (0x%x) and primary path P_Key (0x%x)\n"
 					    "RDMA CMA: in the future this may cause the request to be dropped\n",
 					    req_param->bth_pkey, req->pkey);
@@ -1396,7 +1414,7 @@  static int cma_save_req_info(const struct ib_cm_event *ib_event,
 		req->has_gid	= false;
 		req->service_id	= sidr_param->service_id;
 		req->pkey	= sidr_param->pkey;
-		if (req->pkey != sidr_param->bth_pkey)
+		if (!partition_match(req->pkey, sidr_param->bth_pkey))
 			pr_warn_ratelimited("RDMA CMA: got different BTH P_Key (0x%x) and SIDR request payload P_Key (0x%x)\n"
 					    "RDMA CMA: in the future this may cause the request to be dropped\n",
 					    sidr_param->bth_pkey, req->pkey);