diff mbox

[rdma-core/srp_daemon] Correct method field for PathRecord request

Message ID 1491924533-2548-1-git-send-email-honli@redhat.com (mailing list archive)
State Superseded
Headers show

Commit Message

Honggang LI April 11, 2017, 3:28 p.m. UTC
From: Honggang Li <honli@redhat.com>

According to InfiniBand Architecture Release 1.2.1, Table 208
Example PathRecord Request MAD Header Fields, MADHeader:Method
should setup to 0x12 (SubnAdmGetTable()).

Before send the MAD packet for PathRecord request, init_srp_mad setup
out_mad->method to SRP_MAD_METHOD_GET (0x01) for get_shared_pkeys. But
get_shared_pkeys setup the attr_id field to SRP_SA_ATTR_PATH_REC (0x35).

Because of this incorrect field in MAD packet, upstream srptools-1.0.3
failed with an embedded subnet manager running on an Intel True Scale
Edge Switch 12300.

Upstream srptools-1.0.3 works with upstream opensm, because
sa_mad_ctrl_process post the MAD to the dispatcher based on the attr_id
field. As attr_id had been set to SRP_SA_ATTR_PATH_REC (0x35),
PathRecord query works with opensm.

opensm/opensm/osm_sa_mad_ctrl.c
   292	static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * p_madw, IN void *context,
.........
   357		switch (p_sa_mad->method) {
.........
   370		case IB_MAD_METHOD_GET:        // 0x01
   371		case IB_MAD_METHOD_GETTABLE:   // 0x12
   372	#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP)
   373		case IB_MAD_METHOD_GETMULTI:
   374	#endif
   375			is_get_request = TRUE;
   376		case IB_MAD_METHOD_SET:
   377		case IB_MAD_METHOD_DELETE:
   378			/* if we are closing down simply do nothing */
   379			if (osm_exit_flag)
   380				osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw);
   381			else
   382				sa_mad_ctrl_process(p_ctrl, p_madw, is_get_request);
   383			break;
   384

2ad09524931dbf98d412e1912c1bdbf22f8ac81d ("srp_daemon: Work around SM
bug over non-default P_Key support") in the old git tree [1] introduced
this regression issue. Upstream srptools-0.0.4 works with the embedded
subnet manager.

[1] git://git.openfabrics.org/~bvanassche/srptools.git

Signed-off-by: Honggang Li <honli@redhat.com>
---
 srp_daemon/srp_daemon.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jason Gunthorpe April 11, 2017, 3:49 p.m. UTC | #1
On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> From: Honggang Li <honli@redhat.com>
> 
> According to InfiniBand Architecture Release 1.2.1, Table 208
> Example PathRecord Request MAD Header Fields, MADHeader:Method
> should setup to 0x12 (SubnAdmGetTable()).

That is just an example, Table 192 shows that Get and GetTable are
both valid query types.

I however expect that the SM will fail when using GetTable if the
query returns more than one result, could that be what is happening?
When you convert it to GET_TABLE does it return more than one result?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hal Rosenstock April 11, 2017, 4:20 p.m. UTC | #2
On 4/11/2017 11:28 AM, Honggang LI wrote:
> From: Honggang Li <honli@redhat.com>
> 
> According to InfiniBand Architecture Release 1.2.1, Table 208
> Example PathRecord Request MAD Header Fields, MADHeader:Method
> should setup to 0x12 (SubnAdmGetTable()).

Both get and get table should be supported for Path Record.

> Before send the MAD packet for PathRecord request, init_srp_mad setup
> out_mad->method to SRP_MAD_METHOD_GET (0x01) for get_shared_pkeys. But
> get_shared_pkeys setup the attr_id field to SRP_SA_ATTR_PATH_REC (0x35).
> 
> Because of this incorrect field in MAD packet, upstream srptools-1.0.3
> failed with an embedded subnet manager running on an Intel True Scale
> Edge Switch 12300.

Sounds like a bug in the embedded SM.

If it works with GetTable and not with Get, sounds like there is more
than 1 path available to be returned.

SA PathRecord:NumbPath says:
In a SubnAdmGetTable() query request: Maximum
number of paths to return for each unique SGIDDGID
combination. If more paths that satisfy the
PathRecord query exist for a given SGID-DGID combination,
only NumbPath paths shall be returned
(implementation defined).
In a SubnAdmGet() query request, ignored; a value of
1 is used.

So in case of multiple paths and get method, SM is supposed to pick one
rather than return some error (like too many records). Is that the MAD
status returned ?

-- Hal

> Upstream srptools-1.0.3 works with upstream opensm, because
> sa_mad_ctrl_process post the MAD to the dispatcher based on the attr_id
> field. As attr_id had been set to SRP_SA_ATTR_PATH_REC (0x35),
> PathRecord query works with opensm.
> 
> opensm/opensm/osm_sa_mad_ctrl.c
>    292	static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * p_madw, IN void *context,
> .........
>    357		switch (p_sa_mad->method) {
> .........
>    370		case IB_MAD_METHOD_GET:        // 0x01
>    371		case IB_MAD_METHOD_GETTABLE:   // 0x12
>    372	#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP)
>    373		case IB_MAD_METHOD_GETMULTI:
>    374	#endif
>    375			is_get_request = TRUE;
>    376		case IB_MAD_METHOD_SET:
>    377		case IB_MAD_METHOD_DELETE:
>    378			/* if we are closing down simply do nothing */
>    379			if (osm_exit_flag)
>    380				osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw);
>    381			else
>    382				sa_mad_ctrl_process(p_ctrl, p_madw, is_get_request);
>    383			break;
>    384
> 
> 2ad09524931dbf98d412e1912c1bdbf22f8ac81d ("srp_daemon: Work around SM
> bug over non-default P_Key support") in the old git tree [1] introduced
> this regression issue. Upstream srptools-0.0.4 works with the embedded
> subnet manager.
> 
> [1] git://git.openfabrics.org/~bvanassche/srptools.git
> 
> Signed-off-by: Honggang Li <honli@redhat.com>
> ---
>  srp_daemon/srp_daemon.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
> index 71b5f07..f905c6f 100644
> --- a/srp_daemon/srp_daemon.c
> +++ b/srp_daemon/srp_daemon.c
> @@ -1134,6 +1134,7 @@ static int get_shared_pkeys(struct resources *res,
>  			continue;
>  
>  		/* Mark components: DLID, SLID, PKEY */
> +		out_sa_mad->method = SRP_SA_METHOD_GET_TABLE;
>  		out_sa_mad->comp_mask = htobe64(1 << 4 | 1 << 5 | 1 << 13);
>  		out_sa_mad->rmpp_hdr.rmpp_version = UMAD_RMPP_VERSION;
>  		out_sa_mad->rmpp_hdr.rmpp_type = 1;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI April 12, 2017, 1:29 p.m. UTC | #3
On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> > From: Honggang Li <honli@redhat.com>
> > 
> > According to InfiniBand Architecture Release 1.2.1, Table 208
> > Example PathRecord Request MAD Header Fields, MADHeader:Method
> > should setup to 0x12 (SubnAdmGetTable()).
> 
> That is just an example, Table 192 shows that Get and GetTable are
> both valid query types.

I don't understand this as "Table 192 PortInfoRecord" seems nothing to
do with PathRecord request.

> 
> I however expect that the SM will fail when using GetTable if the
> query returns more than one result, could that be what is happening?
> When you convert it to GET_TABLE does it return more than one result?

strace output shows srptools always received MAD packet in 320 bytes. So
the embedded subnet manager only returns one result.

> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hal Rosenstock April 12, 2017, 1:34 p.m. UTC | #4
On 4/12/2017 9:29 AM, Honggang LI wrote:
> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
>>> From: Honggang Li <honli@redhat.com>
>>>
>>> According to InfiniBand Architecture Release 1.2.1, Table 208
>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
>>> should setup to 0x12 (SubnAdmGetTable()).
>>
>> That is just an example, Table 192 shows that Get and GetTable are
>> both valid query types.
> 
> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
> do with PathRecord request.

In IBA 1.3 volume 1, it's Table 209 Subnet Administration
Attribute/Method Map shows PathRecord methods are Get and GetTable.

>>
>> I however expect that the SM will fail when using GetTable if the
>> query returns more than one result, could that be what is happening?
>> When you convert it to GET_TABLE does it return more than one result?
> 
> strace output shows srptools always received MAD packet in 320 bytes. So
> the embedded subnet manager only returns one result.

What is MAD status ?

-- Hal

>>
>> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hal Rosenstock April 12, 2017, 1:57 p.m. UTC | #5
On 4/11/2017 12:20 PM, Hal Rosenstock wrote:
> On 4/11/2017 11:28 AM, Honggang LI wrote:
>> From: Honggang Li <honli@redhat.com>
>>
>> According to InfiniBand Architecture Release 1.2.1, Table 208
>> Example PathRecord Request MAD Header Fields, MADHeader:Method
>> should setup to 0x12 (SubnAdmGetTable()).
> 
> Both get and get table should be supported for Path Record.
> 
>> Before send the MAD packet for PathRecord request, init_srp_mad setup
>> out_mad->method to SRP_MAD_METHOD_GET (0x01) for get_shared_pkeys. But
>> get_shared_pkeys setup the attr_id field to SRP_SA_ATTR_PATH_REC (0x35).
>>
>> Because of this incorrect field in MAD packet, upstream srptools-1.0.3
>> failed with an embedded subnet manager running on an Intel True Scale
>> Edge Switch 12300.
> 
> Sounds like a bug in the embedded SM.
> 
> If it works with GetTable and not with Get, sounds like there is more
> than 1 path available to be returned.
> 
> SA PathRecord:NumbPath says:
> In a SubnAdmGetTable() query request: Maximum
> number of paths to return for each unique SGIDDGID
> combination. If more paths that satisfy the
> PathRecord query exist for a given SGID-DGID combination,
> only NumbPath paths shall be returned
> (implementation defined).
> In a SubnAdmGet() query request, ignored; a value of
> 1 is used.
> 
> So in case of multiple paths and get method, SM is supposed to pick one
> rather than return some error (like too many records). Is that the MAD
> status returned ?
> 
> -- Hal
> 
>> Upstream srptools-1.0.3 works with upstream opensm, because
>> sa_mad_ctrl_process post the MAD to the dispatcher based on the attr_id
>> field. As attr_id had been set to SRP_SA_ATTR_PATH_REC (0x35),
>> PathRecord query works with opensm.
>>
>> opensm/opensm/osm_sa_mad_ctrl.c
>>    292	static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * p_madw, IN void *context,
>> .........
>>    357		switch (p_sa_mad->method) {
>> .........
>>    370		case IB_MAD_METHOD_GET:        // 0x01
>>    371		case IB_MAD_METHOD_GETTABLE:   // 0x12
>>    372	#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP)
>>    373		case IB_MAD_METHOD_GETMULTI:
>>    374	#endif
>>    375			is_get_request = TRUE;
>>    376		case IB_MAD_METHOD_SET:
>>    377		case IB_MAD_METHOD_DELETE:
>>    378			/* if we are closing down simply do nothing */
>>    379			if (osm_exit_flag)
>>    380				osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw);
>>    381			else
>>    382				sa_mad_ctrl_process(p_ctrl, p_madw, is_get_request);
>>    383			break;
>>    384
>>
>> 2ad09524931dbf98d412e1912c1bdbf22f8ac81d ("srp_daemon: Work around SM
>> bug over non-default P_Key support") in the old git tree [1] introduced
>> this regression issue. Upstream srptools-0.0.4 works with the embedded
>> subnet manager.
>>
>> [1] git://git.openfabrics.org/~bvanassche/srptools.git
>>
>> Signed-off-by: Honggang Li <honli@redhat.com>
>> ---
>>  srp_daemon/srp_daemon.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
>> index 71b5f07..f905c6f 100644
>> --- a/srp_daemon/srp_daemon.c
>> +++ b/srp_daemon/srp_daemon.c
>> @@ -1134,6 +1134,7 @@ static int get_shared_pkeys(struct resources *res,
>>  			continue;
>>  
>>  		/* Mark components: DLID, SLID, PKEY */
>> +		out_sa_mad->method = SRP_SA_METHOD_GET_TABLE;
>>  		out_sa_mad->comp_mask = htobe64(1 << 4 | 1 << 5 | 1 << 13);

Note that just changing this to GetTable is noncompliant. SA PathRecord
says that for a GetTable request both SGID and NumbPath are required.

-- Hal

>>  		out_sa_mad->rmpp_hdr.rmpp_version = UMAD_RMPP_VERSION;
>>  		out_sa_mad->rmpp_hdr.rmpp_type = 1;
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI April 12, 2017, 2:18 p.m. UTC | #6
On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
> On 4/12/2017 9:29 AM, Honggang LI wrote:
> > On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
> >> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> >>> From: Honggang Li <honli@redhat.com>
> >>>
> >>> According to InfiniBand Architecture Release 1.2.1, Table 208
> >>> Example PathRecord Request MAD Header Fields, MADHeader:Method
> >>> should setup to 0x12 (SubnAdmGetTable()).
> >>
> >> That is just an example, Table 192 shows that Get and GetTable are
> >> both valid query types.
> > 
> > I don't understand this as "Table 192 PortInfoRecord" seems nothing to
> > do with PathRecord request.
> 
> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
> Attribute/Method Map shows PathRecord methods are Get and GetTable.
> 

http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
Interoperability
• Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.

It seems the embedded subnet manager does not support IBA 1.3

> >>
> >> I however expect that the SM will fail when using GetTable if the
> >> query returns more than one result, could that be what is happening?
> >> When you convert it to GET_TABLE does it return more than one result?
> > 
> > strace output shows srptools always received MAD packet in 320 bytes. So
> > the embedded subnet manager only returns one result.
> 
> What is MAD status ?

ib_user_mad->status     = 0x0

I attached the MAD packet dump out with strace. Please see the
attachments p4.in amd p4.decode.

> 
> -- Hal
> 
> >>
> >> Jason
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x40\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x03\x02\x81\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x04\x00\x35\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x20\x30\x00\x00\x00\x00\x00\x00\x00\x00\xfe\x80\x00\x00\x00\x00\x00\x00\x00\x11\x75\x00\x00\x6f\x9b\xba\xfe\x80\x00\x00\x00\x00\x00\x00\x00\x11\x75\x00\x00\x6f\x9a\x34\x01\x9b\x00\xed\x00\x00\x00\x00\x00\x80\xff\xff\x00\x00\x84\x87\x8f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
ib_user_mad->agent_id	= 0x0
ib_user_mad->status	= 0x0
ib_user_mad->timeout_ms	= 0x0
ib_user_mad->retries	= 0x0
ib_user_mad->length	= 0x140 = 320
ib_mad_addr->qpn		= BE32 0x1000000
ib_mad_addr->qkey		= BE32 0x0
ib_mad_addr->lid		= BE16 0x100
ib_mad_addr->sl 		= 0x0
ib_mad_addr->path_bits		= 0x0
ib_mad_addr->grh_present	= 0x0
ib_mad_addr->gid_index		= 0x0
ib_mad_addr->hop_limit		= 0x0
ib_mad_addr->traffic_class	= 0x0
ib_mad_addr->gid[16]= 0x\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
ib_mad_addr->flow_label		= BE32 0x0
ib_mad_addr->pkey_index		= BE16 0x0
ib_mad_addr->reserved[6]= 0x\x00\x00\x00\x00\x00\x00
srp_dm_rmpp_sa_mad->base_version	= 0x1
srp_dm_rmpp_sa_mad->mgmt_class		= 0x3
srp_dm_rmpp_sa_mad->class_version	= 0x2
srp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
srp_dm_rmpp_sa_mad->status		= 0x0
srp_dm_rmpp_sa_mad->reserved1		= 0x0
srp_dm_rmpp_sa_mad->tid			= 0x400000008000000
srp_dm_rmpp_sa_mad->attr_id		= 0x3500
srp_dm_rmpp_sa_mad->reserved2		= 0x0
srp_dm_rmpp_sa_mad->attr_mod		= 0x0
srp_dm_rmpp_sa_mad->rmpp_version	= 0x0
srp_dm_rmpp_sa_mad->rmpp_type		= 0x0
srp_dm_rmpp_sa_mad->rmpp_rtime_flags	= 0x0
srp_dm_rmpp_sa_mad->rmpp_status		= 0x0
srp_dm_rmpp_sa_mad->seg_num		= 0x0
srp_dm_rmpp_sa_mad->paylen_newwin	= 0x0
srp_dm_rmpp_sa_mad->sm_key		= 0x0
srp_dm_rmpp_sa_mad->attr_offset		= 0x0
srp_dm_rmpp_sa_mad->reserved3		= 0x0
srp_dm_rmpp_sa_mad->comp_mask		= 0x3020000000000000
ib_path_rec_t->resv0[8]			= \x00\x00\x00\x00
ib_path_rec_t->dgid			= \xfe\x00\x00\x00\x00\x75\x00\x9b
ib_path_rec_t->sgid			= \xfe\x00\x00\x00\x00\x75\x00\x9a
ib_path_rec_t->dlid				= ib_net16_t 0x9b01
ib_path_rec_t->slid				= ib_net16_t 0xed00
ib_path_rec_t->hop_flow_raw			= ib_net32_t 0x0
ib_path_rec_t->tclass				= 0x0
ib_path_rec_t->num_path				= 0x80
ib_path_rec_t->pkey				= ib_net16_t 0xffff
ib_path_rec_t->sl				= ib_net16_t 0x0
ib_path_rec_t->mtu				= 0x84
ib_path_rec_t->rate				= 0x87
ib_path_rec_t->pkt_life				= 0x8f
ib_path_rec_t->preference			= 0x0
ib_path_rec_t->resv2[6]				= \x00\x00\x00
Honggang LI April 12, 2017, 2:24 p.m. UTC | #7
On Tue, Apr 11, 2017 at 12:20:29PM -0400, Hal Rosenstock wrote:
> On 4/11/2017 11:28 AM, Honggang LI wrote:
> > From: Honggang Li <honli@redhat.com>
> > 
> > According to InfiniBand Architecture Release 1.2.1, Table 208
> > Example PathRecord Request MAD Header Fields, MADHeader:Method
> > should setup to 0x12 (SubnAdmGetTable()).
> 
> Both get and get table should be supported for Path Record.
> 
> > Before send the MAD packet for PathRecord request, init_srp_mad setup
> > out_mad->method to SRP_MAD_METHOD_GET (0x01) for get_shared_pkeys. But
> > get_shared_pkeys setup the attr_id field to SRP_SA_ATTR_PATH_REC (0x35).
> > 
> > Because of this incorrect field in MAD packet, upstream srptools-1.0.3
> > failed with an embedded subnet manager running on an Intel True Scale
> > Edge Switch 12300.
> 
> Sounds like a bug in the embedded SM.
> 
> If it works with GetTable and not with Get, sounds like there is more
> than 1 path available to be returned.
> 

It is a big fat tree IB network. Some SRPT had been connected to the
same leaf switch as the node srptools running. Other SRPT connected to
different leaf switch. But PathRecord Request failed for all of SRPTs. 

> SA PathRecord:NumbPath says:
> In a SubnAdmGetTable() query request: Maximum
> number of paths to return for each unique SGIDDGID
> combination. If more paths that satisfy the
> PathRecord query exist for a given SGID-DGID combination,
> only NumbPath paths shall be returned
> (implementation defined).
> In a SubnAdmGet() query request, ignored; a value of
> 1 is used.
> 
> So in case of multiple paths and get method, SM is supposed to pick one
> rather than return some error (like too many records). Is that the MAD
> status returned ?

status always been zero.

> 
> -- Hal
> 
> > Upstream srptools-1.0.3 works with upstream opensm, because
> > sa_mad_ctrl_process post the MAD to the dispatcher based on the attr_id
> > field. As attr_id had been set to SRP_SA_ATTR_PATH_REC (0x35),
> > PathRecord query works with opensm.
> > 
> > opensm/opensm/osm_sa_mad_ctrl.c
> >    292	static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * p_madw, IN void *context,
> > .........
> >    357		switch (p_sa_mad->method) {
> > .........
> >    370		case IB_MAD_METHOD_GET:        // 0x01
> >    371		case IB_MAD_METHOD_GETTABLE:   // 0x12
> >    372	#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP)
> >    373		case IB_MAD_METHOD_GETMULTI:
> >    374	#endif
> >    375			is_get_request = TRUE;
> >    376		case IB_MAD_METHOD_SET:
> >    377		case IB_MAD_METHOD_DELETE:
> >    378			/* if we are closing down simply do nothing */
> >    379			if (osm_exit_flag)
> >    380				osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw);
> >    381			else
> >    382				sa_mad_ctrl_process(p_ctrl, p_madw, is_get_request);
> >    383			break;
> >    384
> > 
> > 2ad09524931dbf98d412e1912c1bdbf22f8ac81d ("srp_daemon: Work around SM
> > bug over non-default P_Key support") in the old git tree [1] introduced
> > this regression issue. Upstream srptools-0.0.4 works with the embedded
> > subnet manager.
> > 
> > [1] git://git.openfabrics.org/~bvanassche/srptools.git
> > 
> > Signed-off-by: Honggang Li <honli@redhat.com>
> > ---
> >  srp_daemon/srp_daemon.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
> > index 71b5f07..f905c6f 100644
> > --- a/srp_daemon/srp_daemon.c
> > +++ b/srp_daemon/srp_daemon.c
> > @@ -1134,6 +1134,7 @@ static int get_shared_pkeys(struct resources *res,
> >  			continue;
> >  
> >  		/* Mark components: DLID, SLID, PKEY */
> > +		out_sa_mad->method = SRP_SA_METHOD_GET_TABLE;
> >  		out_sa_mad->comp_mask = htobe64(1 << 4 | 1 << 5 | 1 << 13);
> >  		out_sa_mad->rmpp_hdr.rmpp_version = UMAD_RMPP_VERSION;
> >  		out_sa_mad->rmpp_hdr.rmpp_type = 1;
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hal Rosenstock April 12, 2017, 2:34 p.m. UTC | #8
On 4/12/2017 10:18 AM, Honggang LI wrote:
> On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
>> On 4/12/2017 9:29 AM, Honggang LI wrote:
>>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
>>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
>>>>> From: Honggang Li <honli@redhat.com>
>>>>>
>>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
>>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
>>>>> should setup to 0x12 (SubnAdmGetTable()).
>>>>
>>>> That is just an example, Table 192 shows that Get and GetTable are
>>>> both valid query types.
>>>
>>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
>>> do with PathRecord request.
>>
>> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
>> Attribute/Method Map shows PathRecord methods are Get and GetTable.
>>
> 
> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
> Interoperability
> • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
> 
> It seems the embedded subnet manager does not support IBA 1.3

It's same for IBA 1.2.1. This part hasn't changed.

>>>>
>>>> I however expect that the SM will fail when using GetTable if the
>>>> query returns more than one result, could that be what is happening?
>>>> When you convert it to GET_TABLE does it return more than one result?
>>>
>>> strace output shows srptools always received MAD packet in 320 bytes. So
>>> the embedded subnet manager only returns one result.
>>
>> What is MAD status ?
> 
> ib_user_mad->status     = 0x0
> 
> I attached the MAD packet dump out with strace. Please see the
> attachments p4.in amd p4.decode.

p4.decode has:
rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)

Proper response to GetTable (0x12) method is GetTableResponse (0x92)
method but this is GetResp (0x81) response.

-- Hal

>>
>> -- Hal
>>
>>>>
>>>> Jason
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI April 12, 2017, 2:49 p.m. UTC | #9
On Wed, Apr 12, 2017 at 10:34:47AM -0400, Hal Rosenstock wrote:
> On 4/12/2017 10:18 AM, Honggang LI wrote:
> > On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
> >> On 4/12/2017 9:29 AM, Honggang LI wrote:
> >>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
> >>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> >>>>> From: Honggang Li <honli@redhat.com>
> >>>>>
> >>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
> >>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
> >>>>> should setup to 0x12 (SubnAdmGetTable()).
> >>>>
> >>>> That is just an example, Table 192 shows that Get and GetTable are
> >>>> both valid query types.
> >>>
> >>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
> >>> do with PathRecord request.
> >>
> >> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
> >> Attribute/Method Map shows PathRecord methods are Get and GetTable.
> >>
> > 
> > http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
> > Interoperability
> > • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
> > 
> > It seems the embedded subnet manager does not support IBA 1.3
> 
> It's same for IBA 1.2.1. This part hasn't changed.
> 
> >>>>
> >>>> I however expect that the SM will fail when using GetTable if the
> >>>> query returns more than one result, could that be what is happening?
> >>>> When you convert it to GET_TABLE does it return more than one result?
> >>>
> >>> strace output shows srptools always received MAD packet in 320 bytes. So
> >>> the embedded subnet manager only returns one result.
> >>
> >> What is MAD status ?
> > 
> > ib_user_mad->status     = 0x0
> > 
> > I attached the MAD packet dump out with strace. Please see the
> > attachments p4.in amd p4.decode.
> 
> p4.decode has:
> rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
> opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
> 
> Proper response to GetTable (0x12) method is GetTableResponse (0x92)
> method but this is GetResp (0x81) response.

Yes, you are right. p4.decode was captured with unpatched upstream srptools-1.0.3.

p4-184.decode was created with patched srptools-1.0.3.

> 
> -- Hal
> 
> >>
> >> -- Hal
> >>
> >>>>
> >>>> Jason
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
ib_user_mad->agent_id	= 0x0
ib_user_mad->status	= 0x0
ib_user_mad->timeout_ms	= 0x0
ib_user_mad->retries	= 0x0
ib_user_mad->length	= 0xb8 = 184
ib_mad_addr->qpn		= BE32 0x1000000
ib_mad_addr->qkey		= BE32 0x0
ib_mad_addr->lid		= BE16 0x100
ib_mad_addr->sl 		= 0x0
ib_mad_addr->path_bits		= 0x0
ib_mad_addr->grh_present	= 0x0
ib_mad_addr->gid_index		= 0x0
ib_mad_addr->hop_limit		= 0x0
ib_mad_addr->traffic_class	= 0x0
ib_mad_addr->gid[16]= 0x\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
ib_mad_addr->flow_label		= BE32 0x0
ib_mad_addr->pkey_index		= BE16 0x0
ib_mad_addr->reserved[6]= 0x\x00\x00\x00\x00\x00\x00
srp_dm_rmpp_sa_mad->base_version	= 0x1
srp_dm_rmpp_sa_mad->mgmt_class		= 0x3
srp_dm_rmpp_sa_mad->class_version	= 0x2
srp_dm_rmpp_sa_mad->method		= 0x92 // IB_MAD_METHOD_GETTABLE_RESP opensm/include/iba/ib_types.h // SubnAdmGetTableResp() Method (15.2.2)
srp_dm_rmpp_sa_mad->status		= 0x0
srp_dm_rmpp_sa_mad->reserved1		= 0x0
srp_dm_rmpp_sa_mad->tid			= 0x400000035000000
srp_dm_rmpp_sa_mad->attr_id		= 0x3500
srp_dm_rmpp_sa_mad->reserved2		= 0x0
srp_dm_rmpp_sa_mad->attr_mod		= 0x0
srp_dm_rmpp_sa_mad->rmpp_version	= 0x1
srp_dm_rmpp_sa_mad->rmpp_type		= 0x1
srp_dm_rmpp_sa_mad->rmpp_rtime_flags	= 0xff
srp_dm_rmpp_sa_mad->rmpp_status		= 0x0
srp_dm_rmpp_sa_mad->seg_num		= 0x1000000
srp_dm_rmpp_sa_mad->paylen_newwin	= 0x54000000
srp_dm_rmpp_sa_mad->sm_key		= 0x0
srp_dm_rmpp_sa_mad->attr_offset		= 0x800
srp_dm_rmpp_sa_mad->reserved3		= 0x0
srp_dm_rmpp_sa_mad->comp_mask		= 0x0
ib_path_rec_t->resv0[8]			= \x00\x00\x00\x00
ib_path_rec_t->dgid			= \xfe\x00\x00\x00\x00\x75\x00\x9b
ib_path_rec_t->sgid			= \xfe\x00\x00\x00\x00\x75\x00\x9a
ib_path_rec_t->dlid				= ib_net16_t 0x9b01
ib_path_rec_t->slid				= ib_net16_t 0xed00
ib_path_rec_t->hop_flow_raw			= ib_net32_t 0x0
ib_path_rec_t->tclass				= 0x0
ib_path_rec_t->num_path				= 0x80
ib_path_rec_t->pkey				= ib_net16_t 0xffff
ib_path_rec_t->sl				= ib_net16_t 0x0
ib_path_rec_t->mtu				= 0x84
ib_path_rec_t->rate				= 0x87
ib_path_rec_t->pkt_life				= 0x8f
ib_path_rec_t->preference			= 0x0
ib_path_rec_t->resv2[6]				= \x00\x00\x00
Hal Rosenstock April 12, 2017, 4:51 p.m. UTC | #10
On 4/12/2017 10:49 AM, Honggang LI wrote:
> On Wed, Apr 12, 2017 at 10:34:47AM -0400, Hal Rosenstock wrote:
>> On 4/12/2017 10:18 AM, Honggang LI wrote:
>>> On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
>>>> On 4/12/2017 9:29 AM, Honggang LI wrote:
>>>>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
>>>>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
>>>>>>> From: Honggang Li <honli@redhat.com>
>>>>>>>
>>>>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
>>>>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
>>>>>>> should setup to 0x12 (SubnAdmGetTable()).
>>>>>>
>>>>>> That is just an example, Table 192 shows that Get and GetTable are
>>>>>> both valid query types.
>>>>>
>>>>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
>>>>> do with PathRecord request.
>>>>
>>>> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
>>>> Attribute/Method Map shows PathRecord methods are Get and GetTable.
>>>>
>>>
>>> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
>>> Interoperability
>>> • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
>>>
>>> It seems the embedded subnet manager does not support IBA 1.3
>>
>> It's same for IBA 1.2.1. This part hasn't changed.
>>
>>>>>>
>>>>>> I however expect that the SM will fail when using GetTable if the
>>>>>> query returns more than one result, could that be what is happening?
>>>>>> When you convert it to GET_TABLE does it return more than one result?
>>>>>
>>>>> strace output shows srptools always received MAD packet in 320 bytes. So
>>>>> the embedded subnet manager only returns one result.
>>>>
>>>> What is MAD status ?
>>>
>>> ib_user_mad->status     = 0x0
>>>
>>> I attached the MAD packet dump out with strace. Please see the
>>> attachments p4.in amd p4.decode.
>>
>> p4.decode has:
>> rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
>> opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
>>
>> Proper response to GetTable (0x12) method is GetTableResponse (0x92)
>> method but this is GetResp (0x81) response.
> 
> Yes, you are right. p4.decode was captured with unpatched upstream srptools-1.0.3.

against the embedded SM ? If so, this response looks (mostly) good to me and should be accepted by srp_daemon. I think I see the issue in srp_daemon: it's due to the check of the attribute offset which is not being set in the response:

                size = ib_get_attr_size(in_sa_mad->attr_offset);
                if (!size) {
                        if (config->verbose)
                                printf("PathRec Query did not find any targets "
                                       "over P_Key %x\n", pkey);
                        continue;
                }

p4.decode has:
srp_dm_rmpp_sa_mad->attr_offset		= 0x0

Since a get rather than get table is being used, a good MAD status means that a valid PR was returned and there is no need for this check.

Would you try using the original get method without this code block and verify that it works with the embedded SM ?

Thanks.

-- Hal
 
> p4-184.decode was created with patched srptools-1.0.3.

 
>>
>> -- Hal
>>
>>>>
>>>> -- Hal
>>>>
>>>>>>
>>>>>> Jason
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hal Rosenstock April 12, 2017, 5:39 p.m. UTC | #11
On 4/12/2017 12:51 PM, Hal Rosenstock wrote:
> On 4/12/2017 10:49 AM, Honggang LI wrote:
>> On Wed, Apr 12, 2017 at 10:34:47AM -0400, Hal Rosenstock wrote:
>>> On 4/12/2017 10:18 AM, Honggang LI wrote:
>>>> On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
>>>>> On 4/12/2017 9:29 AM, Honggang LI wrote:
>>>>>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
>>>>>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
>>>>>>>> From: Honggang Li <honli@redhat.com>
>>>>>>>>
>>>>>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
>>>>>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
>>>>>>>> should setup to 0x12 (SubnAdmGetTable()).
>>>>>>>
>>>>>>> That is just an example, Table 192 shows that Get and GetTable are
>>>>>>> both valid query types.
>>>>>>
>>>>>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
>>>>>> do with PathRecord request.
>>>>>
>>>>> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
>>>>> Attribute/Method Map shows PathRecord methods are Get and GetTable.
>>>>>
>>>>
>>>> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
>>>> Interoperability
>>>> • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
>>>>
>>>> It seems the embedded subnet manager does not support IBA 1.3
>>>
>>> It's same for IBA 1.2.1. This part hasn't changed.
>>>
>>>>>>>
>>>>>>> I however expect that the SM will fail when using GetTable if the
>>>>>>> query returns more than one result, could that be what is happening?
>>>>>>> When you convert it to GET_TABLE does it return more than one result?
>>>>>>
>>>>>> strace output shows srptools always received MAD packet in 320 bytes. So
>>>>>> the embedded subnet manager only returns one result.
>>>>>
>>>>> What is MAD status ?
>>>>
>>>> ib_user_mad->status     = 0x0
>>>>
>>>> I attached the MAD packet dump out with strace. Please see the
>>>> attachments p4.in amd p4.decode.
>>>
>>> p4.decode has:
>>> rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
>>> opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
>>>
>>> Proper response to GetTable (0x12) method is GetTableResponse (0x92)
>>> method but this is GetResp (0x81) response.
>>
>> Yes, you are right. p4.decode was captured with unpatched upstream srptools-1.0.3.
> 
> against the embedded SM ? If so, this response looks (mostly) good to me and should be accepted by srp_daemon. I think I see the issue in srp_daemon: it's due to the check of the attribute offset which is not being set in the response:
> 
>                 size = ib_get_attr_size(in_sa_mad->attr_offset);
>                 if (!size) {
>                         if (config->verbose)
>                                 printf("PathRec Query did not find any targets "
>                                        "over P_Key %x\n", pkey);
>                         continue;
>                 }
> 
> p4.decode has:
> srp_dm_rmpp_sa_mad->attr_offset		= 0x0
> 
> Since a get rather than get table is being used, a good MAD status means that a valid PR was returned and there is no need for this check.
> 
> Would you try using the original get method without this code block and verify that it works with the embedded SM ?

Just submitted PR for this:
https://github.com/linux-rdma/rdma-core/pull/116
I can post patch if you like.

-- Hal

> Thanks.
> 
> -- Hal
>  
>> p4-184.decode was created with patched srptools-1.0.3.
> 
>  
>>>
>>> -- Hal
>>>
>>>>>
>>>>> -- Hal
>>>>>
>>>>>>>
>>>>>>> Jason
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI April 12, 2017, 11:08 p.m. UTC | #12
On Wed, Apr 12, 2017 at 12:51:25PM -0400, Hal Rosenstock wrote:
> On 4/12/2017 10:49 AM, Honggang LI wrote:
> > On Wed, Apr 12, 2017 at 10:34:47AM -0400, Hal Rosenstock wrote:
> >> On 4/12/2017 10:18 AM, Honggang LI wrote:
> >>> On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
> >>>> On 4/12/2017 9:29 AM, Honggang LI wrote:
> >>>>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
> >>>>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> >>>>>>> From: Honggang Li <honli@redhat.com>
> >>>>>>>
> >>>>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
> >>>>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
> >>>>>>> should setup to 0x12 (SubnAdmGetTable()).
> >>>>>>
> >>>>>> That is just an example, Table 192 shows that Get and GetTable are
> >>>>>> both valid query types.
> >>>>>
> >>>>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
> >>>>> do with PathRecord request.
> >>>>
> >>>> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
> >>>> Attribute/Method Map shows PathRecord methods are Get and GetTable.
> >>>>
> >>>
> >>> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
> >>> Interoperability
> >>> • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
> >>>
> >>> It seems the embedded subnet manager does not support IBA 1.3
> >>
> >> It's same for IBA 1.2.1. This part hasn't changed.
> >>
> >>>>>>
> >>>>>> I however expect that the SM will fail when using GetTable if the
> >>>>>> query returns more than one result, could that be what is happening?
> >>>>>> When you convert it to GET_TABLE does it return more than one result?
> >>>>>
> >>>>> strace output shows srptools always received MAD packet in 320 bytes. So
> >>>>> the embedded subnet manager only returns one result.
> >>>>
> >>>> What is MAD status ?
> >>>
> >>> ib_user_mad->status     = 0x0
> >>>
> >>> I attached the MAD packet dump out with strace. Please see the
> >>> attachments p4.in amd p4.decode.
> >>
> >> p4.decode has:
> >> rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
> >> opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
> >>
> >> Proper response to GetTable (0x12) method is GetTableResponse (0x92)
> >> method but this is GetResp (0x81) response.
> > 
> > Yes, you are right. p4.decode was captured with unpatched upstream srptools-1.0.3.
> 
> against the embedded SM ? 

Yes.

> If so, this response looks (mostly) good to me and should be accepted by srp_daemon. I think I see the issue in srp_daemon: it's due to the check of the attribute offset which is not being set in the response:
> 
>                 size = ib_get_attr_size(in_sa_mad->attr_offset);
>                 if (!size) {
>                         if (config->verbose)
>                                 printf("PathRec Query did not find any targets "
>                                        "over P_Key %x\n", pkey);
>                         continue;
>                 }
> 
> p4.decode has:
> srp_dm_rmpp_sa_mad->attr_offset		= 0x0
> 
> Since a get rather than get table is being used, a good MAD status means that a valid PR was returned and there is no need for this check.
> 
> Would you try using the original get method without this code block and verify that it works with the embedded SM ?
>

I sent the scratch RPM with the patch you suggested to third party for testing. Please wait.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI April 13, 2017, 1:42 p.m. UTC | #13
On Thu, Apr 13, 2017 at 07:08:58AM +0800, Honggang LI wrote:
> On Wed, Apr 12, 2017 at 12:51:25PM -0400, Hal Rosenstock wrote:
> > On 4/12/2017 10:49 AM, Honggang LI wrote:
> > > On Wed, Apr 12, 2017 at 10:34:47AM -0400, Hal Rosenstock wrote:
> > >> On 4/12/2017 10:18 AM, Honggang LI wrote:
> > >>> On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
> > >>>> On 4/12/2017 9:29 AM, Honggang LI wrote:
> > >>>>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
> > >>>>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> > >>>>>>> From: Honggang Li <honli@redhat.com>
> > >>>>>>>
> > >>>>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
> > >>>>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
> > >>>>>>> should setup to 0x12 (SubnAdmGetTable()).
> > >>>>>>
> > >>>>>> That is just an example, Table 192 shows that Get and GetTable are
> > >>>>>> both valid query types.
> > >>>>>
> > >>>>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
> > >>>>> do with PathRecord request.
> > >>>>
> > >>>> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
> > >>>> Attribute/Method Map shows PathRecord methods are Get and GetTable.
> > >>>>
> > >>>
> > >>> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
> > >>> Interoperability
> > >>> • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
> > >>>
> > >>> It seems the embedded subnet manager does not support IBA 1.3
> > >>
> > >> It's same for IBA 1.2.1. This part hasn't changed.
> > >>
> > >>>>>>
> > >>>>>> I however expect that the SM will fail when using GetTable if the
> > >>>>>> query returns more than one result, could that be what is happening?
> > >>>>>> When you convert it to GET_TABLE does it return more than one result?
> > >>>>>
> > >>>>> strace output shows srptools always received MAD packet in 320 bytes. So
> > >>>>> the embedded subnet manager only returns one result.
> > >>>>
> > >>>> What is MAD status ?
> > >>>
> > >>> ib_user_mad->status     = 0x0
> > >>>
> > >>> I attached the MAD packet dump out with strace. Please see the
> > >>> attachments p4.in amd p4.decode.
> > >>
> > >> p4.decode has:
> > >> rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
> > >> opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
> > >>
> > >> Proper response to GetTable (0x12) method is GetTableResponse (0x92)
> > >> method but this is GetResp (0x81) response.
> > > 
> > > Yes, you are right. p4.decode was captured with unpatched upstream srptools-1.0.3.
> > 
> > against the embedded SM ? 
> 
> Yes.
> 
> > If so, this response looks (mostly) good to me and should be accepted by srp_daemon. I think I see the issue in srp_daemon: it's due to the check of the attribute offset which is not being set in the response:
> > 
> >                 size = ib_get_attr_size(in_sa_mad->attr_offset);
> >                 if (!size) {
> >                         if (config->verbose)
> >                                 printf("PathRec Query did not find any targets "
> >                                        "over P_Key %x\n", pkey);
> >                         continue;
> >                 }
> > 
> > p4.decode has:
> > srp_dm_rmpp_sa_mad->attr_offset		= 0x0
> > 
> > Since a get rather than get table is being used, a good MAD status means that a valid PR was returned and there is no need for this check.
> > 
> > Would you try using the original get method without this code block and verify that it works with the embedded SM ?

Yes, it works. Thanks

> >
> 
> I sent the scratch RPM with the patch you suggested to third party for testing. Please wait.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI April 13, 2017, 1:58 p.m. UTC | #14
On Wed, Apr 12, 2017 at 01:39:02PM -0400, Hal Rosenstock wrote:
> On 4/12/2017 12:51 PM, Hal Rosenstock wrote:
> > On 4/12/2017 10:49 AM, Honggang LI wrote:
> >> On Wed, Apr 12, 2017 at 10:34:47AM -0400, Hal Rosenstock wrote:
> >>> On 4/12/2017 10:18 AM, Honggang LI wrote:
> >>>> On Wed, Apr 12, 2017 at 09:34:20AM -0400, Hal Rosenstock wrote:
> >>>>> On 4/12/2017 9:29 AM, Honggang LI wrote:
> >>>>>> On Tue, Apr 11, 2017 at 09:49:44AM -0600, Jason Gunthorpe wrote:
> >>>>>>> On Tue, Apr 11, 2017 at 11:28:53PM +0800, Honggang LI wrote:
> >>>>>>>> From: Honggang Li <honli@redhat.com>
> >>>>>>>>
> >>>>>>>> According to InfiniBand Architecture Release 1.2.1, Table 208
> >>>>>>>> Example PathRecord Request MAD Header Fields, MADHeader:Method
> >>>>>>>> should setup to 0x12 (SubnAdmGetTable()).
> >>>>>>>
> >>>>>>> That is just an example, Table 192 shows that Get and GetTable are
> >>>>>>> both valid query types.
> >>>>>>
> >>>>>> I don't understand this as "Table 192 PortInfoRecord" seems nothing to
> >>>>>> do with PathRecord request.
> >>>>>
> >>>>> In IBA 1.3 volume 1, it's Table 209 Subnet Administration
> >>>>> Attribute/Method Map shows PathRecord methods are Get and GetTable.
> >>>>>
> >>>>
> >>>> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/truescale-infiniband-12300-switch-brief.pdf
> >>>> Interoperability
> >>>> • Compliant with IBTA* specifications 1.0a, 1.1, 1.2, and 1.2.
> >>>>
> >>>> It seems the embedded subnet manager does not support IBA 1.3
> >>>
> >>> It's same for IBA 1.2.1. This part hasn't changed.
> >>>
> >>>>>>>
> >>>>>>> I however expect that the SM will fail when using GetTable if the
> >>>>>>> query returns more than one result, could that be what is happening?
> >>>>>>> When you convert it to GET_TABLE does it return more than one result?
> >>>>>>
> >>>>>> strace output shows srptools always received MAD packet in 320 bytes. So
> >>>>>> the embedded subnet manager only returns one result.
> >>>>>
> >>>>> What is MAD status ?
> >>>>
> >>>> ib_user_mad->status     = 0x0
> >>>>
> >>>> I attached the MAD packet dump out with strace. Please see the
> >>>> attachments p4.in amd p4.decode.
> >>>
> >>> p4.decode has:
> >>> rp_dm_rmpp_sa_mad->method		= 0x81 //IB_MAD_METHOD_GET_RESP
> >>> opensm/include/iba/ib_types.h // GetResp() Method (13.4.5)
> >>>
> >>> Proper response to GetTable (0x12) method is GetTableResponse (0x92)
> >>> method but this is GetResp (0x81) response.
> >>
> >> Yes, you are right. p4.decode was captured with unpatched upstream srptools-1.0.3.
> > 
> > against the embedded SM ? If so, this response looks (mostly) good to me and should be accepted by srp_daemon. I think I see the issue in srp_daemon: it's due to the check of the attribute offset which is not being set in the response:
> > 
> >                 size = ib_get_attr_size(in_sa_mad->attr_offset);
> >                 if (!size) {
> >                         if (config->verbose)
> >                                 printf("PathRec Query did not find any targets "
> >                                        "over P_Key %x\n", pkey);
> >                         continue;
> >                 }
> > 
> > p4.decode has:
> > srp_dm_rmpp_sa_mad->attr_offset		= 0x0
> > 
> > Since a get rather than get table is being used, a good MAD status means that a valid PR was returned and there is no need for this check.
> > 
> > Would you try using the original get method without this code block and verify that it works with the embedded SM ?
> 
> Just submitted PR for this:
> https://github.com/linux-rdma/rdma-core/pull/116

Those three patches PASSED the test.

> I can post patch if you like.

Please fix the typo I mentioned in the PR and post those three patches.

thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index 71b5f07..f905c6f 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1134,6 +1134,7 @@  static int get_shared_pkeys(struct resources *res,
 			continue;
 
 		/* Mark components: DLID, SLID, PKEY */
+		out_sa_mad->method = SRP_SA_METHOD_GET_TABLE;
 		out_sa_mad->comp_mask = htobe64(1 << 4 | 1 << 5 | 1 << 13);
 		out_sa_mad->rmpp_hdr.rmpp_version = UMAD_RMPP_VERSION;
 		out_sa_mad->rmpp_hdr.rmpp_type = 1;