[v2] infiniband-diags/scripts: Add 'ibcheckspeed' and 'ibcheckportspeed' to scripts

Message ID 829ded920909100602h78614ac0jd4eb1ee8d7a3779b@mail.gmail.com (mailing list archive)
State Not Applicable, archived
Commit Message

Keshetti Mahesh Sept. 10, 2009, 1:02 p.m. UTC
Added 'ibcheckspeed' and 'ibcheckportspeed': Similar to
'ibcheckwidth/ibcheckportwidth' in functionality and implementation.
Reports error/warning messages if the LinkSpeedActive is configured as
2.5 Gbps when the LinkSpeedSupported is more than 2.5 Gbps.

Signed-off-by: Keshetti Mahesh <keshetti.mahesh@gmail.com>
---
 infiniband-diags/scripts/ibcheckportspeed.in |  146 ++++++++++++++++++++++++++
 infiniband-diags/scripts/ibcheckportwidth.in |    2 +-
 infiniband-diags/scripts/ibcheckspeed.in     |  135 ++++++++++++++++++++++++
 3 files changed, 282 insertions(+), 1 deletions(-)
 create mode 100644 infiniband-diags/scripts/ibcheckportspeed.in
 create mode 100644 infiniband-diags/scripts/ibcheckspeed.in

Comments

Ira Weiny Sept. 10, 2009, 4:02 p.m. UTC | #1
Also, iblinkinfo will report links that it finds capable of either faster or wider operation.  iblinkinfo checks both ends of the link, as Hal mentions.  It reports this with output like:

Switch 0x0005ad0000092106 Cisco Switch SFS7000D:
...
           7    8[  ] ==( 4X 2.5 Gbps Active/  LinkUp)==>       8   12[  ] "MT47396 Infiniscale-III Mellanox Technologies" ( Could be 5.0 Gbps)
...

The portstatus console command in OpenSM will also report links which are running at "reduced speed or width", although it does not check the remote port.

OpenSM $ help portstatus
portstatus [ca|switch|router]
summarize port status
   [ca|switch|router] -- limit the results to the node type specified
OpenSM $ portstatus
"ALL" port status:
   115 port(s) scanned on 9 nodes in 26 us
   85 down
   30 active
   32 at 4X
   22 at 2.5 Gbps
   8 at 5.0 Gbps
   2 at 10.0 Gbps

Possible issues:
   2 disabled
      0x0008f10400411b18 5 (ISR9024D Voltaire)
      0x0005ad0000092106 13 (Cisco Switch SFS7000D)
   6 with reduced speed
      0x0008f10500200220 33 (Voltaire 4036 - 36 QDR ports switch)
      0x0008f10500200220 19 (Voltaire 4036 - 36 QDR ports switch)
      0x0005ad0000092106 21 (Cisco Switch SFS7000D)
      0x0005ad0000092106 20 (Cisco Switch SFS7000D)
      0x0005ad0000092106 9 (Cisco Switch SFS7000D)
      0x0005ad0000092106 8 (Cisco Switch SFS7000D)


Ira

On Thu, 10 Sep 2009 09:23:35 -0400
Hal Rosenstock <hal.rosenstock@gmail.com> wrote:

> On Thu, Sep 10, 2009 at 9:02 AM, Keshetti Mahesh
> <keshetti.mahesh@gmail.com> wrote:
> 
> > Added 'ibcheckspeed' and 'ibcheckportspeed': Similar to
> > 'ibcheckwidth/ibcheckportwidth' in functionality and implementation.
> > Reports error/warning messages if the LinkSpeedActive is configured as
> > 2.5 Gbps when the LinkSpeedSupported is more than 2.5 Gbps.
> >
> 
> ibportstate checks for more than this in terms of speed (and width)
> anomalies.
> 
> Would it be better for these scripts to use that tool now ? Alternatively,
> the additional speed/width anomaly checks could be implemented in these
> scripts but it does involve checking the peer port so there's a little more
> to it.
> 
> -- Hal
> 
> 
> >
> > Signed-off-by: Keshetti Mahesh < keshetti.mahesh@gmail.com>
> > ---
> >  infiniband-diags/scripts/ibcheckportspeed.in |  146
> > ++++++++++++++++++++++++++
> >  infiniband-diags/scripts/ibcheckportwidth.in |    2 +-
> >  infiniband-diags/scripts/ibcheckspeed.in     |  135
> > ++++++++++++++++++++++++
> >  3 files changed, 282 insertions(+), 1 deletions(-)
> >  create mode 100644 infiniband-diags/scripts/ibcheckportspeed.in
> >  create mode 100644 infiniband-diags/scripts/ibcheckspeed.in
> >
> <snip...>
>
Keshetti Mahesh Sept. 11, 2009, 4:02 a.m. UTC | #2
My bad. I have not used 'iblinkinfo' before, so I guess there is no need
for the above script. Apart from that, I feel there should be a
program/script which first scans the fabric to find the maximum common
supported width/speed and then reports warnings for the links/ports whose
active width/speed is less than that value. Is there an existing tool
that does this?

-
Keshetti Mahesh

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keshetti Mahesh Sept. 11, 2009, 5:46 a.m. UTC | #3
> # ibtracert 10.10.10.1 10.10.10.3

ibtracert only accepts source/destination addresses in LID or GUID
format. See the ibtracert man page.

-
Keshetti Mahesh
Barry Mavin Sept. 11, 2009, 6:13 a.m. UTC | #4
When I start the subnet manager on Red Hat 5.3 with:

# service opensm restart

I get the following messages in the log file.

Sep 11 11:41:46 252576 [4EF83940] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c90300047b91 (ibas1 HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID
Sep 11 11:41:46 252688 [4EF83940] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c90300044c61 (ibds2 HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID
Sep 11 11:41:46 252731 [4EF83940] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c90300047b7d (ibds1 HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID
Sep 11 11:41:46 252929 [43170940] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c90300047b75 (ibas2 HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID

What is the cause of these?

---
Regards
Barry Mavin
Recital Corporation
Chairman and CEO
Website: http://www.recital.com
MSN Messenger: Barry_Mavin@msn.com
Skype: BarryMavin
Direct line worldwide: +1 9785224139



> From: Keshetti Mahesh <keshetti.mahesh@gmail.com>
> Date: Fri, 11 Sep 2009 11:16:44 +0530
> To: Barry Mavin <Barry.Mavin@recital.com>
> Cc: OFED mailing list <linux-rdma@vger.kernel.org>, OFED mailing list
> <general@lists.openfabrics.org>
> Subject: Re: [ofa-general] Re: [PATCH v2] infiniband-diags/scripts: Add
> 'ibcheckspeed' and 'ibcheckportspeed' to scripts
> 
>> # ibtracert 10.10.10.1 10.10.10.3
> 
> ibtracert only supports source/destination addresses to be specified
> in LID/GUID format. See man page of ibtracert.
> 
> -
> Keshetti Mahesh

Keshetti Mahesh Sept. 14, 2009, 6:20 a.m. UTC | #5
I have a small question. If all ports in a subnet support a maximum
speed of 5 Gbps except one that supports 10 Gbps, what is the expected
behavior of OpenSM when setting the active link speed? Does OpenSM force
the 10 Gbps port to operate at 5 Gbps or not?

--
Keshetti Mahesh

Ira Weiny Sept. 14, 2009, 5:48 p.m. UTC | #6
On Mon, 14 Sep 2009 09:19:06 -0400
Hal Rosenstock <hal.rosenstock@gmail.com> wrote:

> On Mon, Sep 14, 2009 at 2:20 AM, Keshetti Mahesh
> <keshetti.mahesh@gmail.com> wrote:
> 
> > I have a small question. If there are all 5 Gbps (maximum supported
> > speed) ports except one with 10 Gbps in a subnet then what is the
> > expected behavior of OpenSM while setting active link speed ?
> 
> 
> It depends on the peer port and the link between them.
> 
> 
> > Does OpenSM force the port with 10 Gbps to operate at 5 Gbps or not ?
> >
> 
> SM (including OpenSM) sets PortInfo enabled components based on peer ports'
> supported components and link negotiation determines the active components.
> 
> So in the case where one port supports 10 Gbps speed and its peer port only
> supports 5, the SM sets LinkSpeedEnabled components on the peer ports to 5
> Gbps (encoded as 3). In the case where the peer port supports 10 Gbps, it is
> set to 10 Gbps (encoded as 5 or 7 depending on what is supported). The link
> then negotiates to one of the enabled speeds and sets LinkSpeedActive
> accordingly.

Hal is correct.
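
The encodings Hal quotes (3, 5, 7) are easier to read as a bitmask. A minimal sketch, assuming the PortInfo speed fields use bit 0 for 2.5 Gbps, bit 1 for 5.0 Gbps and bit 2 for 10.0 Gbps; decode_speed_mask is a hypothetical helper name, not something shipped in infiniband-diags:

```shell
#!/bin/sh
# decode_speed_mask: hypothetical helper, not part of infiniband-diags.
# Assumption: bit 0 = 2.5 Gbps, bit 1 = 5.0 Gbps, bit 2 = 10.0 Gbps.
decode_speed_mask() {
    mask=$1
    speeds=""
    for pair in 1:2.5 2:5.0 4:10.0; do
        bit=${pair%%:*}      # text before the colon: the bit value
        speed=${pair#*:}     # text after the colon: the speed in Gbps
        if [ $((mask & bit)) -ne 0 ]; then
            # Join the enabled speeds with " or "
            speeds="${speeds:+$speeds or }$speed"
        fi
    done
    if [ -z "$speeds" ]; then
        echo "none"
    else
        echo "$speeds Gbps"
    fi
}

decode_speed_mask 3   # 2.5 or 5.0 Gbps
decode_speed_mask 5   # 2.5 or 10.0 Gbps
decode_speed_mask 7   # 2.5 or 5.0 or 10.0 Gbps
```

Read this way, "encoded as 3" means the SM enabled 2.5 and 5.0 Gbps, and link negotiation then picks one of the enabled speeds.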

Just to be clear, the examples below are running in a cluster which I "forced" to SDR using OpenSM's "force_link_speed" option.  This just happened to be the easiest way for me to show you the output of a link which is running at sub-optimal speeds.

Ira

> 
> -- Hal
> 
> 
> 
> >
> > --
> > Keshetti Mahesh
> >
> > On Thu, Sep 10, 2009 at 9:32 PM, Ira Weiny <weiny2@llnl.gov> wrote:
> >  > Also, iblinkinfo will report links which it finds capable of either
> > faster or wider operation.  iblinkinfo checks both ends of the link as Hal
> > mentions.  It reports this with output like.
> > >
> > > Switch 0x0005ad0000092106 Cisco Switch SFS7000D:
> > > ...
> > >           7    8[  ] ==( 4X 2.5 Gbps Active/  LinkUp)==>       8   12[  ]
> > "MT47396 Infiniscale-III Mellanox Technologies" ( Could be 5.0 Gbps)
> > > ...
> > >
> > > Also the portstatus console command in OpenSM will report links which are
> > running at "reduced speed or width".  Although this does not check the
> > remote port.
> > >
> > > OpenSM $ help portstatus
> > > portstatus [ca|switch|router]
> > > summarize port status
> > >   [ca|switch|router] -- limit the results to the node type specified
> > > OpenSM $ portstatus
> > > "ALL" port status:
> > >   115 port(s) scanned on 9 nodes in 26 us
> > >   85 down
> > >   30 active
> > >   32 at 4X
> > >   22 at 2.5 Gbps
> > >   8 at 5.0 Gbps
> > >   2 at 10.0 Gbps
> > >
> > > Possible issues:
> > >   2 disabled
> > >      0x0008f10400411b18 5 (ISR9024D Voltaire)
> > >      0x0005ad0000092106 13 (Cisco Switch SFS7000D)
> > >   6 with reduced speed
> > >      0x0008f10500200220 33 (Voltaire 4036 - 36 QDR ports switch)
> > >      0x0008f10500200220 19 (Voltaire 4036 - 36 QDR ports switch)
> > >      0x0005ad0000092106 21 (Cisco Switch SFS7000D)
> > >      0x0005ad0000092106 20 (Cisco Switch SFS7000D)
> > >      0x0005ad0000092106 9 (Cisco Switch SFS7000D)
> > >      0x0005ad0000092106 8 (Cisco Switch SFS7000D)
> > >
> > >
> > > Ira
> > >
> > > On Thu, 10 Sep 2009 09:23:35 -0400
> > > Hal Rosenstock <hal.rosenstock@gmail.com> wrote:
> > >
> > >> On Thu, Sep 10, 2009 at 9:02 AM, Keshetti Mahesh
> > >> <keshetti.mahesh@gmail.com>wrote:
> > >>
> > >> > Added 'ibcheckspeed' and 'ibcheckportspeed': Similar to
> > >> > 'ibcheckwidth/ibcheckportwidth' in functionality and implementation.
> > >> > Reports error/warning messages if the LinkSpeedActive is configured as
> > >> > 2.5 Gbps when the LinkSpeedSupported is more than 2.5 Gbps.
> > >> >
> > >>
> > >> ibportstate checks for more than this in terms of speed (and width)
> > >> anomalies.
> > >>
> > >> Would it be better for these scripts to use that tool now ?
> > Alternatively,
> > >> the additional speed/width anomaly checks could be implemented in these
> > >> scripts but it does involve checking the peer port so there's a little
> > more
> > >> to it.
> > >>
> > >> -- Hal
> > >>
> > >>
> > >> >
> > >> > Signed-off-by: Keshetti Mahesh < keshetti.mahesh@gmail.com>
> > >> > ---
> > >> >  infiniband-diags/scripts/ibcheckportspeed.in |  146
> > >> > ++++++++++++++++++++++++++
> > >> >  infiniband-diags/scripts/ibcheckportwidth.in |    2 +-
> > >> >  infiniband-diags/scripts/ibcheckspeed.in     |  135
> > >> > ++++++++++++++++++++++++
> > >> >  3 files changed, 282 insertions(+), 1 deletions(-)
> > >> >  create mode 100644 infiniband-diags/scripts/ibcheckportspeed.in
> > >> >  create mode 100644 infiniband-diags/scripts/ibcheckspeed.in
> > >> >
> > >> <snip...>
> > >>
> > >
> > >
> > > --
> > > Ira Weiny
> > > Math Programmer/Computer Scientist
> > > Lawrence Livermore National Lab
> > > 925-423-8008
> > > weiny2@llnl.gov
> > >
> >
>
Ira Weiny Sept. 14, 2009, 6:02 p.m. UTC | #7
On Fri, 11 Sep 2009 09:32:39 +0530
Keshetti Mahesh <keshetti.mahesh@gmail.com> wrote:

> My badness. I have not used 'iblinkinfo' before.
> So, I guess there is no need for the above script. Apart from that, I feel
> there should be a program/script which will first scan the fabric to find the
> maximum common supported width/speed and then report the warning messages
> of the links/ports which are configured with active width/speed less
> than the found
> value. Is there any tool already exists which does the same ?

Not that I know of.

While I can see the usefulness of such a tool in some environments, I have gone down the path of making the OFED diags more generic and then writing some wrappers for our local needs.  Currently I have a script which runs iblinkinfo with the "-l" option and then returns the total number of links at SDR, DDR, and QDR, as well as the number of links at 1X, 4X, or 12X.  I then leave it up to the sysadmin to know whether their cluster is homogeneous or heterogeneous and how many links should be at what speeds.  They can then use iblinkinfo to identify which links are incorrect for their particular installation.
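
A wrapper of the kind described here might look roughly like the sketch below. It is not the script referred to above: summarize_links is a hypothetical name, and the parsing assumes every link that is up carries a "( <width>X <speed> Gbps" field, as in the sample iblinkinfo output quoted earlier in this thread.

```shell
#!/bin/sh
# summarize_links: hypothetical helper, not part of infiniband-diags.
# Reads `iblinkinfo -l` style lines on stdin and counts links per width
# and per speed. Lines without a width/speed field (down ports) are skipped.
summarize_links() {
    awk '
    BEGIN { name["2.5"] = "SDR"; name["5.0"] = "DDR"; name["10.0"] = "QDR" }
    match($0, /\( *[0-9]+X +[0-9.]+ Gbps/) {
        # Drop the leading "(" and split the rest into width, speed, unit
        split(substr($0, RSTART + 1, RLENGTH - 1), part, " ")
        width[part[1]]++
        speed[part[2]]++
    }
    END {
        for (w in width)
            printf "width %s: %d\n", w, width[w]
        for (s in speed)
            printf "speed %s Gbps (%s): %d\n", s, ((s in name) ? name[s] : "other"), speed[s]
    }' | sort
}
```

It would be fed with something like `iblinkinfo -l | summarize_links`; deciding whether the resulting counts are right for a given cluster is still left to the admin.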

Ira

Eli Dorfman (Voltaire) Sept. 15, 2009, 12:28 p.m. UTC | #8
Hal Rosenstock wrote:
> On Mon, Sep 14, 2009 at 2:02 PM, Ira Weiny <weiny2@llnl.gov> wrote:
> 
>     On Fri, 11 Sep 2009 09:32:39 +0530
>     Keshetti Mahesh <keshetti.mahesh@gmail.com> wrote:
> 
>     > My badness. I have not used 'iblinkinfo' before.
>     > So, I guess there is no need for the above script. Apart from that, I feel
>     > there should be a program/script which will first scan the fabric to find the
>     > maximum common supported width/speed and then report the warning messages
>     > of the links/ports which are configured with active width/speed less
>     > than the found
>     > value. Is there any tool already exists which does the same ?
> 
>     Not that I know of.
> 
> ibportstate does this but is on a per port basis. This could be readily
> scripted (ad hoc or in tree) for this purpose.

But it would be very slow for large fabrics.
I think it would be better to add this option to iblinkinfo code.
Also it would be useful to find all ports in Disable state.

Eli
Keshetti Mahesh Sept. 15, 2009, 12:41 p.m. UTC | #9
> Also it would be useful to find all ports in Disable state.

iblinkinfo already does this with '-d' option.
Eli Dorfman (Voltaire) Sept. 15, 2009, 1:04 p.m. UTC | #10
Keshetti Mahesh wrote:
>> Also it would be useful to find all ports in Disable state.
> 
> iblinkinfo already does this with '-d' option.
> 
It shows all the ports that are in the Down state, whether the cable is disconnected or the port was disabled (by ibportstate).

Eli

Ira Weiny Sept. 15, 2009, 4:41 p.m. UTC | #11
On Tue, 15 Sep 2009 16:04:24 +0300
"Eli Dorfman (Voltaire)" <dorfman.eli@gmail.com> wrote:

> Keshetti Mahesh wrote:
> >> Also it would be useful to find all ports in Disable state.
> > 
> > iblinkinfo already does this with '-d' option.
> > 
> It shows all the port that are in Down state - either cable disconnected or port disabled (by ibportstate).
> 
> Eli
>

The -l option was designed to show all the info for each link on a single line so that it can be grepped.  So, to look for Disabled links on a 1152-node ftree cluster:

09:37:31 > time ./iblinkinfo -l | grep Disable

real    0m3.041s
user    0m0.026s
sys     0m0.039s

Eli, if you have data showing this to be slow on a larger cluster, I would be interested to know.

Ira
Patch

diff --git a/infiniband-diags/scripts/ibcheckportspeed.in b/infiniband-diags/scripts/ibcheckportspeed.in
new file mode 100644
index 0000000..538a7a7
--- /dev/null
+++ b/infiniband-diags/scripts/ibcheckportspeed.in
@@ -0,0 +1,146 @@ 
+#!/bin/sh
+
+IBPATH=${IBPATH:-@IBSCRIPTPATH@}
+
+function usage() {
+    echo Usage: `basename $0` "[-h] [-v] [-N | -nocolor] [-G]" \
+    "[-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] <lid|guid> <port>"
+    exit -1
+}
+
+function green() {
+    if [ "$bw" = "yes" ]; then
+        if [ "$verbose" = "yes" ]; then
+            echo $1
+        fi
+        return
+    fi
+    if [ "$verbose" = "yes" ]; then
+        echo -e "\\033[1;032m" $1 "\\033[0;39m"
+    fi
+}
+
+function red() {
+    if [ "$bw" = "yes" ]; then
+        echo $1
+        return
+    fi
+    echo -e "\\033[1;031m" $1 "\\033[0;39m"
+}
+
+function blue()
+{
+    if [ "$bw" = "yes" ]; then
+        echo $1
+        return
+    fi
+    echo -e "\033[1;034m" $1 "\033[0;39m"
+}
+
+guid_addr=""
+bw=""
+verbose=""
+ca_info=""
+
+while [ "$1" ]; do
+    case $1 in
+        -G)
+        guid_addr=yes
+        ;;
+        -nocolor|-N)
+        bw=yes
+        ;;
+        -v)
+        verbose=yes
+        ;;
+        -P | -C | -t | -timeout)
+        case $2 in
+            -*)
+            usage
+            ;;
+        esac
+        if [ x$2 = x ] ; then
+            usage
+        fi
+        ca_info="$ca_info $1 $2"
+        shift
+        ;;
+        -*)
+        usage
+        ;;
+        *)
+        break
+        ;;
+    esac
+    shift
+done
+
+if [ $# -lt 2 ]
+then
+    usage
+fi
+
+portnum=$2
+
+if [ "$guid_addr" ]
+then
+    if ! lid=`$IBPATH/ibaddr $ca_info -G -L $1 | awk '/failed/{exit -1} {print $3}'`
+    then
+        echo -n "guid $1 address resolution: "
+        red "FAILED"
+        exit -1
+    fi
+    guid=$1
+else
+    lid=$1
+    if ! temp=`$IBPATH/ibaddr $ca_info -L $1 | awk '/failed/{exit -1} {print $3}'`
+    then
+        echo -n "lid $1 address resolution: "
+        red "FAILED"
+        exit -1
+    fi
+fi
+
+text="`eval $IBPATH/smpquery $ca_info portinfo $lid $portnum`"
+rv=$?
+if echo "$text" | sed 's/[0-9]/#&/;s/ Gbps//g' | awk -v mono=$bw -F '#' '
+function blue(s)
+{
+       if (mono)
+               printf s
+       else if (!quiet) {
+               printf "\033[1;034m" s
+               printf "\033[0;39m"
+       }
+}
+
+# Only check LinkSpeedActive if LinkSpeedSupported is not 2.5 Gbps
+/^LinkSpeedSupported/{ if ($2 == "2.5") { exit } }
+/^LinkSpeedActive/{ if ($2 == "2.5") warn = warn "#warn: Link configured as 2.5 Gbps  lid '$lid' port '$portnum'\n"}
+
+/^ib/  {print $0; next}
+/ibpanic:/     {print $0}
+/ibwarn:/      {print $0}
+/iberror:/     {print $0}
+
+END {
+       if (err != "") {
+               blue(err)
+               exit -1
+       }
+       if (warn != "") {
+               blue(warn)
+               exit -1
+       }
+       exit 0
+}' 2>&1 && test $rv -eq 0 ; then
+       if [ "$verbose" = "yes" ]; then
+               echo -n "Port check lid $lid port $portnum: "
+               green "OK"
+       fi
+       exit 0
+else
+       echo -n "Port check lid $lid port $portnum: "
+       red "FAILED"
+       exit -1
+fi
diff --git a/infiniband-diags/scripts/ibcheckportwidth.in b/infiniband-diags/scripts/ibcheckportwidth.in
index 32c5c5e..60a0892 100644
--- a/infiniband-diags/scripts/ibcheckportwidth.in
+++ b/infiniband-diags/scripts/ibcheckportwidth.in
@@ -103,7 +103,7 @@  function blue(s)
 }

 # Only check LinkWidthActive if LinkWidthSupported is not 1X
-/^LinkWidthSupported/{ if ($2 != "1X") { next } }
+/^LinkWidthSupported/{ if ($2 == "1X") { exit } }
 /^LinkWidthActive/{ if ($2 == "1X") warn = warn "#warn: Link configured as 1X  lid '$lid' port '$portnum'\n"}

 /^ib/  {print $0; next}
diff --git a/infiniband-diags/scripts/ibcheckspeed.in b/infiniband-diags/scripts/ibcheckspeed.in
new file mode 100644
index 0000000..25c2201
--- /dev/null
+++ b/infiniband-diags/scripts/ibcheckspeed.in
@@ -0,0 +1,135 @@ 
+#!/bin/sh
+
+IBPATH=${IBPATH:-@IBSCRIPTPATH@}
+
+function usage() {
+       echo Usage: `basename $0` "[-h] [-v] [-N | -nocolor]" \
+           "[<topology-file> \| -C ca_name -P ca_port -t(imeout) timeout_ms]"
+       exit -1
+}
+
+function user_abort() {
+       echo "Aborted"
+       exit 1
+}
+
+trap user_abort SIGINT
+
+gflags=""
+verbose=""
+v=0
+ntype=""
+nodeguid=""
+oldlid=""
+topofile=""
+ca_info=""
+
+while [ "$1" ]; do
+       case $1 in
+       -h)
+               usage
+               ;;
+       -N|-nocolor)
+               gflags=-N
+               ;;
+       -v)
+               verbose="-v"
+               v=1
+               ;;
+       -P | -C | -t | -timeout)
+               case $2 in
+               -*)
+                       usage
+                       ;;
+               esac
+               if [ x$2 = x ] ; then
+                       usage
+               fi
+               ca_info="$ca_info $1 $2"
+               shift
+               ;;
+       -*)
+               usage
+               ;;
+       *)
+               if [ "$topofile" ]; then
+                       usage
+               fi
+               topofile="$1"
+               ;;
+       esac
+       shift
+done
+
+if [ "$topofile" ]; then
+       netcmd="cat $topofile"
+else
+       netcmd="$IBPATH/ibnetdiscover $ca_info"
+fi
+
+text="`eval $netcmd`"
+rv=$?
+echo "$text" | awk '
+BEGIN {
+       ne=0
+       pe=0
+}
+function check_node(lid)
+{
+       nodechecked=1
+       if (system("'$IBPATH'/ibchecknode'"$ca_info"' '$gflags' '$verbose' " lid)) {
+               ne++
+               badnode=1
+               return
+       }
+}
+
+/^Ca/ || /^Switch/ || /^Rt/ {
+                       nnodes++
+                       ntype=$1; nodeguid=substr($3, 4, 16); ports=$2
+                       if ('$v')
+                               print "\n# Checking " ntype ": nodeguid 0x" nodeguid
+
+                       nodechecked=0
+                       badnode=0
+                       if (ntype != "Switch")
+                               next
+
+                       lid = substr($0, index($0, "port 0 lid ") + 11)
+                       lid = substr(lid, 1, index(lid, " ") - 1)
+                       check_node(lid)
+               }
+/^\[/  {
+               nports++
+               port = $1
+               if (!nodechecked) {
+                       lid = substr($0, index($0, " lid ") + 5)
+                       lid = substr(lid, 1, index(lid, " ") - 1)
+                       check_node(lid)
+               }
+               if (badnode) {
+                       print "\n# " ntype ": nodeguid 0x" nodeguid " failed"
+                       next
+               }
+               sub("\\(.*\\)", "", port)
+               gsub("[\\[\\]]", "", port)
+               if (system("'$IBPATH'/ibcheckportspeed'"$ca_info"' '$gflags' '$verbose' " lid " " port)) {
+                       if (!'$v' && oldlid != lid) {
+                               print "# Checked " ntype ": nodeguid 0x" nodeguid " with failure"
+                               oldlid = lid
+                       }
+                       pe++;
+               }
+}
+
+/^ib/  {print $0; next}
+/ibpanic:/     {print $0}
+/ibwarn:/      {print $0}
+/iberror:/     {print $0}
+
+END {
+       printf "\n## Summary: %d nodes checked, %d bad nodes found\n", nnodes, ne
+       printf "##          %d ports checked, %d ports with 2.5 Gbps speed in error found\n", nports, pe
+}
+'
+exit $rv
-- 
1.6.4.2


From 76e5f441bac10dff185244139a46124ff4736d56 Mon Sep 17 00:00:00 2001
From: Keshetti Mahesh <keshetti.mahesh@gmail.com>
Date: Thu, 10 Sep 2009 18:24:16 +0530
Subject: [PATCH 2/2] Revert the change made to 'ibcheckportwidth'
 Add man pages of 'ibcheckportspeed' and 'ibcheckspeed'
 Integrate 'ibcheckportspeed' and 'ibcheckspeed' into the build system
Organization: OFED

---
 infiniband-diags/Makefile.am                 |    3 +-
 infiniband-diags/configure.in                |    2 +
 infiniband-diags/man/ibcheckportspeed.8      |   44 ++++++++++++++++++++++++++
 infiniband-diags/man/ibcheckportwidth.8      |    2 +-
 infiniband-diags/man/ibcheckspeed.8          |   37 +++++++++++++++++++++
 infiniband-diags/scripts/ibcheckportwidth.in |    2 +-
 6 files changed, 87 insertions(+), 3 deletions(-)
 create mode 100644 infiniband-diags/man/ibcheckportspeed.8
 create mode 100644 infiniband-diags/man/ibcheckspeed.8

diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am
index 1cdb60e..8c5b773 100644
--- a/infiniband-diags/Makefile.am
+++ b/infiniband-diags/Makefile.am
@@ -23,6 +23,7 @@  sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \
                scripts/ibcheckport scripts/ibhosts scripts/ibstatus \
               scripts/ibswitches scripts/ibnodes scripts/ibrouters \
               scripts/ibcheckwidth scripts/ibcheckportwidth \
+              scripts/ibcheckspeed scripts/ibcheckportspeed \
               scripts/ibcheckstate scripts/ibcheckportstate \
               scripts/ibcheckerrors scripts/ibclearerrors \
               scripts/ibclearcounters scripts/ibdatacounts \
@@ -76,7 +77,7 @@  man_MANS = man/ibaddr.8 man/ibcheckerrors.8 man/ibcheckerrs.8 \
        man/ibprintswitch.8 man/ibprintca.8 man/ibfindnodesusing.8 \
        man/ibdatacounts.8 man/ibdatacounters.8 \
        man/ibrouters.8 man/ibprintrt.8 man/ibidsverify.8 \
-       man/check_lft_balance.8
+       man/check_lft_balance.8 man/ibcheckportspeed.8 man/ibcheckspeed.8

 BUILT_SOURCES = ibdiag_version
 ibdiag_version:
diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in
index 3ef35cc..c647f40 100644
--- a/infiniband-diags/configure.in
+++ b/infiniband-diags/configure.in
@@ -156,8 +156,10 @@  AC_CONFIG_FILES([\
        scripts/ibcheckport \
        scripts/ibcheckportstate \
        scripts/ibcheckportwidth \
+       scripts/ibcheckportspeed \
        scripts/ibcheckstate \
        scripts/ibcheckwidth \
+       scripts/ibcheckspeed \
        scripts/ibclearcounters \
        scripts/ibclearerrors \
        scripts/ibdatacounts \
diff --git a/infiniband-diags/man/ibcheckportspeed.8 b/infiniband-diags/man/ibcheckportspeed.8
new file mode 100644
index 0000000..36beaac
--- /dev/null
+++ b/infiniband-diags/man/ibcheckportspeed.8
@@ -0,0 +1,44 @@ 
+.TH IBCHECKPORTSPEED 8 "Sep 10, 2009" "OpenIB" "OpenIB Diagnostics"
+
+.SH NAME
+ibcheckportspeed \- validate IB port for link speed
+
+.SH SYNOPSIS
+.B ibcheckportspeed
+[\-h] [\-v] [\-N | \-nocolor] [\-G] [\-C ca_name] [\-P ca_port]
+[\-t(imeout) timeout_ms]  <lid|guid> <port>
+
+.SH DESCRIPTION
+.PP
+Check connectivity and check the specified port for link speed. Report warning if the LinkSpeedSupported is greater than 2.5 Gbps and LinkSpeedActive is configured as 2.5 Gbps.
+
+Port address is a lid unless -G option is used to specify a GUID address.
+
+.SH OPTIONS
+.PP
+\-G      use GUID address argument. In most cases, it is the Port GUID.
+        Example:
+        "0x08f1040023"
+.PP
+\-v      increase the verbosity level
+.PP
+\-N | \-nocolor use mono rather than color mode
+.PP
+\-C <ca_name>    use the specified ca_name.
+.PP
+\-P <ca_port>    use the specified ca_port.
+.PP
+\-t <timeout_ms> override the default timeout for the solicited mads.
+
+.SH EXAMPLE
+.PP
+ibcheckportspeed 2 3         # check lid 2 port 3
+
+.SH SEE ALSO
+.BR smpquery(8),
+.BR ibaddr(8)
+
+.SH AUTHOR
+.TP
+Keshetti Mahesh
+.RI < keshetti.mahesh@gmail.com >
diff --git a/infiniband-diags/man/ibcheckportwidth.8 b/infiniband-diags/man/ibcheckportwidth.8
index 85c06fc..c368467 100644
--- a/infiniband-diags/man/ibcheckportwidth.8
+++ b/infiniband-diags/man/ibcheckportwidth.8
@@ -4,7 +4,7 @@ 
 ibcheckportwidth \- validate IB port for 1x link width

 .SH SYNOPSIS
-.B ibcheckport
+.B ibcheckportwidth
 [\-h] [\-v] [\-N | \-nocolor] [\-G] [\-C ca_name] [\-P ca_port]
 [\-t(imeout) timeout_ms]  <lid|guid> <port>

diff --git a/infiniband-diags/man/ibcheckspeed.8 b/infiniband-diags/man/ibcheckspeed.8
new file mode 100644
index 0000000..29aee37
--- /dev/null
+++ b/infiniband-diags/man/ibcheckspeed.8
@@ -0,0 +1,37 @@ 
+.TH IBCHECKSPEED 8 "Sep 10, 2009" "OpenIB" "OpenIB Diagnostics"
+
+.SH NAME
+ibcheckspeed \- find link speed configuration errors in IB subnet
+
+.SH SYNOPSIS
+.B ibcheckspeed
+[\-h] [\-v] [\-N | \-nocolor] [<topology-file> | \-C ca_name
+\-P ca_port \-t(imeout) timeout_ms]
+
+
+.SH DESCRIPTION
+.PP
+ibcheckspeed is a script which uses a full topology file that was created by
+ibnetdiscover, scans the network to validate the active link speeds and reports
+any links whose active link speed is configured lower than the supported
+link speed.
+
+.SH OPTIONS
+.PP
+\-N | \-nocolor  use mono rather than color mode
+.PP
+\-C <ca_name>    use the specified ca_name.
+.PP
+\-P <ca_port>    use the specified ca_port.
+.PP
+\-t <timeout_ms> override the default timeout for the solicited mads.
+
+.SH SEE ALSO
+.BR ibnetdiscover(8),
+.BR ibchecknode(8),
+.BR ibcheckportspeed(8)
+
+.SH AUTHOR
+.TP
+Keshetti Mahesh
+.RI < keshetti.mahesh@gmail.com >
diff --git a/infiniband-diags/scripts/ibcheckportwidth.in b/infiniband-diags/scripts/ibcheckportwidth.in
index 60a0892..32c5c5e 100644
--- a/infiniband-diags/scripts/ibcheckportwidth.in
+++ b/infiniband-diags/scripts/ibcheckportwidth.in
@@ -103,7 +103,7 @@  function blue(s)
 }

 # Only check LinkWidthActive if LinkWidthSupported is not 1X
-/^LinkWidthSupported/{ if ($2 == "1X") { exit } }
+/^LinkWidthSupported/{ if ($2 != "1X") { next } }
 /^LinkWidthActive/{ if ($2 == "1X") warn = warn "#warn: Link configured as 1X  lid '$lid' port '$portnum'\n"}
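
For readers following the thread, the rule stated in the ibcheckportspeed man page (warn when LinkSpeedSupported is more than 2.5 Gbps but LinkSpeedActive is 2.5 Gbps) can be sketched roughly as follows. This is only an illustration, not the code from the patch: the function name check_speed is hypothetical, and the input is assumed to be "Name:.....value" lines of the kind smpquery portinfo prints, which may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): reads port info text on stdin
# and warns if the port runs at 2.5 Gbps although it supports more.
# Assumed input format: "LinkSpeedActive:.................2.5 Gbps"
check_speed() {
    awk '
    # Strip the field name and the dotted padding, keep only the value.
    /^LinkSpeedSupported/ { supported = $0; sub(/^[^:]*:\.*/, "", supported) }
    /^LinkSpeedActive/    { active = $0;    sub(/^[^:]*:\.*/, "", active) }
    END {
        # Only complain when something faster than 2.5 Gbps is supported.
        if (active == "2.5 Gbps" && supported != "" && supported != "2.5 Gbps") {
            print "#warn: Link configured as 2.5 Gbps"
            exit 1
        }
    }'
}
```

Under those assumptions it could be fed with something like `smpquery portinfo 2 3 | check_speed`; a non-zero exit status then signals a port running below its supported speed.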