PATCH: opensm enhancements

Message ID 51CB5BF1.1090601@nasa.gov (mailing list archive)
State Accepted
Delegated to: Hal Rosenstock
Commit Message

Jeff Becker June 26, 2013, 9:24 p.m. UTC
Hi Hal. At the OFA workshop, I mentioned that I've been working on some 
modifications to opensm that we use at NASA. Following extensive testing 
of these applied to opensm 3.3.13 (the version we run here), I have 
ported these to top of tree opensm, and have tested them on a small 
cluster.

The first patch modifies the console logflush command to take "on" or 
"off" as an argument for toggling. The second (more extensive) patch 
adds a command line option to specify a file in which each line contains 
a switch GUID/port pair to be ignored by opensm. The idea is to specify 
this file when you start opensm (it can be empty), and add ports to 
ignore (one per line for each end of a connection) to the file. At the 
next heavy sweep (or HUP) the sm will reprogram the forwarding tables 
without including the ignored links. We use this for replacing cables, 
as well as for system expansion (adding new racks).
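
As a rough illustration of the file described above, the sketch below parses one "switch GUID/port" line. The exact on-disk syntax is not spelled out in this message, so the "<guid> <port>" layout, the '#' comment handling, and the names struct ignored_port and parse_ignore_line() are all assumptions made for the example. They are not taken from the patch.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record type; not the structure used by the patch. */
struct ignored_port {
	uint64_t guid;	/* switch node GUID */
	unsigned port;	/* port number on that switch */
};

/*
 * Parse one line of the ASSUMED format "<guid> <port>", for example
 * "0x0002c9020040f8a0 12". Returns 1 on success and 0 for blank,
 * comment ('#') or malformed lines.
 */
static int parse_ignore_line(const char *line, struct ignored_port *out)
{
	char *end;
	unsigned long long guid;
	unsigned long port;

	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '\0' || *line == '\n' || *line == '#')
		return 0;

	/* Base 0 accepts both "0x..." hex and plain decimal. */
	guid = strtoull(line, &end, 0);
	if (end == line)
		return 0;

	line = end;
	port = strtoul(line, &end, 0);
	if (end == line || port > 255)	/* port numbers are 8-bit values */
		return 0;

	/* Allow only trailing whitespace after the port number. */
	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return 0;

	out->guid = guid;
	out->port = (unsigned)port;
	return 1;
}
```

A caller would typically read the file with fgets() and pass each line to the parser, listing both ends of a connection on separate lines as described above.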

Please let me know if you have any questions/issues with these. Thanks.

-jeff
From cfb1c75a2b3fe7862f376bba44ebe3671b976ccd Mon Sep 17 00:00:00 2001
From: Jeffrey C. Becker <Jeffrey.C.Becker@nasa.gov>
Date: Tue, 25 Jun 2013 10:29:45 -0700
Subject: [PATCH 1/2] opensm: permit toggling log flush from console
 Signed-off-by: Jeff Becker <Jeffrey.C.Becker@nasa.gov>

---
 opensm/osm_console.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

Comments

Hal Rosenstock July 3, 2013, 10:23 a.m. UTC | #1
Hi Jeff,

On 6/26/2013 5:24 PM, Jeff Becker wrote:
> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
> modifications to opensm that we use at NASA. Following extensive testing
> of these applied to opensm 3.3.13 (the version we run here), I have
> ported these to top of tree opensm, and have tested them on a small
> cluster.

Thanks for getting this done! For future reference, patches should be
sent as plain text as this makes it easier to comment.

> The first patch modifies the console logflush command to take "on" or
> "off" as an argument for toggling. 

Thanks. Applied.

> The second (more extensive) patch
> adds a command line option to specify a file in which each line contains
> a switch GUID/port pair to be ignored by opensm. The idea is to specify
> this file when you start opensm (it can be empty), and add ports to
> ignore (one per line for each end of a connection) to the file. At the
> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
> without including the ignored links. We use this for replacing cables,
> as well as for system expansion (adding new racks).

I'll comment on this one later.

-- Hal

> Please let me know if you have any questions/issues with these. Thanks.
> 
> -jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Becker July 3, 2013, 4:20 p.m. UTC | #2
Hi Hal,

I have some testing info about the second patch below.

On 07/03/2013 03:23 AM, Hal Rosenstock wrote:
> Hi Jeff,
>
> On 6/26/2013 5:24 PM, Jeff Becker wrote:
>> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
>> modifications to opensm that we use at NASA. Following extensive testing
>> of these applied to opensm 3.3.13 (the version we run here), I have
>> ported these to top of tree opensm, and have tested them on a small
>> cluster.
> Thanks for getting this done! For future reference, patches should be
> sent as plain text as this makes it easier to comment.

OK. So I just send the output of git-format-patch directly? It appears 
to be formatted properly.
>
>> The first patch modifies the console logflush command to take "on" or
>> "off" as an argument for toggling.
> Thanks. Applied.
>
>> The second (more extensive) patch
>> adds a command line option to specify a file in which each line contains
>> a switch GUID/port pair to be ignored by opensm. The idea is to specify
>> this file when you start opensm (it can be empty), and add ports to
>> ignore (one per line for each end of a connection) to the file. At the
>> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
>> without including the ignored links. We use this for replacing cables,
>> as well as for system expansion (adding new racks).
> I'll comment on this one later.

Dale (cc'd) did some testing with my patch on Pleiades in preparation 
for a system augmentation (new racks) happening soon. He found that the 
SM correctly produces routes that do not use links marked to be ignored, 
but when you then remove or disable the links, the SM re-routes the 
fabric anyway and comes up with different routes than before. This 
rerouting causes problems with existing connections. There also appears 
to be a bookkeeping problem such that some of these links get added to 
the SM's "light sampling" list and never get removed. This ties up 
outstanding MAD packet slots, causing the SM to become unresponsive for 
several seconds every time it reviews its light sampling list.

I'm working on fixing these. I'll take care of the second problem 
(incorrectly getting added to the light sampling list) first. Is it 
possible this problem is related to the re-routing on port disable 
problem? Anyhow, if you have any specific comments about these issues, 
that would be great. Thanks, and have a great Fourth of July.

-jeff
>
> -- Hal
>
>> Please let me know if you have any questions/issues with these. Thanks.
>>
>> -jeff

Hal Rosenstock July 3, 2013, 5:24 p.m. UTC | #3
Hi again Jeff,

On 7/3/2013 12:20 PM, Jeff Becker wrote:
> Hi Hal,
> 
> I have some testing info about the second patch below.
> 
> On 07/03/2013 03:23 AM, Hal Rosenstock wrote:
>> Hi Jeff,
>>
>> On 6/26/2013 5:24 PM, Jeff Becker wrote:
>>> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
>>> modifications to opensm that we use at NASA. Following extensive testing
>>> of these applied to opensm 3.3.13 (the version we run here), I have
>>> ported these to top of tree opensm, and have tested them on a small
>>> cluster.
>> Thanks for getting this done! For future reference, patches should be
>> sent as plain text as this makes it easier to comment.
> 
> OK. So I just send the output of git-format-patch directly? It appears
> to be formatted properly.
>>
>>> The first patch modifies the console logflush command to take "on" or
>>> "off" as an argument for toggling.
>> Thanks. Applied.
>>
>>> The second (more extensive) patch
>>> adds a command line option to specify a file in which each line contains
>>> a switch GUID/port pair to be ignored by opensm. The idea is to specify
>>> this file when you start opensm (it can be empty), and add ports to
>>> ignore (one per line for each end of a connection) to the file. At the
>>> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
>>> without including the ignored links. We use this for replacing cables,
>>> as well as for system expansion (adding new racks).
>> I'll comment on this one later.
> 
> Dale (cc'd) did some testing with my patch on Pleiades in preparation
> for a system augmentation (new racks) happening soon. He found that the
> SM correctly produces routes that do not use links marked to be ignored,
> but when you then remove or disable the links, the SM re-routes the
> fabric anyway and comes up with different routes than before. This
> rerouting causes problems with existing connections. There also appears
> to be a bookkeeping problem such that some of these links get added to
> the SM's "light sampling" list and never get removed. This ties up
> outstanding MAD packet slots, causing the SM to become unresponsive for
> several seconds every time it reviews its light sampling list.

Yes, this is one of several issues with using this approach.

I plan on detailing these later as well as posting a slightly different
approach for this but that may take a little longer...

> I'm working on fixing these. I'll take care of the second problem
> (incorrectly getting added to the light sampling list) first. Is it
> possible this problem is related to the re-routing on port disable
> problem? Anyhow, if you have any specific comments about these issues,
> that would be great. 

> Thanks, and have a great Fourth of July.

Thanks; you too!

-- Hal

> -jeff
>>
>> -- Hal
>>
>>> Please let me know if you have any questions/issues with these. Thanks.
>>>
>>> -jeff
> 
> 

Patch

diff --git a/opensm/osm_console.c b/opensm/osm_console.c
index 0f80bdb..c065453 100644
--- a/opensm/osm_console.c
+++ b/opensm/osm_console.c
@@ -178,7 +178,7 @@  static void help_status(FILE * out, int detail)
 
 static void help_logflush(FILE * out, int detail)
 {
-	fprintf(out, "logflush -- flush the opensm.log file\n");
+	fprintf(out, "logflush [on|off] -- toggle opensm.log file flushing\n");
 }
 
 static void help_querylid(FILE * out, int detail)
@@ -599,7 +599,21 @@  static void sweep_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 
 static void logflush_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 {
-	fflush(p_osm->log.out_port);
+	char *p_cmd;
+
+	p_cmd = next_token(p_last);
+	if (!p_cmd ||
+	    (strcmp(p_cmd, "on") != 0 && strcmp(p_cmd, "off") != 0)) {
+		fprintf(out, "Invalid logflush command\n");
+		help_logflush(out, 1);
+	} else {
+		if (strcmp(p_cmd, "on") == 0) {
+			p_osm->log.flush = TRUE;
+			fflush(p_osm->log.out_port);
+		}
+		else
+			p_osm->log.flush = FALSE;
+	}
 }
 
 static void querylid_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
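
The on/off handling in logflush_parse() can also be exercised on its own. The following is a minimal standalone sketch of the same decision logic. parse_flush_arg() is a hypothetical helper written for illustration; it is not part of opensm and does not touch p_osm->log.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map a console argument to a flush setting.
 * Returns 0 and stores 1 ("on") or 0 ("off") in *flush. Returns -1 and
 * leaves *flush untouched when the argument is missing or unrecognized,
 * which is the case where the console prints its help text.
 */
static int parse_flush_arg(const char *arg, int *flush)
{
	if (!arg)
		return -1;
	if (strcmp(arg, "on") == 0) {
		*flush = 1;
		return 0;
	}
	if (strcmp(arg, "off") == 0) {
		*flush = 0;
		return 0;
	}
	return -1;
}
```

Keeping the argument check separate from the side effects (setting the flag, calling fflush()) makes the invalid-input path easy to test without a running console.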