
[RFC] mlx4: Avoid that mlx4_cmd_wait() contributes to the system load

Message ID 51ED4E60.30203@acm.org (mailing list archive)
State Rejected

Commit Message

Bart Van Assche July 22, 2013, 3:23 p.m. UTC
Avoid that kernel threads running mlx4_cmd_wait() contribute to the
system load by setting the task state to TASK_INTERRUPTIBLE instead
of TASK_UNINTERRUPTIBLE while waiting. This patch reduces the load
average from about 0.5 to about 0.0 on an idle system with one mlx4
HCA and no IB cables connected.

Note: I'm posting this patch as an RFC since it involves a behavior
change (a signal sent to a worker thread that is waiting for a
command to finish causes the command to fail) and since I'm not sure
this behavior change is acceptable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
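
For background: the Linux load average counts tasks sleeping in
TASK_UNINTERRUPTIBLE (the "D" state visible in the sysrq-w trace further
down) in addition to runnable tasks, so a worker that spends most of its
time blocked in wait_for_completion_timeout() inflates loadavg without
using any CPU. Below is a minimal sketch contrasting the two wait styles
the patch swaps; the helper names are illustrative and not taken from
the mlx4 driver.

#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/jiffies.h>

/* Illustrative helpers, not mlx4 code: contrast the two wait styles. */
static int wait_counted_in_loadavg(struct completion *done, unsigned int ms)
{
        /*
         * wait_for_completion_timeout() sleeps in TASK_UNINTERRUPTIBLE:
         * the task shows up as "D" in sysrq-w output and is counted in
         * the load average for the whole wait.
         */
        if (!wait_for_completion_timeout(done, msecs_to_jiffies(ms)))
                return -EBUSY;          /* timed out */
        return 0;
}

static int wait_not_counted_in_loadavg(struct completion *done, unsigned int ms)
{
        long ret;

        /*
         * wait_for_completion_interruptible_timeout() sleeps in
         * TASK_INTERRUPTIBLE: not counted in the load average, but the
         * wait can now be ended early by a signal.
         */
        ret = wait_for_completion_interruptible_timeout(done,
                                                        msecs_to_jiffies(ms));
        if (ret == 0)
                return -EBUSY;          /* timed out */
        if (ret < 0)
                return ret;             /* -ERESTARTSYS: signal received */
        return 0;                       /* completed in time */
}

The interruptible variant is what lets the wait drop out of the load
average; the price is the new negative return value, which is the
behavior change the RFC note is about.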

Comments

Or Gerlitz July 24, 2013, 3:17 p.m. UTC | #1
On 22/07/2013 18:23, Bart Van Assche wrote:
> Avoid that kernel threads running mlx4_cmd_wait() contribute to the
> system load by setting the task state to TASK_INTERRUPTIBLE instead
> of TASK_UNINTERRUPTIBLE while waiting. This patch reduces the load
> average from about 0.5 to about 0.0 on an idle system with one mlx4
> HCA and no IB cables connected.

Before diving into the implications of the patch, let's discuss the
phenomenon you see...
So the load average on this idle system is 0.5 or 0.05?

I don't see 0.5 or the like on my systems that are installed with HCAs
and are idle. Could it be that some IB management entity is repeatedly
sending MADs to this system?

Or.
Bart Van Assche July 24, 2013, 4:48 p.m. UTC | #2
On 07/24/13 17:17, Or Gerlitz wrote:
> On 22/07/2013 18:23, Bart Van Assche wrote:
>> Avoid that kernel threads running mlx4_cmd_wait() contribute to the
>> system load by setting the task state to TASK_INTERRUPTIBLE instead
>> of TASK_UNINTERRUPTIBLE while waiting. This patch reduces the load
>> average from about 0.5 to about 0.0 on an idle system with one mlx4
>> HCA and no IB cables connected.
>
> Before diving into the implications of the patch, let's discuss the
> phenomenon you see...
> So the load average on this idle system is 0.5 or 0.05?
>
> I don't see 0.5 or the like on my systems that are installed with HCAs
> and are idle. Could it be that some IB management entity is repeatedly
> sending MADs to this system?

Hello Or,

I saw a load of 0.5 with several different upstream kernels (3.6..3.10 
at least). The only IB-related process that was running on the system 
was opensmd. This is definitely reproducible. It was only a month after 
I had noticed this phenomenon that I started searching for the root cause.

Bart.

Or Gerlitz July 24, 2013, 5:06 p.m. UTC | #3
On 24/07/2013 19:48, Bart Van Assche wrote:
> I saw a load of 0.5 with several different upstream kernels (3.6..3.10 
> at least). The only IB-related process that was running on the system 
> was opensmd. This is definitely reproducible. It was only a month 
> after I had noticed this phenomenon that I started searching for the 
> root cause. 
Do you see it also on systems that don't run opensm?
Bart Van Assche July 24, 2013, 5:13 p.m. UTC | #4
On 07/24/13 19:06, Or Gerlitz wrote:
> On 24/07/2013 19:48, Bart Van Assche wrote:
>> I saw a load of 0.5 with several different upstream kernels (3.6..3.10
>> at least). The only IB-related process that was running on the system
>> was opensmd. This is definitely reproducible. It was only a month
>> after I had noticed this phenomenon that I started searching for the
>> root cause.
> Do you see it also on systems that don't run opensm?

That's a test I have not yet run. So sorry, I don't know whether this 
also happens without opensm.

Bart.


Bart Van Assche July 24, 2013, 6:34 p.m. UTC | #5
On 07/24/13 19:06, Or Gerlitz wrote:
> On 24/07/2013 19:48, Bart Van Assche wrote:
>> I saw a load of 0.5 with several different upstream kernels (3.6..3.10
>> at least). The only IB-related process that was running on the system
>> was opensmd. This is definitely reproducible. It was only a month
>> after I had noticed this phenomenon that I started searching for the
>> root cause.
> Do you see it also on systems that don't run opensm?

Yes. This happens both on systems running opensm and on systems not 
running opensm. A call trace from a system on which CPU load was higher 
than expected is as follows:

# echo w >/proc/sysrq-trigger; dmesg -c
SysRq : Show Blocked State
   task                        PC stack   pid father
kworker/u:7     D ffff88011fa125c0     0   181      2 0x00000000
  ffff880114d63b48 0000000000000046 ffff8801158d1c40 ffff880114d63fd8
  ffff880114d63fd8 ffff880114d63fd8 ffffffff81613440 ffff8801158d1c40
  ffffffff817a7180 ffff880114d63b80 ffffffff817a7180 000000010000ebf9
Call Trace:
  [<ffffffff813ea259>] schedule+0x29/0x70
  [<ffffffff813e88aa>] schedule_timeout+0x10a/0x1e0
  [<ffffffff813e98e2>] wait_for_common+0xd2/0x180
  [<ffffffff813e9a63>] wait_for_completion_timeout+0x13/0x20
  [<ffffffffa03bfb99>] __mlx4_cmd+0x259/0x5e0 [mlx4_core]
  [<ffffffffa03d60e4>] mlx4_SENSE_PORT+0x54/0xc0 [mlx4_core]
  [<ffffffffa03d620f>] mlx4_do_sense_ports+0xbf/0xd0 [mlx4_core]
  [<ffffffffa03d6262>] mlx4_sense_port+0x42/0xc0 [mlx4_core]
  [<ffffffff81055f9c>] process_one_work+0x16c/0x4b0
  [<ffffffff8105825d>] worker_thread+0x15d/0x450
  [<ffffffff8105d5b0>] kthread+0xc0/0xd0
  [<ffffffff813f34dc>] ret_from_fork+0x7c/0xb0

Bart.


Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 299d018..fb7f7fa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -559,8 +559,8 @@  static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
 	mlx4_cmd_post(dev, in_param, out_param ? *out_param : 0,
 		      in_modifier, op_modifier, op, context->token, 1);
 
-	if (!wait_for_completion_timeout(&context->done,
-					 msecs_to_jiffies(timeout))) {
+	if (wait_for_completion_interruptible_timeout(&context->done,
+					 msecs_to_jiffies(timeout)) <= 0) {
 		mlx4_warn(dev, "command 0x%x timed out (go bit not cleared)\n",
 			  op);
 		err = -EBUSY;
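
For reference, wait_for_completion_interruptible_timeout() returns 0 on
timeout and -ERESTARTSYS when a signal ends the wait early, so the
"<= 0" test above folds both cases into the same "timed out" warning and
-EBUSY error. A rough sketch of how the check could be split if the two
cases ever need to be distinguished (surrounding mlx4_cmd_wait() context
elided; this is not a change proposed in the thread):

        long ret;

        ret = wait_for_completion_interruptible_timeout(&context->done,
                                                        msecs_to_jiffies(timeout));
        if (ret == 0) {
                /* Genuine timeout. */
                mlx4_warn(dev, "command 0x%x timed out (go bit not cleared)\n",
                          op);
                err = -EBUSY;
        } else if (ret < 0) {
                /*
                 * Wait ended early by a signal: the behavior change
                 * called out in the RFC note.
                 */
                err = ret;      /* -ERESTARTSYS */
        }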