| Message ID | 51ED4E60.30203@acm.org (mailing list archive) |
|---|---|
| State | Rejected |
On 22/07/2013 18:23, Bart Van Assche wrote:
> Avoid that kernel threads running mlx4_cmd_wait() contribute to the
> system load by setting the task state to TASK_INTERRUPTIBLE instead
> of TASK_UNINTERRUPTIBLE while waiting. This patch reduces the load
> average from about 0.5 to about 0.0 on an idle system with one mlx4
> HCA and no IB cables connected.

Before diving into the implications of the patch, let's discuss the
phenomenon you see... So is the load average on this idle system 0.5
or 0.05?

I don't see 0.5 or the like on my systems that are installed with HCAs
and are idle. Could it be that some IB management entity is repeatedly
sending MADs to this system?

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/24/13 17:17, Or Gerlitz wrote:
> On 22/07/2013 18:23, Bart Van Assche wrote:
>> Avoid that kernel threads running mlx4_cmd_wait() contribute to the
>> system load by setting the task state to TASK_INTERRUPTIBLE instead
>> of TASK_UNINTERRUPTIBLE while waiting. This patch reduces the load
>> average from about 0.5 to about 0.0 on an idle system with one mlx4
>> HCA and no IB cables connected.
>
> Before diving into the implications of the patch, let's discuss the
> phenomenon you see... So is the load average on this idle system 0.5
> or 0.05?
>
> I don't see 0.5 or the like on my systems that are installed with HCAs
> and are idle. Could it be that some IB management entity is repeatedly
> sending MADs to this system?

Hello Or,

I saw a load of 0.5 with several different upstream kernels (3.6..3.10
at least). The only IB-related process that was running on the system
was opensmd. This is definitely reproducible. It was only a month after
I had noticed this phenomenon that I started searching for the root
cause.

Bart.
On 24/07/2013 19:48, Bart Van Assche wrote:
> I saw a load of 0.5 with several different upstream kernels (3.6..3.10
> at least). The only IB-related process that was running on the system
> was opensmd. This is definitely reproducible. It was only a month
> after I had noticed this phenomenon that I started searching for the
> root cause.

Do you see it also on systems that don't run opensm?
On 07/24/13 19:06, Or Gerlitz wrote:
> On 24/07/2013 19:48, Bart Van Assche wrote:
>> I saw a load of 0.5 with several different upstream kernels (3.6..3.10
>> at least). The only IB-related process that was running on the system
>> was opensmd. This is definitely reproducible. It was only a month
>> after I had noticed this phenomenon that I started searching for the
>> root cause.
> do you see it also on systems that don't run opensm?

That's a test I have not yet run. So sorry, I don't know whether this
also happens without opensm.

Bart.
On 07/24/13 19:06, Or Gerlitz wrote:
> On 24/07/2013 19:48, Bart Van Assche wrote:
>> I saw a load of 0.5 with several different upstream kernels (3.6..3.10
>> at least). The only IB-related process that was running on the system
>> was opensmd. This is definitely reproducible. It was only a month
>> after I had noticed this phenomenon that I started searching for the
>> root cause.
> do you see it also on systems that don't run opensm?

Yes. This happens both on systems running opensm and on systems not
running opensm. A call trace from a system on which CPU load was higher
than expected is as follows:

```
# echo w >/proc/sysrq-trigger; dmesg -c
SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/u:7     D ffff88011fa125c0     0   181      2 0x00000000
 ffff880114d63b48 0000000000000046 ffff8801158d1c40 ffff880114d63fd8
 ffff880114d63fd8 ffff880114d63fd8 ffffffff81613440 ffff8801158d1c40
 ffffffff817a7180 ffff880114d63b80 ffffffff817a7180 000000010000ebf9
Call Trace:
 [<ffffffff813ea259>] schedule+0x29/0x70
 [<ffffffff813e88aa>] schedule_timeout+0x10a/0x1e0
 [<ffffffff813e98e2>] wait_for_common+0xd2/0x180
 [<ffffffff813e9a63>] wait_for_completion_timeout+0x13/0x20
 [<ffffffffa03bfb99>] __mlx4_cmd+0x259/0x5e0 [mlx4_core]
 [<ffffffffa03d60e4>] mlx4_SENSE_PORT+0x54/0xc0 [mlx4_core]
 [<ffffffffa03d620f>] mlx4_do_sense_ports+0xbf/0xd0 [mlx4_core]
 [<ffffffffa03d6262>] mlx4_sense_port+0x42/0xc0 [mlx4_core]
 [<ffffffff81055f9c>] process_one_work+0x16c/0x4b0
 [<ffffffff8105825d>] worker_thread+0x15d/0x450
 [<ffffffff8105d5b0>] kthread+0xc0/0xd0
 [<ffffffff813f34dc>] ret_from_fork+0x7c/0xb0
```

Bart.
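[Editor's note] The trace above shows the kworker blocked in state "D" (uninterruptible sleep) inside wait_for_completion_timeout(). The Linux load average counts D-state tasks alongside runnable ones, which is why a worker that spends most of its time in such a wait can hold an otherwise idle system near 0.5. A quick way to spot such tasks (a diagnostic sketch, not part of the original thread):

```shell
# Count tasks currently in uninterruptible sleep ("D" state);
# these are included in the load average together with runnable tasks.
ps -eo state,pid,comm | awk '$1 == "D" { print } END { printf "D-state tasks: %d\n", NR ? n : 0; }' n=0
```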
```diff
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 299d018..fb7f7fa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -559,8 +559,8 @@ static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
 	mlx4_cmd_post(dev, in_param, out_param ? *out_param : 0,
 		      in_modifier, op_modifier, op, context->token, 1);
 
-	if (!wait_for_completion_timeout(&context->done,
-					 msecs_to_jiffies(timeout))) {
+	if (wait_for_completion_interruptible_timeout(&context->done,
+					msecs_to_jiffies(timeout)) <= 0) {
 		mlx4_warn(dev, "command 0x%x timed out (go bit not cleared)\n",
 			  op);
 		err = -EBUSY;
```
Avoid that kernel threads running mlx4_cmd_wait() contribute to the
system load by setting the task state to TASK_INTERRUPTIBLE instead
of TASK_UNINTERRUPTIBLE while waiting. This patch reduces the load
average from about 0.5 to about 0.0 on an idle system with one mlx4
HCA and no IB cables connected.

Note: I'm posting this patch as an RFC since it involves a behavior
change (a signal sent to a worker thread that is waiting for a command
to finish causes the command to fail) and since I'm not sure this
behavior change is acceptable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)