diff mbox

[2.6.39-rc7] i915: kworker busily spinning...

Message ID 4DDB9061.3030502@mit.edu
State New, archived
Headers show

Commit Message

Andrew Lutomirski May 24, 2011, 11:02 a.m. UTC
On 05/17/2011 07:27 AM, Daniel J Blueman wrote:
> With 2.6.39-rc7 on my Sandy Bridge laptop GPU (8086:0126 rev 9),
> sometimes I find one of the kworker threads busily running with 15-20%
> system time for some minutes, causing terrible interactivity latency.
> I've seen it occur when plugging eg a HDMI display, and also when no
> display has been plugged (ie only the internal LVDS connection is
> active).
> 
> Across multiple kernel task captures, I see the kernel thread
> consistently reading one of the connector's EDID data [1]; I guess
> either it's having a hard time reading from a disconnected connector
> and retrying, or is incorrectly detecting a change when there is none.
> 
> I'll enable DRM debugging to see what connectors it believes it needs
> to read from. Anything else that would be handy to capture, or any
> thoughts?
> 
> Also, the 100ms connector change polling seems overkill, particularly
> when power consumption is important; 1000-2000ms would be sufficient,
> do you think?

I think it's meant to be every 10 seconds.

In any case, the output_poll_execute code does more work than necessary,
which might exacerbate the problem.  Here's an old patch I wrote that
probably doesn't apply any more but might help.

commit 82c9114cdc60aba7d9bec1396d818362a208cfdc
Author: Andy Lutomirski <luto@mit.edu>
Date:   Tue Aug 17 09:38:59 2010 -0400

    drm: Try to poll fewer unnecessary outputs
    
    We used to poll all outputs that had any kind of polling enabled every time
    that the output polling work ran (i.e. every ten seconds).  Now only poll
    hotplug-interrupt capable outputs when hotplug is detected and we only
    poll POLL_CONNECT and POLL_DISCONNECT when in the appropriate state.
    
    Signed-off-by: Andy Lutomirski <luto@mit.edu>



--Andy

> 
> Thanks,
>    Daniel
> 
> --- [1]
> 
> kworker/2:2     R  running task     5048    86      2 0x00000000
>   0000000000000002 ffff88021e804040 ffff88021e85f9b0 ffff88021e804040
>   ffff88021e85e000 0000000000004000 ffff8802210a4040 ffff88021e804040
>   0000000000000046 ffffffff81c18b20 ffff88022106c000 ffffffff8270b740
> Call Trace:
>   [<ffffffff8109a460>] ? mark_held_locks+0x70/0xa0
>   [<ffffffff81059261>] ? get_parent_ip+0x11/0x50
>   [<ffffffff8105933d>] ? sub_preempt_count+0x9d/0xd0
>   [<ffffffff81705a35>] schedule_timeout+0x175/0x250
>   [<ffffffff8106ec10>] ? run_timer_softirq+0x2a0/0x2a0
>   [<ffffffff81705b29>] schedule_timeout_uninterruptible+0x19/0x20
>   [<ffffffff8106f878>] msleep+0x18/0x20
>   [<ffffffffa017c620>] gmbus_xfer+0x400/0x620 [i915]
>   [<ffffffff8150c892>] i2c_transfer+0xa2/0xf0
>   [<ffffffffa002bc96>] drm_do_probe_ddc_edid+0x66/0xa0 [drm]
>   [<ffffffffa002c0f9>] drm_get_edid+0x29/0x60 [drm]
>   [<ffffffffa0176f86>] intel_hdmi_detect+0x56/0xe0 [i915]
>   [<ffffffffa00d1177>] output_poll_execute+0xd7/0x1a0 [drm_kms_helper]
>   [<ffffffff81078e14>] process_one_work+0x1a4/0x450
>   [<ffffffff81078db6>] ? process_one_work+0x146/0x450
>   [<ffffffffa00d10a0>] ?
> drm_helper_disable_unused_functions+0x150/0x150 [drm_kms_helper]
>   [<ffffffff810790ec>] process_scheduled_works+0x2c/0x40
>   [<ffffffff8107c384>] worker_thread+0x284/0x350
>   [<ffffffff8107c100>] ? manage_workers.clone.23+0x120/0x120
>   [<ffffffff81080ea6>] kthread+0xb6/0xc0
>   [<ffffffff8109a5cd>] ? trace_hardirqs_on_caller+0x13d/0x180
>   [<ffffffff8170a494>] kernel_thread_helper+0x4/0x10
>   [<ffffffff8104c64f>] ? finish_task_switch+0x6f/0x100
>   [<ffffffff81708bc4>] ? retint_restore_args+0xe/0xe
>   [<ffffffff81080df0>] ? __init_kthread_worker+0x70/0x70
>   [<ffffffff8170a490>] ? gs_change+0xb/0xb

Comments

Daniel J Blueman May 24, 2011, 11:25 a.m. UTC | #1
On 24 May 2011 12:02, Andy Lutomirski <luto@mit.edu> wrote:
> On 05/17/2011 07:27 AM, Daniel J Blueman wrote:
>> With 2.6.39-rc7 on my Sandy Bridge laptop GPU (8086:0126 rev 9),
>> sometimes I find one of the kworker threads busily running with 15-20%
>> system time for some minutes, causing terrible interactivity latency.
>> I've seen it occur when plugging eg a HDMI display, and also when no
>> display has been plugged (ie only the internal LVDS connection is
>> active).
>>
>> Across multiple kernel task captures, I see the kernel thread
>> consistently reading one of the connector's EDID data [1]; I guess
>> either it's having a hard time reading from a disconnected connector
>> and retrying, or is incorrectly detecting a change when there is none.
>>
>> I'll enable DRM debugging to see what connectors it believes it needs
>> to read from. Anything else that would be handy to capture, or any
>> thoughts?
>>
>> Also, the 100ms connector change polling seems overkill, particularly
>> when power consumption is important; 1000-2000ms would be sufficient,
>> do you think?
>
> I think it's meant to be every 10 seconds.
>
> In any case, the output_poll_execute code does more work than necessary,
> which might exacerbate the problem.  Here's an old patch I wrote that
> probably doesn't apply any more but might help.

Thanks for the patch Andy; I'll test it after we've found a fix for
the underlying issue.

Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 9b2a541..5a39e3c 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -817,31 +817,38 @@  static void output_poll_execute(struct slow_work *work)
 	struct drm_device *dev = container_of(delayed_work, struct drm_device, mode_config.output_poll_slow_work);
 	struct drm_connector *connector;
 	enum drm_connector_status old_status, status;
-	bool repoll = false, changed = false;
+	bool repoll = false, changed = false, need_poll, hpd_detected;
 	int ret;
 
+	hpd_detected = ACCESS_ONCE(dev->mode_config.hpd_detected);
+	ACCESS_ONCE(dev->mode_config.hpd_detected) = 0;
+
 	mutex_lock(&dev->mode_config.mutex);
 	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
 
-		/* if this is HPD or polled don't check it -
-		   TV out for instance */
+		/* don't poll unless the drivers asked us to. */
 		if (!connector->polled)
 			continue;
 
-		else if (connector->polled & (DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT))
-			repoll = true;
+		/* poll if we got an HPD request... */
+		need_poll = hpd_detected;
 
+		/* ...or if we're supposed to poll all the time */
 		old_status = connector->status;
-		/* if we are connected and don't want to poll for disconnect
-		   skip it */
-		if (old_status == connector_status_connected &&
-		    !(connector->polled & DRM_CONNECTOR_POLL_DISCONNECT) &&
-		    !(connector->polled & DRM_CONNECTOR_POLL_HPD))
-			continue;
+		if (old_status == connector_status_connected ?
+		    (connector->polled & DRM_CONNECTOR_POLL_CONNECT) :
+		    (connector->polled & DRM_CONNECTOR_POLL_DISCONNECT))
+			need_poll = true;
 
-		status = connector->funcs->detect(connector);
-		if (old_status != status)
-			changed = true;
+		/* Do we re-queue? */
+		if (connector->polled & (DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT))
+			repoll = true;
+
+		if (need_poll) {
+			status = connector->funcs->detect(connector);
+			if (old_status != status)
+				changed = true;
+		}
 	}
 
 	mutex_unlock(&dev->mode_config.mutex);
@@ -911,6 +918,7 @@  void drm_helper_hpd_irq_event(struct drm_device *dev)
 		return;
 	delayed_slow_work_cancel(&dev->mode_config.output_poll_slow_work);
 	/* schedule a slow work asap */
+	ACCESS_ONCE(dev->mode_config.hpd_detected) = true;
 	delayed_slow_work_enqueue(&dev->mode_config.output_poll_slow_work, 0);
 }
 EXPORT_SYMBOL(drm_helper_hpd_irq_event);
diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
index 93a1a31..586f41f 100644
--- a/include/drm/drm_crtc.h
+++ b/include/drm/drm_crtc.h
@@ -596,6 +596,7 @@  struct drm_mode_config {
 	/* output poll support */
 	bool poll_enabled;
 	struct delayed_slow_work output_poll_slow_work;
+	bool hpd_detected;
 
 	/* pointers to standard properties */
 	struct list_head property_blob_list;