drm/mst: Fix MST sideband up-reply failure handling

Message ID 20190523212433.9058-1-imre.deak@intel.com (mailing list archive)
State: New, archived
Series: drm/mst: Fix MST sideband up-reply failure handling

Commit Message

Imre Deak May 23, 2019, 9:24 p.m. UTC
Fix the breakage resulting in the stacktrace below, due to the tx queue
being full when trying to send an up-reply. txmsg->seqno is -1 in this
case, leading to corruption of the mstb object by

	txmsg->dst->tx_slots[txmsg->seqno] = NULL;

in process_single_up_tx_qlock().
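
For illustration, a minimal stand-alone sketch of the failure mode (the
struct layout and field names here are made up for the example; the real
struct drm_dp_mst_branch has many more fields): writing through index -1
of an array member is undefined behaviour, and in practice lands in
whatever member happens to precede the array, silently corrupting the
containing object.

#include <stdio.h>

/* Hypothetical, simplified stand-in for the branch device object;
 * tx_slots mirrors the two-entry sideband slot array. */
struct fake_mstb {
	void *field_before_slots;	/* whatever precedes the array */
	void *tx_slots[2];
};

int main(void)
{
	struct fake_mstb mstb = { .field_before_slots = (void *)0x1 };
	int seqno = -1;	/* what a failed send leaves in txmsg->seqno */

	/* Same shape as txmsg->dst->tx_slots[txmsg->seqno] = NULL;
	 * with seqno == -1 this is undefined behaviour and in practice
	 * clobbers the member preceding tx_slots in the object. */
	mstb.tx_slots[seqno] = NULL;

	printf("field_before_slots is now %p\n", mstb.field_before_slots);
	return 0;
}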

[  +0,005162] [drm:process_single_tx_qlock [drm_kms_helper]] set_hdr_from_dst_qlock: failed to find slot
[  +0,000015] [drm:drm_dp_send_up_ack_reply.constprop.19 [drm_kms_helper]] failed to send msg in q -11
[  +0,000939] BUG: kernel NULL pointer dereference, address: 00000000000005a0
[  +0,006982] #PF: supervisor write access in kernel mode
[  +0,005223] #PF: error_code(0x0002) - not-present page
[  +0,005135] PGD 0 P4D 0
[  +0,002581] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  +0,004359] CPU: 1 PID: 1200 Comm: kworker/u16:3 Tainted: G     U            5.2.0-rc1+ #410
[  +0,008433] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3175.A00.1904261428 04/26/2019
[  +0,013323] Workqueue: i915-dp i915_digport_work_func [i915]
[  +0,005676] RIP: 0010:queue_work_on+0x19/0x70
[  +0,004372] Code: ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 56 49 89 f6 41 55 41 89 fd 41 54 55 53 48 89 d3 9c 5d fa e8 e7 81 0c 00 <f0> 48 0f ba 2b 00 73 31 45 31 e4 f7 c5 00 02 00 00 74 13 e8 cf 7f
[  +0,018750] RSP: 0018:ffffc900007dfc50 EFLAGS: 00010006
[  +0,005222] RAX: 0000000000000046 RBX: 00000000000005a0 RCX: 0000000000000001
[  +0,007133] RDX: 000000000001b608 RSI: 0000000000000000 RDI: ffffffff82121972
[  +0,007129] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000001
[  +0,007129] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88847bfa5096
[  +0,007131] R13: 0000000000000010 R14: ffff88849c08f3f8 R15: 0000000000000000
[  +0,007128] FS:  0000000000000000(0000) GS:ffff88849dc80000(0000) knlGS:0000000000000000
[  +0,008083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0,005749] CR2: 00000000000005a0 CR3: 0000000005210006 CR4: 0000000000760ee0
[  +0,007128] PKRU: 55555554
[  +0,002722] Call Trace:
[  +0,002458]  drm_dp_mst_handle_up_req+0x517/0x540 [drm_kms_helper]
[  +0,006197]  ? drm_dp_mst_hpd_irq+0x5b/0x9c0 [drm_kms_helper]
[  +0,005764]  drm_dp_mst_hpd_irq+0x5b/0x9c0 [drm_kms_helper]
[  +0,005623]  ? intel_dp_hpd_pulse+0x205/0x370 [i915]
[  +0,005018]  intel_dp_hpd_pulse+0x205/0x370 [i915]
[  +0,004836]  i915_digport_work_func+0xbb/0x140 [i915]
[  +0,005108]  process_one_work+0x245/0x610
[  +0,004027]  worker_thread+0x37/0x380
[  +0,003684]  ? process_one_work+0x610/0x610
[  +0,004184]  kthread+0x119/0x130
[  +0,003240]  ? kthread_park+0x80/0x80
[  +0,003668]  ret_from_fork+0x24/0x50

Cc: Lyude Paul <lyude@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Lyude Paul May 23, 2019, 10:09 p.m. UTC | #1
Patch mostly looks good to me, one comment below

On Fri, 2019-05-24 at 00:24 +0300, Imre Deak wrote:
> [commit message and stack trace snipped]
> 
> @@ -1996,7 +1996,11 @@ static void process_single_up_tx_qlock(struct
> drm_dp_mst_topology_mgr *mgr,
>  	if (ret != 1)
>  		DRM_DEBUG_KMS("failed to send msg in q %d\n", ret);
>  
> -	txmsg->dst->tx_slots[txmsg->seqno] = NULL;
> +	if (txmsg->seqno != -1) {
> +		WARN_ON((unsigned)txmsg->seqno >
> +			ARRAY_SIZE(txmsg->dst->tx_slots));

Not 100% sure on this nitpick myself, but if we know that txmsg->seqno is
about to go out of bounds, shouldn't we also try to take action to stop it?
Like:

if (!WARN_ON((unsigned)txmsg->seqno > ARRAY_SIZE(txmsg->dst->tx_slots)))
        txmsg->dst->tx_slots[txmsg->seqno] = NULL;
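
(An aside, as a rough sketch rather than the kernel's actual definition:
this pattern can gate the store because the kernel's WARN_ON() macro
evaluates to the truth value of its condition. A user-space
approximation, using the GNU statement-expression extension the kernel
itself relies on:)

#include <stdio.h>

/* Rough stand-in for the kernel's WARN_ON(): prints a warning when the
 * condition holds and evaluates to that condition, which is what lets
 * if (!WARN_ON(...)) both warn and skip the access. */
#define WARN_ON(cond) ({					\
	int __ret_warn_on = !!(cond);				\
	if (__ret_warn_on)					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	__ret_warn_on;						\
})

int main(void)
{
	int *slots[2] = { NULL, NULL };
	int val = 1, seqno = 5;	/* deliberately out of range */

	/* Warn and skip the out-of-bounds store in one expression.
	 * (The sketch uses >= since index == array size is already
	 * out of range.) */
	if (!WARN_ON((unsigned)seqno >= sizeof(slots) / sizeof(slots[0])))
		slots[seqno] = &val;

	return 0;
}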



> +		txmsg->dst->tx_slots[txmsg->seqno] = NULL;
> +	}
>  }
>  
>  static void drm_dp_queue_down_tx(struct drm_dp_mst_topology_mgr *mgr,
Imre Deak May 23, 2019, 10:28 p.m. UTC | #2
On Thu, May 23, 2019 at 06:09:56PM -0400, Lyude Paul wrote:
> Patch mostly looks good to me, one comment below
> 
> On Fri, 2019-05-24 at 00:24 +0300, Imre Deak wrote:
> > [commit message, stack trace and diff header snipped]
> > 
> > @@ -1996,7 +1996,11 @@ static void process_single_up_tx_qlock(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	if (ret != 1)
> >  		DRM_DEBUG_KMS("failed to send msg in q %d\n", ret);
> >  
> > -	txmsg->dst->tx_slots[txmsg->seqno] = NULL;
> > +	if (txmsg->seqno != -1) {
> > +		WARN_ON((unsigned)txmsg->seqno >
> > +			ARRAY_SIZE(txmsg->dst->tx_slots));
> 
> Not 100% sure on this nitpick myself, but if we know that txmsg->seqno is
> about to go out of bounds, shouldn't we also try to take action to stop it?
> Like:

Imo, it's worth keeping things simple. If the WARN triggers, we need to
fix the original issue in any case (txmsg->seqno should never be set to
anything other than -1 or a valid idx), so making the assignment
conditional here wouldn't serve a real purpose.
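
(To illustrate the invariant, a loose, hypothetical paraphrase of the
slot-claiming step in this era of the driver; claim_tx_slot and the
struct names are made up, not the verbatim kernel code. The -11 in the
log above is -EAGAIN from exactly this kind of queue-full path, which is
how seqno == -1 survives to the cleanup.)

#include <errno.h>

struct fake_txmsg { int seqno; };
struct fake_mstb { struct fake_txmsg *tx_slots[2]; };

/* A sideband txmsg starts life with seqno == -1 and is only given 0 or
 * 1 if that slot is free; with both slots busy it keeps -1, which is
 * the value the cleanup path must tolerate. */
int claim_tx_slot(struct fake_mstb *mstb, struct fake_txmsg *txmsg)
{
	if (txmsg->seqno != -1)
		return 0;	/* already owns a slot */

	if (!mstb->tx_slots[0])
		txmsg->seqno = 0;
	else if (!mstb->tx_slots[1])
		txmsg->seqno = 1;
	else
		return -EAGAIN;	/* queue full: seqno stays -1 */

	mstb->tx_slots[txmsg->seqno] = txmsg;
	return 0;
}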

> 
> if (!WARN_ON((unsigned)txmsg->seqno > ARRAY_SIZE(txmsg->dst->tx_slots)))
>         txmsg->dst->tx_slots[txmsg->seqno] = NULL;
Lyude Paul May 23, 2019, 10:31 p.m. UTC | #3
On Fri, 2019-05-24 at 01:28 +0300, Imre Deak wrote:
> On Thu, May 23, 2019 at 06:09:56PM -0400, Lyude Paul wrote:
> > Patch mostly looks good to me, one comment below
> > 
> > On Fri, 2019-05-24 at 00:24 +0300, Imre Deak wrote:
> > > [commit message, stack trace and patch snipped]
> > 
> > Not 100% sure on this nitpick myself, but if we know that txmsg->seqno is
> > about to go out of bounds, shouldn't we also try to take action to stop it?
> > Like:
> 
> Imo, it's worth keeping things simple. If the WARN triggers, we need to
> fix the original issue in any case (txmsg->seqno should never be set to
> anything other than -1 or a valid idx), so making the assignment
> conditional here wouldn't serve a real purpose.

mm, fair enough. Then:

Reviewed-by: Lyude Paul <lyude@redhat.com>

Thanks for the fix!

Imre Deak May 30, 2019, 9:43 a.m. UTC | #4
On Thu, May 23, 2019 at 06:31:15PM -0400, Lyude Paul wrote:
> On Fri, 2019-05-24 at 01:28 +0300, Imre Deak wrote:
> > On Thu, May 23, 2019 at 06:09:56PM -0400, Lyude Paul wrote:
> > > Patch mostly looks good to me, one comment below
> > > 
> > > On Fri, 2019-05-24 at 00:24 +0300, Imre Deak wrote:
> > > > [commit message, stack trace and patch snipped]
> > > 
> > > Not 100% sure on this nitpick myself, but if we know that txmsg->seqno is
> > > about to go out of bounds, shouldn't we also try to take action to stop it?
> > > Like:
> > 
> > Imo, it's worth keeping things simple. If the WARN triggers, we need to
> > fix the original issue in any case (txmsg->seqno should never be set to
> > anything other than -1 or a valid idx), so making the assignment
> > conditional here wouldn't serve a real purpose.
> 
> mm, fair enough. Then:
> 
> Reviewed-by: Lyude Paul <lyude@redhat.com>
> 
> Thanks for the fix!

Thanks for the review. I've now got the permissions, so I pushed the
change to drm-misc-next. I wonder if there's a CI for this branch similar
to what we have in place for the intel-gfx branches. Maybe I should've
CC'd the patch to the intel-gfx ML for a pre-merge CI report?

> 
> > 
> > > if (!WARN_ON((unsigned)txmsg->seqno > ARRAY_SIZE(txmsg->dst->tx_slots)))
> > >         txmsg->dst->tx_slots[txmsg->seqno] = NULL;
> > > 
> > > 
> > > 
> > > > +		txmsg->dst->tx_slots[txmsg->seqno] = NULL;
> > > > +	}
> > > >  }
> > > >  
> > > >  static void drm_dp_queue_down_tx(struct drm_dp_mst_topology_mgr *mgr,
> > > -- 
> > > Cheers,
> > > 	Lyude Paul
> > > 
> -- 
> Cheers,
> 	Lyude Paul
>

Patch

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index da1abca1b9e9..24c325f4a616 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1996,7 +1996,11 @@ static void process_single_up_tx_qlock(struct drm_dp_mst_topology_mgr *mgr,
 	if (ret != 1)
 		DRM_DEBUG_KMS("failed to send msg in q %d\n", ret);
 
-	txmsg->dst->tx_slots[txmsg->seqno] = NULL;
+	if (txmsg->seqno != -1) {
+		WARN_ON((unsigned)txmsg->seqno >
+			ARRAY_SIZE(txmsg->dst->tx_slots));
+		txmsg->dst->tx_slots[txmsg->seqno] = NULL;
+	}
 }
 
 static void drm_dp_queue_down_tx(struct drm_dp_mst_topology_mgr *mgr,