Message ID | 20181022101405.v7setqacuftrafrb@gofer.mess.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | media: rc: cec devices do not have a lirc chardev | expand |
On 10/22/2018 11:14 AM, Sean Young wrote: > On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: >> On 2018-10-22 10:59, Sean Young wrote: >>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: >>>> Hi Sean, >>>> >>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. >>>> >>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: >>>>> Hello >>>>> >>>>> i'm using the pulse8 usb cec adapter to control my tv. >>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. >>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. >>>>> >>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. >>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. >>>>> >>>>> >>>>> also every now and then i see this in dmesg: >>>>> cec cec0: transmit: failed 05 >>>>> cec cec0: transmit: failed 06 >>>>> but that doesn't appear to do any harm as far as i can tell. >>>>> >>>>> any idea whats causing the oops? >>>>> >>>>> the ops: >>>>> >>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >>>>> PGD 0 P4D 0 >>>>> Oops: 0000 [#1] SMP PTI >>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 >>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 >>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] >>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] >>>> >>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); >>>> >>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called >>>> from rc_register_device(), except when the protocol is CEC: >>>> >>>> /* Ensure that the lirc kfifo is setup before we start the thread */ >>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { >>>> rc = ir_lirc_register(dev); >>>> if (rc < 0) >>>> goto out_rx; >>>> } >>>> >>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never >>>> initialized. >>>> >>>> Could this be fall-out from the lirc changes you did not too long ago? >>> >>> Yes, this is broken. My bad, sorry. I think this must have been broken since >>> v4.16. I can write a patch but I don't have a patch but I'm on the train >>> to ELCE in Edinburgh now, with no hardware to test on. >>> >>> >>> Sean >>> >> >> the kernel oops have been happening for a while now. >> yesterday when i checked my logs i can see them at least back a couple of >> months when i was running 4.17 >> >> also my scripts to poll status of my tv and turn it on/off works for a while >> so it doesn't crash right away. >> maybe it only crashes when i send cec command to turn on/off tv and only >> polling for status is no problem. >> >> >> i think i have a separate issue too because i had problems even before the >> kernel oopses started. >> but i suspect this is caused by my tv locking up the cec bus because >> unplugging power to tv for a few minutes (i must wait or it will still be >> just as broken) and then back used to resolve the cec errors from my >> scripts. > > > Would you be able to test the following patch please? Sean, I think you should be able to test this with the vivid driver. Load the vivid driver, run: cec-ctl --tv; cec-ctl -d1 --playback Then: cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 That said, I tried this, but it doesn't crash for me, but perhaps I need to run some RC command first... Regards, Hans > > Thanks, > > Sean > --- > From 1b8b20b606b30c0e301c80e18af8d77194269bc1 Mon Sep 17 00:00:00 2001 > From: Sean Young <sean@mess.org> > Date: Mon, 22 Oct 2018 10:01:50 +0100 > Subject: [PATCH] media: rc: cec devices do not have a lirc chardev > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > This fixes an oops in ir_lirc_scancode_event(). > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > PGD 0 P4D 0 > Oops: 0000 [#1] SMP PTI > CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > Workqueue: events pulse8_irq_work_handler [pulse8_cec] > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 > RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 > RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 > RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 > RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 > R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 > FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ir_do_keydown+0x75/0x260 [rc_core] > rc_keydown+0x54/0xc0 [rc_core] > cec_received_msg_ts+0xaa8/0xaf0 [cec] > process_one_work+0x1a1/0x350 > worker_thread+0x30/0x380 > ? pwq_unbound_release_workfn+0xd0/0xd0 > kthread+0x112/0x130 > ? kthread_create_worker_on_cpu+0x70/0x70 > ret_from_fork+0x35/0x40 > Modules linked in: rc_tt_1500 dvb_usb_dvbsky dvb_usb_v2 uas usb_storage fuse vhost_net vhost tap xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun 8021q garp mrp xt_nat macvlan xfs devlink ebta > si2157 si2168 cx25840 cx23885 kvm altera_ci tda18271 joydev ir_rc6_decoder rc_rc6_mce crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore altera_stapl m88ds3103 tveeprom cx2341 > mxm_wmi igb crc32c_intel megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev > CR2: 0000000000000038 > > Cc: <stable@vger.kernel.org> # v4.16+ > Signed-off-by: Sean Young <sean@mess.org> > --- > drivers/media/rc/rc-main.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c > index 552bbe82a160..877978dbd409 100644 > --- a/drivers/media/rc/rc-main.c > +++ b/drivers/media/rc/rc-main.c > @@ -695,7 +695,8 @@ void rc_repeat(struct rc_dev *dev) > (dev->last_toggle ? LIRC_SCANCODE_FLAG_TOGGLE : 0) > }; > > - ir_lirc_scancode_event(dev, &sc); > + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) > + ir_lirc_scancode_event(dev, &sc); > > spin_lock_irqsave(&dev->keylock, flags); > > @@ -735,7 +736,8 @@ static void ir_do_keydown(struct rc_dev *dev, enum rc_proto protocol, > .keycode = keycode > }; > > - ir_lirc_scancode_event(dev, &sc); > + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) > + ir_lirc_scancode_event(dev, &sc); > > if (new_event && dev->keypressed) > ir_do_keyup(dev, false); >
On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: > On 10/22/2018 11:14 AM, Sean Young wrote: > > On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: > >> On 2018-10-22 10:59, Sean Young wrote: > >>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: > >>>> Hi Sean, > >>>> > >>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. > >>>> > >>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > >>>>> Hello > >>>>> > >>>>> i'm using the pulse8 usb cec adapter to control my tv. > >>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > >>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. > >>>>> > >>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > >>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > >>>>> > >>>>> > >>>>> also every now and then i see this in dmesg: > >>>>> cec cec0: transmit: failed 05 > >>>>> cec cec0: transmit: failed 06 > >>>>> but that doesn't appear to do any harm as far as i can tell. > >>>>> > >>>>> any idea whats causing the oops? > >>>>> > >>>>> the ops: > >>>>> > >>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > >>>>> PGD 0 P4D 0 > >>>>> Oops: 0000 [#1] SMP PTI > >>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > >>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > >>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] > >>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > >>>> > >>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); > >>>> > >>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called > >>>> from rc_register_device(), except when the protocol is CEC: > >>>> > >>>> /* Ensure that the lirc kfifo is setup before we start the thread */ > >>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { > >>>> rc = ir_lirc_register(dev); > >>>> if (rc < 0) > >>>> goto out_rx; > >>>> } > >>>> > >>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never > >>>> initialized. > >>>> > >>>> Could this be fall-out from the lirc changes you did not too long ago? > >>> > >>> Yes, this is broken. My bad, sorry. I think this must have been broken since > >>> v4.16. I can write a patch but I don't have a patch but I'm on the train > >>> to ELCE in Edinburgh now, with no hardware to test on. > >>> > >>> > >>> Sean > >>> > >> > >> the kernel oops have been happening for a while now. > >> yesterday when i checked my logs i can see them at least back a couple of > >> months when i was running 4.17 > >> > >> also my scripts to poll status of my tv and turn it on/off works for a while > >> so it doesn't crash right away. > >> maybe it only crashes when i send cec command to turn on/off tv and only > >> polling for status is no problem. > >> > >> > >> i think i have a separate issue too because i had problems even before the > >> kernel oopses started. > >> but i suspect this is caused by my tv locking up the cec bus because > >> unplugging power to tv for a few minutes (i must wait or it will still be > >> just as broken) and then back used to resolve the cec errors from my > >> scripts. > > > > > > Would you be able to test the following patch please? > > Sean, > > I think you should be able to test this with the vivid driver. Load the vivid driver, > run: > > cec-ctl --tv; cec-ctl -d1 --playback > > Then: > > cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 Ah, thanks. That will help with testing/reproducing. > That said, I tried this, but it doesn't crash for me, but perhaps I need to run > some RC command first... Hmm I think those commands should be enough. It probably needs CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. Thanks, Sean
On 2018-10-22 14:28, Sean Young wrote: > On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: >> On 10/22/2018 11:14 AM, Sean Young wrote: >>> On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: >>>> On 2018-10-22 10:59, Sean Young wrote: >>>>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: >>>>>> Hi Sean, >>>>>> >>>>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. >>>>>> >>>>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: >>>>>>> Hello >>>>>>> >>>>>>> i'm using the pulse8 usb cec adapter to control my tv. >>>>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. >>>>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. >>>>>>> >>>>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. >>>>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. >>>>>>> >>>>>>> >>>>>>> also every now and then i see this in dmesg: >>>>>>> cec cec0: transmit: failed 05 >>>>>>> cec cec0: transmit: failed 06 >>>>>>> but that doesn't appear to do any harm as far as i can tell. >>>>>>> >>>>>>> any idea whats causing the oops? >>>>>>> >>>>>>> the ops: >>>>>>> >>>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >>>>>>> PGD 0 P4D 0 >>>>>>> Oops: 0000 [#1] SMP PTI >>>>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 >>>>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 >>>>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] >>>>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] >>>>>> >>>>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); >>>>>> >>>>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called >>>>>> from rc_register_device(), except when the protocol is CEC: >>>>>> >>>>>> /* Ensure that the lirc kfifo is setup before we start the thread */ >>>>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { >>>>>> rc = ir_lirc_register(dev); >>>>>> if (rc < 0) >>>>>> goto out_rx; >>>>>> } >>>>>> >>>>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never >>>>>> initialized. >>>>>> >>>>>> Could this be fall-out from the lirc changes you did not too long ago? >>>>> >>>>> Yes, this is broken. My bad, sorry. I think this must have been broken since >>>>> v4.16. I can write a patch but I don't have a patch but I'm on the train >>>>> to ELCE in Edinburgh now, with no hardware to test on. >>>>> >>>>> >>>>> Sean >>>>> >>>> >>>> the kernel oops have been happening for a while now. >>>> yesterday when i checked my logs i can see them at least back a couple of >>>> months when i was running 4.17 >>>> >>>> also my scripts to poll status of my tv and turn it on/off works for a while >>>> so it doesn't crash right away. >>>> maybe it only crashes when i send cec command to turn on/off tv and only >>>> polling for status is no problem. >>>> >>>> >>>> i think i have a separate issue too because i had problems even before the >>>> kernel oopses started. >>>> but i suspect this is caused by my tv locking up the cec bus because >>>> unplugging power to tv for a few minutes (i must wait or it will still be >>>> just as broken) and then back used to resolve the cec errors from my >>>> scripts. >>> >>> >>> Would you be able to test the following patch please? >> >> Sean, >> >> I think you should be able to test this with the vivid driver. Load the vivid driver, >> run: >> >> cec-ctl --tv; cec-ctl -d1 --playback >> >> Then: >> >> cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 > > Ah, thanks. That will help with testing/reproducing. > >> That said, I tried this, but it doesn't crash for me, but perhaps I need to run >> some RC command first... > > Hmm I think those commands should be enough. It probably needs > CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. > > Thanks, > > Sean > FYI the commands i run is as follows. getting status, this is run frequently several times per minute (once every 10 or 15 seconds): cec-ctl --to=0 --give-device-power-status then when i want to power on or off this is run: cec-ctl --to=0 --image-view-on or cec-ctl --to=0 --standby it usually takes day or two before i get a kernel oops. but i haven't studied super closely when exactly the oops happens in relation to what commands i sent since by now I'm used to it not working most of the time.
On Mon, Oct 22, 2018 at 01:28:42PM +0100, Sean Young wrote: > On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: > > On 10/22/2018 11:14 AM, Sean Young wrote: > > > Would you be able to test the following patch please? > > > > Sean, > > > > I think you should be able to test this with the vivid driver. Load the vivid driver, > > run: > > > > cec-ctl --tv; cec-ctl -d1 --playback > > > > Then: > > > > cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 > > Ah, thanks. That will help with testing/reproducing. > > > That said, I tried this, but it doesn't crash for me, but perhaps I need to run > > some RC command first... > > Hmm I think those commands should be enough. It probably needs > CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. Yes, that turned out to work. With CONFIG_DEBUG_SPINLOCK on, it goes bang every time. With the patch, the problem goes away. Without CONFIG_DEBUG_SPINLOCK we're going into undefined behaviour, so Torbjorn you're only seeing the oops occassionally (and which is why it has not been observed or reported before). Thanks, Sean
On 2018-10-22 15:17, Sean Young wrote: > On Mon, Oct 22, 2018 at 01:28:42PM +0100, Sean Young wrote: >> On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: >>> On 10/22/2018 11:14 AM, Sean Young wrote: >>>> Would you be able to test the following patch please? >>> >>> Sean, >>> >>> I think you should be able to test this with the vivid driver. Load the vivid driver, >>> run: >>> >>> cec-ctl --tv; cec-ctl -d1 --playback >>> >>> Then: >>> >>> cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 >> >> Ah, thanks. That will help with testing/reproducing. >> >>> That said, I tried this, but it doesn't crash for me, but perhaps I need to run >>> some RC command first... >> >> Hmm I think those commands should be enough. It probably needs >> CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. > > Yes, that turned out to work. With CONFIG_DEBUG_SPINLOCK on, it goes bang > every time. With the patch, the problem goes away. > > Without CONFIG_DEBUG_SPINLOCK we're going into undefined behaviour, so > Torbjorn you're only seeing the oops occassionally (and which is why it has > not been observed or reported before). > > Thanks, > > Sean > FYI i have tested this patch for a while now and have not seen any more kernel oops. So it appears to be working.
diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c index 552bbe82a160..877978dbd409 100644 --- a/drivers/media/rc/rc-main.c +++ b/drivers/media/rc/rc-main.c @@ -695,7 +695,8 @@ void rc_repeat(struct rc_dev *dev) (dev->last_toggle ? LIRC_SCANCODE_FLAG_TOGGLE : 0) }; - ir_lirc_scancode_event(dev, &sc); + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) + ir_lirc_scancode_event(dev, &sc); spin_lock_irqsave(&dev->keylock, flags); @@ -735,7 +736,8 @@ static void ir_do_keydown(struct rc_dev *dev, enum rc_proto protocol, .keycode = keycode }; - ir_lirc_scancode_event(dev, &sc); + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) + ir_lirc_scancode_event(dev, &sc); if (new_event && dev->keypressed) ir_do_keyup(dev, false);