diff mbox series

[1/2] Input: synaptics-rmi4 - clear irqs before set irqs

Message ID 20190220164200.31044-1-aaron.ma@canonical.com (mailing list archive)
State New, archived
Headers show
Series [1/2] Input: synaptics-rmi4 - clear irqs before set irqs | expand

Commit Message

Aaron Ma Feb. 20, 2019, 4:41 p.m. UTC
rmi4 got spam data after S3 resume on some ThinkPads.
Then TrackPoint lost when be detected by psmouse.
Clear irqs status before set irqs will make TrackPoint back.

BugLink: https://bugs.launchpad.net/bugs/1791427
Cc: <stable@vger.kernel.org>
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
---
 drivers/input/rmi4/rmi_driver.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Christopher Heiny March 8, 2019, 11:13 p.m. UTC | #1
On Wed, 2019-02-20 at 17:41 +0100, Aaron Ma wrote:
> CAUTION: Email originated externally, do not click links or open
> attachments unless you recognize the sender and know the content is
> safe.
> 
> 
> rmi4 got spam data after S3 resume on some ThinkPads.
> Then TrackPoint lost when be detected by psmouse.
> Clear irqs status before set irqs will make TrackPoint back.
> 
> BugLink: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_bugs_1791427&d=DwIBAg&c=7dfBJ8cXbWjhc0BhImu8wQ&r=veOxv1_7HLXIaWG-OKLqp-qvZc3r7ucT1d-68JSWqpM&m=3nNm4ob6G1wtf502YFuxorJVkSQvdBasje2RrZLxhTM&s=Z0nHSPAKhQLzdoENAZBYuDC6QmZNOliyiE7h1OOVkBA&e=
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
> ---
>  drivers/input/rmi4/rmi_driver.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/input/rmi4/rmi_driver.c
> b/drivers/input/rmi4/rmi_driver.c
> index fc3ab93b7aea..20631b272f43 100644
> --- a/drivers/input/rmi4/rmi_driver.c
> +++ b/drivers/input/rmi4/rmi_driver.c
> @@ -374,6 +374,17 @@ static int rmi_driver_set_irq_bits(struct
> rmi_device *rmi_dev,
>         struct device *dev = &rmi_dev->dev;
> 
>         mutex_lock(&data->irq_mutex);
> +
> +       /* Dummy read in order to clear irqs */
> +       error = rmi_read_block(rmi_dev,
> +                       data->f01_container->fd.data_base_addr + 1,
> +                       data->irq_status, data->num_of_irq_regs);
> +       if (error < 0) {
> +               dev_err(dev, "%s: Failed to read interrupt status!",
> +                                                       __func__);
> +               goto error_unlock;
> +       }
> +
>         bitmap_or(data->new_irq_mask,
>                   data->current_irq_mask, mask, data->irq_count);
> 
> --
> 2.17.1
> 

Sorry for the long delay in following up on this.

I'm not sure this is a safe action, due to a race condition with the
actual IRQ handler (rmi_process_interrupt_requests from rmi_driver.c). 
Remember that reading the IRQ status register clears all the IRQ bits. 
So you're faced with this possible scenario:
    - ATTN asserted, indicating new data in IRQ status register
    - rmi_driver_set_irq_bits called
    - rmi_driver_set_irq_bits reads IRQ status, clearing bits
    - rmi_process_interrupt_requests called
    - rmi_process_interrupt_request reads IRQ status, sees no
      bits set, nested IRQs are not handled
This could lead to loss of data or inconsistent system state
information.  For example, a button up or down event may be lost, with
consequent weird behavior by the user interface.

CAVEAT: I haven't had much to do with the RMI4 driver for a long while,
and am just starting to poke around with it again.  I am not very
familiar with the current IRQ handling implementation.  Perhaps there
is a guarantee in the kernel IRQ mechanism that once ATTN is asserted,
then rmi_process_interrupt_requests will be called before anyone else
gets a chance to read the IRQ status register.  If that's the case, let
me know I'm worried about nothing, and ignore this comment.

Cheers,
Chris
Aaron Ma March 9, 2019, 8:37 a.m. UTC | #2
Hi,

On 3/9/19 7:13 AM, Christopher Heiny wrote:
> I'm not sure this is a safe action, due to a race condition with the
> actual IRQ handler (rmi_process_interrupt_requests from rmi_driver.c). 
> Remember that reading the IRQ status register clears all the IRQ bits. 
> So you're faced with this possible scenario:
>     - ATTN asserted, indicating new data in IRQ status register
>     - rmi_driver_set_irq_bits called
>     - rmi_driver_set_irq_bits reads IRQ status, clearing bits
>     - rmi_process_interrupt_requests called
>     - rmi_process_interrupt_request reads IRQ status, sees no
>       bits set, nested IRQs are not handled
> This could lead to loss of data or inconsistent system state
> information.  For example, a button up or down event may be lost, with
> consequent weird behavior by the user interface.

rmi_driver_set_irq_bits is only called to config and enable specific
functions of RMI.
Reading IRQ status before set irqs is supposed to clear spam data/irq
status.
spam data make probe/detect touchpad/trackpoint fail.

rmi_smb_resume -> rmi_driver_reset_handler -> fn-config ->
clear_irq_bits -> set_irq_bits -> enable_irq -> irq_handler  ->
rmi_process_interrupt_requests

set_irq_bits will not be in interrupt context, it enables IRQ bits of RMI.

Regards,
Aaron
Aaron Ma March 28, 2019, 6:02 a.m. UTC | #3
Hi Dmitry and Chiristopher:

Do you have any suggestion about these 2 patches?

Many users confirmed that they fixed issues of Trackpoint/Touchpad after S3.

Will you consider them be accepted?

Thanks,
Aaron
Christopher Heiny April 2, 2019, 4:16 p.m. UTC | #4
On Thu, 2019-03-28 at 14:02 +0800, Aaron Ma wrote:
> Hi Dmitry and Chiristopher:
> 
> Do you have any suggestion about these 2 patches?
> 
> Many users confirmed that they fixed issues of Trackpoint/Touchpad
> after S3.
> 
> Will you consider them be accepted?

Hi Aaron,

Sorry - I thought I'd replied on the NO SLEEP portion of these patches,
but looking back I don't find the draft or the sent email.  Sorry about
that.  I'll summarize here what I wrote last month.

This isn't so much a "fix" as a "hacky workaround" for the issue.  From
the descriptions in the bug you linked in your original patch
submission, it appears that the root cause is somewhere in SMBus system
(could be SMBus driver, SMBus hardware, or the devices on the SMBus
(touch devices or other devices) - it's hard to tell with the info
available), where the SMBus is failing to come online correctly coming
out of S3 state.  Anyway, this patch doesn't fix the root cause, but
merely works around it.

Setting the NO SLEEP bit will force the touch sensor to remain in a
high power consumption state while the rest of the system is in S3. 
While not a lot of power compared to things like the CPU, display, and
others, it is still non-trivial and will result in shorter time-on-
battery capability.

If you have access to the power pin(s) for the touch sensor(s)/styk(s),
it might be interesting to try turning power off on entering S3, and
restoring it on exit.  That's very hacky, and has the side effect of
slightly delaying touchpad readiness on exit from S3.  Plus you'll need
to restore touch sensor configuration settings on exit.  But it
definitely reduces power consumption.


Separately, I am still concerned about the possibility of dropped touch
events in the IRQ clearing.  I'm not convinced that the code is safe
(as you mentioned in your reply to my earlier comment), so I'll have to
study the implementation more carefully.

					Cheers,
						Chris



> 
> Thanks,
> Aaron
Aaron Ma April 3, 2019, 1:58 p.m. UTC | #5
On 4/3/19 12:16 AM, Christopher Heiny wrote:
> On Thu, 2019-03-28 at 14:02 +0800, Aaron Ma wrote:
>> Hi Dmitry and Chiristopher:
>>
>> Do you have any suggestion about these 2 patches?
>>
>> Many users confirmed that they fixed issues of Trackpoint/Touchpad
>> after S3.
>>
>> Will you consider them be accepted?
> Hi Aaron,
> 
> Sorry - I thought I'd replied on the NO SLEEP portion of these patches,
> but looking back I don't find the draft or the sent email.  Sorry about
> that.  I'll summarize here what I wrote last month.
> 
> This isn't so much a "fix" as a "hacky workaround" for the issue.  From
> the descriptions in the bug you linked in your original patch
> submission, it appears that the root cause is somewhere in SMBus system
> (could be SMBus driver, SMBus hardware, or the devices on the SMBus
> (touch devices or other devices) - it's hard to tell with the info
> available), where the SMBus is failing to come online correctly coming
> out of S3 state.  Anyway, this patch doesn't fix the root cause, but
> merely works around it.

Users confirmed the 1st patch that clear irq status fixed their multiple
issues on Touchpad and Trackpoint.
I think it is a fix.

NO SLEEP patch was tried to give users a choice a fix touchpad issues
that I didn't reproduce.
If you don't like this export, we can drop it now as users confirmed 1st
patch works.

> 
> Setting the NO SLEEP bit will force the touch sensor to remain in a
> high power consumption state while the rest of the system is in S3. 
> While not a lot of power compared to things like the CPU, display, and
> others, it is still non-trivial and will result in shorter time-on-
> battery capability.

Verified on s2idle and S3 and system running idle mode. no difference on
power consumption of whole system with or without set 1 to nosleep.

> 
> If you have access to the power pin(s) for the touch sensor(s)/styk(s),
> it might be interesting to try turning power off on entering S3, and
> restoring it on exit.  That's very hacky, and has the side effect of
> slightly delaying touchpad readiness on exit from S3.  Plus you'll need
> to restore touch sensor configuration settings on exit.  But it
> definitely reduces power consumption.
> 
> 
> Separately, I am still concerned about the possibility of dropped touch
> events in the IRQ clearing.  I'm not convinced that the code is safe
> (as you mentioned in your reply to my earlier comment), so I'll have to
> study the implementation more carefully.

Sure, take your time, if you have any questions let me know please.

Thanks,
Aaron

> 
> 					Cheers,
> 						Chris
> 
> 
> 
>> Thanks,
>> Aaron
>
Aaron Ma June 4, 2019, 2:45 a.m. UTC | #6
Hi Christopher:

Have got time to review these 2 patches?
Users reported it works fine since I sent out this patch.

Thanks,
Aaron

On 4/3/19 9:58 PM, Aaron Ma wrote:
> Sure, take your time, if you have any questions let me know please.
> 
> Thanks,
> Aaron
Christopher Heiny June 4, 2019, 5:19 a.m. UTC | #7
On Tue, 2019-06-04 at 10:45 +0800, Aaron Ma wrote:
> Hi Christopher:
> 
> Have got time to review these 2 patches?
> Users reported it works fine since I sent out this patch.

Hi Aaron,

I've been poking around with this off and on.  Unfortunately, more off
than on :-( but here's my current take:

rmi_driver_set_irq_bits() isn't going to be called all that often, and
it's not going to be called at all during normal operation, which is
where the most serious problem would occur.

I haven't entirely convinced myself that there couldn't be a problem
during repeated spontaneous device resets (for example, due to ESD, a
dodgy charger, or firmware bug, among other things).  On the other
hand, all the scenarios I have come up with are both unlikely and so
contrived that the system is probably hosed regardless of what we do in
the driver.

Given that, I'm willing to accept the patch as is.

					Cheers,
						Chris







> 
> Thanks,
> Aaron
> 
> On 4/3/19 9:58 PM, Aaron Ma wrote:
> > Sure, take your time, if you have any questions let me know please.
> > 
> > Thanks,
> > Aaron
Aaron Ma June 7, 2019, 7:48 a.m. UTC | #8
Hi Dmitry:

Will you apply them?

Thanks,
Aaron

On 6/4/19 1:19 PM, Christopher Heiny wrote:
> Given that, I'm willing to accept the patch as is.
> 
> 					Cheers,
> 						Chris
Dmitry Torokhov June 9, 2019, 4:55 p.m. UTC | #9
Hi Aaron,

On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote:
> rmi4 got spam data after S3 resume on some ThinkPads.
> Then TrackPoint lost when be detected by psmouse.
> Clear irqs status before set irqs will make TrackPoint back.

Could you please give me an idea as to what this spam data is?

In F03 probe we clear all pending data before enabling the function,
maybe the same needs to be done on resume, instead of changing the way
we handle IRQ bits?

Thanks,
Aaron Ma June 10, 2019, 4:55 p.m. UTC | #10
On 6/10/19 12:55 AM, Dmitry Torokhov wrote:
> Hi Aaron,
> 
> On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote:
>> rmi4 got spam data after S3 resume on some ThinkPads.
>> Then TrackPoint lost when be detected by psmouse.
>> Clear irqs status before set irqs will make TrackPoint back.
> Could you please give me an idea as to what this spam data is?
> 

It should be some data 0 during suspend/resume.
Actually I don't know how these data 0 is produced.
Not all synaptics touchpads have this issue.

> In F03 probe we clear all pending data before enabling the function,

Yes we did, but not after resume.

> maybe the same needs to be done on resume, instead of changing the way
> we handle IRQ bits?

This patch is supposed to clear irq status like it in fn probe. Not
changing IRQ bits.

Thanks,
Aaron

> 
> Thanks,
> 
> -- Dmitry
>
Dmitry Torokhov June 11, 2019, 5:35 p.m. UTC | #11
On Tue, Jun 11, 2019 at 12:55:58AM +0800, Aaron Ma wrote:
> 
> On 6/10/19 12:55 AM, Dmitry Torokhov wrote:
> > Hi Aaron,
> > 
> > On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote:
> >> rmi4 got spam data after S3 resume on some ThinkPads.
> >> Then TrackPoint lost when be detected by psmouse.
> >> Clear irqs status before set irqs will make TrackPoint back.
> > Could you please give me an idea as to what this spam data is?
> > 
> 
> It should be some data 0 during suspend/resume.
> Actually I don't know how these data 0 is produced.
> Not all synaptics touchpads have this issue.
> 
> > In F03 probe we clear all pending data before enabling the function,
> 
> Yes we did, but not after resume.

Yes, I understand that. The question I was asking: if we add code
consuming all pending data to f03->suspend(), similarly to what we are
doing at probe time, will it fix the issue with trackstick losing
synchronization and attempting disconnect?

> 
> > maybe the same needs to be done on resume, instead of changing the way
> > we handle IRQ bits?
> 
> This patch is supposed to clear irq status like it in fn probe. Not
> changing IRQ bits.

What I meant is changing how we enable IRQ bits. I would really prefer
we did not lose IRQ state for other functions when we enable interrupts
for given function.

Thanks.
Aaron Ma June 14, 2019, 4:26 a.m. UTC | #12
On 6/12/19 1:35 AM, Dmitry Torokhov wrote:
> On Tue, Jun 11, 2019 at 12:55:58AM +0800, Aaron Ma wrote:
>> On 6/10/19 12:55 AM, Dmitry Torokhov wrote:
>>> Hi Aaron,
>>>
>>> On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote:
>>>> rmi4 got spam data after S3 resume on some ThinkPads.
>>>> Then TrackPoint lost when be detected by psmouse.
>>>> Clear irqs status before set irqs will make TrackPoint back.
>>> Could you please give me an idea as to what this spam data is?
>>>
>> It should be some data 0 during suspend/resume.
>> Actually I don't know how these data 0 is produced.
>> Not all synaptics touchpads have this issue.
>>
>>> In F03 probe we clear all pending data before enabling the function,
>> Yes we did, but not after resume.
> Yes, I understand that. The question I was asking: if we add code
> consuming all pending data to f03->suspend(), similarly to what we are
> doing at probe time, will it fix the issue with trackstick losing
> synchronization and attempting disconnect?
> 

I just do some test via adding code in suspend or resume.
But they didn't work out.

>>> maybe the same needs to be done on resume, instead of changing the way
>>> we handle IRQ bits?
>> This patch is supposed to clear irq status like it in fn probe. Not
>> changing IRQ bits.
> What I meant is changing how we enable IRQ bits. I would really prefer
> we did not lose IRQ state for other functions when we enable interrupts
> for given function.
> 

Not only F03 with problem, F12 too which is touchpad .
User verified this patch fixes problem of F12 too.
Clear IRQ status before enable IRQ should be safe.

Or we can add code before enable IRQ in F03/F12.

Thanks,
Aaron

> Thanks.
> 
> -- Dmitry
>
Kai-Heng Feng Nov. 19, 2019, 5:34 a.m. UTC | #13
Hi Dmitry,

> On Jun 14, 2019, at 12:26, Aaron Ma <aaron.ma@canonical.com> wrote:
> 
> On 6/12/19 1:35 AM, Dmitry Torokhov wrote:
>> On Tue, Jun 11, 2019 at 12:55:58AM +0800, Aaron Ma wrote:
>>> On 6/10/19 12:55 AM, Dmitry Torokhov wrote:
>>>> Hi Aaron,
>>>> 
>>>> On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote:
>>>>> rmi4 got spam data after S3 resume on some ThinkPads.
>>>>> Then TrackPoint lost when be detected by psmouse.
>>>>> Clear irqs status before set irqs will make TrackPoint back.
>>>> Could you please give me an idea as to what this spam data is?
>>>> 
>>> It should be some data 0 during suspend/resume.
>>> Actually I don't know how these data 0 is produced.
>>> Not all synaptics touchpads have this issue.
>>> 
>>>> In F03 probe we clear all pending data before enabling the function,
>>> Yes we did, but not after resume.
>> Yes, I understand that. The question I was asking: if we add code
>> consuming all pending data to f03->suspend(), similarly to what we are
>> doing at probe time, will it fix the issue with trackstick losing
>> synchronization and attempting disconnect?
>> 
> 
> I just do some test via adding code in suspend or resume.
> But they didn't work out.
> 
>>>> maybe the same needs to be done on resume, instead of changing the way
>>>> we handle IRQ bits?
>>> This patch is supposed to clear irq status like it in fn probe. Not
>>> changing IRQ bits.
>> What I meant is changing how we enable IRQ bits. I would really prefer
>> we did not lose IRQ state for other functions when we enable interrupts
>> for given function.
>> 
> 
> Not only F03 with problem, F12 too which is touchpad .
> User verified this patch fixes problem of F12 too.
> Clear IRQ status before enable IRQ should be safe.
> 
> Or we can add code before enable IRQ in F03/F12.

Users reported that  patch [1/2] alone can solve the issue.

Do we need more information before making this fix merged?

Kai-Heng

> 
> Thanks,
> Aaron
> 
>> Thanks.
>> 
>> -- Dmitry
diff mbox series

Patch

diff --git a/drivers/input/rmi4/rmi_driver.c b/drivers/input/rmi4/rmi_driver.c
index fc3ab93b7aea..20631b272f43 100644
--- a/drivers/input/rmi4/rmi_driver.c
+++ b/drivers/input/rmi4/rmi_driver.c
@@ -374,6 +374,17 @@  static int rmi_driver_set_irq_bits(struct rmi_device *rmi_dev,
 	struct device *dev = &rmi_dev->dev;
 
 	mutex_lock(&data->irq_mutex);
+
+	/* Dummy read in order to clear irqs */
+	error = rmi_read_block(rmi_dev,
+			data->f01_container->fd.data_base_addr + 1,
+			data->irq_status, data->num_of_irq_regs);
+	if (error < 0) {
+		dev_err(dev, "%s: Failed to read interrupt status!",
+							__func__);
+		goto error_unlock;
+	}
+
 	bitmap_or(data->new_irq_mask,
 		  data->current_irq_mask, mask, data->irq_count);