diff mbox

hidraw: fix list->buffer race condition

Message ID 1475628266-4767-1-git-send-email-gary.king@oculus.com (mailing list archive)
State New, archived

Commit Message

Gary King Oct. 5, 2016, 12:44 a.m. UTC
hidraw input events are stored for each file descriptor in a lockless
circular queue. No memory barriers were used when the queue was
updated, which caused intermittent kernel panics due to heap corruption
when used on multi-core ARM systems.

Add memory barriers to ensure that value updates are observable before
the head and tail indices are updated.

Change-Id: Ifb50f5ebe13c55c83aa105c5cd5926ca16fd93e0
Signed-off-by: Gary King <gary.king@oculus.com>
Reviewed-on: http://prn-ocugerrit01.thefacebook.com:8080/88
Reviewed-by: Ahmed Amin <ahmed.amin@oculus.com>
---
 drivers/hid/hidraw.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
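The ordering the patch is after can be sketched in userspace C11, with acquire/release atomics standing in for the kernel's smp_rmb()/smp_wmb(). This is a model only, not hidraw's code: the struct, the `ring_push`/`ring_pop` names, and BUF_SIZE are illustrative.

```c
/* Userspace model of a single-producer/single-consumer ring where the
 * payload must be published before the index that makes it visible. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 64                     /* power of two, like HIDRAW_BUFFER_SIZE */

struct entry { char *value; int len; };

struct ring {
    struct entry buffer[BUF_SIZE];
    atomic_int head;                    /* advanced by the producer */
    atomic_int tail;                    /* advanced by the consumer */
};

/* Producer side (loose analogue of hidraw_report_event): 0 on success,
 * -1 if the ring is full. */
static int ring_push(struct ring *r, const char *data, int len)
{
    int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    int new_head = (head + 1) & (BUF_SIZE - 1);

    if (new_head == atomic_load_explicit(&r->tail, memory_order_acquire))
        return -1;                      /* full */
    r->buffer[head].value = malloc(len);
    memcpy(r->buffer[head].value, data, len);
    r->buffer[head].len = len;
    /* Release store: the payload writes above become visible to the
     * consumer no later than the new head value does. */
    atomic_store_explicit(&r->head, new_head, memory_order_release);
    return 0;
}

/* Consumer side (loose analogue of hidraw_read): bytes copied, -1 if empty. */
static int ring_pop(struct ring *r, char *out, int cap)
{
    int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    int len;

    if (tail == atomic_load_explicit(&r->head, memory_order_acquire))
        return -1;                      /* empty */
    len = r->buffer[tail].len < cap ? r->buffer[tail].len : cap;
    memcpy(out, r->buffer[tail].value, len);
    free(r->buffer[tail].value);
    r->buffer[tail].value = NULL;
    /* Release store: the slot becomes reusable only after the free above
     * is visible to the producer. */
    atomic_store_explicit(&r->tail, (tail + 1) & (BUF_SIZE - 1),
                          memory_order_release);
    return len;
}
```

In this model each index has exactly one writer, and each release store pairs with the acquire load of the same variable on the other side, which is the pairing discipline the reviewers ask about below.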

Comments

Benjamin Tissoires Oct. 5, 2016, 7:53 a.m. UTC | #1
Hi Gary,

On Oct 04 2016 or thereabouts, Gary King wrote:
> hidraw input events are stored for each file descriptor in a lockless
> circular queue. No memory barriers were used when the queue was
> updated, which caused intermittent kernel panics due to heap corruption
> when used on multi-core ARM systems.
> 
> Add memory barriers to ensure that value updates are observable before
> the head and tail indices are updated.

As a foreword, I must confess I am not that comfortable with memory
barriers on SMP.

I have a hard time trying to understand where the code can be reordered,
why you are seeing the heap corruption, and how these barriers solve
the issue.

> 
> Change-Id: Ifb50f5ebe13c55c83aa105c5cd5926ca16fd93e0
> Signed-off-by: Gary King <gary.king@oculus.com>
> Reviewed-on: http://prn-ocugerrit01.thefacebook.com:8080/88

This doesn't look like a public URL, please drop if not.

> Reviewed-by: Ahmed Amin <ahmed.amin@oculus.com>
> ---
>  drivers/hid/hidraw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> index f0e2757..dc3465f 100644
> --- a/drivers/hid/hidraw.c
> +++ b/drivers/hid/hidraw.c
> @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>  	mutex_lock(&list->read_mutex);
>  
>  	while (ret == 0) {
> +		smp_rmb();
>  		if (list->head == list->tail) {
>  			add_wait_queue(&list->hidraw->wait, &wait);
>  			set_current_state(TASK_INTERRUPTIBLE);
> @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>  
>  		kfree(list->buffer[list->tail].value);
>  		list->buffer[list->tail].value = NULL;
> +		smp_wmb();
>  		list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> +		smp_wmb();

Why are these barriers needed? To me, list->tail gets accessed just
before, so I doubt the compiler would decide to reorder the code without
changing the semantics.

Again, I am not an expert regarding memory barriers, but either you
convince me that this is the best solution, or there is another
solution (like protecting the circular buffer with spinlocks between the
feeder and the consumer).

>  	}
>  out:
>  	mutex_unlock(&list->read_mutex);
> @@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>  	spin_lock_irqsave(&dev->list_lock, flags);
>  	list_for_each_entry(list, &dev->list, node) {
>  		int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
> -

Nitpicking: please do not drop this empty line; per the kernel coding
style, an empty line is required after a declaration.

> +		smp_rmb();
>  		if (new_head == list->tail)
>  			continue;
>  
> @@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>  			break;
>  		}
>  		list->buffer[list->head].len = len;
> +		smp_wmb();
>  		list->head = new_head;
> +		smp_wmb();

Same comment as before: I don't understand how the barrier can help
you here.

>  		kill_fasync(&list->fasync, SIGIO, POLL_IN);
>  	}
>  	spin_unlock_irqrestore(&dev->list_lock, flags);
> -- 
> 1.9.1

Cheers,
Benjamin 
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina Oct. 7, 2016, 8:43 a.m. UTC | #2
On Tue, 4 Oct 2016, Gary King wrote:

> hidraw input events are stored for each file descriptor in a lockless
> circular queue. 

Well, there is a per-hidraw device list_lock; but you're right that its 
usage throughout hidraw.c is probably incomplete and that should be fixed.

> ---
>  drivers/hid/hidraw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> index f0e2757..dc3465f 100644
> --- a/drivers/hid/hidraw.c
> +++ b/drivers/hid/hidraw.c
> @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>  	mutex_lock(&list->read_mutex);
>  
>  	while (ret == 0) {
> +		smp_rmb();

Which _wmb() does this one pair with?

>  		if (list->head == list->tail) {
>  			add_wait_queue(&list->hidraw->wait, &wait);
>  			set_current_state(TASK_INTERRUPTIBLE);
> @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>  
>  		kfree(list->buffer[list->tail].value);
>  		list->buffer[list->tail].value = NULL;
> +		smp_wmb();
>  		list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> +		smp_wmb();

This is completely confusing; it seems like you imply that those two 
_wmb() should pair together, but that makes no sense. Please always, when 
using SMP barriers, document which _rmb() pairs with which _wmb(); otherwise 
the rules quickly become totally unclear.

>  	}
>  out:
>  	mutex_unlock(&list->read_mutex);
> @@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>  	spin_lock_irqsave(&dev->list_lock, flags);
>  	list_for_each_entry(list, &dev->list, node) {
>  		int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
> -
> +		smp_rmb();
>  		if (new_head == list->tail)

Again, which _wmb() does this one pair with please?

>  			continue;
>  
> @@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>  			break;
>  		}
>  		list->buffer[list->head].len = len;
> +		smp_wmb();
>  		list->head = new_head;
> +		smp_wmb();

Same as in hidraw_read(), I completely fail to see the point of having two 
here. Which read side does each of those pair with?

I think this is just papering the real problem (missing list_lock) over in 
an odd way; could you please look into whether locking list_lock properly 
would resolve the crashes you're able to trigger on your ARM system?

Thanks,
Jiri Kosina Oct. 7, 2016, 8:43 a.m. UTC | #3
On Wed, 5 Oct 2016, Benjamin Tissoires wrote:

> > @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> >  
> >  		kfree(list->buffer[list->tail].value);
> >  		list->buffer[list->tail].value = NULL;
> > +		smp_wmb();
> >  		list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> > +		smp_wmb();
> 
> Why are these barriers needed? To me, list->tail gets accessed just
> before, so I doubt the compiler would decide to reorder the code without
> changing the semantics.

These are CPU barriers, not compiler barriers.

(that's just a clarification, it doesn't imply in any way that I'd think 
the barriers are correct :) ).

Thanks,
Gary King Oct. 7, 2016, 5:30 p.m. UTC | #4
Hi Jiri,

> > hidraw input events are stored for each file descriptor in a lockless
> > circular queue. 
> 
> Well, there is a per-hidraw device list_lock; but you're right that it's 
> usage throughout hidraw.c is probably incomplete and that should be fixed.

The complication with list_lock is that it is a spin-lock (and IRQ-safe
at that), and I did not want to introduce multiple IRQ-safe spinlock
acquires in hidraw_read (either via list_lock or by adding a per-fd
spinlock) without a substantially better test configuration than my
environment is able to accommodate.

> > ---
> >  drivers/hid/hidraw.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> > index f0e2757..dc3465f 100644
> > --- a/drivers/hid/hidraw.c
> > +++ b/drivers/hid/hidraw.c
> > @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> >  	mutex_lock(&list->read_mutex);
> >  
> >  	while (ret == 0) {
> > +		smp_rmb();
> 
> Which _wmb() does this one pair with?

The _rmb() operations don't really pair with the _wmb(), and aren't
needed for correctness (I should have cleaned this up before providing 
the patch, sorry). Using _rmb() here (and below) reduces rare chances
that the buffer is incorrectly detected as full / empty due to reads
being reordered across loop iterations when a second core enters
_read() / _report_event().

> >  		if (list->head == list->tail) {
> >  			add_wait_queue(&list->hidraw->wait, &wait);
> >  			set_current_state(TASK_INTERRUPTIBLE);
> > @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> >  
> >  		kfree(list->buffer[list->tail].value);
> >  		list->buffer[list->tail].value = NULL;
> > +		smp_wmb();
> >  		list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> > +		smp_wmb();
> 
> This is completely confusing; it seems like you imply that those two 
> _wmb() should pair together, but that makes no sense. Please always, when 
> using SMP barriers, document which _rmb() pairs to which _wmb(), otherwise 
> the rules become quickly totally unclear.

Upon further review, I believe the second _wmb() is unnecessary: it is
redundant with the barriers provided by subsequent loop iterations or by
the mutex_unlock() below. (I originally included it to reduce the window
in which a free entry in the circular queue was unobservable to other
cores, forgetting that those later barriers already supply the ordering.)

The fundamental problem we encountered was that changes to the list->tail 
and list->head values were visible on other processors before the 
corresponding changes to list->buffer and list->buffer[N].
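The hazard described here reduces to the classic message-passing pattern. A minimal userspace model (illustrative names; C11 release/acquire standing in for smp_wmb()/smp_rmb()):

```c
/* Minimal model of the publication problem: 'payload' plays the role of
 * list->buffer[N], 'ready' the role of list->head. Without the
 * release/acquire pair, a second core may observe ready == 1 while the
 * payload write is still invisible to it. */
#include <stdatomic.h>

static int payload;                     /* the queue entry contents */
static atomic_int ready;                /* the index update */

static void writer(int v)
{
    payload = v;                        /* fill in the entry first ... */
    /* ... then publish; pairs with the acquire load in reader(). The
     * kernel analogue is an smp_wmb() before the store to list->head. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

static int reader(void)
{
    /* Acquire pairs with the release store in writer(); the kernel
     * analogue is an smp_rmb() after the load of list->head. */
    if (atomic_load_explicit(&ready, memory_order_acquire))
        return payload;
    return -1;                          /* nothing published yet */
}
```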

> >  	}
> >  out:
> >  	mutex_unlock(&list->read_mutex);
> > @@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
> >  	spin_lock_irqsave(&dev->list_lock, flags);
> >  	list_for_each_entry(list, &dev->list, node) {
> >  		int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
> > -
> > +		smp_rmb();
> >  		if (new_head == list->tail)
> 
> Again, which _wmb() does this one pair with please?
>
> >  			continue;
> >  
> > @@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
> >  			break;
> >  		}
> >  		list->buffer[list->head].len = len;
> > +		smp_wmb();
> >  		list->head = new_head;
> > +		smp_wmb();
> 
> Same as in hidraw_read(), I completely fail to see the point of having two 
> here. Which read side does each of those pair with?
> 
> I think this is just papering the real problem (missing list_lock) over in 
> an odd way; could you please look into whether locking list_lock properly 
> would resolve the crashes you're able to trigger on your ARM system?

list_lock does not appear to be a good solution for this, since it is
currently an IRQ-safe spinlock on the device. Using it would cause 
_read() to have unnecessary lock contention if the device is opened 
and read simultaneously from multiple fds, and every loop iteration would 
need to acquire the spinlock / disable interrupts at least once.

(Separately, the current locking design seems dangerous to me, since the 
amount of time spent with interrupts disabled in the current 
implementation may be arbitrarily long (the device may be opened
multiple times, and the hidraw event buffer that is kmemdup'd for each
open fd may be large), but I wanted to supply a patch that fixed the
crash before embarking on any more substantial changes to the stack.)

The patch I supplied should be functionally correct when reduced to just 
two calls to smp_wmb(): one preceding the update to list->tail in 
_read(), one preceding the update to list->head in _report_event(). I'm 
happy to make and test this change.

- Gary
Jiri Kosina Oct. 10, 2016, 8:36 a.m. UTC | #5
On Fri, 7 Oct 2016, Gary King wrote:

> > > diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> > > index f0e2757..dc3465f 100644
> > > --- a/drivers/hid/hidraw.c
> > > +++ b/drivers/hid/hidraw.c
> > > @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> > >  	mutex_lock(&list->read_mutex);
> > >  
> > >  	while (ret == 0) {
> > > +		smp_rmb();
> > 
> > Which _wmb() does this one pair with?
> 
> The _rmb() operations don't really pair with the _wmb()

That unfortunately immediately implies that its usage is incorrect though.

[ ... snip ... ]
> > I think this is just papering the real problem (missing list_lock) over in 
> > an odd way; could you please look into whether locking list_lock properly 
> > would resolve the crashes you're able to trigger on your ARM system?
> 
> list_lock does not appear to be a good solution for this, since it is
> currently an IRQ-safe spinlock on the device. Using it would cause 
> _read() to have unnecessary lock contention if the device is opened 
> and read simultaneously from multiple fds, and every loop iteration would 
> need to acquire the spinlock / disable interrupts at least once.

We could make the spinlock per struct file *, which'd remove the 
contention bottleneck when the same device is accessed through 
multiple fds, but still maintain correctness (as the list itself is a 
per-fd thing anyway).

Probably introducing a new spinlock would make more sense, as list_lock is 
currently used mostly to protect access to the dev->list linked 
list.
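The per-file lock suggested here can be modeled in userspace with a pthread mutex (a sketch only: the kernel version would use a spinlock, and the `fd_list` struct and function names below are made up, not hidraw's):

```c
/* One lock per open fd guards both index updates, so no explicit
 * barriers are needed: the lock's acquire/release semantics order the
 * payload writes against the index writes. */
#include <pthread.h>
#include <string.h>

#define BUF_SIZE 64

struct fd_list {
    struct { char data[64]; int len; } buffer[BUF_SIZE];
    int head, tail;
    pthread_mutex_t lock;               /* per-open-fd, not per-device */
};

/* Report path: enqueue under the per-list lock; -1 if full. */
static int list_enqueue(struct fd_list *l, const char *data, int len)
{
    int ret = -1;

    pthread_mutex_lock(&l->lock);
    if (((l->head + 1) & (BUF_SIZE - 1)) != l->tail) {   /* not full */
        memcpy(l->buffer[l->head].data, data, len);
        l->buffer[l->head].len = len;
        l->head = (l->head + 1) & (BUF_SIZE - 1);
        ret = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return ret;
}

/* Read path: dequeue under the same lock; bytes copied, -1 if empty. */
static int list_dequeue(struct fd_list *l, char *out, int cap)
{
    int len = -1;

    pthread_mutex_lock(&l->lock);
    if (l->head != l->tail) {                            /* not empty */
        len = l->buffer[l->tail].len < cap ? l->buffer[l->tail].len : cap;
        memcpy(out, l->buffer[l->tail].data, len);
        l->tail = (l->tail + 1) & (BUF_SIZE - 1);
    }
    pthread_mutex_unlock(&l->lock);
    return len;
}
```

Because the list is per-fd, two readers on different fds never contend on the same lock, which addresses the contention concern raised earlier in the thread.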

> (Separately, the current locking design seems dangerous to me, since the 
> amount of time spent with interrupts disabled in the current 
> implementation may be arbitrarily long (the device may be opened 
> multiple times, and the hidraw event buffer that is kmemdup'd for each 
> open fd may be large), 

We're kmemdup'ing with GFP_ATOMIC of course, so there is really no 
bottleneck there.

Thanks,
Dmitry Torokhov Oct. 11, 2016, 3:12 a.m. UTC | #6
On Mon, Oct 10, 2016 at 1:36 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Fri, 7 Oct 2016, Gary King wrote:
>
>> > > diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
>> > > index f0e2757..dc3465f 100644
>> > > --- a/drivers/hid/hidraw.c
>> > > +++ b/drivers/hid/hidraw.c
>> > > @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>> > >   mutex_lock(&list->read_mutex);
>> > >
>> > >   while (ret == 0) {
>> > > +         smp_rmb();
>> >
>> > Which _wmb() does this one pair with?
>>
>> The _rmb() operations don't really pair with the _wmb()
>
> That unfortunately immediately implies that its usage is incorrect though.
>
> [ ... snip ... ]
>> > I think this is just papering the real problem (missing list_lock) over in
>> > an odd way; could you please look into whether locking list_lock properly
>> > would resolve the crashes you're able to trigger on your ARM system?
>>
>> list_lock does not appear to be a good solution for this, since it is
>> currently an IRQ-safe spinlock on the device. Using it would cause
>> _read() to have unnecessary lock contention if the device is opened
>> and read simultaneously from multiple fds, and every loop iteration would
>> need to acquire the spinlock / disable interrupts at least once.
>
> We could make the spinlock per struct file *, which'd remove the
> contention bottleneck when the same device is accessed through
> multiple fds, but still maintain correctness (as the list itself is a
> per-fd thing anyway).
>
> Probably introducing a new spinlock would make more sense, as list_lock is
> currently used mostly to protect access to the dev->list linked
> list.

Yes, I'd protect dev->list with RCU, introduce a per-file spinlock, and
take it both when putting data into the queue and when taking values out.

One could try to optimize and rely on kfifo to avoid taking the lock on
the writer side, but we are not dealing with a 40Gb network interface here,
so I would not get too fancy.
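The RCU half of this suggestion, modeled in userspace: a release store plays the role of rcu_assign_pointer() and an acquire load that of rcu_dereference(). Reclamation (synchronize_rcu() and freeing removed nodes) is deliberately omitted, and the names are illustrative, not hidraw's.

```c
/* Userspace analogue of RCU's publish/subscribe halves for a linked
 * list like dev->list: writers fully initialize a node before
 * publishing it, readers traverse locklessly. */
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int minor;                          /* stand-in for per-fd state */
    struct node *next;
};

static _Atomic(struct node *) dev_list; /* list head, starts NULL */

/* Writer: fill in the node, then publish it at the head. The release
 * store is the analogue of list_add_rcu()/rcu_assign_pointer(). */
static void dev_list_add(int minor)
{
    struct node *n = malloc(sizeof(*n));

    n->minor = minor;
    n->next = atomic_load_explicit(&dev_list, memory_order_relaxed);
    atomic_store_explicit(&dev_list, n, memory_order_release);
}

/* Reader: lockless traversal; the acquire load is the analogue of
 * rcu_dereference(), so every node reached is fully initialized. */
static int dev_list_count(void)
{
    int count = 0;
    struct node *p = atomic_load_explicit(&dev_list, memory_order_acquire);

    for (; p; p = p->next)
        count++;
    return count;
}
```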

Thanks.
Gary King Oct. 14, 2016, 1:29 a.m. UTC | #7
> > [snip]
> > 
> > list_lock does not appear to be a good solution for this, since it is
> > currently an IRQ-safe spinlock on the device. Using it would cause 
> > _read() to have unnecessary lock contention if the device is opened 
> > and read simultaneously from multiple fds, and every loop iteration would 
> > need to acquire the spinlock / disable interrupts at least once.
> 
> We could make the spinlock per struct file *, which'd remove the 
> contention bottleneck when the same device is accessed through 
> multiple fds, but still maintain correctness (as the list itself is a 
> per-fd thing anyway).
> 
> Probably introducing a new spinlock would make more sense, as list_lock is 
> currently used mostly to protect access to the dev->list linked 
> list.

I've got a patch I'm testing that follows Dmitry's suggestion to add
a per-struct file spinlock and change dev->list to use RCU.

Thanks,

- Gary

Patch

diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
index f0e2757..dc3465f 100644
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -53,6 +53,7 @@  static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
 	mutex_lock(&list->read_mutex);
 
 	while (ret == 0) {
+		smp_rmb();
 		if (list->head == list->tail) {
 			add_wait_queue(&list->hidraw->wait, &wait);
 			set_current_state(TASK_INTERRUPTIBLE);
@@ -98,7 +99,9 @@  static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
 
 		kfree(list->buffer[list->tail].value);
 		list->buffer[list->tail].value = NULL;
+		smp_wmb();
 		list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
+		smp_wmb();
 	}
 out:
 	mutex_unlock(&list->read_mutex);
@@ -487,7 +490,7 @@  int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
 	spin_lock_irqsave(&dev->list_lock, flags);
 	list_for_each_entry(list, &dev->list, node) {
 		int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
-
+		smp_rmb();
 		if (new_head == list->tail)
 			continue;
 
@@ -496,7 +499,9 @@  int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
 			break;
 		}
 		list->buffer[list->head].len = len;
+		smp_wmb();
 		list->head = new_head;
+		smp_wmb();
 		kill_fasync(&list->fasync, SIGIO, POLL_IN);
 	}
 	spin_unlock_irqrestore(&dev->list_lock, flags);