Message ID: 1475628266-4767-1-git-send-email-gary.king@oculus.com
State: New, archived
Hi Gary,

On Oct 04 2016 or thereabouts, Gary King wrote:
> hidraw input events are stored for each file descriptor in a lockless
> circular queue. no memory barriers were used when the queue was
> updated, which caused intermittent kernel panics due to heap corruption
> when used on multi-core ARM systems.
>
> add memory barriers to ensure that value updates are observable before
> the head and tail referents are updated.

As a foreword, I must confess I am not that comfortable with memory
barriers on SMP. I have a hard time understanding where the code can be
reordered, why you are seeing the heap corruption, and how these
barriers solve the issue.

> Change-Id: Ifb50f5ebe13c55c83aa105c5cd5926ca16fd93e0
> Signed-off-by: Gary King <gary.king@oculus.com>
> Reviewed-on: http://prn-ocugerrit01.thefacebook.com:8080/88

This doesn't look like a public URL, please drop it if not.

> Reviewed-by: Ahmed Amin <ahmed.amin@oculus.com>
> ---
>  drivers/hid/hidraw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> index f0e2757..dc3465f 100644
> --- a/drivers/hid/hidraw.c
> +++ b/drivers/hid/hidraw.c
> @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>          mutex_lock(&list->read_mutex);
>
>          while (ret == 0) {
> +                smp_rmb();
>                  if (list->head == list->tail) {
>                          add_wait_queue(&list->hidraw->wait, &wait);
>                          set_current_state(TASK_INTERRUPTIBLE);
> @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>
>                  kfree(list->buffer[list->tail].value);
>                  list->buffer[list->tail].value = NULL;
> +                smp_wmb();
>                  list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> +                smp_wmb();

Why are these barriers needed? To me, list->tail gets accessed just
before, so I doubt the compiler would decide to reorder the code without
changing the semantics.

Again, I am not an expert regarding memory barriers, but either you
convince me that this is the best solution, or there is another solution
(like protecting the circular buffer with spinlocks between the feeder
and consumer).

>          }
> out:
>          mutex_unlock(&list->read_mutex);
> @@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>          spin_lock_irqsave(&dev->list_lock, flags);
>          list_for_each_entry(list, &dev->list, node) {
>                  int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
> -

Nitpicking: please do not drop this empty line; see the kernel coding
style, an empty line is required after a declaration.

> +                smp_rmb();
>                  if (new_head == list->tail)
>                          continue;
>
> @@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>                          break;
>                  }
>                  list->buffer[list->head].len = len;
> +                smp_wmb();
>                  list->head = new_head;
> +                smp_wmb();

Same comment as before, I don't understand how the barrier can help you
here.

>                  kill_fasync(&list->fasync, SIGIO, POLL_IN);
>          }
>          spin_unlock_irqrestore(&dev->list_lock, flags);
> --
> 1.9.1

Cheers,
Benjamin
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 4 Oct 2016, Gary King wrote:

> hidraw input events are stored for each file descriptor in a lockless
> circular queue.

Well, there is a per-hidraw device list_lock; but you're right that its
usage throughout hidraw.c is probably incomplete and that should be
fixed.

> ---
>  drivers/hid/hidraw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> index f0e2757..dc3465f 100644
> --- a/drivers/hid/hidraw.c
> +++ b/drivers/hid/hidraw.c
> @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>          mutex_lock(&list->read_mutex);
>
>          while (ret == 0) {
> +                smp_rmb();

Which _wmb() does this one pair with?

>                  if (list->head == list->tail) {
>                          add_wait_queue(&list->hidraw->wait, &wait);
>                          set_current_state(TASK_INTERRUPTIBLE);
> @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
>
>                  kfree(list->buffer[list->tail].value);
>                  list->buffer[list->tail].value = NULL;
> +                smp_wmb();
>                  list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> +                smp_wmb();

This is completely confusing; it seems like you imply that those two
_wmb() should pair together, but that makes no sense. Please always,
when using SMP barriers, document which _rmb() pairs to which _wmb(),
otherwise the rules become quickly totally unclear.

>          }
> out:
>          mutex_unlock(&list->read_mutex);
> @@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>          spin_lock_irqsave(&dev->list_lock, flags);
>          list_for_each_entry(list, &dev->list, node) {
>                  int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
> -
> +                smp_rmb();
>                  if (new_head == list->tail)

Again, which _wmb() does this one pair with please?

>                          continue;
>
> @@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
>                          break;
>                  }
>                  list->buffer[list->head].len = len;
> +                smp_wmb();
>                  list->head = new_head;
> +                smp_wmb();

Same as in hidraw_read(), I completely fail to see the point of having
two here. Which read side does each of those pair with?

I think this is just papering the real problem (missing list_lock) over
in an odd way; could you please look into whether locking list_lock
properly would resolve the crashes you're able to trigger on your ARM
system?

Thanks,
On Wed, 5 Oct 2016, Benjamin Tissoires wrote:

> > @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> >
> >                  kfree(list->buffer[list->tail].value);
> >                  list->buffer[list->tail].value = NULL;
> > +                smp_wmb();
> >                  list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> > +                smp_wmb();
>
> Why are these barriers needed? To me, list->tail gets accessed just
> before, so I doubt the compiler would decide to reorder the code
> without changing the semantics.

These are CPU barriers, not compiler barriers. (That's just a
clarification; it doesn't imply in any way that I'd think the barriers
are correct :) ).

Thanks,
Hi Jiri,

> > hidraw input events are stored for each file descriptor in a lockless
> > circular queue.
>
> Well, there is a per-hidraw device list_lock; but you're right that
> its usage throughout hidraw.c is probably incomplete and that should
> be fixed.

The complication with list_lock is that it is a spin-lock (and IRQ-safe
at that), and I did not want to introduce multiple IRQ-safe spinlock
acquires in hidraw_read (either via list_lock or by adding a per-fd
spinlock) without a substantially better test configuration than my
environment is able to accommodate.

> > ---
> >  drivers/hid/hidraw.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> > index f0e2757..dc3465f 100644
> > --- a/drivers/hid/hidraw.c
> > +++ b/drivers/hid/hidraw.c
> > @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> >          mutex_lock(&list->read_mutex);
> >
> >          while (ret == 0) {
> > +                smp_rmb();
>
> Which _wmb() does this one pair with?

The _rmb() operations don't really pair with the _wmb(), and aren't
needed for correctness (I should have cleaned this up before providing
the patch, sorry). Using _rmb() here (and below) reduces rare chances
that the buffer is incorrectly detected as full / empty due to reads
being reordered across loop iterations when a second core enters
_read() / _report_event().

> >                  if (list->head == list->tail) {
> >                          add_wait_queue(&list->hidraw->wait, &wait);
> >                          set_current_state(TASK_INTERRUPTIBLE);
> > @@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> >
> >                  kfree(list->buffer[list->tail].value);
> >                  list->buffer[list->tail].value = NULL;
> > +                smp_wmb();
> >                  list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
> > +                smp_wmb();
>
> This is completely confusing; it seems like you imply that those two
> _wmb() should pair together, but that makes no sense. Please always,
> when using SMP barriers, document which _rmb() pairs to which _wmb(),
> otherwise the rules become quickly totally unclear.

Upon further review, I believe the second _wmb() is unnecessary and
redundant with memory-barrier operations performed either by subsequent
loop iterations or by the mutex_unlock() operation (I originally
included it to reduce the window in which a free entry in the circular
queue was unobservable to other cores, forgetting that the barrier
would also be provided by subsequent loop iterations or the
mutex_unlock() below).

The fundamental problem we encountered was that changes to the
list->tail and list->head values were visible on other processors
before the corresponding changes to list->buffer and list->buffer[N].

> >          }
> > out:
> >          mutex_unlock(&list->read_mutex);
> > @@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
> >          spin_lock_irqsave(&dev->list_lock, flags);
> >          list_for_each_entry(list, &dev->list, node) {
> >                  int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
> > -
> > +                smp_rmb();
> >                  if (new_head == list->tail)
>
> Again, which _wmb() does this one pair with please?
>
> >                          continue;
> >
> > @@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
> >                          break;
> >                  }
> >                  list->buffer[list->head].len = len;
> > +                smp_wmb();
> >                  list->head = new_head;
> > +                smp_wmb();
>
> Same as in hidraw_read(), I completely fail to see the point of having
> two here. Which read side does each of those pair with?
>
> I think this is just papering the real problem (missing list_lock)
> over in an odd way; could you please look into whether locking
> list_lock properly would resolve the crashes you're able to trigger on
> your ARM system?

list_lock does not appear to be a good solution for this, since it is
currently an IRQ-safe spinlock on the device. Using it would cause
_read() to have unnecessary lock contention if the device is opened and
read simultaneously from multiple fds, and every loop iteration would
need to acquire the spinlock / disable interrupts at least once.

(Separately, the current locking design seems dangerous to me, since
the amount of time spent with interrupts disabled in the current
implementation may be arbitrarily long (the device may be opened
multiple times, and the hidraw event buffer that is kmemdup'd for each
open fd may be large), but I wanted to supply a patch that fixed the
crash before embarking on any more substantial changes to the stack.)

The patch I supplied should be functionally correct when reduced to
just two calls to smp_wmb(): one preceding the update to list->tail in
_read(), one preceding the update to list->head in _report_event(). I'm
happy to make and test this change.

- Gary
On Fri, 7 Oct 2016, Gary King wrote:

> > > diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> > > index f0e2757..dc3465f 100644
> > > --- a/drivers/hid/hidraw.c
> > > +++ b/drivers/hid/hidraw.c
> > > @@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
> > >          mutex_lock(&list->read_mutex);
> > >
> > >          while (ret == 0) {
> > > +                smp_rmb();
> >
> > Which _wmb() does this one pair with?
>
> The _rmb() operations don't really pair with the _wmb()

That unfortunately immediately implies that its usage is incorrect
though.

[ ... snip ... ]

> > I think this is just papering the real problem (missing list_lock)
> > over in an odd way; could you please look into whether locking
> > list_lock properly would resolve the crashes you're able to trigger
> > on your ARM system?
>
> list_lock does not appear to be a good solution for this, since it is
> currently an IRQ-safe spinlock on the device. Using it would cause
> _read() to have unnecessary lock contention if the device is opened
> and read simultaneously from multiple fds, and every loop iteration
> would need to acquire the spinlock / disable interrupts at least once.

We could make the spinlock per struct file *, which'd remove the
contention bottleneck when the same device is accessed through multiple
fds, but still maintain the correctness (as the list itself is a per-fd
thing anyway).

Probably introducing a new spinlock would make more sense, as list_lock
is currently being used mostly to protect accessing of the dev->list
linked list.

> (Separately, the current locking design seems dangerous to me, since
> the amount of time spent with interrupts disabled in the current
> implementation may be arbitrarily long (the device may be opened
> multiple times, and the hidraw event buffer that is kmemdup'd for each
> open fd may be large),

We're memduping with GFP_ATOMIC of course, so there is really no
bottleneck there.

Thanks,
On Mon, Oct 10, 2016 at 1:36 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Fri, 7 Oct 2016, Gary King wrote:
>
> > The _rmb() operations don't really pair with the _wmb()
>
> That unfortunately immediately implies that its usage is incorrect
> though.
>
> [ ... snip ... ]
>
> > list_lock does not appear to be a good solution for this, since it
> > is currently an IRQ-safe spinlock on the device. Using it would
> > cause _read() to have unnecessary lock contention if the device is
> > opened and read simultaneously from multiple fds, and every loop
> > iteration would need to acquire the spinlock / disable interrupts at
> > least once.
>
> We could make the spinlock per struct file *, which'd remove the
> contention bottleneck when the same device is accessed through
> multiple fds, but still maintain the correctness (as the list itself
> is a per-fd thing anyway).
>
> Probably introducing a new spinlock would make more sense, as
> list_lock is currently being used mostly to protect accessing of the
> dev->list linked list.

Yes, I'd protect dev->list with RCU, introduce a per-file spinlock, and
take it both when putting data into the queue and when taking values
out. One could try to optimize and rely on kfifo to avoid taking the
lock on the writer side, but we are not dealing with a 40Gb network
interface here so I would not get too fancy.

Thanks.
> > [snip]
> >
> > list_lock does not appear to be a good solution for this, since it
> > is currently an IRQ-safe spinlock on the device. Using it would
> > cause _read() to have unnecessary lock contention if the device is
> > opened and read simultaneously from multiple fds, and every loop
> > iteration would need to acquire the spinlock / disable interrupts at
> > least once.
>
> We could make the spinlock per struct file *, which'd remove the
> contention bottleneck when the same device is accessed through
> multiple fds, but still maintain the correctness (as the list itself
> is a per-fd thing anyway).
>
> Probably introducing a new spinlock would make more sense, as
> list_lock is currently being used mostly to protect accessing of the
> dev->list linked list.

I've got a patch I'm testing that follows Dmitry's suggestion to add a
per-struct file spinlock and change dev->list to use RCU.

Thanks,
- Gary
diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
index f0e2757..dc3465f 100644
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -53,6 +53,7 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
 	mutex_lock(&list->read_mutex);
 
 	while (ret == 0) {
+		smp_rmb();
 		if (list->head == list->tail) {
 			add_wait_queue(&list->hidraw->wait, &wait);
 			set_current_state(TASK_INTERRUPTIBLE);
@@ -98,7 +99,9 @@ static ssize_t hidraw_read(struct file *file, char __user *buffer, size_t count,
 
 		kfree(list->buffer[list->tail].value);
 		list->buffer[list->tail].value = NULL;
+		smp_wmb();
 		list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
+		smp_wmb();
 	}
 out:
 	mutex_unlock(&list->read_mutex);
@@ -487,7 +490,7 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
 	spin_lock_irqsave(&dev->list_lock, flags);
 	list_for_each_entry(list, &dev->list, node) {
 		int new_head = (list->head + 1) & (HIDRAW_BUFFER_SIZE - 1);
-
+		smp_rmb();
 		if (new_head == list->tail)
 			continue;
 
@@ -496,7 +499,9 @@ int hidraw_report_event(struct hid_device *hid, u8 *data, int len)
 			break;
 		}
 		list->buffer[list->head].len = len;
+		smp_wmb();
 		list->head = new_head;
+		smp_wmb();
 		kill_fasync(&list->fasync, SIGIO, POLL_IN);
 	}
 	spin_unlock_irqrestore(&dev->list_lock, flags);