diff mbox

[2/3,V2] kernel: Move groups_sort to the caller of set_groups.

Message ID 20171205212847.GF26021@bombadil.infradead.org (mailing list archive)
State New, archived
Headers show

Commit Message

Matthew Wilcox (Oracle) Dec. 5, 2017, 9:28 p.m. UTC
On Tue, Dec 05, 2017 at 07:11:00AM +1100, NeilBrown wrote:
> As we don't seem to be pursuing this possibility is probably isn't very
> important, but I'd like to point out that the original fix isn't a true
> fix.
> It just sorts a shared group_info early.  This does not stop corruption.
> Every time a thread calls set_groups() on that group_info it will be
> sorted again.
> The sort algorithm used is the heap sort, and a heap sort always moves
> elements in the array around - it does not leave a sorted array
> untouched (unlike e.g. the quick sort which doesn't move anything in a
> sorted array).
> So it is still possible for two calls to groups_sort() to race.
> We *need* to move groups_sort() out of set_groups().

It must be relatively common to sort an already-sorted array.  I wonder
if something like this patch would be worthwhile?

I have deliberately broken this patch so it can't be applied.  I haven't
tested it, and for all I know, I got the sign of cmp_func wrong.



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

NeilBrown Dec. 5, 2017, 10:05 p.m. UTC | #1
On Tue, Dec 05 2017, Matthew Wilcox wrote:

> On Tue, Dec 05, 2017 at 07:11:00AM +1100, NeilBrown wrote:
>> As we don't seem to be pursuing this possibility is probably isn't very
>> important, but I'd like to point out that the original fix isn't a true
>> fix.
>> It just sorts a shared group_info early.  This does not stop corruption.
>> Every time a thread calls set_groups() on that group_info it will be
>> sorted again.
>> The sort algorithm used is the heap sort, and a heap sort always moves
>> elements in the array around - it does not leave a sorted array
>> untouched (unlike e.g. the quick sort which doesn't move anything in a
>> sorted array).
>> So it is still possible for two calls to groups_sort() to race.
>> We *need* to move groups_sort() out of set_groups().
>
> It must be relatively common to sort an already-sorted array.  I wonder
> if something like this patch would be worthwhile?
>
> I have deliberately broken this patch so it can't be applied.  I haven't
> tested it, and for all I know, I got the sign of cmp_func wrong.
>
> diff --git a/lib/sort.c b/lib/sort.c
> index d6b7a202b0b6..2b527fde6dad 100644
> --- a/lib/sort.c
> +++ b/lib/sort.c
> @@ -75,7 +75,14 @@ void sort(void *base, size_t num, size_t size,
>  			swap_func = generic_swap;
>  	}
>  
> -       /* heapify */
> +	/* Do not sort an already-sorted array */
> +	for (c = 0; c < (n - size); c += size) {
> +		if (cmp_func(base + c, base + c + size) < 0)
> +			goto heapify;
> +	}
> +	return;
> +
> +heapify:
>  	for ( ; i >= 0; i -= size) {
>  		for (r = i; r * 2 + size < n; r  = c) {
>  			c = r * 2 + size;

I wondered about this possibility...
It is a clear win from a defensive-programming perspective.
Adding a small overhead to every sort call isn't ideal, but I doubt it
is noticeable (typically 2 or 3 tests I suspect).
I probably wouldn't advocate this approach, but I certainly wouldn't
fight it.
I *do* like the way you changed a comment to a label!

Thanks,
NeilBrown
Thiago Becker Dec. 5, 2017, 11:03 p.m. UTC | #2
On Tue, 5 Dec 2017, Matthew Wilcox wrote:

> On Tue, Dec 05, 2017 at 07:11:00AM +1100, NeilBrown wrote:
>> As we don't seem to be pursuing this possibility is probably isn't very
>> important, but I'd like to point out that the original fix isn't a true
>> fix.
>> It just sorts a shared group_info early.  This does not stop corruption.
>> Every time a thread calls set_groups() on that group_info it will be
>> sorted again.
>> The sort algorithm used is the heap sort, and a heap sort always moves
>> elements in the array around - it does not leave a sorted array
>> untouched (unlike e.g. the quick sort which doesn't move anything in a
>> sorted array).
>> So it is still possible for two calls to groups_sort() to race.
>> We *need* to move groups_sort() out of set_groups().

Hum, makes sense. I've applied it to the most recent Fedora kernel (that 
uses heapsort) and I didn't see the problem again. I should run a few more 
repetitions to be sure.

> It must be relatively common to sort an already-sorted array.  I wonder
> if something like this patch would be worthwhile?
>
> I have deliberately broken this patch so it can't be applied.  I haven't
> tested it, and for all I know, I got the sign of cmp_func wrong.
>
> diff --git a/lib/sort.c b/lib/sort.c
> index d6b7a202b0b6..2b527fde6dad 100644
> --- a/lib/sort.c
> +++ b/lib/sort.c
> @@ -75,7 +75,14 @@ void sort(void *base, size_t num, size_t size,
> 			swap_func = generic_swap;
> 	}
>
> -       /* heapify */
> +	/* Do not sort an already-sorted array */
> +	for (c = 0; c < (n - size); c += size) {
> +		if (cmp_func(base + c, base + c + size) < 0)
> +			goto heapify;
> +	}
> +	return;
> +
> +heapify:
> 	for ( ; i >= 0; i -= size) {
> 		for (r = i; r * 2 + size < n; r  = c) {
> 			c = r * 2 + size;

The bug happens when two threads enter sort_groups for the same 
group info in parallel, and one thread starts overwriting values 
that another thread may already have "heapified" or sorted.

Thread A                  Thread B
Enter groups_sort
                           Enter groups_sort
.
.
.
Return from groups_sort
                           .
                           .
                           .
                           Return from groups_sort

Wouldn't this patch just make both threads see the structure as unsorted 
and sort them?

Thanks,
trbecker
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Wilcox (Oracle) Dec. 5, 2017, 11:23 p.m. UTC | #3
On Tue, Dec 05, 2017 at 09:03:02PM -0200, Thiago Rafael Becker wrote:
> On Tue, 5 Dec 2017, Matthew Wilcox wrote:
> > It must be relatively common to sort an already-sorted array.  I wonder
> > if something like this patch would be worthwhile?
> 
> The bug happens when two threads enter sort_groups for the same group info
> in parallel, and one thread starts overwriting values that another thread
> may already have "heapified" or sorted.
> 
> Thread A                  Thread B
> Enter groups_sort
>                           Enter groups_sort
> .
> .
> .
> Return from groups_sort
>                           .
>                           .
>                           .
>                           Return from groups_sort
> 
> Wouldn't this patch just make both threads see the structure as unsorted and
> sort them?

I'm sorry, I wasn't clear.  I wasn't commenting on the original bug (and
I believe your analysis to be correct unless there's some locking which
prevents two calls to group_sort from happening at the same time).

I was wondering about whether our sort() implementation is the best that
it could possibly be, and whether having the trait of not modifying an
already-sorted array is worthwhile (eg it would not cause cachelines to
enter the dirty state if they were already clean).
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/lib/sort.c b/lib/sort.c
index d6b7a202b0b6..2b527fde6dad 100644
--- a/lib/sort.c
+++ b/lib/sort.c
@@ -75,7 +75,14 @@  void sort(void *base, size_t num, size_t size,
 			swap_func = generic_swap;
 	}
 
-       /* heapify */
+	/* Do not sort an already-sorted array */
+	for (c = 0; c < (n - size); c += size) {
+		if (cmp_func(base + c, base + c + size) < 0)
+			goto heapify;
+	}
+	return;
+
+heapify:
 	for ( ; i >= 0; i -= size) {
 		for (r = i; r * 2 + size < n; r  = c) {
 			c = r * 2 + size;