diff mbox series

[02/12] locking/rwsem: Exit early when held by an anonymous owner

Message ID 20200114161225.309792-3-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/12] mm: fix a comment in sys_swapon | expand

Commit Message

Christoph Hellwig Jan. 14, 2020, 4:12 p.m. UTC
The rwsem code overloads the owner field with either a task struct or
negative magic numbers.  Add a quick hack to catch these negative
values early on.  Without this spinning on a writer that replaced the
owner with RWSEM_OWNER_UNKNOWN, rwsem_spin_on_owner can crash while
deferencing the task_struct ->on_cpu field of a -8 value.

XXX: This might be a bit of a hack as the code otherwise doesn't use
the ERR_PTR family macros, better suggestions welcome.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/locking/rwsem.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Waiman Long Jan. 14, 2020, 6:17 p.m. UTC | #1
On 1/14/20 11:12 AM, Christoph Hellwig wrote:
> The rwsem code overloads the owner field with either a task struct or
> negative magic numbers.  Add a quick hack to catch these negative
> values early on.  Without this spinning on a writer that replaced the
> owner with RWSEM_OWNER_UNKNOWN, rwsem_spin_on_owner can crash while
> deferencing the task_struct ->on_cpu field of a -8 value.
>
> XXX: This might be a bit of a hack as the code otherwise doesn't use
> the ERR_PTR family macros, better suggestions welcome.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  kernel/locking/rwsem.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 44e68761f432..6adc719a30a1 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -725,6 +725,8 @@ rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
>  	state = rwsem_owner_state(owner, flags, nonspinnable);
>  	if (state != OWNER_WRITER)
>  		return state;
> +	if (IS_ERR(owner))
> +		return state;
>  
>  	rcu_read_lock();
>  	for (;;) {

The owner field is just a pointer to the task structure with the lower 3
bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will
stop optimistic spinning. So under what condition did the crash happen?

Anyway, PeterZ is working on revising the percpu-rwsem implementation to
more gracefully handle the frozen case. At the end, there will not be a
need for the RWSEM_OWNER_UNKNOWN magic and it can be removed.

Cheers,
Longman

RWSEM_OWNER_UNKNOWN
Christoph Hellwig Jan. 14, 2020, 6:25 p.m. UTC | #2
On Tue, Jan 14, 2020 at 01:17:45PM -0500, Waiman Long wrote:
> The owner field is just a pointer to the task structure with the lower 3
> bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will
> stop optimistic spinning. So under what condition did the crash happen?

When running xfstests with all patches in this series except for this
one, IIRC in generic/114.

> Anyway, PeterZ is working on revising the percpu-rwsem implementation to
> more gracefully handle the frozen case. At the end, there will not be a
> need for the RWSEM_OWNER_UNKNOWN magic and it can be removed.

Well, this series relies on that value.  And I think it fundamentally
is the right thing to do for AIO, and potentially other I/O related
locking where we take a lock to synchronize access to data, then
do I/O and then eventually get an I/O completion from an interrupt.
Even thinking from the PREEMP_RT context we want to boost the
initial thread as long as we can, then do nothing when it is off
to I/O hardware (except maybe providing good diagnostics that the cause
for the latency is I/O), and then boost the thread that is handling
the completion.  Things like the i_dio_count hack can't provide that.
Waiman Long Jan. 14, 2020, 6:33 p.m. UTC | #3
On 1/14/20 1:25 PM, Christoph Hellwig wrote:
> On Tue, Jan 14, 2020 at 01:17:45PM -0500, Waiman Long wrote:
>> The owner field is just a pointer to the task structure with the lower 3
>> bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will
>> stop optimistic spinning. So under what condition did the crash happen?
> When running xfstests with all patches in this series except for this
> one, IIRC in generic/114.

OK, I think I know where the bug is. I will send a patch to fix that.

Thanks,
Longman
Waiman Long Jan. 14, 2020, 6:55 p.m. UTC | #4
On 1/14/20 1:25 PM, Christoph Hellwig wrote:
> On Tue, Jan 14, 2020 at 01:17:45PM -0500, Waiman Long wrote:
>> The owner field is just a pointer to the task structure with the lower 3
>> bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will
>> stop optimistic spinning. So under what condition did the crash happen?
> When running xfstests with all patches in this series except for this
> one, IIRC in generic/114.

Could you try the attached patch to see if it can fix the problem?

Thanks,
Longman
diff mbox series

Patch

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44e68761f432..6adc719a30a1 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -725,6 +725,8 @@  rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
 	state = rwsem_owner_state(owner, flags, nonspinnable);
 	if (state != OWNER_WRITER)
 		return state;
+	if (IS_ERR(owner))
+		return state;
 
 	rcu_read_lock();
 	for (;;) {