Message ID | 20170323112443.30843-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Herbert Xu |
Headers | show |
On Thu, Mar 23, 2017 at 12:24:43PM +0100, Jason A. Donenfeld wrote: > Under extremely heavy uses of padata, crashes occur, and with list > debugging turned on, this happens instead: > > [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 > __list_add+0xae/0x130 > [87487.301868] list_add corruption. prev->next should be next > (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00). > [87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3 > [87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0 > [87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140 > [87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50 > [87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130 > [87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70 > [87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420 > [87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120 > > padata_reorder calls list_add_tail with the list to which its adding > locked, which seems correct: > > spin_lock(&squeue->serial.lock); > list_add_tail(&padata->list, &squeue->serial.list); > spin_unlock(&squeue->serial.lock); > > This therefore leaves only place where such inconsistency could occur: > if padata->list is added at the same time on two different threads. > This pdata pointer comes from the function call to > padata_get_next(pd), which has in it the following block: > > next_queue = per_cpu_ptr(pd->pqueue, cpu); > padata = NULL; > reorder = &next_queue->reorder; > if (!list_empty(&reorder->list)) { > padata = list_entry(reorder->list.next, > struct padata_priv, list); > spin_lock(&reorder->lock); > list_del_init(&padata->list); > atomic_dec(&pd->reorder_objects); > spin_unlock(&reorder->lock); > > pd->processed++; > > goto out; > } > out: > return padata; > > I strongly suspect that the problem here is that two threads can race > on reorder list. Even though the deletion is locked, call to > list_entry is not locked, which means it's feasible that two threads > pick up the same padata object and subsequently call list_add_tail on > them at the same time. The fix is thus be hoist that lock outside of > that block. > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Under extremely heavy uses of padata, crashes occur, and with list > debugging turned on, this happens instead: > > [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 > __list_add+0xae/0x130 > [87487.301868] list_add corruption. prev->next should be next > (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00). > [87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3 > [87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0 > [87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140 > [87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50 > [87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130 > [87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70 > [87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420 > [87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120 > > padata_reorder calls list_add_tail with the list to which its adding > locked, which seems correct: > > spin_lock(&squeue->serial.lock); > list_add_tail(&padata->list, &squeue->serial.list); > spin_unlock(&squeue->serial.lock); > > This therefore leaves only place where such inconsistency could occur: > if padata->list is added at the same time on two different threads. > This pdata pointer comes from the function call to > padata_get_next(pd), which has in it the following block: > > next_queue = per_cpu_ptr(pd->pqueue, cpu); > padata = NULL; > reorder = &next_queue->reorder; > if (!list_empty(&reorder->list)) { > padata = list_entry(reorder->list.next, > struct padata_priv, list); > spin_lock(&reorder->lock); > list_del_init(&padata->list); > atomic_dec(&pd->reorder_objects); > spin_unlock(&reorder->lock); > > pd->processed++; > > goto out; > } > out: > return padata; > > I strongly suspect that the problem here is that two threads can race > on reorder list. Even though the deletion is locked, call to > list_entry is not locked, which means it's feasible that two threads > pick up the same padata object and subsequently call list_add_tail on > them at the same time. The fix is thus be hoist that lock outside of > that block. > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Patch applied. Thanks.
From: Steffen Klassert <steffen.klassert@secunet.com> Date: Fri, 24 Mar 2017 10:41:59 +0100 > On Thu, Mar 23, 2017 at 12:24:43PM +0100, Jason A. Donenfeld wrote: >> Under extremely heavy uses of padata, crashes occur, and with list >> debugging turned on, this happens instead: ... >> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > > Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Herbert, this should probably go via your crypto tree.
On Sat, Mar 25, 2017 at 08:01:51PM -0700, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Fri, 24 Mar 2017 10:41:59 +0100 > > > On Thu, Mar 23, 2017 at 12:24:43PM +0100, Jason A. Donenfeld wrote: > >> Under extremely heavy uses of padata, crashes occur, and with list > >> debugging turned on, this happens instead: > ... > >> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > > > > Acked-by: Steffen Klassert <steffen.klassert@secunet.com> > > Herbert, this should probably go via your crypto tree. Thanks David. It's already in the crypto tree.
I've got a few other races in padata, I think, that I'm working on fixing. If these pan out, I'll submit them exclusively to -crypto instead of -netdev, to avoid this confusion next time. Of course, if I'm able to fix these, then I'll probably be bald from pulling my hair out during this insane debugging frenzy of the last few days... Jason
Herbert applied this to his tree. It's probably a good stable candidate, since it's a two line change to fix a race condition. On Fri, Mar 24, 2017 at 3:16 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote: > Jason A. Donenfeld <Jason@zx2c4.com> wrote: >> Under extremely heavy uses of padata, crashes occur, and with list >> debugging turned on, this happens instead: >> >> [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 >> __list_add+0xae/0x130 >> [87487.301868] list_add corruption. prev->next should be next >> (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00). >> [87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3 >> [87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0 >> [87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140 >> [87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50 >> [87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130 >> [87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70 >> [87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420 >> [87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120 >> >> padata_reorder calls list_add_tail with the list to which its adding >> locked, which seems correct: >> >> spin_lock(&squeue->serial.lock); >> list_add_tail(&padata->list, &squeue->serial.list); >> spin_unlock(&squeue->serial.lock); >> >> This therefore leaves only place where such inconsistency could occur: >> if padata->list is added at the same time on two different threads. >> This pdata pointer comes from the function call to >> padata_get_next(pd), which has in it the following block: >> >> next_queue = per_cpu_ptr(pd->pqueue, cpu); >> padata = NULL; >> reorder = &next_queue->reorder; >> if (!list_empty(&reorder->list)) { >> padata = list_entry(reorder->list.next, >> struct padata_priv, list); >> spin_lock(&reorder->lock); >> list_del_init(&padata->list); >> atomic_dec(&pd->reorder_objects); >> spin_unlock(&reorder->lock); >> >> pd->processed++; >> >> goto out; >> } >> out: >> return padata; >> >> I strongly suspect that the problem here is that two threads can race >> on reorder list. Even though the deletion is locked, call to >> list_entry is not locked, which means it's feasible that two threads >> pick up the same padata object and subsequently call list_add_tail on >> them at the same time. The fix is thus be hoist that lock outside of >> that block. >> >> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > > Patch applied. Thanks. > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Tue, Apr 04, 2017 at 01:53:15PM +0200, Jason A. Donenfeld wrote: > Herbert applied this to his tree. It's probably a good stable > candidate, since it's a two line change to fix a race condition. > > On Fri, Mar 24, 2017 at 3:16 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > Jason A. Donenfeld <Jason@zx2c4.com> wrote: > >> Under extremely heavy uses of padata, crashes occur, and with list > >> debugging turned on, this happens instead: > >> > >> [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 > >> __list_add+0xae/0x130 > >> [87487.301868] list_add corruption. prev->next should be next > >> (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00). > >> [87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3 > >> [87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0 > >> [87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140 > >> [87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50 > >> [87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130 > >> [87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70 > >> [87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420 > >> [87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120 > >> > >> padata_reorder calls list_add_tail with the list to which its adding > >> locked, which seems correct: > >> > >> spin_lock(&squeue->serial.lock); > >> list_add_tail(&padata->list, &squeue->serial.list); > >> spin_unlock(&squeue->serial.lock); > >> > >> This therefore leaves only place where such inconsistency could occur: > >> if padata->list is added at the same time on two different threads. > >> This pdata pointer comes from the function call to > >> padata_get_next(pd), which has in it the following block: > >> > >> next_queue = per_cpu_ptr(pd->pqueue, cpu); > >> padata = NULL; > >> reorder = &next_queue->reorder; > >> if (!list_empty(&reorder->list)) { > >> padata = list_entry(reorder->list.next, > >> struct padata_priv, list); > >> spin_lock(&reorder->lock); > >> list_del_init(&padata->list); > >> atomic_dec(&pd->reorder_objects); > >> spin_unlock(&reorder->lock); > >> > >> pd->processed++; > >> > >> goto out; > >> } > >> out: > >> return padata; > >> > >> I strongly suspect that the problem here is that two threads can race > >> on reorder list. Even though the deletion is locked, call to > >> list_entry is not locked, which means it's feasible that two threads > >> pick up the same padata object and subsequently call list_add_tail on > >> them at the same time. The fix is thus be hoist that lock outside of > >> that block. > >> > >> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > > > > Patch applied. Thanks. Any clue as to what the git commit id is? thanks, greg k-h
On Tue, Apr 04, 2017 at 08:26:12PM +0200, Greg KH wrote:
> Any clue as to what the git commit id is?
It's
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git/commit/?h=linus&id=de5540d088fe97ad583cc7d396586437b32149a5
Thanks,
diff --git a/kernel/padata.c b/kernel/padata.c index 05316c9f32da..3202aa17492c 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -186,19 +186,20 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd) reorder = &next_queue->reorder; + spin_lock(&reorder->lock); if (!list_empty(&reorder->list)) { padata = list_entry(reorder->list.next, struct padata_priv, list); - spin_lock(&reorder->lock); list_del_init(&padata->list); atomic_dec(&pd->reorder_objects); - spin_unlock(&reorder->lock); pd->processed++; + spin_unlock(&reorder->lock); goto out; } + spin_unlock(&reorder->lock); if (__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index) { padata = ERR_PTR(-ENODATA);
Under extremely heavy uses of padata, crashes occur, and with list debugging turned on, this happens instead: [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 __list_add+0xae/0x130 [87487.301868] list_add corruption. prev->next should be next (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00). [87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3 [87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0 [87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140 [87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50 [87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130 [87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70 [87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420 [87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120 padata_reorder calls list_add_tail with the list to which its adding locked, which seems correct: spin_lock(&squeue->serial.lock); list_add_tail(&padata->list, &squeue->serial.list); spin_unlock(&squeue->serial.lock); This therefore leaves only place where such inconsistency could occur: if padata->list is added at the same time on two different threads. This pdata pointer comes from the function call to padata_get_next(pd), which has in it the following block: next_queue = per_cpu_ptr(pd->pqueue, cpu); padata = NULL; reorder = &next_queue->reorder; if (!list_empty(&reorder->list)) { padata = list_entry(reorder->list.next, struct padata_priv, list); spin_lock(&reorder->lock); list_del_init(&padata->list); atomic_dec(&pd->reorder_objects); spin_unlock(&reorder->lock); pd->processed++; goto out; } out: return padata; I strongly suspect that the problem here is that two threads can race on reorder list. Even though the deletion is locked, call to list_entry is not locked, which means it's feasible that two threads pick up the same padata object and subsequently call list_add_tail on them at the same time. The fix is thus be hoist that lock outside of that block. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- kernel/padata.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)