
padata: Remove broken queue flushing

Message ID 20191119051731.yev6dcsp2znjaagz@gondor.apana.org.au (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Series: padata: Remove broken queue flushing

Commit Message

Herbert Xu Nov. 19, 2019, 5:17 a.m. UTC
The function padata_flush_queues is fundamentally broken because
it cannot force padata users to complete the request that is
underway.  IOW padata has to passively wait for the completion
of any outstanding work.

As it stands flushing is used in two places.  Its use in padata_stop
is simply unnecessary because nothing depends on the queues being
flushed afterwards.

The other use in padata_replace is more substantial as we depend
on it to free the old pd structure.  This patch instead uses
pd->refcnt to dynamically free the pd structure once all requests
are complete.

Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
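
The scheme is the classic reference-count pattern: the pd starts with
one reference held by its owner, each in-flight job holds another, and
whoever drops the last reference frees the structure.  A minimal
userspace sketch of that pattern (illustrative names only, not the
kernel code):

	#include <stdatomic.h>
	#include <stdlib.h>

	struct pd {
		atomic_int refcnt;
		/* ... queues, cpumasks, etc. ... */
	};

	static struct pd *pd_alloc(void)
	{
		struct pd *pd = calloc(1, sizeof(*pd));

		if (pd)
			atomic_init(&pd->refcnt, 1);	/* owner's reference */
		return pd;
	}

	static void pd_get(struct pd *pd)
	{
		atomic_fetch_add(&pd->refcnt, 1);	/* a job is submitted */
	}

	static void pd_put(struct pd *pd)
	{
		/* a job completes or the owner lets go; last ref frees */
		if (atomic_fetch_sub(&pd->refcnt, 1) == 1)
			free(pd);
	}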

Comments

Daniel Jordan Nov. 19, 2019, 7:24 p.m. UTC | #1
On Tue, Nov 19, 2019 at 01:17:31PM +0800, Herbert Xu wrote:
> The function padata_flush_queues is fundamentally broken because
> it cannot force padata users to complete the request that is
> underway.  IOW padata has to passively wait for the completion
> of any outstanding work.
> 
> As it stands flushing is used in two places.  Its use in padata_stop
> is simply unnecessary because nothing depends on the queues being
> flushed afterwards.
> 
> The other use in padata_replace is more substantial as we depend
> on it to free the old pd structure.  This patch instead uses
> pd->refcnt to dynamically free the pd structure once all requests
> are complete.

__padata_free unconditionally frees pd, so a padata job might choke on it
later.  padata_do_parallel calls seem safe because they use RCU, but it seems
possible that a job could call padata_do_serial after the instance is gone.

Best idea I can think of now is to indicate in the pd that the instance
has been freed before dropping the initial pd ref in __padata_free, and
use that to bail out early from places that touch the instance or its
data (the workqueues, say).
Will think more on this.
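
A rough sketch of that shape, purely hypothetical (the pd->dead flag
and the check sites below are made up, nothing like this has been
posted): mark the pd dead before dropping the owner's reference, and
have the entry points bail out when they see it:

	/* in __padata_free, before dropping the initial reference: */
	WRITE_ONCE(pd->dead, true);	/* hypothetical new bool field */
	if (atomic_dec_and_test(&pd->refcnt))
		padata_free_pd(pd);

	/* in padata_do_serial and the workers: */
	if (READ_ONCE(pd->dead))
		return;			/* instance gone, drop the work */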


(By the way, I was on leave longer than anticipated, so thanks for picking up
my slack on this patch.  I plan to repost my other padata fixes soon.)
Herbert Xu Nov. 19, 2019, 9:53 p.m. UTC | #2
On Tue, Nov 19, 2019 at 02:24:05PM -0500, Daniel Jordan wrote:
>
> __padata_free unconditionally frees pd, so a padata job might choke on it
> later.  padata_do_parallel calls seem safe because they use RCU, but it seems
> possible that a job could call padata_do_serial after the instance is gone.

Actually __padata_free no longer frees the pd after my patch.

We should also mandate that __padata_free can only be called after
all jobs have completed.  This is already the case with pcrypt.

In fact we should discuss merging padata into pcrypt.  It's been
10 years and not a single user of padata other than pcrypt has
emerged in the kernel.

Cheers,
Daniel Jordan Nov. 19, 2019, 10:44 p.m. UTC | #3
On Wed, Nov 20, 2019 at 05:53:45AM +0800, Herbert Xu wrote:
> On Tue, Nov 19, 2019 at 02:24:05PM -0500, Daniel Jordan wrote:
> >
> > __padata_free unconditionally frees pd, so a padata job might choke on it
> > later.  padata_do_parallel calls seem safe because they use RCU, but it seems
> > possible that a job could call padata_do_serial after the instance is gone.
> 
> Actually __padata_free no longer frees the pd after my patch.

I assume you mean the third patch you recently posted, "crypto: pcrypt - Avoid
deadlock by using per-instance padata queues".  That's true, the problem is
fixed there, and the fact that the bug would still appear during a bisection
doesn't seem like enough justification to implement something short-lived just
to prevent it.

> We should also mandate that __padata_free can only be called after
> all jobs have completed.  This is already the case with pcrypt.

Makes sense to me, though I don't see how it's enforced now in pcrypt.  I'm not
an async crypto person, but wouldn't unloading the pcrypt module when there are
still outstanding async jobs break this rule?

> In fact we should discuss merging padata into pcrypt.  It's been
> 10 years and not a single user of padata other than pcrypt has
> emerged in the kernel.

Actually, I'm working on an RFC right now to add more users for padata.  It
should be posted in the coming week or two, and I hope it can be part of that
discussion.
Herbert Xu Nov. 19, 2019, 10:50 p.m. UTC | #4
On Tue, Nov 19, 2019 at 05:44:32PM -0500, Daniel Jordan wrote:
> 
> I assume you mean the third patch you recently posted, "crypto: pcrypt - Avoid
> deadlock by using per-instance padata queues".  That's true, the problem is
> fixed there, and the fact that the bug would still appear during a bisection
> doesn't seem like enough justification to implement something short-lived just
> to prevent it.

Right.  But as pcrypt is the only user this should still work.
 
> Makes sense to me, though I don't see how it's enforced now in pcrypt.  I'm not
> an async crypto person, but wouldn't unloading the pcrypt module when there are
> still outstanding async jobs break this rule?

It's enforced through module reference counting.  While there are
any outstanding requests, there must be allocated crypto tfms, and
each crypto tfm holds a module reference on pcrypt which prevents
it from being unloaded.
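
Roughly, the pattern looks like this (try_module_get and module_put
are the real kernel APIs; the surrounding lines are only a sketch of
what the crypto core does as tfms come and go):

	/* on tfm allocation the core pins the algorithm's module: */
	if (!try_module_get(alg->cra_module))
		return NULL;		/* illustrative error path */

	/* ... tfm alive, requests may be outstanding ... */

	/* when the last tfm is freed, the pin is dropped and
	 * pcrypt may be unloaded again: */
	module_put(alg->cra_module);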

> Actually, I'm working on an RFC right now to add more users for padata.  It
> should be posted in the coming week or two, and I hope it can be part of that
> discussion.

OK.

Cheers,
Daniel Jordan Nov. 27, 2019, 11:36 p.m. UTC | #5
On Tue, Nov 19, 2019 at 01:17:31PM +0800, Herbert Xu wrote:
> The function padata_flush_queues is fundamentally broken because
> it cannot force padata users to complete the request that is
> underway.  IOW padata has to passively wait for the completion
> of any outstanding work.
> 
> As it stands flushing is used in two places.  Its use in padata_stop
> is simply unnecessary because nothing depends on the queues being
> flushed afterwards.
> 
> The other use in padata_replace is more substantial as we depend
> on it to free the old pd structure.  This patch instead uses
> pd->refcnt to dynamically free the pd structure once all requests
> are complete.
> 
> Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>

Patch

diff --git a/kernel/padata.c b/kernel/padata.c
index c3fec1413295..da56a235a255 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -35,6 +35,8 @@ 
 
 #define MAX_OBJ_NUM 1000
 
+static void padata_free_pd(struct parallel_data *pd);
+
 static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
 {
 	int cpu, target_cpu;
@@ -283,6 +285,7 @@  static void padata_serial_worker(struct work_struct *serial_work)
 	struct padata_serial_queue *squeue;
 	struct parallel_data *pd;
 	LIST_HEAD(local_list);
+	int cnt;
 
 	local_bh_disable();
 	squeue = container_of(serial_work, struct padata_serial_queue, work);
@@ -292,6 +295,8 @@  static void padata_serial_worker(struct work_struct *serial_work)
 	list_replace_init(&squeue->serial.list, &local_list);
 	spin_unlock(&squeue->serial.lock);
 
+	cnt = 0;
+
 	while (!list_empty(&local_list)) {
 		struct padata_priv *padata;
 
@@ -301,9 +306,12 @@  static void padata_serial_worker(struct work_struct *serial_work)
 		list_del_init(&padata->list);
 
 		padata->serial(padata);
-		atomic_dec(&pd->refcnt);
+		cnt++;
 	}
 	local_bh_enable();
+
+	if (atomic_sub_and_test(cnt, &pd->refcnt))
+		padata_free_pd(pd);
 }
 
 /**
@@ -440,7 +448,7 @@  static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 	padata_init_squeues(pd);
 	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
-	atomic_set(&pd->refcnt, 0);
+	atomic_set(&pd->refcnt, 1);
 	spin_lock_init(&pd->lock);
 	pd->cpu = cpumask_first(pd->cpumask.pcpu);
 	INIT_WORK(&pd->reorder_work, invoke_padata_reorder);
@@ -466,29 +474,6 @@  static void padata_free_pd(struct parallel_data *pd)
 	kfree(pd);
 }
 
-/* Flush all objects out of the padata queues. */
-static void padata_flush_queues(struct parallel_data *pd)
-{
-	int cpu;
-	struct padata_parallel_queue *pqueue;
-	struct padata_serial_queue *squeue;
-
-	for_each_cpu(cpu, pd->cpumask.pcpu) {
-		pqueue = per_cpu_ptr(pd->pqueue, cpu);
-		flush_work(&pqueue->work);
-	}
-
-	if (atomic_read(&pd->reorder_objects))
-		padata_reorder(pd);
-
-	for_each_cpu(cpu, pd->cpumask.cbcpu) {
-		squeue = per_cpu_ptr(pd->squeue, cpu);
-		flush_work(&squeue->work);
-	}
-
-	BUG_ON(atomic_read(&pd->refcnt) != 0);
-}
-
 static void __padata_start(struct padata_instance *pinst)
 {
 	pinst->flags |= PADATA_INIT;
@@ -502,10 +487,6 @@  static void __padata_stop(struct padata_instance *pinst)
 	pinst->flags &= ~PADATA_INIT;
 
 	synchronize_rcu();
-
-	get_online_cpus();
-	padata_flush_queues(pinst->pd);
-	put_online_cpus();
 }
 
 /* Replace the internal control structure with a new one. */
@@ -526,8 +507,8 @@  static void padata_replace(struct padata_instance *pinst,
 	if (!cpumask_equal(pd_old->cpumask.cbcpu, pd_new->cpumask.cbcpu))
 		notification_mask |= PADATA_CPU_SERIAL;
 
-	padata_flush_queues(pd_old);
-	padata_free_pd(pd_old);
+	if (atomic_dec_and_test(&pd_old->refcnt))
+		padata_free_pd(pd_old);
 
 	if (notification_mask)
 		blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
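
With this patch the old pd's lifetime is governed entirely by its
refcnt.  In outline (the per-job reference taken in padata_do_parallel
is pre-existing code and not part of this diff):

	padata_alloc_pd():      atomic_set(&pd->refcnt, 1);   /* owner's ref */
	padata_do_parallel():   atomic_inc(&pd->refcnt);      /* ref per job */
	padata_serial_worker(): atomic_sub_and_test(cnt, ...); /* jobs' refs, batched */
	padata_replace():       atomic_dec_and_test(...);      /* owner's ref */

Whichever of the last two drops the count to zero calls padata_free_pd,
so the old pd is not freed until every outstanding job has serialized.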