
[v3,4/9] percpu-refcount: Introduce percpu_ref_read()

Message ID 20180802182944.14442-5-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Series blk-mq: Enable runtime power management

Commit Message

Bart Van Assche Aug. 2, 2018, 6:29 p.m. UTC
Introduce a function that allows reading the value of a per-cpu counter.
This function will be used in the next patch to check whether a per-cpu
counter in atomic mode has the value one.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 include/linux/percpu-refcount.h |  2 ++
 lib/percpu-refcount.c           | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

Comments

Tejun Heo Aug. 2, 2018, 7:06 p.m. UTC | #1
Hello,

On Thu, Aug 02, 2018 at 11:29:39AM -0700, Bart Van Assche wrote:
> Introduce a function that allows reading the value of a per-cpu counter.
> This function will be used in the next patch to check whether a per-cpu
> counter in atomic mode has the value one.

I'm not a big fan of exposing this.  If you need to test for atomic &&
1, I'd much prefer a helper specifically testing for that.  But can
you please explain a bit why you need this?

Thanks.
Bart Van Assche Aug. 2, 2018, 8:04 p.m. UTC | #2
On 08/02/18 12:06, Tejun Heo wrote:
> On Thu, Aug 02, 2018 at 11:29:39AM -0700, Bart Van Assche wrote:
>> Introduce a function that allows reading the value of a per-cpu counter.
>> This function will be used in the next patch to check whether a per-cpu
>> counter in atomic mode has the value one.
> 
> I'm not a big fan of exposing this.  If you need to test for atomic &&
> 1, I'd much prefer a helper specifically testing for that.  But can
> you please explain a bit why you need this?
Hello Tejun,

As you probably know, one of the long-term goals for the block layer is
to switch to blk-mq and to drop the legacy block layer. Hence this patch
series, which adds runtime power management support to blk-mq, a feature
blk-mq is missing today. The approach is the same as for the legacy block
layer: if the autosuspend timer expires and no requests are in flight,
suspend the block device. So we need a mechanism to track whether or not
any requests are in flight. One possible approach is to check the value
of q_usage_counter. percpu_ref_is_zero() can only be used to check
q_usage_counter if that counter is first switched to atomic mode and the
initial reference is dropped. I want to avoid the overhead of that switch
to atomic mode whenever possible. Hence the proposal to introduce the
percpu_ref_read() function. If the value returned by that function is
larger than one, we know that requests are in flight and hence that the
switch to atomic mode can be skipped.
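
To make the intended use concrete, here is a minimal sketch of how such a
check could look in a blk-mq runtime-suspend path. This is illustration
only, not part of this series; the function name and the surrounding flow
are hypothetical:

static int blk_pre_runtime_suspend_sketch(struct request_queue *q)
{
	/*
	 * q_usage_counter always carries one initial reference, so any
	 * value above one means requests are in flight and the queue
	 * must stay active.
	 */
	if (percpu_ref_read(&q->q_usage_counter) > 1)
		return -EBUSY;

	/*
	 * Otherwise fall through to the precise (and more expensive)
	 * check that switches the counter to atomic mode; omitted here.
	 */
	return 0;
}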

Proposals for alternative approaches are welcome.

Thanks,

Bart.
Tejun Heo Aug. 6, 2018, 5:18 p.m. UTC | #3
Hello, Bart.

On Thu, Aug 02, 2018 at 01:04:43PM -0700, Bart Van Assche wrote:
> As you probably know one of the long term goals for the block layer
> is to switch to blk-mq and to drop the legacy block layer. Hence
> this patch series that adds support for runtime power management to
> blk-mq because today that feature is missing from blk-mq. The
> approach is the same as for the legacy block layer: if the
> autosuspend timer expires and no
> requests are in flight, suspend the block device. So we need a
> mechanism to track whether or not any requests are in flight. One
> possible approach is to check the value of q_usage_counter.
> percpu_ref_is_zero() could only be used to check q_usage_counter if
> that counter would be switched to atomic mode first and if the
> initial reference would be dropped too. I want to avoid the overhead
> of that switch to atomic mode whenever possible. Hence the proposal
> to introduce the percpu_ref_read() function. If the value returned
> by that function is larger than one then we know that requests are
> in flight and hence that the switch to atomic mode can be skipped.
> 
> Proposals for alternative approaches are welcome.

I'm worried that this is too inviting for misuses and subtle problems.
For example, your patch which uses this function uses
synchronize_rcu() to synchronize against per-cpu counts after the
first snoop; however, percpu_ref uses sched rcu, not the regular one,
so depending on the config, this will lead to *really* subtle
failures.  Even if that gets fixed, it's still leaking percpu-ref
internal details to its users - details which may change in the future
and will cause super subtle bugs.

I'd go for something a lot more specific, like percpu_ref_is_one(), so
that all the magics can be contained in percpu-ref implementation
proper.
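
For illustration, such a helper could look roughly like this. It is only a
sketch: percpu_ref_is_one() does not exist in this series, and the sketch
ignores the window in which a switch to atomic mode is still in progress:

static inline bool percpu_ref_is_one(struct percpu_ref *ref)
{
	unsigned long __percpu *percpu_count;
	bool ret = false;

	rcu_read_lock_sched();
	/* Only meaningful in atomic mode; in percpu mode the count is not stable. */
	if (!__ref_is_percpu(ref, &percpu_count))
		ret = atomic_long_read(&ref->count) == 1;
	rcu_read_unlock_sched();

	return ret;
}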

Thanks.
Bart Van Assche Aug. 6, 2018, 5:28 p.m. UTC | #4
On Mon, 2018-08-06 at 10:18 -0700, Tejun Heo wrote:
> I'm worried that this is too inviting for misuses and subtle problems.
> For example, your patch which uses this function uses
> synchronize_rcu() to synchronize against per-cpu counts after the
> first snoop; however, percpu_ref uses sched rcu, not the regular one,
> so depending on the config, this will lead to *really* subtle
> failures.  Even if that gets fixed, it's still leaking percpu-ref
> internal details to its users - details which may change in the future
> and will cause super subtle bugs.
> 
> I'd go for something a lot more specific, like percpu_ref_is_one(), so
> that all the magics can be contained in percpu-ref implementation
> proper.

Hello Tejun,

I will try to come up with an approach that does not require modifying the
per-cpu counter code.

Thanks,

Bart.
Bart Van Assche Aug. 7, 2018, 10:53 p.m. UTC | #5
On Mon, 2018-08-06 at 10:18 -0700, Tejun Heo wrote:
> I'm worried that this is too inviting for misuses and subtle problems.
> For example, your patch which uses this function uses
> synchronize_rcu() to synchronize against per-cpu counts after the
> first snoop; however, percpu_ref uses sched rcu, not the regular one,
> so depending on the config, this will lead to *really* subtle
> failures.  Even if that gets fixed, it's still leaking percpu-ref
> internal details to its users - details which may change in the future
> and will cause super subtle bugs.
> 
> I'd go for something a lot more specific, like percpu_ref_is_one(), so
> that all the magics can be contained in percpu-ref implementation
> proper.

Hi Tejun,

Can you have a look at the new percpu_ref_is_in_use() function? Please also
note that the synchronize_rcu() call between the two percpu_ref_is_in_use()
calls is not related to the use of RCU in the per-cpu refcount implementation.
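
For illustration, the double check described above could look roughly like
this. It is only a sketch: percpu_ref_is_in_use() is the helper from v4 of
this series, and the step that marks the queue as suspending is only
indicated by a comment:

static int blk_runtime_suspend_check_sketch(struct request_queue *q)
{
	/* Cheap first check; may race with concurrent submitters. */
	if (percpu_ref_is_in_use(&q->q_usage_counter))
		return -EBUSY;

	/*
	 * Mark the queue as suspending here (omitted), then wait until
	 * submitters that raced with that update have finished or failed.
	 * This grace period is about the request submission paths, not
	 * about the RCU usage inside the percpu-refcount implementation.
	 */
	synchronize_rcu();

	/* Anything that slipped in before the grace period is visible now. */
	if (percpu_ref_is_in_use(&q->q_usage_counter))
		return -EBUSY;

	return 0;
}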

Thanks,

Bart.

Patch

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 009cdf3d65b6..5707289ba828 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,4 +331,6 @@  static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
 	return !atomic_long_read(&ref->count);
 }
 
+unsigned long percpu_ref_read(struct percpu_ref *ref);
+
 #endif
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 9f96fa7bc000..c0b9fc8efa6b 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -369,3 +369,32 @@  void percpu_ref_reinit(struct percpu_ref *ref)
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_reinit);
+
+/**
+ * percpu_ref_read - read a percpu refcount
+ * @ref: percpu_ref to test
+ *
+ * This function is safe to call as long as @ref is between init and exit. It
+ * is the responsibility of the caller to handle changes of @ref concurrently
+ * with this function. If this function is called while @ref is in per-cpu
+ * mode the returned value may be incorrect if e.g. percpu_ref_get() is called
+ * from one CPU and percpu_ref_put() is called from another CPU.
+ */
+unsigned long percpu_ref_read(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
+	unsigned long sum = 0;
+	int cpu;
+
+	rcu_read_lock_sched();
+	if (__ref_is_percpu(ref, &percpu_count)) {
+		for_each_possible_cpu(cpu)
+			sum += *per_cpu_ptr(percpu_count, cpu);
+	}
+	rcu_read_unlock_sched();
+	sum += atomic_long_read(&ref->count);
+	sum &= ~PERCPU_COUNT_BIAS;
+
+	return sum;
+}
+EXPORT_SYMBOL_GPL(percpu_ref_read);