Message ID | 1536120586-3378-1-git-send-email-jianchao.w.wang@oracle.com (mailing list archive)
---|---
Series | Introduce a light-weight queue close feature
On Wed, 2018-09-05 at 12:09 +0800, Jianchao Wang wrote:
> As we know, queue freeze is used to stop new IO from coming in and to drain
> the request queue. The draining is necessary because queue freeze kills the
> percpu-ref q_usage_counter and needs to drain it before switching it back to
> percpu mode. This becomes a problem when we only want to prevent new IO.
>
> In nvme-pci, nvme_dev_disable freezes queues to prevent new IO.
> nvme_reset_work will unfreeze and wait to drain the queues. However,
> if an IO times out at that moment, nobody can do recovery because
> nvme_reset_work is waiting. We end up with an IO hang.
>
> So this patch set introduces a light-weight queue close feature which
> prevents new IO without needing to drain the queue.
>
> The 1st patch introduces a queue_gate field in the request queue and
> migrates the preempt-only flag from the queue flags onto it.
>
> The 2nd patch introduces the queue close feature.
>
> The 3rd patch applies queue close in nvme-pci to avoid the IO hang
> described above.

Hello Jianchao,

Is this patch series based on a theoretical concern or rather on something
you ran into? In the latter case, can you explain which scenario makes it
likely on your setup to encounter an NVMe timeout?

Thanks,

Bart.
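(For context on the freeze machinery the cover letter refers to, here is a minimal sketch of the freeze/drain/unfreeze trio, condensed from block/blk-mq.c of roughly that era; error handling, warnings and legacy-path details are omitted, and the exact upstream code may differ.)

/* Condensed sketch of the existing freeze path (not the proposed change). */
void blk_freeze_queue_start(struct request_queue *q)
{
	if (atomic_inc_return(&q->mq_freeze_depth) == 1) {
		/* Switch q_usage_counter from percpu to atomic mode and drop
		 * the initial reference, so new blk_queue_enter() callers
		 * block instead of entering the queue. */
		percpu_ref_kill(&q->q_usage_counter);
		if (q->mq_ops)
			blk_mq_run_hw_queues(q, false);
	}
}

void blk_mq_freeze_queue_wait(struct request_queue *q)
{
	/* Drain: block until every in-flight request has released its
	 * reference.  This is where nvme_reset_work can get stuck if a
	 * timed-out request never completes. */
	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}

void blk_mq_unfreeze_queue(struct request_queue *q)
{
	if (!atomic_dec_return(&q->mq_freeze_depth)) {
		/* Only valid once the counter has drained to zero. */
		percpu_ref_reinit(&q->q_usage_counter);
		wake_up_all(&q->mq_freeze_wq);
	}
}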
On Wed, Sep 05, 2018 at 12:09:43PM +0800, Jianchao Wang wrote:
> Dear all
>
> As we know, queue freeze is used to stop new IO from coming in and to drain
> the request queue. The draining is necessary because queue freeze kills the
> percpu-ref q_usage_counter and needs to drain it before switching it back to
> percpu mode. This becomes a problem when we only want to prevent new IO.
>
> In nvme-pci, nvme_dev_disable freezes queues to prevent new IO.
> nvme_reset_work will unfreeze and wait to drain the queues. However,
> if an IO times out at that moment, nobody can do recovery because
> nvme_reset_work is waiting. We end up with an IO hang.

As we discussed this nvme timeout issue before, I have pointed out that
this is because of blk_mq_unfreeze_queue()'s limit, which requires that
unfreeze can only be done when the queue's ref counter drops to zero.

For this nvme timeout case, we may relax the limit, for example by
introducing another API, blk_freeze_queue_stop(), as the counter-pair of
blk_freeze_queue_start(), and simply switching the percpu-ref from atomic
mode back to percpu mode inside the new API.

> So this patch set introduces a light-weight queue close feature which
> prevents new IO without needing to drain the queue.

Frankly speaking, IMO, it may not be a good idea to mess up the fast path
just for handling the extremely unusual timeout event. The same is true
for the preempt-only stuff; as you saw, I have posted a patchset for
killing it.

Thanks,
Ming
Hi Bart

On 09/05/2018 11:48 PM, Bart Van Assche wrote:
>> The 3rd patch applies queue close in nvme-pci to avoid the IO hang
>> described above.
>
> Hello Jianchao,
>
> Is this patch series based on a theoretical concern or rather on something
> you ran into? In the latter case, can you explain which scenario makes it
> likely on your setup to encounter an NVMe timeout?

It is a theoretical concern, but I think the problem is quite obvious. It
would be very easy to reproduce if we simulated an IO timeout in software.

Thanks
Jianchao
Hi Ming

On 09/06/2018 05:27 AM, Ming Lei wrote:
> On Wed, Sep 05, 2018 at 12:09:43PM +0800, Jianchao Wang wrote:
>> Dear all
>>
>> As we know, queue freeze is used to stop new IO from coming in and to drain
>> the request queue. The draining is necessary because queue freeze kills the
>> percpu-ref q_usage_counter and needs to drain it before switching it back to
>> percpu mode. This becomes a problem when we only want to prevent new IO.
>>
>> In nvme-pci, nvme_dev_disable freezes queues to prevent new IO.
>> nvme_reset_work will unfreeze and wait to drain the queues. However,
>> if an IO times out at that moment, nobody can do recovery because
>> nvme_reset_work is waiting. We end up with an IO hang.
>
> As we discussed this nvme timeout issue before, I have pointed out that
> this is because of blk_mq_unfreeze_queue()'s limit, which requires that
> unfreeze can only be done when the queue's ref counter drops to zero.
>
> For this nvme timeout case, we may relax the limit, for example by
> introducing another API, blk_freeze_queue_stop(), as the counter-pair of
> blk_freeze_queue_start(), and simply switching the percpu-ref from atomic
> mode back to percpu mode inside the new API.

It looks like we cannot switch a percpu-ref back to percpu mode directly
without draining it first; some references may be lost:

static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
{
	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
	int cpu;

	BUG_ON(!percpu_count);

	if (!(ref->percpu_count_ptr & __PERCPU_REF_ATOMIC))
		return;

	atomic_long_add(PERCPU_COUNT_BIAS, &ref->count);

	/*
	 * Restore per-cpu operation. smp_store_release() is paired
	 * with READ_ONCE() in __ref_is_percpu() and guarantees that the
	 * zeroing is visible to all percpu accesses which can see the
	 * following __PERCPU_REF_ATOMIC clearing.
	 */
	for_each_possible_cpu(cpu)
		*per_cpu_ptr(percpu_count, cpu) = 0;

	smp_store_release(&ref->percpu_count_ptr,
			  ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC);
}

>> So this patch set introduces a light-weight queue close feature which
>> prevents new IO without needing to drain the queue.
>
> Frankly speaking, IMO, it may not be a good idea to mess up the fast path
> just for handling the extremely unusual timeout event. The same is true
> for the preempt-only stuff; as you saw, I have posted a patchset for
> killing it.

In the normal case it is just one check, something like:

	if (unlikely(READ_ONCE(q->queue_gate)))

It should not be a big deal.

Thanks
Jianchao

> Thanks,
> Ming
On Thu, Sep 06, 2018 at 09:51:43AM +0800, jianchao.wang wrote:
> Hi Ming
>
> On 09/06/2018 05:27 AM, Ming Lei wrote:
>> On Wed, Sep 05, 2018 at 12:09:43PM +0800, Jianchao Wang wrote:
>>> [...]
>>
>> As we discussed this nvme timeout issue before, I have pointed out that
>> this is because of blk_mq_unfreeze_queue()'s limit, which requires that
>> unfreeze can only be done when the queue's ref counter drops to zero.
>>
>> For this nvme timeout case, we may relax the limit, for example by
>> introducing another API, blk_freeze_queue_stop(), as the counter-pair of
>> blk_freeze_queue_start(), and simply switching the percpu-ref from atomic
>> mode back to percpu mode inside the new API.
>
> It looks like we cannot switch a percpu-ref back to percpu mode directly
> without draining it first; some references may be lost:
>
> static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
> {
> 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
> 	int cpu;
>
> 	BUG_ON(!percpu_count);
>
> 	if (!(ref->percpu_count_ptr & __PERCPU_REF_ATOMIC))
> 		return;
>
> 	atomic_long_add(PERCPU_COUNT_BIAS, &ref->count);
>
> 	/*
> 	 * Restore per-cpu operation. smp_store_release() is paired
> 	 * with READ_ONCE() in __ref_is_percpu() and guarantees that the
> 	 * zeroing is visible to all percpu accesses which can see the
> 	 * following __PERCPU_REF_ATOMIC clearing.
> 	 */
> 	for_each_possible_cpu(cpu)
> 		*per_cpu_ptr(percpu_count, cpu) = 0;
>
> 	smp_store_release(&ref->percpu_count_ptr,
> 			  ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC);
> }

Before REF_ATOMIC is cleared, all counting is done on the atomic part of
&ref->count, and it is easy to keep the reference counter in ATOMIC mode.
Also, the reference counter can only be read in atomic mode.

So could you explain a bit how the loss would happen? And would it be lost
in atomic mode or in percpu mode?

>> Frankly speaking, IMO, it may not be a good idea to mess up the fast path
>> just for handling the extremely unusual timeout event. The same is true
>> for the preempt-only stuff; as you saw, I have posted a patchset for
>> killing it.
>
> In the normal case it is just one check, something like:
>
> 	if (unlikely(READ_ONCE(q->queue_gate)))
>
> It should not be a big deal.

Adding this stuff to the fast path makes it quite difficult to verify its
correctness, because it is really lockless, or even barrier-less. Not to
mention that READ_ONCE() implies a smp_read_barrier_depends() barrier.

Thanks,
Ming
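(Where the per-cpu counts actually end up during the percpu-to-atomic switch is the crux of the question above. For reference, a trimmed excerpt of percpu_ref_switch_to_atomic_rcu() from lib/percpu-refcount.c of roughly that era; the debug printing, the warning, and the switch-confirmation call are elided.)

static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
{
	struct percpu_ref *ref = container_of(rcu, struct percpu_ref, rcu);
	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
	unsigned long count = 0;
	int cpu;

	/* Fold every CPU's local delta into one sum ... */
	for_each_possible_cpu(cpu)
		count += *per_cpu_ptr(percpu_count, cpu);

	/* ... and add it to the atomic counter, cancelling the bias applied
	 * when the switch started.  From this point on the per-cpu slots no
	 * longer carry any information. */
	atomic_long_add((long)count - PERCPU_COUNT_BIAS, &ref->count);

	/* (debug output, warning and switch confirmation elided) */
}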
On 09/06/2018 08:57 PM, Ming Lei wrote:
> On Thu, Sep 06, 2018 at 09:51:43AM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> On 09/06/2018 05:27 AM, Ming Lei wrote:
>>> [...]
>>
>> It looks like we cannot switch a percpu-ref back to percpu mode directly
>> without draining it first; some references may be lost:
>>
>> [__percpu_ref_switch_to_percpu(), quoted in full above]
>
> Before REF_ATOMIC is cleared, all counting is done on the atomic part of
> &ref->count, and it is easy to keep the reference counter in ATOMIC mode.
> Also, the reference counter can only be read in atomic mode.
>
> So could you explain a bit how the loss would happen? And would it be lost
> in atomic mode or in percpu mode?

I just mean that __percpu_ref_switch_to_percpu() only zeros the percpu_count;
it does not give the original values back to percpu_count from the atomic
count.

>> In the normal case it is just one check, something like:
>>
>> 	if (unlikely(READ_ONCE(q->queue_gate)))
>>
>> It should not be a big deal.
>
> Adding this stuff to the fast path makes it quite difficult to verify its
> correctness, because it is really lockless, or even barrier-less. Not to
> mention that READ_ONCE() implies a smp_read_barrier_depends() barrier.

The check is done under the RCU read lock, and the write side could use
synchronize_rcu() to ensure the update is globally visible. As for the
READ_ONCE(), it could be dropped.

Thanks
Jianchao

> Thanks,
> Ming
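(A hypothetical sketch of the RCU pattern Jianchao describes here. The queue_gate field, the bit index, and both helpers are assumptions based on the proposed series and this discussion; none of them exist in mainline.)

/* Hypothetical illustration only; not mainline code. */
static bool blk_queue_gate_closed(struct request_queue *q)
{
	bool closed;

	/* Reader side (submission fast path): a plain flag test inside an
	 * RCU read-side critical section, no extra locking or barriers. */
	rcu_read_lock();
	closed = test_bit(0 /* e.g. BLK_QUEUE_GATE_CLOSED */, &q->queue_gate);
	rcu_read_unlock();

	return closed;
}

static void blk_queue_gate_close(struct request_queue *q)
{
	/* Writer side (rare path, e.g. controller reset): publish the flag,
	 * then wait until every reader that might have missed it has left
	 * its RCU read-side critical section. */
	set_bit(0 /* e.g. BLK_QUEUE_GATE_CLOSED */, &q->queue_gate);
	synchronize_rcu();
}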
On Thu, Sep 06, 2018 at 09:55:21PM +0800, jianchao.wang wrote:
> On 09/06/2018 08:57 PM, Ming Lei wrote:
>> On Thu, Sep 06, 2018 at 09:51:43AM +0800, jianchao.wang wrote:
>>> [...]
>>>
>>> It looks like we cannot switch a percpu-ref back to percpu mode directly
>>> without draining it first; some references may be lost:
>>>
>>> [...]
>>
>> Before REF_ATOMIC is cleared, all counting is done on the atomic part of
>> &ref->count, and it is easy to keep the reference counter in ATOMIC mode.
>> Also, the reference counter can only be read in atomic mode.
>>
>> So could you explain a bit how the loss would happen? And would it be lost
>> in atomic mode or in percpu mode?
>
> I just mean that __percpu_ref_switch_to_percpu() only zeros the percpu_count;
> it does not give the original values back to percpu_count from the atomic
> count.

The original value has already been added to the atomic part of the refcount
(see percpu_ref_switch_to_atomic_rcu()), so the zeroing is fine.

What do you think of the following patch? It allows a percpu_refcount to be
reinitialized from atomic mode even when it has not dropped to zero.
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 85a1c1a59c72..c05d8924b4cb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -134,19 +134,33 @@ void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight_rw, &mi);
 }
 
-void blk_freeze_queue_start(struct request_queue *q)
+static void __blk_freeze_queue_start(struct request_queue *q, bool kill_sync)
 {
 	int freeze_depth;
 
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
-		percpu_ref_kill(&q->q_usage_counter);
+		if (kill_sync)
+			percpu_ref_kill_sync(&q->q_usage_counter);
+		else
+			percpu_ref_kill(&q->q_usage_counter);
 		if (q->mq_ops)
 			blk_mq_run_hw_queues(q, false);
 	}
 }
+
+void blk_freeze_queue_start(struct request_queue *q)
+{
+	__blk_freeze_queue_start(q, false);
+}
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
+void blk_freeze_queue_start_sync(struct request_queue *q)
+{
+	__blk_freeze_queue_start(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_freeze_queue_start_sync);
+
 void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dd8ec1dd9219..bde5a054597d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3644,7 +3644,7 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_freeze_queue_start(ns->queue);
+		blk_freeze_queue_start_sync(ns->queue);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_freeze);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d668682f91df..7b9ac1f9bce3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2328,7 +2328,6 @@ static void nvme_reset_work(struct work_struct *work)
 		new_state = NVME_CTRL_ADMIN_ONLY;
 	} else {
 		nvme_start_queues(&dev->ctrl);
-		nvme_wait_freeze(&dev->ctrl);
 		/* hit this only when allocate tagset fails */
 		if (nvme_dev_add(dev))
 			new_state = NVME_CTRL_ADMIN_ONLY;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1da59c16f637..5cf8e3a6335c 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -280,6 +280,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
+void blk_freeze_queue_start_sync(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout);
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 009cdf3d65b6..36f4428b2a6c 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -127,6 +127,8 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
 	percpu_ref_kill_and_confirm(ref, NULL);
 }
 
+extern void percpu_ref_kill_sync(struct percpu_ref *ref);
+
 /*
  * Internal helper.  Don't use outside percpu-refcount proper.  The
  * function doesn't return the pointer and let the caller test it for NULL
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 9f96fa7bc000..4c0ee5eda2d6 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -130,8 +130,10 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	unsigned long count = 0;
 	int cpu;
 
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		count += *per_cpu_ptr(percpu_count, cpu);
+		*per_cpu_ptr(percpu_count, cpu) = 0;
+	}
 
 	pr_debug("global %ld percpu %ld",
 		 atomic_long_read(&ref->count), (long)count);
@@ -187,7 +189,6 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 {
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
-	int cpu;
 
 	BUG_ON(!percpu_count);
 
@@ -196,15 +197,6 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 
 	atomic_long_add(PERCPU_COUNT_BIAS, &ref->count);
 
-	/*
-	 * Restore per-cpu operation. smp_store_release() is paired
-	 * with READ_ONCE() in __ref_is_percpu() and guarantees that the
-	 * zeroing is visible to all percpu accesses which can see the
-	 * following __PERCPU_REF_ATOMIC clearing.
-	 */
-	for_each_possible_cpu(cpu)
-		*per_cpu_ptr(percpu_count, cpu) = 0;
-
 	smp_store_release(&ref->percpu_count_ptr,
 			  ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC);
 }
@@ -278,6 +270,23 @@ void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref)
 EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync);
 
 /**
+ * percpu_ref_kill_sync - drop the initial ref and and wait for the
+ * switch to complete.
+ * @ref: percpu_ref to kill
+ *
+ * Must be used to drop the initial ref on a percpu refcount; must be called
+ * precisely once before shutdown.
+ *
+ * Switches @ref into atomic mode before gathering up the percpu counters
+ * and dropping the initial ref.
+ */
+void percpu_ref_kill_sync(struct percpu_ref *ref)
+{
+	percpu_ref_kill(ref);
+	wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+}
+
+/**
  * percpu_ref_switch_to_percpu - switch a percpu_ref to percpu mode
  * @ref: percpu_ref to switch to percpu mode
  *
@@ -349,7 +358,7 @@ EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
  *
  * Re-initialize @ref so that it's in the same state as when it finished
  * percpu_ref_init() ignoring %PERCPU_REF_INIT_DEAD.  @ref must have been
- * initialized successfully and reached 0 but not exited.
+ * initialized successfully and in atomic mode but not exited.
  *
  * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
  * this function is in progress.
@@ -357,10 +366,11 @@ EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
 void percpu_ref_reinit(struct percpu_ref *ref)
 {
 	unsigned long flags;
+	unsigned long __percpu *percpu_count;
 
 	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
 
-	WARN_ON_ONCE(!percpu_ref_is_zero(ref));
+	WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count));
 
 	ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
 	percpu_ref_get(ref);

Thanks,
Ming
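(A short illustration of how the proposed primitives might fit together. blk_freeze_queue_start_sync() and the relaxed percpu_ref_reinit() exist only in the patch above, not in mainline, and nvme_example_reset() is a made-up caller for illustration.)

/* Illustrative only: how the proposal is intended to remove the hang. */
static void nvme_example_reset(struct request_queue *q)
{
	/* Stop new IO.  The _sync variant waits only for q_usage_counter's
	 * percpu-to-atomic switch to finish, not for outstanding requests
	 * to drain. */
	blk_freeze_queue_start_sync(q);

	/* ... shut the controller down and bring it back up ... */

	/* With percpu_ref_reinit() accepting a non-zero atomic-mode ref,
	 * unfreezing no longer has to wait for the queue to drain, so a
	 * timed-out request cannot wedge the reset path here. */
	blk_mq_unfreeze_queue(q);
}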