
[-next,RFC,v2,8/8] sbitmap: wake up the number of threads based on required tags

Message ID 20220408073916.1428590-9-yukuai3@huawei.com (mailing list archive)
State New, archived
Series improve tag allocation under heavy load

Commit Message

Yu Kuai April 8, 2022, 7:39 a.m. UTC
Always waking up 'wake_batch' threads intensifies competition and
prevents split IOs from being issued continuously. Now that the number
of required tags is recorded for huge IOs, it is safe to wake up threads
based on the required tags.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 lib/sbitmap.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Bart Van Assche April 8, 2022, 2:31 p.m. UTC | #1
On 4/8/22 00:39, Yu Kuai wrote:
> Always waking up 'wake_batch' threads intensifies competition and
> prevents split IOs from being issued continuously. Now that the number
> of required tags is recorded for huge IOs, it is safe to wake up threads
> based on the required tags.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   lib/sbitmap.c | 22 +++++++++++++++++++++-
>   1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index 8d01e02ea4b1..eac9fa5c2b4d 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -614,6 +614,26 @@ static inline void sbq_update_preemption(struct sbitmap_queue *sbq,
>   	WRITE_ONCE(sbq->force_tag_preemption, force);
>   }
>   
> +static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)

Consider renaming "get_wake_nr()" into "nr_to_wake_up()".

> +{
> +	struct sbq_wait *wait;
> +	struct wait_queue_entry *entry;
> +	unsigned int nr = 1;
> +
> +	spin_lock_irq(&ws->wait.lock);
> +	list_for_each_entry(entry, &ws->wait.head, entry) {
> +		wait = container_of(entry, struct sbq_wait, wait);
> +		if (nr_tags <= wait->nr_tags)
> +			break;
> +
> +		nr++;
> +		nr_tags -= wait->nr_tags;
> +	}
> +	spin_unlock_irq(&ws->wait.lock);
> +
> +	return nr;
> +}
> +
>   static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>   {
>   	struct sbq_wait_state *ws;
> @@ -648,7 +668,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>   	smp_mb__before_atomic();
>   	atomic_set(&ws->wait_cnt, wake_batch);
>   	sbq_update_preemption(sbq, wake_batch);
> -	wake_up_nr(&ws->wait, wake_batch);
> +	wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
>   
>   	return true;
>   }

ws->wait.lock is unlocked after the number of threads to wake up has 
been computed and is locked again by wake_up_nr(). The ws->wait.head 
list may be modified after get_wake_nr() returns and before wake_up_nr() 
is called. Isn't that a race condition?

Thanks,

Bart.
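
One way to close the window described above, sketched here purely for
illustration (the helper name sbq_wake_up_nr_tags() is hypothetical and
this is not part of the posted patch), is to compute the count and issue
the wake-ups under a single acquisition of ws->wait.lock, for example
via __wake_up_locked():

static void sbq_wake_up_nr_tags(struct sbq_wait_state *ws, unsigned int nr_tags)
{
	struct sbq_wait *wait;
	struct wait_queue_entry *entry;
	unsigned int nr = 1;

	spin_lock_irq(&ws->wait.lock);
	list_for_each_entry(entry, &ws->wait.head, entry) {
		wait = container_of(entry, struct sbq_wait, wait);
		if (nr_tags <= wait->nr_tags)
			break;

		nr++;
		nr_tags -= wait->nr_tags;
	}
	/* __wake_up_locked() expects the wait queue lock to be held. */
	__wake_up_locked(&ws->wait, TASK_NORMAL, nr);
	spin_unlock_irq(&ws->wait.lock);
}

The caller in __sbq_wake_up() would then invoke this helper instead of
the get_wake_nr()/wake_up_nr() pair.
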
Bart Van Assche April 8, 2022, 9:13 p.m. UTC | #2
On 4/8/22 00:39, Yu Kuai wrote:
> +static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
> +{
> +	struct sbq_wait *wait;
> +	struct wait_queue_entry *entry;
> +	unsigned int nr = 1;
> +
> +	spin_lock_irq(&ws->wait.lock);
> +	list_for_each_entry(entry, &ws->wait.head, entry) {
> +		wait = container_of(entry, struct sbq_wait, wait);
> +		if (nr_tags <= wait->nr_tags)
> +			break;
> +
> +		nr++;
> +		nr_tags -= wait->nr_tags;
> +	}
> +	spin_unlock_irq(&ws->wait.lock);
> +
> +	return nr;
> +}
> +
>   static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>   {
>   	struct sbq_wait_state *ws;
> @@ -648,7 +668,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>   	smp_mb__before_atomic();
>   	atomic_set(&ws->wait_cnt, wake_batch);
>   	sbq_update_preemption(sbq, wake_batch);
> -	wake_up_nr(&ws->wait, wake_batch);
> +	wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
>   
>   	return true;
>   }

An additional comment: my understanding is that __sbq_wake_up() should 
wake up exactly `wake_batch` waiters. The above patch changes that into 
waking up at most `wake_batch` waiters. I think that's wrong.

Bart.
Yu Kuai April 9, 2022, 2:17 a.m. UTC | #3
On 2022/04/09 5:13, Bart Van Assche wrote:
> On 4/8/22 00:39, Yu Kuai wrote:
>> +static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
>> +{
>> +    struct sbq_wait *wait;
>> +    struct wait_queue_entry *entry;
>> +    unsigned int nr = 1;
>> +
>> +    spin_lock_irq(&ws->wait.lock);
>> +    list_for_each_entry(entry, &ws->wait.head, entry) {
>> +        wait = container_of(entry, struct sbq_wait, wait);
>> +        if (nr_tags <= wait->nr_tags)
>> +            break;
>> +
>> +        nr++;
>> +        nr_tags -= wait->nr_tags;
>> +    }
>> +    spin_unlock_irq(&ws->wait.lock);
>> +
>> +    return nr;
>> +}
>> +
>>   static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>>   {
>>       struct sbq_wait_state *ws;
>> @@ -648,7 +668,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>>       smp_mb__before_atomic();
>>       atomic_set(&ws->wait_cnt, wake_batch);
>>       sbq_update_preemption(sbq, wake_batch);
>> -    wake_up_nr(&ws->wait, wake_batch);
>> +    wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
>>       return true;
>>   }
> 
> An additional comment: my understanding is that __sbq_wake_up() should 
> wake up exactly `wake_batch` waiters. The above patch changes that into 
> waking up at most `wake_batch` waiters. I think that's wrong.
Hi,

I think the reason to wake up 'wake_batch' waiters is to make sure the
woken threads will use up the 'wake_batch' tags that were just freed,
because each waiter should acquire at least one tag. Thus I think that
as long as the woken waiters use up those 'wake_batch' tags, it's ok to
wake up fewer waiters. For example, with wake_batch = 8, waking up two
waiters that need 5 and 3 tags respectively already consumes the whole
batch.

Please kindly correct me if I'm wrong.

Thanks,
Kuai
> 
> Bart.
> .
>
Yu Kuai April 9, 2022, 2:19 a.m. UTC | #4
On 2022/04/08 22:31, Bart Van Assche wrote:
> On 4/8/22 00:39, Yu Kuai wrote:
>> Always waking up 'wake_batch' threads intensifies competition and
>> prevents split IOs from being issued continuously. Now that the number
>> of required tags is recorded for huge IOs, it is safe to wake up threads
>> based on the required tags.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   lib/sbitmap.c | 22 +++++++++++++++++++++-
>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
>> index 8d01e02ea4b1..eac9fa5c2b4d 100644
>> --- a/lib/sbitmap.c
>> +++ b/lib/sbitmap.c
>> @@ -614,6 +614,26 @@ static inline void sbq_update_preemption(struct sbitmap_queue *sbq,
>>       WRITE_ONCE(sbq->force_tag_preemption, force);
>>   }
>> +static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
> 
> Consider renaming "get_wake_nr()" into "nr_to_wake_up()".
> 
>> +{
>> +    struct sbq_wait *wait;
>> +    struct wait_queue_entry *entry;
>> +    unsigned int nr = 1;
>> +
>> +    spin_lock_irq(&ws->wait.lock);
>> +    list_for_each_entry(entry, &ws->wait.head, entry) {
>> +        wait = container_of(entry, struct sbq_wait, wait);
>> +        if (nr_tags <= wait->nr_tags)
>> +            break;
>> +
>> +        nr++;
>> +        nr_tags -= wait->nr_tags;
>> +    }
>> +    spin_unlock_irq(&ws->wait.lock);
>> +
>> +    return nr;
>> +}
>> +
>>   static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>>   {
>>       struct sbq_wait_state *ws;
>> @@ -648,7 +668,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>>       smp_mb__before_atomic();
>>       atomic_set(&ws->wait_cnt, wake_batch);
>>       sbq_update_preemption(sbq, wake_batch);
>> -    wake_up_nr(&ws->wait, wake_batch);
>> +    wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
>>       return true;
>>   }
> 
> ws->wait.lock is unlocked after the number of threads to wake up has 
> been computed and is locked again by wake_up_nr(). The ws->wait.head 
> list may be modified after get_wake_nr() returns and before wake_up_nr() 
> is called. Isn't that a race condition?
Hi,

That is a race condition; I was hoping that the fix in patch 5 would
also cover this case.

Thanks,
Kuai
> 
> Thanks,
> 
> Bart.
> .
>
Bart Van Assche April 9, 2022, 4:16 a.m. UTC | #5
On 4/8/22 19:17, yukuai (C) wrote:
> I think the reason to wake up 'wake_batch' waiters is to make sure the
> woken threads will use up the 'wake_batch' tags that were just freed,
> because each waiter should acquire at least one tag. Thus I think that
> as long as the woken waiters use up those 'wake_batch' tags, it's ok to
> wake up fewer waiters.

Hmm ... I think it's up to you to (a) explain this behavior change in 
detail in the commit message and (b) to prove that this behavior change 
won't cause trouble (I guess this change will cause trouble).

Thanks,

Bart.
Yu Kuai April 9, 2022, 7:01 a.m. UTC | #6
On 2022/04/09 12:16, Bart Van Assche wrote:
> On 4/8/22 19:17, yukuai (C) wrote:
>> I think the reason to wake up 'wake_batch' waiters is to make sure the
>> woken threads will use up the 'wake_batch' tags that were just freed,
>> because each waiter should acquire at least one tag. Thus I think that
>> as long as the woken waiters use up those 'wake_batch' tags, it's ok to
>> wake up fewer waiters.
> 
> Hmm ... I think it's up to you to (a) explain this behavior change in 
> detail in the commit message and (b) to prove that this behavior change 
> won't cause trouble (I guess this change will cause trouble).

Hi, Bart

Sorry that the commit message doesn't explain this clearly.

There are only two situations in which fewer than 'wake_batch' waiters
will be woken up after this patch:

(a) some waiters will acquire multiple tags; as I mentioned above, this
is ok because the woken waiters will still use up the 'wake_batch' tags.

(b) the total number of waiters is less than 'wake_batch'; this is
problematic if tag preemption is disabled, because IO concurrency will
decline (patch 5 should fix that problem).

As for the race where new threads start waiting after get_wake_nr() and
before wake_up_nr() in situation (b), I can't figure out how it could be
problematic; however, it can be mitigated by triggering additional
wake-ups:

@@ -623,15 +623,17 @@ static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
         spin_lock_irq(&ws->wait.lock);
         list_for_each_entry(entry, &ws->wait.head, entry) {
                 wait = container_of(entry, struct sbq_wait, wait);
-               if (nr_tags <= wait->nr_tags)
+               if (nr_tags <= wait->nr_tags) {
+                       nr_tags = 0;
                         break;
+               }

                 nr++;
                 nr_tags -= wait->nr_tags;
         }
         spin_unlock_irq(&ws->wait.lock);

-       return nr;
+       return nr + nr_tags;
  }

What do you think?

Thanks,
Kuai

> 
> Thanks,
> 
> Bart.
> .
>
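
For reference, applying the inline change above on top of this patch,
the get_wake_nr() helper would read as follows (a consolidated sketch of
the proposal in this thread, not a separately posted revision):

static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
{
	struct sbq_wait *wait;
	struct wait_queue_entry *entry;
	unsigned int nr = 1;

	spin_lock_irq(&ws->wait.lock);
	list_for_each_entry(entry, &ws->wait.head, entry) {
		wait = container_of(entry, struct sbq_wait, wait);
		if (nr_tags <= wait->nr_tags) {
			/* The batch is fully covered by the waiters found so far. */
			nr_tags = 0;
			break;
		}

		nr++;
		nr_tags -= wait->nr_tags;
	}
	spin_unlock_irq(&ws->wait.lock);

	/* Any remainder triggers additional wake-ups for late arrivals. */
	return nr + nr_tags;
}
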
Bart Van Assche April 12, 2022, 3:20 a.m. UTC | #7
On 4/9/22 00:01, yukuai (C) wrote:
> As for the race where new threads start waiting after get_wake_nr() and
> before wake_up_nr() in situation (b), I can't figure out how it could be
> problematic [ ... ]

If the atomic_dec_return() statement in __sbq_wake_up() returns a value 
that is less than or equal to zero, wake_batch waiters should be woken 
up or I/O will hang until sbitmap_queue_wake_all() is called. That last 
function should not be called unless there is no alternative.

Thanks,

Bart.
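
For reference, a simplified sketch of the decrement/re-arm pattern
referred to above (not the verbatim kernel function; the name
sbq_wake_up_sketch() is hypothetical), illustrating why waking fewer
waiters than the batch can leave freed tags unclaimed:

static bool sbq_wake_up_sketch(struct sbq_wait_state *ws, unsigned int wake_batch)
{
	/* Each freed tag decrements wait_cnt. */
	if (atomic_dec_return(&ws->wait_cnt) > 0)
		return false;	/* batch not complete yet */

	/*
	 * A full batch of tags has been freed: re-arm the counter and
	 * wake wake_batch waiters.  If fewer waiters are woken here,
	 * some of the freed tags may stay unclaimed until
	 * sbitmap_queue_wake_all() runs.
	 */
	atomic_set(&ws->wait_cnt, wake_batch);
	wake_up_nr(&ws->wait, wake_batch);
	return true;
}
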

Patch

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 8d01e02ea4b1..eac9fa5c2b4d 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -614,6 +614,26 @@  static inline void sbq_update_preemption(struct sbitmap_queue *sbq,
 	WRITE_ONCE(sbq->force_tag_preemption, force);
 }
 
+static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
+{
+	struct sbq_wait *wait;
+	struct wait_queue_entry *entry;
+	unsigned int nr = 1;
+
+	spin_lock_irq(&ws->wait.lock);
+	list_for_each_entry(entry, &ws->wait.head, entry) {
+		wait = container_of(entry, struct sbq_wait, wait);
+		if (nr_tags <= wait->nr_tags)
+			break;
+
+		nr++;
+		nr_tags -= wait->nr_tags;
+	}
+	spin_unlock_irq(&ws->wait.lock);
+
+	return nr;
+}
+
 static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 {
 	struct sbq_wait_state *ws;
@@ -648,7 +668,7 @@  static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 	smp_mb__before_atomic();
 	atomic_set(&ws->wait_cnt, wake_batch);
 	sbq_update_preemption(sbq, wake_batch);
-	wake_up_nr(&ws->wait, wake_batch);
+	wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
 
 	return true;
 }