mm/swap: piggyback lru_add_drain_all() calls

Message ID 157018386639.6110.3058050375244904201.stgit@buzz
State New
Series
  • mm/swap: piggyback lru_add_drain_all() calls

Commit Message

Konstantin Khlebnikov Oct. 4, 2019, 10:11 a.m. UTC
This is a very slow operation. There is no reason to do it again if somebody
else has already drained all per-cpu vectors while we waited for the lock.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 mm/swap.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Oct. 4, 2019, 12:10 p.m. UTC | #1
On Fri, Oct 04, 2019 at 01:11:06PM +0300, Konstantin Khlebnikov wrote:
> This is very slow operation. There is no reason to do it again if somebody
> else already drained all per-cpu vectors after we waited for lock.
> +	seq = raw_read_seqcount_latch(&seqcount);
> +
>  	mutex_lock(&lock);
> +
> +	/* Piggyback on drain done by somebody else. */
> +	if (__read_seqcount_retry(&seqcount, seq))
> +		goto done;
> +
> +	raw_write_seqcount_latch(&seqcount);
> +

Do we really need the seqcount to do this?  Wouldn't a mutex_trylock()
have the same effect?
Konstantin Khlebnikov Oct. 4, 2019, 12:24 p.m. UTC | #2
On 04/10/2019 15.10, Matthew Wilcox wrote:
> On Fri, Oct 04, 2019 at 01:11:06PM +0300, Konstantin Khlebnikov wrote:
>> This is very slow operation. There is no reason to do it again if somebody
>> else already drained all per-cpu vectors after we waited for lock.
>> +	seq = raw_read_seqcount_latch(&seqcount);
>> +
>>   	mutex_lock(&lock);
>> +
>> +	/* Piggyback on drain done by somebody else. */
>> +	if (__read_seqcount_retry(&seqcount, seq))
>> +		goto done;
>> +
>> +	raw_write_seqcount_latch(&seqcount);
>> +
> 
> Do we really need the seqcount to do this?  Wouldn't a mutex_trylock()
> have the same effect?
> 

No, the semantics are completely different.

The operation can be safely skipped only if somebody else started and
finished a drain after the current task called this function.
Michal Hocko Oct. 4, 2019, 12:27 p.m. UTC | #3
On Fri 04-10-19 05:10:17, Matthew Wilcox wrote:
> On Fri, Oct 04, 2019 at 01:11:06PM +0300, Konstantin Khlebnikov wrote:
> > This is very slow operation. There is no reason to do it again if somebody
> > else already drained all per-cpu vectors after we waited for lock.
> > +	seq = raw_read_seqcount_latch(&seqcount);
> > +
> >  	mutex_lock(&lock);
> > +
> > +	/* Piggyback on drain done by somebody else. */
> > +	if (__read_seqcount_retry(&seqcount, seq))
> > +		goto done;
> > +
> > +	raw_write_seqcount_latch(&seqcount);
> > +
> 
> Do we really need the seqcount to do this?  Wouldn't a mutex_trylock()
> have the same effect?

Yeah, this makes sense. From a correctness point of view it should be ok
because no caller can expect that per-cpu pvecs are empty on return.
This might have some runtime effects in that some paths might retry more -
e.g. the offlining path drains pcp pvecs before migrating the range away;
if there are pages still waiting for a worker to drain them then the
migration would fail and we would retry. But this is not a correctness
issue.
Konstantin Khlebnikov Oct. 4, 2019, 12:32 p.m. UTC | #4
On 04/10/2019 15.27, Michal Hocko wrote:
> On Fri 04-10-19 05:10:17, Matthew Wilcox wrote:
>> On Fri, Oct 04, 2019 at 01:11:06PM +0300, Konstantin Khlebnikov wrote:
>>> This is very slow operation. There is no reason to do it again if somebody
>>> else already drained all per-cpu vectors after we waited for lock.
>>> +	seq = raw_read_seqcount_latch(&seqcount);
>>> +
>>>   	mutex_lock(&lock);
>>> +
>>> +	/* Piggyback on drain done by somebody else. */
>>> +	if (__read_seqcount_retry(&seqcount, seq))
>>> +		goto done;
>>> +
>>> +	raw_write_seqcount_latch(&seqcount);
>>> +
>>
>> Do we really need the seqcount to do this?  Wouldn't a mutex_trylock()
>> have the same effect?
> 
> Yeah, this makes sense. From correctness point of view it should be ok
> because no caller can expect that per-cpu pvecs are empty on return.
> This might have some runtime effects that some paths might retry more -
> e.g. offlining path drains pcp pvces before migrating the range away, if
> there are pages still waiting for a worker to drain them then the
> migration would fail and we would retry. But this not a correctness
> issue.
> 

A caller might expect that the pages it added before the call are drained.
Exiting after a failed mutex_trylock() would not guarantee that.

For example POSIX_FADV_DONTNEED relies on this.
Michal Hocko Oct. 4, 2019, 12:37 p.m. UTC | #5
On Fri 04-10-19 15:32:01, Konstantin Khlebnikov wrote:
> 
> 
> On 04/10/2019 15.27, Michal Hocko wrote:
> > On Fri 04-10-19 05:10:17, Matthew Wilcox wrote:
> > > On Fri, Oct 04, 2019 at 01:11:06PM +0300, Konstantin Khlebnikov wrote:
> > > > This is very slow operation. There is no reason to do it again if somebody
> > > > else already drained all per-cpu vectors after we waited for lock.
> > > > +	seq = raw_read_seqcount_latch(&seqcount);
> > > > +
> > > >   	mutex_lock(&lock);
> > > > +
> > > > +	/* Piggyback on drain done by somebody else. */
> > > > +	if (__read_seqcount_retry(&seqcount, seq))
> > > > +		goto done;
> > > > +
> > > > +	raw_write_seqcount_latch(&seqcount);
> > > > +
> > > 
> > > Do we really need the seqcount to do this?  Wouldn't a mutex_trylock()
> > > have the same effect?
> > 
> > Yeah, this makes sense. From correctness point of view it should be ok
> > because no caller can expect that per-cpu pvecs are empty on return.
> > This might have some runtime effects that some paths might retry more -
> > e.g. offlining path drains pcp pvces before migrating the range away, if
> > there are pages still waiting for a worker to drain them then the
> > migration would fail and we would retry. But this not a correctness
> > issue.
> > 
> 
> Caller might expect that pages added by him before are drained.
> Exiting after mutex_trylock() will not guarantee that.
> 
> For example POSIX_FADV_DONTNEED uses that.

OK, I was not aware of this case. Please make sure to document that in
the changelog, and a comment in the code wouldn't hurt either. It would
certainly explain more than "Piggyback on drain done by somebody
else.".

Thanks!
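A more explanatory comment along the lines Michal asks for might read as follows; the wording is only a suggestion, not part of the posted patch:

```c
	/*
	 * Piggyback on a drain done by somebody else: if a full
	 * lru_add_drain_all() pass started and finished while we were
	 * waiting for the mutex, every pagevec filled before we were
	 * called - including ours - has already been flushed.  The
	 * caller's expectation that its own pages are drained (e.g.
	 * POSIX_FADV_DONTNEED) therefore still holds, and we can
	 * return early without repeating the expensive drain.
	 */
```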

Patch

diff --git a/mm/swap.c b/mm/swap.c
index 38c3fa4308e2..6203918e1316 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -708,9 +708,10 @@  static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
+	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
-	int cpu;
+	int cpu, seq;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -719,7 +720,16 @@  void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
+	seq = raw_read_seqcount_latch(&seqcount);
+
 	mutex_lock(&lock);
+
+	/* Piggyback on drain done by somebody else. */
+	if (__read_seqcount_retry(&seqcount, seq))
+		goto done;
+
+	raw_write_seqcount_latch(&seqcount);
+
 	cpumask_clear(&has_work);
 
 	for_each_online_cpu(cpu) {
@@ -740,6 +750,7 @@  void lru_add_drain_all(void)
 	for_each_cpu(cpu, &has_work)
 		flush_work(&per_cpu(lru_add_drain_work, cpu));
 
+done:
 	mutex_unlock(&lock);
 }
 #else