
[8/8] mm/zswap: Use local lock to protect per-CPU data

Message ID 20200519201912.1564477-9-bigeasy@linutronix.de (mailing list archive)
State New, archived

Commit Message

Sebastian Andrzej Siewior May 19, 2020, 8:19 p.m. UTC
From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>

zswap uses per-CPU compression. The per-CPU data pointer is acquired with
get_cpu_ptr(), which implicitly disables preemption. It then allocates
memory inside the preempt-disabled region, which conflicts with the
PREEMPT_RT semantics.

Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per-CPU lock, which
serializes the access but keeps the code section preemptible. On non-RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
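
Condensed, the conversion in the store path looks like this (excerpted
and simplified from the diff below; error handling omitted):

        /* before: get_cpu_var()/get_cpu_ptr() implicitly disable preemption */
        dst = get_cpu_var(zswap_dstmem);
        tfm = *get_cpu_ptr(entry->pool->tfm);
        ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
        put_cpu_ptr(entry->pool->tfm);
        put_cpu_var(zswap_dstmem);

        /* after: explicit local lock, section stays preemptible on PREEMPT_RT */
        local_lock(zswap_cpu_lock);
        dst = *this_cpu_ptr(&zswap_dstmem);
        tfm = *this_cpu_ptr(entry->pool->tfm);
        ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
        local_unlock(zswap_cpu_lock);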

[bigeasy: Use local_lock(), additional hunks, patch description]

Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/zswap.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

Comments

Song Bao Hua (Barry Song) May 19, 2020, 9:46 p.m. UTC | #1
> From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>

> zswap uses per-CPU compression. The per-CPU data pointer is acquired with
> get_cpu_ptr() which implicitly disables preemption. It allocates memory inside the preempt disabled region which conflicts with the PREEMPT_RT semantics.

> Replace the implicit preemption control with an explicit local lock.
> This allows RT kernels to substitute it with a real per CPU lock, which serializes the access but keeps the code section preemptible. On non RT kernels this maps to preempt_disable() as before, i.e. no functional change.

Hi Luis,
In the patches below, in order to use the acomp APIs and leverage the power of hardware compressors, I have moved to a mutex:
https://marc.info/?l=linux-crypto-vger&m=158941285830302&w=2
https://marc.info/?l=linux-crypto-vger&m=158941287930311&w=2

So once we make some progress on that one, I guess we won't need a special patch for RT any more.

> [bigeasy: Use local_lock(), additional hunks, patch description]

> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Dan Streetman <ddstreet@ieee.org>
> Cc: Vitaly Wool <vitaly.wool@konsulko.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  mm/zswap.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)

Thanks
Barry
Sebastian Andrzej Siewior May 20, 2020, 10:26 a.m. UTC | #2
On 2020-05-19 21:46:06 [+0000], Song Bao Hua wrote:
> Hi Luis,
> In the below patch, in order to use the acomp APIs to leverage the power of hardware compressors. I have moved to mutex:
> https://marc.info/?l=linux-crypto-vger&m=158941285830302&w=2
> https://marc.info/?l=linux-crypto-vger&m=158941287930311&w=2
> 
> so once we get some progress on that one, I guess we don't need a special patch for RT any more.

If you convert away from the current concept in this way, then we can
drop this patch from the series.
The second patch shows the following hunk:

|@@ -1075,11 +1124,20 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
| 
| 	/* compress */
| 	dst = get_cpu_var(zswap_dstmem);
|	acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
|	put_cpu_var(zswap_dstmem);

So here you get the per-CPU versions of `dst' and `acomp_ctx' and then
allow preemption again.

|	mutex_lock(&acomp_ctx->mutex);
|
|	src = kmap(page);
|	sg_init_one(&input, src, PAGE_SIZE);
|	/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
|	sg_init_one(&output, dst, PAGE_SIZE * 2);

and here you use `dst' and `acomp_ctx' after the preempt_disable() has
been dropped, so I don't understand why you used get_cpu_var(). Either
the data is protected by the mutex and doesn't require get_cpu_var(), or
it isn't (and needs additional protection).

|	acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
|	ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
|	dlen = acomp_ctx->req->dlen;
|	kunmap(page);
|
| 	if (ret) {
| 		ret = -EINVAL;
| 		goto put_dstmem;
|

Sebastian
Song Bao Hua (Barry Song) May 20, 2020, 11:13 a.m. UTC | #3
> On 2020-05-19 21:46:06 [+0000], Song Bao Hua wrote:
> > Hi Luis,
> > In the below patch, in order to use the acomp APIs to leverage the power of
> hardware compressors. I have moved to mutex:
> > https://marc.info/?l=linux-crypto-vger&m=158941285830302&w=2
> > https://marc.info/?l=linux-crypto-vger&m=158941287930311&w=2
> >
> > so once we get some progress on that one, I guess we don't need a special
> patch for RT any more.
> 
> If you convert this way from the current concept then we could drop it from
> the series.
> The second patch shows the following hunk:
> 
> |@@ -1075,11 +1124,20 @@ static int zswap_frontswap_store(unsigned
> type,
> |pgoff_t offset,
> |
> | 	/* compress */
> | 	dst = get_cpu_var(zswap_dstmem);
> |	acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
> |	put_cpu_var(zswap_dstmem);
> 
> So here you get per-CPU version of `dst' and `acomp_ctx' and then allow
> preemption again.
> 
> |	mutex_lock(&acomp_ctx->mutex);
> |
> |	src = kmap(page);
> |	sg_init_one(&input, src, PAGE_SIZE);
> |	/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
> |	sg_init_one(&output, dst, PAGE_SIZE * 2);
> 
> and here you use `dst' and `acomp_ctx' after the preempt_disable() has been
> dropped so I don't understand why you used get_cpu_var(). It is either
> protected by the mutex and doesn't require get_cpu_var() or it isn't (and
> should have additional protection).

The old code was like this:
For each CPU, there is one per-CPU comp and one per-CPU page buffer for the compression destination - zswap_dstmem.
For example, on cpu1, once you begin to compress, you hold the per-CPU comp and the per-CPU destination buffer. Meanwhile, preemption is disabled. So decompression can't run on the same core in parallel, and neither can a second compression. At the same time, the thread can't migrate to another core. Other cores can still do compression/decompression in parallel.

The new code is like this:
For each CPU, there is still one per-CPU acomp-ctx and one per-CPU page buffer for the compression destination. Here acomp replaces comp, and acomp may sleep during compression and decompression.
For example, on cpu1, once you begin to compress, you hold the per-CPU acomp-ctx and the per-CPU destination buffer of cpu1. The code below makes sure you get the acomp-ctx and dstmem from the same core, by disabling preemption with get_cpu_var() and put_cpu_var():
dst = get_cpu_var(zswap_dstmem);
acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
put_cpu_var(zswap_dstmem);

Then there are two cases:

1. After getting the dst and acomp_ctx of cpu1, you might stay on cpu1 the whole time. The mutex in the per-CPU acomp-ctx guarantees that two compressions won't run on the same core in parallel, and it also makes certain that compression and decompression won't run on the same core in parallel. Everything is like before.

2. After getting the dst and acomp_ctx of cpu1, you might migrate to cpu2, but even after moving to cpu2 you are still holding the mutex of cpu1's acomp-ctx.
If, at that time, cpu1 has another compression request, it will be blocked by the mutex now held on cpu2.
If, at that time, cpu1 wants to do decompression, it will be blocked by the same mutex.

Everything is like before. No matter which core you have migrated to, once you hold the mutex of core N, anyone else who wants to hold the mutex of core N will be blocked. So, if you have M cores, you can mostly do M compressions/decompressions in parallel, just like before.

My basic idea is to keep the working model unchanged, just like before.
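
Putting the pieces together, the new store path is roughly the following
(condensed from the hunks quoted above; the eventual mutex_unlock() is not
in the quoted hunks and is added here for clarity):

        /* grab this CPU's dst buffer and acomp ctx, then allow preemption again */
        dst = get_cpu_var(zswap_dstmem);
        acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
        put_cpu_var(zswap_dstmem);

        /* the mutex, not disabled preemption, now serializes users of this ctx/dst */
        mutex_lock(&acomp_ctx->mutex);
        src = kmap(page);
        sg_init_one(&input, src, PAGE_SIZE);
        /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
        sg_init_one(&output, dst, PAGE_SIZE * 2);
        acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
        ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
        dlen = acomp_ctx->req->dlen;
        kunmap(page);
        /* ... copy dst into the zpool ... */
        mutex_unlock(&acomp_ctx->mutex);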

> 
> |	acomp_request_set_params(acomp_ctx->req, &input, &output,
> PAGE_SIZE, dlen);
> |	ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req),
> &acomp_ctx->wait);
> |	dlen = acomp_ctx->req->dlen;
> |	kunmap(page);
> |
> | 	if (ret) {
> | 		ret = -EINVAL;
> | 		goto put_dstmem;
> |
> 
> Sebastian

Barry
Sebastian Andrzej Siewior May 20, 2020, 11:57 a.m. UTC | #4
On 2020-05-20 11:13:31 [+0000], Song Bao Hua wrote:
> For example, on cpu1, once you begin to compress, you hold the percpu acomp-ctx and percpu destination buffer of CPU1, the below code makes sure you get the acomp and dstmem from the same core by disabling preemption with get_cpu_var and put_cpu_var:
> dst = get_cpu_var(zswap_dstmem);
> acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
> put_cpu_var(zswap_dstmem);
> 
> then there are two cases:
> 
> 1. after getting dst and acomp_ctx of cpu1, you might always work in cpu1, the mutex in per-cpu acomp-ctx will guarantee two compressions won't do at the same core in parallel, and it also makes certain compression and decompression won't do at the same core in parallel. Everything is like before.

For readability I suggest not mixing per-CPU and per-CTX variables like
that. If zswap_dstmem is protected by the mutex, please make it part of
acomp_ctx.
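
Something like this, just as a sketch (the struct and field names are
illustrative; req, wait and mutex are what is visible in the hunks you
posted):

        struct zswap_acomp_ctx {
                struct crypto_acomp     *acomp;
                struct acomp_req        *req;
                struct crypto_wait      wait;
                u8                      *dstmem;   /* was the per-CPU zswap_dstmem */
                struct mutex            mutex;     /* protects req, wait and dstmem */
        };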

Sebastian
Song Bao Hua (Barry Song) May 20, 2020, 12:01 p.m. UTC | #5
> Subject: Re: [PATCH 8/8] mm/zswap: Use local lock to protect per-CPU data
> 
> On 2020-05-20 11:13:31 [+0000], Song Bao Hua wrote:
> > For example, on cpu1, once you begin to compress, you hold the percpu
> acomp-ctx and percpu destination buffer of CPU1, the below code makes sure
> you get the acomp and dstmem from the same core by disabling preemption
> with get_cpu_var and put_cpu_var:
> > dst = get_cpu_var(zswap_dstmem);
> > acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
> > put_cpu_var(zswap_dstmem);
> >
> > then there are two cases:
> >
> > 1. after getting dst and acomp_ctx of cpu1, you might always work in cpu1,
> the mutex in per-cpu acomp-ctx will guarantee two compressions won't do at
> the same core in parallel, and it also makes certain compression and
> decompression won't do at the same core in parallel. Everything is like before.
> 
> For readability I suggest not to mix per-CPU and per-CTX variables like that. If
> zswap_dstmem is protected by the mutex, please make it part of acomp_ctx.
> 

Yep, it seems this will avoid the need for further explanations to other people who read the code :-)

> Sebastian

Barry

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index fbb782924ccc5..1db2ad941e501 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -18,6 +18,7 @@ 
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/locallock.h>
 #include <linux/types.h>
 #include <linux/atomic.h>
 #include <linux/frontswap.h>
@@ -388,6 +389,8 @@  static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
 * per-cpu code
 **********************************/
 static DEFINE_PER_CPU(u8 *, zswap_dstmem);
+/* Used for zswap_dstmem and tfm */
+static DEFINE_LOCAL_LOCK(zswap_cpu_lock);
 
 static int zswap_dstmem_prepare(unsigned int cpu)
 {
@@ -919,10 +922,11 @@  static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 		dlen = PAGE_SIZE;
 		src = (u8 *)zhdr + sizeof(struct zswap_header);
 		dst = kmap_atomic(page);
-		tfm = *get_cpu_ptr(entry->pool->tfm);
+		local_lock(zswap_cpu_lock);
+		tfm = *this_cpu_ptr(entry->pool->tfm);
 		ret = crypto_comp_decompress(tfm, src, entry->length,
 					     dst, &dlen);
-		put_cpu_ptr(entry->pool->tfm);
+		local_unlock(zswap_cpu_lock);
 		kunmap_atomic(dst);
 		BUG_ON(ret);
 		BUG_ON(dlen != PAGE_SIZE);
@@ -1074,12 +1078,12 @@  static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	}
 
 	/* compress */
-	dst = get_cpu_var(zswap_dstmem);
-	tfm = *get_cpu_ptr(entry->pool->tfm);
+	local_lock(zswap_cpu_lock);
+	dst = *this_cpu_ptr(&zswap_dstmem);
+	tfm = *this_cpu_ptr(entry->pool->tfm);
 	src = kmap_atomic(page);
 	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
 	kunmap_atomic(src);
-	put_cpu_ptr(entry->pool->tfm);
 	if (ret) {
 		ret = -EINVAL;
 		goto put_dstmem;
@@ -1103,7 +1107,7 @@  static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	memcpy(buf, &zhdr, hlen);
 	memcpy(buf + hlen, dst, dlen);
 	zpool_unmap_handle(entry->pool->zpool, handle);
-	put_cpu_var(zswap_dstmem);
+	local_unlock(zswap_cpu_lock);
 
 	/* populate entry */
 	entry->offset = offset;
@@ -1131,7 +1135,7 @@  static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	return 0;
 
 put_dstmem:
-	put_cpu_var(zswap_dstmem);
+	local_unlock(zswap_cpu_lock);
 	zswap_pool_put(entry->pool);
 freepage:
 	zswap_entry_cache_free(entry);
@@ -1176,9 +1180,10 @@  static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);
 	dst = kmap_atomic(page);
-	tfm = *get_cpu_ptr(entry->pool->tfm);
+	local_lock(zswap_cpu_lock);
+	tfm = *this_cpu_ptr(entry->pool->tfm);
 	ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
-	put_cpu_ptr(entry->pool->tfm);
+	local_unlock(zswap_cpu_lock);
 	kunmap_atomic(dst);
 	zpool_unmap_handle(entry->pool->zpool, entry->handle);
 	BUG_ON(ret);