Message ID | 20241113052413.157039-1-kanchana.p.sridhar@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] mm: zswap: Fix a potential memory leak in zswap_decompress(). | expand |
On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar <kanchana.p.sridhar@intel.com> wrote: > > This is a hotfix for a potential zpool memory leak that could result in > the existing zswap_decompress(): > > mutex_unlock(&acomp_ctx->mutex); > > if (src != acomp_ctx->buffer) > zpool_unmap_handle(zpool, entry->handle); > > Releasing the lock before the conditional does not protect the integrity of > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > risk for the conditional behaving as intended, and consequently not > unmapping the zpool handle, which could cause a zswap zpool memory leak. > > This patch moves the mutex_unlock() to occur after the conditional and > subsequent zpool_unmap_handle(). This ensures that the value of "src" > obtained earlier, with the mutex locked, does not change. The commit log is too complicated and incorrect. It is talking about the stability of 'src', but that's a local variable on the stack anyway. It doesn't need protection. The problem is 'acomp_ctx->buffer' being reused and changed after the mutex is released. Leading to the check not being reliable. Please simplify this. > > Even though an actual memory leak was not observed, this fix seems like a > cleaner implementation. > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > --- > mm/zswap.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index f6316b66fb23..58810fa8ff23 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -986,10 +986,11 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio) > acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE); > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait)); > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > - mutex_unlock(&acomp_ctx->mutex); > > if (src != acomp_ctx->buffer) > zpool_unmap_handle(zpool, entry->handle); Actually now that I think more about it, I think this check isn't entirely safe, even under the lock. Is it possible that 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous decompression at the same handle? In this case, we will also mistakenly skip the unmap. It would be more reliable to set a boolean variable if we copy to acomp_ctx->buffer and do the unmap, and check that flag here to check if the unmap was done or not. > + > + mutex_unlock(&acomp_ctx->mutex); > } > > /********************************* > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > -- > 2.27.0 >
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 12, 2024 9:35 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > <kanchana.p.sridhar@intel.com> wrote: > > > > This is a hotfix for a potential zpool memory leak that could result in > > the existing zswap_decompress(): > > > > mutex_unlock(&acomp_ctx->mutex); > > > > if (src != acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > > > Releasing the lock before the conditional does not protect the integrity of > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > risk for the conditional behaving as intended, and consequently not > > unmapping the zpool handle, which could cause a zswap zpool memory > leak. > > > > This patch moves the mutex_unlock() to occur after the conditional and > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > obtained earlier, with the mutex locked, does not change. > > The commit log is too complicated and incorrect. It is talking about > the stability of 'src', but that's a local variable on the stack > anyway. It doesn't need protection. > > The problem is 'acomp_ctx->buffer' being reused and changed after the > mutex is released. Leading to the check not being reliable. Please > simplify this. Thanks Yosry. That's exactly what I meant, but I think the wording got confusing. The problem I was trying to fix is the acomp_ctx->buffer value changing after the lock is released. This could happen as a result of any other compress or decompress that acquires the lock. I will simplify and clarify accordingly. > > > > > Even though an actual memory leak was not observed, this fix seems like a > > cleaner implementation. > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > --- > > mm/zswap.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index f6316b66fb23..58810fa8ff23 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > zswap_entry *entry, struct folio *folio) > > acomp_request_set_params(acomp_ctx->req, &input, &output, entry- > >length, PAGE_SIZE); > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > >req), &acomp_ctx->wait)); > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > - mutex_unlock(&acomp_ctx->mutex); > > > > if (src != acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > Actually now that I think more about it, I think this check isn't > entirely safe, even under the lock. Is it possible that > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > decompression at the same handle? In this case, we will also > mistakenly skip the unmap. If we move the mutex_unlock to happen after the conditional and unmap, shouldn't that be sufficient under all conditions? With the fix, "src" can take on only 2 values in this procedure: the mapped handle or acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case if the mapped zpool handle happens to be equal to acomp_ctx->buffer. Please let me know if I am missing anything. > > It would be more reliable to set a boolean variable if we copy to > acomp_ctx->buffer and do the unmap, and check that flag here to check > if the unmap was done or not. Sure, this could be done, but not sure if it is required. Please let me know if we still need the boolean variable in addition to moving the mutex_unlock(). Thanks, Kanchana > > > + > > + mutex_unlock(&acomp_ctx->mutex); > > } > > > > /********************************* > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > -- > > 2.27.0 > >
On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Tuesday, November 12, 2024 9:35 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > the existing zswap_decompress(): > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > if (src != acomp_ctx->buffer) > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > Releasing the lock before the conditional does not protect the integrity of > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > > risk for the conditional behaving as intended, and consequently not > > > unmapping the zpool handle, which could cause a zswap zpool memory > > leak. > > > > > > This patch moves the mutex_unlock() to occur after the conditional and > > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > > obtained earlier, with the mutex locked, does not change. > > > > The commit log is too complicated and incorrect. It is talking about > > the stability of 'src', but that's a local variable on the stack > > anyway. It doesn't need protection. > > > > The problem is 'acomp_ctx->buffer' being reused and changed after the > > mutex is released. Leading to the check not being reliable. Please > > simplify this. > > Thanks Yosry. That's exactly what I meant, but I think the wording got > confusing. The problem I was trying to fix is the acomp_ctx->buffer > value changing after the lock is released. This could happen as a result of any > other compress or decompress that acquires the lock. I will simplify and > clarify accordingly. > > > > > > > > > Even though an actual memory leak was not observed, this fix seems like a > > > cleaner implementation. > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > --- > > > mm/zswap.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index f6316b66fb23..58810fa8ff23 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > zswap_entry *entry, struct folio *folio) > > > acomp_request_set_params(acomp_ctx->req, &input, &output, entry- > > >length, PAGE_SIZE); > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > >req), &acomp_ctx->wait)); > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > if (src != acomp_ctx->buffer) > > > zpool_unmap_handle(zpool, entry->handle); > > > > Actually now that I think more about it, I think this check isn't > > entirely safe, even under the lock. Is it possible that > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > decompression at the same handle? In this case, we will also > > mistakenly skip the unmap. > > If we move the mutex_unlock to happen after the conditional and unmap, > shouldn't that be sufficient under all conditions? With the fix, "src" can > take on only 2 values in this procedure: the mapped handle or > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. Yes, that's the scenario I mean. Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' happens to be equal to the same handle from a previous operation as we don't clear it. In this case, the 'src != acomp_ctx->buffer' check will be false, even though it should be true. This will result in an extra zpool_unmap_handle() call. I didn't look closely, but this seems like it can have a worse effect than leaking memory (e.g. there will be an extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting a random handle). > > Please let me know if I am missing anything. > > > > > It would be more reliable to set a boolean variable if we copy to > > acomp_ctx->buffer and do the unmap, and check that flag here to check > > if the unmap was done or not. > > Sure, this could be done, but not sure if it is required. Please let me know > if we still need the boolean variable in addition to moving the mutex_unlock(). If we use a boolean, there is no need to move mutex_unlock(). The boolean will be a local variable on the stack that doesn't need protection. > > Thanks, > Kanchana > > > > > > + > > > + mutex_unlock(&acomp_ctx->mutex); > > > } > > > > > > /********************************* > > > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > > -- > > > 2.27.0 > > >
Hi Yosry, > -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 12, 2024 10:22 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > hannes@cmpxchg.org; nphamcs@gmail.com; > chengming.zhou@linux.dev; > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > foundation.org; > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > the existing zswap_decompress(): > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > if (src != acomp_ctx->buffer) > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > Releasing the lock before the conditional does not protect the integrity > of > > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > > > risk for the conditional behaving as intended, and consequently not > > > > unmapping the zpool handle, which could cause a zswap zpool memory > > > leak. > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional and > > > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > > > obtained earlier, with the mutex locked, does not change. > > > > > > The commit log is too complicated and incorrect. It is talking about > > > the stability of 'src', but that's a local variable on the stack > > > anyway. It doesn't need protection. > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after the > > > mutex is released. Leading to the check not being reliable. Please > > > simplify this. > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > value changing after the lock is released. This could happen as a result of > any > > other compress or decompress that acquires the lock. I will simplify and > > clarify accordingly. > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > like a > > > > cleaner implementation. > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > --- > > > > mm/zswap.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > --- a/mm/zswap.c > > > > +++ b/mm/zswap.c > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > zswap_entry *entry, struct folio *folio) > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > entry- > > > >length, PAGE_SIZE); > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > >req), &acomp_ctx->wait)); > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > if (src != acomp_ctx->buffer) > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > Actually now that I think more about it, I think this check isn't > > > entirely safe, even under the lock. Is it possible that > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > decompression at the same handle? In this case, we will also > > > mistakenly skip the unmap. > > > > If we move the mutex_unlock to happen after the conditional and unmap, > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > take on only 2 values in this procedure: the mapped handle or > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > Yes, that's the scenario I mean. > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > happens to be equal to the same handle from a previous operation as we > don't clear it. Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), we only copy to it. > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > though it should be true. This will result in an extra > zpool_unmap_handle() call. I didn't look closely, but this seems like > it can have a worse effect than leaking memory (e.g. there will be an > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > a random handle). > > > > > Please let me know if I am missing anything. > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > acomp_ctx->buffer and do the unmap, and check that flag here to check > > > if the unmap was done or not. > > > > Sure, this could be done, but not sure if it is required. Please let me know > > if we still need the boolean variable in addition to moving the > mutex_unlock(). > > If we use a boolean, there is no need to move mutex_unlock(). The > boolean will be a local variable on the stack that doesn't need > protection. I agree, using the boolean variable to do the unmap rather than the check for (src != acomp_ctx->buffer) is more fail-safe. I am still thinking moving the mutex_unlock() could help, or at least have no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock safeguards the interaction between the decompress operation, the sg_*() API calls inside zswap_decompress() and the shared zpool. If we release the per-cpu acomp_ctx's mutex lock before the zpool_unmap_handle(), is it possible that another cpu could acquire it's acomp_ctx's lock and map the same zpool handle (that the earlier cpu has yet to unmap or is concurrently unmapping) for a write? If this could happen, would it result in undefined state for both these zpool ops on different cpu's? Would keeping the per-cpu mutex locked through the zpool_unmap_handle() create a non-preemptible state that would avoid this? IOW, if the above scenario is possible, does the per-cpu acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow prevented by the implementation of the zpools? Just thought I would bring up these open questions. Please do share your thoughts and advise. Thanks, Kanchana > > > > > Thanks, > > Kanchana > > > > > > > > > + > > > > + mutex_unlock(&acomp_ctx->mutex); > > > > } > > > > > > > > /********************************* > > > > > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > > > -- > > > > 2.27.0 > > > >
On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > Hi Yosry, > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Tuesday, November 12, 2024 10:22 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > > hannes@cmpxchg.org; nphamcs@gmail.com; > > chengming.zhou@linux.dev; > > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > foundation.org; > > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > > <vinodh.gopal@intel.com> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > > the existing zswap_decompress(): > > > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > Releasing the lock before the conditional does not protect the integrity > > of > > > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > > > > risk for the conditional behaving as intended, and consequently not > > > > > unmapping the zpool handle, which could cause a zswap zpool memory > > > > leak. > > > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional and > > > > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > > > > obtained earlier, with the mutex locked, does not change. > > > > > > > > The commit log is too complicated and incorrect. It is talking about > > > > the stability of 'src', but that's a local variable on the stack > > > > anyway. It doesn't need protection. > > > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after the > > > > mutex is released. Leading to the check not being reliable. Please > > > > simplify this. > > > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > > value changing after the lock is released. This could happen as a result of > > any > > > other compress or decompress that acquires the lock. I will simplify and > > > clarify accordingly. > > > > > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > > like a > > > > > cleaner implementation. > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > > --- > > > > > mm/zswap.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > > --- a/mm/zswap.c > > > > > +++ b/mm/zswap.c > > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > > zswap_entry *entry, struct folio *folio) > > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > > entry- > > > > >length, PAGE_SIZE); > > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > > >req), &acomp_ctx->wait)); > > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > Actually now that I think more about it, I think this check isn't > > > > entirely safe, even under the lock. Is it possible that > > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > > decompression at the same handle? In this case, we will also > > > > mistakenly skip the unmap. > > > > > > If we move the mutex_unlock to happen after the conditional and unmap, > > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > > take on only 2 values in this procedure: the mapped handle or > > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > > > Yes, that's the scenario I mean. > > > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > > happens to be equal to the same handle from a previous operation as we > > don't clear it. > > Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), > we only copy to it. Duh, yes. I confused myself, sorry for the noise. > > > > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > > though it should be true. This will result in an extra > > zpool_unmap_handle() call. I didn't look closely, but this seems like > > it can have a worse effect than leaking memory (e.g. there will be an > > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > > a random handle). > > > > > > > > Please let me know if I am missing anything. > > > > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > > acomp_ctx->buffer and do the unmap, and check that flag here to check > > > > if the unmap was done or not. > > > > > > Sure, this could be done, but not sure if it is required. Please let me know > > > if we still need the boolean variable in addition to moving the > > mutex_unlock(). > > > > If we use a boolean, there is no need to move mutex_unlock(). The > > boolean will be a local variable on the stack that doesn't need > > protection. > > I agree, using the boolean variable to do the unmap rather than the check > for (src != acomp_ctx->buffer) is more fail-safe. > > I am still thinking moving the mutex_unlock() could help, or at least have > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > safeguards the interaction between the decompress operation, the > sg_*() API calls inside zswap_decompress() and the shared zpool. > > If we release the per-cpu acomp_ctx's mutex lock before the > zpool_unmap_handle(), is it possible that another cpu could acquire > it's acomp_ctx's lock and map the same zpool handle (that the earlier > cpu has yet to unmap or is concurrently unmapping) for a write? > If this could happen, would it result in undefined state for both > these zpool ops on different cpu's? Why would this result in an undefined state? For zsmalloc, mapping uses a per-CPU buffer and preemption is disabled between mapping and unmapping anyway. If another CPU maps the object it should be fine. > > Would keeping the per-cpu mutex locked through the > zpool_unmap_handle() create a non-preemptible state that would > avoid this? IOW, if the above scenario is possible, does the per-cpu > acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow > prevented by the implementation of the zpools? At least for zsmalloc, I think it is. > > Just thought I would bring up these open questions. Please do share > your thoughts and advise. I think moving the mutex unlock after the unmap won't make much of a difference from a performance side, at least for zsmalloc. Preemption will be disabled until the unmap is done anyway, so even after we release the per-CPU mutex it cannot be acquired by anyone else until the unmap is done. Anyway, I think the fix you have right now is fine, if you prefer not adding a boolean. If you do add a boolean, whether you move the mutex unlock or not should not make a difference. Please just rewrite the commit log and CC stable (in the commit log, not in the email CC list). Thanks and sorry for the confusion! > > Thanks, > Kanchana > > > > > > > > > Thanks, > > > Kanchana > > > > > > > > > > > > + > > > > > + mutex_unlock(&acomp_ctx->mutex); > > > > > } > > > > > > > > > > /********************************* > > > > > > > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > > > > -- > > > > > 2.27.0 > > > > >
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Wednesday, November 13, 2024 12:11 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > Hi Yosry, > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Tuesday, November 12, 2024 10:22 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > hannes@cmpxchg.org; nphamcs@gmail.com; > chengming.zhou@linux.dev; > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > foundation.org; > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > > > hannes@cmpxchg.org; nphamcs@gmail.com; > > > chengming.zhou@linux.dev; > > > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > foundation.org; > > > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > > > <vinodh.gopal@intel.com> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > zswap_decompress(). > > > > > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > > > the existing zswap_decompress(): > > > > > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > > > Releasing the lock before the conditional does not protect the > integrity > > > of > > > > > > "src", which is set earlier under the acomp_ctx mutex lock. This > poses a > > > > > > risk for the conditional behaving as intended, and consequently not > > > > > > unmapping the zpool handle, which could cause a zswap zpool > memory > > > > > leak. > > > > > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional > and > > > > > > subsequent zpool_unmap_handle(). This ensures that the value of > "src" > > > > > > obtained earlier, with the mutex locked, does not change. > > > > > > > > > > The commit log is too complicated and incorrect. It is talking about > > > > > the stability of 'src', but that's a local variable on the stack > > > > > anyway. It doesn't need protection. > > > > > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after > the > > > > > mutex is released. Leading to the check not being reliable. Please > > > > > simplify this. > > > > > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > > > value changing after the lock is released. This could happen as a result > of > > > any > > > > other compress or decompress that acquires the lock. I will simplify and > > > > clarify accordingly. > > > > > > > > > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > > > like a > > > > > > cleaner implementation. > > > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > > > --- > > > > > > mm/zswap.c | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > > > --- a/mm/zswap.c > > > > > > +++ b/mm/zswap.c > > > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > > > zswap_entry *entry, struct folio *folio) > > > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > > > entry- > > > > > >length, PAGE_SIZE); > > > > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > > > >req), &acomp_ctx->wait)); > > > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > Actually now that I think more about it, I think this check isn't > > > > > entirely safe, even under the lock. Is it possible that > > > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > > > decompression at the same handle? In this case, we will also > > > > > mistakenly skip the unmap. > > > > > > > > If we move the mutex_unlock to happen after the conditional and > unmap, > > > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > > > take on only 2 values in this procedure: the mapped handle or > > > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > > > > > Yes, that's the scenario I mean. > > > > > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > > > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > > > happens to be equal to the same handle from a previous operation as we > > > don't clear it. > > > > Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), > > we only copy to it. > > Duh, yes. I confused myself, sorry for the noise. > > > > > > > > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > > > though it should be true. This will result in an extra > > > zpool_unmap_handle() call. I didn't look closely, but this seems like > > > it can have a worse effect than leaking memory (e.g. there will be an > > > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > > > a random handle). > > > > > > > > > > > Please let me know if I am missing anything. > > > > > > > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > > > acomp_ctx->buffer and do the unmap, and check that flag here to > check > > > > > if the unmap was done or not. > > > > > > > > Sure, this could be done, but not sure if it is required. Please let me know > > > > if we still need the boolean variable in addition to moving the > > > mutex_unlock(). > > > > > > If we use a boolean, there is no need to move mutex_unlock(). The > > > boolean will be a local variable on the stack that doesn't need > > > protection. > > > > I agree, using the boolean variable to do the unmap rather than the check > > for (src != acomp_ctx->buffer) is more fail-safe. > > > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > Why would this result in an undefined state? For zsmalloc, mapping > uses a per-CPU buffer and preemption is disabled between mapping and > unmapping anyway. If another CPU maps the object it should be fine. > > > > > Would keeping the per-cpu mutex locked through the > > zpool_unmap_handle() create a non-preemptible state that would > > avoid this? IOW, if the above scenario is possible, does the per-cpu > > acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow > > prevented by the implementation of the zpools? > > At least for zsmalloc, I think it is. > > > > > Just thought I would bring up these open questions. Please do share > > your thoughts and advise. > > I think moving the mutex unlock after the unmap won't make much of a > difference from a performance side, at least for zsmalloc. Preemption > will be disabled until the unmap is done anyway, so even after we > release the per-CPU mutex it cannot be acquired by anyone else until > the unmap is done. > > Anyway, I think the fix you have right now is fine, if you prefer not > adding a boolean. If you do add a boolean, whether you move the mutex > unlock or not should not make a difference. Thanks for the guidance on next steps! I will add the boolean and move the mutex. > > Please just rewrite the commit log and CC stable (in the commit log, > not in the email CC list). I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log (as found from the latest mainline git log) ? Thanks again, Kanchana
[..] > > > > Please just rewrite the commit log and CC stable (in the commit log, > > not in the email CC list). > > I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log > (as found from the latest mainline git log) ? Yes, thanks! > > Thanks again, > Kanchana >
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Wednesday, November 13, 2024 1:00 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > [..] > > > > > > Please just rewrite the commit log and CC stable (in the commit log, > > > not in the email CC list). > > > > I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log > > (as found from the latest mainline git log) ? > > Yes, thanks! Sounds good, thanks Yosry! > > > > > Thanks again, > > Kanchana > >
On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > I am still thinking moving the mutex_unlock() could help, or at least have > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > safeguards the interaction between the decompress operation, the > sg_*() API calls inside zswap_decompress() and the shared zpool. > > If we release the per-cpu acomp_ctx's mutex lock before the > zpool_unmap_handle(), is it possible that another cpu could acquire > it's acomp_ctx's lock and map the same zpool handle (that the earlier > cpu has yet to unmap or is concurrently unmapping) for a write? > If this could happen, would it result in undefined state for both > these zpool ops on different cpu's? The code is fine as is. Like you said, acomp_ctx->buffer (the pointer) doesn't change. It points to whatever was kmalloced in zswap_cpu_comp_prepare(). The handle points to backend memory. Neither of those addresses can change under us. There is no confusing them, and they cannot coincide. The mutex guards the *memory* behind the buffer, so that we don't have multiple (de)compressors stepping on each others' toes. But it's fine to drop the mutex once we're done working with the memory. We don't need the mutex to check whether src holds the acomp buffer address. That being said, I do think there is a UAF bug in CPU hotplugging. There is an acomp_ctx for each cpu, but note that this is best effort parallelism, not a guarantee that we always have the context of the local CPU. Look closely: we pick the "local" CPU with preemption enabled, then contend for the mutex. This may well put us to sleep and get us migrated, so we could be using the context of a CPU we are no longer running on. This is fine because we hold the mutex - if that other CPU tries to use the acomp_ctx, it'll wait for us. However, if we get migrated and vacate the CPU whose context we have locked, the CPU might get offlined and zswap_cpu_comp_dead() can free the context underneath us. I think we need to refcount the acomp_ctx.
On Wed, Nov 13, 2024 at 1:30 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > The code is fine as is. > > Like you said, acomp_ctx->buffer (the pointer) doesn't change. It > points to whatever was kmalloced in zswap_cpu_comp_prepare(). The > handle points to backend memory. Neither of those addresses can change > under us. There is no confusing them, and they cannot coincide. > > The mutex guards the *memory* behind the buffer, so that we don't have > multiple (de)compressors stepping on each others' toes. But it's fine > to drop the mutex once we're done working with the memory. We don't > need the mutex to check whether src holds the acomp buffer address. I have to admit that I confused myself with this alleged bug more than I like to admit :) I initially thought acomp_ctx->buffer can be changed, then when I realized it cannot be changed I did not tie that back to the 'fix' not being needed at all. I need more coffee. > > That being said, I do think there is a UAF bug in CPU hotplugging. > > There is an acomp_ctx for each cpu, but note that this is best effort > parallelism, not a guarantee that we always have the context of the > local CPU. Look closely: we pick the "local" CPU with preemption > enabled, then contend for the mutex. This may well put us to sleep and > get us migrated, so we could be using the context of a CPU we are no > longer running on. This is fine because we hold the mutex - if that > other CPU tries to use the acomp_ctx, it'll wait for us. > > However, if we get migrated and vacate the CPU whose context we have > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > the context underneath us. I think we need to refcount the acomp_ctx.
> -----Original Message----- > From: Johannes Weiner <hannes@cmpxchg.org> > Sent: Wednesday, November 13, 2024 1:30 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Yosry Ahmed <yosryahmed@google.com>; linux-kernel@vger.kernel.org; > linux-mm@kvack.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > The code is fine as is. > > Like you said, acomp_ctx->buffer (the pointer) doesn't change. It > points to whatever was kmalloced in zswap_cpu_comp_prepare(). The > handle points to backend memory. Neither of those addresses can change > under us. There is no confusing them, and they cannot coincide. > > The mutex guards the *memory* behind the buffer, so that we don't have > multiple (de)compressors stepping on each others' toes. But it's fine > to drop the mutex once we're done working with the memory. We don't > need the mutex to check whether src holds the acomp buffer address. Thanks Johannes, for these insights. I was thinking of the following in zswap_decompress() as creating a non-preemptible context because of the call to raw_cpu_ptr() at the start; with this context extending until the mutex_unlock(): acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); mutex_lock(&acomp_ctx->mutex); [...] mutex_unlock(&acomp_ctx->mutex); if (src != acomp_ctx->buffer) zpool_unmap_handle(zpool, entry->handle); Based on this understanding, I was a bit worried about the "acomp_ctx->buffer" in the conditional that gates the zpool_unmap_handle() not being the same acomp_ctx as the one at the beginning. I may have been confusing myself, since the acomp_ctx is not re-evaluated before the conditional, just reused from the start. My apologies to you and Yosry! > > That being said, I do think there is a UAF bug in CPU hotplugging. > > There is an acomp_ctx for each cpu, but note that this is best effort > parallelism, not a guarantee that we always have the context of the > local CPU. Look closely: we pick the "local" CPU with preemption > enabled, then contend for the mutex. This may well put us to sleep and > get us migrated, so we could be using the context of a CPU we are no > longer running on. This is fine because we hold the mutex - if that > other CPU tries to use the acomp_ctx, it'll wait for us. > > However, if we get migrated and vacate the CPU whose context we have > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > the context underneath us. I think we need to refcount the acomp_ctx. I see. Wouldn't it then seem to make the code more fail-safe to not allow the migration to happen until after the check for (src != acomp_ctx->buffer), by moving the mutex_unlock() after this check? Or, use a boolean to determine if the unmap_handle needs to be done as Yosry suggested? Thanks, Kanchana
On Wed, Nov 13, 2024 at 2:13 PM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > Thanks Johannes, for these insights. I was thinking of the following > in zswap_decompress() as creating a non-preemptible context because > of the call to raw_cpu_ptr() at the start; with this context extending > until the mutex_unlock(): > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > mutex_lock(&acomp_ctx->mutex); > > [...] > > mutex_unlock(&acomp_ctx->mutex); > > if (src != acomp_ctx->buffer) > zpool_unmap_handle(zpool, entry->handle); > > Based on this understanding, I was a bit worried about the > "acomp_ctx->buffer" in the conditional that gates the > zpool_unmap_handle() not being the same acomp_ctx as the one > at the beginning. I may have been confusing myself, since the acomp_ctx > is not re-evaluated before the conditional, just reused from the > start. My apologies to you and Yosry! > > > > > That being said, I do think there is a UAF bug in CPU hotplugging. > > > > There is an acomp_ctx for each cpu, but note that this is best effort > > parallelism, not a guarantee that we always have the context of the > > local CPU. Look closely: we pick the "local" CPU with preemption > > enabled, then contend for the mutex. This may well put us to sleep and > > get us migrated, so we could be using the context of a CPU we are no > > longer running on. This is fine because we hold the mutex - if that > > other CPU tries to use the acomp_ctx, it'll wait for us. > > > > However, if we get migrated and vacate the CPU whose context we have > > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > > the context underneath us. I think we need to refcount the acomp_ctx. > > I see. Wouldn't it then seem to make the code more fail-safe to not allow > the migration to happen until after the check for (src != acomp_ctx->buffer), by > moving the mutex_unlock() after this check? Or, use a boolean to determine > if the unmap_handle needs to be done as Yosry suggested? Hmm does it make it safe? It is mutex_lock() that puts the task to sleep, after which it can get migrated to a different CPU. Moving mutex_unlock() to below or not doesn't really matter, no? Or am I missing something here... I think Johannes' proposal is the safest :)
> -----Original Message----- > From: Nhat Pham <nphamcs@gmail.com> > Sent: Wednesday, November 13, 2024 4:29 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org>; Yosry Ahmed > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 2:13 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > Thanks Johannes, for these insights. I was thinking of the following > > in zswap_decompress() as creating a non-preemptible context because > > of the call to raw_cpu_ptr() at the start; with this context extending > > until the mutex_unlock(): > > > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > > mutex_lock(&acomp_ctx->mutex); > > > > [...] > > > > mutex_unlock(&acomp_ctx->mutex); > > > > if (src != acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > > > Based on this understanding, I was a bit worried about the > > "acomp_ctx->buffer" in the conditional that gates the > > zpool_unmap_handle() not being the same acomp_ctx as the one > > at the beginning. I may have been confusing myself, since the acomp_ctx > > is not re-evaluated before the conditional, just reused from the > > start. My apologies to you and Yosry! > > > > > > > > That being said, I do think there is a UAF bug in CPU hotplugging. > > > > > > There is an acomp_ctx for each cpu, but note that this is best effort > > > parallelism, not a guarantee that we always have the context of the > > > local CPU. Look closely: we pick the "local" CPU with preemption > > > enabled, then contend for the mutex. This may well put us to sleep and > > > get us migrated, so we could be using the context of a CPU we are no > > > longer running on. This is fine because we hold the mutex - if that > > > other CPU tries to use the acomp_ctx, it'll wait for us. > > > > > > However, if we get migrated and vacate the CPU whose context we have > > > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > > > the context underneath us. I think we need to refcount the acomp_ctx. > > > > I see. Wouldn't it then seem to make the code more fail-safe to not allow > > the migration to happen until after the check for (src != acomp_ctx->buffer), > by > > moving the mutex_unlock() after this check? Or, use a boolean to determine > > if the unmap_handle needs to be done as Yosry suggested? > > Hmm does it make it safe? It is mutex_lock() that puts the task to > sleep, after which it can get migrated to a different CPU. Moving > mutex_unlock() to below or not doesn't really matter, no? Or am I > missing something here... Thanks for the comments, Nhat. I guess my last comment in response to what Johannes mentioned about the UAF situation, is the one that's still an open from my perspective. If the process can get migrated after the mutex unlock, and if the acomp_ctx obtained earlier gets deleted (as Johannes was mentioning), then we could be accessing an invalid pointer in the "if (src != acomp_ctx->buffer)". So my question was, can we prevent the migration to a different cpu by relinquishing the mutex lock after this conditional; or, make the conditional be determined by a boolean that gets set earlier to decide whether or not to unmap the zpool handle (as against using the acomp_ctx to decide this). Thanks, Kanchana > > I think Johannes' proposal is the safest :)
On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: > So my question was, can we prevent the migration to a different cpu > by relinquishing the mutex lock after this conditional Holding the mutex doesn't prevent preemption/migration.
> -----Original Message----- > From: Johannes Weiner <hannes@cmpxchg.org> > Sent: Wednesday, November 13, 2024 9:12 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: > > So my question was, can we prevent the migration to a different cpu > > by relinquishing the mutex lock after this conditional > > Holding the mutex doesn't prevent preemption/migration. Sure, however, is this also applicable to holding the mutex of a per-cpu structure obtained via raw_cpu_ptr()? Would holding the mutex prevent the acomp_ctx of the cpu prior to the migration (in the UAF scenario you described) from being deleted? If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the UAF, I agree, we might need a way to prevent the acomp_ctx from being deleted, e.g. with refcounts as you've suggested, or to not use the acomp_ctx at all for the check, instead use a boolean. Thanks, Kanchana
diff --git a/mm/zswap.c b/mm/zswap.c index f6316b66fb23..58810fa8ff23 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -986,10 +986,11 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio) acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE); BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait)); BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); - mutex_unlock(&acomp_ctx->mutex); if (src != acomp_ctx->buffer) zpool_unmap_handle(zpool, entry->handle); + + mutex_unlock(&acomp_ctx->mutex); } /*********************************
This is a hotfix for a potential zpool memory leak that could result in the existing zswap_decompress(): mutex_unlock(&acomp_ctx->mutex); if (src != acomp_ctx->buffer) zpool_unmap_handle(zpool, entry->handle); Releasing the lock before the conditional does not protect the integrity of "src", which is set earlier under the acomp_ctx mutex lock. This poses a risk for the conditional behaving as intended, and consequently not unmapping the zpool handle, which could cause a zswap zpool memory leak. This patch moves the mutex_unlock() to occur after the conditional and subsequent zpool_unmap_handle(). This ensures that the value of "src" obtained earlier, with the mutex locked, does not change. Even though an actual memory leak was not observed, this fix seems like a cleaner implementation. Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") --- mm/zswap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c