[stable/4.14.y,0/3] mmc: Fix a potential resource leak when shutting down request queue.

Message ID	20190513175521.84955-1-rrangel@chromium.org (mailing list archive)
Headers	show Return-Path: <linux-mmc-owner@kernel.org> From: Raul E Rangel <rrangel@chromium.org> To: stable@vger.kernel.org Cc: linux-mmc@vger.kernel.org, djkurtz@google.com, adrian.hunter@intel.com, zwisler@chromium.org, Raul E Rangel <rrangel@chromium.org>, Linus Walleij <linus.walleij@linaro.org>, linux-kernel@vger.kernel.org, Chris Boot <bootc@bootc.net>, =?utf-8?b?Q2zDqW1lbnQgUMOpcm9u?= <peron.clem@gmail.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Ulf Hansson <ulf.hansson@linaro.org> Subject: [stable/4.14.y PATCH 0/3] mmc: Fix a potential resource leak when shutting down request queue. Date: Mon, 13 May 2019 11:55:18 -0600 Message-Id: <20190513175521.84955-1-rrangel@chromium.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk
Series	mmc: Fix a potential resource leak when shutting down request queue. \| expand [stable/4.14.y,0/3] mmc: Fix a potential resource leak when shutting down request queue. [stable/4.14.y,1/3] mmc: block: Simplify cleaning up the queue [stable/4.14.y,2/3] mmc: Fix null pointer dereference in mmc_init_request [stable/4.14.y,3/3] mmc: Kill the request if the queuedata has been removed

Raul Rangel May 13, 2019, 5:55 p.m. UTC

I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
potential resource leak when shutting down the request queue.

Once this patch is applied, there is a potential for a null pointer dereference.
That's what the second patch fixes.

The third patch is just an optimization to stop processing earlier.

See https://patchwork.kernel.org/patch/10925469/ for the initial motivation.

This commit applies to v4.14.116. It is already included in 4.19. 4.19 doesn't
suffer from the null pointer dereference because later commits migrate the mmc
stack to blk-mq.

I tested this patch set by randomly connecting/disconnecting the SD
card. I got over 189650 itarations without a problem.

Thanks,
Raul


Adrian Hunter (1):
  mmc: block: Simplify cleaning up the queue

Raul E Rangel (2):
  mmc: Fix null pointer dereference in mmc_init_request
  mmc: Kill the request if the queuedata has been removed

 drivers/mmc/core/block.c | 17 ++++++++++++-----
 drivers/mmc/core/queue.c | 14 +++++++++++---
 2 files changed, 23 insertions(+), 8 deletions(-)

Sasha Levin May 14, 2019, 12:43 a.m. UTC | #1

On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
>I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
>https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
>potential resource leak when shutting down the request queue.
>
>Once this patch is applied, there is a potential for a null pointer dereference.
>That's what the second patch fixes.
>
>The third patch is just an optimization to stop processing earlier.

Is this actually part of a fix? Why do we want this optimization?

--
Thanks,
Sasha

Greg Kroah-Hartman May 14, 2019, 9:19 a.m. UTC | #2

On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
> I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
> https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
> potential resource leak when shutting down the request queue.

Potential meaning "it does happen", or "it can happen if we do this", or
just "maybe it might happen, we really do not know?"

> Once this patch is applied, there is a potential for a null pointer dereference.
> That's what the second patch fixes.

What is the git id of that upstream fix?

> The third patch is just an optimization to stop processing earlier.

That's not how stable kernels work :(

> See https://patchwork.kernel.org/patch/10925469/ for the initial motivation.

I don't understand the motivation from that link at all :(

> This commit applies to v4.14.116. It is already included in 4.19. 4.19 doesn't
> suffer from the null pointer dereference because later commits migrate the mmc
> stack to blk-mq.

What are those later commits?

> I tested this patch set by randomly connecting/disconnecting the SD
> card. I got over 189650 itarations without a problem.

And if you do not have these patches, on 4.14.y, how many iterations
cause a problem?  If you just apply the first patch, does that work?

_EVERY_ time we take a patch that is not upstream, something usually is
broken and needs to be fixed.  We have a long long long history of this,
so if you want to have a patch that is not upstream applied to a stable
kernel release, you need a whole lot of justification and explanation
and begging.  And you need to be around to fix the fallout for when it
breaks :)

thanks,

greg k-h

Raul Rangel June 19, 2019, 4:46 p.m. UTC | #3

On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote:
> On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
> > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
> > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
> > potential resource leak when shutting down the request queue.
> 
> Potential meaning "it does happen", or "it can happen if we do this", or
> just "maybe it might happen, we really do not know?"
It does happen if the AMD SDHCI patches are cherry-picked into 4.14.
https://lkml.org/lkml/2019/5/1/398
It can be mitigated by changing the line in the patch with
`mmc_detect_change(host, 0)` to mmc_detect_change(host, 200)`, but
that's just a workaround to play with the timing so the race condition
doesn't happen.
> 
> > Once this patch is applied, there is a potential for a null pointer dereference.
> > That's what the second patch fixes.
> 
> What is the git id of that upstream fix?
So there is no specific upstream fix. There was a large patch set that
migrated mmc to using blk-mq, so the bug just kind of went away.
https://lwn.net/Articles/739774/ or 0fbfd12518303e9b32ac9fd231439459eac848f9

> 
> > The third patch is just an optimization to stop processing earlier.
> 
> That's not how stable kernels work :(
Oops, I guess we can ignore that patch. It just prevents mmc_init_request
from being called, but it doesn't matter since the 2nd patch actually
checks for NULL now.
> 
> > See https://patchwork.kernel.org/patch/10925469/ for the initial motivation.
> 
> I don't understand the motivation from that link at all :(
> 
> > This commit applies to v4.14.116. It is already included in 4.19. 4.19 doesn't
> > suffer from the null pointer dereference because later commits migrate the mmc
> > stack to blk-mq.
> 
> What are those later commits?
Commit 0fbfd12518303e9b32ac9fd231439459eac848f9 specifically deletes the
code for the 2nd patch. As I said above, the NULL pointer dereference
just kind of went away as part of the blk-mq migration, so there is no
upstream fix :(

> 
> > I tested this patch set by randomly connecting/disconnecting the SD
> > card. I got over 189650 itarations without a problem.
> 
> And if you do not have these patches, on 4.14.y, how many iterations
> cause a problem?  If you just apply the first patch, does that work?
If I apply the AMD SDHCI patches and nothing else, then I can cause a
resource leak within 10 iterations. If I apply just the first patch then
I can cause a NULL pointer error within 10 iterations. If I apply both 1
and 2, then everything works as expected and I can't cause a problem.
> 
> _EVERY_ time we take a patch that is not upstream, something usually is
> broken and needs to be fixed.  We have a long long long history of this,
> so if you want to have a patch that is not upstream applied to a stable
> kernel release, you need a whole lot of justification and explanation
> and begging.  And you need to be around to fix the fallout for when it
> breaks :)

It also looks like 2361bfb055f948eac6583fa3c75a014da84fe554 includes a
fix for 41e3efd07d5a02c80f503e29d755aa1bbb4245de, so that would need to
be cherry picked in.

I guess I should have included a fixes: line in my second patch.

So to summarize:
- cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
- cherry-pick 2361bfb055f948eac6583fa3c75a014da84fe554
- apply 2nd patch but add to commit message:
  Fixes: 41e3efd07d5a ("mmc: block: Simplify cleaning up the queue")
- Ignore patch 3 since it's an optimization.
> 
> thanks,
> 
> greg k-h
Thanks again for your time, and sorry for the really late response!

Raul

Greg Kroah-Hartman June 19, 2019, 5:09 p.m. UTC | #4

On Wed, Jun 19, 2019 at 10:46:25AM -0600, Raul Rangel wrote:
> On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
> > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
> > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
> > > potential resource leak when shutting down the request queue.
> > 
> > Potential meaning "it does happen", or "it can happen if we do this", or
> > just "maybe it might happen, we really do not know?"
> It does happen if the AMD SDHCI patches are cherry-picked into 4.14.
> https://lkml.org/lkml/2019/5/1/398

Why are those patches somehow being required to be added to 4.14.y?  If
they are not added, is all fine?

thanks,

greg k-h

Raul Rangel June 19, 2019, 6:23 p.m. UTC | #5

On Wed, Jun 19, 2019 at 07:09:17PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Jun 19, 2019 at 10:46:25AM -0600, Raul Rangel wrote:
> > On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
> > > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
> > > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
> > > > potential resource leak when shutting down the request queue.
> > > 
> > > Potential meaning "it does happen", or "it can happen if we do this", or
> > > just "maybe it might happen, we really do not know?"
> > It does happen if the AMD SDHCI patches are cherry-picked into 4.14.
> > https://lkml.org/lkml/2019/5/1/398
> 
> Why are those patches somehow being required to be added to 4.14.y?  If
> they are not added, is all fine?
I was just thinking we would backport the patches to fix this AMD SDHCI
hardware bug, but I guess we don't need to.

Thanks

Greg Kroah-Hartman June 19, 2019, 6:36 p.m. UTC | #6

On Wed, Jun 19, 2019 at 12:23:04PM -0600, Raul Rangel wrote:
> On Wed, Jun 19, 2019 at 07:09:17PM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Jun 19, 2019 at 10:46:25AM -0600, Raul Rangel wrote:
> > > On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
> > > > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
> > > > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
> > > > > potential resource leak when shutting down the request queue.
> > > > 
> > > > Potential meaning "it does happen", or "it can happen if we do this", or
> > > > just "maybe it might happen, we really do not know?"
> > > It does happen if the AMD SDHCI patches are cherry-picked into 4.14.
> > > https://lkml.org/lkml/2019/5/1/398
> > 
> > Why are those patches somehow being required to be added to 4.14.y?  If
> > they are not added, is all fine?
> I was just thinking we would backport the patches to fix this AMD SDHCI
> hardware bug, but I guess we don't need to.

Has anyone asked for those to be backported?  Does anyone require them
to be?  What's keeping users from using a newer kernel that have this
specific hardware issue?

Trying to apply patches to a stable kernel due to an issue that is not
even in that stable kernel is crazy.  No wonder I am totally confused...

thanks,

greg k-h

[stable/4.14.y,0/3] mmc: Fix a potential resource leak when shutting down request queue.

Message

Comments