Message ID | 20190513175521.84955-1-rrangel@chromium.org (mailing list archive) |
---|---|
Headers | show |
Series | mmc: Fix a potential resource leak when shutting down request queue. | expand |
On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote: >I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de >https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a >potential resource leak when shutting down the request queue. > >Once this patch is applied, there is a potential for a null pointer dereference. >That's what the second patch fixes. > >The third patch is just an optimization to stop processing earlier. Is this actually part of a fix? Why do we want this optimization? -- Thanks, Sasha
On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote: > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a > potential resource leak when shutting down the request queue. Potential meaning "it does happen", or "it can happen if we do this", or just "maybe it might happen, we really do not know?" > Once this patch is applied, there is a potential for a null pointer dereference. > That's what the second patch fixes. What is the git id of that upstream fix? > The third patch is just an optimization to stop processing earlier. That's not how stable kernels work :( > See https://patchwork.kernel.org/patch/10925469/ for the initial motivation. I don't understand the motivation from that link at all :( > This commit applies to v4.14.116. It is already included in 4.19. 4.19 doesn't > suffer from the null pointer dereference because later commits migrate the mmc > stack to blk-mq. What are those later commits? > I tested this patch set by randomly connecting/disconnecting the SD > card. I got over 189650 itarations without a problem. And if you do not have these patches, on 4.14.y, how many iterations cause a problem? If you just apply the first patch, does that work? _EVERY_ time we take a patch that is not upstream, something usually is broken and needs to be fixed. We have a long long long history of this, so if you want to have a patch that is not upstream applied to a stable kernel release, you need a whole lot of justification and explanation and begging. And you need to be around to fix the fallout for when it breaks :) thanks, greg k-h
On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote: > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote: > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a > > potential resource leak when shutting down the request queue. > > Potential meaning "it does happen", or "it can happen if we do this", or > just "maybe it might happen, we really do not know?" It does happen if the AMD SDHCI patches are cherry-picked into 4.14. https://lkml.org/lkml/2019/5/1/398 It can be mitigated by changing the line in the patch with `mmc_detect_change(host, 0)` to mmc_detect_change(host, 200)`, but that's just a workaround to play with the timing so the race condition doesn't happen. > > > Once this patch is applied, there is a potential for a null pointer dereference. > > That's what the second patch fixes. > > What is the git id of that upstream fix? So there is no specific upstream fix. There was a large patch set that migrated mmc to using blk-mq, so the bug just kind of went away. https://lwn.net/Articles/739774/ or 0fbfd12518303e9b32ac9fd231439459eac848f9 > > > The third patch is just an optimization to stop processing earlier. > > That's not how stable kernels work :( Oops, I guess we can ignore that patch. It just prevents mmc_init_request from being called, but it doesn't matter since the 2nd patch actually checks for NULL now. > > > See https://patchwork.kernel.org/patch/10925469/ for the initial motivation. > > I don't understand the motivation from that link at all :( > > > This commit applies to v4.14.116. It is already included in 4.19. 4.19 doesn't > > suffer from the null pointer dereference because later commits migrate the mmc > > stack to blk-mq. > > What are those later commits? Commit 0fbfd12518303e9b32ac9fd231439459eac848f9 specifically deletes the code for the 2nd patch. As I said above, the NULL pointer dereference just kind of went away as part of the blk-mq migration, so there is no upstream fix :( > > > I tested this patch set by randomly connecting/disconnecting the SD > > card. I got over 189650 itarations without a problem. > > And if you do not have these patches, on 4.14.y, how many iterations > cause a problem? If you just apply the first patch, does that work? If I apply the AMD SDHCI patches and nothing else, then I can cause a resource leak within 10 iterations. If I apply just the first patch then I can cause a NULL pointer error within 10 iterations. If I apply both 1 and 2, then everything works as expected and I can't cause a problem. > > _EVERY_ time we take a patch that is not upstream, something usually is > broken and needs to be fixed. We have a long long long history of this, > so if you want to have a patch that is not upstream applied to a stable > kernel release, you need a whole lot of justification and explanation > and begging. And you need to be around to fix the fallout for when it > breaks :) It also looks like 2361bfb055f948eac6583fa3c75a014da84fe554 includes a fix for 41e3efd07d5a02c80f503e29d755aa1bbb4245de, so that would need to be cherry picked in. I guess I should have included a fixes: line in my second patch. So to summarize: - cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de - cherry-pick 2361bfb055f948eac6583fa3c75a014da84fe554 - apply 2nd patch but add to commit message: Fixes: 41e3efd07d5a ("mmc: block: Simplify cleaning up the queue") - Ignore patch 3 since it's an optimization. > > thanks, > > greg k-h Thanks again for your time, and sorry for the really late response! Raul
On Wed, Jun 19, 2019 at 10:46:25AM -0600, Raul Rangel wrote: > On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote: > > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote: > > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de > > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a > > > potential resource leak when shutting down the request queue. > > > > Potential meaning "it does happen", or "it can happen if we do this", or > > just "maybe it might happen, we really do not know?" > It does happen if the AMD SDHCI patches are cherry-picked into 4.14. > https://lkml.org/lkml/2019/5/1/398 Why are those patches somehow being required to be added to 4.14.y? If they are not added, is all fine? thanks, greg k-h
On Wed, Jun 19, 2019 at 07:09:17PM +0200, Greg Kroah-Hartman wrote: > On Wed, Jun 19, 2019 at 10:46:25AM -0600, Raul Rangel wrote: > > On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote: > > > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote: > > > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de > > > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a > > > > potential resource leak when shutting down the request queue. > > > > > > Potential meaning "it does happen", or "it can happen if we do this", or > > > just "maybe it might happen, we really do not know?" > > It does happen if the AMD SDHCI patches are cherry-picked into 4.14. > > https://lkml.org/lkml/2019/5/1/398 > > Why are those patches somehow being required to be added to 4.14.y? If > they are not added, is all fine? I was just thinking we would backport the patches to fix this AMD SDHCI hardware bug, but I guess we don't need to. Thanks
On Wed, Jun 19, 2019 at 12:23:04PM -0600, Raul Rangel wrote: > On Wed, Jun 19, 2019 at 07:09:17PM +0200, Greg Kroah-Hartman wrote: > > On Wed, Jun 19, 2019 at 10:46:25AM -0600, Raul Rangel wrote: > > > On Tue, May 14, 2019 at 11:19:34AM +0200, Greg Kroah-Hartman wrote: > > > > On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote: > > > > > I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de > > > > > https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a > > > > > potential resource leak when shutting down the request queue. > > > > > > > > Potential meaning "it does happen", or "it can happen if we do this", or > > > > just "maybe it might happen, we really do not know?" > > > It does happen if the AMD SDHCI patches are cherry-picked into 4.14. > > > https://lkml.org/lkml/2019/5/1/398 > > > > Why are those patches somehow being required to be added to 4.14.y? If > > they are not added, is all fine? > I was just thinking we would backport the patches to fix this AMD SDHCI > hardware bug, but I guess we don't need to. Has anyone asked for those to be backported? Does anyone require them to be? What's keeping users from using a newer kernel that have this specific hardware issue? Trying to apply patches to a stable kernel due to an issue that is not even in that stable kernel is crazy. No wonder I am totally confused... thanks, greg k-h