Message ID | 878ulaxn6d.fsf@nemi.mork.no (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote: > > --=-=-= > Content-Type: text/plain; charset=utf-8 > Content-Transfer-Encoding: quoted-printable > > "Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > > > > I would suspect one of these commits: > > > > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions > > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node > > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws= > usp_free() > > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree > > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function > > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap > > > > so I guess you can start from checking them (the topmpost one is the late= > st). > > Thanks. Yes, you were correct. The bad commit is > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu= > sp_free() > > I have confirmed that reverting only this commit on top of a clean > v3.17-rc6 fixes the problem. I am attaching the context-modified revert > patch I used. OK, thanks! Joerg, can you please have a look at this? > --=-=-= > Content-Type: text/x-diff > Content-Disposition: inline; > filename=0001-Revert-PM-Hibernate-Iterate-over-set-bits-instead-of.patch > > From 92950fd86c2f74ae17840bfc15651b6ae77e43df Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@mork.no> > Date: Tue, 23 Sep 2014 19:18:43 +0200 > Subject: [PATCH] Revert "PM / Hibernate: Iterate over set bits instead of > PFNs in swsusp_free()" > > This reverts commit 6efde38f07690652bf0d93f5e4f1a5f496574806. > > Conflicts: > kernel/power/snapshot.c > --- > kernel/power/snapshot.c | 50 ++++++++++++++--------------------------------- > 1 file changed, 15 insertions(+), 35 deletions(-) > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > index c4b8093c80b3..f1604d8cf489 100644 > --- a/kernel/power/snapshot.c > +++ b/kernel/power/snapshot.c > @@ -725,14 +725,6 @@ static void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn) > clear_bit(bit, addr); > } > > -static void memory_bm_clear_current(struct memory_bitmap *bm) > -{ > - int bit; > - > - bit = max(bm->cur.node_bit - 1, 0); > - clear_bit(bit, bm->cur.node->data); > -} > - > static int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn) > { > void *addr; > @@ -1341,35 +1333,23 @@ static struct memory_bitmap copy_bm; > > void swsusp_free(void) > { > - unsigned long fb_pfn, fr_pfn; > - > - memory_bm_position_reset(forbidden_pages_map); > - memory_bm_position_reset(free_pages_map); > - > -loop: > - fr_pfn = memory_bm_next_pfn(free_pages_map); > - fb_pfn = memory_bm_next_pfn(forbidden_pages_map); > - > - /* > - * Find the next bit set in both bitmaps. This is guaranteed to > - * terminate when fb_pfn == fr_pfn == BM_END_OF_MAP. > - */ > - do { > - if (fb_pfn < fr_pfn) > - fb_pfn = memory_bm_next_pfn(forbidden_pages_map); > - if (fr_pfn < fb_pfn) > - fr_pfn = memory_bm_next_pfn(free_pages_map); > - } while (fb_pfn != fr_pfn); > - > - if (fr_pfn != BM_END_OF_MAP && pfn_valid(fr_pfn)) { > - struct page *page = pfn_to_page(fr_pfn); > + struct zone *zone; > + unsigned long pfn, max_zone_pfn; > > - memory_bm_clear_current(forbidden_pages_map); > - memory_bm_clear_current(free_pages_map); > - __free_page(page); > - goto loop; > + for_each_populated_zone(zone) { > + max_zone_pfn = zone_end_pfn(zone); > + for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) > + if (pfn_valid(pfn)) { > + struct page *page = pfn_to_page(pfn); > + > + if (swsusp_page_is_forbidden(page) && > + swsusp_page_is_free(page)) { > + swsusp_unset_page_forbidden(page); > + swsusp_unset_page_free(page); > + __free_page(page); > + } > + } > } > - > nr_copy_pages = 0; > nr_meta_pages = 0; > restore_pblist = NULL; >
On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote: > > --=-=-= > Content-Type: text/plain; charset=utf-8 > Content-Transfer-Encoding: quoted-printable > > "Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > > > > I would suspect one of these commits: > > > > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions > > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node > > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws= > usp_free() > > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree > > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function > > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap > > > > so I guess you can start from checking them (the topmpost one is the late= > st). > > Thanks. Yes, you were correct. The bad commit is > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu= > sp_free() > > I have confirmed that reverting only this commit on top of a clean > v3.17-rc6 fixes the problem. I am attaching the context-modified revert > patch I used. Instead of reverting the commit, can you please check if adding an "if (page)" check around memory_bm_clear_current(forbidden_pages_map); memory_bm_clear_current(free_pages_map); __free_page(page); in swsusp_free() makes the NULL pointer deref go away? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote: >> >> --=-=-= >> Content-Type: text/plain; charset=utf-8 >> Content-Transfer-Encoding: quoted-printable >> >> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes: >> >> >> > I would suspect one of these commits: >> > >> > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions >> > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node >> > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation >> > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws= >> usp_free() >> > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree >> > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function >> > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap >> > >> > so I guess you can start from checking them (the topmpost one is the late= >> st). >> >> Thanks. Yes, you were correct. The bad commit is >> >> 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu= >> sp_free() >> >> I have confirmed that reverting only this commit on top of a clean >> v3.17-rc6 fixes the problem. I am attaching the context-modified revert >> patch I used. > > Instead of reverting the commit, can you please check if adding an "if (page)" > check around > > memory_bm_clear_current(forbidden_pages_map); > memory_bm_clear_current(free_pages_map); > __free_page(page); > > in swsusp_free() makes the NULL pointer deref go away? Tested. But on a hunch, I also added this just to be sure: @@ -1343,7 +1343,15 @@ void swsusp_free(void) { unsigned long fb_pfn, fr_pfn; +WARN_ON(!forbidden_pages_map); +if (!forbidden_pages_map) + return; + memory_bm_position_reset(forbidden_pages_map); +WARN_ON(!free_pages_map); +if (!free_pages_map) + return; + memory_bm_position_reset(free_pages_map); And got: [ 34.500851] WARNING: CPU: 1 PID: 5052 at kernel/power/snapshot.c:1346 swsusp_free+0x25/0x160() [ 34.500854] Modules linked in: xt_multiport iptable_filter ip_tables dm_mod bnep xt_hl nf_log_ipv6 nf_log_common xt_LOG binfmt_misc ip6table_filter ip6_tables x_tables nfsd nfs_acl nfs lockd fscache sunrpc 8021q garp stp mrp llc tun loop fuse snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support arc4 coretemp kvm_intel cdc_mbim kvm cdc_wdm uvcvideo cdc_ncm videobuf2_vmalloc evdev videobuf2_memops videobuf2_core psmouse cdc_acm v4l2_common usbnet mii videodev serio_raw i2c_i801 iwlmvm snd_hda_intel snd_hda_controller mac80211 snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss iwlwifi i915 lpc_ich i2c_algo_bit mfd_core drm_kms_helper snd_pcm cfg80211 snd_timer drm wmi thinkpad_acpi nvram snd soundcore battery ecb ac btusb bluetooth video rfkill button acpi_cpufreq processor ext4 crc16 jbd2 mbcache nbd sg sd_mod crc_t10dif sr_mod crct10dif_generic cdrom crct10dif_common microcode ahci libahci libata scsi_mod uhci_hcd ehci_pci ehci_hcd e1000e usbcore ptp usb _common pps_core thermal thermal_sys [ 34.501050] CPU: 1 PID: 5052 Comm: s2disk Not tainted 3.17.0-rc6+ #260 [ 34.501054] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 [ 34.501057] 0000000000000009 ffff88021e9cfde8 ffffffff813bea17 0000000000000000 [ 34.501066] 0000000000000000 ffff88021e9cfe28 ffffffff8103eb60 ffffffff8107f8be [ 34.501075] ffffffff8107c902 ffff8800ae809a00 0000000000000010 ffff880232149690 [ 34.501083] Call Trace: [ 34.501092] [<ffffffff813bea17>] dump_stack+0x4e/0x68 [ 34.501100] [<ffffffff8103eb60>] warn_slowpath_common+0x7c/0x96 [ 34.501107] [<ffffffff8107f8be>] ? lock_system_sleep+0x22/0x24 [ 34.501113] [<ffffffff8107c902>] ? swsusp_free+0x25/0x160 [ 34.501119] [<ffffffff8103eb8f>] warn_slowpath_null+0x15/0x17 [ 34.501125] [<ffffffff8107c902>] swsusp_free+0x25/0x160 [ 34.501131] [<ffffffff8107fb82>] snapshot_release+0x13/0x68 [ 34.501138] [<ffffffff81136c78>] __fput+0x10d/0x1e5 [ 34.501144] [<ffffffff81136d80>] ____fput+0x9/0xb [ 34.501151] [<ffffffff81056a51>] task_work_run+0x81/0xb0 [ 34.501159] [<ffffffff810026d8>] do_notify_resume+0x55/0x66 [ 34.501166] [<ffffffff811ee7ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 34.501172] [<ffffffff813c4a98>] int_signal+0x12/0x17 [ 34.501177] ---[ end trace fd197b669f9d81a3 ]--- where 1346 is bjorn@canardo:/usr/local/src/build-tmp/linux$ sed -ne 1346p kernel/power/snapshot.c WARN_ON(!forbidden_pages_map); So it looks like that map for some reason is uninitialized when swsusp_free() is called after a hibernate->resume cycle. Is this only hitting me? If so, then I *really* wonder what I'm doing out of the ordinary... Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 23, 2014 at 10:28:00PM +0200, Rafael J. Wysocki wrote: > On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote: > > > > --=-=-= > > Content-Type: text/plain; charset=utf-8 > > Content-Transfer-Encoding: quoted-printable > > > > "Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > > > > > > > I would suspect one of these commits: > > > > > > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions > > > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node > > > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation > > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws= > > usp_free() > > > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree > > > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function > > > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap > > > > > > so I guess you can start from checking them (the topmpost one is the late= > > st). > > > > Thanks. Yes, you were correct. The bad commit is > > > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu= > > sp_free() > > > > I have confirmed that reverting only this commit on top of a clean > > v3.17-rc6 fixes the problem. I am attaching the context-modified revert > > patch I used. > > OK, thanks! > > Joerg, can you please have a look at this? Yes, I have a look. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 24, 2014 at 09:45:26AM +0200, Bjørn Mork wrote: > >> Thanks. Yes, you were correct. The bad commit is > >> > >> 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu= > >> sp_free() > >> > >> I have confirmed that reverting only this commit on top of a clean > >> v3.17-rc6 fixes the problem. I am attaching the context-modified revert > >> patch I used. This is weird, because ... > > > > Instead of reverting the commit, can you please check if adding an "if (page)" > > check around > > > > memory_bm_clear_current(forbidden_pages_map); > > memory_bm_clear_current(free_pages_map); > > __free_page(page); > > > > in swsusp_free() makes the NULL pointer deref go away? > > Tested. But on a hunch, I also added this just to be sure: > > > @@ -1343,7 +1343,15 @@ void swsusp_free(void) > { > unsigned long fb_pfn, fr_pfn; > > +WARN_ON(!forbidden_pages_map); > +if (!forbidden_pages_map) > + return; > + > memory_bm_position_reset(forbidden_pages_map); > +WARN_ON(!free_pages_map); > +if (!free_pages_map) > + return; > + > memory_bm_position_reset(free_pages_map); ... the old code did not check for a valid forbidden_pages_map and free_pages_map either, so it should crash there too. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Joerg Roedel <jroedel@suse.de> writes: >> @@ -1343,7 +1343,15 @@ void swsusp_free(void) >> { >> unsigned long fb_pfn, fr_pfn; >> >> +WARN_ON(!forbidden_pages_map); >> +if (!forbidden_pages_map) >> + return; >> + >> memory_bm_position_reset(forbidden_pages_map); >> +WARN_ON(!free_pages_map); >> +if (!free_pages_map) >> + return; >> + >> memory_bm_position_reset(free_pages_map); > > ... the old code did not check for a valid forbidden_pages_map and > free_pages_map either, so it should crash there too. Well, it did test AFAICS... The 3.16 code is int swsusp_page_is_forbidden(struct page *page) { return forbidden_pages_map ? memory_bm_test_bit(forbidden_pages_map, page_to_pfn(page)) : 0; } .. void swsusp_free(void) { .. if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) { swsusp_unset_page_forbidden(page); swsusp_unset_page_free(page); __free_page(page); } Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday, September 24, 2014 12:17:18 PM Bjørn Mork wrote: > Joerg Roedel <jroedel@suse.de> writes: > > >> @@ -1343,7 +1343,15 @@ void swsusp_free(void) > >> { > >> unsigned long fb_pfn, fr_pfn; > >> > >> +WARN_ON(!forbidden_pages_map); > >> +if (!forbidden_pages_map) > >> + return; > >> + > >> memory_bm_position_reset(forbidden_pages_map); > >> +WARN_ON(!free_pages_map); > >> +if (!free_pages_map) > >> + return; > >> + > >> memory_bm_position_reset(free_pages_map); > > > > ... the old code did not check for a valid forbidden_pages_map and > > free_pages_map either, so it should crash there too. > > Well, it did test AFAICS... The 3.16 code is > > int swsusp_page_is_forbidden(struct page *page) > { > return forbidden_pages_map ? > memory_bm_test_bit(forbidden_pages_map, page_to_pfn(page)) : 0; > } > > .. > > void swsusp_free(void) > { > .. > if (swsusp_page_is_forbidden(page) && > swsusp_page_is_free(page)) { > swsusp_unset_page_forbidden(page); > swsusp_unset_page_free(page); > __free_page(page); > } > > > > > OK I've decided to go with a revert for 3.17, as we don't seem to have an immediate fix and the final 3.17 may be as close as this Sunday. So I'm going to send my final pull request for 3.17 to Linus tomorrow or early on Friday.
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > I've decided to go with a revert for 3.17, as we don't seem to have an immediate > fix and the final 3.17 may be as close as this Sunday. So I'm going to send my > final pull request for 3.17 to Linus tomorrow or early on Friday. Sounds safest to me, FWIW. For the next round of this, I think the only missing part was some test like if (!forbidden_pages_map || !free_pages_map) goto return_without_freeing_anything; at the beginning of swsusp_free(). I think we can agree that it isn't necessary to repeat that test for every page like the old code did :-) But I I believe it would be useful to analyze exactly why this is necessary, and possibly add a comment with that explanation. I don't have any other arguments for the test than the old code and the oops. If this is how it is designed then that would be nice to spell out. If not, then maybe there is a flaw somewhere else that should be fixed instead. And BTW, I believe it would be useful if at least one more person in the world tested hibernation between each release ;-) Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From 92950fd86c2f74ae17840bfc15651b6ae77e43df Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@mork.no> Date: Tue, 23 Sep 2014 19:18:43 +0200 Subject: [PATCH] Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()" This reverts commit 6efde38f07690652bf0d93f5e4f1a5f496574806. Conflicts: kernel/power/snapshot.c --- kernel/power/snapshot.c | 50 ++++++++++++++--------------------------------- 1 file changed, 15 insertions(+), 35 deletions(-) diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index c4b8093c80b3..f1604d8cf489 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -725,14 +725,6 @@ static void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn) clear_bit(bit, addr); } -static void memory_bm_clear_current(struct memory_bitmap *bm) -{ - int bit; - - bit = max(bm->cur.node_bit - 1, 0); - clear_bit(bit, bm->cur.node->data); -} - static int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn) { void *addr; @@ -1341,35 +1333,23 @@ static struct memory_bitmap copy_bm; void swsusp_free(void) { - unsigned long fb_pfn, fr_pfn; - - memory_bm_position_reset(forbidden_pages_map); - memory_bm_position_reset(free_pages_map); - -loop: - fr_pfn = memory_bm_next_pfn(free_pages_map); - fb_pfn = memory_bm_next_pfn(forbidden_pages_map); - - /* - * Find the next bit set in both bitmaps. This is guaranteed to - * terminate when fb_pfn == fr_pfn == BM_END_OF_MAP. - */ - do { - if (fb_pfn < fr_pfn) - fb_pfn = memory_bm_next_pfn(forbidden_pages_map); - if (fr_pfn < fb_pfn) - fr_pfn = memory_bm_next_pfn(free_pages_map); - } while (fb_pfn != fr_pfn); - - if (fr_pfn != BM_END_OF_MAP && pfn_valid(fr_pfn)) { - struct page *page = pfn_to_page(fr_pfn); + struct zone *zone; + unsigned long pfn, max_zone_pfn; - memory_bm_clear_current(forbidden_pages_map); - memory_bm_clear_current(free_pages_map); - __free_page(page); - goto loop; + for_each_populated_zone(zone) { + max_zone_pfn = zone_end_pfn(zone); + for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) + if (pfn_valid(pfn)) { + struct page *page = pfn_to_page(pfn); + + if (swsusp_page_is_forbidden(page) && + swsusp_page_is_free(page)) { + swsusp_unset_page_forbidden(page); + swsusp_unset_page_free(page); + __free_page(page); + } + } } - nr_copy_pages = 0; nr_meta_pages = 0; restore_pblist = NULL; -- 1.7.10.4