Message ID | 1485422212-31546-1-git-send-email-ashijeetacharya@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote: > Migration of a "none" machine with no RAM crashes abruptly as > bitmap_new() fails and thus aborts. Instead, place a check for > last_ram_offset() being '0' at the start of ram_save_setup() and > error out with a meaningful error message. > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > --- > migration/ram.c | 5 +++++ > 1 file changed, 5 insertions(+) You state the problem in the one-line Git commit summary message, it's usually preferred to summarize the _fix_ that you're making :-). If the below variant sounds any better, maybe the maintainer can reword it upon applying: migrate: Gracefully handle crash of a 'none' machine with no RAM [...]
* Ashijeet Acharya (ashijeetacharya@gmail.com) wrote: > Migration of a "none" machine with no RAM crashes abruptly as > bitmap_new() fails and thus aborts. Instead, place a check for > last_ram_offset() being '0' at the start of ram_save_setup() and > error out with a meaningful error message. > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > --- > migration/ram.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/migration/ram.c b/migration/ram.c > index ef8fadf..bf05d69 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) > { > RAMBlock *block; > > + if (last_ram_offset() == 0) { > + error_report("Failed to migrate: No RAM available!"); > + return -1; > + } > + Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Dave > /* migration has already setup the bitmap, reuse it. */ > if (!migration_in_colo_state()) { > if (ram_save_init_globals() < 0) { > -- > 2.6.2 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote: > Migration of a "none" machine with no RAM crashes abruptly as > bitmap_new() fails and thus aborts. Instead, place a check for > last_ram_offset() being '0' at the start of ram_save_setup() and > error out with a meaningful error message. > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > --- > migration/ram.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/migration/ram.c b/migration/ram.c > index ef8fadf..bf05d69 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) > { > RAMBlock *block; > > + if (last_ram_offset() == 0) { > + error_report("Failed to migrate: No RAM available!"); > + return -1; > + } > + If we're merely going to block migration, as opposed to making it work, then IMHO we should use migration blockers registered at machine setup for this task. We have a new cli arg added to QEMU to tell it to abort startup if the machine configuration is not migratable. That only works if using migration blockers - this check you've added is too late to be detected at startup. Regards, Daniel
* Daniel P. Berrange (berrange@redhat.com) wrote: > On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote: > > Migration of a "none" machine with no RAM crashes abruptly as > > bitmap_new() fails and thus aborts. Instead, place a check for > > last_ram_offset() being '0' at the start of ram_save_setup() and > > error out with a meaningful error message. > > > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > > --- > > migration/ram.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/migration/ram.c b/migration/ram.c > > index ef8fadf..bf05d69 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) > > { > > RAMBlock *block; > > > > + if (last_ram_offset() == 0) { > > + error_report("Failed to migrate: No RAM available!"); > > + return -1; > > + } > > + > > If we're merely going to block migration, as opposed to making it work, > then IMHO we should use migration blockers registered at machine setup > for this task. > > We have a new cli arg added to QEMU to tell it to abort startup if the > machine configuration is not migratable. That only works if using > migration blockers - this check you've added is too late to be detected > at startup. Hmm, yes you're right, that would be better. Although it does lead to an interesting situation where starting with a 'none' and constructing it component at a time would get rejected by --only-migratable even though the final construction is fine. Dave > > Regards, > Daniel > -- > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- http://virt-manager.org :| > |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :| -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Okay On Friday, 27 January 2017, Daniel P. Berrange <berrange@redhat.com> wrote: > On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote: > > Migration of a "none" machine with no RAM crashes abruptly as > > bitmap_new() fails and thus aborts. Instead, place a check for > > last_ram_offset() being '0' at the start of ram_save_setup() and > > error out with a meaningful error message. > > > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com > <javascript:;>> > > --- > > migration/ram.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/migration/ram.c b/migration/ram.c > > index ef8fadf..bf05d69 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void > *opaque) > > { > > RAMBlock *block; > > > > + if (last_ram_offset() == 0) { > > + error_report("Failed to migrate: No RAM available!"); > > + return -1; > > + } > > + > > If we're merely going to block migration, as opposed to making it work, > then IMHO we should use migration blockers registered at machine setup > for this task. > > We have a new cli arg added to QEMU to tell it to abort startup if the > machine configuration is not migratable. That only works if using Are you referring to the new "--only-migratable" option I added along with John last week? migration blockers - this check you've added is too late to be detected > at startup. I think that the machine is not completely 'non-migratable' because if I boot qemu with "-m 1G" (say) the migration actually completes successfully. Ashijeet > > Regards, > Daniel > -- > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ > :| > |: http://libvirt.org -o- http://virt-manager.org > :| > |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ > :| >
On Fri, Jan 27, 2017 at 09:46:13AM +0000, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berrange@redhat.com) wrote: > > On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote: > > > Migration of a "none" machine with no RAM crashes abruptly as > > > bitmap_new() fails and thus aborts. Instead, place a check for > > > last_ram_offset() being '0' at the start of ram_save_setup() and > > > error out with a meaningful error message. > > > > > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > > > --- > > > migration/ram.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/migration/ram.c b/migration/ram.c > > > index ef8fadf..bf05d69 100644 > > > --- a/migration/ram.c > > > +++ b/migration/ram.c > > > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) > > > { > > > RAMBlock *block; > > > > > > + if (last_ram_offset() == 0) { > > > + error_report("Failed to migrate: No RAM available!"); > > > + return -1; > > > + } > > > + > > > > If we're merely going to block migration, as opposed to making it work, > > then IMHO we should use migration blockers registered at machine setup > > for this task. > > > > We have a new cli arg added to QEMU to tell it to abort startup if the > > machine configuration is not migratable. That only works if using > > migration blockers - this check you've added is too late to be detected > > at startup. > > Hmm, yes you're right, that would be better. > Although it does lead to an interesting situation where starting with a 'none' > and constructing it component at a time would get rejected by --only-migratable > even though the final construction is fine. Even in that case I would expect the ram to be specified upfront e.g. $qemu -machine none -m 500 thus avoiding addition of the migration blocker Regards, Daniel
On Fri, Jan 27, 2017 at 03:22:38PM +0530, Ashijeet Acharya wrote: > Okay > > On Friday, 27 January 2017, Daniel P. Berrange <berrange@redhat.com> wrote: > > > On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote: > > > Migration of a "none" machine with no RAM crashes abruptly as > > > bitmap_new() fails and thus aborts. Instead, place a check for > > > last_ram_offset() being '0' at the start of ram_save_setup() and > > > error out with a meaningful error message. > > > > > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com > > <javascript:;>> > > > --- > > > migration/ram.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/migration/ram.c b/migration/ram.c > > > index ef8fadf..bf05d69 100644 > > > --- a/migration/ram.c > > > +++ b/migration/ram.c > > > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void > > *opaque) > > > { > > > RAMBlock *block; > > > > > > + if (last_ram_offset() == 0) { > > > + error_report("Failed to migrate: No RAM available!"); > > > + return -1; > > > + } > > > + > > > > If we're merely going to block migration, as opposed to making it work, > > then IMHO we should use migration blockers registered at machine setup > > for this task. > > > > We have a new cli arg added to QEMU to tell it to abort startup if the > > machine configuration is not migratable. That only works if using > > > Are you referring to the new "--only-migratable" option I added along with > John last week? > > migration blockers - this check you've added is too late to be detected > > at startup. > > > I think that the machine is not completely 'non-migratable' because if I > boot qemu with "-m 1G" (say) the migration actually completes successfully. Of course - you would only add the migration blocker when ram size is 0 Regards, Daniel
On Thu, 26 Jan 2017 14:46:52 +0530 Ashijeet Acharya <ashijeetacharya@gmail.com> wrote: > Migration of a "none" machine with no RAM crashes abruptly as > bitmap_new() fails and thus aborts. Instead, place a check for > last_ram_offset() being '0' at the start of ram_save_setup() and > error out with a meaningful error message. > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > --- Maybe a naive question: why a "none" machine with zero RAM should fail to migrate ? > migration/ram.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/migration/ram.c b/migration/ram.c > index ef8fadf..bf05d69 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) > { > RAMBlock *block; > > + if (last_ram_offset() == 0) { > + error_report("Failed to migrate: No RAM available!"); > + return -1; > + } > + > /* migration has already setup the bitmap, reuse it. */ > if (!migration_in_colo_state()) { > if (ram_save_init_globals() < 0) {
On Sun, Jan 29, 2017 at 12:11 AM, Greg Kurz <groug@kaod.org> wrote: > On Thu, 26 Jan 2017 14:46:52 +0530 > Ashijeet Acharya <ashijeetacharya@gmail.com> wrote: > >> Migration of a "none" machine with no RAM crashes abruptly as >> bitmap_new() fails and thus aborts. Instead, place a check for >> last_ram_offset() being '0' at the start of ram_save_setup() and >> error out with a meaningful error message. >> >> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> >> --- > cc'ing Paolo in : I had an IRC chat with him and he has a very interesting twist in the tale to add here. > Maybe a naive question: why a "none" machine with zero RAM should fail to > migrate ? Assuming you are referring to why its failing ATM; it fails because g_try_malloc0() inside bitmap_try_new() returns a NULL pointer for zero bits and thus the check for NULL inside bitmap_new() becomes true and it aborts. Check bitmap_new() for convenience. Ignore the noise if you already knew this! :-) Ashijeet
On Sun, 29 Jan 2017 01:06:47 +0530 Ashijeet Acharya <ashijeetacharya@gmail.com> wrote: > On Sun, Jan 29, 2017 at 12:11 AM, Greg Kurz <groug@kaod.org> wrote: > > On Thu, 26 Jan 2017 14:46:52 +0530 > > Ashijeet Acharya <ashijeetacharya@gmail.com> wrote: > > > >> Migration of a "none" machine with no RAM crashes abruptly as > >> bitmap_new() fails and thus aborts. Instead, place a check for > >> last_ram_offset() being '0' at the start of ram_save_setup() and > >> error out with a meaningful error message. > >> > >> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> > >> --- > > > > cc'ing Paolo in : I had an IRC chat with him and he has a very > interesting twist in the tale to add here. > > > Maybe a naive question: why a "none" machine with zero RAM should fail to > > migrate ? > > Assuming you are referring to why its failing ATM; it fails because My question was more: why deciding to fail migration instead of fixing the crash ? One would naively think that no RAM is *just* less state to migrate... but maybe the current code assumes that a machine always has RAM. > g_try_malloc0() inside bitmap_try_new() returns a NULL pointer for > zero bits and thus the check for NULL inside bitmap_new() becomes true > and it aborts. Check bitmap_new() for convenience. > > Ignore the noise if you already knew this! :-) > I hadn't checked, thanks for the details. Cheers. -- Greg > Ashijeet
On Sun, Jan 29, 2017 at 6:33 PM, Greg Kurz <groug@kaod.org> wrote: > On Sun, 29 Jan 2017 01:06:47 +0530 > Ashijeet Acharya <ashijeetacharya@gmail.com> wrote: > >> On Sun, Jan 29, 2017 at 12:11 AM, Greg Kurz <groug@kaod.org> wrote: >> > On Thu, 26 Jan 2017 14:46:52 +0530 >> > Ashijeet Acharya <ashijeetacharya@gmail.com> wrote: >> > >> >> Migration of a "none" machine with no RAM crashes abruptly as >> >> bitmap_new() fails and thus aborts. Instead, place a check for >> >> last_ram_offset() being '0' at the start of ram_save_setup() and >> >> error out with a meaningful error message. >> >> >> >> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> >> >> --- >> > >> >> cc'ing Paolo in : I had an IRC chat with him and he has a very >> interesting twist in the tale to add here. >> >> > Maybe a naive question: why a "none" machine with zero RAM should fail to >> > migrate ? >> >> Assuming you are referring to why its failing ATM; it fails because > > My question was more: why deciding to fail migration instead of fixing the > crash ? One would naively think that no RAM is *just* less state to > migrate... but maybe the current code assumes that a machine always has > RAM. Actually, I think, the current code does assume that the machine always has RAM which is why it crashes here. I am not aware if we can also boot machines other than 'none' with zero RAM so I have not tested that case. Also, Paolo has a similar opinion to yours that we should fix it instead of blocking it. He suggests to migrate only the device states and skip all the RAM related stuff. Maybe he can explain it better when he is available. I have had mixed opinions on this one which is why I am waiting until we reach a common ground. Ashijeet > >> g_try_malloc0() inside bitmap_try_new() returns a NULL pointer for >> zero bits and thus the check for NULL inside bitmap_new() becomes true >> and it aborts. Check bitmap_new() for convenience. >> >> Ignore the noise if you already knew this! :-) >> > > I hadn't checked, thanks for the details. > > Cheers. > > -- > Greg > >> Ashijeet >
On 29/01/2017 08:19, Ashijeet Acharya wrote: > Also, Paolo has a similar opinion to yours that we should fix it > instead of blocking it. He suggests to migrate only the device states > and skip all the RAM related stuff. Maybe he can explain it better > when he is available. Yes, there is even support already in ram_migration_cleanup for not allocating the bitmap: struct BitmapRcu *bitmap = migration_bitmap_rcu; atomic_rcu_set(&migration_bitmap_rcu, NULL); if (bitmap) { memory_global_dirty_log_stop(); call_rcu(bitmap, migration_bitmap_free, rcu); } I think you should just iterate until you find no more misbehaving code. The allocation is one, but ram_find_and_save_block should also return 0 if QLIST_FIRST_RCU(&ram_list.blocks) is NULL, for example. Paolo
On Tue, Jan 31, 2017 at 2:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > On 29/01/2017 08:19, Ashijeet Acharya wrote: >> Also, Paolo has a similar opinion to yours that we should fix it >> instead of blocking it. He suggests to migrate only the device states >> and skip all the RAM related stuff. Maybe he can explain it better >> when he is available. > > Yes, there is even support already in ram_migration_cleanup for not > allocating the bitmap: > > struct BitmapRcu *bitmap = migration_bitmap_rcu; > atomic_rcu_set(&migration_bitmap_rcu, NULL); > if (bitmap) { > memory_global_dirty_log_stop(); > call_rcu(bitmap, migration_bitmap_free, rcu); > } > > I think you should just iterate until you find no more misbehaving code. > The allocation is one, but ram_find_and_save_block should also return 0 > if QLIST_FIRST_RCU(&ram_list.blocks) is NULL, for example. > Okay, I will drop the idea of adding a migration blocker in this case and proceed as you say. Ashijeet > Paolo
diff --git a/migration/ram.c b/migration/ram.c index ef8fadf..bf05d69 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) { RAMBlock *block; + if (last_ram_offset() == 0) { + error_report("Failed to migrate: No RAM available!"); + return -1; + } + /* migration has already setup the bitmap, reuse it. */ if (!migration_in_colo_state()) { if (ram_save_init_globals() < 0) {
Migration of a "none" machine with no RAM crashes abruptly as bitmap_new() fails and thus aborts. Instead, place a check for last_ram_offset() being '0' at the start of ram_save_setup() and error out with a meaningful error message. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> --- migration/ram.c | 5 +++++ 1 file changed, 5 insertions(+)