| Message ID | 1462663968-26607-2-git-send-email-nli@suse.com (mailing list archive) |
|------------|--------------------------------------------------------------------------|
| State      | New, archived                                                            |
On 05/07/2016 05:32 PM, Nan Li wrote:
> When running the command "dump-guest-memory", we usually need a large space
> of storage to save the dumpfile into disk. It costs not only much time to
> save a file in some of hard disks, but also costs limited storage in host.
> In order to reduce the saving time and make it convenient for users to dump
> the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> dump file in RAM. It is selectable in the configure file, adding a compiling
> of package "fuse-devel". It doesn't change the way of dumping guest memory.

Why introduce FUSE? Can we reuse NBD instead?

> qemu_fuse_main(int argc, char *argv[]) is the API for qemu code to mount
> this filesystem. And it only supports these operations just for dumping
> guest memory.
>
> static struct fuse_operations qemu_fuse_oper = {
>     .getattr  = qemu_fuse_getattr,
>     .fgetattr = qemu_fuse_fgetattr,
>     .readdir  = qemu_fuse_readdir,
>     .create   = qemu_fuse_create,
>     .open     = qemu_fuse_open,
>     .read     = qemu_fuse_read,
>     .write    = qemu_fuse_write,
>     .unlink   = qemu_fuse_unlink,
> };
>
> Signed-off-by: Nan Li <nli@suse.com>
> ---
>  Makefile.target |   1 +
>  configure       |  34 +++++
>  fuse-mem.c      | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fuse-mem.h      |   2 +
>  4 files changed, 413 insertions(+)
>  create mode 100644 fuse-mem.c
>  create mode 100644 fuse-mem.h

New files should be listed in MAINTAINERS; also, new files usually belong
better in an appropriate subdirectory rather than littering the top directory
(we're trying to reduce, not increase, the number of top-level files).

I haven't closely reviewed the patch, because I think the meta-questions about
the feature in general should be discussed first.
On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> On 05/07/2016 05:32 PM, Nan Li wrote:
> > When running the command "dump-guest-memory", we usually need a large space
> > of storage to save the dumpfile into disk. [...]
>
> Why introduce FUSE? Can we reuse NBD instead?

The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
IOW, it seems like it could just dump to any tmpfs directory.

I'm not really seeing a compelling reason why QEMU needs to mount a fuse
filesystem itself - whatever app is using QEMU could handle mounting of
fs without QEMU's involvement at all.

Regards,
Daniel
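For concreteness, a minimal sketch of the tmpfs approach described above; the
mount point, size, and domain name are illustrative, not taken from the patch:

    # Back the dump directory with RAM instead of disk (size is illustrative).
    mount -t tmpfs -o size=8G tmpfs /var/run/dump-ram

    # Dump the guest to the RAM-backed directory via the HMP monitor.
    virsh qemu-monitor-command --hmp mydomain \
        'dump-guest-memory /var/run/dump-ram/guest.elf'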
On Mon, 9 May 2016 17:13:07 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> > [...]
> > Why introduce FUSE? Can we reuse NBD instead?
>
> The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
> IOW, it seems like it could just dump to any tmpfs directory.
>
> I'm not really seeing a compelling reason why QEMU needs to mount a fuse
> filesystem itself - whatever app is using QEMU could handle mounting of
> fs without QEMU's involvement at all.

The ultimate goal is to export internal QEMU state (memory content,
register values) as an ELF file, so you could simply reuse any existing
tools that can work with ELF dump files (gdb, crash, makedumpfile,
readelf, etc.) instead of re-inventing the wheel for each of those tools.

This cannot be really done from outside of QEMU without too much overhead
(how would you access guest memory from outside QEMU?). And since this
information should be available as an ELF file, it cannot be achieved with
NBD, because that's a (raw) block device.

Petr T
On Mon, May 09, 2016 at 06:20:22PM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 17:13:07 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > [...]
> > I'm not really seeing a compelling reason why QEMU needs to mount a fuse
> > filesystem itself - whatever app is using QEMU could handle mounting of
> > fs without QEMU's involvement at all.
>
> The ultimate goal is to export internal QEMU state (memory content,
> register values) as an ELF file, so you could simply reuse any existing
> tools that can work with ELF dump files (gdb, crash, makedumpfile,
> readelf, etc.) instead of re-inventing the wheel for each of those tools.
>
> This cannot be really done from outside of QEMU without too much overhead
> (how would you access guest memory from outside QEMU?).

Maybe I'm missing something, but IIUC the 'dump-guest-memory' monitor
command in QEMU already dumps in ELF format which can be used by standard
ELF tools. If you don't want that dump to hit disk, then you could mount
a tmpfs and then tell QEMU to write to that.

Regards,
Daniel
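The QMP form of the same operation is sketched below, assuming the tmpfs mount
at /var/run/dump-ram from the earlier example; the "file:" protocol tells QEMU
where to write the ELF dump:

    virsh qemu-monitor-command mydomain \
        '{ "execute": "dump-guest-memory",
           "arguments": { "paging": false,
                          "protocol": "file:/var/run/dump-ram/guest.elf" } }'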
On Mon, 9 May 2016 09:52:28 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 05/07/2016 05:32 PM, Nan Li wrote:
> > When running the command "dump-guest-memory", we usually need a large space
> > of storage to save the dumpfile into disk. [...]
>
> Why introduce FUSE? Can we reuse NBD instead?

Let me answer this one, because it's me who came up with the idea,
although I wasn't involved in the actual implementation.

The idea is to get something more like Linux's /proc/kcore, but for a
QEMU guest. So, yes, the same idea could be implemented as a standalone
application which talks to QEMU using the gdb remote protocol and
exposes the data in a structured form through a FUSE filesystem.

However, the performance of such a solution cannot get even close to
that of exposing the data directly from QEMU. Maybe it's still the best
way to start the project...

Regarding NBD ... correct me if I'm wrong, but I've always thought NBD
can be used to export _disks_ from the QEMU instance, not guest RAM
content.

Regards,
Petr T
On Mon, 9 May 2016 17:32:50 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, May 09, 2016 at 06:20:22PM +0200, Petr Tesarik wrote:
> > [...]
> > This cannot be really done from outside of QEMU without too much overhead
> > (how would you access guest memory from outside QEMU?).
>
> Maybe I'm missing something, but IIUC the 'dump-guest-memory' monitor
> command in QEMU already dumps in ELF format which can be used by standard
> ELF tools. If you don't want that dump to hit disk, then you could mount
> a tmpfs and then tell QEMU to write to that.

It's not the same kind of beast:

1. You need double the amount of RAM in the host. Oh, yes, some folks
   like to create VMs with a RAM size of a few hundred GBs of RAM, and
   then it may not be negligible...

2. The memory must still be copied. This is made a bit worse by the
   fact that tmpfs does not pre-allocate enough RAM, so even copying
   a few GBs takes several seconds.

3. Most importantly, if the file is created on the fly, it's a live
   memory source, i.e. repeated reads will reflect changes in the
   running guest.

Some use cases are substantially slower with the dump-then-use
approach. For example, makedumpfile can estimate the resulting dump
size based on data from the running kernel. It reads only a tiny
portion of system RAM to do the analysis, but since only makedumpfile
knows the exact addresses, you would still need a full dump for that.

With the FUSE approach, guest pages are served on demand when the
application requests them.

Petr T
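As a point of reference, the size-estimation use case mentioned above looks
roughly like this on a physical host, assuming a makedumpfile version that
supports the --mem-usage option; the point is that it samples live kernel data
through /proc/kcore rather than requiring a full copy of RAM:

    # Estimate how much memory a dump would contain, reading only a small
    # amount of data from the live kernel image.
    makedumpfile --mem-usage /proc/kcore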
On Tue, May 10, 2016 at 08:19:38AM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 17:32:50 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > [...]
> Some use cases are substantially slower with the dump-then-use
> approach. For example, makedumpfile can estimate the resulting dump
> size based on data from the running kernel. It reads only a tiny
> portion of system RAM to do the analysis, but since only makedumpfile
> knows the exact addresses, you would still need a full dump for that.
>
> With the FUSE approach, guest pages are served on demand when the
> application requests them.

AFAICT, what you describe here is not what this patch set is actually
doing. This patch isn't modifying the dump-guest-memory monitor command
at all - it is just mounting a fuse filesystem and saying you use
dump-guest-memory as normal to write to that filesystem.

Regards,
Daniel
On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 09:52:28 -0600
> Eric Blake <eblake@redhat.com> wrote:
> > [...]
> > Why introduce FUSE? Can we reuse NBD instead?
>
> Let me answer this one, because it's me who came up with the idea,
> although I wasn't involved in the actual implementation.
>
> The idea is to get something more like Linux's /proc/kcore, but for a
> QEMU guest. So, yes, the same idea could be implemented as a standalone
> application which talks to QEMU using the gdb remote protocol and
> exposes the data in a structured form through a FUSE filesystem.
>
> However, the performance of such a solution cannot get even close to
> that of exposing the data directly from QEMU. Maybe it's still the best
> way to start the project...

IIUC, the performance penalty will be related to the copying of guest
RAM. All the other supplementary information you want (register state
etc) is low volume, so should not be performance critical to copy that
over the QMP monitor command or via libvirt monitor command passthrough.

So if we want to have an external program provide a /proc/kcore like
service via FUSE, the problem we need to solve here is a mechanism
for providing efficient access to QEMU memory.

I think this can be done quite simply by having QEMU guest RAM exposed
via tmpfs or hugetlbfs as appropriate. This approach is what is already
used for the vhost-user network backend in an external process which
likewise needs copy-free access to guest RAM pages.

Obviously this requires that users start QEMU in this particular setup
for RAM, but I don't think that's a particularly onerous requirement
as any non-trivial management application will already know how to do
this.

> Regarding NBD ... correct me if I'm wrong, but I've always thought NBD
> can be used to export _disks_ from the QEMU instance, not guest RAM
> content.

Yeah, NBD seems like the wrong fit for this problem.

Regards,
Daniel
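A sketch of the RAM setup referred to here, along the lines of the documented
vhost-user configuration; the hugetlbfs mount point and sizes are illustrative.
With share=on, another host process can open and mmap the backing file to get
copy-free access to guest RAM:

    # Back guest RAM with a shared file so an external process can map it
    # (remaining guest options omitted).
    qemu-system-x86_64 -m 4G \
        -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem0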
On Tue, 10 May 2016 09:48:48 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > [...]
> IIUC, the performance penalty will be related to the copying of guest
> RAM. All the other supplementary information you want (register state
> etc) is low volume, so should not be performance critical to copy that
> over the QMP monitor command or via libvirt monitor command passthrough.

Agreed. Even if the number of guest CPUs ever rises to the order of
thousands, the additional impact is negligible.

> So if we want to have an external program provide a /proc/kcore like
> service via FUSE, the problem we need to solve here is a mechanism
> for providing efficient access to QEMU memory.

Indeed. This is the main reason for tinkering with QEMU sources at all.

> I think this can be done quite simply by having QEMU guest RAM exposed
> via tmpfs or hugetlbfs as appropriate. This approach is what is already
> used for the vhost-user network backend in an external process which
> likewise needs copy-free access to guest RAM pages.

Ha! We didn't realize this is an option. We can certainly have a look
at implementing a generic mechanism for mapping QEMU guest RAM from
another process on the host. And yes, this would address any
performance concerns nicely.

> Obviously this requires that users start QEMU in this particular setup
> for RAM, but I don't think that's a particularly onerous requirement
> as any non-trivial management application will already know how to do
> this.

Agreed. This is not an issue. Our main target would be libvirt, which
adds quite a bit of infrastructure already. ;-)

Thanks for your thoughts!

Petr T
On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 09:52:28 -0600
> Eric Blake <eblake@redhat.com> wrote:
> > [...]
> The idea is to get something more like Linux's /proc/kcore, but for a
> QEMU guest. So, yes, the same idea could be implemented as a standalone
> application which talks to QEMU using the gdb remote protocol and
> exposes the data in a structured form through a FUSE filesystem.
>
> However, the performance of such a solution cannot get even close to
> that of exposing the data directly from QEMU. Maybe it's still the best
> way to start the project...

If you want no overhead and are willing to pause the guest, use QEMU's
gdb stub (directly, no extra FUSE file system layer). If you cannot
pause the guest then take a copy of memory with dump-guest-memory to
tmpfs.

There might be a middle-ground where you can copy-on-write pages and let
the guest continue to run, but this is probably not worth the
effort/complexity.

I find it hard to see where adding more code or using FUSE would make
things better?

Stefan
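A sketch of the gdb-stub route mentioned above; it assumes the guest was
started with "-gdb tcp::1234", and the address range below is purely
illustrative:

    # Attach to the stub (this pauses the guest while gdb is in control),
    # copy a guest-virtual address range to a file, then detach.
    gdb -ex 'target remote localhost:1234' \
        -ex 'dump binary memory guest-ram.bin 0xffff880000000000 0xffff880000200000' \
        -ex 'detach' -ex 'quit'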
>>> On 5/10/2016 at 5:56 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > [...]
> If you want no overhead and are willing to pause the guest, use QEMU's
> gdb stub (directly, no extra FUSE file system layer). If you cannot
> pause the guest then take a copy of memory with dump-guest-memory to
> tmpfs.
>
> There might be a middle-ground where you can copy-on-write pages and let
> the guest continue to run, but this is probably not worth the
> effort/complexity.

Yes, pausing the guest and then accessing the guest memory to do the
core analysis work is much easier than handling the running guest.
Thank you for your thoughts.

> I find it hard to see where adding more code or using FUSE would make
> things better?
>
> Stefan
>>> On 5/10/2016 at 5:42 PM, Petr Tesarik <ptesarik@suse.com> wrote:
> On Tue, 10 May 2016 09:48:48 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > [...]
>> I think this can be done quite simply by having QEMU guest RAM exposed
>> via tmpfs or hugetlbfs as appropriate. This approach is what is already
>> used for the vhost-user network backend in an external process which
>> likewise needs copy-free access to guest RAM pages.
>
> Ha! We didn't realize this is an option. We can certainly have a look
> at implementing a generic mechanism for mapping QEMU guest RAM from
> another process on the host. And yes, this would address any
> performance concerns nicely.

Agreed. It sounds like a good option. I will try to investigate it.

>> Obviously this requires that users start QEMU in this particular setup
>> for RAM, but I don't think that's a particularly onerous requirement
>> as any non-trivial management application will already know how to do
>> this.
>
> Agreed. This is not an issue. Our main target would be libvirt, which
> adds quite a bit of infrastructure already. ;-)
>
> Thanks for your thoughts!
>
> Petr T

Thanks very much for all your thoughts.

Nan Li
On Tue, 10 May 2016 10:56:42 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > [...]
> If you want no overhead and are willing to pause the guest, use QEMU's
> gdb stub (directly, no extra FUSE file system layer).

Well, the obvious downside of this solution is that you need GDB
protocol support. AFAIK there are more tools which can work with ELF
dump files than with the GDB protocol. Sure, I could add GDB protocol
support to each and every one of them, but I fail to see how that is
better use of time than adding an additional layer which allows to use
any ELF-capable tool directly.

> If you cannot
> pause the guest then take a copy of memory with dump-guest-memory to
> tmpfs.
>
> There might be a middle-ground where you can copy-on-write pages and let
> the guest continue to run, but this is probably not worth the
> effort/complexity.
>
> I find it hard to see where adding more code or using FUSE would make
> things better?

Please see my explanation in another branch of this thread why
generating dump files on the fly is better for some use cases than
saving a complete copy.

BTW FUSE is definitely not tmpfs. See the nice diagram on Wikipedia:

https://en.wikipedia.org/wiki/Filesystem_in_Userspace

Our main motivation is not better performance but more flexibility.
OTOH why should we burn more CPU cycles than necessary?

Petr T
On Tue, May 10, 2016 at 01:55:10PM +0200, Petr Tesarik wrote:
> On Tue, 10 May 2016 10:56:42 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > [...]
> > If you want no overhead and are willing to pause the guest, use QEMU's
> > gdb stub (directly, no extra FUSE file system layer).
>
> Well, the obvious downside of this solution is that you need GDB
> protocol support. AFAIK there are more tools which can work with ELF
> dump files than with the GDB protocol. Sure, I could add GDB protocol
> support to each and every one of them, but I fail to see how that is
> better use of time than adding an additional layer which allows to use
> any ELF-capable tool directly.

Out of interest, which tools are you thinking about?

I use gdb and crash. Would be interesting to learn about additional
options that you are familiar with.

Stefan
On Thu, 12 May 2016 11:09:02 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, May 10, 2016 at 01:55:10PM +0200, Petr Tesarik wrote:
> > [...]
> > Well, the obvious downside of this solution is that you need GDB
> > protocol support. AFAIK there are more tools which can work with ELF
> > dump files than with the GDB protocol. [...]
>
> Out of interest, which tools are you thinking about?
>
> I use gdb and crash. Would be interesting to learn about additional
> options that you are familiar with.

The one that started my thinking was "makedumpfile". I'm also exploring
ways of writing a standalone eppic application on top of libkdumpfile,
and, of course, you can use libkdumpfile python bindings today already
to write an analysis tool in python which will work equally well on
dumps and live VMs.

Petr T
diff --git a/Makefile.target b/Makefile.target
index 34ddb7e..7619ef8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -138,6 +138,7 @@ obj-$(CONFIG_KVM) += kvm-all.o
 obj-y += memory.o cputlb.o
 obj-y += memory_mapping.o
 obj-y += dump.o
+obj-$(CONFIG_FUSE) += fuse-mem.o
 obj-y += migration/ram.o migration/savevm.o
 
 LIBS := $(libs_softmmu) $(LIBS)
diff --git a/configure b/configure
index 5db29f0..0769caf 100755
--- a/configure
+++ b/configure
@@ -275,6 +275,7 @@ trace_backends="log"
 trace_file="trace"
 spice=""
 rbd=""
+fuse="yes"
 smartcard=""
 libusb=""
 usb_redir=""
@@ -1023,6 +1024,10 @@ for opt do
   ;;
   --enable-rbd) rbd="yes"
   ;;
+  --disable-fuse) fuse="no"
+  ;;
+  --enable-fuse) fuse="yes"
+  ;;
   --disable-xfsctl) xfs="no"
   ;;
   --enable-xfsctl) xfs="yes"
@@ -1349,6 +1354,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vhost-net       vhost-net acceleration support
   spice           spice
   rbd             rados block device (rbd)
+  fuse            the support of dumping guest memory via fuse
   libiscsi        iscsi support
   libnfs          nfs support
   smartcard       smartcard support (libcacard)
@@ -3139,6 +3145,28 @@ EOF
 fi
 
 ##########################################
+# fuse probe
+min_fuse_version=2.9.3
+if test "$fuse" != "no" ; then
+  if $pkg_config --atleast-version=$min_fuse_version fuse; then
+    fuse_cflags=`$pkg_config fuse --cflags`
+    fuse_libs=`$pkg_config fuse --libs`
+    QEMU_CFLAGS="$fuse_cflags $QEMU_CFLAGS"
+    libs_softmmu="$fuse_libs $libs_softmmu"
+    fuse=yes
+  else
+    if $pkg_config fuse; then
+      if test "$fuse" = "yes" ; then
+        error_exit "fuse >= $min_fuse_version required for --enable-fuse"
+      fi
+    else
+      feature_not_found "fuse" "Please install fuse devel pkgs: fuse-devel"
+    fi
+    fuse=no
+  fi
+fi
+
+##########################################
 # libssh2 probe
 min_libssh2_version=1.2.8
 if test "$libssh2" != "no" ; then
@@ -4815,6 +4843,7 @@ else
 echo "spice support     $spice"
 fi
 echo "rbd support       $rbd"
+echo "fuse support      $fuse"
 echo "xfsctl support    $xfs"
 echo "smartcard support $smartcard"
 echo "libusb            $libusb"
@@ -5293,6 +5322,11 @@ if test "$rbd" = "yes" ; then
   echo "RBD_CFLAGS=$rbd_cflags" >> $config_host_mak
   echo "RBD_LIBS=$rbd_libs" >> $config_host_mak
 fi
+if test "$fuse" = "yes" ; then
+  echo "CONFIG_FUSE=y" >> $config_host_mak
+  echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
+  echo "FUSE_LIBS=$fuse_libs" >> $config_host_mak
+fi
 
 echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
 if test "$coroutine_pool" = "yes" ; then
diff --git a/fuse-mem.c b/fuse-mem.c
new file mode 100644
index 0000000..3365ddb
--- /dev/null
+++ b/fuse-mem.c
@@ -0,0 +1,376 @@
+/*
+
+  gcc -Wall myfuse.c -lfuse -D_FILE_OFFSET_BITS=64 -o myfuse
+*/
+
+#define FUSE_USE_VERSION 26
+
+#include <fuse.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include "fuse-mem.h"
+
+//static const char *qemu_str = "Hello World!\n";
+//static const char *qemu_path = "/etc/qemu";
+
+#define PAGE_SIZE (0x100000)
+#define FILE_BUFFER_PAGE (PAGE_SIZE - sizeof(struct file_buffer))
+
+struct file_buffer {
+    struct file_buffer *next;
+    size_t used;
+    size_t size;
+    /* Data points here */
+    unsigned char data[0];
+};
+
+struct file_bufhead {
+    //spinlock_t lock;
+    struct file_buffer *head;
+    struct file_buffer *tail;
+    //struct file_buffer *current;
+    size_t filesize;
+    //off_t offset;
+    //char *offset_ptr;
+};
+
+struct fuse_file {
+    char path[128];
+    struct fuse_file_info fileinfo;
+    struct file_bufhead file;
+    struct fuse_file *next;
+};
+
+struct fuse_file_root {
+    struct fuse_file *head;
+    struct fuse_file *tail;
+};
+
+struct fuse_file_root root;
+
+#if 0
+void dumpfile(struct fuse_file *fuse_file_ptr)
+{
+    struct file_buffer *file_buffer_ptr;
+    int i;
+    printf("DUMPFILE:\n");
+    for (file_buffer_ptr = fuse_file_ptr->file.head; file_buffer_ptr != NULL; file_buffer_ptr = file_buffer_ptr->next) {
+        for (i = 0; i < file_buffer_ptr->used; i++) {
+            printf("Address:0x%x: 0x%x\n", &file_buffer_ptr->data[i], file_buffer_ptr->data[i]);
+        }
+    }
+}
+#endif
+
+static int qemu_fuse_getattr(const char *path, struct stat *stbuf)
+{
+    struct fuse_file *fuse_file_ptr;
+    fuse_file_ptr = root.head;
+
+    memset(stbuf, 0, sizeof(struct stat));
+    if (strcmp(path, "/") == 0) {
+        stbuf->st_mode = S_IFDIR | 0777;
+        stbuf->st_nlink = 2;
+    } else {
+        while(fuse_file_ptr != NULL) {
+            if (strcmp(fuse_file_ptr->path, path) == 0) {
+                stbuf->st_mode = S_IFREG | 0666;
+                stbuf->st_nlink = 1;
+                stbuf->st_size = fuse_file_ptr->file.filesize;
+                return 0;
+            }
+            else
+                fuse_file_ptr = fuse_file_ptr->next;
+        }
+        return -ENOENT;
+    }
+
+    return 0;
+}
+/*
+static int qemu_fuse_getattr(const char *path, struct stat *stbuf)
+{
+    int res;
+
+    res = lstat(path, stbuf);
+    if (res == -1)
+        return -errno;
+
+    return 0;
+}
+*/
+static int qemu_fuse_fgetattr(const char *path, struct stat *stbuf,
+        struct fuse_file_info *fi)
+{
+    struct fuse_file *fuse_file_ptr;
+    fuse_file_ptr = root.head;
+
+    memset(stbuf, 0, sizeof(struct stat));
+
+    while(fuse_file_ptr != NULL) {
+        if (fuse_file_ptr->fileinfo.fh == fi->fh) {
+            stbuf->st_mode = S_IFREG | 0666;
+            stbuf->st_nlink = 1;
+            stbuf->st_size = fuse_file_ptr->file.filesize;
+            return 0;
+        }
+        else
+            fuse_file_ptr = fuse_file_ptr->next;
+    }
+    return -ENOENT;
+
+    return 0;
+}
+
+static int qemu_fuse_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
+        off_t offset, struct fuse_file_info *fi)
+{
+
+    return 0;
+}
+
+static int qemu_fuse_create(const char *path, mode_t mode, struct fuse_file_info *fi)
+{
+    struct fuse_file *fuse_file_ptr;
+
+    fuse_file_ptr = (struct fuse_file *)malloc(sizeof(struct fuse_file));
+    if (fuse_file_ptr) {
+        memcpy(&fuse_file_ptr->fileinfo, fi, sizeof(struct fuse_file_info));
+        memset(&fuse_file_ptr->file, 0, sizeof(fuse_file_ptr->file));
+        fuse_file_ptr->next = NULL;
+        if (root.head == NULL) {
+            root.head = fuse_file_ptr;
+            fi->fh = 1;
+        } else {
+            root.tail->next = fuse_file_ptr;
+            fi->fh = root.tail->fileinfo.fh + 1;
+        }
+        root.tail = fuse_file_ptr;
+        fuse_file_ptr->fileinfo.fh = fi->fh;
+        strcpy(fuse_file_ptr->path, path);
+    } else {
+        return -ENOMEM;
+    }
+
+    return 0;
+}
+
+static int qemu_fuse_open(const char *path, struct fuse_file_info *fi)
+{
+    struct fuse_file *fuse_file_ptr;
+
+    fuse_file_ptr = root.head;
+
+    while(fuse_file_ptr != NULL) {
+        if (strcmp(fuse_file_ptr->path, path) == 0) {
+            fi->fh = fuse_file_ptr->fileinfo.fh;
+            memcpy(&fuse_file_ptr->fileinfo, fi, sizeof(struct fuse_file_info));
+            return 0;
+        }
+        else
+            fuse_file_ptr = fuse_file_ptr->next;
+    }
+
+    return -ENOENT;
+}
+
+static int qemu_fuse_read(const char *path, char *buf, size_t size, off_t offset,
+        struct fuse_file_info *fi)
+{
+//printf("herbert:read:size=%u, offset=%u\n", size, offset);
+
+    struct fuse_file *fuse_file_ptr;
+    struct file_buffer *file_buffer_ptr;
+
+    fuse_file_ptr = root.head;
+    long n, count;
+    int item, index;
+    int i = 0, j = 0;
+
+    while(fuse_file_ptr != NULL) {
+        if (fuse_file_ptr->fileinfo.fh == fi->fh) {
+            if ((fuse_file_ptr->file.filesize <= offset) || (fuse_file_ptr->file.filesize == 0))
+                return 0;
+            if (size + offset > fuse_file_ptr->file.filesize)
+                size = fuse_file_ptr->file.filesize - offset;
+            n = size;
+
+            item = offset / FILE_BUFFER_PAGE;
+            index = offset % FILE_BUFFER_PAGE;
+
+            for (file_buffer_ptr = fuse_file_ptr->file.head; file_buffer_ptr != NULL; file_buffer_ptr = file_buffer_ptr->next) {
+                if ( i == item )
+                    break;
+                i++;
+            }
+
+            j = index;
+            while (file_buffer_ptr != NULL && n > 0) {
+                if ( n > ((long)file_buffer_ptr->used - j) )
+                    count = ((long)file_buffer_ptr->used - j);
+                else
+                    count = n;
+
+                memcpy(buf + size -n, &file_buffer_ptr->data[j], count);
+                n -= count;
+                j = 0;
+                if (n > 0)
+                    file_buffer_ptr = file_buffer_ptr->next;
+            }
+//dumpfile(fuse_file_ptr);
+            return size;
+        }
+        else {
+            fuse_file_ptr = fuse_file_ptr->next;
+        }
+    }
+
+    return -EBADF;
+}
+
+static int qemu_fuse_write(const char *path, const char *buf, size_t size,
+        off_t offset, struct fuse_file_info *fi)
+{
+//printf("herbert:write:size=%u, offset=%u\n", size, offset);
+
+    struct fuse_file *fuse_file_ptr;
+    struct file_buffer *file_buffer_ptr;
+
+    long n, count;
+    int item, index;
+    int i = 0, j = 0;
+
+    fuse_file_ptr = root.head;
+
+    while(fuse_file_ptr != NULL) {
+        if (fuse_file_ptr->fileinfo.fh == fi->fh) {
+            n = size;
+
+            item = offset / FILE_BUFFER_PAGE;
+            index = offset % FILE_BUFFER_PAGE;
+
+            for (file_buffer_ptr = fuse_file_ptr->file.head; file_buffer_ptr != NULL; file_buffer_ptr = file_buffer_ptr->next) {
+                if ( i == item )
+                    break;
+                i++;
+            }
+
+            j = index;
+
+            while (file_buffer_ptr != NULL && n > 0) {
+                if ( n > file_buffer_ptr->size - j )
+                    count = file_buffer_ptr->size- j;
+                else
+                    count = n;
+
+                memcpy(&file_buffer_ptr->data[j], buf + size -n, count);
+                if ((count + j - (long)file_buffer_ptr->used) > 0) {
+                    fuse_file_ptr->file.filesize += (count + j - (long)file_buffer_ptr->used);
+                    file_buffer_ptr->used = count + j;
+                }
+                n -= count;
+                j = 0;
+
+                if (n > 0)
+                    file_buffer_ptr = file_buffer_ptr->next;
+            }
+
+            while (n > 0) {
+                file_buffer_ptr = (struct file_buffer *)malloc(PAGE_SIZE);
+                if (file_buffer_ptr) {
+                    file_buffer_ptr->next = NULL;
+                    file_buffer_ptr->size = FILE_BUFFER_PAGE;
+                    if ( n > file_buffer_ptr->size )
+                        count = file_buffer_ptr->size;
+                    else
+                        count = n;
+
+                    memcpy(file_buffer_ptr->data, buf + size -n, count);
+
+                    file_buffer_ptr->used = count;
+
+                    if (fuse_file_ptr->file.head == NULL) {
+                        fuse_file_ptr->file.head = file_buffer_ptr;
+                    } else {
+                        fuse_file_ptr->file.tail->next = file_buffer_ptr;
+                    }
+                    fuse_file_ptr->file.tail = file_buffer_ptr;
+                    fuse_file_ptr->file.filesize += count;
+
+                    n -= count;
+
+                    if (n > 0)
+                        file_buffer_ptr = file_buffer_ptr->next;
+                } else {
+                    return -ENOMEM;
+                }
+            }
+//dumpfile(fuse_file_ptr);
+            return size;
+
+        }
+        else {
+            fuse_file_ptr = fuse_file_ptr->next;
+        }
+    }
+
+    return -EBADF;
+}
+
+static int qemu_fuse_unlink(const char *path)
+{
+    struct fuse_file *fuse_file_ptr, *p;
+    struct file_buffer *file_buffer_ptr, *q;
+
+    fuse_file_ptr = root.head;
+
+    while(fuse_file_ptr != NULL) {
+        if (strcmp(fuse_file_ptr->path, path) == 0) {
+
+            file_buffer_ptr = fuse_file_ptr->file.head;
+            q = fuse_file_ptr->file.head;
+            while (file_buffer_ptr != NULL) {
+                q = q->next;
+                free(file_buffer_ptr);
+                file_buffer_ptr = q;
+            }
+
+            if (fuse_file_ptr == root.head) {
+                root.head = fuse_file_ptr->next;
+            } else {
+                for (p = root.head; p->next != fuse_file_ptr; p = p->next);
+                p->next = fuse_file_ptr->next;
+            }
+
+            free(fuse_file_ptr);
+
+            return 0;
+        }
+        else
+            fuse_file_ptr = fuse_file_ptr->next;
+    }
+
+    return -EACCES;
+}
+
+static struct fuse_operations qemu_fuse_oper = {
+    .getattr = qemu_fuse_getattr,
+    .fgetattr = qemu_fuse_fgetattr,
+    .readdir = qemu_fuse_readdir,
+    .create = qemu_fuse_create,
+    .open = qemu_fuse_open,
+    .read = qemu_fuse_read,
+    .write = qemu_fuse_write,
+    .unlink = qemu_fuse_unlink,
+};
+
+extern int qemu_fuse_main(int argc, char *argv[])
+{
+    return fuse_main(argc, argv, &qemu_fuse_oper, NULL);
+}
+
+
diff --git a/fuse-mem.h b/fuse-mem.h
new file mode 100644
index 0000000..1a40168
--- /dev/null
+++ b/fuse-mem.h
@@ -0,0 +1,2 @@
+extern int qemu_fuse_main(int argc, char *argv[]);
+
When running the command "dump-guest-memory", we usually need a large space
of storage to save the dumpfile into disk. It costs not only much time to
save a file in some of hard disks, but also costs limited storage in host.
In order to reduce the saving time and make it convenient for users to dump
the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
dump file in RAM. It is selectable in the configure file, adding a compiling
of package "fuse-devel". It doesn't change the way of dumping guest memory.

qemu_fuse_main(int argc, char *argv[]) is the API for qemu code to mount
this filesystem. And it only supports these operations just for dumping
guest memory.

static struct fuse_operations qemu_fuse_oper = {
    .getattr  = qemu_fuse_getattr,
    .fgetattr = qemu_fuse_fgetattr,
    .readdir  = qemu_fuse_readdir,
    .create   = qemu_fuse_create,
    .open     = qemu_fuse_open,
    .read     = qemu_fuse_read,
    .write    = qemu_fuse_write,
    .unlink   = qemu_fuse_unlink,
};

Signed-off-by: Nan Li <nli@suse.com>
---
 Makefile.target |   1 +
 configure       |  34 +++++
 fuse-mem.c      | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fuse-mem.h      |   2 +
 4 files changed, 413 insertions(+)
 create mode 100644 fuse-mem.c
 create mode 100644 fuse-mem.h
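A hypothetical usage sketch (not shown in the patch itself): assuming QEMU has
mounted the in-RAM filesystem at /tmp/qemu-dump via qemu_fuse_main(), the dump
would then simply be directed at that mount point and analyzed in place:

    # Write the ELF dump into the FUSE-backed, RAM-only filesystem
    # (the mount point /tmp/qemu-dump is an assumption, not defined by the patch).
    virsh qemu-monitor-command --hmp mydomain \
        'dump-guest-memory /tmp/qemu-dump/guest.elf'

    # Analyze in place with any ELF-capable tool, then discard.
    crash vmlinux /tmp/qemu-dump/guest.elf
    rm /tmp/qemu-dump/guest.elf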