
[1/2] Dump: introduce a Filesystem in Userspace

Message ID 1462663968-26607-2-git-send-email-nli@suse.com (mailing list archive)
State New, archived

Commit Message

Nan Li May 7, 2016, 11:32 p.m. UTC
When running the command "dump-guest-memory", we usually need a large amount
of storage to save the dump file to disk. Writing the file to a hard disk not
only takes a long time, but also consumes the host's limited storage.
To reduce the saving time and make it more convenient for users to dump
guest memory, we introduce a Filesystem in Userspace (FUSE) that keeps the
dump file in RAM. The feature is selectable at configure time and adds a build
dependency on the "fuse-devel" package. It does not change the way guest
memory is dumped.

qemu_fuse_main(int argc, char *argv[]) is the API that QEMU code calls to
mount this filesystem. The filesystem supports only the operations needed
for dumping guest memory:

static struct fuse_operations qemu_fuse_oper = {
	.getattr	= qemu_fuse_getattr,
	.fgetattr	= qemu_fuse_fgetattr,
	.readdir	= qemu_fuse_readdir,
	.create   = qemu_fuse_create,
	.open	= qemu_fuse_open,
	.read	= qemu_fuse_read,
	.write	= qemu_fuse_write,
	.unlink	= qemu_fuse_unlink,
};
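
As a rough illustration of how this entry point is meant to be used (a sketch
only: the helper name, the mountpoint handling and the "-f" foreground flag
below are illustrative assumptions, not part of this patch; the real call site
is presumably wired up elsewhere in the series):

static int mount_dump_fs(const char *mountpoint)
{
	/* fuse_main()-style argument vector: program name, "-f" to stay in
	 * the foreground (so QEMU can run this in a thread it manages), then
	 * the mountpoint where the dump file will appear. */
	char *argv[] = { (char *)"qemu-fuse", (char *)"-f", (char *)mountpoint };

	return qemu_fuse_main(3, argv);
}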

Signed-off-by: Nan Li <nli@suse.com>
---
 Makefile.target |   1 +
 configure       |  34 +++++
 fuse-mem.c      | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fuse-mem.h      |   2 +
 4 files changed, 413 insertions(+)
 create mode 100644 fuse-mem.c
 create mode 100644 fuse-mem.h

Comments

Eric Blake May 9, 2016, 3:52 p.m. UTC | #1
On 05/07/2016 05:32 PM, Nan Li wrote:
> When running the command "dump-guest-memory", we usually need a large space
> of storage to save the dumpfile into disk. It costs not only much time to
> save a file in some of hard disks, but also costs limited storage in host.
> In order to reduce the saving time and make it convenient for users to dump
> the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> dump file in RAM. It is selectable in the configure file, adding a compiling
> of package "fuse-devel". It doesn't change the way of dumping guest memory.

Why introduce FUSE? Can we reuse NBD instead?

> 
> qemu_fuse_main(int argc, char *argv[]) is the API for qemu code to mount
> this filesystem. And it only supports these operations just for dumping
> guest memory.
> 
> static struct fuse_operations qemu_fuse_oper = {
> 	.getattr	= qemu_fuse_getattr,
> 	.fgetattr	= qemu_fuse_fgetattr,
> 	.readdir	= qemu_fuse_readdir,
> 	.create   = qemu_fuse_create,
> 	.open	= qemu_fuse_open,
> 	.read	= qemu_fuse_read,
> 	.write	= qemu_fuse_write,
> 	.unlink	= qemu_fuse_unlink,
> };
> 
> Signed-off-by: Nan Li <nli@suse.com>
> ---
>  Makefile.target |   1 +
>  configure       |  34 +++++
>  fuse-mem.c      | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fuse-mem.h      |   2 +
>  4 files changed, 413 insertions(+)
>  create mode 100644 fuse-mem.c
>  create mode 100644 fuse-mem.h

New files should be listed in MAINTAINERS; also, new files usually
belong better in an appropriate subdirectory rather than littering the
top directory (we're trying to reduce, not increase, the number of
top-level files).

I haven't closely reviewed the patch, because I think the meta-questions
about the feature in general should be discussed first.
Daniel P. Berrangé May 9, 2016, 4:13 p.m. UTC | #2
On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> On 05/07/2016 05:32 PM, Nan Li wrote:
> > When running the command "dump-guest-memory", we usually need a large space
> > of storage to save the dumpfile into disk. It costs not only much time to
> > save a file in some of hard disks, but also costs limited storage in host.
> > In order to reduce the saving time and make it convenient for users to dump
> > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > dump file in RAM. It is selectable in the configure file, adding a compiling
> > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> 
> Why introduce FUSE? Can we reuse NBD instead?

The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
IOW, it seems like it could just dump to any tmpfs directory.

I'm not really seeing a compelling reason why QEMU needs to mount a fuse
filesystem itself - whatever app is using QEMU could handle mounting of
fs without QEMU's involvement at all.


Regards,
Daniel
Petr Tesarik May 9, 2016, 4:20 p.m. UTC | #3
On Mon, 9 May 2016 17:13:07 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > When running the command "dump-guest-memory", we usually need a large space
> > > of storage to save the dumpfile into disk. It costs not only much time to
> > > save a file in some of hard disks, but also costs limited storage in host.
> > > In order to reduce the saving time and make it convenient for users to dump
> > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > 
> > Why introduce FUSE? Can we reuse NBD instead?
> 
> The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
> IOW, it seems like it could just dump to any tmpfs directory.
> 
> I'm not really seeing a compelling reason why QEMU needs to mount a fuse
> filesystem itself - whatever app is using QEMU could handle mounting of
> fs without QEMU's involvement at all.

The ultimate goal is to export internal QEMU state (memory content,
register values) as an ELF file, so you could simply reuse any existing
tools that can work with ELF dump files (gdb, crash, makedumpfile,
readelf, etc.) instead of re-inventing the wheel for each of those
tools.
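
Concretely, such a dump is an ET_CORE ELF object: PT_NOTE segments carry the
register state and one PT_LOAD segment covers each guest-physical RAM range.
A rough sketch of the header layout, illustrative only and not taken from
this patch:

#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the ELF core header such a dump starts with.  "machine" would
 * be e.g. EM_X86_64 for an x86-64 guest; "phnum" counts one PT_NOTE
 * (registers) plus one PT_LOAD per guest RAM block. */
static void fill_core_ehdr(Elf64_Ehdr *ehdr, uint16_t machine, uint16_t phnum)
{
	memset(ehdr, 0, sizeof(*ehdr));
	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
	ehdr->e_ident[EI_CLASS]   = ELFCLASS64;
	ehdr->e_ident[EI_DATA]    = ELFDATA2LSB;
	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
	ehdr->e_type      = ET_CORE;
	ehdr->e_machine   = machine;
	ehdr->e_version   = EV_CURRENT;
	ehdr->e_phoff     = sizeof(*ehdr);      /* program headers follow directly */
	ehdr->e_ehsize    = sizeof(*ehdr);
	ehdr->e_phentsize = sizeof(Elf64_Phdr);
	ehdr->e_phnum     = phnum;
}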

This cannot be really done from outside of QEMU without too much
overhead (how would you access guest memory from outside QEMU?).

And since this information should be available as an ELF file, it
cannot be achieved with NBD, because that's a (raw) block device.

Petr T
Daniel P. Berrangé May 9, 2016, 4:32 p.m. UTC | #4
On Mon, May 09, 2016 at 06:20:22PM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 17:13:07 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > When running the command "dump-guest-memory", we usually need a large space
> > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > In order to reduce the saving time and make it convenient for users to dump
> > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > 
> > > Why introduce FUSE? Can we reuse NBD instead?
> > 
> > The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
> > IOW, it seems like it could just dump to any tmpfs directory.
> > 
> > I'm not really seeing a compelling reason why QEMU needs to mount a fuse
> > filesystem itself - whatever app is using QEMU could handle mounting of
> > fs without QEMU's involvement at all.
> 
> The ultimate goal is to export internal QEMU state (memory content,
> register values) as an ELF file, so you could simply reuse any existing
> tools that can work with ELF dump files (gdb, crash, makedumpfile,
> readelf, etc.) instead of re-inventing the wheel for each of those
> tools.
> 
> This cannot be really done from outside of QEMU without too much
> overhead (how would you access guest memory from outside QEMU?).

Maybe I'm missing something, but IIUC the 'dump-guest-memory' monitor
command in QEMU already dumps in ELF format which can be used by standard
ELF tools. If you don't want that dump to hit disk, then you could mount
a tmpfs and then tell QEMU to write to that.

Regards,
Daniel
Petr Tesarik May 10, 2016, 5:59 a.m. UTC | #5
On Mon, 9 May 2016 09:52:28 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 05/07/2016 05:32 PM, Nan Li wrote:
> > When running the command "dump-guest-memory", we usually need a large space
> > of storage to save the dumpfile into disk. It costs not only much time to
> > save a file in some of hard disks, but also costs limited storage in host.
> > In order to reduce the saving time and make it convenient for users to dump
> > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > dump file in RAM. It is selectable in the configure file, adding a compiling
> > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> 
> Why introduce FUSE? Can we reuse NBD instead?

Let me answer this one, because it's me who came up with the idea,
although I wasn't involved in the actual implementation.

The idea is to get something more like Linux's /proc/kcore, but for a
QEMU guest. So, yes, the same idea could be implemented as a standalone
application which talks to QEMU using the gdb remote protocol and
exposes the data in a structured form through a FUSE filesystem.

However, the performance of such a solution cannot get even close to
that of exposing the data directly from QEMU. Maybe it's still the best
way to start the project...

Regarding NBD ... correct me if I'm wrong, but I've always thought NBD
can be used to export _disks_ from the QEMU instance, not guest RAM
content.

Regards,
Petr T
Petr Tesarik May 10, 2016, 6:19 a.m. UTC | #6
On Mon, 9 May 2016 17:32:50 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, May 09, 2016 at 06:20:22PM +0200, Petr Tesarik wrote:
> > On Mon, 9 May 2016 17:13:07 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> > > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > > When running the command "dump-guest-memory", we usually need a large space
> > > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > > In order to reduce the saving time and make it convenient for users to dump
> > > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > > 
> > > > Why introduce FUSE? Can we reuse NBD instead?
> > > 
> > > The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
> > > IOW, it seems like it could just dump to any tmpfs directory.
> > > 
> > > I'm not really seeing a compelling reason why QEMU needs to mount a fuse
> > > filesystem itself - whatever app is using QEMU could handle mounting of
> > > fs without QEMU's involvement at all.
> > 
> > The ultimate goal is to export internal QEMU state (memory content,
> > register values) as an ELF file, so you could simply reuse any existing
> > tools that can work with ELF dump files (gdb, crash, makedumpfile,
> > readelf, etc.) instead of re-inventing the wheel for each of those
> > tools.
> > 
> > This cannot be really done from outside of QEMU without too much
> > overhead (how would you access guest memory from outside QEMU?).
> 
> Maybe I'm missing something, but IIUC the 'dump-guest-memory' monitor
> command in QEMU already dumps in ELF format which can be used by standard
> ELF tools. If you don't want that dump to hit disk, then you could mount
> a tmpfs and then tell QEMU to write to that.

It's not the same kind of beast:

  1. You need double the amount of RAM in the host. Oh, yes, some
     folks like to create VMs with a RAM size of a few hundred GBs of
     RAM, and then it may not be negligible...

  2. The memory must still be copied. This is made a bit worse by the
     fact that tmpfs does not pre-allocate enough RAM, so even copying
     a few GBs takes several seconds.

  3. Most importantly, if the file is created on the fly, it's a live
     memory source, i.e. repeated reads will reflect changes in the
     running guest.

Some use cases are substantially slower with the dump-then-use
approach. For example, makedumpfile can estimate the resulting dump
size based on data from the running kernel. It reads only a tiny
portion of system RAM to do the analysis, but since only makedumpfile
knows the exact addresses, you would still need a full dump for that.

With the FUSE approach, guest pages are served on demand when the
application requests them.
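
As a sketch of what "served on demand" could look like on the QEMU side
(illustrative only: the posted patch only buffers data that is written into
it; HEADERS_SIZE, copy_headers() and the linear offset-to-guest-address
mapping below are made-up placeholders, while cpu_physical_memory_read() is
QEMU's existing helper for reading guest-physical memory):

static int kcore_like_read(const char *path, char *buf, size_t size,
                           off_t offset, struct fuse_file_info *fi)
{
	if (offset < HEADERS_SIZE) {
		/* Serve the pre-built ELF and program headers. */
		return copy_headers(buf, size, offset);
	}

	/* Read straight from live guest RAM; nothing is copied up front and
	 * repeated reads reflect the current guest state. */
	cpu_physical_memory_read(offset - HEADERS_SIZE, buf, size);
	return size;
}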

Petr T
Daniel P. Berrangé May 10, 2016, 8:39 a.m. UTC | #7
On Tue, May 10, 2016 at 08:19:38AM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 17:32:50 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, May 09, 2016 at 06:20:22PM +0200, Petr Tesarik wrote:
> > > On Mon, 9 May 2016 17:13:07 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > 
> > > > On Mon, May 09, 2016 at 09:52:28AM -0600, Eric Blake wrote:
> > > > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > > > When running the command "dump-guest-memory", we usually need a large space
> > > > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > > > In order to reduce the saving time and make it convenient for users to dump
> > > > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > > > 
> > > > > Why introduce FUSE? Can we reuse NBD instead?
> > > > 
> > > > The commit message talks of letting QEMU dump to RAM avoiding disk I/O.
> > > > IOW, it seems like it could just dump to any tmpfs directory.
> > > > 
> > > > I'm not really seeing a compelling reason why QEMU needs to mount a fuse
> > > > filesystem itself - whatever app is using QEMU could handle mounting of
> > > > fs without QEMU's involvement at all.
> > > 
> > > The ultimate goal is to export internal QEMU state (memory content,
> > > register values) as an ELF file, so you could simply reuse any existing
> > > tools that can work with ELF dump files (gdb, crash, makedumpfile,
> > > readelf, etc.) instead of re-inventing the wheel for each of those
> > > tools.
> > > 
> > > This cannot be really done from outside of QEMU without too much
> > > overhead (how would you access guest memory from outside QEMU?).
> > 
> > Maybe I'm missing something, but IIUC the 'dump-guest-memory' monitor
> > command in QEMU already dumps in ELF format which can be used by standard
> > ELF tools. If you don't want that dump to hit disk, then you could mount
> > a tmpfs and then tell QEMU to write to that.
> 
> It's not the same kind of beast:
> 
>   1. You need double the amount of RAM in the host. Oh, yes, some
>      folks like to create VMs with a RAM size of a few hundred GBs of
>      RAM, and then it may not be negligible...
> 
>   2. The memory must still be copied. This is made a bit worse by the
>      fact that tmpfs does not pre-allocate enough RAM, so even copying
>      a few GBs takes several seconds.
> 
>   3. Most importantly, if the file is created on the fly, it's a live
>      memory source, i.e. repeated reads will reflect changes in the
>      running guest.
> 
> Some use cases are substantially slower with the dump-then-use
> approach. For example, makedumpfile can estimate the resulting dump
> size based on data from the running kernel. It reads only a tiny
> portion of system RAM to do the analysis, but since only makedumpfile
> knows the exact addresses, you would still need a full dump for that.
> 
> With the FUSE approach, guest pages are served on demand when the
> application requests them.

AFAICT, what you describe here is not what this patch set is actually
doing. This patch isn't modifying the dump-guest-memory monitor command
at all - it just mounts a FUSE filesystem and expects you to use
dump-guest-memory as normal to write to that filesystem.

Regards,
Daniel
Daniel P. Berrangé May 10, 2016, 8:48 a.m. UTC | #8
On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 09:52:28 -0600
> Eric Blake <eblake@redhat.com> wrote:
> 
> > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > When running the command "dump-guest-memory", we usually need a large space
> > > of storage to save the dumpfile into disk. It costs not only much time to
> > > save a file in some of hard disks, but also costs limited storage in host.
> > > In order to reduce the saving time and make it convenient for users to dump
> > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > 
> > Why introduce FUSE? Can we reuse NBD instead?
> 
> Let me answer this one, because it's me who came up with the idea,
> although I wasn't involved in the actual implementation.
> 
> The idea is to get something more like Linux's /proc/kcore, but for a
> QEMU guest. So, yes, the same idea could be implemented as a standalone
> application which talks to QEMU using the gdb remote protocol and
> exposes the data in a structured form through a FUSE filesystem.
> 
> However, the performance of such a solution cannot get even close to
> that of exposing the data directly from QEMU. Maybe it's still the best
> way to start the project...

IIUC, the performance penalty will be related to the copying of guest
RAM. All the other supplementary information you want (register state
etc) is low volume, so should not be performance critical to copy that
over the QMP monitor command or via libvirt monitor command passthrough.

So if we want to have an external program provide a /proc/kcore-like
service via FUSE, the problem we need to solve here is a mechanism
for providing efficient access to QEMU memory.

I think this can be done quite simply by having QEMU guest RAM exposed
via tmpfs or hugetlbfs as appropriate. This approach is what is already
used for the vhost-user network backend in an external process which
likewise needs copy-free access to guest RAM pages.
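
For instance (a sketch rather than anything from this patch: it assumes QEMU
was started with a shared, file-backed memory backend such as
"-object memory-backend-file,id=mem,size=4G,mem-path=/path/guest-ram,share=on
-numa node,memdev=mem", the same setup vhost-user uses, and it trims most
error handling), an external tool could map guest RAM copy-free like this:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the guest's RAM backing file read-only and copy-free; reads through
 * the returned pointer observe live guest memory. */
static void *map_guest_ram(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		return NULL;
	}
	if (fstat(fd, &st) < 0) {
		close(fd);
		return NULL;
	}
	*len = st.st_size;
	p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	return p == MAP_FAILED ? NULL : p;
}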

Obviously this requires that users start QEMU in this particular setup
for RAM, but I don't think that's a particularly onerous requirement
as any non-trivial management application will already know how to do
this.

> Regarding NBD ... correct me if I'm wrong, but I've always thought NBD
> can be used to export _disks_ from the QEMU instance, not guest RAM
> content.

Yeah, NBD seems like the wrong fit for this problem.

Regards,
Daniel
Petr Tesarik May 10, 2016, 9:42 a.m. UTC | #9
On Tue, 10 May 2016 09:48:48 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > On Mon, 9 May 2016 09:52:28 -0600
> > Eric Blake <eblake@redhat.com> wrote:
> > 
> > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > When running the command "dump-guest-memory", we usually need a large space
> > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > In order to reduce the saving time and make it convenient for users to dump
> > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > 
> > > Why introduce FUSE? Can we reuse NBD instead?
> > 
> > Let me answer this one, because it's me who came up with the idea,
> > although I wasn't involved in the actual implementation.
> > 
> > The idea is to get something more like Linux's /proc/kcore, but for a
> > QEMU guest. So, yes, the same idea could be implemented as a standalone
> > application which talks to QEMU using the gdb remote protocol and
> > exposes the data in a structured form through a FUSE filesystem.
> > 
> > However, the performance of such a solution cannot get even close to
> > that of exposing the data directly from QEMU. Maybe it's still the best
> > way to start the project...
> 
> IIUC, the performance penalty will be related to the copying of guest
> RAM. All the other supplementary information you want (register state
> etc) is low volume, so should not be performance critical to copy that
> over the QMP monitor command or via libvirt monitor command passthrough.

Agreed. Even if the number of guest CPUs ever rises to the order of
thousands, the additional impact is negligible.

> So if want to have an external program provide a /proc/kcore like
> service via FUSE, the problem we need to solve here is a mechanism
> for providing efficient access to QEMU memory.

Indeed. This is the main reason for tinkering with QEMU sources at all.

> I think this can be done quite simply by having QEMU guest RAM exposed
> via tmpfs or hugetlbfs as appropriate. This approach is what is already
> used for the vhost-user network backend in an external process which
> likewise needs copy-free access to guest RAM pages.

Ha! We didn't realize this is an option. We can certainly have a look
at implementing a generic mechanism for mapping QEMU guest RAM from
another process on the host. And yes, this would address any
performance concerns nicely.

> Obviously this requires that users start QEMU in this particular setup
> for RAM, but I don't think that's a particularly onerous requirement
> as any non-trivial management application will already know how to do
> this.

Agreed. This is not an issue. Our main target would be libvirt, which
adds quite a bit of infrastructure already. ;-)

Thanks for your thoughts!

Petr T
Stefan Hajnoczi May 10, 2016, 9:56 a.m. UTC | #10
On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> On Mon, 9 May 2016 09:52:28 -0600
> Eric Blake <eblake@redhat.com> wrote:
> 
> > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > When running the command "dump-guest-memory", we usually need a large space
> > > of storage to save the dumpfile into disk. It costs not only much time to
> > > save a file in some of hard disks, but also costs limited storage in host.
> > > In order to reduce the saving time and make it convenient for users to dump
> > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > 
> > Why introduce FUSE? Can we reuse NBD instead?
> 
> Let me answer this one, because it's me who came up with the idea,
> although I wasn't involved in the actual implementation.
> 
> The idea is to get something more like Linux's /proc/kcore, but for a
> QEMU guest. So, yes, the same idea could be implemented as a standalone
> application which talks to QEMU using the gdb remote protocol and
> exposes the data in a structured form through a FUSE filesystem.
> 
> However, the performance of such a solution cannot get even close to
> that of exposing the data directly from QEMU. Maybe it's still the best
> way to start the project...

If you want no overhead and are willing to pause the guest, use QEMU's
gdb stub (directly, no extra FUSE file system layer).  If you cannot
pause the guest then take a copy of memory with dump-guest-memory to
tmpfs.

There might be a middle-ground where you can copy-on-write pages and let
the guest continue to run, but this is probably not worth the
effort/complexity.

I find it hard to see where adding more code or using FUSE would make
things better?

Stefan
Nan Li May 10, 2016, 11:02 a.m. UTC | #11
>>> On 5/10/2016 at 5:56 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
>> On Mon, 9 May 2016 09:52:28 -0600
>> Eric Blake <eblake@redhat.com> wrote:
>> 
>> > On 05/07/2016 05:32 PM, Nan Li wrote:
>> > > When running the command "dump-guest-memory", we usually need a large space
>> > > of storage to save the dumpfile into disk. It costs not only much time to
>> > > save a file in some of hard disks, but also costs limited storage in host.
>> > > In order to reduce the saving time and make it convenient for users to 
> dump
>> > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save 
> the
>> > > dump file in RAM. It is selectable in the configure file, adding a 
> compiling
>> > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
>> > 
>> > Why introduce FUSE? Can we reuse NBD instead?
>> 
>> Let me answer this one, because it's me who came up with the idea,
>> although I wasn't involved in the actual implementation.
>> 
>> The idea is to get something more like Linux's /proc/kcore, but for a
>> QEMU guest. So, yes, the same idea could be implemented as a standalone
>> application which talks to QEMU using the gdb remote protocol and
>> exposes the data in a structured form through a FUSE filesystem.
>> 
>> However, the performance of such a solution cannot get even close to
>> that of exposing the data directly from QEMU. Maybe it's still the best
>> way to start the project...
> 
> If you want no overhead and are willing to pause the guest, use QEMU's
> gdb stub (directly, no extra FUSE file system layer).  If you cannot
> pause the guest then take a copy of memory with dump-guest-memory to
> tmpfs.
> 
> There might be a middle-ground where you can copy-on-write pages and let
> the guest continue to run, but this is probably not worth the
> effort/complexity.
> 

Yes, pausing the guest and then accessing the guest memory to do the core analysis work
is much easier than handling the running guest. Thank you for your thoughts.

> I find it hard to see where adding more code or using FUSE would make
> things better?
> 
> Stefan
Nan Li May 10, 2016, 11:26 a.m. UTC | #12
>>> On 5/10/2016 at 5:42 PM, Petr Tesarik <ptesarik@suse.com> wrote:
> On Tue, 10 May 2016 09:48:48 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
>> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
>> > On Mon, 9 May 2016 09:52:28 -0600
>> > Eric Blake <eblake@redhat.com> wrote:
>> > 
>> > > On 05/07/2016 05:32 PM, Nan Li wrote:
>> > > > When running the command "dump-guest-memory", we usually need a large space
>> > > > of storage to save the dumpfile into disk. It costs not only much time to
>> > > > save a file in some of hard disks, but also costs limited storage in 
> host.
>> > > > In order to reduce the saving time and make it convenient for users to 
> dump
>> > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save 
> the
>> > > > dump file in RAM. It is selectable in the configure file, adding a 
> compiling
>> > > > of package "fuse-devel". It doesn't change the way of dumping guest 
> memory.
>> > > 
>> > > Why introduce FUSE? Can we reuse NBD instead?
>> > 
>> > Let me answer this one, because it's me who came up with the idea,
>> > although I wasn't involved in the actual implementation.
>> > 
>> > The idea is to get something more like Linux's /proc/kcore, but for a
>> > QEMU guest. So, yes, the same idea could be implemented as a standalone
>> > application which talks to QEMU using the gdb remote protocol and
>> > exposes the data in a structured form through a FUSE filesystem.
>> > 
>> > However, the performance of such a solution cannot get even close to
>> > that of exposing the data directly from QEMU. Maybe it's still the best
>> > way to start the project...
>> 
>> IIUC, the performance penalty will be related to the copying of guest
>> RAM. All the other supplementary information you want (register state
>> etc) is low volume, so should not be performance critical to copy that
>> over the QMP monitor command or via libvirt monitor command passthrough.
> 
> Agreed. Even if the number of guest CPUs ever rises to the order of
> thousands, the additional impact is negligible.
> 
>> So if want to have an external program provide a /proc/kcore like
>> service via FUSE, the problem we need to solve here is a mechanism
>> for providing efficient access to QEMU memory.
> 
> Indeed. This is the main reason for tinkering with QEMU sources at all.
> 
>> I think this can be done quite simply by having QEMU guest RAM exposed
>> via tmpfs or hugetlbfs as appropriate. This approach is what is already
>> used for the vhost-user network backend in an external process which
>> likewise needs copy-free access to guest RAM pages.
> 
> Ha! We didn't realize this is an option. We can certainly have a look
> at implementing a generic mechanism for mapping QEMU guest RAM from
> another process on the host. And yes, this would address any
> performance concerns nicely.
> 

Agreed. It sounds like a good option. I will try to investigate it.

>> Obviously this requires that users start QEMU in this particular setup
>> for RAM, but I don't think that's a particularly onerous requirement
>> as any non-trivial management application will already know how to do
>> this.
> 
> Agreed. This is not an issue. Our main target would be libvirt, which
> adds quite a bit of infrastructure already. ;-)
> 
> Thanks for your thoughts!
> 
> Petr T

Thanks very much for all your thoughts.

Nan Li
Petr Tesarik May 10, 2016, 11:55 a.m. UTC | #13
On Tue, 10 May 2016 10:56:42 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > On Mon, 9 May 2016 09:52:28 -0600
> > Eric Blake <eblake@redhat.com> wrote:
> > 
> > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > When running the command "dump-guest-memory", we usually need a large space
> > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > In order to reduce the saving time and make it convenient for users to dump
> > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > 
> > > Why introduce FUSE? Can we reuse NBD instead?
> > 
> > Let me answer this one, because it's me who came up with the idea,
> > although I wasn't involved in the actual implementation.
> > 
> > The idea is to get something more like Linux's /proc/kcore, but for a
> > QEMU guest. So, yes, the same idea could be implemented as a standalone
> > application which talks to QEMU using the gdb remote protocol and
> > exposes the data in a structured form through a FUSE filesystem.
> > 
> > However, the performance of such a solution cannot get even close to
> > that of exposing the data directly from QEMU. Maybe it's still the best
> > way to start the project...
> 
> If you want no overhead and are willing to pause the guest, use QEMU's
> gdb stub (directly, no extra FUSE file system layer).

Well, the obvious downside of this solution is that you need GDB
protocol support. AFAIK there are more tools which can work with ELF
dump files than with the GDB protocol. Sure, I could add GDB protocol
support to each and every one of them, but I fail to see how that is
better use of time than adding an additional layer which allows to use
any ELF-capable tool directly.

> If you cannot
> pause the guest then take a copy of memory with dump-guest-memory to
> tmpfs.
> 
> There might be a middle-ground where you can copy-on-write pages and let
> the guest continue to run, but this is probably not worth the
> effort/complexity.
> 
> I find it hard to see where adding more code or using FUSE would make
> things better?

Please see my explanation in another branch of this thread why
generating dump files on the fly is better for some use cases than
saving a complete copy.

BTW FUSE is definitely not tmpfs. See the nice diagram on Wikipedia:
https://en.wikipedia.org/wiki/Filesystem_in_Userspace

Our main motivation is not better performance but more flexibility.
OTOH why should we burn more CPU cycles than necessary?

Petr T
Stefan Hajnoczi May 12, 2016, 10:09 a.m. UTC | #14
On Tue, May 10, 2016 at 01:55:10PM +0200, Petr Tesarik wrote:
> On Tue, 10 May 2016 10:56:42 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > > On Mon, 9 May 2016 09:52:28 -0600
> > > Eric Blake <eblake@redhat.com> wrote:
> > > 
> > > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > > When running the command "dump-guest-memory", we usually need a large space
> > > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > > In order to reduce the saving time and make it convenient for users to dump
> > > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > > 
> > > > Why introduce FUSE? Can we reuse NBD instead?
> > > 
> > > Let me answer this one, because it's me who came up with the idea,
> > > although I wasn't involved in the actual implementation.
> > > 
> > > The idea is to get something more like Linux's /proc/kcore, but for a
> > > QEMU guest. So, yes, the same idea could be implemented as a standalone
> > > application which talks to QEMU using the gdb remote protocol and
> > > exposes the data in a structured form through a FUSE filesystem.
> > > 
> > > However, the performance of such a solution cannot get even close to
> > > that of exposing the data directly from QEMU. Maybe it's still the best
> > > way to start the project...
> > 
> > If you want no overhead and are willing to pause the guest, use QEMU's
> > gdb stub (directly, no extra FUSE file system layer).
> 
> Well, the obvious downside of this solution is that you need GDB
> protocol support. AFAIK there are more tools which can work with ELF
> dump files than with the GDB protocol. Sure, I could add GDB protocol
> support to each and every one of them, but I fail to see how that is
> better use of time than adding an additional layer which allows to use
> any ELF-capable tool directly.

Out of interest, which tools are you thinking about?

I use gdb and crash.  Would be interesting to learn about additional
options that you are familiar with.

Stefan
Petr Tesarik May 12, 2016, 10:30 a.m. UTC | #15
On Thu, 12 May 2016 11:09:02 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, May 10, 2016 at 01:55:10PM +0200, Petr Tesarik wrote:
> > On Tue, 10 May 2016 10:56:42 +0100
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > 
> > > On Tue, May 10, 2016 at 07:59:41AM +0200, Petr Tesarik wrote:
> > > > On Mon, 9 May 2016 09:52:28 -0600
> > > > Eric Blake <eblake@redhat.com> wrote:
> > > > 
> > > > > On 05/07/2016 05:32 PM, Nan Li wrote:
> > > > > > When running the command "dump-guest-memory", we usually need a large space
> > > > > > of storage to save the dumpfile into disk. It costs not only much time to
> > > > > > save a file in some of hard disks, but also costs limited storage in host.
> > > > > > In order to reduce the saving time and make it convenient for users to dump
> > > > > > the guest memory, we introduce a Filesystem in Userspace (FUSE) to save the
> > > > > > dump file in RAM. It is selectable in the configure file, adding a compiling
> > > > > > of package "fuse-devel". It doesn't change the way of dumping guest memory.
> > > > > 
> > > > > Why introduce FUSE? Can we reuse NBD instead?
> > > > 
> > > > Let me answer this one, because it's me who came up with the idea,
> > > > although I wasn't involved in the actual implementation.
> > > > 
> > > > The idea is to get something more like Linux's /proc/kcore, but for a
> > > > QEMU guest. So, yes, the same idea could be implemented as a standalone
> > > > application which talks to QEMU using the gdb remote protocol and
> > > > exposes the data in a structured form through a FUSE filesystem.
> > > > 
> > > > However, the performance of such a solution cannot get even close to
> > > > that of exposing the data directly from QEMU. Maybe it's still the best
> > > > way to start the project...
> > > 
> > > If you want no overhead and are willing to pause the guest, use QEMU's
> > > gdb stub (directly, no extra FUSE file system layer).
> > 
> > Well, the obvious downside of this solution is that you need GDB
> > protocol support. AFAIK there are more tools which can work with ELF
> > dump files than with the GDB protocol. Sure, I could add GDB protocol
> > support to each and every one of them, but I fail to see how that is
> > better use of time than adding an additional layer which allows to use
> > any ELF-capable tool directly.
> 
> Out of interest, which tools are you thinking about?
> 
> I use gdb and crash.  Would be interesting to learn about additional
> options that you are familiar with.

The one that started my thinking was "makedumpfile". I'm also exploring
ways of writing a standalone eppic application on top of libkdumpfile,
and, of course, you can already use the libkdumpfile Python bindings
today to write an analysis tool in Python that works equally well on
dumps and on live VMs.

Petr T

Patch

diff --git a/Makefile.target b/Makefile.target
index 34ddb7e..7619ef8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -138,6 +138,7 @@  obj-$(CONFIG_KVM) += kvm-all.o
 obj-y += memory.o cputlb.o
 obj-y += memory_mapping.o
 obj-y += dump.o
+obj-$(CONFIG_FUSE) += fuse-mem.o
 obj-y += migration/ram.o migration/savevm.o
 LIBS := $(libs_softmmu) $(LIBS)
 
diff --git a/configure b/configure
index 5db29f0..0769caf 100755
--- a/configure
+++ b/configure
@@ -275,6 +275,7 @@  trace_backends="log"
 trace_file="trace"
 spice=""
 rbd=""
+fuse="yes"
 smartcard=""
 libusb=""
 usb_redir=""
@@ -1023,6 +1024,10 @@  for opt do
   ;;
   --enable-rbd) rbd="yes"
   ;;
+  --disable-fuse) fuse="no"
+  ;;
+  --enable-fuse) fuse="yes"
+  ;;
   --disable-xfsctl) xfs="no"
   ;;
   --enable-xfsctl) xfs="yes"
@@ -1349,6 +1354,7 @@  disabled with --disable-FEATURE, default is enabled if available:
   vhost-net       vhost-net acceleration support
   spice           spice
   rbd             rados block device (rbd)
+  fuse            the support of dumping guest memory via fuse
   libiscsi        iscsi support
   libnfs          nfs support
   smartcard       smartcard support (libcacard)
@@ -3139,6 +3145,28 @@  EOF
 fi
 
 ##########################################
+# fuse probe
+min_fuse_version=2.9.3
+if test "$fuse" != "no" ; then
+  if $pkg_config --atleast-version=$min_fuse_version fuse; then
+    fuse_cflags=`$pkg_config fuse --cflags`
+    fuse_libs=`$pkg_config fuse --libs`
+    QEMU_CFLAGS="$fuse_cflags $QEMU_CFLAGS"
+    libs_softmmu="$fuse_libs $libs_softmmu"
+    fuse=yes
+  else
+    if $pkg_config fuse; then
+      if test "$fuse" = "yes" ; then
+        error_exit "fuse >= $min_fuse_version required for --enable-fuse"
+      fi
+    else
+      feature_not_found "fuse" "Please install fuse devel pkgs: fuse-devel"
+    fi
+    fuse=no
+  fi
+fi
+
+##########################################
 # libssh2 probe
 min_libssh2_version=1.2.8
 if test "$libssh2" != "no" ; then
@@ -4815,6 +4843,7 @@  else
 echo "spice support     $spice"
 fi
 echo "rbd support       $rbd"
+echo "fuse support      $fuse"
 echo "xfsctl support    $xfs"
 echo "smartcard support $smartcard"
 echo "libusb            $libusb"
@@ -5293,6 +5322,11 @@  if test "$rbd" = "yes" ; then
   echo "RBD_CFLAGS=$rbd_cflags" >> $config_host_mak
   echo "RBD_LIBS=$rbd_libs" >> $config_host_mak
 fi
+if test "$fuse" = "yes" ; then
+  echo "CONFIG_FUSE=y" >> $config_host_mak
+  echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
+  echo "FUSE_LIBS=$fuse_libs" >> $config_host_mak
+fi
 
 echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
 if test "$coroutine_pool" = "yes" ; then
diff --git a/fuse-mem.c b/fuse-mem.c
new file mode 100644
index 0000000..3365ddb
--- /dev/null
+++ b/fuse-mem.c
@@ -0,0 +1,376 @@ 
+/*
+
+  gcc -Wall myfuse.c -lfuse -D_FILE_OFFSET_BITS=64 -o myfuse
+*/
+
+#define FUSE_USE_VERSION 26
+
+#include <fuse.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include "fuse-mem.h"
+
+//static const char *qemu_str = "Hello World!\n";
+//static const char *qemu_path = "/etc/qemu";
+
+#define PAGE_SIZE	(0x100000)
+#define FILE_BUFFER_PAGE	(PAGE_SIZE - sizeof(struct file_buffer))
+
+struct file_buffer {
+	struct file_buffer *next;
+	size_t used;
+	size_t size;
+	/* Data points here */
+	unsigned char data[0];
+};
+
+struct file_bufhead {
+	//spinlock_t lock;
+	struct file_buffer *head;	
+	struct file_buffer *tail;
+	//struct file_buffer *current;	
+	size_t filesize;
+	//off_t offset;
+	//char *offset_ptr;
+};
+
+struct fuse_file {
+	char path[128];
+	struct fuse_file_info fileinfo;
+	struct file_bufhead file;
+	struct fuse_file *next;
+};
+
+struct fuse_file_root {
+	struct fuse_file *head;
+	struct fuse_file *tail;
+};
+
+struct fuse_file_root root;
+
+#if 0
+void dumpfile(struct fuse_file *fuse_file_ptr)
+{
+	struct file_buffer *file_buffer_ptr;
+	int i;
+	printf("DUMPFILE:\n");
+	for (file_buffer_ptr = fuse_file_ptr->file.head; file_buffer_ptr != NULL; file_buffer_ptr = file_buffer_ptr->next) {
+		for (i = 0; i < file_buffer_ptr->used; i++) {
+			printf("Address:0x%x:  0x%x\n", &file_buffer_ptr->data[i], file_buffer_ptr->data[i]);
+		}
+	}
+}
+#endif
+
+static int qemu_fuse_getattr(const char *path, struct stat *stbuf)
+{
+	struct fuse_file *fuse_file_ptr;
+	fuse_file_ptr = root.head;
+	
+	memset(stbuf, 0, sizeof(struct stat));
+	if (strcmp(path, "/") == 0) {
+		stbuf->st_mode = S_IFDIR | 0777;
+		stbuf->st_nlink = 2;
+	} else {
+		while(fuse_file_ptr != NULL) {
+			if (strcmp(fuse_file_ptr->path, path) == 0) {
+				stbuf->st_mode = S_IFREG | 0666;
+				stbuf->st_nlink = 1;
+				stbuf->st_size = fuse_file_ptr->file.filesize;
+				return 0;
+			}
+			else
+				fuse_file_ptr = fuse_file_ptr->next;
+		}
+		return -ENOENT;
+	}
+
+	return 0;
+}
+/*
+static int qemu_fuse_getattr(const char *path, struct stat *stbuf)
+{
+	int res;
+
+	res = lstat(path, stbuf);
+	if (res == -1)
+		return -errno;
+
+	return 0;
+}
+*/
+static int qemu_fuse_fgetattr(const char *path, struct stat *stbuf,
+			struct fuse_file_info *fi)
+{
+	struct fuse_file *fuse_file_ptr;
+	fuse_file_ptr = root.head;
+	
+	memset(stbuf, 0, sizeof(struct stat));
+
+	while(fuse_file_ptr != NULL) {
+		if (fuse_file_ptr->fileinfo.fh == fi->fh) {
+			stbuf->st_mode = S_IFREG | 0666;
+			stbuf->st_nlink = 1;
+			stbuf->st_size = fuse_file_ptr->file.filesize;
+			return 0;
+		}
+		else
+			fuse_file_ptr = fuse_file_ptr->next;
+	}
+	return -ENOENT;
+
+	return 0;
+}
+
+static int qemu_fuse_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
+			 off_t offset, struct fuse_file_info *fi)
+{
+
+	return 0;
+}
+
+static int qemu_fuse_create(const char *path, mode_t mode, struct fuse_file_info *fi)
+{
+	struct fuse_file *fuse_file_ptr;
+
+	fuse_file_ptr = (struct fuse_file *)malloc(sizeof(struct fuse_file));
+	if (fuse_file_ptr) {
+		memcpy(&fuse_file_ptr->fileinfo, fi, sizeof(struct fuse_file_info));
+		memset(&fuse_file_ptr->file, 0, sizeof(fuse_file_ptr->file));
+		fuse_file_ptr->next = NULL;
+		if (root.head == NULL) {
+			root.head = fuse_file_ptr;
+			fi->fh = 1;
+		} else {
+			root.tail->next = fuse_file_ptr;
+			fi->fh = root.tail->fileinfo.fh + 1;
+		}
+		root.tail = fuse_file_ptr;
+		fuse_file_ptr->fileinfo.fh = fi->fh;
+		strcpy(fuse_file_ptr->path, path);
+	} else {
+		return -ENOMEM;
+	}
+	
+	return 0;
+}
+
+static int qemu_fuse_open(const char *path, struct fuse_file_info *fi)
+{
+	struct fuse_file *fuse_file_ptr;
+
+	fuse_file_ptr = root.head;
+
+	while(fuse_file_ptr != NULL) {
+		if (strcmp(fuse_file_ptr->path, path) == 0) {
+			fi->fh = fuse_file_ptr->fileinfo.fh;
+			memcpy(&fuse_file_ptr->fileinfo, fi, sizeof(struct fuse_file_info));
+			return 0;
+		}
+		else
+			fuse_file_ptr = fuse_file_ptr->next;
+	}
+
+	return -ENOENT;
+}
+
+static int qemu_fuse_read(const char *path, char *buf, size_t size, off_t offset,
+		      struct fuse_file_info *fi)
+{
+//printf("herbert:read:size=%u, offset=%u\n", size, offset);
+
+	struct fuse_file *fuse_file_ptr;
+	struct file_buffer *file_buffer_ptr;
+
+	fuse_file_ptr = root.head;
+	long n, count;
+	int item, index;
+	int i = 0, j = 0;
+
+	while(fuse_file_ptr != NULL) {
+		if (fuse_file_ptr->fileinfo.fh == fi->fh) {
+			if ((fuse_file_ptr->file.filesize <= offset) || (fuse_file_ptr->file.filesize == 0))
+				return 0;
+			if (size + offset > fuse_file_ptr->file.filesize)
+				size = fuse_file_ptr->file.filesize - offset;
+			n = size;
+			
+			item = offset / FILE_BUFFER_PAGE;
+			index = offset % FILE_BUFFER_PAGE;
+			
+			for (file_buffer_ptr = fuse_file_ptr->file.head; file_buffer_ptr != NULL; file_buffer_ptr = file_buffer_ptr->next) {
+				if ( i == item )
+					break;
+				i++;
+			}
+
+			j = index;
+			while (file_buffer_ptr != NULL && n > 0) {	
+				if ( n > ((long)file_buffer_ptr->used - j) )
+					count = ((long)file_buffer_ptr->used - j);
+				else
+					count = n;
+				
+				memcpy(buf + size -n, &file_buffer_ptr->data[j], count);
+				n -= count;
+				j = 0;
+				if (n > 0)
+					file_buffer_ptr = file_buffer_ptr->next;
+			}
+//dumpfile(fuse_file_ptr);			
+			return size;
+		}
+		else {
+			fuse_file_ptr = fuse_file_ptr->next;
+		}
+	}
+
+	return -EBADF;
+}
+
+static int qemu_fuse_write(const char *path, const char *buf, size_t size,
+		     off_t offset, struct fuse_file_info *fi)
+{
+//printf("herbert:write:size=%u, offset=%u\n", size, offset);
+
+	struct fuse_file *fuse_file_ptr;
+	struct file_buffer *file_buffer_ptr;
+
+	long n, count;
+	int item, index;
+	int i = 0, j = 0;
+	
+	fuse_file_ptr = root.head;
+
+	while(fuse_file_ptr != NULL) {
+		if (fuse_file_ptr->fileinfo.fh == fi->fh) {
+			n = size;
+			
+			item = offset / FILE_BUFFER_PAGE;
+			index = offset % FILE_BUFFER_PAGE;
+
+			for (file_buffer_ptr = fuse_file_ptr->file.head; file_buffer_ptr != NULL; file_buffer_ptr = file_buffer_ptr->next) {
+				if ( i == item )
+					break;
+				i++;
+			}
+
+			j = index;
+
+			while (file_buffer_ptr != NULL && n > 0) {	
+				if ( n > file_buffer_ptr->size - j )
+					count = file_buffer_ptr->size- j;
+				else
+					count = n;
+				
+				memcpy(&file_buffer_ptr->data[j], buf + size -n, count);
+				if ((count + j - (long)file_buffer_ptr->used) > 0) {
+					fuse_file_ptr->file.filesize += (count + j - (long)file_buffer_ptr->used);
+					file_buffer_ptr->used = count + j; 
+				}
+				n -= count;
+				j = 0;
+				
+				if (n > 0)
+					file_buffer_ptr = file_buffer_ptr->next;
+			}
+
+			while (n > 0) {
+				file_buffer_ptr = (struct file_buffer *)malloc(PAGE_SIZE);
+				if (file_buffer_ptr) {
+					file_buffer_ptr->next = NULL;
+					file_buffer_ptr->size = FILE_BUFFER_PAGE;
+					if ( n > file_buffer_ptr->size )
+						count = file_buffer_ptr->size;
+					else
+						count = n;
+					
+					memcpy(file_buffer_ptr->data, buf + size -n, count);
+					
+					file_buffer_ptr->used = count;
+					
+					if (fuse_file_ptr->file.head == NULL) {
+						fuse_file_ptr->file.head = file_buffer_ptr;
+					} else {
+						fuse_file_ptr->file.tail->next = file_buffer_ptr;
+					}
+					fuse_file_ptr->file.tail = file_buffer_ptr;
+					fuse_file_ptr->file.filesize += count;
+					
+					n -= count;
+					
+					if (n > 0)
+						file_buffer_ptr = file_buffer_ptr->next;
+				} else {
+					return -ENOMEM;
+				}	
+			}
+//dumpfile(fuse_file_ptr);
+			return size;
+			
+		}
+		else {
+			fuse_file_ptr = fuse_file_ptr->next;
+		}
+	}
+	
+	return -EBADF;
+}
+
+static int qemu_fuse_unlink(const char *path)
+{
+	struct fuse_file *fuse_file_ptr, *p;
+	struct file_buffer *file_buffer_ptr, *q;
+	
+	fuse_file_ptr = root.head;
+
+	while(fuse_file_ptr != NULL) {
+		if (strcmp(fuse_file_ptr->path, path) == 0) {
+
+			file_buffer_ptr = fuse_file_ptr->file.head;
+			q = fuse_file_ptr->file.head;
+			while (file_buffer_ptr != NULL) {
+				q = q->next;
+				free(file_buffer_ptr);
+				file_buffer_ptr = q;
+			}
+
+			if (fuse_file_ptr == root.head) {
+				root.head = fuse_file_ptr->next;
+			} else {
+				for (p = root.head; p->next != fuse_file_ptr; p = p->next);
+				p->next = fuse_file_ptr->next;
+			}
+			
+			free(fuse_file_ptr);
+
+			return 0;
+		}
+		else
+			fuse_file_ptr = fuse_file_ptr->next;
+	}
+	
+	return -EACCES;
+}
+
+static struct fuse_operations qemu_fuse_oper = {
+	.getattr	= qemu_fuse_getattr,
+	.fgetattr	= qemu_fuse_fgetattr,
+	.readdir	= qemu_fuse_readdir,
+	.create   = qemu_fuse_create,
+	.open	= qemu_fuse_open,
+	.read	= qemu_fuse_read,
+	.write	= qemu_fuse_write,
+	.unlink	= qemu_fuse_unlink,
+};
+
+extern int qemu_fuse_main(int argc, char *argv[])
+{
+	return fuse_main(argc, argv, &qemu_fuse_oper, NULL);
+}
+
+
+
diff --git a/fuse-mem.h b/fuse-mem.h
new file mode 100644
index 0000000..1a40168
--- /dev/null
+++ b/fuse-mem.h
@@ -0,0 +1,2 @@ 
+extern int qemu_fuse_main(int argc, char *argv[]);
+