diff mbox series

[02/18] fuse: Allow exporting BDSs via FUSE

Message ID 20191219143818.1646168-3-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Allow exporting BDSs via FUSE | expand

Commit Message

Max Reitz Dec. 19, 2019, 2:38 p.m. UTC
fuse-export-add allows mounting block graph nodes via FUSE on some
existing regular file.  That file should then appears like a raw disk
image, and accesses to it result in accesses to the exported BDS.

Right now, we only set up the mount point and tear all mount points down
in bdrv_close_all().  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

The set of exported nodes is kept in a hash table so we can later add a
fuse-export-remove that allows unmounting.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c              |   4 +
 block/Makefile.objs  |   3 +
 block/fuse.c         | 260 +++++++++++++++++++++++++++++++++++++++++++
 include/block/fuse.h |  24 ++++
 qapi/block.json      |  23 ++++
 5 files changed, 314 insertions(+)
 create mode 100644 block/fuse.c
 create mode 100644 include/block/fuse.h

Comments

Kevin Wolf Dec. 20, 2019, 10:26 a.m. UTC | #1
Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
> fuse-export-add allows mounting block graph nodes via FUSE on some
> existing regular file.  That file should then appears like a raw disk
> image, and accesses to it result in accesses to the exported BDS.
> 
> Right now, we only set up the mount point and tear all mount points down
> in bdrv_close_all().  We do not implement any access functions, so
> accessing the mount point only results in errors.  This will be
> addressed by a followup patch.
> 
> The set of exported nodes is kept in a hash table so we can later add a
> fuse-export-remove that allows unmounting.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

> diff --git a/qapi/block.json b/qapi/block.json
> index 145c268bb6..03f8d1b537 100644
> --- a/qapi/block.json
> +++ b/qapi/block.json
> @@ -317,6 +317,29 @@
>  ##
>  { 'command': 'nbd-server-stop' }
>  
> +##
> +# @fuse-export-add:
> +#
> +# Exports a block graph node on some (file) mountpoint as a raw image.
> +#
> +# @node-name: Node to be exported
> +#
> +# @mountpoint: Path on which to export the block device via FUSE.
> +#              This must point to an existing regular file.
> +#
> +# @writable: Whether clients should be able to write to the block
> +#            device via the FUSE export. (default: false)
> +#
> +# Since: 5.0
> +##
> +{ 'command': 'fuse-export-add',
> +  'data': {
> +      'node-name': 'str',
> +      'mountpoint': 'str',
> +      '*writable': 'bool'
> +  },
> +  'if': 'defined(CONFIG_FUSE)' }

Can this use a BlockExport union from the start like I'm introducing in
the storage daemon series, together with a generic block-export-add?

It also looks like node-name and writable should be part of the common
base of BlockExport. Unfortunately this would mean that I can't use the
same BlockExportNbd for the existing nbd-server-add command any more. I
guess I could somehow get a shared base type for both, though.

Markus, any thoughts on these QAPI interfaces?

Kevin
Max Reitz Dec. 20, 2019, 10:48 a.m. UTC | #2
On 20.12.19 11:26, Kevin Wolf wrote:
> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>> fuse-export-add allows mounting block graph nodes via FUSE on some
>> existing regular file.  That file should then appears like a raw disk
>> image, and accesses to it result in accesses to the exported BDS.
>>
>> Right now, we only set up the mount point and tear all mount points down
>> in bdrv_close_all().  We do not implement any access functions, so
>> accessing the mount point only results in errors.  This will be
>> addressed by a followup patch.
>>
>> The set of exported nodes is kept in a hash table so we can later add a
>> fuse-export-remove that allows unmounting.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
>> diff --git a/qapi/block.json b/qapi/block.json
>> index 145c268bb6..03f8d1b537 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -317,6 +317,29 @@
>>  ##
>>  { 'command': 'nbd-server-stop' }
>>  
>> +##
>> +# @fuse-export-add:
>> +#
>> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> +#
>> +# @node-name: Node to be exported
>> +#
>> +# @mountpoint: Path on which to export the block device via FUSE.
>> +#              This must point to an existing regular file.
>> +#
>> +# @writable: Whether clients should be able to write to the block
>> +#            device via the FUSE export. (default: false)
>> +#
>> +# Since: 5.0
>> +##
>> +{ 'command': 'fuse-export-add',
>> +  'data': {
>> +      'node-name': 'str',
>> +      'mountpoint': 'str',
>> +      '*writable': 'bool'
>> +  },
>> +  'if': 'defined(CONFIG_FUSE)' }
> 
> Can this use a BlockExport union from the start like I'm introducing in
> the storage daemon series, together with a generic block-export-add?

Hm, you mean still adding a FuseExport structure that would be part of
BlockExport and then dropping fuse-export-add in favor of a
block-export-add that we want anyway?

> It also looks like node-name and writable should be part of the common
> base of BlockExport.

node-name definitely, I’m not so sure about writable.  Or, to be more
precise, I think that if we want writable to be in the base, we also
want growable to be there: Both are primarily options for the
BlockBackend that the exports use.

But both of course also need to be supported by the export
implementation.  nbd can make its BB growable all it wants, but that
doesn’t make it work.

So if we kept writable and growable in the common base, then the schema
would give no information about what exports actually support them.

On one hand, I don’t know whether it’s important to have this
information in a static form, or whether it’s sufficient to learn at
runtime.

On the other, I don’t know whether it’s important to have those fields
in the base or not.  Would it make a difference on the wire?

> Unfortunately this would mean that I can't use the
> same BlockExportNbd for the existing nbd-server-add command any more. I
> guess I could somehow get a shared base type for both, though.

Hm.  This sounds like you want to make it your problem.  Can I take that
to mean that you want to implement block-export-add and I can wait with
v2 until that’s done? :-)

Max

> Markus, any thoughts on these QAPI interfaces?
> 
> Kevin
Kevin Wolf Dec. 20, 2019, 11:24 a.m. UTC | #3
Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
> On 20.12.19 11:26, Kevin Wolf wrote:
> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
> >> fuse-export-add allows mounting block graph nodes via FUSE on some
> >> existing regular file.  That file should then appears like a raw disk
> >> image, and accesses to it result in accesses to the exported BDS.
> >>
> >> Right now, we only set up the mount point and tear all mount points down
> >> in bdrv_close_all().  We do not implement any access functions, so
> >> accessing the mount point only results in errors.  This will be
> >> addressed by a followup patch.
> >>
> >> The set of exported nodes is kept in a hash table so we can later add a
> >> fuse-export-remove that allows unmounting.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> > 
> >> diff --git a/qapi/block.json b/qapi/block.json
> >> index 145c268bb6..03f8d1b537 100644
> >> --- a/qapi/block.json
> >> +++ b/qapi/block.json
> >> @@ -317,6 +317,29 @@
> >>  ##
> >>  { 'command': 'nbd-server-stop' }
> >>  
> >> +##
> >> +# @fuse-export-add:
> >> +#
> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
> >> +#
> >> +# @node-name: Node to be exported
> >> +#
> >> +# @mountpoint: Path on which to export the block device via FUSE.
> >> +#              This must point to an existing regular file.
> >> +#
> >> +# @writable: Whether clients should be able to write to the block
> >> +#            device via the FUSE export. (default: false)
> >> +#
> >> +# Since: 5.0
> >> +##
> >> +{ 'command': 'fuse-export-add',
> >> +  'data': {
> >> +      'node-name': 'str',
> >> +      'mountpoint': 'str',
> >> +      '*writable': 'bool'
> >> +  },
> >> +  'if': 'defined(CONFIG_FUSE)' }
> > 
> > Can this use a BlockExport union from the start like I'm introducing in
> > the storage daemon series, together with a generic block-export-add?
> 
> Hm, you mean still adding a FuseExport structure that would be part of
> BlockExport and then dropping fuse-export-add in favor of a
> block-export-add that we want anyway?

Yes.

> > It also looks like node-name and writable should be part of the common
> > base of BlockExport.
> 
> node-name definitely, I’m not so sure about writable.  Or, to be more
> precise, I think that if we want writable to be in the base, we also
> want growable to be there: Both are primarily options for the
> BlockBackend that the exports use.
> 
> But both of course also need to be supported by the export
> implementation.  nbd can make its BB growable all it wants, but that
> doesn’t make it work.

Right. Pragmatically, I think exports are very like to support writable,
but probably rather unlikely to support growable. So I do think there
would be a point for making writable part of the common base, but not
growable.

> So if we kept writable and growable in the common base, then the schema
> would give no information about what exports actually support them.
> 
> On one hand, I don’t know whether it’s important to have this
> information in a static form, or whether it’s sufficient to learn at
> runtime.
> 
> On the other, I don’t know whether it’s important to have those fields
> in the base or not.  Would it make a difference on the wire?

Not for the command itself, so I think we're free to change it later. It
might make a difference for introspection, though, not sure. Markus?

Having it in the base might allow us to remove some duplication in the
code. Probably not much, though, so not too important.

> > Unfortunately this would mean that I can't use the
> > same BlockExportNbd for the existing nbd-server-add command any more. I
> > guess I could somehow get a shared base type for both, though.
> 
> Hm.  This sounds like you want to make it your problem.  Can I take that
> to mean that you want to implement block-export-add and I can wait with
> v2 until that’s done? :-)

The NBD integration, yes. I already added the BlockExport type to my
patches, too, but I expect you would beat me to it. I'm not currently
planning to write a block-export-add because it doesn't add anything new
for the storage daemon, so FuseExport and the command this is your part.
The type currently only exists for --export.

Kevin
Max Reitz Dec. 20, 2019, 12:09 p.m. UTC | #4
On 20.12.19 12:24, Kevin Wolf wrote:
> Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> On 20.12.19 11:26, Kevin Wolf wrote:
>>> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>>>> fuse-export-add allows mounting block graph nodes via FUSE on some
>>>> existing regular file.  That file should then appears like a raw disk
>>>> image, and accesses to it result in accesses to the exported BDS.
>>>>
>>>> Right now, we only set up the mount point and tear all mount points down
>>>> in bdrv_close_all().  We do not implement any access functions, so
>>>> accessing the mount point only results in errors.  This will be
>>>> addressed by a followup patch.
>>>>
>>>> The set of exported nodes is kept in a hash table so we can later add a
>>>> fuse-export-remove that allows unmounting.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>
>>>> diff --git a/qapi/block.json b/qapi/block.json
>>>> index 145c268bb6..03f8d1b537 100644
>>>> --- a/qapi/block.json
>>>> +++ b/qapi/block.json
>>>> @@ -317,6 +317,29 @@
>>>>  ##
>>>>  { 'command': 'nbd-server-stop' }
>>>>  
>>>> +##
>>>> +# @fuse-export-add:
>>>> +#
>>>> +# Exports a block graph node on some (file) mountpoint as a raw image.
>>>> +#
>>>> +# @node-name: Node to be exported
>>>> +#
>>>> +# @mountpoint: Path on which to export the block device via FUSE.
>>>> +#              This must point to an existing regular file.
>>>> +#
>>>> +# @writable: Whether clients should be able to write to the block
>>>> +#            device via the FUSE export. (default: false)
>>>> +#
>>>> +# Since: 5.0
>>>> +##
>>>> +{ 'command': 'fuse-export-add',
>>>> +  'data': {
>>>> +      'node-name': 'str',
>>>> +      'mountpoint': 'str',
>>>> +      '*writable': 'bool'
>>>> +  },
>>>> +  'if': 'defined(CONFIG_FUSE)' }
>>>
>>> Can this use a BlockExport union from the start like I'm introducing in
>>> the storage daemon series, together with a generic block-export-add?
>>
>> Hm, you mean still adding a FuseExport structure that would be part of
>> BlockExport and then dropping fuse-export-add in favor of a
>> block-export-add that we want anyway?
> 
> Yes.
> 
>>> It also looks like node-name and writable should be part of the common
>>> base of BlockExport.
>>
>> node-name definitely, I’m not so sure about writable.  Or, to be more
>> precise, I think that if we want writable to be in the base, we also
>> want growable to be there: Both are primarily options for the
>> BlockBackend that the exports use.
>>
>> But both of course also need to be supported by the export
>> implementation.  nbd can make its BB growable all it wants, but that
>> doesn’t make it work.
> 
> Right. Pragmatically, I think exports are very like to support writable,
> but probably rather unlikely to support growable. So I do think there
> would be a point for making writable part of the common base, but not
> growable.

True.

But there’s nothing that inherently binds it to FUSE, so I think both
from an implementation’s POV and from a user’s POV, it looks just as
generic as “writable”.  But that’s theory.  I agree that in practice, it
won’t be as generic.

(I realize this doesn’t help much in finding out what we should do.)

>> So if we kept writable and growable in the common base, then the schema
>> would give no information about what exports actually support them.
>>
>> On one hand, I don’t know whether it’s important to have this
>> information in a static form, or whether it’s sufficient to learn at
>> runtime.
>>
>> On the other, I don’t know whether it’s important to have those fields
>> in the base or not.  Would it make a difference on the wire?
> 
> Not for the command itself, so I think we're free to change it later. It
> might make a difference for introspection, though, not sure. Markus?

Yes, I asked because I’m wondering whether it would be more cumbersome
to users if we didn’t keep it in the base structure.

The duplication depends on how we want to design the command.  Should
the export implementations receive a ready-to-use BB?  Or just a
node-name?  In the latter case, we wouldn’t get rid of duplicated code
by having writable/growable in the base.  For the former, it could, but
then again, just taking the WRITE permission and making the BB growable
isn’t that bad to duplicate.

Something to consider is that of course the current NBD code wants a
node-name and not a BB.  So if we decided to generally give export
implementations a BB, then the initial implementation of
qmp_block_export_add() would look a bit freaky: It would first branch
off to qmp_nbd_server_add(), and then open the BB for the “common” case,
but this common case only exists for FUSE (right now).

OTOH, right now we’re free to decide whether we open the BB in
qmp_block_export_add() or fuse.c, and so we might as well just do it in
the former.  If we later find out that this was a stupid idea, we can
always move it into fuse.c.


Now I don’t quite know where I’m trying to get with this.

I suppose it means that we should start with qmp_block_export_add()
creating the BB and handing it over to the export implementation.

Then it makes sense to me that the BB should be ready to go, i.e. have
all the necessary flags and permissions set.  If so, writable and
growable should be part of the base, so the WRITE permission can be
taken and allow_write_beyond_eof can be set.

But there’s a catch: The common code actually cannot pass on a
ready-to-go BB, because it depends on the export type what kinds of
permissions can be shared.  FUSE exports are fine with sharing the
RESIZE permission, but NBD can’t do that.  And, well, if we require the
export implementation to adjust the permissions anyway, we might as well
require it to take WRITE if necessary.  And then the argument for
avoiding duplication is gone.

> Having it in the base might allow us to remove some duplication in the
> code. Probably not much, though, so not too important.
> 
>>> Unfortunately this would mean that I can't use the
>>> same BlockExportNbd for the existing nbd-server-add command any more. I
>>> guess I could somehow get a shared base type for both, though.
>>
>> Hm.  This sounds like you want to make it your problem.  Can I take that
>> to mean that you want to implement block-export-add and I can wait with
>> v2 until that’s done? :-)
> 
> The NBD integration, yes. I already added the BlockExport type to my
> patches, too, but I expect you would beat me to it. I'm not currently
> planning to write a block-export-add because it doesn't add anything new
> for the storage daemon, so FuseExport and the command this is your part.
> The type currently only exists for --export.

That’s too bad. ;-)

I was mostly asking because I imagine it would actually make more sense
to add block-export-add in a seperate (prerequisite) series.

Max
Markus Armbruster Dec. 20, 2019, 12:48 p.m. UTC | #5
Kevin Wolf <kwolf@redhat.com> writes:

> Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> On 20.12.19 11:26, Kevin Wolf wrote:
>> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>> >> fuse-export-add allows mounting block graph nodes via FUSE on some
>> >> existing regular file.  That file should then appears like a raw disk
>> >> image, and accesses to it result in accesses to the exported BDS.
>> >>
>> >> Right now, we only set up the mount point and tear all mount points down
>> >> in bdrv_close_all().  We do not implement any access functions, so
>> >> accessing the mount point only results in errors.  This will be
>> >> addressed by a followup patch.
>> >>
>> >> The set of exported nodes is kept in a hash table so we can later add a
>> >> fuse-export-remove that allows unmounting.
>> >>
>> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> > 
>> >> diff --git a/qapi/block.json b/qapi/block.json
>> >> index 145c268bb6..03f8d1b537 100644
>> >> --- a/qapi/block.json
>> >> +++ b/qapi/block.json
>> >> @@ -317,6 +317,29 @@
>> >>  ##
>> >>  { 'command': 'nbd-server-stop' }
>> >>  
>> >> +##
>> >> +# @fuse-export-add:
>> >> +#
>> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> >> +#
>> >> +# @node-name: Node to be exported
>> >> +#
>> >> +# @mountpoint: Path on which to export the block device via FUSE.
>> >> +#              This must point to an existing regular file.
>> >> +#
>> >> +# @writable: Whether clients should be able to write to the block
>> >> +#            device via the FUSE export. (default: false)
>> >> +#
>> >> +# Since: 5.0
>> >> +##
>> >> +{ 'command': 'fuse-export-add',
>> >> +  'data': {
>> >> +      'node-name': 'str',
>> >> +      'mountpoint': 'str',
>> >> +      '*writable': 'bool'
>> >> +  },
>> >> +  'if': 'defined(CONFIG_FUSE)' }
>> > 
>> > Can this use a BlockExport union from the start like I'm introducing in
>> > the storage daemon series, together with a generic block-export-add?
>> 
>> Hm, you mean still adding a FuseExport structure that would be part of
>> BlockExport and then dropping fuse-export-add in favor of a
>> block-export-add that we want anyway?
>
> Yes.
>
>> > It also looks like node-name and writable should be part of the common
>> > base of BlockExport.
>> 
>> node-name definitely, I’m not so sure about writable.  Or, to be more
>> precise, I think that if we want writable to be in the base, we also
>> want growable to be there: Both are primarily options for the
>> BlockBackend that the exports use.
>> 
>> But both of course also need to be supported by the export
>> implementation.  nbd can make its BB growable all it wants, but that
>> doesn’t make it work.
>
> Right. Pragmatically, I think exports are very like to support writable,
> but probably rather unlikely to support growable. So I do think there
> would be a point for making writable part of the common base, but not
> growable.
>
>> So if we kept writable and growable in the common base, then the schema
>> would give no information about what exports actually support them.
>> 
>> On one hand, I don’t know whether it’s important to have this
>> information in a static form, or whether it’s sufficient to learn at
>> runtime.
>> 
>> On the other, I don’t know whether it’s important to have those fields
>> in the base or not.  Would it make a difference on the wire?
>
> Not for the command itself, so I think we're free to change it later. It
> might make a difference for introspection, though, not sure. Markus?

QAPI schema introspection is designed to hide the difference between
local members and base members.  You can move members to or from a base
type freely without affecting introspection.  Even if that creates or
deletes the base type.

Example: DriveBackup

    { 'struct': 'DriveBackup',
      'base': 'BackupCommon',
      'data': { 'target': 'str',
                '*format': 'str',
                '*mode': 'NewImageMode' } }

where BackupCommon is

    { 'struct': 'BackupCommon',
      'data': { '*job-id': 'str', 'device': 'str',
                'sync': 'MirrorSyncMode', '*speed': 'int',
                '*bitmap': 'str', '*bitmap-mode': 'BitmapSyncMode',
                '*compress': 'bool',
                '*on-source-error': 'BlockdevOnError',
                '*on-target-error': 'BlockdevOnError',
                '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
                '*filter-node-name': 'str' } }

query-qmp-schema describes DriveBackup like this:

    {"name": "30",
     "members": [
         {"name": "job-id", "default": null, "type": "str"},
         {"name": "device", "type": "str"},
         {"name": "sync", "type": "235"},
         {"name": "speed", "default": null, "type": "int"},
         {"name": "bitmap", "default": null, "type": "str"},
         {"name": "bitmap-mode", "default": null, "type": "236"},
         {"name": "compress", "default": null, "type": "bool"},
         {"name": "on-source-error", "default": null, "type": "237"},
         {"name": "on-target-error", "default": null, "type": "237"},
         {"name": "auto-finalize", "default": null, "type": "bool"},
         {"name": "auto-dismiss", "default": null, "type": "bool"},
         {"name": "filter-node-name", "default": null, "type": "str"},
         {"name": "target", "type": "str"},
         {"name": "format", "default": null, "type": "str"},
         {"name": "mode", "default": null, "type": "234"}],
     "meta-type": "object"}

[...]
Markus Armbruster Dec. 20, 2019, 12:49 p.m. UTC | #6
Kevin Wolf <kwolf@redhat.com> writes:

> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>> fuse-export-add allows mounting block graph nodes via FUSE on some
>> existing regular file.  That file should then appears like a raw disk
>> image, and accesses to it result in accesses to the exported BDS.
>> 
>> Right now, we only set up the mount point and tear all mount points down
>> in bdrv_close_all().  We do not implement any access functions, so
>> accessing the mount point only results in errors.  This will be
>> addressed by a followup patch.
>> 
>> The set of exported nodes is kept in a hash table so we can later add a
>> fuse-export-remove that allows unmounting.
>> 
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>
>> diff --git a/qapi/block.json b/qapi/block.json
>> index 145c268bb6..03f8d1b537 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -317,6 +317,29 @@
>>  ##
>>  { 'command': 'nbd-server-stop' }
>>  
>> +##
>> +# @fuse-export-add:
>> +#
>> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> +#
>> +# @node-name: Node to be exported
>> +#
>> +# @mountpoint: Path on which to export the block device via FUSE.
>> +#              This must point to an existing regular file.
>> +#
>> +# @writable: Whether clients should be able to write to the block
>> +#            device via the FUSE export. (default: false)
>> +#
>> +# Since: 5.0
>> +##
>> +{ 'command': 'fuse-export-add',
>> +  'data': {
>> +      'node-name': 'str',
>> +      'mountpoint': 'str',
>> +      '*writable': 'bool'
>> +  },
>> +  'if': 'defined(CONFIG_FUSE)' }
>
> Can this use a BlockExport union from the start like I'm introducing in
> the storage daemon series, together with a generic block-export-add?
>
> It also looks like node-name and writable should be part of the common
> base of BlockExport. Unfortunately this would mean that I can't use the
> same BlockExportNbd for the existing nbd-server-add command any more. I
> guess I could somehow get a shared base type for both, though.
>
> Markus, any thoughts on these QAPI interfaces?

Context?  How far back should I read?
Kevin Wolf Dec. 20, 2019, 12:58 p.m. UTC | #7
Am 20.12.2019 um 13:48 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
> >> So if we kept writable and growable in the common base, then the schema
> >> would give no information about what exports actually support them.
> >> 
> >> On one hand, I don’t know whether it’s important to have this
> >> information in a static form, or whether it’s sufficient to learn at
> >> runtime.
> >> 
> >> On the other, I don’t know whether it’s important to have those fields
> >> in the base or not.  Would it make a difference on the wire?
> >
> > Not for the command itself, so I think we're free to change it later. It
> > might make a difference for introspection, though, not sure. Markus?
> 
> QAPI schema introspection is designed to hide the difference between
> local members and base members.  You can move members to or from a base
> type freely without affecting introspection.  Even if that creates or
> deletes the base type.

Good, that's helpful. So I can split the nbd-server-add argument type
into a base that is reused as a union branch and the rest without
potentially breaking anything.

I suppose moving a field between a union base and all variants does
still result in different introspection even though the accepted inputs
are the same. Is this kind of movement still allowed unconditionally or
should we be more careful with something like this?

Kevin
Kevin Wolf Dec. 20, 2019, 1:02 p.m. UTC | #8
Am 20.12.2019 um 13:49 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
> >> fuse-export-add allows mounting block graph nodes via FUSE on some
> >> existing regular file.  That file should then appears like a raw disk
> >> image, and accesses to it result in accesses to the exported BDS.
> >> 
> >> Right now, we only set up the mount point and tear all mount points down
> >> in bdrv_close_all().  We do not implement any access functions, so
> >> accessing the mount point only results in errors.  This will be
> >> addressed by a followup patch.
> >> 
> >> The set of exported nodes is kept in a hash table so we can later add a
> >> fuse-export-remove that allows unmounting.
> >> 
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >
> >> diff --git a/qapi/block.json b/qapi/block.json
> >> index 145c268bb6..03f8d1b537 100644
> >> --- a/qapi/block.json
> >> +++ b/qapi/block.json
> >> @@ -317,6 +317,29 @@
> >>  ##
> >>  { 'command': 'nbd-server-stop' }
> >>  
> >> +##
> >> +# @fuse-export-add:
> >> +#
> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
> >> +#
> >> +# @node-name: Node to be exported
> >> +#
> >> +# @mountpoint: Path on which to export the block device via FUSE.
> >> +#              This must point to an existing regular file.
> >> +#
> >> +# @writable: Whether clients should be able to write to the block
> >> +#            device via the FUSE export. (default: false)
> >> +#
> >> +# Since: 5.0
> >> +##
> >> +{ 'command': 'fuse-export-add',
> >> +  'data': {
> >> +      'node-name': 'str',
> >> +      'mountpoint': 'str',
> >> +      '*writable': 'bool'
> >> +  },
> >> +  'if': 'defined(CONFIG_FUSE)' }
> >
> > Can this use a BlockExport union from the start like I'm introducing in
> > the storage daemon series, together with a generic block-export-add?
> >
> > It also looks like node-name and writable should be part of the common
> > base of BlockExport. Unfortunately this would mean that I can't use the
> > same BlockExportNbd for the existing nbd-server-add command any more. I
> > guess I could somehow get a shared base type for both, though.
> >
> > Markus, any thoughts on these QAPI interfaces?
> 
> Context?  How far back should I read?

Basically just the hunk quoted above and the QAPI portion of the
following storage daemon patch:

    [RFC PATCH 08/18] qemu-storage-daemon: Add --export option

Maybe the cover letter, too, if you need a more high-level introduction.

Kevin
Markus Armbruster Dec. 20, 2019, 1:25 p.m. UTC | #9
Kevin Wolf <kwolf@redhat.com> writes:

> Am 20.12.2019 um 13:48 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> >> So if we kept writable and growable in the common base, then the schema
>> >> would give no information about what exports actually support them.
>> >> 
>> >> On one hand, I don’t know whether it’s important to have this
>> >> information in a static form, or whether it’s sufficient to learn at
>> >> runtime.
>> >> 
>> >> On the other, I don’t know whether it’s important to have those fields
>> >> in the base or not.  Would it make a difference on the wire?
>> >
>> > Not for the command itself, so I think we're free to change it later. It
>> > might make a difference for introspection, though, not sure. Markus?
>> 
>> QAPI schema introspection is designed to hide the difference between
>> local members and base members.  You can move members to or from a base
>> type freely without affecting introspection.  Even if that creates or
>> deletes the base type.
>
> Good, that's helpful. So I can split the nbd-server-add argument type
> into a base that is reused as a union branch and the rest without
> potentially breaking anything.
>
> I suppose moving a field between a union base and all variants does
> still result in different introspection even though the accepted inputs
> are the same.

Correct.  A common member (whether it's local or from the base) is in
SchemaInfoObject.members[].  Moving it to all the variants moves it to
the variant types' .members[].

>               Is this kind of movement still allowed unconditionally or
> should we be more careful with something like this?

QMP's backward compatibility promise does not include "introspection
value won't change".  Still, such changes can conceivably confuse
clients.  Care is advisable.  But it's not a hard "no".
Eric Blake Dec. 20, 2019, 9:15 p.m. UTC | #10
On 12/19/19 8:38 AM, Max Reitz wrote:
> fuse-export-add allows mounting block graph nodes via FUSE on some
> existing regular file.  That file should then appears like a raw disk
> image, and accesses to it result in accesses to the exported BDS.
> 
> Right now, we only set up the mount point and tear all mount points down
> in bdrv_close_all().  We do not implement any access functions, so
> accessing the mount point only results in errors.  This will be
> addressed by a followup patch.
> 
> The set of exported nodes is kept in a hash table so we can later add a
> fuse-export-remove that allows unmounting.

Before I review this, a quick question:

How does this compare to the recently added nbdfuse?
https://www.redhat.com/archives/libguestfs/2019-October/msg00080.html

Or put another way, maybe we get the same effect by combining qemu-nbd 
with nbdfuse, but this new utility would cut out a middleman for more 
efficiency, right?


> +++ b/block/fuse.c
> @@ -0,0 +1,260 @@
> +/*
> + * Present a block device as a raw image through FUSE
> + *
> + * Copyright (c) 2019 Max Reitz <mreitz@redhat.com>
Eric Blake Dec. 20, 2019, 9:18 p.m. UTC | #11
On 12/20/19 7:25 AM, Markus Armbruster wrote:

>>
>> I suppose moving a field between a union base and all variants does
>> still result in different introspection even though the accepted inputs
>> are the same.
> 
> Correct.  A common member (whether it's local or from the base) is in
> SchemaInfoObject.members[].  Moving it to all the variants moves it to
> the variant types' .members[].
> 
>>                Is this kind of movement still allowed unconditionally or
>> should we be more careful with something like this?
> 
> QMP's backward compatibility promise does not include "introspection
> value won't change".  Still, such changes can conceivably confuse
> clients.  Care is advisable.  But it's not a hard "no".

And libvirt already correctly handles movements like this (so there are 
existing clients aware of the potential confusion).
Max Reitz Jan. 6, 2020, noon UTC | #12
On 20.12.19 22:15, Eric Blake wrote:
> On 12/19/19 8:38 AM, Max Reitz wrote:
>> fuse-export-add allows mounting block graph nodes via FUSE on some
>> existing regular file.  That file should then appears like a raw disk
>> image, and accesses to it result in accesses to the exported BDS.
>>
>> Right now, we only set up the mount point and tear all mount points down
>> in bdrv_close_all().  We do not implement any access functions, so
>> accessing the mount point only results in errors.  This will be
>> addressed by a followup patch.
>>
>> The set of exported nodes is kept in a hash table so we can later add a
>> fuse-export-remove that allows unmounting.
> 
> Before I review this, a quick question:
> 
> How does this compare to the recently added nbdfuse?
> https://www.redhat.com/archives/libguestfs/2019-October/msg00080.html

Hm.  Well, one thing is that it uses a file mount point instead of a
cumbersome directory + "ramdisk" file. O:-)  (Which, again, is fun
because this allows you to mount a qcow2 file on itself so it appears
like a raw image.)

Then we get all native block layer things without needing NBD support,
like resize (also growing on post-EOF writes).

(It also has features the nbdfuse patch mentions are not supported there
yet, i.e. fallocate() (zero writes and discards).  And I don’t suppose
nbdfuse supports lseek() yet either.  I suppose those features could be
added to nbdfuse, but, well, they are here now.)

> Or put another way, maybe we get the same effect by combining qemu-nbd
> with nbdfuse, but this new utility would cut out a middleman for more
> efficiency, right?

I would assume it has better efficiency, yes.  But the performance is
not very good anyway.

Max
diff mbox series

Patch

diff --git a/block.c b/block.c
index c390ec6461..887c0b105e 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@ 
 #include "block/trace.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "block/qdict.h"
 #include "qemu/error-report.h"
@@ -4077,6 +4078,9 @@  void bdrv_close_all(void)
 {
     assert(job_next(NULL) == NULL);
     nbd_export_close_all();
+#ifdef CONFIG_FUSE
+    fuse_export_close_all();
+#endif
 
     /* Drop references from requests still in flight, such as canceled block
      * jobs whose AIO context has not been polled yet */
diff --git a/block/Makefile.objs b/block/Makefile.objs
index e394fe0b6c..b02846a6e7 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -43,6 +43,7 @@  block-obj-y += crypto.o
 
 block-obj-y += aio_task.o
 block-obj-y += backup-top.o
+block-obj-$(CONFIG_FUSE) += fuse.o
 
 common-obj-y += stream.o
 
@@ -67,3 +68,5 @@  qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
 parallels.o-cflags := $(LIBXML2_CFLAGS)
 parallels.o-libs   := $(LIBXML2_LIBS)
+fuse.o-cflags      := $(FUSE_CFLAGS)
+fuse.o-libs        := $(FUSE_LIBS)
diff --git a/block/fuse.c b/block/fuse.c
new file mode 100644
index 0000000000..3a22579dca
--- /dev/null
+++ b/block/fuse.c
@@ -0,0 +1,260 @@ 
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2019 Max Reitz <mreitz@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; under version 2 or later of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define FUSE_USE_VERSION 31
+
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/block.h"
+#include "block/fuse.h"
+#include "block/qapi.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-block.h"
+#include "sysemu/block-backend.h"
+
+#include <fuse.h>
+#include <fuse_lowlevel.h>
+
+
+typedef struct BdrvFuseSession {
+    struct fuse_session *fuse_session;
+    struct fuse_buf fuse_buf;
+    BlockBackend *blk;
+    uint64_t perm, shared_perm;
+    bool mounted, fd_handler_set_up;
+    bool writable;
+} BdrvFuseSession;
+
+static GHashTable *sessions;
+static const struct fuse_lowlevel_ops fuse_ops;
+
+static void init_fuse(void);
+static int setup_fuse_session(BdrvFuseSession *session, const char *mountpoint,
+                              Error **errp);
+static void read_from_fuse_session(void *opaque);
+static void close_fuse_session(BdrvFuseSession *session);
+static void drop_fuse_session_from_hash_table(gpointer value);
+
+static bool is_regular_file(const char *path, Error **errp);
+
+
+void qmp_fuse_export_add(const char *node_name, const char *mountpoint,
+                         bool has_writable, bool writable,
+                         Error **errp)
+{
+    BlockDriverState *bs;
+    BdrvFuseSession *session = NULL;
+
+    if (!has_writable) {
+        writable = false;
+    }
+
+    init_fuse();
+
+    /*
+     * It is important to do this check before calling is_regular_file() --
+     * that function will do a stat(), which we would have to handle if we
+     * already exported something on @mountpoint.  But we cannot, because
+     * we are currently caught up here.
+     */
+    if (g_hash_table_contains(sessions, mountpoint)) {
+        error_setg(errp, "There already is a FUSE export on '%s'", mountpoint);
+        goto fail;
+    }
+
+    if (!is_regular_file(mountpoint, errp)) {
+        goto fail;
+    }
+
+    bs = bdrv_find_node(node_name);
+    if (!bs) {
+        error_setg(errp, "Node '%s' not found", node_name);
+        goto fail;
+    }
+
+    session = g_new(BdrvFuseSession, 1);
+    *session = (BdrvFuseSession){
+        .fuse_buf = {
+            .mem = NULL,
+        },
+
+        .writable = writable,
+    };
+
+    session->perm = BLK_PERM_CONSISTENT_READ;
+    if (writable) {
+        session->perm |= BLK_PERM_WRITE;
+    }
+    session->shared_perm = BLK_PERM_ALL;
+
+    session->blk = blk_new(bdrv_get_aio_context(bs),
+                           session->perm, session->shared_perm);
+    if (blk_insert_bs(session->blk, bs, errp) < 0) {
+        goto fail;
+    }
+
+    if (setup_fuse_session(session, mountpoint, errp) < 0) {
+        goto fail;
+    }
+
+    g_hash_table_insert(sessions, g_strdup(mountpoint), session);
+    return;
+
+fail:
+    close_fuse_session(session);
+}
+
+/**
+ * Drop all FUSE exports.
+ */
+void fuse_export_close_all(void)
+{
+    if (sessions) {
+        g_hash_table_destroy(sessions);
+    }
+}
+
+
+/**
+ * Ensure that the global FUSE context is set up.
+ */
+static void init_fuse(void)
+{
+    if (sessions) {
+        return;
+    }
+
+    sessions = g_hash_table_new_full(g_str_hash, g_str_equal, g_free,
+                                     drop_fuse_session_from_hash_table);
+}
+
+/**
+ * Create session->fuse_session and mount it.
+ */
+static int setup_fuse_session(BdrvFuseSession *session, const char *mountpoint,
+                              Error **errp)
+{
+    const char *fuse_argv[2];
+    struct fuse_args fuse_args;
+    int ret;
+
+    fuse_argv[0] = ""; /* Dummy program name */
+    fuse_argv[1] = NULL;
+    fuse_args = (struct fuse_args)FUSE_ARGS_INIT(1, (char **)fuse_argv);
+
+    session->fuse_session = fuse_session_new(&fuse_args, &fuse_ops,
+                                             sizeof(fuse_ops), session);
+    if (!session->fuse_session) {
+        error_setg(errp, "Failed to set up FUSE session");
+        return -EIO;
+    }
+
+    ret = fuse_session_mount(session->fuse_session, mountpoint);
+    if (ret < 0) {
+        error_setg(errp, "Failed to mount FUSE session to export");
+        return -EIO;
+    }
+    session->mounted = true;
+
+    aio_set_fd_handler(blk_get_aio_context(session->blk),
+                       fuse_session_fd(session->fuse_session), true,
+                       read_from_fuse_session, NULL, NULL, session);
+    session->fd_handler_set_up = true;
+
+    return 0;
+}
+
+/**
+ * Callback to be invoked when the FUSE session FD can be read from.
+ * (This is basically the FUSE event loop.)
+ */
+static void read_from_fuse_session(void *opaque)
+{
+    BdrvFuseSession *session = opaque;
+    int ret;
+
+    ret = fuse_session_receive_buf(session->fuse_session, &session->fuse_buf);
+    if (ret < 0) {
+        return;
+    }
+
+    fuse_session_process_buf(session->fuse_session, &session->fuse_buf);
+}
+
+/**
+ * Drop a FUSE session (unmount it and free all associated resources).
+ * It is not removed from the @sessions hash table.
+ */
+static void close_fuse_session(BdrvFuseSession *session)
+{
+    if (!session) {
+        return;
+    }
+
+    if (session->fuse_session) {
+        if (session->mounted) {
+            fuse_session_unmount(session->fuse_session);
+        }
+        if (session->fd_handler_set_up) {
+            aio_set_fd_handler(blk_get_aio_context(session->blk),
+                               fuse_session_fd(session->fuse_session), true,
+                               NULL, NULL, NULL, NULL);
+        }
+        fuse_session_destroy(session->fuse_session);
+    }
+    blk_unref(session->blk);
+
+    g_free(session);
+}
+
+/**
+ * Wrapper around close_fuse_session() for use with
+ * g_hash_table_new_full().  This allows dropping sessions by removing
+ * them from the @sessions hash table.
+ */
+static void drop_fuse_session_from_hash_table(gpointer value)
+{
+    return close_fuse_session(value);
+}
+
+
+/**
+ * Check whether @path points to a regular file.  If not, put an
+ * appropriate message into *errp.
+ */
+static bool is_regular_file(const char *path, Error **errp)
+{
+    struct stat statbuf;
+    int ret;
+
+    ret = stat(path, &statbuf);
+    if (ret < 0) {
+        error_setg_errno(errp, errno, "Failed to stat '%s'", path);
+        return false;
+    }
+
+    if (!S_ISREG(statbuf.st_mode)) {
+        error_setg(errp, "'%s' is not a regular file", path);
+        return false;
+    }
+
+    return true;
+}
+
+static const struct fuse_lowlevel_ops fuse_ops = {
+};
diff --git a/include/block/fuse.h b/include/block/fuse.h
new file mode 100644
index 0000000000..1d24dded50
--- /dev/null
+++ b/include/block/fuse.h
@@ -0,0 +1,24 @@ 
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2019 Max Reitz <mreitz@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; under version 2 or later of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef BLOCK_FUSE_H
+#define BLOCK_FUSE_H
+
+void fuse_export_close_all(void);
+
+#endif
diff --git a/qapi/block.json b/qapi/block.json
index 145c268bb6..03f8d1b537 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -317,6 +317,29 @@ 
 ##
 { 'command': 'nbd-server-stop' }
 
+##
+# @fuse-export-add:
+#
+# Exports a block graph node on some (file) mountpoint as a raw image.
+#
+# @node-name: Node to be exported
+#
+# @mountpoint: Path on which to export the block device via FUSE.
+#              This must point to an existing regular file.
+#
+# @writable: Whether clients should be able to write to the block
+#            device via the FUSE export. (default: false)
+#
+# Since: 5.0
+##
+{ 'command': 'fuse-export-add',
+  'data': {
+      'node-name': 'str',
+      'mountpoint': 'str',
+      '*writable': 'bool'
+  },
+  'if': 'defined(CONFIG_FUSE)' }
+
 ##
 # @DEVICE_TRAY_MOVED:
 #